Recommendations from the
National Cancer Institute's Working Group
on Non-Mammalian Models of Human Cancers:
Foundation for February 1999 Meeting


The NCI Non-mammalian Models of Cancer subgroup of the Preclinical Models of Cancer Working Group, chaired by Dr. H. Robert Horvitz and Dr. Marc Kirschner, met June 25-26, 1997 to develop recommendations about how to facilitate the use of non-mammalian experimental organisms for cancer research. The working group reported its recommendations to the Advisory Committee to the Director, NCI on January 12, 1998. Drs. Horvitz and Kirschner presented the recommendations to the NIH Institute Directors on March 5, 1998.






Recommendation 1. Promote the analysis of the functions of human oncogenes and tumor suppressor genes and of pathways that include such genes in non-mammalian organisms. More generally, promote the acquisition of basic knowledge about non-mammalian organisms important for use as preclinical models for cancer.

1.A. Rationale: While the human genome is estimated to contain approximately 100,000 genes, these genes will likely encode the components of perhaps only a few hundred biological processes. One of the most striking findings of contemporary biology is that these components, the ways in which they interact, and even their developmental and physiological roles seem to be highly conserved among greatly diverse organisms, including humans. The experimental tools exist for certain non-mammalian organisms for readily assembling genes into gene pathways. A major contribution of such organisms over the next five to ten years should be the reduction of the 100,000 individual components encoded by the human genome into a much smaller number of multi-component core processes of known biochemical function. For example, the methods of genetics, biochemistry, and cell biology can be used to reveal the biological functions of homologs of human cancer genes and to define the pathways in which such genes act. The definition of such pathways can define both novel cancer genes and novel anti-cancer drug targets (e.g., if a loss-of-function mutation in a suppressor gene can prevent the oncogenic action of a cancer gene, that suppressor gene becomes a candidate for being a drug target, since its inactivation by a small molecule should similarly prevent the oncogenic action of the cancer gene). More generally, the acquisition of basic knowledge concerning organisms that provide important preclinical models for cancer will facilitate the study of human cancer gene homologs and their pathways. Studies of the homologs of the Ras proto-oncogene in C. elegans and Drosophila are illustrative. Together such studies led both to the elucidation of the normal biological function of Ras as a key element in signal transduction during development and to the definition of the Ras gene pathway. Basic studies of developmental biology (of sexual differentiation in C. elegans and of eye development in Drosophila) led to these discoveries about Ras. Similarly, basic studies of DNA repair in S. cerevisiae, of the cell cycle in S. cerevisiae and Xenopus, and of the Notch and Wnt signal transduction pathways and of apoptosis (programmed cell death) in C. elegans and Drosophila have provided crucial insights into human cancer genes and gene pathways. The NCI has historically supported little research involving non-mammalian organisms. Yet studies of these organisms have contributed enormously to the understanding of human cancer. Much remains to be learned, and major NCI support for studies of non-mammalian organisms is vital for these organisms to be used most effectively in analyzing human cancer gene functions.

1.B. Specific Recommendations.

1.B.i. Provide broad support for studies of human cancer genes in non-mammalian organisms; for the development of the infrastructure (resources and technologies) important for such studies; and, more generally, for broad basic research using such organisms. NCI should clearly and broadly inform the biological research community of its desire to fund studies of non-mammalian organisms with relevance to human cancer. Such research could involve individual investigators, groups of investigators at one or more institutions, or collaborative efforts between academia and industry. Detailed recommendations will be presented below on an organism-by-organism basis.

1.B.ii. Broaden the research efforts at NCI-supported cancer centers and perhaps non-NCI centers to incorporate non-mammalian organisms. Both to stimulate appropriate research efforts and to stimulate interactions between cancer biologists and biologists studying non-mammalian organisms, NCI should support efforts to develop and integrate programs involving non-mammalian organisms in such centers. NCI should also state its interest in this area in its program announcements and indicate that cancer center core facilities used in part or exclusively for the study of non-mammalian organisms will be supported by NCI.

1.B.iii. Develop new mechanisms for evaluating and funding research proposals involving non-mammalian organisms. Although noted by Subgroup 3, this Recommendation was not discussed in sufficient detail to result in specific proposals for implementation. One possibility is that new NCI study sections with appropriate expertise in both cancer biology and non-mammalian organisms be established for the review of such proposals.


Recommendation 2. Support the establishment and/or facilitate the accessibility of pan-organism resource centers that can be accessed by individual laboratories using non-mammalian organisms for cancer-relevant research.

2.A. Rationale: Although there are core needs specific for different organisms, there are also a wide variety of materials, equipment and methods that serve the entire biological community. The existence of such resources, most often provided by commercial suppliers, has facilitated the development of many areas of biology, most especially the biology of human cancer. For the most part these supplies and equipment are of small enough unit cost that they can best be funded through the conventional granting mechanisms. Today research laboratories benefit from materials such as oligonucleotides, antibodies, kits for various molecular biology manipulations, as well as many pieces of small equipment from PCR machines to table-top centrifuges. However, certain innovative technologies of immense power to contribute to the study of cancer in many settings do not fall into the category of small equipment that can be funded by individual research grants or even program grants of moderate size. Rather, these technologies are best acquired for large groups of investigators and operated as common facilities. These technologies are now or soon will be limiting in the exploitation of recognizable leads. Their lack of availability increases the time and ultimately the cost of discovery. Judicious investment in these technologies by the NCI would be expected to accelerate the process of discovery in several areas related to cancer.

2.B. Specific Recommendations.

2.B.i. Support the development and use of array technologies for studies of a broad variety of cellular processes at a genome-wide level. Several methods have been developed and are in the process of refinement for assaying the expression of a large number of genes simultaneously. The principle of this method is that oligonucleotides or larger nucleic acid segments are attached to a glass matrix, which is then used for hybridization against paired RNA samples. For budding yeast, arrays have been constructed that contain each of yeast's 6,400 open reading frames. For C. elegans, arrays to date contain about 1,200 cDNAs. These arrays can be used for hybridization against paired RNA samples, for example, from cells grown under different conditions or from strains that differ in genotype. This method allows the identification of all genes that are induced by (or repressed by) heat shock, osmotic shock, antimitogenic factors, DNA damage, ectopic expression of a transcription factor, etc. The catalogue of genes expressed coordinately in this manner promises to provide a rich picture of cell physiology and is an integral part of a Cell or Gene Function Project (which follows a Genome Project, with the goal of determining a genome sequence). Array technology has obvious and immediate application for organisms such as yeast and for cells in culture, e.g., normal cells and cancer cells. For multicellular organisms such as C. elegans and Drosophila, full exploitation of this technology will require methods for separating different cell types from each other. One possible strategy is to express fluorescent reporter proteins in different tissues and subsequently sort these cells from other cells. Development of such methods is crucial for full exploitation of the array technology in metazoans. There is no doubt that the array technology for global analysis of gene expression is going to have a profound impact on learning about cell physiology. 
This technology is in its infancy, and several different methods are being developed. For implementation of this technology, there are three components, with multiple possibilities for each. (1) Arrays: Arrays most likely would be produced as needed using an array maker, although some may become available from commercial sources. The array maker is basically an ink-jet printer that deposits polynucleotides on a glass slide. (2) Hybridization chambers: A set of 10 such chambers would likely be used in parallel. (3) Readers. It is worth noting that the most effective use of the array technology could well be the comparison of expression pattern differences seen under different experimental conditions. For this purpose, hybridizations and the reading of the arrays would be performed at a single site, by either a commercial or non-commercial group specializing in this technology. Different investigators would contribute paired RNA samples (in the form of cDNA samples). All of the data so generated could be made accessible to the entire community of researchers (perhaps with some time delay for the RNA/cDNA donors to analyze the data first) as an expression database. One virtue of such a scheme is that the data would be collected in a standardized manner, which would make it more valuable for comparisons. It should be emphasized that for such applications it would be crucial for array-based studies of gene expression within a given organism to be performed at a single site. The development and exploitation of array technologies is still at an early stage and will require further thought for proper planning.
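The paired-sample comparison described in this section can be illustrated with a minimal Python sketch. It is not a description of any particular array platform or analysis pipeline: the gene names, the signal intensities, the pseudocount, and the two-fold threshold are all hypothetical values chosen only to show how genes induced or repressed under one condition relative to another might be listed.

```python
import math


def log2_ratios(reference, treated, pseudocount=1.0):
    """Return {gene: log2(treated / reference)} for genes measured in both samples.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detectable signal.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in reference
        if gene in treated
    }


def classify(ratios, threshold=1.0):
    """Split genes into induced and repressed lists at +/- threshold (log2 units)."""
    induced = sorted(g for g, r in ratios.items() if r >= threshold)
    repressed = sorted(g for g, r in ratios.items() if r <= -threshold)
    return induced, repressed


# Hypothetical hybridization intensities for one pair of RNA samples.
control = {"GENE_A": 99.0, "GENE_B": 399.0, "GENE_C": 199.0}
heat_shock = {"GENE_A": 799.0, "GENE_B": 99.0, "GENE_C": 199.0}

ratios = log2_ratios(control, heat_shock)
induced, repressed = classify(ratios)
print("induced:", induced)
print("repressed:", repressed)
```

Collecting such ratios at a single site, in a single format, is what would allow results from different investigators to be compared directly in a shared expression database.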

2.B.ii. Support the development and use of mass spectrometry for the rapid identification of proteins based upon limited quantities of material and available DNA databases. The understanding of the molecular mechanisms underlying cancer cell biology must ultimately be on a biochemical basis. The developments of genetics and genomics have greatly stimulated biochemical investigations by offering new targets for study and by offering new mechanisms to test. While there has been a qualitative revolution in genetic and genomic studies in the past generation, the methodologies of biochemistry can best be viewed as evolving more gradually. Many of the techniques of protein chemistry are in fact rather venerable, such as column chromatography, ultracentrifugation, Edman degradation, gel electrophoresis, and immunological methods. With the completion of genomic databases for yeast and microorganisms, and the anticipated completion of these databases for nematodes within the next year and for other organisms, including humans, relatively soon thereafter, the process of identifying proteins could be rapidly accelerated if there were facile methods for getting partial sequence information from the small quantities of proteins usually isolated in biochemical experiments. The recent developments of mass spectrometry coupled with protein fragmentation will provide sufficient information to rapidly identify proteins purified at the picomole level. In addition, mass spectrometry will be the preferred method for identifying protein modifications, such as phosphorylation, glycosylation and lipid addition. Thus, the combination of DNA databases and mass spectrometry will constitute a revolution in protein biochemistry and will play an important role in understanding the pathways of cellular control important to cancer. Mass spectrometry is a well known method in chemistry in which molecules, separated based on their molecular weights, can be weighed to extraordinary accuracy. 
Recent improvements have allowed these approaches to be applied to proteins. Knowledge of accurate molecular weights in itself is useful for identifying protein modifications. However, combined with proteolytic fragmentation one can obtain a fingerprint of the protein structure that allows protein identification based upon inferred protein sequences available from DNA databases. Mass spectrometry facilities require knowledgeable technicians and several instruments (including two kinds of mass spectrometers, liquid chromatography equipment, and computers). These facilities are most suitably considered multiuse facilities. As the technology for protein mass spectrometry is in a period of rapid development, one would probably want the facility operated by a Ph.D.-level scientist. We would envision several such facilities in institutions at which biochemical approaches to cancer biology are concentrated.


Recommendation 3. Support the establishment and maintenance of organism-specific databases for appropriate non-mammalian organisms and support the development of a pan-organism database or of software allowing the effective cross-referencing of such organism-specific databases with each other and with mouse and human databases. Such organism-specific databases, which should be curated by biologists with expertise concerning the organism, are at present supported by the NHGRI in the cases of yeast, Drosophila, and mouse, and by the Sanger Centre (Hinxton, England) in the case of C. elegans. A collaborating U.S. effort should be established for C. elegans. The Subgroup concluded that a sufficiently high percentage of genes would prove to be directly or indirectly relevant to cancer that a "Cancer Gene Database" as distinct from a general Genome Database was neither practical nor desirable.

3.A. Rationale. Information obtained from the study of non-mammalian organisms should be made readily available, both to investigators studying other non-mammalian organisms and to cancer researchers studying mice or humans. For example, as soon as a cancer gene is found to have non-mammalian homologs, complete information about these homologs and the pathways in which they act should be available to the cancer community, and information about the cancer gene and its homologs should be available to researchers studying the relevant non-mammalian organisms.

3.B. Specific recommendation. Convene a group of experts to make specific suggestions concerning the establishment of a pan-organism database or the development of software that could obviate the need for such a database. The Subgroup concluded that its members were not sufficiently expert to adequately address this issue.


Recommendation 4. Promote the use of non-mammalian organisms in high-throughput screens for the identification of novel anti-cancer therapeutic drug leads. Identify academic and/or industrial groups with the appropriate capabilities and encourage them to develop and employ high-throughput screens using non-mammalian organisms to identify drug leads.

4.A. Rationale. There is a striking universality of genes and gene pathways among organisms, and many cancer genes and gene pathways exist in non-mammalian organisms. The development of high-throughput screens designed to use these organisms (either with their endogenous cancer gene homologs or with human cancer genes introduced as transgenes) should be encouraged.

4.B. Specific recommendation. Convene a group of experts from academia and industry to define ways of promoting the use of non-mammalian organisms in drug screens. The Subgroup concluded that its members were not sufficiently expert to adequately address this issue.


SPECIFIC ORGANISM RECOMMENDATIONS. Advocate and support basic genetic, molecular and developmental studies of cancer-related genes and pathways in those model organisms important for cancer research. In addition, support the development of the infrastructure (resources and technologies) that would best facilitate such studies. Infrastructural needs that are recommended for support are listed below for each organism, in each case in the order of the priorities defined by the Subgroup. No attempt has been made to order priorities among projects involving different organisms.

Specific Organism Recommendation 1: S. cerevisiae.

S1.A. Rationale. The budding yeast, S. cerevisiae, is a unicellular eukaryote that offers exceptional opportunities to study fundamental cellular processes of relevance to cancer, such as the control of cell division and the maintenance of genome fidelity at a molecular level. This organism is amenable to the most sophisticated techniques of manipulative molecular genetics, such as the ability to construct gene knock-outs or subtly altered genes with ease. Because yeast is a microorganism, it is straightforward to perform many different types of selections and screens to identify genes by a wide variety of methods: as mutants defective in a particular process, as high copy plasmids that allow a particular phenotype to be exhibited, as genes that exhibit a particular expression pattern, as genes with protein products that interact with other protein products, and as suppressor and enhancer mutations that influence the phenotype caused by another mutation. All of these opportunities are enhanced and facilitated by the fact that the DNA sequence of the entire genome of budding yeast has been determined. The yeast Genome Project, carried out by an international consortium, has completed its task. The next phase is to continue to learn the function of all yeast genes, an endeavor that might be termed the yeast Cell or Gene Function Project. Knowledge of the functions of yeast genes will continue to provide valuable insights into the mechanisms responsible for fundamental cellular processes in eukaryotes and is expected to provide valuable information for unraveling the function of human genes.

S1.B. Specific Recommendations.

S1.B.i. Support the construction of a set of mammalian and other metazoan cDNA libraries that can be expressed in budding yeast. Yeast provides invaluable opportunities to learn about the functions of human and other metazoan genes. There are two main strategies: (1) identify human genes that can complement a yeast mutant defective in a particular yeast gene, and (2) identify human genes that when overexpressed can bypass a mutant yeast phenotype or confer a measurable phenotype on the yeast cell. Yeast has approximately 6,400 genes, of which at least half are essential for some cellular function -- that is, in which mutants exhibit a scorable phenotype. Among the hundreds of yeast laboratories throughout the world, there is an immense expertise in assaying these phenotypes. If mammalian cDNA libraries were available, it would be a simple matter for these yeast researchers to identify mammalian genes that can complement specific yeast mutants or otherwise alter a phenotype of interest. These phenotypes reflect the entire gamut of fundamental cell biological processes: chromosome distribution, the mitotic apparatus, DNA replication, RNA synthesis, plasma membrane biogenesis, iron metabolism, protein secretion, etc. Each human cDNA identified would be analyzed to determine if its sequence bears any resemblance to the yeast gene that is complemented. In addition, the position of the human cDNA on the human genetic map would be determined to see whether this gene is associated with an inherited disease (an activity that will be facilitated by the XREF Project). The fact that a mammalian gene product exhibits a function in yeast provides an opportunity to screen for drugs that prevent the mammalian protein from functioning in yeast. In other words, complementation provides a facile functional assay for a mammalian product. The precise choice of which types of cDNA libraries to produce requires some thought. 
There should be several different human libraries constructed from different tissues. It would also be desirable to construct libraries from C. elegans and Drosophila, both to facilitate the functional analyses of these organisms (see Sections S2 and S3 below) and to provide additional potential links to human genes. Such libraries would be made available for a nominal charge to yeast workers with an agreement that complementation information be collected at a central data clearing house, perhaps connected with the XREF Project or the Saccharomyces Genome Database. These additional activities would require a supplement to their budget. A total of approximately ten cDNA libraries could be constructed a year.

S1.B.ii. Support the construction of a budding yeast unigene library. To analyze the function of each of yeast's 6,400 genes, a set of 6,400 plasmids carrying this set of genes would be invaluable (a "unigene library"). At present, yeast researchers exploit yeast genomic libraries that are constructed by traditional means (for example, from sheared or partially digested genomic DNA segments that are then amplified in bacteria). Screening these libraries has been highly successful, but these classical libraries are not always fully representative: a unigene library would make a variety of analyses feasible and would undoubtedly reveal much about the function of yeast genes. In particular, it would be completely straightforward to clone any yeast gene for which a mutant has been obtained by classical methods. Similarly it would be possible to test every yeast gene to determine whether overexpression of that gene bypasses, relieves, or exacerbates a mutant phenotype. The unigene libraries would be constructed by PCR amplification of yeast open reading frames. To be optimally useful, these ORFs should be put into a variety of vectors (low copy and high copy number) and under the control of different types of regulation (low-level constitutive, high-level constitutive, galactose-inducible). It might be desirable to tag each of the plasmids with an oligonucleotide "bar-code," so that the plasmids would be recognizable in a facile manner. Such libraries could be constructed over a two-year period.

S1.B.iii. Support the determination of the cellular localization of all yeast proteins. Given that all yeast ORFs have been identified, it is now possible to contemplate determining the cellular location of all of the encoded proteins. Some projects to construct epitope-tagged versions of all yeast genes are underway, with the idea of localizing the proteins by immunocytochemical methods. A parallel project would be to perform a systematic analysis of the localization of these proteins at higher resolution, for example, using the electron microscope. Such a project would take at least three years and would cost $100,000-$200,000 per year.

S1.B.iv. Support the development of an interactive yeast strain and reagent database. The yeast community is generating a vast number of valuable reporter genes, plasmid constructs, and mutant strains. Accessing this type of information from individual journal articles is imperfect, and access would be immeasurably facilitated if there were a central clearing house for such information. This type of clearing house would be in essence an electronic stock center: instead of sending strains to a stock center, individuals would be encouraged to submit entries of key strains and other genetic reagents electronically. Such an interactive database could be established within a year and would cost approximately $50,000-$100,000 per year to maintain.

S1.B.v. Support the use of array technology to study a wide variety of yeast cellular processes at a genome-wide level. It is now possible to assay the expression pattern of all yeast genes under a wide variety of conditions and in a wide variety of mutant backgrounds. This type of analysis is facilitated by the ability to construct oligonucleotide arrays on glass "chips." The yeast arrays are then used for quantitative hybridization analysis using RNA collected from paired strains grown under different conditions or from strains of differing genotypes. The ability to correlate these expression patterns with defined genetic alterations promises to reveal a vast amount of information about cellular physiology and will likely become a staple for analysis of yeast physiology. Further discussion is provided above (see Section 2.B.i).


Specific Organism Recommendation 2: C. elegans.

S2.A. Rationale. C. elegans is a superb organism for genetic analysis. Because of its small size (1 mm) and rapid life history (3 days), large numbers of animals can be generated rapidly and handled easily and inexpensively (e.g., 100,000 animals on a single Petri dish). C. elegans is a self-fertilizing hermaphrodite, which means that it is easy to identify mutations that cause recessive phenotypes and to maintain homozygous mutant strains incapable of reproducing by mating. Worm stocks can be stored indefinitely in a -80°C freezer or in liquid nitrogen. At present, mutants are available for about 2,000 of the approximately 16,000 genes in the worm. The animal is cellularly very simple, with only 959 somatic cells, and the complete cellular anatomy, including the wiring diagram of the nervous system, is known at the ultrastructural level. The complete cell lineage is also known, making genetic mosaic analysis and laser microsurgery possible at the resolution of single cells. The genome is small (100 Mb) and high in gene density (about one per 6 kb). Genes are small (average 4 kb) and contain very small introns (generally less than 100 bp). A physical map of the genome is essentially complete, as is more than 70% of the genomic DNA sequence (as of August 13, 1997). The complete sequence of the genome is scheduled for completion during 1998. Studies of genetic pathways of C. elegans have contributed to the understanding of many areas of human biology and disease. The Ras and lin-12/Notch pathways and the pathway for programmed cell death (apoptosis) provide three examples with direct relevance to human cancer.

S2.B. Specific Recommendations.

S2.B.i. Support the determination of genomic sequence of the nematode Caenorhabditis briggsae, with a focus by the NCI on cancer-related genes. Conserved genomic sequence between C. elegans and C. briggsae has proved to be a superb predictor of functionally important regions, including functionally important protein domains (which allow both structure-function analyses and the identification of those regions to be used in seeking homologs in other organisms, including humans); small genes; alternatively spliced exons; and cis-acting regulatory regions (which are often the sites of gain-of-function mutations). C. elegans investigators have been surprised to discover the magnitude of the impact of obtaining C. briggsae genomic sequence, which leverages the currently available C. elegans genomic sequence information enormously: predicted gene structures have been corrected, alternative gene products have been identified, and elusive regulatory regions have been revealed. Making C. briggsae genomic sequence available is perceived as the highest priority need of the C. elegans community. Furthermore, by defining regulatory regions throughout the genome, it should be possible to identify networks of interacting genes (e.g., all potential DNA targets for a specific transcription factor), information that could not be easily obtained by an individual investigator focused on a single gene product. Thus, this approach of comparative genomic sequence differs inherently in its potential from some other large-scale projects, such as the isolation of full-length cDNAs, since the latter can be done effectively on a gene-by-gene basis. Another benefit of the generation of the genomic sequence of C. briggsae is that this information would allow evolutionary comparisons: knowledge of which portions of a protein are conserved can help identify family members in other organisms (including humans), and the definition of the complete C. briggsae genome would allow the study of the evolution of an animal genome at a global level. Such evolutionary problems are of major biological interest and define an important new approach toward cancer genes and pathways. Some funding for this comparative genomics project has been supplied by the NHGRI, and about 20 Mb of sequence are scheduled for completion by the summer of 1998. We propose that the NCI support the determination of the genomic sequences of those regions of the C. briggsae genome containing genes relevant to cancer (genes that correspond to proto-oncogenes, tumor suppressor genes, cell cycle genes, signal transduction genes, cell death genes, etc., as well as genes that interact genetically and/or biochemically with these genes). Most such genes could be readily identified using synteny with C. elegans in conjunction with currently available and anticipated C. briggsae resources: the expected 20 Mb of sequence from the NHGRI program, the emerging physical map and more than 3,000 ESTs. Sequences of the fosmids containing each of the 1,000 or so cancer-related genes could be determined. About 32 Mb (1,000 fosmids × 40 kb/fosmid less the 20% of sequence already determined) of sequence would be needed. This project should take about three years. For a modest cost, this project could be expanded to include the determination of C. briggsae genomic sequence for any gene in the genome in response to the specific request of an investigator. More generally, we propose that the NCI should work with the NHGRI to develop broad-based NIH support from multiple NIH Institutes for the determination of the complete genomic sequence of C. briggsae, since the resulting information would be broadly relevant to human disease.
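The 32 Mb figure quoted above follows directly from the numbers given in the text, as the short Python check below shows. The variable names are illustrative only; the inputs (about 1,000 fosmids, 40 kb per fosmid, 20% already determined) are the estimates stated in this section, and integer arithmetic is used to avoid rounding artifacts.

```python
# Inputs taken from the estimates stated in the text.
fosmids = 1_000                    # one fosmid per cancer-related gene (approximate)
kb_per_fosmid = 40                 # insert size per fosmid, in kb
percent_already_sequenced = 20     # share covered by sequence already determined

total_kb = fosmids * kb_per_fosmid                                   # 40,000 kb
remaining_kb = total_kb * (100 - percent_already_sequenced) // 100   # 32,000 kb
remaining_mb = remaining_kb / 1_000                                  # kb -> Mb

print(f"Sequence still needed: {remaining_mb:g} Mb")
```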

S.2.B.ii. Support the generation of a set of C. elegans gene knock-outs (KOs), with an initial focus on cancer-related genes. Such mutants would greatly leverage the utility of the increasingly complete genomic DNA sequence information that is available for C. elegans. Interested investigators could immediately obtain strains lacking the function of genes identified by sequence and homologous to those defined in a biological or medical context. Given current technologies and costs, the best estimate is that each KO now takes an average of one week and costs about $2K. Improving technologies, the use of robots, and efficiencies of scale should substantially accelerate the pace and reduce the cost. These same factors underscore the advantages of approaching KOs in a centralized fashion, rather than on a case-by-case basis in individual laboratories. It has been suggested that the project be divided into three phases: a pilot phase of two years, a production phase of three years, and a closure phase of two years. The total project would be brought to completion over a seven-year period. Initial KOs could be focused on specific genes of interest, e.g. those relevant to cancer and those requested by investigators (ensuring their rapid and detailed study). The center(s) responsible for generating KOs should function very efficiently in response to the community, keeping response time to a minimum. Frivolous requests should be discouraged, e.g. by requiring a payment for each KO. A charge of $600, for example, would serve this purpose and would also reduce central costs significantly. All existing KOs should be made immediately available to the academic community. An ongoing systematic effort of generating non-requested KOs should exist in parallel to that of generating requested KOs, with its pace dependent upon the demands placed upon the facility by the community. Because worm strains can be stored frozen, the costs of maintaining and distributing KOs would be minimal. 
The generation of a KO library might well be coordinated between the NCI and other agencies and groups. Industrial efforts might well be included.
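
The cost figures quoted above lend themselves to a rough budget calculation. The following is a minimal sketch, assuming the stated average of about $2K per KO and the example charge of $600; the function name and the batch size of 100 KOs are illustrative assumptions, not figures from the working group.

```python
def ko_budget(n_kos, cost_per_ko=2000, charge_per_ko=600):
    """Return (gross cost, recovered charges, net central cost) in dollars."""
    gross = n_kos * cost_per_ko
    recovered = n_kos * charge_per_ko
    return gross, recovered, gross - recovered


# Fraction of the per-KO cost offset by the example charge.
recovery_fraction = 600 / 2000

# Hypothetical batch of 100 requested KOs (the batch size is an assumption).
gross, recovered, net = ko_budget(100)
print(f"gross ${gross:,}  recovered ${recovered:,}  net ${net:,}")
print(f"charge offsets {recovery_fraction:.0%} of the per-KO cost")
```

Under these assumptions the example charge would offset somewhat less than one-third of the direct cost of each requested KO; the balance would remain a central cost.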

S.2.B.iii. Support the generation of a set of C. elegans full-length sequenced cDNAs. The generation of full-length cDNAs is a rate-limiting step in various types of gene-specific experiments, including misexpression studies and the generation of protein products for biochemical experiments. In addition, a set of characterized (gridded and sequenced) and presumably normalized cDNAs would confirm the expression of predicted genes and identify the exon structure(s) of predicted and non-predicted genes. At present, Yuji Kohara (Mishima, Japan) has identified cDNAs for 6,500 C. elegans genes (from 25,000 cDNAs from a partially normalized library); not all of these cDNAs are full-length. Clearly, a more complete set would have substantial value. As noted above (see Section S.2.B.i), the generation of a full-length cDNA is an experiment that an individual investigator may be able to do nearly as efficiently as a center dedicated to the effort. Nonetheless, a comprehensive set of cDNAs would provide a permanent, readily available resource. Furthermore, such a set could be used in ways not possible with individual cDNAs generated by individual investigators. For example, a gridded expression library could be used to synthesize proteins in vitro to be tested for modification by specific kinases, proteases, etc. Such kinase and protease substrates could well define new cancer genes. A normalized and gridded cDNA expression library useful for this specific type of application might be generated without characterizing it fully. The value of a comprehensive set of cDNAs coupled with the relatively low cost of extending the existing set (perhaps 8,000 new genes, 12 Mb of sequence) argues strongly that this effort would be highly worthwhile. To complement this effort, RT-PCR could also be used to obtain cDNAs generated using two primers defined by genomic sequence. This approach could be used specifically to generate cDNAs predicted from the genomic sequence but not isolated from cDNA libraries. In addition, it could be enormously useful in identifying alternative splice forms. If all 16,000 genes were studied in this way (to search systematically for alternative splice forms), about 22 Mb of coding sequence would be generated.

S.2.B.iv. Support both basic research in areas relevant to cancer and the development of infrastructure at a level appropriate to one or a few standard C. elegans research laboratories. Examples include: better informatic and experimental methods to define proteins given the availability of genomic sequence; the definition of temporal and spatial patterns of gene expression (including the development and use of array technologies; see Section S.2.B.i); the definition of gene pathways, using genetic or biochemical methods; the development of improved systems for driving gene expression in particular cells at particular times; improved methods for germline transformation; and the development of methods for C. elegans cell culture.


Specific Organism Recommendation 3: Drosophila.

S3.A. Rationale. Like C. elegans, Drosophila melanogaster is a superb organism for genetic analysis. Highly sophisticated methods have been developed for the controlled misexpression of genes, generation of genetic mosaics, and maintenance of animals containing mutations in essential genes. A wide variety of biochemical and cell biological methods and reagents are also in place, such as in vitro transcription and stable cell lines. Drosophila is morphologically and physiologically more complex than C. elegans and exhibits phenomena, such as regulative growth control during development and metastatic tumor formation, not known to occur in C. elegans. The Drosophila genome is about 50% bigger than that of C. elegans, but is thought to contain about the same number of genes. The genome project for Drosophila lags that for C. elegans by about three years. A high quality physical map covering 90% of the euchromatic genome exists and large scale sequencing is underway with an estimated completion date of 2001. Already sequence and detailed biological data concerning about 10% of all Drosophila genes have been provided by individual researchers, and over half of all genes are represented as ESTs. Drosophila has been used extensively for studies of processes relevant to cancer, such as cell cycle control, repair of DNA damage, the Ras, Hedgehog, Wnt and Notch signal transduction pathways, and tumor formation.

S3.B. Specific Recommendations.

S3.B.i. Support the generation of a set of Drosophila full-length sequenced cDNAs. Because the genomic sequence of Drosophila will be completed no earlier than the end of 2001, it is very important that an alternative approach be taken to provide a catalog of Drosophila genes. The most efficient approach would be the production of a set of full-length cDNA sequences, which could allow the prediction of greater than 75% of fly proteins within one to two years. This information would facilitate the identification of Drosophila homologs of cancer genes found in other organisms as well as allow the use of mass spectrometric methods for protein identification. cDNA sequences will be required even after the completion of the genomic sequence, since it is unlikely that programs will exist capable of unequivocally defining genes from fly genomic sequence. The Howard Hughes Medical Institute has funded a Drosophila EST project to determine the sequences of 40,000 5' ESTs from fly cDNAs. To date, 20,000 have been done, and the target date for completion is August 1998. These ESTs are being generated from libraries in which the majority of the clones are full-length or nearly full-length. cDNAs from more than one-third of all Drosophila genes are now available from these sequenced clones, and that number should reach 70-80% by the conclusion of the project, particularly if the sequences of an additional 20,000 ESTs were determined. Thus, it would be feasible at any time to begin providing these clones to a sequencing center that could convert them to full-length sequence. The existing EST sequences could be used to effectively normalize the library, so that only one cDNA from each gene was chosen for full-length sequencing. It was proposed that the cost could be significantly lower if one had relaxed quality standards, that is, if one were willing to tolerate error rates greater than those accepted for the genomic sequence.
This approach is reasonable, since eventually high quality genomic sequence will be available to correct any errors in the cDNA sequence.

S3.B.ii. Support the generation of a set of Drosophila gene knock-outs (KOs). There are proven and highly effective methods for making KOs in Drosophila by insertional mutagenesis with the transposable P element, which allows the generation of a set of lines each carrying a single stable P element insertion that can be rapidly mapped to the nucleotide. However, the insertions are not directed at a particular gene, so this process is most efficient when the goal is to disrupt a large fraction of all genes in a genome-wide project, rather than having individual investigators attempt to disrupt single genes. A small scale effort along these lines is funded by the NHGRI Drosophila Genome Center grant. This project will lead to the disruption of 25% to 35% of all essential Drosophila genes when it is completed, by the end of 1998. Pilot projects to extend this analysis to the two-thirds of Drosophila genes that do not mutate to a detectably abnormal phenotype are underway. Since transposable elements have some site preference, these methods are applicable to only about two-thirds of all Drosophila genes. The approach would be to generate about 30,000 random insertions, determine the sequence at the site of insertion to map the insertion to the nucleotide by comparison to the genomic sequence, and then select a non-redundant set of 5,000 to 6,000 lines for long-term maintenance. The cost to maintain and distribute these lines could be recovered by a recharge system. Existing technology makes it possible for the P-elements inserted in these lines to carry other features, such as enhancer traps, site-specific recombination sites, dominant markers and regulated promoters. Such promoters can be used to study the effects of misexpression of the gene at the site of insertion. Spatially and temporally targeted misexpression of individual genes provides an alternative way to perturb gene regulatory networks.
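
The selection of a non-redundant set from a larger pool of mapped insertions can be sketched as follows. This is a minimal illustration only: it assumes that each insertion has already been assigned to a gene by comparison with the genomic sequence, and the rule used here (keep the insertion closest to the start of the gene) is a hypothetical criterion, as are all line and gene identifiers.

```python
def select_nonredundant(insertions):
    """Pick one line per gene from mapped insertions.

    `insertions` is an iterable of (line_id, gene, offset_bp) tuples, where
    offset_bp is the distance of the insertion from the start of the gene.
    Keeping the line with the smallest offset is an illustrative rule only.
    """
    best = {}  # gene -> (offset_bp, line_id)
    for line_id, gene, offset_bp in insertions:
        if gene not in best or offset_bp < best[gene][0]:
            best[gene] = (offset_bp, line_id)
    return {gene: line_id for gene, (_, line_id) in best.items()}


# Hypothetical mapped insertions (identifiers are made up for illustration).
mapped = [
    ("line001", "geneA", 420),
    ("line002", "geneB", 35),
    ("line003", "geneA", 12),
    ("line004", "geneC", 900),
    ("line005", "geneB", 640),
]
chosen = select_nonredundant(mapped)
print(chosen)
```

At the scale described above, roughly five to six insertions would be generated for every line retained (30,000 insertions for 5,000 to 6,000 lines).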

S3.B.iii. Support the determination of the genomic sequence of the fruit fly Drosophila virilis. As discussed above (see Section S.2.B.i), one approach to the identification of functionally important genomic elements is to identify evolutionarily conserved regions by interspecies comparisons. For example, DNA sequence comparisons of the promoter regions of four different rhodopsin genes from D. melanogaster and D. virilis revealed an interchangeable conserved set of core sequences with additional upstream sequences conferring cell-type specificity. Detailed mutagenesis studies of 31 regulatory regions revealed that seven of the eight conserved sequences are compromised in their functions when mutagenized, whereas none of the 23 nonconserved regions perturb normal function when altered. Thus, it is likely that computer comparisons of sequences between related species will reveal a proportion of conserved core regulatory sequences for genes. In addition, such comparisons could lead to the identification of small exons or non-protein coding genes that would be missed by computational analysis of genomic DNA sequences alone. The value of such comparisons as a highly efficient method of genomic sequence analysis is undeniable. However, the high cost of such an approach at current DNA sequencing prices may mean that fiscal realities prevent the initiation of a full-scale effort, at least until the higher priority project of determining the D. melanogaster sequence is complete. In this case, it would still be highly valuable to target the sequencing of selected genomic regions containing genes of known relevance to cancer.
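
The idea of locating conserved elements by interspecies comparison can be illustrated with a deliberately simple sketch. It reports the short exact subsequences (k-mers) shared by two sequences; exact matching and the choice of k = 8 are simplifying assumptions, the two sequences are invented for illustration, and practical comparisons would rely on alignment methods that tolerate mismatches and gaps.

```python
def shared_kmers(seq_a, seq_b, k=8):
    """Return the set of length-k substrings present in both sequences."""
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    return kmers_a & kmers_b


# Invented sequences that share the 11-base block TTGACCTAATC.
species_1 = "ACGTTTGACCTAATCCGGA"
species_2 = "GGCATTGACCTAATCTTAC"

for kmer in sorted(shared_kmers(species_1, species_2)):
    print(kmer)
```

Overlapping shared k-mers of this kind mark a candidate conserved block, which would then be a natural target for the sort of mutagenesis test described above.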


Specific Organism Recommendation 4: Danio rerio.

S4.A. Rationale. As a vertebrate, the zebrafish is more closely related to humans than are yeast, worms or flies. The zebrafish is very well suited for studies of early embryogenesis and organogenesis, since the embryo is transparent and develops from the time of fertilization outside the mother's body. Understanding the processes responsible for organogenesis should be directly relevant to understanding the processes of dysmorphogenesis that characterize cancerous growth. Furthermore, fish get tumors. Zebrafish can be subjected to genetic screens of a scale far greater than is possible with mice (although of a much smaller scale than is routine with yeast, worms and flies). Two large-scale screens have been performed to date and have generated mutants that have defined about 600 genes important to the development of many vertebrate organs, including pancreas, gut, kidney, blood and vessels, and relevant to a variety of human disorders, including cancer, neurodegenerative diseases, arrhythmias and heart failure. Thus, the zebrafish offers the opportunity of using classical genetics to define gene functions. Once oncogene and tumor suppressor gene homologs have been placed on the zebrafish genetic map, such genes become candidate genes for correspondence to existing mutant strains. Insertional mutagenesis, which can allow the direct cloning of mutationally defined genes, is also possible (although at present only 1/70th as efficient as chemical mutagenesis).

S4.B. Specific Recommendations.

S.4.B.i. Support the generation of a zebrafish microsatellite map at an average marker density of 2 cM. The most powerful route to the discovery of novel genes using zebrafish is through genetics, using mutant phenotypes to identify novel genes and gene functions. To clone such mutationally defined genes requires mapping them between two physically defined markers and then obtaining DNA between those markers. A high-resolution microsatellite map is needed to (1) provide closely spaced markers to facilitate such direct positional cloning of mutant genes and (2) provide anchors to the genetic map for a radiation hybrid-based EST map and for a future physical map (both of which will further facilitate positional cloning). Microsatellite markers are robust, designed to be fully informative with diploids and can be useful in any of the commonly used strains. To attain a 2 cM marker density, at least 2,500 markers will be needed. (The zebrafish genome is 1.7 × 10^3 Mb and about 2,500 cM.) Currently available M13 libraries should be sufficient to complete a map to this density. To define 2,500 markers, about 25,000 clones that hybridize to CA repeats should be isolated and have their sequences determined. (Based on pilot studies, of any 100 such clones isolated on the basis of their hybridization to CA repeats, on average 72 will be discarded because their CA repeats are outside the 200-300 bp range or require primers that cannot be used under standard conditions; an additional 16 will be discarded because they are not polymorphic or are not codominant; thus, about 12 of every 100 clones isolated, or roughly 10%, eventually will define useful markers.)
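
The clone and marker numbers above follow from simple arithmetic, sketched below. The discard counts, the 2,500-marker target and the genome figures are taken from the text; the function and variable names are illustrative assumptions.

```python
def clones_needed(target_markers, usable_per_100):
    """Clones to isolate, rounded up, given usable markers per 100 clones."""
    return -(-target_markers * 100 // usable_per_100)  # integer ceiling


# Pilot-study attrition per 100 clones, as quoted in the text.
discarded_repeat_or_primer = 72
discarded_uninformative = 16
usable_per_100 = 100 - discarded_repeat_or_primer - discarded_uninformative

# Physical distance per unit of genetic distance (genome averages only).
genome_mb = 1700
genome_cm = 2500
mb_per_cm = genome_mb / genome_cm

print("usable clones per 100:", usable_per_100)
print("clones needed at the rounded 10% yield:", clones_needed(2500, 10))
print("clones needed at the pilot yield:", clones_needed(2500, usable_per_100))
print("average Mb per 2 cM interval:", round(2 * mb_per_cm, 2))
```

The 25,000-clone figure corresponds to the rounded 10% yield; at the pilot-study yield of 12 usable clones per 100, somewhat fewer clones would suffice, so the quoted figure leaves some margin.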

S.4.B.ii. Determine the DNA sequences of at least 100,000 zebrafish expressed sequences. Expressed sequences both can define fish homologs of genes identified in other organisms, such as human cancer genes, and can be used to define open reading frames (ORFs) and hence facilitate positional gene cloning as well as gene cloning after insertional mutagenesis. Software to identify ORFs from genomic sequence is at present inadequate and likely to remain so. An initial effort focused on expressed sequence tags (ESTs) is recommended. Such an effort should provide some coding sequence, help normalize a subsequent cDNA effort, and provide tools for the cloning and mapping of particular cDNAs of interest, such as those encoded by cancer genes. A second effort to obtain and determine the sequences of full-length cDNAs would provide far more information of the same sort as well as generate reagents for rescue and overexpression studies. The precise number of cDNAs needed will depend upon the normalization procedure used, but 100,000 cDNAs should correspond to about 50% of the expressed genes.

S.4.B.iii. Support additional infrastructure and studies of the potential utility of additional genetic techniques. Such techniques include the generation of a panel of local BAC contigs, which might facilitate "walks" from BAC to BAC; the development of methods for targeted gene deletion, for the complementation of mutant phenotypes by injected wild-type DNA, and for driving gene expression in particular tissues at particular developmental stages; and improvements in the efficiency of insertional mutagenesis. Support for a zebrafish stock center is critical for the maintenance and distribution of mutant and wild-type fish; a grant application for funds to establish a stock center (in Eugene, OR) is currently under review.


Specific Organism Recommendation 5: Xenopus

S5.A. Rationale. The frog has been an important biological model for embryonic development for over a century, but it has also been a particularly important tool for biochemical investigation of cellular processes in the past 10 years. As a system there are several advantages that are easily exploited: the eggs develop rapidly and synchronously, and various states of the cell cycle can be obtained in stable form; the eggs and oocytes are easily injectable, enabling the assay of complex intracellular organelles and macromolecules; the early stages of development are completely accessible and are replete with signaling pathways found in mammals and in human tumors; they can be used to identify and isolate genes by ectopic expression and by transgenesis; extracts of eggs can carry out the complex steps of the cell cycle and serve as an excellent starting point for biochemical investigations. Although not a system that has been exploited for genetic investigation, the frog has benefitted from the extraordinary conservation of the core reactions involved in cell proliferation and in intracellular communication. For example, the lessons obtained from yeast genetics and frog egg extract biochemistry have been the principal source of our understanding of the cell cycle and have been readily applied to human tumors.

S5.B. Specific Recommendations.

S5.B.i. Support the Development of a Xenopus Expressed Sequence Tag (EST) Library. Since the principal research activities using the frog have been biochemical purification and gene discovery, there is a need to identify readily isolated proteins or clones. In many cases the strong conservation of core components between mammals and frogs (and often even yeast) has enabled this identification. However, the identification of such similarity often requires extensive sequencing to define regions of homology. The presence of families of regulatory molecules leaves an ambiguity in these assignments. To proceed from protein sequence to gene is often a laborious and sometimes an unsuccessful practice. Although gene and protein identification will be simplified for human tissue when the human genomic sequence is completed, it will be a long time before this benefit is extended to other vertebrate organisms. The EST databases have been relatively inexpensive and immensely useful to biochemists and to those identifying genes by functional or other assays. The establishment of a Xenopus EST database would facilitate the interface between discoveries made during studies of amphibians and discoveries made during studies of mammals.

S5.B.ii. Improve facilities for frog husbandry. The cost of keeping non-mammalian vertebrates, such as frogs and fish, is far less than that of keeping mice, and this fact alone will allow extensive screening, mutagenesis and transgenesis studies using these organisms beyond what is practical with mammals. However, despite the lower costs of keeping frogs, there are difficulties faced by frog investigators in animal quality, availability and health that have not been addressed. Adequate animal facilities are generally not available and inbred lines carrying transgenes have not been disseminated and propagated. We propose that some money be set aside to develop frog facilities capable of meeting the need for biochemical quantities of eggs and for the maintenance of inbred and genetically selected animals.



NCI Preclinical Models for Cancer Working Group

Subgroup 3: Non-Mammalian Models for Human Cancers

Holiday Inn Bethesda, Bethesda, MD

June 25-26, 1997

Faye C. Austin, Ph.D.
Division of Cancer Biology
National Cancer Institute, NIH
Executive Plaza North, Room 500
6130 Executive Boulevard
Bethesda, MD 20892
(301) 496-8636
(301) 496-8656 (FAX)
Carol A. Dahl, Ph.D.
Assistant to the Director
National Cancer Institute, NIH
Federal Building, Room 312
7550 Wisconsin Avenue
Bethesda, MD 20892
(301) 496-1550
(301) 496-7807 (FAX)
Mark C. Fishman, M.D.
Director, Cardiovascular Research Center
Massachusetts General Hospital
Building 149, 13th Street, 4th Floor
Charlestown, Massachusetts 02129
(617) 726-3738
(617) 724-9564 (FAX)
Edward Harlow, Ph.D.
Professor of Medicine
Massachusetts General Hospital Cancer Center
Building 149, 13th Street
Charlestown, MA 02129
(617) 726-7800
(617) 726-7808 (FAX)
Ira Herskowitz, Ph.D.
Professor
Department of Biochemistry & Biophysics
University of California, San Francisco
513 Parnassus Avenue, Room S-964
San Francisco, CA 94143-0448
(415) 476-4977
(415) 476-0943 (FAX)
H. Robert Horvitz, Ph.D.
Professor of Biology, Massachusetts Institute of Technology;
Investigator, Howard Hughes Medical Institute
Department of Biology 68-425
77 Massachusetts Avenue
Cambridge, MA 02139-4320
(617) 253-4671
(617) 253-8126 (FAX)
Marc Kirschner, Ph.D.
Chairman, Department of Cell Biology
Harvard Medical School
240 Longwood Avenue, C-517
Boston, MA 02115-5730
(617) 432-2230
(617) 432-0420 (FAX)
Richard Klausner, M.D.
National Cancer Institute, NIH
Building 31, Room 11A48
31 Center Drive, MSC 2590
Bethesda, MD 20892-2590
(301) 496-5615
(301) 402-0338 (FAX)
John W. Newport, Ph.D.
Department of Biology, 0347
University of California, San Diego
9500 Gilman Drive, Room 2122A Pacific Hall
La Jolla, California 92093-0347
(619) 534-3423
(619) 534-0555 (FAX)
Patrick O'Farrell, Ph.D.
Department of Biochemistry & Biophysics
University of California, San Francisco Room S-964
San Francisco, California 94143-0448
(415) 502-5143 (FAX)
Anthony J. Pawson, Ph.D.
Head, Programme in Molecular Biology and Cancer
Samuel Lunenfeld Research Institute
Mount Sinai Hospital
600 University Avenue, Room 989
Toronto, Ontario M5G 1X5, Canada
(416) 586-8262
(416) 586-8857 (FAX)
Alan S. Rabson, M.D.
Deputy Director
National Cancer Institute, NIH
Building 31, Room 11A48
31 Center Drive
Bethesda, MD 20892
(301) 496-1927
(301) 496-2471 (FAX)
Gerald M. Rubin, Ph.D.
John D. MacArthur Professor of Genetics
University of California, Berkeley / Howard Hughes Medical Institute
Life Sciences Addition, Box 539
Berkeley, CA 94720-3200
(510) 643-9945
(510) 643-9947 (FAX)
Edward A. Sausville, M.D., Ph.D.
Associate Director
Developmental Therapeutic Program
Division of Cancer Treatment Diagnosis & Centers
National Cancer Institute
Executive Plaza North, Room 843
6130 Executive Boulevard
Bethesda, MD 20892-7458
(301) 496-8720
(301) 402-0831 (FAX)
Susan Sieber, Ph.D.
Deputy Director
Division of Cancer Epidemiology and
National Cancer Institute
Executive Plaza North, Room 540
6130 Executive Boulevard
Bethesda, MD 20892
(301) 496-5947
(301) 402-3256 (FAX)
Robert L. Strausberg, Ph.D.
Assistant to the Director
National Cancer Institute, NIH
Federal Building, Room 312
7550 Wisconsin Avenue
Bethesda, MD 20892-9010
(301) 496-1550
(301) 496-7807 (FAX)
George Vande Woude, Ph.D.
Scientific Advisor to the Director for
Basic Sciences
National Cancer Institute, NIH
Building 31, Room 3A11
Bethesda, MD 20892-2440
(301) 496-4345
(301) 480-0956 (FAX)
Susan Waldrop
Assistant Director for Program Coordination
Office of Science Policy
National Cancer Institute, NIH
Federal Building, Room 312
7550 Wisconsin Avenue, MSC 9010
Bethesda, MD 20892-9010
(301) 496-1458
(301) 496-7807 (FAX)
Robert H. Waterston, M.D., Ph.D.
James S. McDonnell Professor and Head
Department of Genetics
Washington University School of Medicine
Genome Sequencing Center, Box 8501
4444 Forest Park
St. Louis, MO 63108
(314) 286-1803
(314) 286-1810 (FAX)
Robert E. Wittes, M.D.
Division of Cancer Treatment, Diagnosis and Centers
National Cancer Institute, NIH
Building 31, Room 3A44
31 Center Drive, MSC 2440
Bethesda, MD 20892-2440
(301) 496-4291
(301) 496-0826 (FAX)