February 16-17, 1999

Summary Prepared by
Raju Kucherlapati and David Valle, Co-Chairs

The Human Genome Project has stimulated increasing interest in genome biology for a number of model organisms as the utility of genomic technologies and resources, such as cDNA and genomic sequences, is rapidly being realized. Several groups have advocated that the NIH support the generation of genomic and genetic resources for a number of non-mammalian model organisms. In 1997, the National Cancer Institute (NCI) convened a small panel to discuss the use of non-mammalian model organisms to facilitate the study of human cancer. Among this panel's recommendations was the development of the infrastructure (genetic and genomic resources and technologies) needed to facilitate basic research in those model organisms important for cancer research. This panel outlined a series of specific recommendations for five model organisms: S. cerevisiae (yeast); C. elegans (roundworm); D. melanogaster (fruit fly); D. rerio (zebrafish); and X. laevis (Xenopus). The list of recommendations can be found at: nci_nmm_report.html. Similarly, in 1997 the zebrafish community presented the NIH with a list of genomic resources needed for their research efforts. Because of the costs of such large-scale projects and the shared interest of many Institutes and Centers in supporting the development of these genomic resources, the NIH is facing the challenge of providing resources for studying non-mammalian model organisms.

To address this challenge, the NIH convened a workshop to evaluate the current status of genomic resource development for the non-mammalian model organisms already undergoing genomic analysis, to identify additional resource needs for these organisms and to consider what additional model organisms might be suitable for similar development. Approximately 80 scientists, together with an equal number of staff from the NIH, as well as representatives from several other governmental agencies, including the National Science Foundation, the Department of Energy and the U.S. Department of Agriculture, met on February 16-17, 1999, on the NIH campus in Bethesda, MD. This workshop was designed to have a broad group of scientists provide input on this subject, and it was recognized that these discussions likely would be followed by more focused dialogs. One of the desired outcomes of the workshop was the generation of ideas as to how both the NIH and the relevant research communities could move forward in this area in the future.

The workshop focused primarily on establishing priority needs for the five model organisms identified by the NCI panel. Approximately 8-10 investigators from each of these communities were present at the workshop. Scientists working on a diverse set of other organisms were also in attendance. There were five breakout session groups, each of which focused its discussion predominantly on the resource needs of one of the five major organisms and developed a prioritized list of recommendations. A summary of these recommendations is presented at the end of this Executive Summary; individual reports are presented in the sections "Breakout Group Reports" and "Recommendations for Additional Selected Model Organisms". Presentations were also made on nine additional non-mammalian model organisms. These organisms were chosen primarily because they are the subject of a significant level of NIH-supported research. Scientists representing the additional models made recommendations about priority needs for those other organisms and participated in a general discussion of the value of model organisms beyond those that are already well studied. For all of the organisms that were discussed, a table featuring the major characteristics was compiled by participants and can be found in the section "Tables Summarizing the Features of Selected Non-Mammalian Model Organisms".

The workshop was very successful in enhancing communication across many lines. Representatives from each of the five communities, as well as several of the others, put a great deal of effort into canvassing their communities prior to the workshop to develop a consensus on research needs. This was especially striking for the Xenopus and G. gallus (chicken) researchers who had not previously considered their resource needs as a community. Remarkably, approximately 100 researchers representing the chicken community submitted a proposal for a chicken genome project at the time of the workshop. At another level, there was considerable interaction between the different communities. Those groups representing organisms for which there is already a large amount of experience with structural and functional genomics, especially S. cerevisiae and C. elegans, conveyed lessons that they had learned. In addition, representatives from the zebrafish community, which had recently come together as a cohesive group to work on the generation of genetic and genomic resources, met with Xenopus investigators to discuss the lessons learned from launching the zebrafish genome project. Lastly, several of the groups are planning follow-up meetings that will focus specifically on the genomic resource needs for their model organism.


A number of common themes and issues were identified:


Genomic Sequences for Primary Model Organisms. The different breakout groups considered progress and needs in genomic sequencing. The genomic sequence of the yeast S. cerevisiae has been completed. More than 99% of the C. elegans sequence is complete, and the group recommended closing the remaining gaps in the sequence of this organism as a top priority. Completion of the sequence of D. melanogaster may occur within the next year, depending on the success of the collaboration between researchers at the University of California, Berkeley and Celera Genomics. In any event, the sequence should be completed by 2001. The group recommended that a project to sequence the zebrafish genome be initiated with the goal of completing this sequence by the end of 2008. The Xenopus community did not consider genomic sequencing a high priority at this time.

Comparative Genomic Sequences. Comparative sequencing of genomes has proved to be a good predictor of gene structure and functionally important transcriptional regulatory regions. Identification of conserved regulatory regions may make it possible to assemble regulatory cascades by searching whole genome sequence for conserved transcription factor binding sites. The limited data from C. elegans and C. briggsae have shown the power of this approach. The complete sequence of C. briggsae and ultimately another more distantly related nematode, perhaps a parasite, would provide powerful tools for biologists. Since the genomic sequence of D. melanogaster will possibly be completed within a year, limited sequencing of a related species (e.g., D. virilis) would provide valuable information about functionally important sequences.

EST Sequences. Expressed sequence tags (ESTs), short sequences of cDNA clones, have proved extremely useful for a variety of research applications. For example, human ESTs have been extraordinarily useful for identification of human genes based on homology to genes identified in model organisms. Similarly, ESTs have been useful for gene identification in genomic sequence in a region of interest in positional cloning efforts or in regions surrounding an insert in insertional mutagenesis studies. For Xenopus and zebrafish, assembly of a large set of EST sequences was considered higher priority than genomic sequencing.

High cost is a general concern relevant to all large-scale sequencing efforts. There are few laboratories where the cost and efficiency of sequencing are such that the above recommendations can be implemented immediately. Thus, efforts should be made to improve sequencing technology, reduce the cost and increase the opportunity for more groups to become efficient in large-scale sequencing.

Full-length cDNA Clones and Sequences

All the breakout groups felt that availability of a fully representational, complete set of sequenced full-length cDNA clones would be an important resource. Such a unigene set of full-length cDNAs would be useful for confirming the expression of predicted genes and determining patterns of alternative pre-mRNA splicing; monitoring changes in genome-wide patterns of transcription using, for example, microarrays; systematic RNA-mediated interference (RNAi); two-hybrid analysis; and in vitro synthesis of protein products to be used for functional biochemical experiments. Therefore, efforts to generate such sets of clones and sequences should be given a top priority. While the availability of full-length cDNAs holds great promise for many types of experiments, the technology for systematically and efficiently isolating full-length cDNAs needs further development. Support for improving this critical technology should be continued.

cDNA Microarrays

The availability of the complete genome sequence and, in particular, identification of all transcription units, is revolutionizing the study of yeast biology. Similar consequences are expected for the study of other model organisms as their sequences become known. The development of microarray technology is playing a central role in this revolution, greatly facilitating and expanding functional analysis of genes and genomes. Currently, however, microarray technology is not widely available because it is not easily transferable, it requires a high initial investment, and methods to quantify and interpret the results are just beginning to be developed. To make this technology as robust and broadly available as possible, it was recommended that NIH provide additional resources to enhance the dissemination of the technology, especially to academic researchers; to generate analytic tools to interpret the results; and to create sets of standard controls to allow comparison of results between experiments and laboratories.

Genome-wide Gene Knock-outs

The status of the technology to obtain genetic inactivation or modification of genes differs widely for each of the organisms. Efficient technologies to generate such gene modification are still needed for zebrafish and C. elegans, for example. A genome-wide effort for modification of genes in yeast is underway, and a smaller scale project for D. melanogaster is in progress. Genome-wide knockouts should be developed as a central resource that is then made readily available to the community.


The availability of easily accessible, up-to-date, public databases is essential for storage, utilization and manipulation of the large amounts of genomic and genetic data that are being generated. To promote accessibility and interaction between model organism communities, it was considered highly important that the databases for each of the model organisms have similar formats. Some of the features considered important for all databases included: effective links to the databases of other organisms; curated pathways (e.g., for metabolic and signal transduction pathways); curated and cross-referenced expression array data and methods for sorting existing array data; a phenotype-based search engine; image data for protein localization; and expansion to include new features, such as polymorphism data and unpublished information on mutant phenotypes. For all five organisms, databases at varying degrees of development are available, but the need for them to be significantly enhanced was recognized. Therefore, each group recommended an increase in support for the databases. Similarly, support to develop public databases for other models will be critical.

Centralized Resources and Their Distribution

To enable research on model organisms, several vital resources were identified. One of these is a stock center for each organism. Currently, individual laboratories are unable to store and distribute all the mutants they identify due to lack of space and funds for maintenance. Existing stock centers face the same problem. The anticipated increase in the number of mutants that will be generated will require that the capacity for storing stocks and the funds for maintenance of these stocks be increased. Other useful resources are a set of commonly used vectors and access to genomic and cDNA clones and libraries. It is necessary to provide adequate funds for individual research laboratories and large centers to store and distribute these key molecular reagents.

Cost Estimates

The participants made approximate yearly total cost estimates to implement the recommendations. These numbers were estimated at the time of the meeting and may not reflect the actual costs of these resources accurately. These cost estimates can be found in the breakout session reports.


Beyond the five main models that were considered by the breakout groups, the other major focus of the workshop was consideration of additional model organisms. What follows is a summary of this discussion:

Model organisms serve biomedical research in several ways. First, they exemplify intrinsically interesting biology. Investigators interested in a particular biological question utilize an appropriate model organism as their experimental system. Although medical concerns may not have figured into the formulation of the question, the answer sometimes has great medical relevance. Examples of this serendipitous process include the discovery of the role of mismatch repair genes in familial cancer syndromes based on work in E. coli and S. cerevisiae; the elucidation of apoptosis as a common mechanism in neurodegenerative disease based on work in C. elegans; and the realization of the importance of hedgehog signaling in human developmental defects, first worked out in D. melanogaster and in D. rerio. Second, investigators interested in studying a particular human problem may find that it is easier to approach using a model system. The recent explosion in our knowledge of the genes involved in genetic disorders of peroxisome biogenesis and function, based on their initial identification in yeast, is a good example. Third, model organisms serve as models for models. Currently, the genome-wide approaches to functional genomics being developed in S. cerevisiae serve as a model for investigators working in C. elegans, D. melanogaster and other model systems.

For the purpose of this meeting, five model organisms were designated as "major" on the basis of their phylogeny, their experimental history, the size of their investigator community and the magnitude of their contribution to the sum of our biomedical knowledge. But this handful of organisms does not begin to encompass the biological diversity and experimental advantages of the millions of species comprising the 35 phyla of extant animal life. Thus, the organizers of the meeting felt it was important to consider additional model organisms in terms of the biological properties they best exemplify, their experimental utility and their value in providing a more complete sampling of phylogenetic and biologic diversity. Information was assembled and briefly presented on nine of these (summarized below).

Chlamydomonas (C. reinhardtii). A unicellular organism with prominent chloroplasts, flagellum and basal bodies, Chlamydomonas has a 100 Mb genome and typically exists as a haploid organism, although it is possible to construct diploids. Flagella and the closely related cilia are vital for many human cells and tissues, including ciliated epithelia and sperm. About 10% of the Chlamydomonas genome is estimated to encode proteins necessary for flagellar structure and function; 34 of these have already been cloned. There is a well-developed investigator community and a stock center. EST sequences from organisms at certain stages of the cell cycle and a physical map with a BAC contig are top priorities of the Chlamydomonas community.

Tetrahymena (T. thermophila). A ciliated unicellular organism with interesting nuclear dimorphism: a transcriptionally inactive diploid micronucleus and a transcriptionally active, ~200 Mb macronucleus with ~250 chromosomes each ~1 Mb in size. Tetrahymena undergoes homologous recombination allowing facile gene disruption or replacement and research in Tetrahymena has led to the identification of self-splicing introns, telomere structure and identification of telomerase and telomerase RNA. Top priorities of the Tetrahymena community include support for a pilot project to explore direct shotgun sequencing of ~10% of the macronuclear genome. This would provide insight into genome organization and identify a set of genes to manipulate and characterize as models of human counterparts. This pilot would also speed the development of the technology for construction of high-resolution maps, cloning by complementation, insertional mutagenesis and the development of highly engineered strains. Additionally, funds for an annual course to train biologists interested in using Tetrahymena would enhance its value as a model.

Dictyostelium (D. discoideum). A free-living amoeba that undergoes aggregation and differentiation into a simple multi-cellular organism, Dictyostelium is a powerful model for the molecular genetics of phagocytosis, cytokinesis, cell/cell interactions and signal transduction pathways. A Dictyostelium genome project is underway with ~20% of the 34 Mb genome completed and a collection of ~10,000 ESTs. Relatively modest additional funds to support the finishing steps of the genome project and to support enhancement of a central database and stock storage and distribution center would greatly enhance the value of Dictyostelium as a model. Resources to be developed in the future include cDNA arrays.

Fission yeast (S. pombe). Fission yeast is a simple unicellular eukaryote readily amenable to genetic manipulation, with stable haploid and diploid forms, homologous recombination and thousands of mutants and hundreds of genes already in hand. A genome project is underway with ~75% of the 14 Mb genome complete. S. pombe has proved to be a valuable model for the elucidation of cell cycle regulation and other vital cellular processes. The ancestors of S. cerevisiae and S. pombe diverged ~500-100 Myr ago; this evolutionary separation makes comparison of their genes a powerful tool for identification and analysis of human genes. Additional funds to complete and annotate the genome sequence and to develop DNA arrays and genome-wide mutagenesis would enhance the value of S. pombe as a model. In particular, the synergism afforded by adding S. pombe to the list of models with completed genome sequences will be substantial in helping to decipher gene identification and function in the human sequence.

Neurospora (N. crassa). Neurospora and the filamentous fungi, in general, have been important model organisms for some time. Beadle and Tatum used Neurospora to develop the one gene/one enzyme hypothesis and more recently it has been used for studies of a wide variety of cellular processes many of which are possible in yeast. Homologous recombination, high frequency transformation and more than 1,000 identified mutants enhance its usefulness as a genetic model. Genome analysis is in progress with funds committed for sequencing ~30% of the genome and ongoing EST projects that have identified about 40% of the estimated 13,000 genes. Funds to continue these genomic studies and to develop DNA arrays would increase the usefulness of Neurospora as a model.

Aplysia (A. californica). A mollusc with simple, learned behaviors, Aplysia provides a useful model for neuronal interactions, synaptic plasticity, physiology and the study of memory. The cell/cell connections are easy to map and electrophysiology is facilitated by the very large neuronal size. Genetic study of Aplysia is only minimally developed with ~225 identified cDNAs. The accumulation of EST sequences, support for a central database and assembly of cDNA arrays would greatly increase the value of Aplysia as a model and would facilitate comparisons of genes important for neuronal function and behavior in other models such as Drosophila and mouse.

Sea urchin (S. purpuratus). A deuterostome metazoan, sea urchin has been a powerful model for gene regulation and early embryogenesis and serves as a representative of a non-vertebrate metazoan phylum. A modest EST collection is available and an urchin genome project is underway with private funding. Additional support for these projects would provide much improved access to an important segment of the biologic panoply.

Fugu (F. rubripes). The pufferfish, Fugu, is a model vertebrate genome characterized by a relative lack of repetitive DNA, small introns and dense gene packing so that there is ~1 gene/6-7 kb in a total genome of about 400 Mb. This characteristic plus its position in phylogeny makes sequencing the Fugu genome an efficient way to identify and characterize vertebrate genes. About 0.5 Mb of Fugu genomic sequence has been determined and is 21% coding sequence with conservation of synteny with mammalian genomes. Additional support for a Fugu genome project would provide information useful for gene identification in the human genome.

Chicken (G. gallus). Chicken has been a productive model for experimental embryology. A variety of methods have been developed to study and manipulate embryogenesis in ovo. These have led to important contributions to our understanding of limb development, neurogenesis, body axis development, somite formation and other aspects of embryogenesis. These developmental studies and transfer of their results to human systems would be greatly improved by the generation of genetic resources including a robust chick EST database, a physical map of the chicken genome, support for a chicken mutant repository and development of a web-based database. A proposal for a chicken genome project was presented at the workshop.

Conclusions of model organism discussion. Each of these organisms has advantages for the study of certain biological processes and, at least for some, significant genomic resources are already being developed. Additionally, it was recognized that availability of genomic information on multiple models has a synergistic value for research in human genetics. For nearly all of these additional models, a relatively small investment would greatly enhance their genomic resources and value as experimental systems. Of particular interest was the development of EST sequences, genome sequencing and databases to collate the information and make it accessible to the entire community of biomedical scientists.

Understanding our evolutionary history will provide enormous insight into development, gene function and the role of genetic variation in human disease. Additional consideration should be given to how the current collection of model organisms represents phylogeny. The possible value of an insect model in addition to Drosophila and of an ascidian model to represent early chordates should be explored. Additional discussion will be required to select specific models and prioritize resources.