NIH News Release
NATIONAL INSTITUTES OF HEALTH
National Human Genome Research Institute

EMBARGOED BY JOURNAL
Wednesday, December 4, 2002
2:00 p.m. ET
NHGRI Contact:
Geoff Spencer
(301) 402-0911

Whitehead Institute Contact:
Seema Kumar
(617) 252-1420

The Mouse Genome And The Measure of Man

Washington, D.C. — The international Mouse Genome Sequencing Consortium today announced the publication of a high-quality draft sequence of the mouse genome — the genetic blueprint of a mouse — together with a comparative analysis of the mouse and human genomes describing insights gleaned from the two sequences. The paper appears in the Dec. 5 issue of the journal Nature.

The achievement represents a landmark advance for the Human Genome Project. It is the first time that scientists have compared and contrasted the contents of the human genome with those of another mammal. This milestone is all the more significant given that the laboratory mouse is the most important animal model and is widely used in the study of human diseases.

"Publishing the sequence in 2001 of the first mammalian genome — our own — was a remarkable and historical achievement. To sequence another mammalian genome in less than two years and to discover the treasure trove of information one can derive from a comparison of the two is beyond nearly anyone's dreams. It constitutes a tremendously exciting and defining moment for biomedical research," said Francis S. Collins, M.D., Ph.D., Director of the National Human Genome Research Institute (NHGRI).

The mouse sequence gives scientists a powerful research tool for extracting meaning from the human genome sequence, the "Book of Life" published in draft form last year. It allows them to recognize functionally important regions in the human genome because those regions have been conserved through the 75 million years of evolution separating humans and mice. (See backgrounder on comparative genomics).

"This is an extraordinary milestone. For the first time we have an opportunity to see ourselves in an evolutionary mirror," says Eric Lander, Ph.D., Director of the Whitehead/MIT Center for Genome Research. "The mouse genome represents a very important chapter in evolution's lab notebook. Being able to read this notebook and compare genomic information across species allows us to glean important information about ourselves."

Because the mouse carries virtually the same set of genes as the human but can be used in laboratory research, this information will allow scientists to experimentally test and learn more about the function of human genes, leading to better understanding of human disease and improved treatments and cures. (See backgrounder on mouse models).

"The mouse genome sequence will change the way research is done in this important experimental animal, just as genome sequences have opened new avenues of study for yeast, worms, and flies," says Robert Waterston, M.D., Ph.D., Director of the Genome Sequencing Center at Washington University School of Medicine. "The genome sequence allows us to study genes and proteins in context and will spur the development of methods to study many genes in parallel. This more detailed molecular understanding of mouse biology will in turn produce new opportunities for understanding human disease and for devising effective therapies."

The draft sequence was assembled by the Mouse Genome Sequencing Consortium, an international team of scientists at the Whitehead Institute in Cambridge, Mass., Washington University in St. Louis, and the Wellcome Trust Sanger Institute and the European Bioinformatics Institute, in Hinxton, England. The project was funded in part by NHGRI of the U.S. National Institutes of Health (NIH) and the Wellcome Trust in the U.K. The sequencing centers were joined in the analysis effort by scientists from 27 institutions in six countries. These include computational biologists at University of California Santa Cruz in Santa Cruz, Calif., Pennsylvania State University in University Park, Penn., University of Oxford in Oxford, England, The Institute for Systems Biology in Seattle, Washington University School of Medicine in St. Louis, the University of Geneva in Switzerland and the Universitat Pompeu Fabra in Barcelona, Spain, among others. (See Nature paper for a complete listing of authors and institutions).

"This is another stunning success for a publicly funded international consortium that is already acting to stimulate biomedical research," says Jane Rogers, Ph.D., Head of Sequencing at The Wellcome Trust Sanger Institute. "Its value is predicated upon the quality of both the sequence and the tools to interpret it, all of which are freely available through Ensembl and similar browsers. Last week, as many people used the mouse sequence in Ensembl as used the human."

The sequence shows the order of the DNA chemical bases A, T, C, and G along the 20 chromosomes of a female mouse of the "Black 6" strain — the most commonly used mouse in biomedical research. It includes more than 96 percent of the mouse genome with long, continuous stretches of DNA sequence and represents a seven-fold coverage of the genome. This means that the location of every base, or DNA letter, in the mouse genome was determined an average of seven times, a frequency that ensures a high degree of accuracy.
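The idea of "seven-fold coverage" can be illustrated with a little arithmetic: average coverage is commonly estimated as the total number of bases read divided by the length of the genome. The following is a minimal sketch only. The function names and all of the numbers are hypothetical toy values chosen to make the arithmetic easy to follow; they are not the consortium's figures.

```python
def average_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average number of times each base is read: total read bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


def per_base_depth(read_starts, read_length, genome_length):
    """Count how many reads cover each position of a linear toy genome."""
    depth = [0] * genome_length
    for start in read_starts:
        # A read covers read_length consecutive positions, clipped at the genome end.
        for pos in range(start, min(start + read_length, genome_length)):
            depth[pos] += 1
    return depth


# Hypothetical toy values (not taken from the mouse project).
toy_coverage = average_coverage(num_reads=70, read_length=100, genome_length=1000)
toy_depth = per_base_depth(read_starts=[0, 2, 4], read_length=4, genome_length=8)
```

In this toy setup the average works out to seven reads per base, while the per-position counts show that individual bases may be read more or fewer times than the average.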

Earlier this year, the mouse consortium announced that it had assembled the draft sequence of the mouse and deposited it into public databases. The consortium's paper this week reports the initial description and analysis of this text and the first global look at the similarities and the differences in the genomic landscapes of the human and mouse. The analysis was led by the Mouse Genome Analysis Group. Below are some of the highlights.

The sequence information from the mouse consortium has been immediately and freely released to the world, without restrictions on its use or redistribution. The information is scanned daily by scientists in academia and industry, as well as by commercial database companies, providing key information services to biotechnologists.

The work reported in this paper will serve as a basis for research and discovery in the coming decades. Such research will have profound long-term consequences for medicine. It will help elucidate the underlying molecular mechanisms of disease. This in turn will allow researchers to design better drugs and therapies for many illnesses.

"The mouse genome is a great resource for basic and applied medical research, meaning that much of what was done in a lab can now be done through the web. Researchers can access this information through www.ensembl.org, where all the information is provided with no restriction," says Ewan Birney, Ph.D., Ensembl Coordinator at the European Bioinformatics Institute.

Technology, Assembly and Analysis

The mouse sequence reported in today's journal is a high-quality draft sequence. The consortium plans to produce a "finished" version with the remaining gaps filled in and errors resolved. This phase will rely on clone-based, or hierarchical, sequencing guided by the publicly available mouse genome clone map. The working draft sequence far exceeds the consortium's original quality expectations for this stage and was completed much sooner than initially expected, reflecting the tremendous efficiencies gained in sequencing and computational technologies in the past few years. Methods for efficient sequencing of large genomes continue to advance dramatically, and it is now becoming possible to sequence genomes much more rapidly.

The mouse sequencing strategy combines features of the clone-based hierarchical shotgun and whole-genome shotgun strategies. In the hierarchical shotgun approach, individual large DNA fragments of known position are subjected to shotgun sequencing (shredded into small fragments that are sequenced and then reassembled on the basis of sequence overlaps). In the whole-genome shotgun approach, the entire genome is shredded into small fragments that are sequenced and put back together on the basis of sequence overlaps. The scientists used data from more than 33 million individual sequencing experiments. Using two different computer systems, called genome assemblers, the team reconstructed the 33 million individual fragments into a draft sequence. These whole-genome assemblers, called ARACHNE and PHUSION, were developed at the Whitehead Institute and at the Sanger Institute, respectively.
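To make "put back together on the basis of sequence overlaps" concrete, the sketch below shows a toy greedy overlap assembler. It is an illustration of the general idea only and is not the method used by ARACHNE or PHUSION, which must handle sequencing errors, repeats, both DNA strands and tens of millions of reads. The function names, the toy sequence and the minimum overlap length are all assumptions made for this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments that share the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: the rest remain separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical toy "genome" and three overlapping error-free reads drawn from it.
toy_genome = "ATGCGTACGTTAGC"
toy_reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
contigs = greedy_assemble(toy_reads)
```

With these three error-free reads, the greedy merges rebuild the toy sequence as a single contig. Real data rarely behaves this neatly, which is one reason the overlapping fragments are first assembled into contigs and then ordered using additional linking information.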

These long stretches of sequence, called contigs, were then linked into larger fragments called supercontigs of a typical length of 16.9 million base pairs. These supercontigs were then anchored to the mouse genetic and BAC clone maps. Finally, adjacent supercontigs were joined into even larger ultracontigs on the basis of other linking information. In the end, nearly the entire chromosomal sequence is contained in a mere 88 ultracontigs with a typical size of 50 megabases each.

Once the genome was assembled, the Ensembl database team, based at the EBI and Sanger Institute and funded primarily by the Wellcome Trust, coordinated the annotation of the assembled sequence with features of interest, including genes. Ensembl automatically estimates where the genes are, primarily by finding similarities with known genes, both in mouse and in other organisms, such as human.

The work was then taken up by the Mouse Genome Analysis Group, made up of genome analysis experts from 27 institutions in six countries. This "virtual genome analysis center" convened over the internet and by regular conference calls to interpret and analyze the mouse-human comparison.

"It's been very exciting to see how much we can learn from comparing the mouse genome to the human genome and yet we see how much more there is to study," says Kerstin Lindblad-Toh, Ph.D., lead author of the Nature paper and Senior Program Manager, Mouse Genome, at the Whitehead Institute.

What's Next?

The consortium's ultimate goal is to produce a completely "finished" mouse sequence — with no gaps and 99.99 percent accuracy. Researchers expect this will take two or three years. Although the near-finished version is adequate for most biomedical research, the Human Genome Project (HGP) has made a commitment to filling all gaps and resolving all ambiguities, as it did with the human sequence. The human sequence is expected to be finished by April 2003.

The HGP also plans to sequence the genomes of many other species, because comparing genomes across species will provide researchers additional tools for understanding the essential elements that evolution has designated as important to survival. This information will in turn translate into practical knowledge toward developing better therapies in the future.

Next in line for sequencing are the chimpanzee, the chicken, the cow, the dog, several species of fungi, a sea urchin, the honey bee and two simple organisms commonly used in laboratory studies (Oxytricha trifallax and Tetrahymena thermophila). Each organism will offer unique insights into human health and disease.

Comparative genomics will also offer scientists insights into important regions in the sequence that perform regulatory functions, such as switching on or off a gene or controlling its expression.

Finally, the sequence will serve as a foundation for a broad range of functional genomic tools that help biologists probe the function of genes in a more systematic manner. Development of such genomic tools will be one of the major thrusts for biologists in the next decade, according to the scientists.

All of the results from this analysis can be found at several websites, including http://mouse.ensembl.org at the European Bioinformatics Institute, http://www.ncbi.nlm.nih.gov/genome/guide/mouse at the National Center for Biotechnology Information at the National Library of Medicine, and http://genome.ucsc.edu at the University of California, Santa Cruz.

The $130 million effort has been funded by government agencies and public charities in the various countries, including the NIH's NHGRI, the National Cancer Institute, the National Institute on Deafness and Other Communication Disorders, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the National Institute of Mental Health; the U.S. Department of Energy; The Wellcome Trust and the Medical Research Council in England; and agencies in Canada, Japan, Germany, Switzerland and Spain. Funding for the draft sequence was also provided by GlaxoSmithKline, The Merck Genome Research Institute and Affymetrix, Inc.

NHGRI is one of the 27 institutes and centers at the NIH, which is an agency of the Department of Health and Human Services (DHHS). The NHGRI Division of Extramural Research supports grants for research and for training and career development at sites nationwide. Additional information about NHGRI can be found at its Web site, www.genome.gov.

Additional backgrounders on the mouse can be accessed at: http://www.genome.gov/page.cfm?pageID=10005831.