This achievement represents a major milestone for the Human Genome Project because it provides a key tool needed to interpret the human sequence, a draft version of which was published last year. This information will allow researchers to gain insights into the function of many human genes because the mouse carries virtually the same set of genes as the human but can be used in laboratory research.
"The mouse sequence is much further along in the process than the human sequence was at the draft stage," says Francis S. Collins, M.D., Ph.D., director of the National Human Genome Research Institute. "Methods for efficient sequencing of large genomes continue to advance dramatically, and the sophistication of the team that accomplished this goal is truly impressive. This sets a new standard for speed, accuracy, and public accessibility."
For most human illnesses, from cancer to autoimmune disease, important insights have come from the study of mouse models. Having this advanced draft of the mouse sequence will greatly accelerate precise identification of the genetic contributors to those illnesses, leading to better understanding of human disease and improved tests and treatments. The mouse sequence will also allow researchers to recognize functionally important regulatory elements in the human genome, because those elements have been conserved through the 100 million years of evolution separating humans and mice.
"The mouse sequence provides a very important chapter from evolution's lab notebook," says Eric Lander, Ph.D., director of the Whitehead/MIT Center for Genome Research. "Being able to read evolution's notebook and compare genomic information across species will allow us to glean important information about ourselves. That's because evolution preserves the most important genetic information across species; if specific DNA sequences have been preserved by evolution over hundreds of millions of years, then they must be functionally important."
The draft sequence was assembled by the Mouse Genome Sequencing Consortium, an international team of researchers from the Whitehead Institute in Cambridge, MA, Washington University School of Medicine in St. Louis, MO, and the Wellcome Trust Sanger Institute and the European Bioinformatics Institute, in Hinxton, England, with funding from the National Institutes of Health and the Wellcome Trust in the United Kingdom.
The draft sequence shows the order of the DNA chemical bases A, T, C, and G along the mouse chromosomes. The current assembly includes more than 96 percent of the mouse genome with long, continuous stretches of DNA and represents a seven-fold coverage of the genome. This means that the location of every base, or DNA letter, in the mouse genome was determined an average of seven times, a frequency that ensures a high degree of accuracy.
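The idea of fold coverage can be illustrated with a minimal sketch. It is not the consortium's software; the function names, the toy genome length, and the read positions are all hypothetical. It counts how many reads cover each position and averages that depth over the genome, which is the sense in which "seven-fold" means each base was read seven times on average.

```python
def per_base_depth(genome_length, reads):
    """Count how many reads cover each position.

    `reads` is a list of (start, length) tuples on a toy genome;
    both the representation and the values are illustrative assumptions.
    """
    depth = [0] * genome_length
    for start, length in reads:
        # Clip reads that would run past the end of the genome.
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def mean_coverage(genome_length, reads):
    """Average number of times each base was read (fold coverage)."""
    return sum(per_base_depth(genome_length, reads)) / genome_length


# Hypothetical toy data: four short reads on a 10-base genome.
toy_reads = [(0, 5), (2, 5), (4, 6), (5, 5)]
print(mean_coverage(10, toy_reads))  # 21 covered bases / 10 positions
```

Averaging hides variation: in the toy data some positions are read once and others three times, which is why a high average depth is needed before nearly every base has been read several times.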
"It is remarkable that we were able to complete the mouse genome in such a short time and with such great accuracy," says Robert Waterston, M.D., Ph.D., director of the Genome Center at Washington University. "We are now working hard with an international group of experts to explore the content of the sequence and to use it to improve our understanding of the human sequence."
The mouse genome is contained in 20 chromosome pairs and the current results suggest that it is about 2.7 billion base pairs in size, or about 15 percent smaller than the human genome. The human genome is 3.1 billion base pairs spread out over 23 pairs of chromosomes (22 autosomes and the X and the Y sex chromosomes).
Analysis of the genome assembly indicates roughly the same number of genes for the mouse as the human. So far researchers have found more than 22,500 high-quality gene predictions, with additional predictions expected to take the total to about 30,000.
The working draft sequence far exceeds the consortium's original quality expectations for this stage and was completed much sooner than initially expected, reflecting the tremendous efficiencies gained in sequencing and computational technologies in the past few years.
The mouse sequencing strategy combines the best features of the clone-based (hierarchical) shotgun and whole-genome shotgun strategies. The scientists used data from more than 33 million individual sequencing experiments. Using two different computer systems, called genome assemblers, the team reconstructed the 33 million individual fragments into a draft sequence. These whole-genome assemblers, called ARACHNE and PHUSION, were developed at the Whitehead Institute and at the Sanger Institute, respectively.
These long stretches of sequence, called contigs, were then linked into larger fragments called supercontigs of a typical length of 16.9 million base pairs. These supercontigs were then anchored to the mouse genetic and BAC clone maps. Finally, adjacent supercontigs were joined into even larger ultracontigs on the basis of other linking information. In the end, nearly the entire chromosomal sequence is contained in a mere 89 ultracontigs with a typical size of 50 megabases each.
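How overlapping fragments can be merged into contigs can be sketched with a toy example. This is a simple greedy overlap merge chosen for illustration only; it is not how ARACHNE or PHUSION work, and the function names, the minimum overlap, and the fragment strings are hypothetical. Real assemblers must also handle sequencing errors, repeats, and both DNA strands, and they link contigs into supercontigs with additional mapping information that this sketch omits.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap.

    Stops when no pair overlaps by at least `min_len` bases and
    returns the remaining sequences (toy "contigs").
    """
    frags = list(fragments)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]


# Hypothetical fragments that tile one short sequence.
print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTTG"]))
```

A fragment that overlaps nothing else stays as its own contig, which loosely mirrors how gaps in the data leave an assembly in several separate pieces.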
"The mouse genome project has stimulated the development of two excellent computer algorithms for assembling very large genomes in the public domain," says Jane Rogers, Ph.D., at the Sanger Institute. "This will be enormously valuable for analyzing further genomes."
The sequence information is immediately and freely available to the world. The information will be utilized thousands of times daily by scientists in academia and industry, as well as by commercial database companies providing information services to biotechnologists.
The results from this analysis can be found at several websites, including http://mouse.ensembl.org at the European Bioinformatics Institute, http://www.ncbi.nlm.nih.gov/genome/guide/mouse at the National Center for Biotechnology Information at the National Library of Medicine, and http://genome.ucsc.edu at the University of California, Santa Cruz. A comparison between the mouse sequence and the human sequence can be found at all three sites.
This milestone concludes the second phase of the consortium's mouse-sequencing effort: the production of a draft sequence by the whole-genome shotgun method. In Phase III, the consortium will produce a "finished" version with the remaining gaps (the 4 percent where the sequence has yet to be determined) filled in and errors resolved. This phase will proceed by clone-based, or hierarchical, sequencing based on the publicly available mouse genome clone map. A mapped set of BAC clones that covers the entire mouse genome is being sequenced. The BAC data will be combined with the draft genome sequence to finish the mouse sequence to the same high quality to which the human sequence is being completed. Clone-based sequencing remains the only method proven to produce a complete, fully accurate version of a complex genome. The complete genome sequence of the mouse will be available within 3 years.
2. The results reported today build on work originally performed by the Mouse Sequencing Consortium (MSC), a public-private consortium that included 16 NIH institutes, GlaxoSmithKline of Research Triangle Park, NC, the Wellcome Trust, Merck & Co. of Whitehouse Station, NJ, and Affymetrix of Santa Clara, CA. The MSC achieved the first three-fold coverage of the mouse genome using the whole-genome shotgun technique, which represented the first phase of the project.
3. The National Institutes of Health funding for this effort included support from the National Human Genome Research Institute, National Cancer Institute, National Institute of Dental and Craniofacial Research, National Institute of Diabetes and Digestive and Kidney Diseases, National Institute of General Medical Sciences, National Eye Institute, National Institute of Environmental Health Sciences, National Institute on Aging, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institute on Deafness and Other Communication Disorders, National Institute of Mental Health, National Institute on Drug Abuse, National Center for Research Resources, and the Fogarty International Center.