October 24, 2011

Genome Comparison Casts Light on Dark Areas of DNA

Illustration of a DNA double helix with insets showing close-up views of amino acids within.

A massive effort to sequence and compare 29 mammalian genomes has shed new light on the “dark matter” of the genome: the more than 98% of DNA that doesn’t code for proteins.

The DNA that lies outside of gene sequences was once called “junk DNA.” But researchers now know that these non-coding regions have important biological functions. Many disease-causing mutations have been found in these areas, and scientists have pieced together some clues to their functions. For example, some regions regulate the expression of genes, controlling when genes are turned on and off. Nevertheless, this vast genetic dark matter remains largely uncharted.

To gain new insights, an international team of researchers set out to compare the sequences of several mammalian species. Regions that remain the same or have only gradually evolved, they reasoned, must have some function. Their work, which began in 2005, was funded by NIH’s National Human Genome Research Institute (NHGRI), National Institute of General Medical Sciences (NIGMS) and others. The team was led by scientists at the Broad Institute of MIT and Harvard, the Genome Institute at Washington University and the Baylor College of Medicine Human Genome Sequencing Center.

In the early online edition of Nature on October 12, 2011, the researchers reported the sequencing of 20 new mammalian genomes, including those of the rabbit, dolphin and elephant. They compared these new sequences with 9 previously described genomes, including the human genome.

The scientists found that at least 5% of the genome appears to be constrained by evolution. They were able to identify 3.6 million specific elements under constraint, which together make up over 4% of the human genome. These elements include hundreds of new families of RNA, thousands of previously undetected segments of protein-coding DNA, and 2.7 million elements thought to play a role in controlling gene expression.
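To give a feel for what "under constraint" means, here is a minimal Python sketch that scans a toy alignment for stretches where every species carries the same base. It is an illustration only, not the method used in the study: real analyses model substitution rates across a species tree, while this sketch simply checks for identical columns. The function names, the `min_length` threshold and the three short sequences are all made up for the example.

```python
from typing import List, Tuple


def conserved_columns(alignment: List[str]) -> List[bool]:
    """Mark each alignment column True when every sequence has the same base.

    Columns containing a gap ("-") are never counted as conserved.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        len({seq[i] for seq in alignment}) == 1 and alignment[0][i] != "-"
        for i in range(length)
    ]


def conserved_elements(
    alignment: List[str], min_length: int = 4
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of conserved runs.

    Only runs of at least `min_length` consecutive conserved columns are kept.
    """
    flags = conserved_columns(alignment)
    elements: List[Tuple[int, int]] = []
    start = None
    # A trailing False acts as a sentinel so a run ending at the last column is closed.
    for i, flag in enumerate(flags + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                elements.append((start, i))
            start = None
    return elements


# Hypothetical aligned fragments from three species (invented data).
toy_alignment = [
    "ACGTACGGTTAC",
    "ACGTACGATTAC",
    "ACGTACGCTTAC",
]

if __name__ == "__main__":
    flags = conserved_columns(toy_alignment)
    print("conserved fraction:", sum(flags) / len(flags))
    print("elements:", conserved_elements(toy_alignment, min_length=4))
```

In this toy case only one column differs among the three sequences, so the scan reports a conserved stretch on either side of it. Adding more species makes an unchanged stretch less likely to occur by chance, which is the intuition behind comparing many genomes at once.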

“Using a single genome, the language of DNA seems cryptic,” says senior author Dr. Manolis Kellis of MIT. “When studied through the lens of evolution, words light up and gain meaning.”

Significantly, the researchers found that many of the elements they identified overlap with variants that were linked to diseases and conditions in previous genomics studies.

“This catalog will make it easier to decipher the function of disease-related variation in the human genome,” says first author Dr. Kerstin Lindblad-Toh. “The power of this resource is that it continues to improve with the inclusion of more species. It’s a very systematic and unbiased approach that will only become more powerful with the inclusion of additional genomes.”

— by Harrison Wein, Ph.D.
