June 22, 2007

Beyond the Human Genome

Illustration of DNA double helix

In April 2003, researchers produced a high-quality version of the human genome: a sequence of all the DNA in a human being, along with a map of its genes. For those who thought that was the end of the story, an international research consortium has just published a massive set of papers that could reshape our understanding of the genome.

In the past few years, scientists have made major strides in using DNA sequence data to identify genes, traditionally defined as the stretches of the genome that code for proteins. However, protein-coding genes make up only a small fraction of the human genome, about 1.5-2%. Researchers have come to understand that other parts of the genome also have important functions, most of which remain unknown.

The ENCyclopedia Of DNA Elements (ENCODE) consortium, organized by NIH's National Human Genome Research Institute (NHGRI), set out to test the feasibility of producing a comprehensive catalog of all components of the human genome that are crucial for biological function. In its pilot phase, 35 research groups from 80 organizations around the world focused on 1% of the human genome. They published the results of their exhaustive, four-year effort in the June 14, 2007, issue of Nature and in 28 companion papers in the June issue of Genome Research.

In this pilot phase, the researchers devised and tested approaches for quickly identifying functional elements in the genome. They discovered that the majority of DNA in the human genome is copied, or transcribed, into molecules called RNA transcripts, and that these transcripts extensively overlap one another. This broad pattern puts to rest the long-held view that the human genome consists of a relatively small set of discrete genes along with a vast amount of "junk" DNA. The identified elements included protein-coding genes, non-protein-coding genes, regulatory regions involved in controlling gene activity, and elements that affect chromosome structure, dynamics, and replication.

The ENCODE project represents the first systematic effort to determine where all types of functional elements are located and how they are organized. The new data show that the genome contains very little unused sequence and, in fact, is a complex, interwoven network. Genes are just one of many types of DNA sequences with a biological function.

NHGRI last month launched a companion project to ENCODE that will identify all functional elements in the genomes of the fruit fly and the roundworm. That four-year effort, dubbed model organism ENCODE (modENCODE), will examine the functional landscape of these smaller, and therefore more manageable, genomes, which should aid efforts to tackle such questions in humans.
