September 6, 2016

Catalog of human genetic diversity expands

At a Glance

  • Analysis of an exome sequence data set from more than 60,000 people with diverse ancestries yielded new insights into human genetics.
  • The data catalog will help speed and refine future studies of inherited human diseases, including efforts to identify the genetic basis of rare disorders.
Image: Diverse people stacking their hands together. A new catalog of human genetic diversity is far larger than previous efforts. Credit: mangostock/iStock/Thinkstock

Genomics—the study of all the genetic instructions in an organism—can help researchers develop new diagnostic tools, tailor treatments based on an individual’s genetic make-up, and design novel approaches to study and treat disease. Whole-genome sequencing analyzes all the 6 billion DNA bases in a person's genome. Another approach called whole-exome sequencing focuses only on the DNA that codes for proteins. Such regions make up about 1% of the human genome, but contain many disease-causing variants.

In order to study sequence variants and their consequences in human DNA, researchers need access to large data sets. The Exome Aggregation Consortium (ExAC)—an international group led by scientists at the Broad Institute of MIT and Harvard—built on previous efforts and assembled the largest data set of human exomes to date. The effort was supported by several NIH components as well as funding agencies around the world. The team demonstrated the benefits of this massive data set in a report published on August 17, 2016, in Nature.

ExAC scientists assembled a data set of more than 60,000 human exomes from diverse populations. Significantly, all the DNA had been sequenced deeply—that is, each nucleotide was sequenced enough times to ensure the data’s accuracy.

The researchers discovered more than 7.4 million DNA variants across the exome. These corresponded to an average of one variant for every 8 DNA base pairs. The majority of these occurred at very low frequency and weren’t detected in previous data sets. Patterns of variation differed among people of European, African, South Asian, East Asian, and Latino ancestry.
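As a rough back-of-the-envelope check, the two figures in the paragraph above (more than 7.4 million variants, averaging one per 8 base pairs) can be multiplied to estimate how much sequence the variants were spread across. The sketch below is illustrative arithmetic only, not part of the ExAC analysis; the function name is hypothetical, and because both inputs are rounded the result is approximate.

```python
def implied_span_bp(variant_count: int, bases_per_variant: int) -> int:
    """Approximate number of base pairs spanned, given a variant count
    and an average spacing of one variant per `bases_per_variant` bases."""
    return variant_count * bases_per_variant


# Rounded figures quoted in the article (assumed exact here for illustration).
variants = 7_400_000      # "more than 7.4 million DNA variants"
bases_per_variant = 8     # "one variant for every 8 DNA base pairs"

span = implied_span_bp(variants, bases_per_variant)
print(f"Implied span: about {span / 1e6:.1f} million base pairs")
```

With these rounded inputs the estimate comes to roughly 59 million base pairs, which gives a sense of the scale of sequence examined per person.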

The researchers found that the density of genetic variation isn’t uniform across the exome. This is partly because some DNA sequences are more susceptible to mutation than others. Some mutations also have greater consequences than others and are thus more likely to be preserved or eliminated by natural selection.

Using their new findings, the researchers categorized genes according to how resistant they are to change, and thus how crucial they are to function. This knowledge could help investigators make decisions about which genes of interest to prioritize in future studies.

The scientists assessed whether the new data set could yield insights into inherited disease. They identified hundreds of suspected disease variants that should be reclassified as benign (harmless)—particularly in South Asian or Latino people, who were underrepresented in previous reference databases. These results show that the data set can serve as an important resource for interpreting genetic variants seen in the clinic.

“The scale and diversity of the ExAC resource is invaluable,” says senior author Dr. Daniel MacArthur of the Broad Institute, Massachusetts General Hospital, and Harvard Medical School. “It gives us the ability to discover extremely rare variants and offers an unparalleled window into the roots of rare genetic diseases.” The data catalog is freely available to the biomedical community.

—by Harrison Wein, Ph.D.

Related Links

Reference: Analysis of protein-coding genetic variation in 60,706 humans. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG; Exome Aggregation Consortium. Nature. 2016 Aug 17;536(7616):285-91. doi: 10.1038/nature19057. PMID: 27535533.

Funding: NIH’s National Heart, Lung, and Blood Institute (NHLBI), National Human Genome Research Institute (NHGRI), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of General Medical Sciences (NIGMS), National Institute of Mental Health (NIMH), National Institute on Minority Health and Health Disparities (NIMHD), and National Institute of Neurological Disorders and Stroke (NINDS); and various other funding agencies worldwide. For a full list, see Supplement, page 49.