April 12, 2022

First complete sequence of a human genome

At a Glance

  • Researchers finished sequencing the roughly 3 billion bases (or “letters”) of DNA that make up a human genome.
  • Having a complete, gap-free sequence of our DNA is critical for understanding human genomic variation and the genetic contributions to certain diseases.

Researchers completed the first gap-free sequence of the entire human genome. vitstudio / Shutterstock

The Human Genome Project, completed in 2003, covered about 92% of the total human genome sequence. The technologies to decipher the gaps that remained didn’t exist at the time. But scientists knew that the last 8% likely contained information important for fundamental biological processes.

Since then, researchers have developed better laboratory tools, computational methods, and strategic approaches. The final, complete human genome sequence was described in a set of six papers in the April 1, 2022, issue of Science. Companion papers were also published in several other journals.

The work was done by the Telomere to Telomere (T2T) consortium. T2T is led by researchers at NIH’s National Human Genome Research Institute (NHGRI), the University of California, Santa Cruz, and the University of Washington, Seattle. NHGRI was the primary funder.

“Short-read” technologies were originally used to sequence the human genome. These provide several hundred bases of DNA sequence at a time, which computers then stitch together by matching up overlapping pieces. Such methods still leave gaps, especially in long, highly repetitive stretches of DNA, where a short piece could fit in many places.
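
To make the idea of stitching concrete, here is a toy greedy overlap assembler in Python. It repeatedly merges the two reads that share the longest end-to-start overlap, and it stops when no pair overlaps by at least `min_len` letters, leaving separate pieces with a gap between them. The function names, the sample reads, and the `min_len` threshold are hypothetical choices for illustration. This is not the method used by the Human Genome Project or the T2T consortium, whose software also has to handle sequencing errors, both DNA strands, and billions of letters.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    start = 0
    while True:
        # Look for the first min_len letters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If everything from that spot to the end of a begins b, we found an overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads by their best overlaps until no pair overlaps by at least min_len letters."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps: the remaining pieces stay separate (a gap)
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping toy reads plus one read that overlaps nothing.
print(greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA", "CCCCTTTT"]))
# ['CCCCTTTT', 'ATTAGACCTGCCGGAA']
```

The three overlapping reads join into one continuous sequence, while the read that shares no overlap stays on its own. Real assemblies face the same problem at far larger scale: wherever pieces cannot be joined or placed with confidence, a gap remains.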

Over the past decade, two new DNA sequencing technologies emerged that can read much longer stretches of DNA. The PacBio HiFi method can read about 20,000 letters at a time with nearly perfect accuracy. The Oxford Nanopore method can read even more, up to 1 million DNA letters at a time, though with only modest accuracy. Both were used to generate the complete human genome sequence.
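
Why read length matters can be sketched with a toy example. The sequences and the `placements` helper below are made up for illustration, and the sketch ignores the sequencing errors that real reads contain. A read that lies entirely inside a repeated stretch matches every copy equally well, so software cannot tell where it belongs; a read long enough to span the whole repeat, plus some unique sequence on each side, matches only one place.

```python
def placements(genome: str, read: str) -> list[int]:
    """Return every start position where `read` exactly matches `genome`."""
    k = len(read)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] == read]


# Hypothetical toy genome: unique flank, three copies of a 7-letter repeat, unique flank.
genome = "ACGT" + "GATTACA" * 3 + "TTGC"

short_read = "GATTACA"     # fits inside the repeated region
long_read = genome[2:27]   # spans the whole repeat plus unique letters on both sides

print(placements(genome, short_read))  # [4, 11, 18]: three equally good spots, ambiguous
print(placements(genome, long_read))   # [2]: a single, unambiguous placement
```

Repeats in real chromosomes, such as those in the centromeres, are vastly longer than this toy repeat, so the same principle favors reads tens of thousands to a million letters long.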

In total, the new project added nearly 200 million letters of the genetic code. This last 8% of the genome includes numerous genes as well as repetitive DNA sequences, which may influence how cells function. Most of the newly added sequences were in the centromeres, the dense middle sections of chromosomes, and near the repetitive ends of each chromosome.

The complete genome sequence will be particularly valuable for studies that aim to understand how DNA differs from person to person. For example, T2T researchers used the sequence as a reference to discover more than 2 million previously unknown sequence variants in the human genome. These included variants within many medically relevant genes. 
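
As a simplified illustration of comparing a person's DNA against a reference, the sketch below lists single-letter differences between a reference and a sample sequence. It assumes the two sequences are already aligned letter for letter, which real variant-calling pipelines must work hard to achieve, and it treats `N` in the reference as an unknown letter, the usual convention for unresolved bases in sequence files. The function name and sequences are hypothetical. The point it shows: a difference that falls where the reference is unresolved cannot be reported at all, whereas a gap-free reference exposes it.

```python
def point_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference letter, sample letter) wherever the two differ.

    Assumes pre-aligned, equal-length sequences. Positions where the reference
    is 'N' (unknown) are skipped because there is nothing to compare against.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch expects pre-aligned sequences of equal length")
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != "N" and ref != alt
    ]


sample = "ACGAACGT"
gapped_reference = "ACNNACGT"    # hypothetical reference with two unresolved letters
complete_reference = "ACGTACGT"  # hypothetical gap-free reference

print(point_differences(gapped_reference, sample))    # []
print(point_differences(complete_reference, sample))  # [(3, 'T', 'A')]
```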

“This complete human genome sequence has already provided new insight into genome biology, and I look forward to the next decade of discoveries about these newly revealed regions,” says Dr. Karen Miga, a co-chair of the T2T consortium at the University of California, Santa Cruz.

“Truly finishing the human genome sequence was like putting on a new pair of glasses,” says consortium co-chair Dr. Adam Phillippy, whose group at NHGRI led the effort. “Now that we can clearly see everything, we are one step closer to understanding what it all means.” 

This accomplishment can now serve as a model for sequencing the genomes of people from diverse populations around the world, a goal researchers are pursuing. Further work is also needed to complete the sequence of the Y chromosome, which was not present in the cells used for this study.

“This foundational information will strengthen the many ongoing efforts to understand all the functional nuances of the human genome, which in turn will empower genetic studies of human disease,” says Dr. Eric Green, director of NHGRI.


References: The complete sequence of a human genome. Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PGS, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O'Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. Science. 2022 Apr;376(6588):44-53. doi: 10.1126/science.abj6987. Epub 2022 Mar 31. PMID: 35357919.

A complete reference genome improves analysis of human genetic variation. Aganezov S, Yan SM, Soto DC, Kirsche M, Zarate S, Avdeyev P, Taylor DJ, Shafin K, Shumate A, Xiao C, Wagner J, McDaniel J, Olson ND, Sauria MEG, Vollger MR, Rhie A, Meredith M, Martin S, Lee J, Koren S, Rosenfeld JA, Paten B, Layer R, Chin CS, Sedlazeck FJ, Hansen NF, Miller DE, Phillippy AM, Miga KH, McCoy RC, Dennis MY, Zook JM, Schatz MC. Science. 2022 Apr;376(6588):eabl3533. doi: 10.1126/science.abl3533. Epub 2022 Apr 1. PMID: 35357935.

Funding: NIH’s National Human Genome Research Institute (NHGRI), National Institute of General Medical Sciences (NIGMS), National Institute of Mental Health (NIMH), National Cancer Institute (NCI), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), and National Library of Medicine (NLM); National Science Foundation; National Institute of Standards and Technology; Mark Foundation for Cancer Research; Fulbright fellowship.