News Release

Thursday, June 11, 2020

NIH researchers identify key genomic features that could differentiate SARS-CoV-2 from other coronaviruses that cause less severe disease

Illustration of COVID-19. The full genomes of all human coronaviruses were aligned to identify regions (red) that might code for lethal differences in the viruses that cause COVID-19, SARS, and MERS. These differences could be targets for testing or treatments. Donny Bliss, NLM

A team of researchers from the National Library of Medicine (NLM), part of the National Institutes of Health, identified genomic features of SARS-CoV-2, the virus that causes COVID-19, and other high-fatality coronaviruses that distinguish them from other members of the coronavirus family. This research could be a crucial step in helping scientists develop approaches to predict, by genome analysis alone, the severity of future coronavirus disease outbreaks and detect animal coronaviruses that have the potential to infect humans. The findings were published this week in the Proceedings of the National Academy of Sciences.

COVID-19, an unprecedented public health emergency, has now claimed more than 380,000 lives worldwide. This crisis prompts an urgent need to understand the evolutionary history and genomic features that contribute to the rampant spread of SARS-CoV-2.

“In this work, we set out to identify genomic features unique to those coronaviruses that cause severe disease in humans,” said Dr. Eugene Koonin, an NIH Distinguished Investigator in the intramural research program of NLM’s National Center for Biotechnology Information, and the lead author of the study. “We were able to identify several features that are not found in less virulent coronaviruses and that could be relevant for pathogenicity in humans. The actual demonstration of the relevance of these findings will come from direct experiments that are currently getting under way.”

Using integrated comparative genomics and machine learning techniques, the researchers compared the SARS-CoV-2 genome with the genomes of other members of the coronavirus family and identified protein features that are unique to SARS-CoV-2 and two other coronaviruses with high fatality rates, SARS-CoV and MERS-CoV. The identified features correspond with the high fatality rate of these coronaviruses, as well as their ability to move from animal to human hosts.

These features include insertions of specific stretches of amino acids into two virus proteins, the nucleocapsid and the spike. These features are found in all three high-fatality coronaviruses and their closest relatives that infect animals, such as bats, but not in four other human coronaviruses that cause non-fatal disease. In particular, the insertions in the spike protein are predicted, from protein structure analysis, to facilitate the recognition of the coronavirus receptors on human cells and the subsequent penetration of the virus into those cells. Finding these features in animal coronavirus isolates could predict the jump to humans and the severity of disease caused by such isolates.

“This innovative research is critical to improve researchers’ understanding of SARS-CoV-2 and aid in the response to COVID-19,” said NLM Director Patricia Flatley Brennan, R.N., Ph.D. “Predictions made through this analysis can inform possible targets for diagnostics and interventions.”

This press release describes a basic research finding. Basic research increases our understanding of human behavior and biology, which is foundational to advancing new and better ways to prevent, diagnose, and treat disease. Science is an unpredictable and incremental process — each research advance builds on past discoveries, often in unexpected ways. Most clinical advances would not be possible without the knowledge of fundamental basic research.

NLM, part of the NIH, is a leader in research in biomedical informatics and data science, and the world’s largest biomedical library. NLM conducts and supports research in methods for recording, storing, retrieving, preserving, and communicating health information. It creates resources and tools that are used billions of times each year by millions of people to access and analyze molecular biology, biotechnology, toxicology, environmental health, and health services information. Additional information is available at

About the National Institutes of Health (NIH): NIH, the nation's medical research agency, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit

NIH…Turning Discovery Into Health®