June 9, 2014

Revealing the Human Proteome

Illustration of human body with various proteins throughout.

Researchers completed a draft map of the human proteome—the set of all proteins in the human body. The accomplishment will help advance a broad range of research into human health and disease.

In 2003, the Human Genome Project completed its map of the human genome, the full set of human DNA, including all of its genes. Genomics has since driven many advances in medical science.

Genes control the most basic functions of the cell, including which proteins to make and when. Researchers have identified more than 20,000 protein-coding genes. However, scientific understanding of the proteome has lagged behind that of the genome, partly because of the proteome’s complexities. The relationship between genes and proteins isn’t a simple matter of one gene coding for one protein. Stretches of DNA can be read and translated into proteins in different ways, so a single gene can give rise to more than one protein. Proteins are also more difficult to sequence than genes.

Several projects are now underway to characterize the human proteome. In their new study, a team of researchers headed by Drs. Akhilesh Pandey at Johns Hopkins University and Harsha Gowda at the Institute of Bioinformatics in Bangalore, India, used an advanced form of mass spectrometry to sequence proteins and create a draft map of the human proteome. Their work was funded in part by NIH’s National Institute of General Medical Sciences (NIGMS), National Cancer Institute (NCI), and National Heart, Lung, and Blood Institute (NHLBI).

The team examined 30 normal human tissue and cell types: 17 adult tissues, 7 fetal tissues, and 6 blood cell types. Samples from 3 people per tissue type were processed through several steps, and then the protein fragments (peptides) were analyzed on high-resolution Fourier-transform mass spectrometers. The resulting amino acid sequences were then compared to known sequences. Results were published on May 29, 2014, in Nature.

The resulting draft human proteome map includes proteins encoded by more than 17,000 genes—about 84% of the total known protein-coding genes. Among these are hundreds of proteins from regions already known to encode other proteins. The map also includes 193 novel proteins from regions previously thought to be non-coding.

“Housekeeping genes,” which are expressed in all tissues and cell types, are thought to be involved in basic cellular functions. However, the resulting “housekeeping proteins” haven’t been well understood. The team detected proteins encoded by 2,350 genes in every tissue and cell type examined. These housekeeping proteins made up about 75% of total protein mass. They included histones, ribosomal proteins, metabolic enzymes, and cytoskeletal proteins.

The study also revealed new insights into how genes are expressed. For instance, nearly 200 genes begin at locations other than those predicted based on genetic sequence.

“The fact that 193 of the proteins came from DNA sequences predicted to be non-coding means that we don’t fully understand how cells read DNA, because clearly those sequences do code for proteins,” Pandey says. The discovery highlights the importance of studying not only genomics but also the protein content of cells to understand cellular biology.

A second draft of the human proteome, built from a combination of publicly available data and newly generated data from tissues and cell lines, was published in the same issue of Nature. Both teams have developed online tools to help other scientists advance their research into human health and disease.

—by Harrison Wein, Ph.D.


References: A draft map of the human proteome. Kim MS, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S, Thomas JK, Muthusamy B, Leal-Rojas P, Kumar P, Sahasrabuddhe NA, Balakrishnan L, Advani J, George B, Renuse S, Selvan LD, Patil AH, Nanjappa V, Radhakrishnan A, Prasad S, Subbannayya T, Raju R, Kumar M, Sreenivasamurthy SK, Marimuthu A, Sathe GJ, Chavan S, Datta KK, Subbannayya Y, Sahu A, Yelamanchi SD, Jayaram S, Rajagopalan P, Sharma J, Murthy KR, Syed N, Goel R, Khan AA, Ahmad S, Dey G, Mudgal K, Chatterjee A, Huang TC, Zhong J, Wu X, Shaw PG, Freed D, Zahari MS, Mukherjee KK, Shankar S, Mahadevan A, Lam H, Mitchell CJ, Shankar SK, Satishchandra P, Schroeder JT, Sirdeshmukh R, Maitra A, Leach SD, Drake CG, Halushka MK, Prasad TS, Hruban RH, Kerr CL, Bader GD, Iacobuzio-Donahue CA, Gowda H, Pandey A. Nature. 2014 May 29;509(7502):575-81. doi: 10.1038/nature13302. PMID: 24870542.

Funding: NIH’s National Institute of General Medical Sciences (NIGMS), National Cancer Institute (NCI), and National Heart, Lung, and Blood Institute (NHLBI); the Sol Goldman Pancreatic Cancer Research Center; India's Council of Scientific and Industrial Research; and Wellcome Trust/DBT India Alliance.