News Release

Sunday, November 12, 2006

NIH Launches dbGaP, a Database of Genome Wide Association Studies

The National Library of Medicine (NLM), part of the National Institutes of Health (NIH), announces the introduction of dbGaP, a new database designed to archive and distribute data from genome wide association (GWA) studies. GWA studies explore the association between specific genes (genotype information) and observable traits, such as blood pressure and weight, or the presence or absence of a disease or condition (phenotype information). Connecting phenotype and genotype data provides information about the genes that may be involved in a disease process or condition, which can be critical for better understanding the disease and for developing new diagnostic methods and treatments.

dbGaP, the database of Genotype and Phenotype, will for the first time provide a central location for interested parties to see all study documentation and to view summaries of the measured variables in an organized and searchable web format. The database will also provide pre-computed analyses of the level of statistical association between genes and selected phenotypes. Genotype data are obtained by using high-throughput genotyping arrays to test subjects’ DNA for single nucleotide polymorphisms (SNPs), areas of the genome that have been found to vary among humans.

The database was developed and will be managed by the National Center for Biotechnology Information (NCBI), a division of NLM. dbGaP is located at the website

The initial release of dbGaP contains data on two studies: the Age-Related Eye Diseases Study (AREDS), a 600-subject, multicenter, case-controlled, prospective study of the clinical course of age-related macular degeneration and age-related cataracts that was supported by the National Eye Institute (NEI); and the National Institute of Neurological Disorders and Stroke (NINDS) Parkinsonism Study, a case-controlled study that gathered DNA, cell line samples and detailed phenotypic data on 2,573 subjects. NEI and NINDS worked closely with NCBI in placing data from the two studies in dbGaP.

"The availability of AREDS data in this database, which can be accessed free of charge, signals a whole new way of conducting vision research," said Paul Sieving, M.D., Ph.D., director of NEI. "Having this information widely available will help researchers better understand gene-based eye diseases, will likely speed development of effective therapies, and, thereby, will prove to be a worthwhile investment for the taxpayers who funded this important medical research."

Danilo Tagle, Ph.D., a program director for NINDS's neurogenetics program, commented: "The launch of dbGaP addresses the critical need for sharing of genotype and phenotype information coming from genome wide association studies. The large collection of DNA samples and well-described clinical information from these studies, and subsequent genotyping analyses, are strategic investments by the institute that will surely pay huge returns. They will continue to pay dividends as other groups access dbGaP to do meta-analyses of GWA datasets."

In order to protect research participant privacy, all studies in dbGaP will have two levels of access: open and controlled. The open-access data, which can be browsed online or downloaded from dbGaP without prior permission or authorization, generally will include all the study documents, such as the protocol and questionnaires, as well as summary data for each measured phenotype variable and for genotype results. Preauthorization will be required to gain access to the phenotype and genotype results for each individual; this individual-level data will be coded so as to protect the identity of study participants. The AREDS and NINDS individual-level data is expected to be available in several weeks, when the dbGaP authorization system is put in place.

For AREDS and the NINDS Parkinsonism Study, pre-computed analyses of the associations between phenotypic variables and genotypes will be provided in the unrestricted part of the database. The policy on providing access to pre-computed associations will be determined on a study-by-study basis by the NIH institute overseeing each study wishing to be included in dbGaP. In some cases, the pre-computed association analyses may be provided only in the controlled-access portion of the database, or they may be held in the controlled-access portion for the duration of a publication embargo and then moved to the open-access section.

"The dbGaP project marks a new milestone in data sharing," said NLM Director Donald A. B. Lindberg, M.D. "Researchers, students and the public will have access to a level of study detail that was not previously available and to genotype-phenotype associations that should provide a wealth of hypothesis-generating leads," he said. "These data will be linked to related literature in PubMed and molecular data in other NCBI databases, thereby enhancing the research process."

NCBI expects to add database enhancements and a number of additional studies over the coming year. GWA studies that will be added encompass a broad range of disease areas and study models. The studies focus on heart disease, women’s health, neurological disorders, neuropsychiatric disorders, diabetes, and environmental factors in disease. The Framingham SHARe Study, for instance, will provide data from the landmark Framingham Heart Study, which is funded by the National Heart, Lung, and Blood Institute. Blood samples from approximately 7,000 of the study subjects are being genotyped, and that data will be linked to the numerous types of phenotype data collected in the study.

Data from the Genetic Association Information Network (GAIN), a public-private partnership, also will be added to dbGaP. The project is being led by the Foundation for NIH (FNIH), with participation and/or funding from Pfizer, Affymetrix, Perlegen Sciences, Abbott, the Broad Institute of MIT and Harvard, and NIH. Private donors have contributed $26 million to help fund GAIN, which provides for genotyping DNA samples from participants in clinical studies that were already conducted. In October, FNIH selected an initial group of six studies to fund.

Development of dbGaP involved the participation of many NIH institutes. The effort was led by NCBI's Information Engineering Branch, headed by Branch Chief James Ostell, Ph.D. "dbGaP links together the fruits of the world's recent investment in sequencing the human genome with our decades-long investment in clinical research," Dr. Ostell said. "Correlating that information and making it widely available is a key step in providing researchers with the data they need to understand disease and attempt to develop cures. The potential scientific advances from this information represent a payoff not only for the taxpayers who financed genomic and clinical research, but for the patients who participated in clinical trials in hopes of furthering public health."

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing molecular and genomic data, and disseminates biomedical information, all for the better understanding of processes affecting human health and disease. NCBI is a division of the National Library of Medicine (NLM) at the NIH.

About the National Institutes of Health (NIH): NIH, the nation's medical research agency, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit

NIH…Turning Discovery Into Health®