Data Sharing

Data management and sharing is a critical component of INCLUDE’s goal to accelerate discovery of etiology and biologic pathways underlying conditions that co-occur with Down syndrome. To support these efforts, NIH works closely with the INCLUDE Data Coordinating Center to provide researchers with centralized, shareable resources such as the INCLUDE Data Hub. With these resources, researchers can easily collaborate and address pressing scientific and medical questions more quickly than ever before.

To learn more, please visit the INCLUDE Data Coordinating Center.

Frequently Asked Questions

What are the data sharing expectations for INCLUDE?

Data sharing is a critical component of INCLUDE’s goal to accelerate discovery of etiology and biologic pathways underlying co-occurring conditions of Down syndrome. All recipients of INCLUDE funding are expected to share data with the wider research community as rapidly as feasible in alignment with the goals of the program. Consistent with the new NIH Policy for Data Management and Sharing (https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html), all applications, regardless of the amount of direct costs requested for any one year, are required to include a Data Management and Sharing Plan outlining how scientific data and any accompanying metadata will be managed and shared. This includes all data generated by a study, not just data used to support scholarly publication. The plan should describe data types, file formats, submission timelines, and standards used in collecting or processing the data. It is expected that all de-identified human data generated by INCLUDE-funded projects will be submitted to NIH-designated repositories in coordination with the INCLUDE Data Coordinating Center (DCC).

What are the NIH-designated data repositories for INCLUDE?

Unless already slated for deposition and sharing through a specific NIH-designated repository, all de-identified human data will be submitted and shared through the INCLUDE Portal, the entry point to the INCLUDE Data Hub (portal.includedcc.org). All researchers generating de-identified human-derived data are expected to share the data in coordination with the INCLUDE Data Coordinating Center (DCC; www.includedcc.org), which will facilitate deposition to the appropriate NIH-designated data repository. The INCLUDE DCC seeks to maximize the FAIRness (Findability, Accessibility, Interoperability, and Reusability) of all INCLUDE-relevant data through searchability in the INCLUDE Portal. The Data Hub and its associated data harmonization and cloud-based platform empower diverse research communities to come together within a common Down syndrome resource while harnessing robust interoperability standards.

Other repositories relevant to INCLUDE:

For the NIH data repository list, visit: Repositories for Sharing Scientific Data

What is the role of the INCLUDE Data Coordinating Center (DCC) and how will data submitted to the INCLUDE Data Hub be shared?

  • The INCLUDE DCC has launched the INCLUDE Data Hub to facilitate data submission, harmonization, sharing, and interoperability of data generated by INCLUDE projects and other NIH-designated data repositories, as appropriate. To explore the data available in the INCLUDE Data Hub, visit portal.includedcc.org/.
  • The INCLUDE Data Hub’s data sharing model is based on the following set of core principles:
    • Accelerating research through broad data sharing
    • Fostering transparency and collaboration among researchers and other community members
    • Maximizing data availability and searchability through indexing and visualizations in the INCLUDE Portal
    • Managing sensitive data according to participant consent and existing governance structures where appropriate
  • The INCLUDE DCC intends to make all INCLUDE data Findable, Accessible, Interoperable and Reusable (FAIR) directly through the INCLUDE Portal, the primary entry point to the INCLUDE Data Hub. However, access to some datasets may require additional approvals, for example NIH Data Access Committees (via dbGaP) for individual-level genomic datasets or consortia approvals (see below).
  • INCLUDE data will be shared as rapidly as feasible and in line with the Final NIH Policy for Data Management and Sharing (NOT-OD-21-013) and the goals of the NIH INCLUDE Project.
  • For any datasets submitted to the INCLUDE Data Hub that require controlled access (e.g., genomic data), any consent-based limitations on data use will be documented using the Institutional Certification and access will be managed through dbGaP.

For questions about submitting and sharing data through the INCLUDE Data Hub, contact: info@includedcc.org

Do all INCLUDE applications require a Data Management and Sharing Plan?

Yes. All applications for projects generating scientific data, regardless of the amount of direct costs requested for any one year, should include a Data Management and Sharing Plan. Program staff will consider the plan when making award decisions, as appropriate and consistent with the goals of the program and the NIH Policy for Data Management and Sharing.

Applicants are also encouraged to describe plans for communicating and disseminating findings and results to the public through open access publications or other methods.

How should I indicate how my data will be shared?

The Data Management and Sharing Plan should make it clear how the applicant will manage and share data and/or software in accordance with the notice of funding opportunity and the NIH Policy for Data Management and Sharing. It is expected that data (including resultant, raw, derived, aggregated and summary data) created or used with support from INCLUDE funding opportunities will be provided to the INCLUDE Data Hub or other NIH-designated repositories in coordination with the INCLUDE Data Coordinating Center, so that they can be shared with the wider scientific community and other researchers can replicate and build on the analyses in future research efforts.

In the Data Management and Sharing Plan, applicants should describe the anticipated timeline, standards, formats, and methods of sharing the data with the broader research community and in alignment with the Elements of an NIH Data Management and Sharing Plan (https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-014.html).

Note that genomic data are subject to the NIH Genomic Data Sharing Policy (NOT-OD-14-124) and require the submission of an Institutional Certification and registration in dbGaP. INCLUDE expects sharing of all genomic data, including from small-scale studies, and release of sequence reads, variant call files, and associated phenotypic data no later than 6 months after the data have been generated.
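
For planning purposes, the 6-month release expectation above can be turned into a target date. The sketch below is illustrative only: the function names are hypothetical, and it assumes "6 months" is counted in calendar months, with the day clamped to the end of shorter months. The policy text does not define how months are counted, so confirm the actual deadline with program staff or the INCLUDE DCC.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption: if the target month is shorter than the start day
    (e.g. Aug 31 + 6 months), the day is clamped to the month's last day.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def release_deadline(generated_on: date) -> date:
    """Latest expected release date for genomic data generated on `generated_on`."""
    return add_months(generated_on, 6)


if __name__ == "__main__":
    # Hypothetical generation date, used only for illustration.
    print(release_deadline(date(2023, 3, 8)))
```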

Examples of data types to be submitted could include, but are not limited to:

  • Deep phenotypic data from CRFs or surveys (e.g., REDCap)
  • Clinical data extracted from electronic health records
  • Structural or functional neuroimaging files
  • Histological images
  • Ontology and annotation files resulting from phenotypic analysis
  • Network / pathway analysis result files
  • VCFs from multi-sample joint variant calls / comparisons
  • Summary statistics on variant data or other “genomic summary results”
  • Gene lists
  • GWAS results
  • Manhattan plot files / heatmaps / graph plots
  • Any form of derived data from genomic / statistical analysis

For additional guidance, see:

Should I share software and tools created with INCLUDE funds?

Where applicable, applicants should describe how they plan to share any analytical tools, pipelines, workflows, or software used or created through open access channels (e.g., public GitHub repositories). These plans should be provided in the Resource Sharing Plan, which is separate from the Data Management and Sharing Plan. For more information on software sharing best practices, visit: https://datascience.nih.gov/tools-and-analytics/best-practices-for-sharing-research-software-faq.

What resources are available for developing consents that address data sharing?

What standards should be used for collecting or processing INCLUDE data?

To maximize comparisons across datasets or studies, facilitate data and platform integration, and foster collaboration, INCLUDE researchers are strongly encouraged to use standards and resources developed by and used within the INCLUDE program and other existing standards, where applicable:

  • Applicants are encouraged to ensure that data collected by the study conform to Findable, Accessible, Interoperable, and Reusable (FAIR) principles.
  • NIH encourages researchers to explore the use of the HL7 FHIR® (Fast Healthcare Interoperability Resources) standard to capture, integrate, and exchange clinical data for research purposes and to enhance capabilities to share research data (NOT-OD-19-122). The FHIR® standard may be particularly useful in facilitating the flow of data with EHR-based datasets, tools, and applications.
  • NIH encourages clinical research programs and researchers to adopt and use the standardized set of data classes, data elements, and associated vocabulary standards specified in the United States Core Data for Interoperability (USCDI) standards, as they are applicable (NOT-OD-20-146). Use of the USCDI can complement the FHIR® standard and enable researchers to leverage structured EHR data for research and enable discovery.
  • NIH encourages the use of data standards including common data elements, such as those available through the PhenX Toolkit (www.phenxtoolkit.org) and the NIH CDE repository (cde.nlm.nih.gov), terminologies and ontologies such as Mondo Disease Ontology (mondo.monarchinitiative.org), Human Phenotype Ontology (hpo.jax.org), and common data models such as the Observational Medical Outcomes Partnership (OMOP; ohdsi.org).
  • To learn more about health data standards visit: https://www.nlm.nih.gov/healthit/index.html.
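
To illustrate how the FHIR® standard and an ontology such as Mondo can work together, the sketch below builds a minimal FHIR Condition resource as plain JSON. It is an illustration under stated assumptions, not the INCLUDE DCC's submission format: the helper name, the participant identifier, and the code-system URIs are assumptions, and the Mondo identifier shown should be verified against the current ontology release before use.

```python
import json

# Assumed code-system URI for Mondo; confirm the URI expected by the
# receiving FHIR server or the INCLUDE DCC before relying on it.
MONDO_SYSTEM = "http://purl.obolibrary.org/obo/mondo.owl"


def make_condition(participant_id: str, system: str, code: str, display: str) -> dict:
    """Build a minimal FHIR Condition resource as a Python dict.

    Only a small subset of Condition fields is populated: the subject
    reference and a single ontology coding for the condition itself.
    """
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{participant_id}"},
        "code": {
            "coding": [{"system": system, "code": code, "display": display}],
            "text": display,
        },
    }


if __name__ == "__main__":
    # Hypothetical de-identified participant ID; Mondo ID is illustrative.
    condition = make_condition(
        "example-participant-1", MONDO_SYSTEM, "MONDO:0008608", "Down syndrome"
    )
    print(json.dumps(condition, indent=2))
```

Coding conditions against a shared ontology, rather than free text, is what allows records from different studies to be searched and compared with the same term.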

What biospecimen repositories does INCLUDE use?

Are funds available to support data preparation and submission to the INCLUDE Data Hub? 

Per the Final NIH Policy for Data Management and Sharing, costs associated with data management and data sharing are allowable under the budget for the proposed project. Data sharing is critical to INCLUDE goals, and it is important that researchers ensure they appropriately budget to support their DMS activities throughout the life of the project.

See Supplemental Information to the NIH Policy for Data Management and Sharing: Allowable Costs for Data Management and Sharing (https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-015.html), which states that reasonable, allowable costs may be included in NIH budget requests such as those associated with curating data and developing supporting documentation for transmission to and storage at a selected repository for long-term preservation and access.

The INCLUDE Data Hub may accept data donations from groups not funded by INCLUDE with approval from the NIH INCLUDE DCC Steering Committee. Researchers planning to submit data who do not have NIH funding to prepare the data submission may seek funding from opportunities such as:

How do I access data in the INCLUDE Data Hub?

Explore, find, and access data in the INCLUDE Portal, the primary entry point to the INCLUDE Data Hub: portal.includedcc.org/. Using the portal requires agreeing to its terms and conditions and creating a user account. Some datasets are accessible directly in the INCLUDE Portal, while other datasets may require additional approvals (e.g., dbGaP) and are subject to their own terms and conditions of access and use.

To apply to access datasets that require dbGaP approval, submit a Data Access Request (DAR) here: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login.

For Tips for Preparing a Successful Data Access Request, visit: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=GeneralAAInstructions.pdf. Secondary users and their supporting Institution’s Signing Official and IT Director must agree to the conditions of the Data Use Certification (sample agreement: https://osp.od.nih.gov/wp-content/uploads/Model_DUC.pdf), including any data use limitations (DULs) or DUL modifiers pertinent to the requested dataset.

All internal and external collaborators must be listed on the application, except for technicians, graduate students, and postdoctoral fellows who are under the requestor’s direct supervision. External collaborators from other institutions are required to submit separate data access requests (DARs) for approved access to the same dataset(s). The DAR(s) will be reviewed by an NIH Data Access Committee.

Are data shared through the INCLUDE Data Hub protected by a Certificate of Confidentiality?

Certificates of Confidentiality protect the privacy of research participants by prohibiting disclosure of protected information for non-research purposes to anyone not connected with the research except in specific situations.

Data that are stored in and shared through the INCLUDE Data Hub are protected by a Certificate of Confidentiality. Therefore, users of the INCLUDE Data Hub, whether or not funded by the NIH, who access a copy of information protected by a Certificate held by the INCLUDE Data Hub, are also subject to the requirements of the Certificate of Confidentiality and subsection 301(d) of the Public Health Service Act.

Under Section 301(d) of the Public Health Service Act and the NIH Policy for Issuing Certificates of Confidentiality, recipients of a Certificate of Confidentiality shall not:

  • Disclose or provide, in any Federal, State, or local civil, criminal, administrative, legislative, or other proceeding, the name of such individual or any such information, document, or biospecimen that contains identifiable, sensitive information about the individual and that was created or compiled for purposes of the research, unless such disclosure or use is made with the consent of the individual to whom the information, document, or biospecimen pertains; or
  • Disclose or provide to any other person not connected with the research the name of such an individual or any information, document, or biospecimen that contains identifiable, sensitive information about such an individual and that was created or compiled for purposes of the research.

Disclosure is permitted only when:

  • Required by Federal, State, or local laws (e.g., as required by the Federal Food, Drug, and Cosmetic Act, or state laws requiring the reporting of communicable diseases to State and local health departments), excluding instances of disclosure in any Federal, State, or local civil, criminal, administrative, legislative, or other proceeding;
  • Necessary for the medical treatment of the individual to whom the information, document, or biospecimen pertains and made with the consent of such individual;
  • Made with the consent of the individual to whom the information, document, or biospecimen pertains; or
  • Made for the purposes of other scientific research that is compliant with applicable Federal regulations governing the protection of human subjects in research.

For more information see: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-17-109.html

This page last reviewed on March 8, 2023