Data Sharing

Data management and sharing is a critical component of INCLUDE’s goal to accelerate discovery of etiology and biologic pathways underlying conditions that co-occur with Down syndrome. To support these efforts, NIH works closely with the INCLUDE Data Coordinating Center to provide researchers with centralized, shareable resources such as the INCLUDE Data Hub. With these resources, researchers can easily collaborate and address pressing scientific and medical questions more quickly than ever before.

To learn more, please visit the INCLUDE Data Coordinating Center.

Frequently Asked Questions

What are the data sharing expectations for INCLUDE?

Data sharing is a critical component of INCLUDE’s goal to accelerate discovery of etiology and biologic pathways underlying the comorbidities of Down syndrome. All recipients of INCLUDE funding are expected to share data with the wider research community as rapidly as feasible in alignment with the goals of the program. Consistent with the new NIH Policy for Data Management and Sharing (https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html), all applications, regardless of the amount of direct costs requested for any one year, are required to include a Data Management and Sharing Plan outlining how scientific data and any accompanying metadata will be managed and shared, regardless of whether the data are used to support scholarly publication. The plan should describe data types, file formats, submission timelines, and standards used in collecting or processing the data. It is expected that all de-identified human data generated by INCLUDE-funded projects will be submitted to NIH-designated repositories in coordination with the INCLUDE Data Coordinating Center (DCC).

What are the NIH-designated data repositories for INCLUDE?

Unless already slated for deposition and sharing through a specific NIH-designated repository, all de-identified human data will be submitted and shared through the INCLUDE Data Hub. All researchers generating de-identified human data are expected to coordinate with the INCLUDE DCC, which will facilitate deposition to the appropriate NIH-designated data repository. The INCLUDE DCC seeks to maximize the FAIRness (Findability, Accessibility, Interoperability, and Reusability) of all INCLUDE-relevant data through searchability in the INCLUDE Portal, the primary entry point of the INCLUDE Data Hub. The Data Hub and its associated data harmonization and cloud-based platform empower diverse research communities to come together within a common Down syndrome resource while harnessing robust interoperability standards.

Other repositories relevant to INCLUDE include:

  • For autism data, the National Institute of Mental Health Data Archive (NDA) makes available human subjects data collected from hundreds of research projects across many scientific domains. NDA provides infrastructure for sharing research data, tools, methods and analyses enabling collaborative science and discovery.
  • The National Institute on Aging (NIA) hosts the AD Knowledge Portal, an NIH-designated repository and distribution site for multi-omic data from human samples, cell-based and animal models, analysis results, analytical methodology and research tools generated by multiple Alzheimer’s disease research programs and consortia supported by NIA. Data are available to qualified investigators as open or controlled access depending on the data type and data source.
  • NIBIB’s LONI Image Data Archive (IDA) is a user-friendly environment for archiving, searching, sharing, tracking and disseminating neuroimaging and related clinical data. The LONI IDA is utilized for dozens of neuroimaging research projects across North America and Europe and accommodates MRI, PET, MRA, DTI and other imaging modalities.

What is the role of the INCLUDE Data Coordinating Center (DCC) and how will data submitted to the INCLUDE Data Hub be shared?

  • The INCLUDE DCC is developing the INCLUDE Data Hub to facilitate data submission, harmonization, sharing, and interoperability of data generated by INCLUDE projects and other NIH-designated data repositories, as appropriate.
  • The INCLUDE Data Hub’s data sharing model is based on the following set of core principles:
    • Accelerating research through broad data sharing
    • Fostering transparency and collaboration among researchers and other community members
    • Maximizing data availability and searchability through indexing and visualizations in the INCLUDE Portal
    • Managing sensitive data according to participant consent and existing governance structures where appropriate
  • The INCLUDE DCC intends to make all INCLUDE data Findable, Accessible, Interoperable and Reusable (FAIR) directly through the INCLUDE Portal, the primary entry point to the INCLUDE Data Hub. However, access to some datasets may require additional approvals, for example, approval by NIH Data Access Committees (via dbGaP) for individual-level genomic datasets, or consortium approvals.
  • INCLUDE data will be shared as rapidly as feasible and in line with the Final NIH Policy for Data Management and Sharing (NOT-OD-21-013).
  • For any datasets submitted to the INCLUDE Data Hub that require controlled access (e.g., genomic data), the limitations on data use will be documented based on the consent using the Institutional Certification and access will be managed through dbGaP.

The INCLUDE Data Hub will be launched by March 21, 2022. For questions about submitting and sharing data through the INCLUDE Data Hub, contact: info@includedcc.org

Do all INCLUDE applications require a Data Management and Sharing Plan?

Yes, all applications, regardless of the amount of direct costs requested for any one year, should include a Data Management and Sharing Plan. The Data Management and Sharing Plan will be considered during peer review and by program staff when award decisions are made, as appropriate and consistent with achieving the goals of the program.

Applicants are also encouraged to describe plans for communicating and disseminating findings and results to the public through open access publications or other methods.

How should I indicate how my data will be shared?

The data sharing plan should make clear how the applicant will manage and share data and / or pipelines in accordance with the FOA. Data (including resultant, raw, derived, aggregated and summary data), tools, workflows and / or pipelines created or used with support from INCLUDE FOAs are expected to be provided to the INCLUDE Data Hub or other NIH-designated repositories, in coordination with the INCLUDE Data Coordinating Center, so that they can be shared with the wider scientific community and other researchers can replicate and build on the analyses in future research.

In the Data Management and Sharing Plan, applicants should describe the anticipated timeline, standards, formats and methods of providing the data and other products used or created under the FOA to the INCLUDE Data Coordinating Center.

Note that genomic data are subject to the NIH Genomic Data Sharing Policy (NOT-OD-14-124) and require the submission of an Institutional Certification and registration in dbGaP. INCLUDE expects sharing of all genomic data, including from small-scale studies, and release of sequence reads, variant call files, and associated phenotypic data no later than 6 months after the data have been generated.
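As an illustration of the release timeline for genomic data, the following minimal sketch computes a latest release date from a data generation date. It assumes that the six-month window is counted in calendar months and that a single generation date is known for the dataset; neither assumption comes from the policy text, and the function names are illustrative only. Applicants should confirm the applicable dates with NIH program staff.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter than the start day (for example,
    August 31 plus six months), clamp to the last day of that month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def genomic_release_deadline(generated_on: date) -> date:
    # Assumption: the window is read as six calendar months.
    return add_months(generated_on, 6)


# Hypothetical generation date, used only to show the calculation.
deadline = genomic_release_deadline(date(2022, 3, 21))
print(deadline.isoformat())
```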

Examples of data types to be submitted could include, but are not limited to:

  • Deep phenotypic data from CRFs or surveys (e.g., REDCap)
  • Clinical data extracted from electronic health records
  • Structural or functional neuroimaging files
  • Histological images
  • Ontology and annotation files resulting from phenotypic analysis
  • Network / pathway analysis result files
  • VCFs from multi-sample joint variant calls / comparisons
  • Summary statistics on variant data or other “genomic summary results”
  • Gene lists
  • GWAS results
  • Manhattan plot files / heatmaps / graph plots
  • Any form of derived data from genomic / statistical analysis

Where applicable, applicants should describe how they plan to share any analytical tools, pipelines or workflows used or created through open access channels (e.g., public GitHub links).

There is no “Data Management and Sharing Plan” section of the FOA or application. Where should I include this plan?

INCLUDE applicants can include the Data Management and Sharing Plan in the Data and Resource Sharing section of the application.

Plans for sharing biospecimens and open source software, including source code, use cases and system design documentation, in appropriate repositories, should also be specified in this section, where applicable.

Is there sample language available for data sharing?

The following language can be adapted to individual applications:

“We will submit [insert data types] to the [insert INCLUDE Data Hub or other relevant NIH-designated data repository], in coordination with the INCLUDE Project Data Coordinating Center (INCLUDE DCC). We will work with the INCLUDE DCC to identify the best ways to share file formats, including [insert specific file formats], which will be shared directly through the INCLUDE Portal and / or appropriate public, NIH-designated repositories [indicate NIH-designated repository, e.g., Gene Expression Omnibus (GEO)] to ensure broad accessibility to these datasets. Datasets that are not directly shared through the INCLUDE Portal will be indexed in the Portal to facilitate searching, aggregation and visualizations.

We plan to submit the data by [insert data submission timeline]. Data will be collected and processed in accordance with [insert standards used for collecting and processing data]. Biospecimens from human genetic or non-genetic studies will be stored in [insert biospecimen repository].”

For additional guidance, see: Supplemental Information to the NIH Policy for Data Management and Sharing: Elements of an NIH Data Management and Sharing Plan
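For applicants who prepare several applications, the bracketed placeholders in the sample language above can be filled programmatically. The sketch below is one possible approach, not an NIH or INCLUDE DCC tool: it uses an abbreviated excerpt of the sample language, the field names are illustrative, and the example values are hypothetical.

```python
from string import Template

# Abbreviated excerpt of the sample language, with the bracketed
# placeholders replaced by named fields (field names are illustrative).
SAMPLE = Template(
    "We will submit ${data_types} to the ${repository}, in coordination "
    "with the INCLUDE Project Data Coordinating Center (INCLUDE DCC). "
    "We plan to submit the data by ${timeline}. Data will be collected and "
    "processed in accordance with ${standards}."
)


def render_plan(fields: dict[str, str]) -> str:
    """Fill the template; raise ValueError if a field is missing or blank."""
    blank = [name for name, value in fields.items() if not value.strip()]
    if blank:
        raise ValueError(f"blank fields: {blank}")
    try:
        return SAMPLE.substitute(fields)
    except KeyError as exc:
        raise ValueError(f"missing field: {exc.args[0]}") from None


# Hypothetical values, for illustration only.
text = render_plan({
    "data_types": "whole genome sequencing data",
    "repository": "INCLUDE Data Hub",
    "timeline": "the end of the award period",
    "standards": "FAIR principles",
})
print(text)
```

Raising an error on a missing or blank field keeps an unfilled placeholder from slipping into a submitted plan.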

What standards should be used for collecting or processing INCLUDE data?

To maximize comparisons across datasets or studies, facilitate data and platform integration, and foster collaboration, INCLUDE researchers are strongly encouraged to use standards and resources developed by and used within the INCLUDE program and other existing standards, where applicable:

  • Applicants are encouraged to ensure that data collected by the study conform to Findable, Accessible, Interoperable, and Reusable (FAIR) principles.
  • NIH encourages researchers to explore the use of the HL7 FHIR® (Fast Healthcare Interoperability Resources) standard to capture, integrate, and exchange clinical data for research purposes and to enhance capabilities to share research data (NOT-OD-19-122). The FHIR® standard may be particularly useful in facilitating the flow of data with EHR-based datasets, tools, and applications.
  • NIH encourages clinical research programs and researchers to adopt and use the standardized set of data classes, data elements, and associated vocabulary standards specified in the United States Core Data for Interoperability (USCDI) standards, as they are applicable (NOT-OD-20-146). Use of the USCDI can complement the FHIR® standard and enable researchers to leverage structured EHR data for research and enable discovery.
  • NIH encourages the use of data standards, including common data elements such as those available through the PhenX Toolkit (www.phenxtoolkit.org) and the NIH CDE repository (cde.nlm.nih.gov); terminologies and ontologies such as the Mondo Disease Ontology (mondo.monarchinitiative.org) and the Human Phenotype Ontology (hpo.jax.org); and common data models such as the Observational Medical Outcomes Partnership (OMOP; ohdsi.org).
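
To give a sense of how the standards above can fit together, the following minimal sketch builds a FHIR Condition resource whose diagnosis is coded with an ontology term. It is an illustration under stated assumptions rather than an INCLUDE DCC specification: the function name, the participant identifier, the code system URI, and the ontology code are all hypothetical placeholders, and the output is not validated against the FHIR specification. Real submissions should use the identifiers and code systems agreed with the INCLUDE DCC.

```python
import json


def make_condition(patient_id: str, system: str, code: str, display: str) -> dict:
    """Build a minimal FHIR Condition resource as a plain dictionary."""
    return {
        "resourceType": "Condition",
        # Reference to the (de-identified) participant the condition belongs to.
        "subject": {"reference": f"Patient/{patient_id}"},
        # CodeableConcept: one coding drawn from a terminology or ontology.
        "code": {
            "coding": [{"system": system, "code": code, "display": display}]
        },
    }


# All values below are hypothetical placeholders, not real identifiers.
condition = make_condition(
    patient_id="participant-001",
    system="https://example.org/ontology",
    code="EXAMPLE:0000001",
    display="Example condition",
)
print(json.dumps(condition, indent=2))
```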

What biospecimen repositories does INCLUDE use?

This page last reviewed on September 7, 2022