Twelve awards totaling $9 million in Fiscal Year 2017 will launch a National Institutes of Health Data Commons Pilot Phase. A data commons is a shared virtual space where scientists can work with the digital objects of biomedical research, such as data and analytical tools. The NIH Data Commons will be implemented in a four-year pilot phase to explore the feasibility and best practices for making digital objects available through collaborative platforms. This will be done on public clouds, which are virtual spaces where service providers make resources, such as applications and storage, available over the internet. The goal of the NIH Data Commons Pilot Phase is to accelerate biomedical discoveries by making biomedical research data Findable, Accessible, Interoperable, and Reusable (FAIR) for more researchers.
“Harvesting the wealth of information in biomedical data will advance our understanding of human health and disease,” said NIH Director Francis S. Collins, M.D., Ph.D. “However, poor data accessibility is a major barrier to translating data into understanding. The NIH Data Commons Pilot Phase is an important effort to remove that barrier.”
“The NIH Data Commons Pilot Phase will create new opportunities for research not feasible before,” said NIH Data Commons Pilot Phase Program Manager, Vivien Bonazzi, Ph.D. “Making biomedical data sets accessible and connected at an unprecedented scale will lead to creative new ways to combine, analyze, and ask questions of the data to generate new knowledge.”
The recipients of the 12 awards will form the nucleus of an NIH Data Commons Pilot Phase Consortium in which researchers will start developing the key capabilities needed to make an NIH Data Commons a reality. These key capabilities, which were identified by NIH, collectively represent the principles, policies, processes, and architectures of a data commons for biomedical research data. Key capabilities include making data transparent and interoperable, safe-guarding patient data, and getting community buy-in for data standards.
Three NIH-funded data sets will serve as test cases for the NIH Data Commons Pilot Phase. The test cases include data sets from the Genotype-Tissue Expression and the Trans-Omics for Precision Medicine initiatives, as well as the Alliance of Genome Resources(link is external), a consortium of Model Organism Databases established in late 2016. These data sets were chosen based on their value to users in the biomedical research community, the diversity of the data they contain, and their coverage of both basic and clinical research. While just three datasets will be used at the outset of the project, it is envisioned the NIH Data Commons efforts will expand to include other data resources once the pilot phase has achieved its primary objectives.
NIH has acquired support from a Federally Funded Research and Development Center, the MITRE Corporation, to assist in establishing new NIH sustainable infrastructure for data science (people, processes, technologies). The MITRE Corporation will provide a broad range of support services for the NIH Data Commons Pilot Phase including innovative approaches to assure cost-effective cloud-based computing and storage for scientific data; analyses related to usage, cost, and comparative business models; and, other considerations to assure long-term viability of NIH data science efforts.
The trans-NIH Data Commons Pilot Phase receives funding from multiple NIH Institutes and Centers and is managed by the NIH Common Fund within in the NIH Office of the Director. The Common Fund; the National Heart, Lung, and Blood Institute; and the National Human Genome Research Institute are the lead NIH entities involved in management of the NIH Data Commons Pilot Phase.