Seven Bridges said today it has been selected by the NIH to lead a public-private consortium that will develop the pilot phase of a new data ecosystem for the agency.
The Data Commons Pilot Phase is NIH’s latest effort to accelerate biomedical research through informatics, with the goal of advancing the agency’s vision for a virtual biomedical data discovery and computing environment.
The NIH says its Data Commons Pilot Phase will apply the FAIR (Findable, Accessible, Interoperable and Reusable) Guiding Principles for scientific data management and stewardship, as published last year in the journal Scientific Data.
Seven Bridges will lead a team that includes Repositive, a UK-based software company developing tools to improve access to genomic research data; Elsevier, the information analytics business specializing in science and health; and the Boston Veterans Affairs Research Institute, the creator of the Million Veteran Program, the world’s largest genomic database.
The partners have formed a public-private consortium called FAIR4CURES, which plans to work within the overall Data Commons pilot to build a full-stack solution designed to unify data from a variety of research environments into a single ecosystem that advances data discovery, access, and computation.
“The NIH Data Commons promises to transform the way public biomedical data is stored and analyzed,” Seven Bridges CEO Brandi Davis-Dusenbery, Ph.D., said in a statement. “An effort of this scale has never been tried before and its focus on interoperable data accessibility answers the call to break open data siloes, setting new standards for healthcare research.”
Seven Bridges, a developer of biomedical data analysis tools, is partnering with the University of Chicago to curate datasets for the Blood Profiling Atlas, a pilot project of the “Cancer Moonshot” launched last year. The Atlas is an open database for liquid biopsy data aimed at advancing development of blood profiling diagnostic technologies.
For the Data Commons Pilot Phase, Seven Bridges said it will use its existing cloud infrastructure for biomedical data analysis, which includes Amazon Web Services (AWS), Google, and local compute storage solutions. Seven Bridges also plans to accelerate collaborative research and open source development by continuing to build interoperability standards, such as Common Workflow Language.
In addition, Seven Bridges said it will create interoperable APIs to connect biomedical data from the Gabriella Miller Kids First Data Center and the Cancer Genomics Cloud (one of three pilot systems funded by the NIH’s National Cancer Institute to explore colocalizing massive genomics datasets alongside secure and scalable computational resources to analyze them) to additional NIH-funded datasets that include:
- Trans-Omics for Precision Medicine (TOPMed), which collects whole-genome sequencing and other -omics data to improve the understanding of heart, lung, blood, and sleep disorders and advance precision medicine.
- Genotype-Tissue Expression (GTEx), designed to offer insights into the mechanisms of gene regulation by studying human gene expression and regulation in multiple tissues from healthy individuals; exploring disease-related perturbations in a variety of human diseases; and examining sexual dimorphisms in gene expression and regulation in multiple tissues.
- Model Organism Databases (MODs), designed to enable researchers worldwide to uncover basic, conserved biological mechanisms for organisms such as yeast, worm, fly, fish, and mouse.
Other members of the FAIR4CURES consortium will contribute access to petabytes of additional data, including more than one million indexed datasets from the Repositive Platform, Elsevier’s Mendeley data hub, and the VA’s GenHub Ecosystem.
The FAIR4CURES consortium has been awarded $304,253 under NIH grant No. 1OT3OD025463-01.