Two companies that apply computing to genomics said today they will make their products available on Google Cloud Platform, through partnerships whose financial terms were not disclosed.
Genomenon said its Cited Variants Reference (CVR) data will be available for genomic applications as a public dataset in BigQuery, Google Cloud's data warehouse for big data analytics and machine learning.
CVR is designed to help clinicians and researchers prioritize and scale their genomic interpretation. In genomic analysis pipelines, the dataset can serve as an evidence filter for clinical actionability, based on whether a variant has supporting evidence in the medical literature. It also offers a fast way to gain insight into the literature for variant curation of genome data stored in Google Cloud Platform, with direct links into the company's Mastermind Genomic Search Engine.
According to Genomenon, the CVR dataset can also help researchers explore novel, unpublished variants across patient cohort genomic datasets by identifying variants with little or no evidence in the medical literature.
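To make the evidence-filter idea concrete, the Python sketch below splits a cohort's variants into literature-supported and low-evidence groups by citation count. It is illustrative only and does not use Genomenon's CVR schema or the BigQuery API. The `split_by_evidence` function, the variant key format, the `min_citations` threshold and the citation counts are all hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_evidence(
    cohort_variants: List[str],
    cvr_citations: Dict[str, int],
    min_citations: int = 1,
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Partition variants by literature citation count (hypothetical sketch).

    cvr_citations is a stand-in for CVR rows: variant key -> citation count.
    A variant missing from the lookup is treated as having zero citations.
    """
    supported: List[Tuple[str, int]] = []
    low_evidence: List[str] = []
    for variant in cohort_variants:
        count = cvr_citations.get(variant, 0)
        if count >= min_citations:
            supported.append((variant, count))
        else:
            low_evidence.append(variant)
    # Prioritize literature-supported variants, most-cited first.
    supported.sort(key=lambda item: item[1], reverse=True)
    return supported, low_evidence


if __name__ == "__main__":
    # Toy keys and counts, not real CVR data.
    cvr = {"GENE_A:p.R100W": 12, "GENE_B:p.G50D": 3, "GENE_C:p.L7P": 0}
    cohort = ["GENE_B:p.G50D", "GENE_A:p.R100W", "GENE_C:p.L7P", "GENE_D:p.S9F"]
    supported, novel = split_by_evidence(cohort, cvr)
    print(supported)  # [('GENE_A:p.R100W', 12), ('GENE_B:p.G50D', 3)]
    print(novel)      # ['GENE_C:p.L7P', 'GENE_D:p.S9F']
```

Raising `min_citations` tightens the filter. The low-evidence list corresponds to the kind of little-studied variants that Genomenon says researchers might explore for novelty.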
CVR data are generated from Genomenon's Mastermind genomic database, which contains more than 4.1 million genomic variants found in the medical literature and which the company calls one of the world's most comprehensive indexes of genomic literature. Each variant is annotated with a citation count, based on the number of scientific publications that mention it, and with a link into the Mastermind Genomic Search Engine, where users can view the full search results for those articles.
The Mastermind Genomic Search Engine reads the titles and abstracts of more than 30 million scientific and medical papers listed in PubMed. It then indexes the full text of articles found to contain genomic information, building a comprehensive view of the genomic literature.
To date, Genomenon said, Mastermind has indexed the text of more than 6.2 million genomic publications, which together contain the more than 4.1 million variants. The variants, along with the article citation count for each, are now available on Google Cloud Platform.
GPU-Based Software Solution
Separately, Parabricks said it is making its namesake product suite, an accelerated, deep learning-based set of tools for primary and secondary analysis of sequencing data, available on the Google Cloud Platform (GCP) Marketplace.
The Parabricks graphics processing unit (GPU)-based software is designed to reduce the time and cost of going from raw sequencing data to variant calls for a whole genome. According to the company, it produces variant calls in less than an hour (about 45 minutes), compared with the roughly 30 hours typically needed to run the same industry-standard pipeline on CPU-based high-performance computing (HPC) clusters.
“This kind of performance is critical for both clinical settings where patient treatment decisions need to be made in a timely fashion, along with large scale population studies where tens of thousands of genomes must be sequenced, analyzed and often re-analyzed to standardize the data set to one analysis,” Parabricks CEO Mehrzad Samadi, PhD, said in a statement. “By collaborating with GCP, we are now bringing these same benefits to the cloud.”
Parabricks said its enterprise-grade analysis suite is intended to provide 100% reproducibility, along with containerized versions of the latest secondary analysis tools used by bioinformaticians worldwide.
Parabricks introduced its solution on the GCP Marketplace with a presentation at the Google Cloud booth during the Bio-IT World Conference & Expo, which runs through Thursday. The solution will be offered as a free trial that ends on July 1.
"We are excited to announce a relationship that will bring the cost and performance benefits of Parabricks' GPU-optimized analytics to Google Cloud Platform," said Parabricks CTO Ankit Sethia, PhD. "By incorporating the latest and greatest from accelerated GPU computing and deep learning, we can provide what a white paper reported as being 30 to 50 times faster analysis at lower costs while generating results that are equivalent to the established industry-standard tools."
Parabricks said it supports the full GATK4 germline and somatic variant calling pipelines, as well as DeepVariant and population study tools.
“The Parabricks solution is widely used in in-house clusters and with the growing adoption of cloud computing in genomics, it was important for us to provide an easy to use cloud solution. Google Cloud Platform has a strong user base in the genomics market and our solution can greatly benefit that base,” Samadi added.