New Association Aims to Advance GATK for Precision Medicine in China

June 29, 2017

Intel, Inspur, BGI (Beijing Genomics Institute), and Alibaba Cloud (the cloud computing arm of Alibaba Group) have launched a group focused on advancing the adoption and use of the Genome Analysis Toolkit (GATK) among Chinese users for precision medicine.

The GATK Chinese Association for Precision Medicine will seek to broaden usage of GATK, the most widely used genome analysis software. Developed in the Data Sciences Platform at the Broad Institute of MIT and Harvard, GATK offers a variety of tools with a primary emphasis on variant discovery and genotyping.

According to the Broad, GATK has become the industry standard for identifying single-nucleotide polymorphisms (SNPs) and indels in germline DNA and RNA-seq data. The toolkit’s scope is also expanding to include somatic variant-calling tools and to tackle copy number variation (CNV) and structural variation (SV).

In addition to the variant callers themselves, GATK includes utilities for related tasks such as processing and quality control of high-throughput sequencing data.

GATK tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but can be adapted to handle other technologies and experimental designs. While originally developed for human genetics, GATK has since evolved to handle genome data from any organism.

Last month, the Broad announced plans to beta-release version 4 of GATK (GATK4) under an open-source software license. GATK4 uses a new architecture designed to allow significant streamlining of individual tools and support for performance-enhancing technologies such as Apache Spark™. According to the Broad, the new framework is intended to improve parallelization and take advantage of cloud deployment, making the analysis of vast amounts of genomic data easier, faster, and more efficient.

Also in May, the Broad and Intel announced that they had developed a breakthrough architecture, the Broad-Intel Genomics Stack (BIGstack), that they said would be made available to run GATK4. The partners said BIGstack can run GATK4 up to five times faster than previous versions of the toolkit through the use of Intel’s CPUs, Omni-Path Fabric, and solid-state drives.

BIGstack emerged from the five-year, $25 million collaboration that the Broad and Intel launched last November to improve researchers’ ability to analyze massive amounts of genomic data from diverse sources. The Institute and Intel said they hoped to enable researchers worldwide to run more data-intensive studies and generate robust results more quickly by accessing data that may have previously been unavailable to them.

Intel said the Association will be an important part of its partnership program for precision medicine. Intel’s genome database is embedded in Inspur’s Genomics Appliance, which is designed to perform data analysis for the entire genome and to accept additional resequencing input from second-generation analyzers.

BGI has said it will adopt the most current GATK tools, including the Broad and Intel optimizations, in a move toward global alignment of standards in the rapidly growing genomics community.

The four association members announced the formation of the group in Beijing at Intel’s Bio IT Forum.