Georgetown Makes Its Brain Cancer Data Freely Available Worldwide

August 20, 2018

REMBRANDT (REpository for Molecular BRAin Neoplasia DaTa), a cache of brain cancer biomedical data hosted and supported by Georgetown, has been made freely available to researchers worldwide. It is one of only two such large collections in the United States.

Unlike many other such datasets, REMBRANDT contains not just genomic information but also diagnostic (including brain scans), treatment, and outcomes data.

"We want this data to be widely used by the broadest audience -- the entire biomedical research community -- so that imagination and discovery is maximized," said Yuriy Gusev, Ph.D., associate professor a faculty member of the  Innovation Center for Biomedical Informatics (ICBI) at Georgetown Lombardi . "Our common goal is to tease apart the clues hidden within this biomedical and clinical information in order to find ways that advance diagnostic and clinical outcomes for these patients."

The brain cancer dataset contains information on 671 adult patients, collected from 14 contributing institutions. Thousands of researchers in the U.S. and abroad already log on to the data site daily, and word about the resource is expected to increase its use, says Subha Madhavan, Ph.D., chief data scientist at Georgetown University Medical Center and director of ICBI at Georgetown Lombardi.

The dataset was originally created at the National Cancer Institute (NCI) and funded by the Glioma Molecular Diagnostic Initiative, led by Howard Fine, M.D., of New York Presbyterian Hospital, and Jean-Claude Zenklusen, Ph.D., of the NCI. They collected the data from 2004 to 2006. NCI transferred the data to Georgetown in 2015, where it is now part of the Georgetown Database of Cancer (G-DOC), a cancer data integration and sharing platform that hosts it alongside other cancer studies.

The genomic data identifies the specific genes within individual tumors that are either over-expressed or under-expressed, as well as the number of times each gene is repeated within a chromosome.

"We inherit two copies of a gene -- one from Mom and one from Dad -- but in cancer cells, DNA segments containing important tumor suppressor or oncogenes can be entirely deleted or amplified. It isn't unusual to see a chromosome within a tumor that has 11 copies of a gene, each of which may be producing a toxic protein that helps the cancer grow uncontrollably," Madhavan said.

The data collection also includes information on RNA, which is produced by genes (DNA) and can be measured to assess genes that are dysregulated.

The data is hosted in the cloud by Amazon Web Services. The site allows researchers to easily search for a gene of interest, check its expression and amplification status, and link it to clinical outcomes. They can save their findings to their workspace on the G-DOC site and share them with collaborators. REMBRANDT includes genomic data from 261 samples of glioblastoma, 170 of astrocytoma, 86 of oligodendroglioma, and a number that are mixed or of an unknown subclass. Outcomes data include more than 13,000 data points.