The Peaks and Valleys: Metabolomics Databases Abound, But a Lack of Data Standards Presents Challenges

November 2, 2018
Stephen Barnes, Ph.D., and colleague Janusz Kabarowski, Ph.D., in the UAB Targeted Metabolomics and Proteomics Laboratory.

Christina Bennett, Contributing Editor

A stubborn, mountainous peak kept appearing in the lab’s mass spectrometry analyses. “We had absolutely no idea what it was,” recalled Stephen Barnes, Ph.D., director of the Targeted Metabolomics and Proteomics Laboratory at the University of Alabama at Birmingham. He suspected a contaminant. In 2010, however, small molecule databases were quite thin for scientists running metabolomics experiments. So his lab turned to America’s favorite search engine, Google. After some trial and error, the lab entered a highly accurate mass of the unknown molecule into the search field and got a hit: a journal article from Richard Caprioli’s lab at Vanderbilt University. The mystery peak now had an identity: dimethyl-octadecyl ammonium chloride, a molecule frequently found in cleaning products.

“I didn’t go to databases because at that point it wasn’t an obvious thing to do,” Barnes said.

Building a Treasure Trove

Today, that Achilles heel for metabolomics researchers no longer exists, as databases that aid in the identification of compounds for metabolomics research have become larger and more widely used. The databases come in many flavors: paywalled, freely accessible, downloadable, cloud-based, limited user input, full user input, and so on. One that has grown to become one of the largest is METLIN, a spectral database from Scripps Research that is freely available and lives in the cloud, making it accessible to virtually any researcher and requiring a relatively simple setup.

“The METLIN database is actually incredibly unique right now in terms of its size,” said Gary Siuzdak, Ph.D., senior director of the Scripps Center for Metabolomics and co-developer of METLIN.

First launched online in 2005, METLIN has grown rapidly over the course of the last year, from about 15,000 compounds to 150,000 compounds for which relevant fragmentation information is available, spanning more than 350 chemical classes. “This has been the result of a lot of things coming together over the last year to allow us to perform high-throughput analyses on a variety of different types of molecules that we’ve been able to get our hands on,” Siuzdak noted. By comparison, he said, the database from the government-funded National Institute of Standards and Technology, or NIST, is one order of magnitude smaller, in the tens of thousands.

Although METLIN may be one of the largest available metabolomics databases, it is far from complete. “We don’t have any comprehensive spectral databases out there,” said Lloyd Sumner, Ph.D., director of the Metabolomics Center at the University of Missouri. “We don’t even know what the size of the metabolome is.”

 
