RNA Gets Big-Data-Enabled Structural Analysis and Annotation Tool

May 21, 2018
An example of the annotation provided by a new software tool for RNA secondary structure researchers. [Source: David Hendrix/OSU College of Science]

DNA has often overshadowed its single-stranded relative in many aspects of molecular biology, but as improving technology allows researchers to manipulate this complex molecule, we are beginning to understand just how vital RNA is to everyday cellular function, as well as its impact on disease. Now, researchers at Oregon State University (OSU) have developed a new computer program that allows for more in-depth analysis of RNA molecules and represents a key step toward better understanding the connections between mutant genetic material and disease. The new program, called bpRNA, is a big-data annotation tool for secondary structures in RNAs. Findings from the new study were published recently in Nucleic Acids Research, in an article entitled “bpRNA: Large-Scale Automated Annotation and Analysis of RNA Secondary Structure.”

Senior study investigator David Hendrix, Ph.D., assistant professor at OSU, stated that the new annotation tool is “capable of parsing RNA structures, including complex pseudoknot-containing RNAs, so you end up with an objective, precise, easily interpretable description of all loops, stems, and pseudoknots. You also get the positions, sequence, and flanking base pairs of each structural feature, which enables us to study RNA structure en masse at a large scale.”
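To make the idea of reporting "positions, sequence, and flanking base pairs" concrete, here is a minimal Python sketch that finds hairpin loops in a structure written in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. The notation, the function name hairpin_loops, and the output fields are assumptions chosen for illustration. This is not bpRNA's code or output format, and it ignores pseudoknots and every loop type other than hairpins.

```python
def hairpin_loops(sequence, structure):
    """Find hairpin loops in a dot-bracket structure (hypothetical helper).

    Returns one dict per hairpin with 1-based loop positions, the loop
    sequence, and the base pair that closes (flanks) the loop.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    open_positions = []  # 0-based indices of '(' still waiting for a partner
    loops = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unmatched ')' at position %d" % (j + 1))
            i = open_positions.pop()
            inner = structure[i + 1:j]
            # A hairpin is a pair enclosing only unpaired bases.
            if inner and set(inner) == {"."}:
                loops.append({
                    "start": i + 2,           # first unpaired base, 1-based
                    "end": j,                 # last unpaired base, 1-based
                    "sequence": sequence[i + 1:j],
                    "closing_pair": (i + 1, j + 1),
                    "closing_bases": sequence[i] + sequence[j],
                })
        elif symbol != ".":
            raise ValueError("unsupported symbol %r" % symbol)
    if open_positions:
        raise ValueError("unmatched '(' in structure")
    return loops


example = hairpin_loops("GGGAAAUCCC", "(((....)))")
```

For the toy input above, the sketch reports a single four-base loop (AAAU) spanning positions 4 to 7, closed by the G-C pair at positions 3 and 8.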

RNA works in tandem with DNA to produce the proteins needed throughout the body. DNA contains a person's hereditary information, and RNA delivers the information's coded instructions to the protein-manufacturing sites within the cells. Many RNA molecules do not encode a protein, and these are known as noncoding RNAs.

"There are plenty of examples of disease-associated mutations in noncoding RNAs that probably affect their structure, and in order to statistically analyze why those mutations are linked to disease, we have to automate the analysis of RNA structure," noted Dr. Hendrix. "RNA is one of the fundamental, essential molecules for life, and we need to understand RNA's structure to understand how RNA functions."

Secondary structures are the base-pairing interactions within a single nucleic acid polymer or between two polymers. DNA exists mainly as fully base-paired double helices, but RNA is typically single-stranded and can fold back on itself to form complicated base-pairing interactions.
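As a rough illustration of what "base-pairing interactions" look like as data, the Python sketch below turns a dot-bracket string into a list of paired positions and checks whether any two pairs cross, which is the defining property of a pseudoknot. It assumes the common extended dot-bracket convention in which additional bracket types such as [ ] mark pairs that cross the ( ) pairs. The function names base_pairs and has_pseudoknot are hypothetical, and the sketch is not taken from bpRNA.

```python
def base_pairs(structure):
    """Return sorted 1-based (i, j) base pairs from a dot-bracket string."""
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    # One stack per bracket type, so different types may interleave.
    stacks = {opener: [] for opener in closers.values()}
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol in stacks:
            stacks[symbol].append(pos)
        elif symbol in closers:
            stack = stacks[closers[symbol]]
            if not stack:
                raise ValueError("unmatched %r at position %d" % (symbol, pos))
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError("unsupported symbol %r" % symbol)
    if any(stacks.values()):
        raise ValueError("structure has unclosed brackets")
    return sorted(pairs)


def has_pseudoknot(pairs):
    """True if any two pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    return any(i < k < j < l for i, j in pairs for k, l in pairs)


nested = base_pairs("((..))")
knotted = base_pairs("((..[[..))..]]")
```

Here nested contains only properly nested pairs, while knotted contains pairs from the square brackets that cross the parenthesis pairs, so has_pseudoknot returns True for it.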

Dr. Hendrix remarked that the bpRNA program features the largest and most detailed database to date of RNA secondary structures.

"To be fair it's a meta-database, but our special sauce is the tool to annotate everything," Dr. Hendrix stated. "Before, there was no way of saying where all the structural features were in an automated way. We provide a color-coded map of where everything is. These annotations will enable us to identify statistical trends that may shed light on RNA structure formation and may open the door for machine-learning algorithms to predict secondary RNA structure in ways that haven't been possible."

The authors have successfully tested the tool on more than 100,000 structures, "many of which are very complex, with lots of complex pseudoknots."

"Every day new RNAs are discovered, and researchers are making huge progress in understanding their function," Dr. Hendrix concluded. "We're starting to appreciate that the genome is full of noncoding RNAs in addition to messenger RNAs, and they're important biological molecules with big effects on human health and disease."