Decoding the Dark Genome

March 23, 2017
Nadav Ahituv, Ph.D., (right) in his UCSF laboratory consulting with postdoc Fumitaka Inoue, Ph.D.

Diana Kwon, Contributing Editor

In 2003, the Encyclopedia of DNA Elements (ENCODE) project was launched with hopes of creating a comprehensive catalog of all the coding and noncoding elements in the human genome. This February, the National Human Genome Research Institute (NHGRI), a part of the National Institutes of Health, announced that it would provide the funds to support ENCODE for four more years. All data generated by the newest phases of the project will be released into public databases and will be available through ENCODE’s Data Coordination Center.

Over a decade ago, when the Human Genome Project was completed, researchers realized that despite being able to read all three billion base pairs in the human genome, they understood very little about how the genome functioned. Since only 2% of the genome coded for proteins, the rest was thought to be nonfunctional DNA. However, scientists quickly realized that these noncoding regions contained elements that played a crucial role in gene expression.

“The ENCODE project was launched with the goal to actually be able to functionally annotate the human genome—the rest of the 98% that we call ‘dark matter,’” said Yin Shen, Ph.D., a professor in the department of neurology at the University of California, San Francisco (UCSF), and one of the grant recipients. 

“I think now [there are] a lot of tools being developed, particularly in recent years, that allow us to start dissecting the function of the dark matter of the genome,” Dr. Shen said. According to Dr. Shen, these include massively parallel reporter assays, CRISPR-Cas9, and 3D genome structure assays such as Hi-C (a method for investigating higher-order DNA folding, a crucial component of gene regulation).

ENCODE’s expanded efforts will include five characterization centers at four institutions (UCSF, Stanford University, Cornell University, and the Lawrence Berkeley National Laboratory), where researchers will use some of these tools to shed light on how the regulatory elements in the genome influence gene expression and cell function.

Dr. Shen, along with her colleague Bing Ren, Ph.D., at the University of California, San Diego, will use a high-throughput CRISPR-Cas9 method to conduct large-scale functional validation screens of the regulatory sequences in the mammalian genome. “The hope is that by interrogating tens of thousands of regulatory sequences, we’ll have enough power to discover the features that can be used to predict the functional regulatory DNA elements in the genome and as a result we will gain better abilities of reading the grammar in our genetic blueprint,” Dr. Shen noted.

Another grant went to Nadav Ahituv, Ph.D., a professor in the department of bioengineering and therapeutic sciences at UCSF, and his collaborator, Jay Shendure, M.D., Ph.D., a professor of genome sciences at the University of Washington (UW). With a previous ENCODE grant, Dr. Ahituv’s and Dr. Shendure’s labs developed a massively parallel reporter assay that could test thousands of sequences for regulatory activity in one experiment.

Using the new NHGRI funds, the two labs plan to use this technique to characterize 100,000 sequences and to mutate 10,000 sequences with CRISPR-Cas9 to observe the resulting effects on functional output.

“By carrying out these mass-scale sequence characterizations, we hope we can understand how mutations in these regulatory elements lead to human disease,” Dr. Ahituv said. “We see that more and more diseases are associated with noncoding regions—if you look at the genome-wide association studies, 90% of the associations are noncoding.”

Other researchers, such as William Greenleaf, Ph.D., and Michael Bassik, Ph.D., genetics professors at Stanford University, are interested in uncovering the genome’s physical regulatory landscape. “The genome is about two meters long, and it’s stored in a nucleus that is about five microns in diameter, so it’s a spectacular topological problem,” Dr. Greenleaf explained. “So the solution to some of that is the portion of the genome that is important to cell function remains accessible and able to be bound by protein factors, and regions that are not being used are folded up and sequestered away.”

Dr. Greenleaf’s and Dr. Bassik’s groups will be using a variety of methods to characterize the regulatory elements of the genome. “The first steps were to map the regulatory elements [in the genome],” Dr. Greenleaf said. “Now we’re interested in trying to perturb these elements that have been identified and to link them back to a biological phenotype, trying to understand exactly what they do in a mechanistic way.” 

In addition to characterization centers, the NHGRI also awarded funds to mapping centers, data coordinating centers, data analysis centers, and computational analysis centers at various institutions across the U.S. 

Along with revealing the secrets of the dark genome, the new insights gained from these collective efforts may also help develop novel treatments for disease. “Having a catalog of regulatory elements…[is] also a very important therapeutic tool because these could be targeted,” Dr. Ahituv said.

More articles are available in the March/April issue of Clinical OMICs.