Genome’s Dark Matter Illuminated by CRISPR, Nanopore Sequencing

Expansions of short tandem repeats (STRs) in DNA are genetic variants that have been implicated in multiple neuropsychiatric and other disorders, but are hard to analyze using current methods. Researchers in Germany, led by a team at the Max Planck Institute for Molecular Genetics in Berlin, have now developed a method that combines CRISPR-Cas technology with nanopore sequencing and stem cells to evaluate these previously inaccessible regions of the genome in detail. The technology could help diagnose a range of diseases more rapidly and accurately.

“With the CRISPR-Cas system and our algorithms, we can scrutinize any section of the genome, especially those regions that are particularly difficult to examine using conventional methods,” said Franz-Josef Müller, PhD, at the Max Planck Institute for Molecular Genetics and the University Hospital of Schleswig-Holstein, who heads the project. “We created the tools that enable every researcher to explore the dark matter of the genome.” The team reported on the new method in Nature Biotechnology, in a paper titled, “Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing.”

Large parts of the genome consist of regions where short sections of DNA sequence repeat hundreds or thousands of times. However, expansions of these DNA repeats in the wrong places can have serious consequences. “The expansion of unstable genomic short tandem repeats causes more than 30 Mendelian human disorders,” the team wrote. For example, expansion of the CGG repeat in the FMR1 gene on the X chromosome causes fragile X syndrome, one of the most commonly identifiable hereditary causes of cognitive disability in humans. Expansion of a GGGGCC repeat within another gene, C9orf72, is the most frequent monogenic cause of frontotemporal dementia and amyotrophic lateral sclerosis. Yet while these expanded repeats can have dramatic consequences, they are effectively unknown territory that cannot be studied in detail, even with modern methods, the investigators acknowledged. “… their assessment remains challenging with current polymerase-based methods.”
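To make the idea of a repeat expansion concrete, the following minimal Python sketch counts the longest uninterrupted run of a motif such as CGG in a DNA string. The function name and the toy sequence are invented for illustration; real repeat tracts are far longer and, as the investigators note, hard to assess with polymerase-based methods.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    # Match one or more consecutive copies of the motif (case-insensitive input).
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Toy sequence (invented): a 7-copy CGG tract and a separate 2-copy tract.
toy_sequence = "ATGC" + "CGG" * 7 + "TTAGC" + "CGG" * 2 + "A"
print(longest_repeat_run(toy_sequence, "CGG"))
```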

Müller’s team has now become the first to determine the length of genomic tandem repeats in patient-derived stem cell cultures. The technology also allowed them to derive data on the epigenetic state of the repeats by scanning individual DNA molecules.

The FMR1 gene is known to be essential for normal brain development, as Müller explained. “Without the FMR1 gene, we see severe delays in development leading to varying degrees of intellectual disability or autism.” In fragile X syndrome, the cell recognizes and switches off the repetitive region of the gene by attaching methyl groups to the DNA. While these small epigenetic changes leave the underlying genetic information intact, “unfortunately, the epigenetic marks spread over to the entire gene, which is then completely shut down,” Müller further commented.

Fragile X syndrome is more severe in males than it is in females, who have two X chromosomes. The expanded repeat region is usually located on only one of the two X chromosomes; the normal second copy of the gene is not epigenetically altered and is able to compensate for the genetic defect. In contrast, males have only one X chromosome, and so only one copy of the FMR1 gene, and they display the full range of clinical symptoms.

Conventional sequencing methods analyze the entire genome of a patient. For their study, Müller and colleagues developed an approach for analyzing the genome of stem cells derived from patient tissue, which could be used to evaluate specific regions selectively.

“Conventional methods are limited when it comes to highly repetitive DNA sequences,” commented co-first author Björn Brändl. “Not to mention the inability to simultaneously detect the epigenetic properties of repeats.” Brändl used CRISPR-Cas genome editing technology to cut DNA segments that contained the repeat region out of the genome. After intermediate processing steps, each strand is threaded through one of a hundred tiny nanopores on the nanopore sequencing chip. At the same time, electrically charged particles flow through the pores and generate a current. When a DNA molecule moves through one of these pores, the current varies depending on the chemical properties of the DNA. These fluctuations of the electrical signal allow a computer to reconstruct the genetic sequence and epigenetic chemical modifications. This happens at every pore, so each strand of DNA is read individually.

Using this technology the team was able to determine the length of the repeat regions and their epigenetic signature, something that hadn’t been possible with conventional sequencing methods. The researchers discovered that the length of the repetitive region could vary to a large degree, even among the cells of a single patient. They also tested their process with patient-derived cells that contained an expanded repeat in one of the two copies of the C9orf72 gene. “We were the first to map the entire epigenetics of extended and unchanged repeat regions in a single experiment,” said Müller. Furthermore, the region of interest on the DNA molecule remained physically wholly unaltered. “We developed a unique method for the analysis of single molecules and for the darkest regions of our genome—that’s what makes this so exciting for me.”
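One way to picture this per-molecule view is to treat each sequenced DNA molecule as one repeat count and to summarize the unchanged and expanded copies separately. The sketch below is illustrative only: the counts are made up, and the cutoff of 30 repeats is a hypothetical parameter, not a value reported by the researchers.

```python
from statistics import median


def summarize_alleles(repeat_counts, threshold):
    """Split per-molecule repeat counts at `threshold` and summarize both groups."""
    normal = [c for c in repeat_counts if c < threshold]
    expanded = [c for c in repeat_counts if c >= threshold]
    return {
        "normal_reads": len(normal),
        "expanded_reads": len(expanded),
        "normal_median": median(normal) if normal else None,
        "expanded_median": median(expanded) if expanded else None,
        # The spread shows how much repeat length varies between molecules.
        "expanded_range": (min(expanded), max(expanded)) if expanded else None,
    }


# Invented per-molecule counts for one sample with one expanded allele.
toy_counts = [2, 2, 3, 2, 450, 610, 780, 2, 520]
summary = summarize_alleles(toy_counts, threshold=30)  # threshold is an assumption
print(summary)
```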

The researchers proceeded step by step and first selectively enriched molecules with the repeat, analyzed the electrical signal, and then determined the length of the repeats and their epigenetic signature. [MPI f. Molecular Genetics/ Pay Gießelmann]

“If we had not pre-sorted the molecules in this way, their signal would have been drowned in the noise of the rest of the genome,” added bioinformatician Pay Giesselmann, PhD, who developed an algorithm specifically for the interpretation of the electrical signals generated by the repeats. “Most algorithms fail because they do not expect the regular patterns of repetitive sequences.” While Giesselmann’s program, STRique (short tandem repeat identification, quantification, and evaluation), does not determine the genetic sequence itself, it counts the number of sequence repetitions with high precision. The researchers are making the program freely available on the internet.
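The counting step can be illustrated with a deliberately simplified Python sketch. As an assumption made for readability, it works on an already decoded read and counts motif copies between two flanking sequences, whereas STRique evaluates the raw electrical signal rather than the sequence itself. All names and sequences here are hypothetical.

```python
def count_repeats_between_flanks(read, prefix, suffix, motif):
    """Count motif copies between two flanking sequences in a decoded read.

    Returns None when the read does not span the whole repeat region,
    which is how this sketch discards uninformative molecules.
    """
    start = read.find(prefix)
    if start == -1:
        return None
    start += len(prefix)
    end = read.find(suffix, start)
    if end == -1:
        return None
    # str.count tallies non-overlapping occurrences of the motif.
    return read[start:end].count(motif)


# Hypothetical flanks and a toy read carrying five GGGGCC copies.
PREFIX, SUFFIX, MOTIF = "ACGTAC", "TTGACA", "GGGGCC"
spanning_read = "TT" + PREFIX + MOTIF * 5 + SUFFIX + "AA"
truncated_read = "TT" + PREFIX + MOTIF * 5

print(count_repeats_between_flanks(spanning_read, PREFIX, SUFFIX, MOTIF))
print(count_repeats_between_flanks(truncated_read, PREFIX, SUFFIX, MOTIF))
```

Requiring both flanks mirrors the point made about pre-sorting: only molecules that cover the entire repeat can yield a trustworthy count.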

“The CRISPR–Cas nuclease-based-target enrichment and STRique can be rapidly adapted to any other genomic region of interest, ensuring broad applicability to overcome challenges associated with the single-molecule analysis,” the authors concluded. “This allows for immediate integration of genetic and epigenetic signals associated with unstable repeat expansions or any other currently unsequenceable genomic regions in human health and disease … Notably, our method does not require any additional instruments in contrast to other previously reported enrichment strategies …” The team suggested that this type of analysis will improve diagnostic research, by improving accuracy and the resolution at which unstable repeats can be characterized, “ … while enabling efforts to gain mechanistic insights into the effects of differentiation, aging, and future therapeutic agents on STR expansions and their associated DNA methylation.”

Müller can foresee great potential for the technology in basic research. “There is evidence that the repeats grow during the development of the nervous system, and we would like to take a closer look at this.” He also envisions numerous applications in clinical diagnostics. Repetitive regions are involved in the development of cancer, and the new method is relatively inexpensive and fast. Müller said, “We are very close to clinical application.”