Milestone in DNA Sequencing is (More Than) One in a Million

December 27, 2017
Martin A. Smith, Ph.D., led researchers at the Garvan Institute in Sydney, Australia, in sequencing a single DNA read more than one million bases long using the MinION nanopore sequencing device from U.K.-based Oxford Nanopore Technologies. [Source: Garvan Institute / Martin A. Smith]

Researchers at the Garvan Institute in Sydney, Australia, recently acknowledged what could be called a one-in-a-million milestone and then some.

On December 14, Garvan investigators led by Martin A. Smith, Ph.D., head of genomic technologies in the Kinghorn Centre for Clinical Genomics, announced via Twitter that they had sequenced a single DNA read more than one million bases long using the MinION nanopore sequencing device from U.K.-based Oxford Nanopore Technologies (ONT).

Randi Hernandez of Genetic Engineering & Biotechnology News (GEN), the sister publication of Clinical OMICs, recently spoke with Dr. Smith to find out what this really means in the scope of sequencing, what the team sequenced, and what the lab plans to do next.


What was the most challenging part about getting a read of more than 1 million bases long?

Smith: There are quite a few challenges, but DNA extraction and sample preparation are without a doubt the most challenging steps. Even simple pipetting can shear DNA molecules longer than 100 kilobases (kb), so special sample preparation tricks are required, many of them revisited from the ’70s and ’80s. High-molecular-weight DNA samples form a big clump of stringy, gel-like substance rather than a liquid, so even measuring the concentration of the sample is hard because you cannot pipette small volumes of the highly viscous solution (you almost need scissors). Also, an unexpected challenge from reads this long was data analysis: most software tools have been developed for short (<50 kb) reads, so these ultra-long reads are good at finding bugs in code or problems with memory.


What about the accuracy of the read—did it suffer as a result of being ultra-long?

Smith: Not discernibly. The uncorrected read is 90% identical to the human reference sequence, which is within the typical quality range for native genomic DNA sequencing using tools from ONT. Keep in mind that native DNA molecules, such as the input for [the >1 Mb run], contain methylated nucleotides, which were not explicitly considered during the conversion of raw electronic signal to nucleotide sequence and can contribute to miscalled bases. 


In general, what is your viewpoint on nanopore-based sequencing and how does it compare with optics-based sequencers?

Smith: [Nanopore sequencing is] fundamentally different in its nature, more accessible, and more fun! As a bioinformatician, the unique nature of nanopore sequencing opens a whole new dimension of algorithms and software development, particularly relating to raw signal analysis. There is still a big role for other types of sequencing, which have complementary qualities, so they work quite well together. 


What kind of genetic information is your team looking at? What questions does your research address?

Smith: At the Kinghorn Centre for Clinical Genomics, we are primarily interested in human genomics and transcriptomics. My team focuses on implementing and evaluating new genomic technologies for various applications, including single-cell sequencing, epigenetics, and metagenomics. This megabase-long read was generated through a collaborative project seeking to resolve the sequence and structure of cancer-associated neochromosomes, which contain aberrant genomic sequences stitched together from hundreds of fragments of DNA. Some of the ultra-long reads have contiguously aligned to several (>30) regions of the reference genome, exposing complex genomic rearrangements at single-nucleotide resolution.


I saw an analogy that sequencing a 1 Mb strand of DNA is like running a 3.2-kilometer piece of rope through your fist—is this correct?

Smith: That’s right! You would also have to thread the rope through your fist at about 1.5 meters per second. Here’s another analogy: If the genome were represented by the entire human population, then a single copy of chromosome 19 would correspond to the population of Italy. This megabase-long read would encompass the entire population of Milan, while a single read from short-read sequencing platforms would correspond to a busy subway carriage.
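For readers who want to check the rope analogy, the arithmetic can be sketched in a few lines of Python. The rope length and threading speed come from the analogy above. Treating the read as exactly 1,000,000 bases is an assumption made here for illustration (the actual read was somewhat longer), and the function names are hypothetical. The resulting figures are what the analogy implies, not measured device specifications.

```python
# Back-of-the-envelope check of the rope analogy.
# Figures from the interview: 3.2 km of rope, threaded at about 1.5 m/s.
ROPE_LENGTH_M = 3200.0
ROPE_SPEED_M_PER_S = 1.5

# Assumption for illustration: the read is exactly 1,000,000 bases long.
READ_LENGTH_BASES = 1_000_000


def threading_time_seconds(length_m: float, speed_m_per_s: float) -> float:
    """Time needed to pull the whole rope through at a constant speed."""
    return length_m / speed_m_per_s


def implied_bases_per_second(read_length_bases: int, seconds: float) -> float:
    """Bases passing per second if the read takes as long as the rope."""
    return read_length_bases / seconds


seconds = threading_time_seconds(ROPE_LENGTH_M, ROPE_SPEED_M_PER_S)
rate = implied_bases_per_second(READ_LENGTH_BASES, seconds)
mm_of_rope_per_base = ROPE_LENGTH_M * 1000 / READ_LENGTH_BASES

print(f"Threading time: {seconds / 60:.1f} minutes")
print(f"Implied rate: {rate:.0f} bases per second")
print(f"Rope per base: {mm_of_rope_per_base:.1f} mm")
```

Under these assumptions, the rope takes a little over half an hour to pass through, with each base corresponding to a few millimeters of rope.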


What technical tips do you have for researchers who are also trying to perform long-read nanopore sequencing?

Smith: Cut the tips off your pipettes and follow Nick Loman, Ph.D. (@pathogenomenick) and Joshua Quick (@Scalene) on Twitter!


Now that you’ve crossed the 1 Mb threshold, is there any reason you can’t push for 2 Mb?

Smith: I think this is within reach and will be achieved sometime soon. Some research funding to directly pursue such an endeavor would be greatly beneficial.