A collaboration of investigators at the Baylor College of Medicine, Rice University, Texas Children's Hospital, and the Broad Institute of MIT and Harvard has published findings on a novel genome sequencing technique, which they note can perform de novo assembly of genomes considerably more cheaply and quickly than current methodologies. The results of the research team’s study were published recently in Science in an article entitled “De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds.”
The multi-institutional team reports that their method—called 3D genome assembly—can create a human reference genome, entirely from scratch, for less than $10,000. The ability to quickly and easily generate a reference genome from scratch would open the door to creating reference genomes for everything from patients to tumors to all species on earth.
To illustrate the power of 3D genome assembly, the researchers assembled the 1.2-billion-letter genome of the Aedes aegypti mosquito, which carries the Zika virus, producing the first end-to-end assembly of each of its three chromosomes. The new genome will help scientists combat the Zika outbreak by identifying vulnerabilities in the mosquito that the virus uses to spread.
Despite the decline in the cost of DNA sequencing, determining the sequence of each chromosome from scratch via de novo genome assembly remains extremely expensive because chromosomes can be hundreds of millions of base-pairs long. In contrast, today's inexpensive DNA sequencing technologies produce short reads, or hundred-base-pair-long snippets of DNA sequence, which are designed to be compared to an existing reference genome. Generating a reference genome and assembling all those long chromosomes involves combining many different technologies at the cost of hundreds of thousands of dollars. Unfortunately, because human genomes differ from one another, the use of a reference genome generated from one person in the process of diagnosing a different person can mask the true genetic changes responsible for a patient's condition.
“As physicians, we sometimes encounter patients who we know must carry some sort of genetic change, but we can't figure out what it is,” said Aviva Presser Aiden, Ph.D., M.D., a physician-scientist in the Pediatric Global Health Program at Texas Children's Hospital, and a co-author of the new study. “To figure out what's going on, we need technologies that can report a patient's entire genome. But, we also can't afford to spend millions of dollars on every patient's genome.”
To tackle the challenge, the team developed a new approach, called 3D assembly, which determines the sequence of each chromosome by studying how the chromosomes fold inside the nucleus of a cell.
“Our method is quite different from traditional genome assembly,” said Olga Dudchenko, Ph.D., a postdoctoral fellow at the Center for Genome Architecture at Baylor College of Medicine, who led the research. “Several years ago, our team developed an experimental approach that allows us to determine how the 2-meter-long human genome folds up to fit inside the nucleus of a human cell. In this new study, we show that, just as these folding maps trace the contour of the genome as it folds inside the nucleus, they can also guide us through the sequence itself.”
By carefully tracing the genome as it folds, the team found that they could stitch together hundreds of millions of short DNA reads into the sequences of entire chromosomes. Since the method only uses short reads, it reduces the cost of de novo genome assembly, which is likely to accelerate the use of de novo genomes in the clinic.
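The idea that folding maps can guide assembly rests on a well-known observation: stretches of DNA that lie close together along a chromosome tend to touch each other in the nucleus more often than stretches that lie far apart. The toy Python sketch below is not the team's pipeline; it only illustrates how contact counts alone can recover the order of sequence fragments. The function name, the fragment labels, and the contact counts are all invented for illustration.

```python
def order_contigs(contacts):
    """Greedily order fragments so strongly contacting ones end up adjacent.

    contacts: dict mapping frozenset({a, b}) -> contact count between
    fragments a and b (hypothetical toy input, not a real Hi-C format).
    """
    names = sorted({name for pair in contacts for name in pair})

    def count(a, b):
        return contacts.get(frozenset((a, b)), 0)

    # Seed the chain with the pair of fragments that contact each other most.
    first, second = max(
        ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
        key=lambda pair: count(*pair),
    )
    chain = [first, second]
    remaining = [n for n in names if n not in chain]

    while remaining:
        # Find the best unplaced fragment for each end of the chain.
        left = max(remaining, key=lambda n: count(n, chain[0]))
        right = max(remaining, key=lambda n: count(n, chain[-1]))
        # Attach whichever candidate has the stronger contact with its end.
        if count(left, chain[0]) > count(right, chain[-1]):
            chain.insert(0, left)
            remaining.remove(left)
        else:
            chain.append(right)
            remaining.remove(right)
    return chain


# Invented counts for four fragments whose true order is A, B, C, D:
# neighbors contact often, distant fragments rarely.
toy_contacts = {
    frozenset(("A", "B")): 90,
    frozenset(("B", "C")): 85,
    frozenset(("C", "D")): 80,
    frozenset(("A", "C")): 30,
    frozenset(("B", "D")): 28,
    frozenset(("A", "D")): 5,
}

print(order_contigs(toy_contacts))  # ['A', 'B', 'C', 'D']
```

A real assembly must also orient each fragment, split the fragments among chromosomes, and cope with noisy data, none of which this sketch attempts.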
“Sequencing a patient's genome from scratch using 3D assembly is so inexpensive that it's comparable in cost to an MRI,” said Dr. Dudchenko, who also is a fellow at Rice University's Center for Theoretical Biological Physics. “Generating a de novo genome for a sick patient has become realistic.”
Unlike the genetic tests used in the clinic today, de novo assembly of a patient genome does not rely on the reference genome produced by the Human Genome Project. “Our new method doesn't depend on previous knowledge of the individual or the species that is being sequenced,” Dr. Dudchenko noted. “It's like being able to perform a human genome project on whoever you want, whenever you want.”
“Or whatever you want,” added Erez Lieberman Aiden, Ph.D., director of the Center for Genome Architecture at Baylor and corresponding author on the new work. “Because the genome is generated from scratch, the 3D assembly can be applied to a wide array of species, from grizzly bears to tomato plants. And it is pretty easy. A motivated high school student with access to a nearby biology lab can assemble a reference-quality genome of an actual species, like a butterfly, for the cost of a science fair project.”
The effort took on added urgency with the outbreak of Zika virus, which is carried by the Aedes aegypti mosquito. Researchers hoped to use the mosquito's genome to identify a strategy to combat the disease, but the Aedes genome had not been well characterized, and its chromosomes are much longer than those of humans.
“We had been discussing these ideas for years, writing a chunk of code here, doing a proof-of-principle assembly there,” explained Dr. Lieberman Aiden, who is also an assistant professor of molecular and human genetics at Baylor and of computer science at Rice, and a senior investigator at the Center for Theoretical Biological Physics. “So we had assembly data for Aedes aegypti just sitting on our computers. Suddenly, there's an outbreak of Zika virus, and the genomics community was galvanized to get going on Aedes. That was a turning point.”
“With the Zika outbreak, we knew that we needed to do everything in our power to share the Aedes genome assembly, and our methods, as soon as possible,” according to Dr. Dudchenko. “This de novo genome assembly is just a first step in the battle against Zika, but it's one that can help inform the community's broader effort.”
The team also assembled the genome of the Culex quinquefasciatus mosquito, the principal vector for West Nile virus. “Culex is another important genome to have since it is responsible for transmitting so many diseases,” said Dr. Lieberman Aiden. “Still, trying to guess what genome is going to be critical ahead of time is not a good plan. Instead, we need to be able to respond quickly to unexpected events. Whether it is a patient with a medical emergency or the outbreak of an epidemic, these methods will allow us to assemble de novo genomes in days, instead of years.”