A New Day? Fast, Cheap Human Genome Sequencing Will Open Doors

Rabiya S. Tuma, PhD

February 23, 2018

Researchers completed the first draft of the human genome sequence almost 20 years ago. The multiyear project cost $2.7 billion, with sequencing costs making up at least $500 million of that bill. This year, an international research team reported that they completed the human genome sequence using handheld nanopore devices in about 2 months at a cost of around $30,000.

The small sequencing devices have been used previously to track the Ebola and Zika epidemics, but the human genome is many orders of magnitude larger than those bacterial and viral genomes, making this "a huge technical achievement," Benedict Paten, PhD, one of the collaborators in the latest sequencing effort, told Medscape Medical News.

"We were able to generate enough data to put together a human genome assembly that was in many respects superior to that initial draft," continued Paten, who currently oversees the Center for Big Data in Translational Genomics and is an assistant professor in the Department of Biomolecular Engineering at the University of California, Santa Cruz. "Not in all respects — let me be clear, there are some rough edges — but in many respects superior to the original draft. So that, just as a technical achievement, is quite amazing."

Eric Topol, MD, founder and director of the Scripps Translational Science Institute in La Jolla, California, seconds that statement, noting that many people thought it would take years before the human genome sequence could be assembled inexpensively from small machines.

The portability and low cost will be important for translation to patient care, continues Topol, who is also editor-in-chief of Medscape. "It opens up opportunities in patient care. It takes genomics all over the world, to less developed countries. There are undiagnosed diseases all around the world that would be demystified with this.

"Also, if we can read a genome, we can read a cancer genome and maybe see how best to treat it," he continued enthusiastically.

"Or for sick newborns, if we could sequence and get results quickly, it could be a big advantage," Topol said. Speeding test results for inborn genetic defects would reduce the risk for lasting damage for affected infants.

Real Real-Time Data Analysis

Speed is an advantage of the MinION nanopore sequencer, said Miten Jain, PhD, a postdoctoral fellow at the University of California, Santa Cruz, Genomics Institute. Jain is one of the first authors on the article, published online January 29 in Nature Biotechnology, that describes the genome sequencing project.

A 2-day run on the sequencer provides about threefold coverage of the genome, Jain says. That would not be enough to assemble a genome from scratch, but it is enough to start an analysis, especially as data continue to accumulate while the machine runs.
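The coverage figure above is simple arithmetic: fold coverage is the total number of sequenced bases divided by the genome size. The sketch below works through it under stated assumptions. The genome size of roughly 3.1 billion bases is an approximate value not given in the article, and the function names are hypothetical. The run-time estimate assumes a single device producing data at a constant rate, which is a simplification.

```python
# Assumed approximate haploid human genome size in bases (not from the article).
GENOME_SIZE_BP = 3_100_000_000


def coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Fold coverage: how many times, on average, each base was sequenced."""
    return total_bases / genome_size


def bases_for_coverage(fold, genome_size=GENOME_SIZE_BP):
    """Total sequenced bases needed to reach a given fold coverage."""
    return fold * genome_size


def days_to_reach(target_fold, fold_per_run=3.0, days_per_run=2.0):
    """Days of sequencing needed at a constant rate (a simplifying assumption).

    The defaults mirror the article's figure of about threefold coverage
    from a 2-day run.
    """
    return target_fold / fold_per_run * days_per_run


if __name__ == "__main__":
    print(f"Bases in a threefold run: {bases_for_coverage(3):,}")
    print(f"Days to reach 30-fold on one device: {days_to_reach(30):.0f}")
```

Under these assumptions, a threefold run corresponds to roughly 9 billion sequenced bases, which illustrates why a single short run is enough to begin an analysis but not to assemble a genome from scratch.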

"The minute I turn the sequencer on, I start to get data. So you get to do real real-time analysis," Jain explains.

Jain and colleagues also report that they are able to get much longer reads than has been possible with the typical Illumina sequencer, which is most commonly used for sequencing projects. Instead of getting 10 kb of sequence per read, the international team was able to get 100- to 300-kb reads.

(In December 2017, Oxford Nanopore announced that Martin Smith, PhD, a researcher at the Kinghorn Centre for Clinical Genomics at the Garvan Institute, Darlinghurst, Australia, and colleagues were the first to sequence more than 1 million bases of continuous sequence using the MinION sequencer.)

Those ultralong reads mean scientists can see parts of the genome in ways they have not before. With short, 10-kb reads, accurately stitching together repetitive parts of the genome, such as the centromeres, telomeres, and satellite regions, is difficult, if not impossible. "You might get a basic idea of what is there from short-read assemblies, but you miss the overall structure," Jain explains.

Having 300-kb stretches of contiguous data mitigates the problem, revealing regions that have been missing from other assemblies.
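One way to picture why read length matters here: a read can place a repeat unambiguously only if it covers the whole repeat plus some unique sequence on each side. The sketch below encodes that rule of thumb. It is an illustration, not the method used in the paper; the function name, the 1-kb anchor requirement, and the 50-kb repeat in the example are all hypothetical.

```python
def spans_repeat(read_start, read_len, repeat_start, repeat_len, anchor=1_000):
    """Return True if a read covers a repeat plus `anchor` unique bases per side.

    Coordinates are 0-based positions on the genome. The anchor length is an
    assumed value chosen for illustration.
    """
    read_end = read_start + read_len
    repeat_end = repeat_start + repeat_len
    return (read_start <= repeat_start - anchor
            and read_end >= repeat_end + anchor)


if __name__ == "__main__":
    repeat_start, repeat_len = 100_000, 50_000  # hypothetical 50-kb repeat

    # A 10-kb read cannot reach both unique flanks of the repeat.
    print(spans_repeat(95_000, 10_000, repeat_start, repeat_len))

    # A 300-kb read starting well upstream covers the repeat and both flanks.
    print(spans_repeat(20_000, 300_000, repeat_start, repeat_len))
```

In this toy example the 10-kb read fails the check and the 300-kb read passes, which mirrors the contrast the researchers describe.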

"We talk about 'whole-genome sequencing,' but it's a misnomer because there are parts of the genome that have been unseen, and there have been parts that have, what I'd call, weak assembly," Topol told Medscape Medical News. "It's been as if we're stitching together parts; here it's without the jigsaw effect, so we get to really know what is there. It's a huge advantage."

Jain says the team was not specifically targeting the repetitive regions in the project; they just showed up because of the quality of the data. (He's careful to note that the long reads come about because researchers in Nick Loman's lab at the University of Birmingham in the United Kingdom, one of the five collaborating laboratories, came up with a gentler way of isolating DNA, which delivers longer pieces for sequencing.)

The collaborators have already started to build on their success with these repetitive regions. They report in a preprint published online that they have assembled the sequence of the Y chromosome centromere, something that has not been done before. It is still proof of concept, they say, noting they will need even longer reads to assemble other centromeres. Thus far, however, the sequence resembles what has been predicted from short-read sequences.

More Accuracy Needed

"I think it's important that we be measured about the successes and the deficiencies," Paten emphasizes. "One of the strong aspects of the nanopore genome assembly is that it is highly contiguous, in that you get long stretches of DNA that are put together without any break, without a region of uncertainty in terms of order or orientation. But if you look at the accuracy of individual bases — the As, Cs, Gs, and Ts — the base accuracy is substantially lower" than what an Illumina machine provides.

In the Nature Biotechnology paper, the authors describe several methods aimed at improving the accuracy, but the best they could get still results in an error rate in the range of 1 in 100 or 1 in 1000 bases. Moreover, there is some concern that the errors are not random. If that's the case, then no matter how many times a region is sequenced, the accuracy would not improve.
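The reasoning about nonrandom errors can be illustrated with a small simulation. The sketch below is a toy model, not the analysis from the paper. It converts the quoted error rates to Phred quality scores using the standard formula Q = -10 log10(p), a convention the article itself does not mention, and then compares a majority-vote consensus under two assumed error models: errors that fall independently in each read, and an extreme systematic case in which every read makes the same mistake at the same positions. All names and parameters are hypothetical.

```python
import math
import random
from collections import Counter

BASES = "ACGT"


def phred(error_rate):
    """Phred quality score for a per-base error probability."""
    return -10 * math.log10(error_rate)


def consensus_error_rate(truth, depth, error_rate, systematic, rng):
    """Fraction of positions where a majority vote over `depth` reads is wrong."""
    # Systematic model: pick the bad positions and the wrong base once,
    # so every read repeats exactly the same mistakes.
    bad = {}
    if systematic:
        for i, base in enumerate(truth):
            if rng.random() < error_rate:
                bad[i] = rng.choice([b for b in BASES if b != base])

    wrong = 0
    for i, base in enumerate(truth):
        votes = Counter()
        for _ in range(depth):
            if systematic:
                call = bad.get(i, base)
            elif rng.random() < error_rate:
                # Random model: each read errs independently.
                call = rng.choice([b for b in BASES if b != base])
            else:
                call = base
            votes[call] += 1
        if votes.most_common(1)[0][0] != base:
            wrong += 1
    return wrong / len(truth)


if __name__ == "__main__":
    print(f"1 in 100 errors is about Q{phred(0.01):.0f}")
    print(f"1 in 1000 errors is about Q{phred(0.001):.0f}")

    truth = "".join(random.Random(0).choice(BASES) for _ in range(2_000))
    for depth in (5, 25, 50):
        rand = consensus_error_rate(truth, depth, 0.05, False, random.Random(1))
        syst = consensus_error_rate(truth, depth, 0.05, True, random.Random(1))
        print(f"depth {depth}: random {rand:.4f}, systematic {syst:.4f}")
```

In this model the consensus error from random mistakes shrinks toward zero as depth grows, while the systematic error stays fixed no matter how many reads are added, which is the concern the authors raise.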

"So the lower accuracy and the lower accuracy in particular regions are two of the problem spots with the current nanopore setup. That said, things have improved a huge amount in the last few years, and I think we expect those wrinkles to go away in the not massively distant future," Paten said.

(To improve the accuracy of the Y chromosome centromere sequence, Jain says the team supplemented the long reads produced by the nanopore sequencer with the more accurate short reads from standard Illumina machines. Together, the data accurately reveal both the nucleotide sequence and the overall structure of the centromere, which has not been seen before.)

Taking It to the Clinic

Jain notes that because the nanopore sequencer analyzes native DNA, epigenetic marks such as methylcytosine can be detected during regular sequencing runs. As the team shows in the recent paper, knowing which regions are methylated provides information about gene expression.

Methylation data, along with single-nucleotide polymorphisms, can also be used to determine whether a stretch of DNA is maternally or paternally inherited, which can help in assessing a disease state. "If you can track which allele mattered here and where it came from, suddenly it allows you to understand what may have gone wrong," Jain says.

To make the most of sequence data, however, one needs to have access to it, and this new technology, with its diminutive size and relatively low cost, may reduce barriers. The $1000 genome has been a marker for years, but even at that price only some people will have their genome sequenced, according to Jain. At $100 per genome, pretty much anyone who is interested could have their genome sequenced.

We are not at that point yet, Jain continued. But thinking about those infants Topol talked about, maybe it is time for them. "Infants are the primary situation where the healthcare system should start thinking about it. Not only are they the future of the country and the world, but it is where we can take the medical knowledge that we have, combine it with the cutting edge informatics and data we can generate, and learn to give them the best possible chance to live a healthy life."

Several coauthors report being members of the MinION access program and have received free-of-charge flow cells and kits for nanopore sequencing for this and other studies, as well as travel and accommodation expenses to speak at Oxford Nanopore Technologies conferences. One or more coauthors have received one or more of the following: honoraria, travel and accommodation expenses to speak at an Oxford Nanopore company meeting or conferences, and research funding. Topol serves as a director, officer, partner, employee, advisor, or consultant for Apple, Cypher Genomics, Dexcom, Edico Genome, GenapSys, Gilead Sciences, Google, Illumina, Molecular Stethoscope, MyoKardia, Quanttus, Quest Diagnostics, ToSense, and Walgreen Company. He has received research grants from the National Institutes of Health and Qualcomm Foundation.

Nat Biotechnol. Published online January 29, 2018.
