Next-generation Sequencing and its Applications in Molecular Diagnostics

Zhenqiang Su; Baitang Ning; Hong Fang; Huixiao Hong; Roger Perkins; Weida Tong; Leming Shi


Expert Rev Mol Diagn. 2011;11(3):333-343. 

Next-generation Sequencing Technologies

Roche/454 Pyrosequencing

The Roche/454 GS FLX System is based on emulsion PCR[8] and pyrophosphate detection[9] techniques. To construct a library of DNA templates, sheared DNA fragments are ligated to specific oligonucleotide adapters, which results in the binding of each DNA fragment to a fragment-carrying bead. The beads are then captured in separate emulsion droplets that function as amplification reactors for emulsion PCR, a highly efficient DNA amplification method that produces the approximately 10 million clonal copies of the DNA template needed for sufficient light signal intensity. Upon completion of the emulsion PCR amplification, the emulsion is disrupted and the beads containing clonally amplified template DNAs are enriched. The beads are then separated by limiting dilution and deposited into individual picotiter-plate wells. The picotiter plates serve as sequencing reactors that allow individual enzymatic sequencing reactions to occur without interference from adjacent wells. Visible light emitted from the subsequent pyrosequencing reactions is detected by a charge-coupled device (CCD) that is bonded to a fiber-optic bundle. During each cycle of a pyrosequencing reaction, a single species of unlabeled nucleotide is supplied to all beads on the plate, so that the complementary strand of DNA is synthesized sequentially. With the incorporation of each base in the growing chain, an inorganic pyrophosphate group is released and converted to ATP, which luciferase uses to convert luciferin to oxyluciferin, producing a light pulse. Detecting the light emissions together with the known nucleotide identity in each step allows the incorporated base to be determined. Through a series of such pyrosequencing reaction cycles, the sequences of the DNA templates carried by individual beads are determined.
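
The cyclic nucleotide flows described above can be illustrated with a minimal sketch. It assumes an ideal, noise-free signal that equals the number of bases incorporated in each flow. The function name, the "TACG" flow order and the example sequence are illustrative assumptions, not details taken from the platform documentation.

```python
def simulate_flowgram(growing_strand, flow_order="TACG", n_flows=12):
    """Return the ideal light signal for each nucleotide flow.

    `growing_strand` is the sequence being synthesized on the bead.
    Each flow supplies one nucleotide species; the ideal signal is the
    number of consecutive bases of that species incorporated (0 if none).
    The flow order is an assumption made for illustration.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        count = 0
        # Without a terminating moiety, every matching base is added at once.
        while pos < len(growing_strand) and growing_strand[pos] == base:
            count += 1
            pos += 1
        signals.append((base, count))
    return signals


flows = simulate_flowgram("TTAGC", n_flows=8)
for base, signal in flows:
    print(base, signal)
```

In this toy run the first T flow yields a signal of 2 because both T bases are incorporated in the same flow, and flows whose nucleotide does not match the next template position yield 0.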

In a given pyrosequencing reaction cycle, multiple consecutive incorporations may occur owing to the lack of a terminating moiety. Thus, the length of homopolymers (i.e., repeats of the same base, such as AAAA) in sequence reads must be inferred from the light signal intensity, with a higher intensity corresponding to more repeats of the same base. The error rate of calling consecutive repeats increases when the homopolymer is longer than three-to-four repeating bases. Consequently, the major error type for the Roche/454 system is insertions and deletions (indels), rather than substitutions.[10]
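
A minimal sketch of how homopolymer length might be inferred from signal intensity is given below. It assumes that intensity scales linearly with the number of incorporated bases and that each flow is called by rounding to the nearest integer. Actual base callers are more sophisticated; the function name, flow order and intensity values are hypothetical.

```python
def call_bases(intensities, flow_order="TACG"):
    """Call a read from per-flow intensities by rounding each value.

    Assumes one unit of intensity per incorporated base (a simplification).
    """
    read = []
    for i, intensity in enumerate(intensities):
        base = flow_order[i % len(flow_order)]
        # Round half up; negative noise is clamped to zero incorporations.
        n = max(0, int(intensity + 0.5))
        read.append(base * n)
    return "".join(read)


# Short homopolymers tolerate small noise.
print(call_bases([2.04, 0.98, 0.05, 1.10]))
# A true run of four T bases measured as 4.6 is called as five (an insertion).
print(call_bases([4.6]))
```

Because the same absolute noise is a smaller fraction of the step between adjacent lengths for short runs than for long ones, miscalls under this model appear as insertions or deletions in longer homopolymers rather than as substitutions.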

Compared with other NGS platforms, the strength of the Roche/454 system is its longer sequence reads. The Roche/454 GS FLX, with its newest chemistry GS FLX Titanium series reagents, can generate more than 1 million individual sequence reads with read lengths over 400 bases during a 10-h timespan.[201] Although its per-base cost is much higher than that of other NGS platforms (e.g., Life Technologies/SOLiD and Illumina/GA IIx), the Roche/454 system is best suited for certain applications, such as de novo sequencing of new genomes, for which long read length is critical for de novo genome assembly.

Illumina Sequencing Technology

The Illumina GA system is the first short-read sequencing platform and currently dominates the NGS market;[1] it uses an array technique to achieve cloning-free DNA amplification. Reversible terminator chemistry is the defining characteristic that provides massively parallel sequencing of millions of DNA fragments at a low cost. DNA samples are randomly sheared into fragments that are then end-repaired to generate 5'-phosphorylated blunt ends. The Klenow fragment of DNA polymerase is then used to attach a single 'A' base to the 3'-end of the DNA fragments, which prepares the DNA fragments for ligation to oligonucleotide adapters. After ligation to adapters at both ends, the DNA fragments are denatured and single-stranded DNA fragments are attached to reaction chambers that are located on an optically transparent solid surface called a flow cell. The attached DNA fragments are extended and amplified by bridge PCR amplification in order to obtain sufficient light signal intensity for reliable detection. The bridge PCR amplification can create an ultra-high-density sequencing array on the flow cell, containing hundreds of millions of clusters with each cluster containing approximately 1000 copies of the same DNA template. These templates are finally sequenced through the sequencing-by-synthesis technique that applies reversible terminators with removable fluorescent dyes.

For sequencing and DNA synthesis, reaction mixtures comprising primers, DNA polymerase and four reversible terminator nucleotides, each labeled with a different fluorescent dye, are supplied to the flow cell. In each sequencing cycle, a specific terminator is incorporated according to sequence complementarity in each template DNA strand in a clonal cluster. After incorporation, the identity (base calling) and the position of the incorporated terminator on the flow cell are determined from the fluorescent dye emission, and the signal is recorded using a CCD camera. In the following cycle, the reversible terminator is unblocked and the fluorescent dye label is removed from the base so that a new nucleotide can be incorporated and a new base can be detected using the same strategy. This repetitive sequencing-by-synthesis process takes approximately 2.5 days to generate 50 million reads per flow cell, with a read length of 36 bases. The overall sequencing output of the Illumina GA system is more than 1 gigabase (Gb; 1 billion bases) per analytical run. The throughput is dramatically increased with new models, such as the GA IIx and HiSeq 2000.[202]
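
The output figure can be checked against the read count and read length quoted above with a short calculation. This is only a back-of-the-envelope sketch; the function name is hypothetical, and it assumes that every read reaches the full 36 bases.

```python
def run_yield_gb(n_reads, read_length):
    """Total sequenced bases for one run, in gigabases (1 Gb = 1e9 bases)."""
    return n_reads * read_length / 1e9


yield_gb = run_yield_gb(50_000_000, 36)   # reads per flow cell x bases per read
per_day = yield_gb / 2.5                  # run time of approximately 2.5 days
print(yield_gb, per_day)
```

Under these assumptions, 50 million reads of 36 bases give 1.8 Gb per flow cell, or about 0.72 Gb per day, which is consistent with the stated output of more than 1 Gb per run.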

In a given cycle of sequencing, any modified nucleotide could be incorporated with decreased or increased efficiency, resulting in under- or over-incorporation, a heterogeneous mixture of synthesis lengths and a concomitant degradation of signal purity and precision. Moreover, 'dark' bases (those without a fluorophore) can also result in leading or lagging dephasing. In addition, chemical cleavage of the terminating moieties and fluorescent dye labels may be incomplete. Therefore, Illumina's sequencing strategy generates much shorter reads and its most common error type is substitutions.[10] The base-call error rate increases with read length owing to 'dephasing noise'.[11] In addition, an over-representation of GC-rich regions and an under-representation of AT-rich regions have been observed.[11]
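
Why dephasing limits read length can be illustrated with a deliberately simple toy model. It assumes that, in every cycle, each strand in a cluster independently falls behind or runs ahead with a fixed probability and never returns to phase. The function name and the probabilities are illustrative assumptions, not measured values for any instrument.

```python
def in_phase_fraction(cycle, p_lag=0.005, p_lead=0.005):
    """Fraction of strands in a cluster still in phase after `cycle` cycles.

    Toy model: constant per-cycle lag/lead probabilities, no recovery.
    """
    return (1.0 - p_lag - p_lead) ** cycle


for cycle in (1, 10, 36):
    print(cycle, round(in_phase_fraction(cycle), 3))
```

With a combined 1% loss per cycle, only about 70% of the strands in a cluster remain in phase by cycle 36 in this model, so the signal for the correct base is progressively diluted by signal from neighboring positions.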

Life Technologies/SOLiD

The SOLiD system relies on the techniques described by Shendure et al.[12] and McKernan et al.[101] Library construction for the SOLiD system is similar to that for Roche/454 technology: DNA is stochastically sheared into fragments that are subsequently ligated to oligonucleotide adapters, attached to beads and clonally amplified by emulsion PCR. After denaturing the templates, template-carrying beads are enriched, and the templates on the selected beads are 3'-modified to allow covalent attachment to the slide. The 3'-modified beads are then deposited onto a derivatized-glass flow cell surface to generate a dense, disordered array. Sequencing reactions are started by hybridizing a primer oligonucleotide complementary to the adapter at the adapter–template junction. Unlike the Roche/454 sequencing approach, sequencing-by-synthesis in the SOLiD system is based on ligation chemistry. Briefly, a mixture of partially degenerate oligonucleotide octamers is competitively hybridized to the DNA fragments as probes, and a universal primer is oriented to provide a 5'-phosphate group for ligation. The specificity of the probe ligated to the primer is determined by the fourth and fifth bases of the probe, which are complementary to the template. The identities of these two bases are indicated by one of four fluorescent labels at the end of the octamer, so that the fourth and fifth bases are interrogated (base calling). After ligation, the ligated octamer is cleaved after the fifth base and the fluorescent label is removed, so that the next hybridization and ligation cycle can proceed. In this way, the fourth and fifth bases in the template are determined in the first cycle, the ninth and tenth bases in the second cycle, and so on. The ligation sequencing can then be carried out in the same way with another primer offset by one base in the adapter, so that bases three and four, eight and nine, and so on, in the template are determined. Over five such rounds, each base is interrogated twice, in two separate ligation reactions, resulting in a significantly reduced base-call error rate.
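
The interrogation schedule can be sketched in a few lines of code. The sketch assumes that five primers are used, each offset by one further base into the adapter (offsets 0 to 4), and that every ligation cycle advances by five bases. The function names and the 25-base read length are hypothetical choices for illustration.

```python
def interrogated_positions(offset, read_length, step=5, first_pair=(4, 5)):
    """Template positions (1-based) read in one ligation round.

    `offset` is how many bases the primer is shifted back into the adapter
    (0 for the first round). Positions below 1 lie in the adapter and are
    skipped.
    """
    positions = []
    shift = 0
    while first_pair[0] - offset + shift <= read_length:
        for p in first_pair:
            pos = p - offset + shift
            if 1 <= pos <= read_length:
                positions.append(pos)
        shift += step
    return positions


def coverage(read_length, n_rounds=5):
    """Count how many times each template position is interrogated."""
    counts = {pos: 0 for pos in range(1, read_length + 1)}
    for offset in range(n_rounds):
        for pos in interrogated_positions(offset, read_length):
            counts[pos] += 1
    return counts


print(interrogated_positions(0, 10))   # first round
print(interrogated_positions(1, 10))   # primer offset by one base
print(set(coverage(25).values()))      # interrogations per position
```

The first two calls reproduce the positions given in the text (4, 5, 9, 10 and 3, 4, 8, 9), and under the stated assumptions every position of a 25-base read is interrogated exactly twice across the five rounds.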

By using ligation-based sequencing-by-synthesis, the SOLiD system mitigates homopolymeric sequencing error. The built-in two-base encoding system can also correct most read errors in which the two-base transition can be identified. The dominant error type is substitutions. The raw error rate is high, ranging from approximately 2% at the 5'-end to approximately 8% at the 3'-end.[13] However, according to one study,[14] an accuracy of 99.99% can be achieved by the Roche/454, Illumina GA and SOLiD platforms under saturated coverage.
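
The error-checking property of two-base encoding can be illustrated with a small sketch. It assumes the commonly described color-space scheme in which each of four colors represents four of the 16 possible dinucleotides, implemented here as an XOR of two-bit base codes. The numeric color labels, function names and example sequences are illustrative assumptions.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Encode each overlapping pair of adjacent bases as one of four colors."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


reference = encode_colors("ATGGC")   # [3, 1, 0, 3]
variant = encode_colors("ATCGC")     # true G-to-C substitution at position 3
print(reference, variant)

# A single miscalled color changes every downstream base when decoded.
print(decode_colors("A", [3, 2, 0, 3]))
```

In this scheme a true single-base substitution changes two adjacent colors, whereas an isolated color change is not consistent with any single-base variant, so it can be flagged as a likely measurement error when compared with a reference.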

In addition, Life Technologies has recently acquired Ion Torrent, which has developed a non-light-based sequencing technology. Ion Torrent sequencing is based on a natural biochemical process in which a hydrogen ion is released when a nucleotide is incorporated into a DNA strand. By monitoring the pH of the solution, the incorporated bases can be determined. As no proprietary chemistries, fluorescence, chemiluminescence or optics are required, Ion Torrent sequencing technology offers a simpler, faster, more cost-effective and more scalable system than other commercialized platforms.

Helicos HeliScope Genetic Analysis System

The HeliScope Genetic Analysis System is the first commercialized single-molecule DNA sequencer. It is based on true single molecule sequencing technology, which stems from the work of Braslavsky et al.[15] and relies on the cyclic interrogation of a dense array of sequencing features. By directly sequencing single molecules of DNA or RNA without the clonal amplification required by other NGS systems, Helicos' true single molecule sequencing technology significantly increases the speed and decreases the cost of sequencing.

In the HeliScope system, a DNA library is constructed by random fragmentation of a DNA sample and 3'-end polyadenylation of the DNA fragments with adenosine terminal transferase. Denatured poly-A fragments are captured on a flow-cell surface by hybridization to surface-tethered poly-T oligomers to yield a disordered array of primed single-molecule sequencing templates. In each cycle of sequencing, DNA polymerase and one of four fluorescently labeled nucleotides are supplied to the flow cell. The template-dependent incorporation of a single dye-labeled nucleotide is imaged with a CCD camera to make a base call. After dye-label cleavage and washing, the next cycle of nucleotide extension and imaging begins. Each sequencing cycle consists of the successive addition of polymerase and a different type of dye-labeled nucleotide. The HeliScope instrument is currently capable of imaging billions of single molecules per run and producing 21–35 Gb of sequence data per day.

Similar to the Roche/454 platform, the HeliScope system is asynchronous, meaning that some DNA strands will fall behind or ahead of others in a sequence-dependent manner, and some DNA templates simply fail to incorporate by chance on a given cycle; therefore, base substitution error is likely to occur. However, the substitution error rate is the lowest (0.01–1% with one pass and 0.001% with two passes) for Helicos technology. On the other hand, homopolymers are problematic. Helicos has since developed a Virtual Terminator™ technology to correct the homopolymer errors, increasing sequencing accuracy.[16] In general, as a result of incorporation of unlabeled bases, deletion is the dominant error type in the HeliScope system. The deletion and insertion error rate is 3–7% with one pass and 2–5% with two passes.[17]

There are important differences among the aforementioned NGS technologies in terms of costs, advantages, limitations and practical aspects of use for specific applications. For example, the Illumina and Life Technologies platforms are particularly well suited to variant discovery by resequencing the human genome,[14] where a reference genome is available. The Roche/454 sequencer may be preferable for de novo sequencing owing to its longer read length. The Helicos platform is well suited to RNA sequencing (RNA-Seq) that relies on tag counting or direct RNA-Seq.[18] Table 1 provides a summary of the main features of the four NGS technologies already discussed. In addition, new DNA sequencing technologies with higher throughput or longer read lengths are being developed at other companies and research institutions worldwide, such as Pacific Biosciences (CA, USA),[203] VisiGen Biotechnologies (TX, USA),[204] Sequenom (CA, USA),[205] Complete Genomics (CA, USA),[206] Oxford Nanopore Technologies (Oxford, UK),[207] and the Center for Computational Genetics at Harvard Medical School (MA, USA).[208] However, their advantages in clinical applications are yet to be demonstrated and validated.

