Next-generation Sequencing and its Applications in Molecular Diagnostics

Zhenqiang Su; Baitang Ning; Hong Fang; Huixiao Hong; Roger Perkins; Weida Tong; Leming Shi


Expert Rev Mol Diagn. 2011;11(3):333-343. 


Analyses of NGS Data

Overview of NGS Data Analyses

NGS technologies promise to refine and advance scientific approaches across many fields, including molecular diagnostics. However, the dramatic increase in sequencing throughput has come at the cost of much lower read accuracy and shorter read length compared with traditional Sanger sequencing. Although these shortcomings can be partially mitigated by increasing sequence coverage and by bioinformatics means, realizing many of the promised diagnostic applications depends on progress in handling massive datasets and in developing tools to check and assure sequence quality, perform sequence alignment and assembly, and biologically interpret and draw inferences from the data. NGS experiments generate immense volumes of short-read sequence data (Table 1).[19] Data acquisition alone is problematic at such volumes, requiring an infrastructure with high-bandwidth pipelines between computationally intensive processes.
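Sequence coverage, mentioned above as a way to offset lower per-read accuracy, can be estimated with simple arithmetic. The following Python sketch assumes the classic Lander-Waterman (Poisson) idealization, in which reads land uniformly at random on the genome; real coverage is uneven, so these figures are optimistic. The function names and the example numbers are hypothetical and are not taken from Table 1.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected sequencing depth C = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases covered by at least one read, 1 - exp(-C).

    Assumes reads are placed uniformly at random (Poisson model).
    """
    return 1.0 - math.exp(-coverage)


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target mean depth."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical run: 3 Gb genome, 100-bp reads, 900 million reads.
    depth = mean_coverage(900_000_000, 100, 3_000_000_000)
    print(f"mean depth: {depth:.1f}x")  # mean depth: 30.0x
    print(f"fraction covered at 1x: {fraction_covered(1.0):.3f}")  # 0.632
```

At 1x mean depth roughly a third of the genome is expected to receive no reads at all, which is one reason deeper coverage is used to compensate for short, error-prone reads.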

Translating such volumes of short reads into diagnostic biomarkers can be described as a three-stage analysis, as depicted in Figure 2. In the first stage, images from NGS sequencers are analyzed and converted into sequence reads by the manufacturer's base-calling system. In the second stage, the reads are quality filtered and aligned. Depending on the intended application, as well as on cost, labor intensity and time requirements, alignment can be performed by de novo assembly or by mapping to a reference sequence, which can be a complete genome, a subset of a genome (e.g., expressed genes or individual chromosomes of interest), a transcriptome or an exome. In the third and final stage, mapped and unmapped reads can be used to understand the genetic mechanisms underlying many human diseases through gene-expression profiling; discovery of single-nucleotide variants, novel transcripts or splice variants; and detection of gene fusions, transcription factor binding sites, methylation status and histone modifications.
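To make the de novo assembly option concrete, the sketch below merges error-free reads by repeatedly joining the pair with the longest suffix-prefix overlap. This greedy approach is a swapped-in teaching technique, not the method of any tool discussed here; many short-read assemblers instead use de Bruijn graphs, and greedy merging can misassemble repeats. The names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        # Look for b's first min_len bases inside a; scanning left to right
        # means the first hit that works gives the longest overlap.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembly; returns the resulting contig(s)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads contained entirely within another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ACGTTG", "TTGCAA", "CAAGTC"]))  # ['ACGTTGCAAGTC']
```

Mapping to a reference avoids this all-against-all overlap search, which is one reason it is usually cheaper than de novo assembly when a suitable reference exists.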

Figure 2.

A typical workflow for analyses of next-generation sequencing short reads. (A) Conversion of images to sequence reads; (B) alignment of sequence reads (map to a reference or de novo assembly); and (C) experiment-specific downstream analyses that depend on diagnostic applications.
MAQ: Mapping and assembly with quality.

Quality Control of NGS Data

The rapid expansion of NGS applications to biological, biomedical and clinical problems makes NGS quality control, covering data quality, reliability, reproducibility and biological relevance, increasingly important because raw sequence data have an inherently higher error rate. Establishing an early consensus on standardized benchmarks for sequencing quality metrics[20] is preferable, to avoid future dilemmas when comparing data from different NGS platforms, as occurred for microarray platforms in the past few years.[21,22] The third phase of the MicroArray Quality Control (MAQC) project,[22,23] also called the Sequencing Quality Control project, is one such effort: it aims to assess the technical performance of NGS platforms by generating benchmark datasets with reference samples and to evaluate the advantages and limitations of various bioinformatics strategies in RNA and DNA analyses.[209]
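Per-base quality in NGS reads is commonly reported as a Phred score, Q = -10 log10(p), where p is the estimated probability that the base call is wrong. The Python sketch below assumes FASTQ quality strings in the Phred+33 encoding (older Illumina pipelines used an offset of 64) and filters reads by their expected number of errors, which is only one of several possible filtering rules. The threshold and function names are illustrative assumptions, not benchmarks from any quality control project.

```python
def decode_qualities(qual: str, offset: int = 33) -> list[int]:
    """Convert an ASCII-encoded FASTQ quality string to Phred scores."""
    return [ord(ch) - offset for ch in qual]


def phred_to_error_prob(q: int) -> float:
    """Invert Q = -10 * log10(p) to get the base-call error probability p."""
    return 10 ** (-q / 10)


def expected_errors(qual: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities: expected number of wrong calls in the read."""
    return sum(phred_to_error_prob(q) for q in decode_qualities(qual, offset))


def filter_reads(records, max_expected_errors: float = 1.0, offset: int = 33):
    """Keep (sequence, quality) pairs whose expected error count is within the threshold."""
    return [
        (seq, qual)
        for seq, qual in records
        if expected_errors(qual, offset) <= max_expected_errors
    ]


if __name__ == "__main__":
    # 'I' encodes Q40 (p = 0.0001); '+' encodes Q10 (p = 0.1) in Phred+33.
    reads = [("ACGT", "IIII"), ("ACGT", "++++")]
    print(filter_reads(reads, max_expected_errors=0.1))  # [('ACGT', 'IIII')]
```

Summing error probabilities, rather than averaging Q values, keeps a single very poor base from being hidden by many good ones.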

Bioinformatics Tools for NGS Data Analyses

A number of bioinformatics tools are currently available for analyzing NGS data (Table 2). They can be grouped into four general categories: base calling; alignment of reads to a reference; de novo assembly; and genome browsing, annotation and variant detection. However, current analytical tools have limitations, and many challenges and questions remain because each NGS platform generates different types of data, which often leads to different analysis needs. Efficient data analysis pipelines are still needed for many applications, and the advantages and limitations of existing tools need to be objectively evaluated.

Proper alignment is essential to render NGS data biologically meaningful, and NGS alignment is quite different from Sanger sequence alignment. A Sanger read already derives from a cluster of template molecules, so its readout is already a consensus call, whereas NGS builds a consensus call by aligning many individual short reads. In addition, the short read length, relatively high base-calling error rate and huge volume of NGS data make NGS alignment much more difficult than alignment of Sanger sequencing data.[24] One limitation of aligning and assembling NGS short reads is that a large portion of them cannot be uniquely aligned to a reference when the reads are too short and the reference is too complex.[19] The chance of unique alignment or assembly is reduced not only by repeat sequences in complex genomes, but also by shared homology within closely related gene families and pseudogenes.[19]
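The loss of unique alignments to repeats can be illustrated with a toy reference containing two copies of the same segment. This Python sketch assumes exact, forward-strand matching only (real aligners also search the reverse complement and tolerate mismatches), and the reference, reads and function names are hypothetical.

```python
def find_all(reference: str, read: str) -> list[int]:
    """All (possibly overlapping) start positions of read in reference."""
    hits, start = [], reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def classify(reference: str, reads: list[str]) -> dict[str, tuple[str, list[int]]]:
    """Label each read as 'unique', 'multi' (ambiguous) or 'unmapped'."""
    result = {}
    for read in reads:
        hits = find_all(reference, read)
        if not hits:
            status = "unmapped"
        elif len(hits) == 1:
            status = "unique"
        else:
            status = "multi"
        result[read] = (status, hits)
    return result


if __name__ == "__main__":
    # Hypothetical reference: the segment GATTACA occurs twice (a repeat).
    ref = "AAAC" + "GATTACA" + "TTT" + "GATTACA" + "GGG"
    for read, (status, hits) in classify(ref, ["GATTACA", "CGATTACAT", "CCCC"]).items():
        print(read, status, hits)
    # GATTACA multi [4, 14]
    # CGATTACAT unique [3]
    # CCCC unmapped []
```

The 7-base read lies entirely inside the repeat and cannot be placed, while the 9-base read that extends into unique flanking sequence maps to a single position, mirroring why shorter reads and more repetitive references yield fewer unique alignments.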

Conventional alignment tools such as the basic local alignment search tool (BLAST)[25,210] and the BLAST-like alignment tool (BLAT)[26,211] are efficient for aligning long reads, such as those generated by Sanger sequencing, but are inadequate for handling NGS short reads. To date, a variety of algorithms and software packages have been developed specifically for handling millions of NGS short reads (Table 2). For example, MAQ[27] and Bowtie[28] are widely used alignment tools; TopHat[29] aligns RNA-Seq reads and can either identify splice junctions or align reads to a known set of junctions; and Cufflinks[30] and Scripture[31] are useful for assembling transcriptomes and detecting differentially expressed genes.
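Bowtie indexes the reference using the Burrows-Wheeler transform (BWT) and an FM index, which let an aligner count occurrences of a short read in a large reference by stepping backward through the read one base at a time. The Python sketch below is a minimal, uncompressed illustration of that backward search under stated assumptions: the text ends with a single `$` sentinel, matching is exact, and the BWT is built naively by sorting rotations. It is not Bowtie's implementation; production aligners add compressed occurrence tables, position lookup and mismatch handling. All function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; toy sizes only)."""
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single '$' sentinel")
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str: str):
    """Return (C, occ).

    C[ch]: number of characters in the text that sort before ch.
    occ[ch][i]: occurrences of ch in bwt_str[:i].
    """
    alphabet = sorted(set(bwt_str))  # '$' sorts before A, C, G, T
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += bwt_str.count(ch)
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in alphabet}
    for i, b in enumerate(bwt_str):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return C, occ


def count_occurrences(pattern: str, C, occ, n: int) -> int:
    """Backward search: shrink the range [lo, hi) of sorted rotations that start with pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0  # base absent from the reference
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "ACAACG$"
    C, occ = build_fm_index(bwt(reference))
    print(bwt(reference))  # GC$AAAC
    print(count_occurrences("AC", C, occ, len(reference)))  # 2
```

Each step of the search costs a constant number of table lookups regardless of reference size, which is the property that makes BWT-based indexes practical for millions of short reads, in contrast to the seed-and-extend searches that BLAST-style tools perform per query.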

