Next-generation Sequencing and its Applications in Molecular Diagnostics

Zhenqiang Su; Baitang Ning; Hong Fang; Huixiao Hong; Roger Perkins; Weida Tong; Leming Shi


Expert Rev Mol Diagn. 2011;11(3):333-343. 

Applications of NGS Technologies

Next-generation sequencing technologies have already been widely investigated and are increasingly being applied to many research areas, including de novo sequencing of bacterial and viral genomes;[17,32] searching for genetic variants by resequencing whole genomes or targeted genome regions;[33] understanding the genetic mechanisms underlying human gene expression variation[34] and characterizing the transcriptomes of cells, tissues and organisms by RNA-Seq;[35] and genome-wide profiling of DNA-binding proteins and epigenetic marks by ChIP-Seq.[36] These applications have been well reviewed elsewhere.[1,37,38] Here, we focus on recent applications of NGS technologies for the understanding and classification of human diseases.

Sequencing of Cancer Genomes

Cancer is a collection of genetic diseases, but epigenetic and non-cell-intrinsic factors, such as immune responses, angiogenesis and stromal cell interactions, also play a role in cancer development.[39] The majority of human cancers are sporadic tumors that arise through a multistage process in which somatic genetic and epigenetic alterations lead to changes in gene sequence, structure, copy number and expression, and in which clonal selection of the variant progeny with the most robust and aggressive growth properties plays a major role.[40] Therefore, identification of all genetic events that lead to or potentially lead to cancer development will revise our understanding and classification of human tumors and accelerate the discovery of new approaches for clinical diagnosis, outcome prediction and risk stratification. Eventually, individualized therapy can be determined based on the specific somatic mutational landscape of a patient's cancer genome.[41,42] Currently, for molecular diagnostic application, NGS can be adapted to obtain specific cancer-related gene-sequence information, which is the key for clinical decision-making.

Searching for cancer-related genes is also a laborious task. Early efforts to identify cancer genes and cancer-related mutation patterns in cancer genomes were restrained by the lack of a sequence-based map of normal human genomes and by the limited resolution of the technologies available to detect genomic alterations. This changed in 2003, when the Human Genome Project was completed and the sequence-based map of the normal human genome was published.[43] The recent advances in next-generation DNA sequencing technologies have made it possible to sequence a large number of entire cancer genomes and have allowed researchers to systematically characterize human cancer genomes at the genomic, transcriptomic and epigenetic levels. Although the cost of sequencing a cancer genome is still high, the past few years have seen an increasing number of efforts[44,45] to use NGS technologies to comprehensively explore the somatic mutations that underlie various human cancers. For example, Pleasance et al. detected 33,345 somatic base substitutions, 680 small deletions, 303 small insertions, 51 somatic rearrangements, and all well-supported somatic copy number alterations and regions of loss of heterozygosity in a single experiment by sequencing the genomes of a malignant melanoma and a lymphoblastoid cell line obtained from the same patient.[44] Lee et al. found more than 50,000 high-confidence single-nucleotide variants by comparatively sequencing a tumor genome from a primary lung tumor and a normal genome from adjacent normal tissue.[44] Marra et al. found 32 somatic nonsynonymous coding mutations by sequencing the genome and transcriptome of an estrogen-receptor-α-positive metastatic lobular breast cancer.[46] These pioneering studies have demonstrated the power of NGS technologies in tracing the processes of DNA damage, repair, mutation and selection that underlie the development of all human cancers. This type of research will revise our understanding of cancer causation and development, and eventually provide the foundation for cancer diagnosis, prognosis, prevention and treatment selection.
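The tumor-versus-normal comparisons described above rest on a simple idea: a variant seen in the tumor but absent from the same patient's normal genome is a candidate somatic mutation. The following is a minimal sketch of that subtraction step only, assuming variant calls are already available as (chromosome, position, reference, alternate) records. The `Variant` type, the `candidate_somatic` function and the toy calls are hypothetical illustrations and are not taken from the cited studies, which also applied coverage, quality and validation filters.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """One called variant; a hypothetical record type for illustration."""
    chrom: str
    pos: int   # 1-based position
    ref: str   # reference allele
    alt: str   # alternate allele


def candidate_somatic(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Return variants called in the tumor but not in the matched normal."""
    return tumor - normal


# Toy calls invented for illustration only (not real data).
normal_calls = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 250, "C", "T"),
}
# The tumor carries the same germline variants plus one extra call.
tumor_calls = normal_calls | {Variant("chr7", 500, "T", "A")}

somatic = candidate_somatic(tumor_calls, normal_calls)
print(sorted(somatic))
```

In practice, a variant can be missing from the normal set simply because that position was poorly covered, so real pipelines check read depth in the normal sample before labeling a call as somatic.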

Sequencing of Whole Exomes

The protein coding regions of all genes (i.e., the 'whole exome') constitute only approximately 1–2% of the entire human genome but contain 85% of all DNA mutations that have large effects on human disease.[47] There may be many more mutations in regulatory regions, but they have not been studied as much as those in the coding regions. Therefore, combining NGS technologies with DNA fragment-capture approaches for selectively sequencing the complete protein coding regions not only reduces sequencing cost, but is also an efficient way to discover most mutations that underlie rare and common human diseases. This could be a scenario where we may actually see a true molecular diagnostic assay. Through this approach, we could build an understanding of the sensitivity and specificity for mutation detection. For example, in a recent study, the sequencing of a whole exome led to the discovery of a new recessive mutation in WDR62 in severe brain malformations.[48] In another study, Berger et al. used RNA-Seq as a systematic means of identifying cancer-associated transcripts and other genetic alterations that are expressed in melanoma.[49] In total, 11 novel melanoma gene fusions and 12 novel readthrough transcripts were discovered. The clinical diagnostic utility of exome sequencing has been illustrated by another study[47] in which a homozygous missense D652N mutation in the congenital chloride diarrhea locus (SLC26A3) was identified in a patient referred with a suspected diagnosis of Bartter syndrome (a renal salt-wasting disease), pointing instead to a diagnosis of congenital chloride diarrhea.

Sequencing of Plasma/Serum DNA

Cell-free DNA and RNA molecules circulating in human blood are very important early molecular diagnostic markers for particular diseases, such as diabetes, cancer, myocardial infarction and stroke.[50,51] Identification of tumor-derived and fetal-derived circulating nucleic acids has long been investigated for clinical oncology and prenatal diagnosis.[52] Traditional cloning and DNA-sequencing techniques were also investigated for the characterization of circulating nucleic acids. Nonetheless, such methods are laborious and can only detect a small portion of sequences from a few genomic loci. The advent of NGS technologies has provided an alternative approach for the detection, measurement and cataloging of circulating nucleic acids. For example, two recently reported studies,[53,54] using the Illumina GA platform to randomly sequence DNA molecules obtained from the plasma of pregnant women, found that the amount of chromosome 21 DNA sequences in maternal plasma increases when the fetus is affected by a chromosomal aneuploidy (e.g., trisomy 21). This method thus forms a foundation for the development of a noninvasive prenatal diagnosis for fetal chromosomal aneuploidies. However, many caveats and challenges have been described for the use of NGS for noninvasive prenatal diagnostics.[55] Another study used Roche/454 sequencing of selected genomic loci obtained from clinical tissues and sera to detect cytosine-methylation patterns present in both breast cancer patients and cancer-free subjects. It revealed that both tumor and cancer-free tissues and sera contain DNA molecules with conceivable cytosine-methylation patterns, thus highlighting the challenges for developing highly specific DNA methylation-based cancer diagnostic markers.[56]
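The chromosome 21 over-representation described above is commonly assessed by comparing the fraction of reads mapping to chromosome 21 in a test sample with the same fraction in a set of reference pregnancies. The sketch below shows one way to express that comparison as a z-score. It is an illustration under stated assumptions rather than the exact procedure of the cited studies: the function names, the toy reference fractions and the cutoff of 3 are all assumptions introduced here.

```python
from statistics import mean, stdev
from typing import Sequence


def chr21_fraction(chr21_reads: int, total_reads: int) -> float:
    """Fraction of uniquely mapped reads that fall on chromosome 21."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return chr21_reads / total_reads


def z_score(test_fraction: float, reference_fractions: Sequence[float]) -> float:
    """Standard deviations by which the test sample departs from the reference mean."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)  # sample standard deviation
    return (test_fraction - mu) / sigma


# Toy values invented for illustration only (not measured data).
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0130]
test = chr21_fraction(chr21_reads=13_700, total_reads=1_000_000)

Z_CUTOFF = 3.0  # assumed threshold, chosen here only for the example
z = z_score(test, reference)
print(f"z = {z:.1f}; flagged = {z > Z_CUTOFF}")
```

Because fetal DNA is only a minor portion of the cell-free DNA in maternal plasma, the expected shift in the chromosome 21 fraction is small, which is why sequencing depth and a well-matched reference set matter for this kind of test.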

RNA Sequencing

mRNA-expression profiles obtained with microarray technologies have been used as biomarkers for the diagnosis and prognosis of diseases and for the selection of drug treatment.[22,23] Directly sequencing RNA/cDNA offers an alternative approach for high-throughput transcriptome analysis.[37] RNA-Seq is revolutionary in the precision with which it measures the transcriptome.[57] Its higher resolution improves the discovery of novel transcripts, differential allele expression, alternative splice variants, post-transcriptional mutations and isoforms compared with more conventional Sanger sequencing and microarray-based approaches.[58] Recent studies using RNA-Seq to characterize RNA populations have provided more complicated pictures of RNA regulation and expression through alternative splicing, alternative polyadenylation and RNA editing.[59,60] These findings have expanded our traditional view of the extent and complexity of gene expression,[61] and advanced our understanding of the mechanisms of RNA expression regulation in both eukaryotic[4] and prokaryotic[62] genomes.

Our study evaluating the technical performance of RNA-Seq in quantifying transcript expression levels revealed that the concordance between differentially expressed genes identified with RNA-Seq and with microarrays is moderate. RNA-Seq identified more genes as significantly differentially expressed than microarrays under the same selection criteria. The differences are probably caused by many factors, such as inherent differences in signal detection between RNA-Seq and microarrays, the effect sizes (i.e., the magnitude of log2 fold changes), the sequence aligners, the sequencing depth and the microarray data normalization methods.
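Concordance between two lists of differentially expressed genes, as discussed above, can be summarized by counting the genes shared between platforms and those unique to each. The sketch below does this with set operations and also reports a Jaccard index. The function name, the choice of the Jaccard index and the placeholder gene identifiers are assumptions made for illustration, and the source does not state which concordance measure was used.

```python
from typing import Dict, Iterable, Union


def overlap_stats(
    rnaseq_degs: Iterable[str], microarray_degs: Iterable[str]
) -> Dict[str, Union[int, float]]:
    """Summarize agreement between two differentially expressed gene lists."""
    a, b = set(rnaseq_degs), set(microarray_degs)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "rnaseq_only": len(a - b),
        "microarray_only": len(b - a),
        # Jaccard index: shared genes divided by all genes called on either platform
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Placeholder identifiers, not real gene symbols or real results.
rnaseq = ["G1", "G2", "G3", "G4"]
microarray = ["G2", "G3", "G5"]
print(overlap_stats(rnaseq, microarray))
```

Such a summary depends on both lists being produced under the same selection criteria (for example, the same fold-change and p-value cutoffs); otherwise differences in list size alone will lower the apparent concordance.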

Clinical Utility of a Personal Genome

With the cost of genome sequencing falling sharply, more and more personal genomes will become available. The challenge will soon become how to efficiently turn the large volumes of whole-genome sequence data into clinically useful information, including molecular diagnostics and treatment selection. In a study recently published in The Lancet,[63] a group of researchers from Stanford University (CA, USA) integrated the complete genome of Stephen R Quake, previously sequenced with a Helicos system,[64] into a medical assessment. Quake has a family history of vascular disease and early sudden death. The researchers queried disease-specific mutation databases and pharmacogenomics databases to identify genes and mutations with known associations with diseases and drug responses. Analysis of the 2.6 million single-nucleotide polymorphisms and 752 copy number variations in Quake's genome demonstrated increased genetic risk for myocardial infarction, Type 2 diabetes and some cancers. Specifically, rare variants in three genes that are clinically associated with sudden cardiac death (TMEM43, DSP and MYBPC3) were identified, and a variant in LPA was consistent with a family history of coronary artery disease. A heterozygous null mutation in CYP2C19 suggests probable resistance to clopidogrel; several variants associated with a positive response to lipid-lowering therapy were also found; and variants in CYP4F2 and VKORC1 suggest a low initial dosing for warfarin. Many variants of uncertain importance were also reported. Although many challenges remain for the routine application of personal genomes in clinical diagnostics, the results suggest that whole-genome sequencing can yield useful and clinically relevant information for individual patients. When databases for disease-specific mutations and pharmacogenomics become more comprehensive, whole-genome sequencing will become more useful in clinical evaluation than traditional molecular diagnostic methods, which are usually used for one intended purpose.
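The database query step described above can be pictured as a lookup of a patient's variant-carrying genes against a table of known associations. The sketch below is a deliberately simplified illustration: the table holds only the gene-level associations mentioned in this section, and the `annotate_genes` function and the placeholder gene name are hypothetical. Real disease-mutation and pharmacogenomics databases are keyed on specific variants and genotypes rather than on gene names alone.

```python
from typing import Dict, Iterable, List, Tuple

# Gene-level notes transcribed from the text above; not a real database.
KNOWN_ASSOCIATIONS: Dict[str, str] = {
    "TMEM43": "clinically associated with sudden cardiac death",
    "DSP": "clinically associated with sudden cardiac death",
    "MYBPC3": "clinically associated with sudden cardiac death",
    "LPA": "consistent with family history of coronary artery disease",
    "CYP2C19": "null mutation suggests probable clopidogrel resistance",
    "CYP4F2": "suggests low initial warfarin dosing",
    "VKORC1": "suggests low initial warfarin dosing",
}


def annotate_genes(genes: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split genes into those with a known association and those without."""
    annotated: Dict[str, str] = {}
    uncertain: List[str] = []
    for gene in genes:
        if gene in KNOWN_ASSOCIATIONS:
            annotated[gene] = KNOWN_ASSOCIATIONS[gene]
        else:
            uncertain.append(gene)  # variant of uncertain importance
    return annotated, uncertain


# "PLACEHOLDER_GENE" is a made-up name standing in for an unannotated gene.
hits, unknown = annotate_genes(["CYP2C19", "VKORC1", "PLACEHOLDER_GENE"])
for gene, note in hits.items():
    print(f"{gene}: {note}")
print("No known association:", unknown)
```

The second return value reflects the point made above that many variants of uncertain importance remain after such a query, and that the usefulness of the approach grows as the underlying databases become more comprehensive.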

