Metagenomics for the Discovery of Novel Human Viruses

Patrick Tang; Charles Chiu


Future Microbiol. 2010;5(2):177-189. 

Abstract and Introduction


Modern laboratory techniques for the detection of novel human viruses are greatly needed as physicians and epidemiologists increasingly deal with infectious diseases caused by new or previously unrecognized pathogens. There are many clinical syndromes in which viruses are suspected to play a role, but for which traditional microbiology techniques routinely fail to uncover the etiologic agent. In addition, new viruses continue to challenge the human population owing to the encroachment of human settlements into animal and livestock habitats, globalization, climate change, growing numbers of immunocompromised people and bioterrorism. Metagenomics-based tools, such as microarrays and high-throughput sequencing, are ideal for responding to these challenges. Pan-viral microarrays, containing representative sequences from all known viruses, have been used to detect novel and distantly related variants of known viruses. Sequencing-based methods have also been successfully employed to detect novel viruses and have the potential to detect the full spectrum of viruses, including those present in low numbers.


Collectively, we face a future where emerging viral pathogens will become more prevalent, fueled by advances in medical therapies, changes in the environment, globalization and better disease surveillance. Modern immunosuppressive therapies, targeting neoplastic and certain chronic diseases, have created new opportunities for viruses that are normally nonpathogenic. As the barriers between animals and humans are removed by the expansion of the human population, and global warming extends the range of certain vectors and pathogens, new viruses will be regularly introduced into the human population. Compounding these forces is globalization, which facilitates the rapid spread of viruses from one continent to another. Recent examples include the West Nile virus, which has emerged in North America;[1] the SARS coronavirus, which triggered a worldwide epidemic;[2–4] and the swine-origin influenza A/H1N1 virus, which has become the latest pandemic virus.[5–9] In addition, there are many common clinical syndromes, such as respiratory, gastrointestinal and encephalitic infections, in which many of the causative agents are believed to be viruses, but for which extensive conventional diagnostic tests have been unable to identify specific pathogens.[10–15] Finally, the specter of bioterrorism is ubiquitous, as the knowledge and tools for creating novel, more effective biological agents become more widespread.[16–19]

Enhancements in our disease surveillance networks and techniques have allowed us to rapidly identify more instances of emerging infectious diseases.[20–24] However, the traditional paradigm of finding the unknown causes of these diseases relies upon diagnostic tests for known agents. This makes it difficult and inefficient, and sometimes impossible, to identify unexpected or novel viral pathogens using conventional methods. In contrast, more comprehensive metagenomics techniques, such as microarrays and high-throughput sequencing, are ideally suited for the task of systematic virus discovery.

Although conventional methods for virus detection, such as virus culture, electron microscopy, serology and PCR, have been used successfully for identifying new viruses, these methods each have limitations for systematic virus discovery. Many viruses cannot be amplified in cell culture, or will not exhibit characteristic cytopathic effects during their growth.[25] Electron microscopy is a relatively insensitive method for virus detection and provides only morphologic clues regarding the identity of the virus.[26] Sera from previously infected hosts can be used to label viruses to enhance detection in cell culture or electron microscopy, but sera containing high titers of specific viral antibodies can be difficult to obtain.[26,27] PCR targeting conserved genetic regions can be used to detect variants of known viruses, but may not be able to detect more divergent or completely novel viruses for which there exists no a priori sequence data.[28] These conventional tests can be combined and run together, or in series, to detect a wider range of pathogens, but this approach can be costly, inefficient and time consuming.[29,30]

Metagenomic strategies for virus discovery typically employ an algorithmic approach (Figure 1). Singleplex PCR assays run in parallel, or multiplex PCR assays, can be used to immediately screen for the most common viral pathogens associated with the infectious disease being studied.[31,32] The resulting PCR products from the various pathogens are resolved using gel electrophoresis, differentially labeled probes or microarrays to identify the virus present in the sample.[33–35] Samples that test negative for the common viruses are then tested by a broad-range virus detection assay, such as a pan-viral microarray, which is capable of detecting viruses that are formally represented on the microarray, as well as novel variants of these viruses.[36–38] Any viruses detected by these methods can then be fully sequenced to detect potential variants of known viral species.[39] If viruses are not detected by PCR or microarray, samples are then subjected to high-throughput sequencing to detect novel viruses that do not have significant sequence homology to any known viruses.[40,41] Prevalence and association studies can be performed to link novel viruses identified by microarray or high-throughput sequencing with disease.[42] Based on the current costs for PCR, microarray analysis and high-throughput sequencing, this algorithmic approach to virus detection and discovery allows for the systematic detection of a potentially unlimited range of viruses in the most cost-effective, yet comprehensive, manner. As the relative costs of these tests change and new technologies are made available, the algorithm will naturally evolve to reflect these changes.

Figure 1. Algorithmic approach to novel virus discovery.
Clinical specimens that are suspected to contain viruses, but which test negative for pathogens by conventional microbiology tests, are screened for common viruses through PCR. If common viruses are present, these are sequenced to determine whether they are novel variants and to facilitate molecular epidemiology investigations. Specimens which are negative for common viruses are hybridized to a pan-viral microarray to detect a broader range of known and unknown viruses. If viruses cannot be found through microarray techniques, then the sample is subjected to high-throughput sequencing to search for viral sequences. Any viral signatures detected by microarray or high-throughput sequencing will lead to attempts to recover the entire virus genome sequence in order to better characterize the novel virus.
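
The tiered workflow in Figure 1 can be sketched in code to make the decision order explicit. This is only an illustrative sketch, not the authors' implementation: the names `triage_specimen`, `TriageResult` and `Assay` are hypothetical, and each assay is assumed to be a function that takes a specimen identifier and returns a list of detected virus names (an empty list meaning a negative result). The more expensive tiers are only run when the cheaper ones are negative, which reflects the cost argument made above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical type: an assay takes a specimen ID and returns the names of
# any viruses it detects. An empty list means the assay was negative.
Assay = Callable[[str], List[str]]


@dataclass
class TriageResult:
    specimen_id: str
    detected_by: Optional[str]  # "pcr", "microarray", "sequencing" or None
    viruses: List[str]
    next_step: str


def triage_specimen(
    specimen_id: str,
    pcr_screen: Assay,
    panviral_microarray: Assay,
    deep_sequencing: Assay,
) -> TriageResult:
    """Run the assays in the order shown in Figure 1, stopping at the first hit."""
    # Tier 1: PCR screen for the common viruses linked to the syndrome.
    hits = pcr_screen(specimen_id)
    if hits:
        return TriageResult(
            specimen_id, "pcr", hits, "sequence to check for novel variants"
        )

    # Tiers 2 and 3: pan-viral microarray, then high-throughput sequencing.
    # Each later tier runs only if every earlier tier was negative.
    later_tiers = (
        ("microarray", panviral_microarray),
        ("sequencing", deep_sequencing),
    )
    for tier_name, assay in later_tiers:
        hits = assay(specimen_id)
        if hits:
            return TriageResult(
                specimen_id, tier_name, hits, "attempt full genome recovery"
            )

    return TriageResult(specimen_id, None, [], "no viral signature found")


if __name__ == "__main__":
    # Stub assays standing in for real laboratory results (assumed data).
    def negative(_specimen: str) -> List[str]:
        return []

    def array_hit(_specimen: str) -> List[str]:
        return ["hypothetical virus X"]

    result = triage_specimen("specimen-001", negative, array_hit, negative)
    print(result.detected_by, "->", result.next_step)
```

Passing the assays in as functions keeps the ordering logic separate from any particular laboratory method, so a tier could be swapped or reordered as relative costs change.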