Insights into Antibiotic Resistance Through Metagenomic Approaches

Robert Schmieder; Robert Edwards


Future Microbiol. 2012;7(1):73-89. 


Culture-independent Study of Resistance Through Metagenomics

In 1985, Pace et al. were the first to propose the direct cloning of environmental DNA to classify unculturable microorganisms.[26] The first successful function-driven screening of metagenomic libraries, termed zoolibraries by the authors, was conducted in 1995.[27] The term 'metagenomics' was coined in 1998 by Handelsman et al., referring to the function-based analysis of mixed environmental DNA species.[21] Initially, metagenomics was used mainly to recover novel biomolecules (especially DNA) from environmental microbial assemblages. The development of next-generation sequencing techniques led to an alternative approach in which a fraction of the DNA in the sample is sequenced en masse, without a cloning step. This approach, sometimes called random community genomics, also became known as metagenomics. Metagenomic sequencing represents a powerful alternative to rRNA sequencing for analyzing complex microbial communities[20,28] and has had a tremendous impact on the study of microbial diversity in environmental and clinical samples. Today, the field of metagenomics can roughly be divided into two approaches: functional metagenomics and sequence-based metagenomics (Figure 3).

Figure 3.

Metagenomic analysis of antibiotic resistance in microbial communities.

Functional Metagenomics

Functional metagenomics involves cloning and heterologous expression of environmental DNA in a surrogate host, coupled with activity-based screening, to discover functions of genes that might not be obvious from their sequence. By creating a functional metagenomic library in which cloned genomic fragments are expressed, and selecting directly for resistance to antibiotics, the traditional challenges of studying genes of unknown sequence are circumvented. Such analyses have revealed novel antibiotic resistance proteins that were previously of unknown function and unrecognizable by sequence alone. Functional metagenomics has been used to identify genes encoding proteins that inactivate antibiotics, genes encoding multidrug efflux pumps and genes conferring resistance to the folate antagonist trimethoprim. Functional metagenomics has additionally been applied to cultured isolates of a community.[29]

Sequence-based Metagenomics

Sequence-based metagenomics involves extracting DNA directly from the environment, including the DNA of uncultured bacteria, and sequencing it at random. Typically, the eukaryotic cells, bacteria, viruses and free DNA are separated by size (using filtration or centrifugation), and total DNA is extracted from the appropriate fraction. A sample of the DNA is sequenced, and that sample is assumed to be a random fraction of the whole community. The metagenomic sequences are then compared with the known sequences that have accumulated over the years in national and international databanks (reference sequences) to identify resistance genes and/or mutations that are known to cause resistance (Figures 3 & 4). Using a wide range of reference resistance genes, the resistance potential for multiple antibiotics can be predicted from a single metagenome. The metagenomic sequences additionally represent the diversity of the community, including strains that cannot be cultured, which is valuable information for studying how communities change as a result of antibiotic treatment.
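The comparison of metagenomic reads against reference resistance genes can be illustrated with a minimal sketch. It assumes a simple shared k-mer criterion in place of the alignment tools normally used for this step, and it ignores reverse-complement matches, sequencing errors and point mutations. The function names, the parameters k and min_shared, and the toy sequences are all hypothetical and are not taken from any real database.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the names of the reference genes that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def assign_reads(reads, references, k=8, min_shared=3):
    """Count reads per reference gene.

    A read is assigned to the gene with which it shares the most k-mers,
    provided it shares at least min_shared of them (an arbitrary cutoff).
    """
    index = build_index(references, k)
    counts = {name: 0 for name in references}
    for read in reads:
        shared = defaultdict(int)
        for kmer in kmers(read, k):
            for name in index.get(kmer, ()):
                shared[name] += 1
        if shared:
            best = max(shared, key=shared.get)
            if shared[best] >= min_shared:
                counts[best] += 1
    return counts


# Arbitrary toy sequences standing in for reference resistance genes.
REFERENCES = {
    "toy_gene_A": "ACGTTGCAAGCTTAGGCATCGATCCGTAAGCTAGGCTA",
    "toy_gene_B": "TTGACCGGTATCGCAATGGCCTAAGTCCGATTGACGGA",
}

# Two reads cut from the references plus one unrelated read.
reads = [
    REFERENCES["toy_gene_A"][5:30],
    REFERENCES["toy_gene_B"][5:30],
    "G" * 20,
]
print(assign_reads(reads, REFERENCES))
```

In practice the per-gene counts would be normalized, for example by gene length and sequencing depth, before samples are compared; the sketch omits this.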

To date, more than one thousand different metagenomes have been sequenced from a large variety of environments, such as soil, ocean and human gut (Figure 5). In addition, extinct species such as the woolly mammoth[30] and the Neanderthals[31] have been analyzed by sequence-based metagenomic approaches.

Figure 5.

Overview of sequencing cost and number of metagenomic libraries submitted to the international Sequence Read Archive.
The cost of sequencing is based on data provided by the US National Human Genome Research Institute,[107] illustrating the more than logarithmic decrease and sudden change when the sequencing centers transitioned to next-generation sequencing technologies. The cost of sequencing is compared to Moore's Law, which describes a long-term trend in the computer hardware industry that involves the doubling of computing power every 2 years (here halving of sequencing cost every 2 years). The stacked bars show the number of metagenomic libraries that were submitted to the Sequence Read Archive (as of 1 July 2011) with a specified study type of 'Metagenomics' and a library strategy of 'WGS' (whole genome shotgun sequencing). The sequencing technologies used to generate the metagenomic libraries are marked as shown in the legend. As of July 2011, only one metagenomic library in this category was generated using the SOLiD system. Major milestones in next-generation sequencing and major submissions (>100 libraries) are highlighted.
HMP: Human Microbiome Project; MetaHIT: Metagenomics of the Human Intestinal Tract; MPSP: Marine Phage Sequencing Project; SRA: Sequence Read Archive.
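
The Moore's Law baseline used in the figure (cost halving every 2 years) can be written as cost(t) = cost(0) x 0.5^(t/2), with t in years. The short sketch below evaluates that baseline; the function name and the starting cost are hypothetical placeholders, not values from the National Human Genome Research Institute data.

```python
def moores_law_cost(initial_cost, years_elapsed, halving_period_years=2.0):
    """Cost expected if it halves once per halving period (the Moore's Law baseline)."""
    return initial_cost * 0.5 ** (years_elapsed / halving_period_years)


# Hypothetical starting cost of 1000 units: five halvings occur in 10 years.
baseline_after_decade = moores_law_cost(1000.0, 10)
print(baseline_after_decade)
```

Observed costs that fall below this baseline for a given year indicate a decline faster than Moore's Law, which is the comparison the figure draws.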

The volume of sequence data generated in the last few years has spawned a new generation of analysis tools (Figure 4) and sequence databases, such as IMG,[32] MG-RAST,[33] CAMERA[34] and the Sequence Read Archive.[35] Unfortunately, there is no single authoritative source for all of the raw metagenomic sequence data, and the quality and descriptions of the data vary between databases and between datasets.

Figure 4.

A proposed bioinformatics methodology to identify antibiotic resistance from sequence-based metagenomes. The methodology includes preprocessing steps, where a number of freely available tools can be applied, an analytical step where the sample is compared to databases such as the NCBI NR or the ARDB, and post-processing steps to identify the resistance potential. We encourage the reader to consult online resources, such as SEQanswers[105] and BioStar,[106] for up-to-date information on current programs and databases.
ARDB: Antibiotic resistance database; NR: Nonredundant protein database.
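
The preprocessing and post-processing stages of such a methodology might look like the following sketch. The specific filters (minimum length, fraction of ambiguous bases, exact-duplicate removal) and the summary by antibiotic class are assumptions chosen for illustration, since the caption does not list the individual steps. The thresholds are arbitrary, and the small gene-to-class table is hand-written rather than drawn from the ARDB. The analytical step is represented only by its output, a list of per-read gene hits.

```python
from collections import Counter


def preprocess(reads, min_length=50, max_n_fraction=0.05):
    """Drop short reads, reads rich in ambiguous bases (N) and exact duplicates."""
    seen = set()
    kept = []
    for read in reads:
        seq = read.upper()
        if len(seq) < min_length:
            continue  # too short to compare reliably
        if seq.count("N") / len(seq) > max_n_fraction:
            continue  # too many ambiguous base calls
        if seq in seen:
            continue  # exact duplicate of an earlier read
        seen.add(seq)
        kept.append(seq)
    return kept


def resistance_profile(hits, gene_to_class):
    """Summarize gene hits as the fraction of hits per antibiotic class.

    Hits to genes missing from gene_to_class are ignored.
    """
    counts = Counter(gene_to_class[g] for g in hits if g in gene_to_class)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


# Illustrative mapping only; a real analysis would take this from a database.
GENE_TO_CLASS = {
    "blaTEM": "beta-lactam",
    "tetM": "tetracycline",
    "dfrA": "trimethoprim",
}

# Hypothetical output of the analytical step: one best-hit gene per read.
hits = ["blaTEM", "blaTEM", "tetM", "unannotated_orf"]
print(resistance_profile(hits, GENE_TO_CLASS))
```

Keeping the stages as separate functions mirrors the staged layout of the proposed methodology, so any one step can be replaced by a dedicated tool without changing the others.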