Proteomics and Metabolomics in Inflammatory Bowel Disease

Yunki Yau; Rupert W Leong; Ming Zeng; Valerie C Wasinger


J Gastroenterol Hepatol. 2013;28(7):1076-1086. 

Defining the Field: Proteomics and Metabolomics

Determining the domains of proteomics and metabolomics may seem trivial to physicians facing practical situations on a day-to-day basis. Yet the question of how the omics can advance clinical medicine hinges on this point. In defining proteomics and metabolomics, we ask ourselves: how do we understand biological processes? What is the difference between systems biology and traditional physiology?[12,102]

As a starting point, the suffix -omics refers to the totality of a given system, and the effort to profile it.[103] Differentiating and defining proteomics and metabolomics, therefore, is grounded in what we consider to constitute the total protein content, and the total metabolite content, of a given cell.

The proteome, first described as "the total protein complement able to be encoded by a given genome,"[104] has undergone stark revision to include "the set of all protein isoforms and modifications, the interactions between them, (and) the structural description of proteins and their higher order complexes."[16,105,106] Because the Human Genome Project put the number of unique human genes in the vicinity of 35 000, scientists have been able to deduce that there are possibly 10 million unique protein species that are the result of, and subject to, approximately 200 different post-translational modifications (PTMs).[107,108] Modifications to proteins can be multiple and transient, and are responsible for their specific biological function.[109] They are also low in abundance and difficult to detect.[109] The sheer arithmetic and subtlety of protein transformations and interactions have led to many semiofficial subcategories of proteomics that restrict their profiling domains to specific molecule classes, cellular functions, or PTMs.[110–112] Simply put, the scale of the proteome is staggering.

The constitution of the metabolome meanwhile continues to be deliberated. From an original study of glucose processing in E. coli under specific growth rates, the "metabolome" could be defined as the "total complement of metabolites in a cell," with an emphasis on representing the global metabolic processes of a cell by low-molecular weight compounds.[113] This definition was further expounded by successive metabolome studies, perpetuating a correlation between small molecule size (a metabolite weighs less than 15 kilodaltons[114]) and biochemical finality ("metabolites serve as direct signatures of biochemical activity"[21]), because a small molecule must be the result of many enzymatic processes from a gene-transcribed protein origin. While this may be true, it does not adequately describe the important functional roles of metabolites in biology. The specific physiological functions of metabolites are appreciated and scrutinized in detail in metabolomic studies,[21,115] but the widely accepted definition of the metabolome, "the comprehensive study of naturally occurring small molecules,"[116] or some variation thereof,[21] does not generally take this into account. Metabolites may be end points of metabolism but they are not end points of physiological processes: they act as catalysts, signaling molecules, and nutrients, among other roles.[115,117]

The metabolome can perhaps be more comprehensively described as the study of the complete expression and biological function of molecules less than 25 kilodaltons within a given cell. This higher molecular mass inclusion takes into account the capability of NMR spectroscopy and MS to reliably resolve, in metabolomic studies, molecules of higher mass than what some would consider a metabolite.[118] Molecule size runs along a continuum, and molecules that by chemistry or size are considered neither a metabolite nor a protein may still be important. This aspect is discussed further in this review.

Fundamentally, reevaluations of what constitutes the proteome and metabolome illustrate the immense complexity of human biology and how the "omics" have allowed us to understand it in a more complete way. The next challenge lies in navigating this new expanse of information to unravel the mechanisms behind complex diseases such as the IBDs and to build functional tools for their treatment and management. Some of the principles behind the novel ways in which the proteomic and metabolomic toolboxes are being used to these ends are discussed below.

Too Big to be a Metabolite, Too Small to be a Protein

The search for disease-specific biomarkers in easily accessible media such as blood serum, plasma, and urine has led investigators to look for novel low-mass, low-abundance species that may often be overshadowed by several dominant, highly abundant proteins.[119] Exploring biological fluid for these candidates in global proteomic studies requires extensive sample fractionation to isolate the low-mass portion, followed by enrichment of this portion to detect low-abundance proteins.[119–122]

A standardized global low-mass, low-abundance proteomic experiment and a global metabolomics experiment are comparable (Figs 2,3). Standard methods of analyte precipitation and mass fractionation (e.g. immunoaffinity chromatography columns, electrophoresis, or ultrafiltration) can be used to isolate the molecules of interest, and samples are injected into an LC-MS (or MS/MS) system with or without enzymatic digestion.[21,96,120,123] Enzymatic digestion is often not employed in low-mass proteomics analysis, on the rationale that the disconnect between peptide and in vivo protein complicates later-stage identification;[124] however, without protein cleavage, free in vivo small peptides can escape detection because they do not ionize well in their endogenous state. To maximize small peptide discovery, it is advisable to enzymatically digest the sample but to treat the ensuing MS data as undigested in subsequent compound identification analysis (as databases may contain entries for peptides that were detected in previous nondigested experiments).
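One reading of the "digest, but search as undigested" advice can be illustrated with a minimal Python sketch. It assumes the commonly used trypsin rule (cleavage after lysine or arginine unless the next residue is proline) and interprets "undigested" as an identification search that is not restricted to enzyme-specific peptides. The protein sequence, the endogenous fragment, and both function names are hypothetical and serve illustration only.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """In-silico digest: cleave after K or R unless the next residue is P."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal remainder without K/R

    # Optionally join neighbouring pieces to model missed cleavage sites.
    peptides = []
    for i in range(len(pieces)):
        for extra in range(missed_cleavages + 1):
            if i + extra < len(pieces):
                peptides.append("".join(pieces[i:i + extra + 1]))
    return peptides


def unconstrained_peptides(sequence, min_len=5, max_len=8):
    """All contiguous subsequences in a length window (no enzyme rule applied)."""
    return {
        sequence[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(sequence) - n + 1)
    }


protein = "GASPKVTRPLEKDR"   # hypothetical sequence
endogenous = "SPKVT"         # hypothetical in vivo fragment, non-tryptic ends

tryptic = tryptic_digest(protein)
print(tryptic)
print(endogenous in tryptic)                          # enzyme-constrained search
print(endogenous in unconstrained_peptides(protein))  # unconstrained search
```

In this toy case the endogenous fragment is absent from the enzyme-constrained candidate list but present in the unconstrained one, which is the practical reason for not limiting identification to tryptic peptides.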

Figure 2.

An untargeted peptide/low-mass protein proteomics workflow. Sample collection: The biological medium typically selected for biomarker discovery includes blood serum, plasma, urine, and feces because of the noninvasive nature of their collection. Sample prefractionation: The sample is typically fractionated and proteins and peptides precipitated by methods such as centrifugal ultrafiltration, electrophoresis, or affinity chromatography.[120,123] Enzymatic digestion: Enzymatic digestion of proteins should be performed with a protease such as trypsin to aid detection in "peptidomics" discovery. Liquid chromatography: Peptides and proteins are separated by chromatography before injection into a tandem mass spectrometry (MS/MS) system. Tandem MS: Tandem MS systems are organized into three "quadrupoles." In the first (Q1), the peptide m/z is detected. In the second (Q2, the collision cell), the peptide is fragmented by collision-induced dissociation, and in the third (Q3), the sum parts of the fragmented peptide (daughter ions) are detected. Protein/peptide identification: Molecular sequences are calculated from MS/MS spectra using theoretical physicochemistry modeling and matched with known MS/MS information of catalogued proteins and peptides. Further validation experimentation: To confirm the abundance profile and in vivo nature of the protein/peptide biomarker candidate, techniques such as Western blotting and selected reaction monitoring (SRM)/multiple reaction monitoring (MRM) MS may be performed.

Figure 3.

A prototypical untargeted metabolomics workflow. Sample collection: The biological medium selected for metabolite biomarker discovery includes blood serum, plasma, urine, and feces. Precipitation of metabolites: The sample is usually homogenized and metabolites are precipitated by various combinations of centrifugation, temperature settings, and organic/aqueous solvents.[21,116] The precise method will depend on the investigator's interest in hydrophilic or hydrophobic species. Gas chromatography (GC)/liquid chromatography (LC): Metabolomics experiments are usually conducted with relatively short chromatography times.[21] Mass spectrometry: MS scan range is in the vicinity of 80–1000 m/z.[96,116] Metabolite identification: MS peaks of interest are identified by accurate mass matching (± 5 parts per million). Further experimentation: Database matching by mass is putative and validation can be performed by further experiments matching the tandem mass spectrometry (MS/MS) spectra of the peak of interest and a standard compound.[21]

A typical MS (parent ion) scan for small proteins and peptides may be set at a range of approximately 350–1800 m/z, with molecules detected in multiple charge states depending on the sample type and MS instrument. A global metabolomics study meanwhile has a scan range commonly between 35 and 1000 m/z, with an expectation of singly charged molecules.
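The link between charge state and scan range can be made concrete with a short Python sketch. It assumes positive-mode ionization by protonation, so that m/z = (M + z × 1.007276) / z for a neutral monoisotopic mass M; the 5000 Da peptide is a hypothetical example, and the function names are illustrative only.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (positive-mode protonation assumed)


def mz(neutral_mass, charge):
    """m/z of a molecule carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def charges_in_window(neutral_mass, low, high, max_charge=20):
    """Charge states whose m/z falls inside the scan window [low, high]."""
    return [z for z in range(1, max_charge + 1)
            if low <= mz(neutral_mass, z) <= high]


# Hypothetical 5000 Da peptide in a 350-1800 m/z peptide scan window:
peptide_charges = charges_in_window(5000.0, 350, 1800)
print(peptide_charges)

# Singly protonated glucose (monoisotopic mass approx. 180.06339 Da),
# the kind of ion a 35-1000 m/z metabolomics scan is set up to detect:
print(round(mz(180.06339, 1), 4))
```

A singly charged 5000 Da peptide would sit near 5001 m/z, outside the peptide window, so it is only observed through its higher charge states; a small metabolite, by contrast, is already inside the metabolomics window at a single charge.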

A second analyzer can be used for further fragmentation and characterization, with the resulting mass spectrum (product ions) being representative of a peptide/small protein's sequence and structure. Protein/peptide sequence and structural information is attributed to the experimentally observed MS/MS spectra by mathematical physicochemistry modeling, and identification is made by matching the experimental MS data with catalogued protein/peptide MS and sequence information. This allows for specific and accurate compound recognition in a complex biosample. The confidence score of an identification is based on the number of peptides in the sample that are attributable to the hypothetical protein. For global metabolomics, identification is made by accurate m/z measurement (Fig. 3).[21,116]
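Accurate mass matching for metabolites can be sketched in a few lines of Python, using the ±5 ppm tolerance quoted in the Figure 3 legend. The mass error is computed as (observed − theoretical) / theoretical × 10^6. The three-entry library below is a hypothetical stand-in for a real metabolite database, with approximate [M+H]+ values, and the function names are illustrative only.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_by_mass(observed_mz, library, tolerance_ppm=5.0):
    """Return (name, ppm error) for every library entry within tolerance."""
    hits = []
    for name, theoretical_mz in library.items():
        error = ppm_error(observed_mz, theoretical_mz)
        if abs(error) <= tolerance_ppm:
            hits.append((name, round(error, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical mini-library of approximate [M+H]+ m/z values.
LIBRARY = {
    "glucose": 181.070664,     # C6H12O6
    "fructose": 181.070664,    # C6H12O6, isomer of glucose
    "tryptophan": 205.097154,  # C11H12N2O2
}

print(match_by_mass(181.0711, LIBRARY))  # within tolerance of two isomers
print(match_by_mass(181.0750, LIBRARY))  # outside the 5 ppm tolerance
```

The first query matches both glucose and fructose because isomers share an exact mass, which illustrates why database matching by mass alone is putative and why MS/MS comparison with a standard compound is used for validation.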

Peptides/small proteins/metabolites may exist freely or be part of a larger protein or complex in media such as the blood circulation and have specific functions as hormones, neurotransmitters, cytokines, etc. depending on this circumstance (which may be transitory).[14,105,115,119] The nature and context of a molecule's presence in a sample is simply untested in these global studies, and further complementary techniques, such as immunoblotting, cell culture, multiple reaction monitoring, or MS/MS using a synthetic model compound, are needed to elucidate the physiological framework of the molecule of interest.

"Peptidomics" as such—the analysis of small protein and peptide-level compounds in complex biological samples—is not necessarily functionally different from proteomic workflows, but a distinct application of high accuracy and sensitivity MS power to focus the search of novel biomarker candidates to a ground at the junction of proteomics and metabolomics where classical proteomic technologies cannot easily decipher, and may more likely yield new and meaningful compounds.

Combining Proteomics and Metabolomics

The IBDs are recognized as phenotypic manifestations of a combination of genetic and environmental factors,[125] and yet the complexity of the pathogeneses of these diseases is not always considered in the search for clinical biomarkers.

While the initial genetic circumstance of disease is intensively described in genome-wide association studies, the environmental contributing factors of disease (exposure elements) are largely ignored.[117] The elements of exposure to which an individual is subject range from diet,[126] pathogens,[127] psychosocial stress,[128] and drugs[129] to pollution[130] and more. This interface, or "space," has previously been functionally described as the combination of proteomics and metabolomics, the summation of which forms the set of biologically active chemicals in an individual from endogenous and exogenous processes.[117,131] Giving it the moniker "exposome" (the entirety of all environmental exposures received by an individual during life), Rappaport et al. contend that the proteome and metabolome consist of both causal and reactive pathways, and that examining these two fields in tandem could reflect the interplay between genetic and environmental factors, though the authors caution that this would be far too complex to pursue as straight exploratory studies without qualification[117] (Fig. 4).

Figure 4.

Simplified concept of the interplay between genetic susceptibility and environmental exposure in inflammatory bowel disease (IBD) pathogenesis. A black circle represents an individual with IBD susceptibility gene(s) while a blue circle is an individual without. An individual with a combination of IBD susceptibility genes (e.g. gene A and B) may develop disease with only limited exposure to IBD risk factors (e.g. smoking for Crohn's disease [CD]), while those with select susceptibility genes (e.g. only A) may develop disease when exposed to certain risk element(s). Still, there may be those with IBD risk gene(s) (e.g. only B) who, because they were not subject to a particular risk element (or combination thereof), do not develop disease. The possible pathogenic combinations between gene and exposure may be immense.
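How quickly the gene and exposure combinations grow can be shown with a small Python sketch. It makes the simplifying assumption that each susceptibility gene and each exposure is either present or absent, so that n binary factors give 2^n distinct profiles; the factor names follow the figure legend, and the counts of 10 genes and 10 exposures are hypothetical.

```python
from itertools import product


def enumerate_profiles(genes, exposures):
    """List every presence/absence pattern across genes and exposures."""
    factors = list(genes) + list(exposures)
    return [dict(zip(factors, states))
            for states in product([False, True], repeat=len(factors))]


def count_profiles(n_genes, n_exposures):
    """Number of distinct profiles when every factor is binary."""
    return 2 ** (n_genes + n_exposures)


# The legend's toy case: two susceptibility genes and one exposure.
profiles = enumerate_profiles(["gene A", "gene B"], ["smoking"])
print(len(profiles))

# A hypothetical panel of 10 genes and 10 exposures.
print(count_profiles(10, 10))
```

Graded exposures or gene dosage would enlarge this space further, so the binary count is best read as a lower bound on the complexity the legend describes.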

Cross-"omics" principles are beginning to take shape in IBD research,[90] and across other medical fields.[132,133] The "exposome" conjecture is very much a commentary on utilizing composite "omics" methodologies to consider elements of both disease and exposure in exploring chronic multifaceted conditions such as the IBDs. In deciphering high-volume data, care must be taken in considering whether a characteristically abundant entity is associated with disease causation, or manifests as a result.[115]