GWAS vs Whole-Exome Sequencing: What's the Difference and Why We Should Care

Jacqueline K. Beals, PhD


September 21, 2010

The first genome-wide association study (GWAS) came on the scene in March 2005.[1] Over the next 3+ years, GWAS identified associations between more than 150 genetic loci and more than 60 traits and diseases -- a stunning increase in knowledge -- and more GWAS and data collection are under way. However, if you keep up with the literature on disease-gene associations, you may have noticed the advent of next-generation procedures -- specifically whole-exome sequencing -- as well as a subtle rivalry between researchers who use these differing approaches.

Why should you care? Simply put, the approaches used by GWAS and whole-exome sequencing look at genetic analysis from very different perspectives -- which, in turn, might influence how we look at the disease-gene associations that each one yields.

Guilt by Association?

While working on an article for Medscape Medical News,[2] I corresponded with the authors of a study who explained to me that GWAS was "designed to identify common variants...contributing to common disease." These common variants -- ie, those whose allele frequency exceeds 5% -- are usually located in intronic regions of the chromosome, which don't code for specific functional proteins. Because such variants do not alter a protein directly, claiming an association between a gene and a disease requires that these single-nucleotide polymorphisms (SNPs) either lie within hundreds of kilobases (kb) up- or downstream of a known causative gene, or be in linkage disequilibrium with other SNPs of potential interest, meaning that they are found together more frequently than would be expected by chance.

Thus, a SNP's involvement is inferred from its location near, or its association with, a gene known to affect the disease or pathology of interest, much as a person found in a given place and hanging out with the right (or "wrong") crowd may become a "person of interest" in a criminal investigation. Whether the SNP is involved at all, or is a victim of "guilt by association," then becomes an issue that requires considerable probing.
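For readers who want to see what "associated more frequently than would be expected randomly" means in numbers, here is a minimal sketch of two standard linkage disequilibrium measures, D and r-squared, for a pair of two-allele SNPs. The function name and the haplotype counts are hypothetical and purely illustrative; they are not taken from any of the studies discussed here.

```python
def linkage_disequilibrium(hap_counts):
    """Return (D, r_squared) for two biallelic SNPs.

    hap_counts maps the four haplotypes "AB", "Ab", "aB", "ab" to the
    number of chromosomes carrying each one (hypothetical input format).
    """
    total = sum(hap_counts.values())
    p_ab = hap_counts["AB"] / total                       # observed freq of A and B together
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / total   # freq of allele A
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / total   # freq of allele B

    # D is the gap between the observed haplotype frequency and the
    # frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b

    # r^2 rescales D to the range 0..1; it is undefined when either SNP
    # is monomorphic, so return 0.0 in that case.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Illustrative counts only: A and B travel together more often than chance.
example = {"AB": 40, "Ab": 10, "aB": 10, "ab": 40}
d, r_squared = linkage_disequilibrium(example)
```

A D of zero means the two SNPs behave independently; the further r-squared climbs toward 1, the more reliably one SNP "tags" the other.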

By contrast, whole-exome sequencing looks at exonic regions of the chromosome, which code for functional proteins. This is no longer a matter of keeping bad company, but evidence of how the "crime" was committed, ie, exactly how the protein was rendered nonfunctional and why a patient might be susceptible to or actually have the disease in question.

Not surprisingly, this approach is more likely to identify mutations -- albeit often rare -- that have a greater impact on disease: "...80% to 90% of all [inherited] disease-causing mutations are located within protein coding regions," the authors told me. "So just by looking at 1% of the genome [using whole-exome sequencing] can identify almost 90% of disease-causing mutations."

Using this logic, whole-exome sequencing would seem to be a better way to identify true disease-gene associations, and to save both time and money.

Does that necessarily mean that we should toss out GWAS?

My Big Fat European GWAS

I duly noted the authors' use of next-generation sequencing technologies, and gave it no further thought until a pair of papers published in the August 5 issue of Nature[3,4] crossed my desk. They reported on GWAS data obtained from more than 100,000 individuals of European descent to locate genetic variants associated with levels of 4 blood lipids: total cholesterol; low-density lipoprotein (LDL) cholesterol; high-density lipoprotein (HDL) cholesterol; and triglycerides.

The first paper[3] reported on collaborative research by institutions throughout the world that identified 95 genetic loci associated with the levels of at least 1 of these 4 blood lipids, some of which were thought to play a role in regulating blood lipid levels.

It was their closing comment that held my attention:

It has recently been suggested that conducting genetic studies with increasingly larger cohorts [with GWAS] will be relatively uninformative for the biology of complex human disease....As the reasoning goes, analysis of a few thousand individuals will uncover the common variants with the strongest effect on phenotype. Larger studies will suffer from a plateau phenomenon in which either no additional common variants will be found or any common variants that are identified will have too small an effect to be of biological interest. Our study provides strong empirical evidence against this assertion [italics mine].

Why were these researchers so convinced that their study demonstrated the worth of GWAS?

They started with the premise that 15 of the 18 genes already associated with inherited lipid disorders lie within 100 kb of one of the leading SNPs, which, in GWAS parlance, proves an association. They then ran 1 million simulations testing the likelihood that these genes and the lipid disorder SNPs that they identified overlapped by chance; most simulations produced no overlapping loci, and the largest number obtained was 8. Thus, they demonstrated that the presence of 15 lipid disorder genes in the vicinity of SNPs associated with blood lipid levels was not coincidental.
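The logic of that simulation can be sketched in a few lines. This is a toy model under loud assumptions, not the authors' actual procedure: it treats the genome as a single line of roughly 3.1 billion bases, scatters 95 loci uniformly at random, and counts how many of 18 genes fall within 100 kb of any locus. The gene positions, function names, and number of runs are all hypothetical.

```python
import bisect
import random

GENOME_LENGTH = 3_100_000_000  # rough human genome size in bp (assumption)
WINDOW = 100_000               # the 100-kb window described in the text


def count_genes_near_loci(gene_positions, locus_positions, window=WINDOW):
    """Count genes lying within `window` bp of at least one locus."""
    loci = sorted(locus_positions)
    hits = 0
    for gene in gene_positions:
        # First locus at or beyond the left edge of the gene's window.
        i = bisect.bisect_left(loci, gene - window)
        if i < len(loci) and loci[i] <= gene + window:
            hits += 1
    return hits


def simulate_overlap(n_genes=18, n_loci=95, n_sims=10_000, seed=0):
    """Return the overlap count from each run of randomly placed loci."""
    rng = random.Random(seed)
    # Hypothetical gene positions; a real analysis would use annotated ones.
    genes = [rng.randrange(GENOME_LENGTH) for _ in range(n_genes)]
    counts = []
    for _ in range(n_sims):
        loci = [rng.randrange(GENOME_LENGTH) for _ in range(n_loci)]
        counts.append(count_genes_near_loci(genes, loci))
    return counts


if __name__ == "__main__":
    counts = simulate_overlap()
    observed = 15  # overlap reported in the paper
    # Empirical p-value: share of random runs matching or beating 15.
    p_value = sum(c >= observed for c in counts) / len(counts)
    print("largest chance overlap:", max(counts), "p-value:", p_value)
```

If random placement almost never produces an overlap close to the observed 15, chance becomes an implausible explanation, which is the argument the authors made with their 1 million runs.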

Next, they set out to "prove" this association by demonstrating in mice that increasing or decreasing expression of a few of these genes dramatically affected blood lipids. For example, overexpression of the Galnt2 gene lowered plasma HDL cholesterol by 24% after 4 weeks, whereas knocking down 95% of Galnt2 activity raised HDL cholesterol levels by 71% over the same time period.

But Wait, There's More

The second paper[4] delved into the effects of SORT1, the gene in the prior study that demonstrated the strongest association with blood levels of LDL cholesterol and that was strongly associated in other GWAS with coronary artery disease and myocardial infarction.

As in the preceding paper, the effects of SORT1 were demonstrated by overexpression and knockdown studies in mice: Overexpression of SORT1 resulted in a 70% reduction of total plasma cholesterol after 2 weeks, and a 46% reduction at 6 weeks. Observations in humans also noted an association between higher expression of SORT1 in the liver and lower levels of LDL cholesterol. This target had never been considered for therapeutic intervention in patients at risk for heart disease, but the striking findings in this study may point the way to a new area of research for lipid management.

Will GWAS soon be relegated to the past, sidelined by next-generation sequencing techniques? As genetic research and technology improve, someday that may be inevitable. However, by demonstrating the biological relevance of their findings in mice, these investigators moved GWAS from the realm of "guilt by association" to a direct functional validation of the effects of specific genes on blood lipids.

With heart disease remaining the leading cause of mortality in developed countries, any technique that can point the way to new therapeutic targets, such as SORT1, is far from obsolete. As these investigators demonstrated, when bolstered by functional studies, GWAS shows every sign of retaining its strong foothold in the world of genomic research.

