Epidemiology: Separating the Wheat From the Chaff

Christopher Labos, MD, CM, MSc, FRCPC


August 14, 2018

"There are only a handful of ways to do a study properly, but one thousand ways to do it wrong."[1]

Christopher Labos, cardiologist and epidemiologist

If you watch the news today, you know that everything you eat causes cancer. Fortuitously, many of these very same foods also prevent cancer, and it turns out that pretty much every food has been associated with a cancer claim at some point.[2]

And it's not just cancer. Coffee can both decrease[3] and increase[4] your risk for cardiovascular disease. Alcohol may[5] or may not[6] be good for you, depending on what day it is. And although we are uncertain about the actual medical benefits of chocolate, at least it will help you win a Nobel Prize.[7]

Look hard enough, and you can find a study to support any idea you want. Granted, this is cherry-picking the data, but since cherries are heart-healthy, you might as well go ahead.[8,9] The problem, of course, is that a lot of these studies are nonrandomized observational analyses that are plagued by confounding, selection bias, and myriad other issues—with the result that most of this medical literature is wrong.

Not All Epidemiology Is Bad

But although there are a lot of subpar epidemiology studies, there are some truly remarkable ones that have changed medicine. The pivotal British Doctors Study from Doll and Hill[10] linking smoking to lung cancer was in fact an observational cohort. The Framingham Heart Study[11] demonstrated that smoking, cholesterol, hypertension, and physical inactivity were associated with heart disease and pretty much defined what we call the "traditional" cardiac risk factors. The Nurses' Health Study, the Coronary Artery Risk Development in Young Adults (CARDIA) study, and the Atherosclerosis Risk in Communities Study (ARIC) have all been invaluable in the research of cardiovascular disease, because some issues simply cannot be studied in the setting of randomized controlled trials (RCTs).

For example, one could not, ethically speaking, randomly assign healthy volunteers to cigarettes versus placebo (at least I hope no ethics committee would allow such a thing). Similarly, most risk factors cannot be studied in an RCT setting because one cannot be "randomized" to having high blood pressure. RCTs of therapies and medications can test effectiveness but are often not powered to detect rare side effects. The cardiovascular harms seen with such medications as rosiglitazone demonstrate that research into drug safety cannot end once the randomized trials are published.[12]

For situations where randomized trials are simply not possible and we have to rely on classic epidemiology to answer our research questions, it is imperative that we differentiate the "good" studies from the ones that champion oranges and grapefruits as a way to prevent stroke.[13]

Trust, but Verify

In sorting the wheat from the chaff, the first question to ask is how the researchers measured whatever it is they studied, which can be surprisingly challenging to do accurately. Were the risk factors measured or self-reported? Asking someone whether they smoke, rather than measuring cotinine levels in the blood, can yield very different results because of social desirability bias.[14] Social desirability bias occurs when survey-takers answer questions in a way that they believe will be seen as more favorable: underestimating their weight, overestimating their height, or reporting healthier diets than they actually eat.[15] In fact, social desirability bias occurs to such a predictable degree that correction factors have been established to adjust self-reported measures of height and weight.[16]


Another issue is poor recall. Most food studies use food questionnaires to assess what people ate. In a recent study from China examining whether eggs had cardiovascular benefits,[17] participants were administered a single questionnaire and asked to recall their average egg consumption over the past year. To appreciate how difficult that is, try, right now, to recall how often you ate eggs over the past year. Other studies have demanded even greater feats of memory, such as asking participants to estimate their berry consumption over the prior 4 years.[8]

Notwithstanding the difficulty of trying to remember what you had for lunch in 2014, when food questionnaires are validated against more rigorous methods, they often perform poorly.[18] In the aforementioned Chinese egg study, when the one-time questionnaire was compared with multiple questionnaires administered over the year to a small subset of participants, the correlation coefficient was an uninspiring 0.58, indicating only modest agreement.
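For readers curious about what that number means, a correlation coefficient simply quantifies how well two sets of measurements agree. Here is a minimal Python sketch using invented egg-consumption figures (the data are illustrative only, not from the study):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: eggs per week reported on a single year-end
# questionnaire vs. the average of repeated questionnaires for the
# same (imaginary) participants.
single_recall = [0, 1, 2, 3, 4, 5, 6, 7]
repeated_avg  = [2, 0, 5, 1, 7, 3, 4, 6]

r = pearson_r(single_recall, repeated_avg)
print(round(r, 2))  # prints 0.57
```

A coefficient of 1.0 would mean the one-time recall perfectly tracked the repeated measurements; values in the 0.5-0.6 range, as in the sketch above, mean the two rankings agree only loosely.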

Finally, food questionnaires are necessarily fairly crude measures of our food intake, given how complex meal preparation actually is. Consider a recent study linking potatoes with hypertension.[19] In this study, "potatoes" included both french fries and potato chips, as well as baked, boiled, and mashed potatoes, which are obviously prepared in very different ways. One can easily imagine that frying potatoes or preparing them with cream and butter will yield very different results than if one ate plain boiled potatoes. And yet teasing out these different methods of food preparation is not easily done in these types of analyses.

Adjustment Can Only Do So Much

The major limitation of observational research, compared with randomized trials, is that baseline differences between groups result in confounding and can create associations where none exist. For example, observational research suggested that vitamin C was cardioprotective,[20] but a subsequent randomized trial demonstrated that it was not.[21] This probably occurred because the people who took vitamin C were healthier on average than those who did not. If we closely examine the Chinese egg study,[17] which demonstrated a protective effect of daily egg consumption, we find that persons who did not eat eggs were older, smoked more, had more hypertension, and were less affluent. All of these factors could easily have contributed to the increased cardiovascular risk of the egg abstainers.

It is possible to adjust for these confounding variables statistically; however, it is important to remember that statistical adjustment cannot completely erase these between-group differences. First, you can only adjust for differences you can measure, and unmeasured confounders are always a potential problem. For example, you cannot adjust for socioeconomic status if you do not have data on subjects' income.

Second, successful statistical adjustment assumes that the baseline variables were recorded accurately. Even adjusting for something as simple as smoking can prove difficult. If you treat smoking as a binary yes/no variable, then what do you do with ex-smokers? And how do you differentiate between a 2-pack-a-day smoker and someone who smokes only occasionally? Treating smoking as a simple yes/no variable loses this granular detail.
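To see why accounting for a confounder matters, consider a small Python sketch using entirely invented counts and adjustment in its simplest form, stratification. Within each smoking stratum the exposure has no effect on the outcome, yet the crude (unstratified) comparison suggests a harmful one, because exposure and smoking travel together:

```python
# Hypothetical 2x2x2 table: (exposed, smoker) -> (cases, total).
# All numbers are invented for illustration.
table = {
    (1, 1): (24, 80),  # exposed smokers:      risk 0.30
    (1, 0): (2, 20),   # exposed nonsmokers:   risk 0.10
    (0, 1): (6, 20),   # unexposed smokers:    risk 0.30
    (0, 0): (8, 80),   # unexposed nonsmokers: risk 0.10
}

def risk(exposed, smoker=None):
    """Outcome risk for an exposure group, optionally within one stratum."""
    cells = [(c, n) for (e, s), (c, n) in table.items()
             if e == exposed and (smoker is None or s == smoker)]
    cases = sum(c for c, _ in cells)
    total = sum(n for _, n in cells)
    return cases / total

crude_rr = risk(1) / risk(0)          # ~1.86: spurious "harm"
rr_smokers = risk(1, 1) / risk(0, 1)  # 1.0: no effect within stratum
rr_nonsmokers = risk(1, 0) / risk(0, 0)  # 1.0: no effect within stratum
print(crude_rr, rr_smokers, rr_nonsmokers)
```

The trick only works, of course, because smoking status was measured in the first place; an unmeasured confounder cannot be stratified or adjusted away.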

As a general rule, studies with access to more data and more detailed patient information can more thoroughly adjust for baseline differences. Also, studies that directly measure patient variables, rather than relying on self-report, probably have more accurate information to work with. When evaluating how much credence to lend a new study, how effectively the baseline differences between groups were addressed is a major factor to bear in mind.

Groundbreaking Versus Outlier

One final point to keep in mind when evaluating a study is whether the result is an isolated observation or has been seen in other patient populations as well. A new study that contradicts the nine previous studies on the topic is not groundbreaking; it is an outlier. But even within the same manuscript, a few elements can help the reader decide whether the result reflects a truly causal association rather than a random one.

Results replicated in multiple cohorts carry much greater weight than results from a single cohort of patients, especially if these cohorts reflect different populations. In the field of genetics, given that researchers often test thousands of candidate genes, the potential for false-positives is high, and it has become standard to require that all positive results be replicated in an independent cohort. Unfortunately, in most other branches of epidemiology, replication is not yet standard, and a surprising number of research findings, even those from randomized trials, are not subsequently replicated.[22]
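The multiple-testing arithmetic behind that genetics convention is easy to simulate. The sketch below (a toy simulation, not any study's actual data) tests 10,000 candidate "genes" with no true associations: roughly 5% cross the conventional significance threshold by chance alone, and requiring replication in an independent cohort screens out about 95% of those false-positives.

```python
import random

random.seed(42)

N_GENES = 10_000   # candidate genes, none truly associated
ALPHA_Z = 1.96     # two-sided 5% threshold on a z statistic

# Each "study" yields a z-score per gene; under the null hypothesis
# these are standard normal draws.
discovery = [random.gauss(0, 1) for _ in range(N_GENES)]
replication = [random.gauss(0, 1) for _ in range(N_GENES)]

hits = [i for i, z in enumerate(discovery) if abs(z) > ALPHA_Z]
replicated = [i for i in hits if abs(replication[i]) > ALPHA_Z]

print(len(hits))        # roughly 5% of 10,000: hundreds of false-positives
print(len(replicated))  # roughly 5% of those survive replication
```

Replication does not eliminate false-positives entirely, but each independent cohort multiplies the odds against a chance finding sneaking through.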

Absent replication, it is informative to look at the results in each cohort when multiple cohorts are used. For example, in the study linking potatoes to hypertension,[19] researchers used data from three different cohorts: the Nurses' Health Study, the Nurses' Health Study II, and the Health Professionals Follow-up Study. Although the association was significant overall, it was not consistently positive across all three cohorts. Further complicating the issue is that the results from two Swedish cohorts found no association between potato consumption and cardiovascular disease.[23] To be fair, sometimes subtle effects might not be seen in smaller cohorts and only become manifest when results are meta-analyzed together. However, a result seen consistently in several different patient populations should carry more weight than a result that has not been replicated.


In today's world, clinicians and scientists face a deluge of scientific publications, many of which will not stand the test of time. There are, however, a few key indices that can differentiate poor studies from better-quality ones.

The problem with food questionnaires means that much of the body of evidence surrounding food research is questionable. Statistical adjustment can help attenuate some of the problems of confounding that occur in nonrandomized studies, but only if researchers have access to good-quality, individual-level data on the factors they want to adjust for. Replication can help decrease the problem of false-positives. Of course, that requires the collaboration and sharing of data between research groups.

Poor studies distract clinicians, confuse the public, and erode confidence in the scientific method. If we persist in performing mediocre research, we may eventually end up doing something insane, like taking coffee, which is either neutral or slightly protective in terms of cancer risk,[24] and declaring it a carcinogen.


