Andrew J. Vickers, PhD


December 08, 2009

Statistical testing is, at least in theory, pretty straightforward:

  • State your null hypothesis;

  • Design a study and get some data;

  • Apply a statistical test to the data to obtain a P value; and

  • If the P value is less than 5%, reject the null hypothesis; if the P value is .05 or more, don't reject the null hypothesis.

Statistics textbooks -- and statisticians for that matter -- focus almost exclusively on the second and third stages: It takes no trouble to look up information in a textbook on, eg, how to design a study, and statisticians are commonly asked to run statistical tests. However, thinking through how to interpret the results of a test (the fourth stage) or how to state your null hypothesis (the first stage) is actually much harder. Paradoxically, neither stage is much written about or discussed. I looked at how to interpret the results of statistical tests in 2 previous articles in this series (To P or Not to P and Michael Jordan Won't Accept the Null Hypothesis: Notes on Interpreting High P Values). Here, I'll discuss the null hypothesis.

We'll use surgical complications as an example. Let's say that an operation is widely known to have a 20% infection rate. It has also been shown consistently in the literature that about 1 in 4 infections are serious, leading to fever or hospitalization. Now assume that a surgeon conducts a study on antibiotic prophylaxis in 200 patients and reports the following results in the Table.

Table. The Effect of Antibiotic Prophylaxis on Infection Rates

                          No Infection   Any Infection   Minor   Serious
  Antibiotic (n = 100)         92               8           6        2
  Placebo (n = 100)            80              20          15        5

The results of the study are quite clear. The infection rate in controls is as expected (20%), and in both groups, again as expected, 1 in 4 infections is serious. Antibiotic prophylaxis reduces the infection rate by 60% (from 20% to 8%), and it has the same relative effect on minor and serious infections.

You only start running into problems when you apply statistical tests. By Fisher's exact test, the P value for the overall infection rate is .024. This would lead you to reject the null hypothesis that "the infection rate is the same for antibiotics and placebo," and you would consider using antibiotic prophylaxis. However, a skeptical surgeon might argue that minor infections are no big deal and, once they occur, can easily be treated with antibiotics. What is a problem, in this surgeon's view, is serious infection. Comparing serious infection rates by Fisher's exact test gives a P value of .4. The surgeon might, therefore, argue that the experiment has failed to reject the null hypothesis of no difference between groups for serious infections. This might lead to a claim that there was no justification for routine antibiotic prophylaxis.
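Both P values can be reproduced with a short script. The sketch below implements a two-sided Fisher's exact test for a 2 × 2 table using only the standard library; the helper name `fisher_exact_2x2` is mine, not from the article, and the two-sided rule used here (summing all tables with the same margins that are no more likely than the observed one) is the common convention, though software packages differ slightly in the details.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    row and column totals) that are no more likely than the observed one.
    """
    n = a + b + c + d
    row1 = a + b          # size of the first group (e.g., antibiotic arm)
    col1 = a + c          # total number of events (e.g., infections)
    denom = comb(n, row1)

    def prob(k):
        # P(top-left cell = k) with all margins held fixed
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Small tolerance so floating-point ties count as "equally likely"
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Any infection: 8/100 on antibiotics vs 20/100 on placebo
p_overall = fisher_exact_2x2(8, 92, 20, 80)   # the article reports P = .024

# Serious infection only: 2/100 vs 5/100
p_serious = fisher_exact_2x2(2, 98, 5, 95)    # the article reports P = .4
```

Note that the same function answers both questions; only the choice of which 2 × 2 table to feed it — that is, which null hypothesis to test — changes.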

There are actually 2 possible null hypotheses for the analysis of serious infection. The first, the one tested by the surgeon, is that no difference exists between antibiotic and placebo for serious infections. A quite different null hypothesis is that no difference exists between the drug's effect on minor compared with serious infection. It turns out that both P values are nonsignificant, such that we fail to reject either of 2 incompatible hypotheses (which is somewhat of a problem).
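One reasonable way to test this second null hypothesis — a sketch of my own, not necessarily the analysis the author had in mind — is to restrict attention to the 28 patients who became infected and ask whether the proportion of serious infections differs by group (2 of 8 on antibiotics vs 5 of 20 on placebo); a formal interaction test would be another option.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, row1)

    def prob(k):
        # P(top-left cell = k) with all margins held fixed
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Among infected patients only: serious vs minor, antibiotic vs placebo
# Antibiotic: 2 serious, 6 minor; placebo: 5 serious, 15 minor
p_second = fisher_exact_2x2(2, 6, 5, 15)
# 2/8 and 5/20 are both exactly 25% serious, so the data match this
# null hypothesis as closely as possible and the P value is 1.0
```

Because the observed table is the single most likely one given its margins, every table is counted in the two-sided sum and the P value comes out at 1.0 — about as far from rejecting this null hypothesis as data can get.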

I take the view that treatments don't often work differently for different patients, or for variations on the same endpoint. Indeed, I'd need a pretty good reason to even consider the possibility that an antibiotic would prevent a minor infection, but not a more serious one. As such, I'd test the second of these 2 null hypotheses, find that there was no reason to believe that antibiotics worked differently for minor compared with serious infection, and conclude that giving antibiotics would indeed reduce serious infection rates. However, that's just one statistician's opinion. The key point is that often a choice exists among several incompatible null hypotheses, and selection of a null hypothesis should be a conscious decision, not a side effect of whatever test happens to be easiest.

If you liked this article, you'll love Andrew Vickers' collection of stories on statistics: "What is a p-value anyway?"