Michael Jordan Won't Accept the Null Hypothesis: Notes on Interpreting High P Values

Andrew J. Vickers, PhD


May 15, 2006

Finally, after many hours of packing and loading, the bags are in the car, the children in their booster seats and the snack bag in easy reach; everyone is buckled in and my hand is on the ignition key. At which point my wife asks: "Where is the camera?" Being a statistician, I instantly convert this question into 2 hypotheses: "The camera is in the car" and "The camera is still in the house." Given that it is easier to pop back inside the house than to unload the car, I decide to test the second hypothesis. A few minutes later, I tell my wife that I have looked inside the house in all the places where we normally keep the camera and couldn't find it. We conclude that "it must be in the car somewhere" and head off on our road trip.

There is something a little odd about this story: we concluded one thing (that the camera was in the car) because we couldn't prove something else (the camera was in the house). But as it happens, this is exactly what we do in statistics. First, we establish what is known as the "null hypothesis": roughly speaking, we establish that nothing interesting is going on. We then run our analyses, obtain our P value and, if P is less than .05 (statistical significance), we reject the null and conclude that we have an interesting phenomenon on our hands. Drug trials are a simple example: here, our null hypothesis is that drug and placebo are equivalent, so if P is less than .05, we say that the drug and placebo differ.

What we do if P is greater than .05 is a little more complicated. The other day I shot baskets with Michael Jordan (remember that I am a statistician and never make things up). He shot 7 straight free throws; I hit 3 and missed 4 and then (being a statistician) rushed to the sideline, grabbed my laptop and calculated a P value of .07 by Fisher's exact test. Now, you wouldn't take this P value to suggest that there is no difference between my basketball skills and those of Michael Jordan; you'd say that our experiment hadn't proved a difference.
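For readers who want to check the arithmetic, the P value comes from a 2 × 2 table (made vs. missed for each shooter). A minimal sketch of a two-sided Fisher's exact test, written from scratch with the standard library rather than a statistics package, reproduces the .07:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that row 1 contains x of the col1 "successes"
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)   # smallest feasible count in cell a
    hi = min(row1, col1)       # largest feasible count in cell a
    # Small tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Jordan: 7 of 7 free throws; the author: 3 of 7
p_value = fisher_exact_two_sided(7, 0, 3, 4)
print(round(p_value, 2))  # → 0.07
```

The exact two-sided value is about .0699, which rounds to the .07 quoted in the text.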

Yet, a good number of researchers, physicians, and commentators come to exactly the opposite conclusion when interpreting the results of medical research. Take the recent Women's Health Initiative trial evaluating a low-fat diet for breast cancer. This study, published in the February 8, 2006 issue of JAMA, reported a ~10% reduction in breast cancer risk in women eating a diet low in fat compared with controls.[1] If this is indeed the true difference, low-fat diets could prevent breast cancer in many tens of thousands of women each year, an astonishing health benefit for an inexpensive and nontoxic intervention. The P value for the difference in cancer rates was .07, and here is the key point: this was widely interpreted as indicating that low-fat diets don't work. For example, The New York Times editorial page trumpeted that "low fat diets flub a test" and claimed that the study provided "strong evidence that the war against all fats was mostly in vain."[2] However, failure to prove that a treatment is effective is not the same as proving it ineffective. This is what statisticians call "accepting the null hypothesis" and, unless you accept that a British-born statistician got game with Michael Jordan, it is something you'll want to avoid.
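The point that "not proven effective" differs from "proven ineffective" can be made concrete with a simulation. The sketch below uses purely illustrative numbers (a 4% baseline risk, a true 10% relative reduction, 2,000 women per arm; these are assumptions for demonstration, not the trial's actual figures) and a simple two-proportion z-test. Even though the effect is real by construction, most simulated trials fail to reach P < .05:

```python
import random
from math import sqrt, erf

def two_sided_p(x1, n1, x2, n2):
    """Two-proportion z-test P value (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = abs(p1 - p2) / se
    # Two-sided tail probability of a standard normal
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

random.seed(1)
n = 2000               # hypothetical women per arm
rate_control = 0.040   # illustrative baseline breast cancer risk
rate_diet = 0.036      # a genuine 10% relative reduction

trials = 1000
nonsignificant = 0
for _ in range(trials):
    cases_diet = sum(random.random() < rate_diet for _ in range(n))
    cases_control = sum(random.random() < rate_control for _ in range(n))
    if two_sided_p(cases_diet, n, cases_control, n) >= 0.05:
        nonsignificant += 1

print(f"{nonsignificant / trials:.0%} of simulated trials miss P < .05")
```

Under these assumptions roughly four trials in five come out "non-significant" despite a real benefit, which is exactly why a P value of .07 should not be read as "the diet doesn't work."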

My own view on the low-fat diet trial is that the results were very encouraging, but weren't quite good enough to prove a difference. A prediction: As years go by and more data come in, the difference between groups in the breast cancer trial will reach statistical significance. At this point, there will be a general outcry of "wait a minute, they just said it didn't work" with a consequent increase in cynicism about medical research and nihilism about diet. Which is to say, The New York Times should stick to hoops.