Shoot First and Ask Questions Later: How to Approach Statistics Like a Real Clinician

Andrew J. Vickers, PhD


July 26, 2006

It was just before an early morning meeting, and I was really trying to get to the bagels, but I couldn't help overhearing a conversation between one of my statistical colleagues and a surgeon.

Statistician: "Oh, so you have already calculated the P value?"

Surgeon: "Yes, I used multinomial logistic regression."

Statistician: "Really? How did you come up with that?"

Surgeon: "Well, I tried each analysis on the SPSS drop-down menus, and that was the one that gave the smallest P value."

That comment deserves top marks for honesty -- which is more than I can say for many of the presentations I saw at a recent conference. In a typical study, a clinical investigator loaded data on a group of cancer patients into SPSS (a basic software package used for statistical analysis) and ran what is known as a multivariable Cox proportional hazards model. The investigator then read down a list of variables and concluded that each either did (P < .05) or did not (P ≥ .05) predict survival. Calculating a Cox model involves some very complicated mathematics and is impractical without a computer. But computers are exactly the problem.

In my favorite picture of R.A. Fisher, one of the founders of modern statistics, he is seated at a desk operating a mechanical counting device. Conducting a complex statistical analysis on such a machine is extremely time-consuming: Anyone who, like Fisher, had to depend on mechanical calculators would have had to think extremely hard about the analysis he or she wanted to conduct before starting.

With modern computing, it is possible to conduct an analysis with a minimum of time or brain power -- you just select something from a drop-down menu. The inevitable result of this ease of calculation is a proliferation of analyses that have not been sufficiently thought through.

Here is a simple example: In one of the studies presented at the conference, a surgeon had created a multivariable regression model to predict tumor recurrence in terms of stage, grade, a tumor biomarker, and obesity. The P value for obesity was less than .05, and the presenter concluded that "obesity may have some effect on survival." This isn't particularly illuminating.

I would have approached the problem as follows. The biomarker is measured in ng/mL, that is, a weight divided by a volume. Two patients with similar tumors, 1 obese and 1 nonobese, will have a similar weight of the biomarker, but it seems likely that the biomarker is distributed in a greater volume of body tissue in the obese patient. Accordingly, the obese patient's biomarker value in ng/mL will be lower. So if our 2 patients have similar levels of the biomarker, it is reasonable to suppose that the obese patient has a larger tumor.
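The dilution argument is simple arithmetic, and a short sketch makes it concrete. Every number below is a made-up assumption chosen for easy division, not a clinical value, and the function and variable names are hypothetical.

```python
# Illustration of the "dilution" argument: concentration = amount / volume.
# All numbers are hypothetical and chosen only to make the arithmetic easy.


def concentration_ng_per_ml(biomarker_ng: float, volume_ml: float) -> float:
    """Return the biomarker concentration for a given amount and volume."""
    return biomarker_ng / volume_ml


biomarker_ng = 20_000.0        # assumed: same tumor, same amount of biomarker
volume_nonobese_ml = 4_000.0   # assumed distribution volume, nonobese patient
volume_obese_ml = 5_000.0      # assumed larger distribution volume, obese patient

conc_nonobese = concentration_ng_per_ml(biomarker_ng, volume_nonobese_ml)
conc_obese = concentration_ng_per_ml(biomarker_ng, volume_obese_ml)

print(conc_nonobese)  # 5.0 ng/mL
print(conc_obese)     # 4.0 ng/mL: same amount, lower reading

# Turn it around: for the obese patient to show the SAME reading as the
# nonobese patient, the tumor would have to release more biomarker.
amount_needed_ng = conc_nonobese * volume_obese_ml
print(amount_needed_ng)  # 25000.0 ng, more than the 20000.0 ng assumed above
```

Under these assumptions, equal readings in ng/mL imply a larger amount of biomarker, and plausibly a larger tumor, in the obese patient.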

Now, this is pretty much what the multivariable regression model asks: If 2 patients are identical in all other respects, what is the impact of obesity on outcome? To test whether the apparent increased risk of obesity merely results from "dilution" of the biomarker in a larger volume of body tissue, I would add what is known as an "interaction term," here the product of obesity and the biomarker level entered as an additional variable in the model.

If the P value for obesity was significant in a model including the interaction term, we could conclude that there is something about the biology or behavior of obese individuals that increases the risk of recurrence; if the interaction term was significant, we would conclude that the implications of having a certain level of the biomarker differ between obese and nonobese patients.
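To see what an interaction term does to a model, here is a minimal sketch of the linear predictor only, with no fitting and no P values. The coefficients are invented for illustration and do not come from any data. Stage and grade are left out for brevity, and obesity is coded as 0 or 1, which is an assumption about how the variable would be entered.

```python
# Linear predictor (log odds or log hazard of recurrence) with an
# obesity-by-biomarker interaction term. Coefficients are hypothetical.

B_BIOMARKER = 0.10    # change per ng/mL of biomarker in nonobese patients
B_OBESE = 0.30        # obese-vs-nonobese difference at a biomarker value of 0
B_INTERACTION = 0.05  # additional change per ng/mL in obese patients


def linear_predictor(biomarker: float, obese: int) -> float:
    """Main effects plus the product term biomarker * obese."""
    return (B_BIOMARKER * biomarker
            + B_OBESE * obese
            + B_INTERACTION * biomarker * obese)


def biomarker_slope(obese: int) -> float:
    """Change in the predictor per ng/mL, which now depends on obesity."""
    return B_BIOMARKER + B_INTERACTION * obese


def obesity_gap(biomarker: float) -> float:
    """Obese minus nonobese predictor at the same biomarker level."""
    return linear_predictor(biomarker, 1) - linear_predictor(biomarker, 0)


slope_nonobese = biomarker_slope(0)
slope_obese = biomarker_slope(1)
gap_low = obesity_gap(2.0)    # gap at a low biomarker level
gap_high = obesity_gap(10.0)  # gap at a high biomarker level

print(f"slope per ng/mL: nonobese {slope_nonobese:.2f}, obese {slope_obese:.2f}")
print(f"obesity gap at 2 ng/mL: {gap_low:.2f}, at 10 ng/mL: {gap_high:.2f}")
```

With a nonzero interaction coefficient, a given biomarker level means something different in obese and nonobese patients, and the gap between the 2 groups changes with the biomarker level. With the interaction coefficient set to 0, the gap is the same at every biomarker level.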

My approach here was to think about biology, turn it into math, and then think how to apply the results of the math back to biology again. It is, of course, easier just to shove everything into SPSS and interpret the resulting P values as "yes" and "no." And if this is how you want to approach statistics, you'll have plenty of company. But, please, keep it to yourself, and don't block the bagels.

