With the change of President, it seems that we no longer live in a country where torture is tolerated... outside of statistics departments, that is. Medical researchers routinely ask statisticians to torture a data set until it confesses.
Here is an example. I was approached by a researcher who believed that high levels of a protein increased the chance that a breast cancer patient would progress to metastatic disease. I had my team analyze the data, and we found that, if anything, the protein was associated with better outcome. The researcher criticized our results, saying that we had included the wrong years. Now the obvious response would have been, "Hey, you gave us the data set," but the researcher was a big shot, and I am not uninterested in the prospect of promotion, so I smiled and said fine, and we ran the analyses again, excluding some early patients. I think the hazard ratio changed from something like 0.41 to 0.42.
That was nearly a year ago. I heard just last week that the researcher had finally submitted the paper for publication. Here are some of the analyses we were asked to run in the meantime:
1. Change the endpoint from metastases to cancer-specific death;
2. Report the main results in terms of 5-year survival rather than 10-year survival;
3. Adjust the results using risk groups (high, intermediate, low) rather than individual risk factors (stage, tumor size, and nodes);
4. Run the analyses separately within each risk group, reporting 3 separate sets of results for high-, intermediate-, and low-risk patients;
5. As (4), but for cancer-specific death rather than metastasis;
6. Run the analyses separately within each risk group, for both cancer-specific death and metastasis, but without adjustment for other risk factors;
7. Take out hormone receptor-negative patients;
8. Report a table with the hazard ratio for stage, tumor size, and nodes;
9. Analyze for differences in adjuvant therapy; and
10. Probably some other stuff that I have forgotten, but by now I am so depressed living through it again that I don't even want to look at our 20-page project file detailing every new analysis that was requested.
The problem with all this is that it is just plain bad science. (Ok, it is also annoying and a waste of my time, but let's call that secondary for now.) A general rule in science is: the more questions you ask, the more likely you are to get a silly answer to at least one of them. If I flip coins every day, and I look at my results over the course of a year, I probably won't find that I throw more heads than expected by chance. If, on the other hand, I start analyzing my results by time of day, date, and weather, it wouldn't be surprising if I found that, say, I threw more heads than expected (P = .002) on wet Wednesday mornings in October. Richard Peto, the famous epidemiologist, made a similar point when he published an analysis showing that patients born under Libra or Gemini don't benefit from aspirin after a heart attack.
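If you want to watch this happen, here is a little simulation sketch. It is my own illustration, not anything from the coin example itself: the 20-flips-a-day figure and the weekday/month/weather subgroups are invented for the demonstration. One pre-specified test of the whole year comes out unremarkable; run 168 subgroup tests and a few "significant" P values usually fall out by chance alone.

```python
import random
from itertools import product
from scipy.stats import binomtest  # exact binomial test of P(heads) = 0.5

random.seed(0)
N_PER_DAY = 20  # invented for illustration: 20 flips a day for a year

days = []
for day in range(365):
    weekday = day % 7
    month = day * 12 // 365
    weather = random.choice(["wet", "dry"])  # weather is a coin flip, too
    heads = sum(random.random() < 0.5 for _ in range(N_PER_DAY))
    days.append((weekday, month, weather, heads))

# One pre-specified test on the full year: almost always unremarkable.
total = sum(h for *_, h in days)
print("whole year: p =", round(binomtest(total, 365 * N_PER_DAY, 0.5).pvalue, 3))

# Now torture the data: 7 weekdays x 12 months x 2 weathers = 168 tests.
for wd, mo, wx in product(range(7), range(12), ["wet", "dry"]):
    flips = [h for w, m, x, h in days if (w, m, x) == (wd, mo, wx)]
    if not flips:
        continue
    p = binomtest(sum(flips), len(flips) * N_PER_DAY, 0.5).pvalue
    if p < 0.01:  # expect roughly 168 * 0.01, or about 1.7, false alarms
        print(f"weekday {wd}, month {mo}, {wx}: p = {p:.3f}")
```

At P < .01, a fair coin is expected to produce about 1.7 false alarms across 168 tests, which is the whole trick behind wet Wednesday mornings in October.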
A more practical "general rule of science" is: the more analyses you ask for, the more back and forth you have with the statistician, and the more likely an error is to slip in. I sneaked an example of this into my article on the sample size samba. The investigators kept asking the statistician for more and more sample size calculations, and there was eventually a typo: a trial with 90% power to detect a difference between group means of 0.5, with a standard deviation of 2, requires a total sample size of 674, not 774.
Experts on torture have pointed out that the information you get from inflicting pain doesn't tend to be particularly reliable. This is a message that medical researchers need to take on board as well. If you disagree, then drop me a note, and I'll send you some astrology charts to use when treating patients after heart attack.
Medscape Business of Medicine © 2009 Medscape
Cite this: Waterboarding and Wilcoxon: What Medical Researchers Might Learn About Statistics From the CIA - Medscape - Feb 18, 2009.