A New Look at an Old Study: How Do We Stop Data Spinning?

Nassir Ghaemi, MD, MPH


October 20, 2015


This article has been updated from an earlier version

In 2001, a pharmaceutical industry-conducted trial[1] published in the most prestigious American journal of child psychiatry reported that paroxetine was more effective than placebo in treating major depression in 275 adolescents. The study was conducted by the drug manufacturer in typical fashion: dozens of clinical sites were used throughout the country to recruit patients, the data were analyzed in-house by statistical employees of the company, and academic leaders in the field reviewed and revised the paper and became its authors, the first being the chairman of the department of psychiatry at Brown University. The study had been conducted in order to obtain US Food and Drug Administration (FDA) approval for extension of the indication of paroxetine for major depressive disorder from adults to adolescents.

In the 14 years since its publication, the paper has been cited 576 times, making it highly cited, which is not unusual for a large, randomized clinical trial. Because of the large sample and the use of the most valid research design (randomization), such studies tend to be published in the most widely read scientific journals, which leads to frequent citation and, in turn, greater impact on clinical practice.

Now, 14 years later, in the context of a class-action lawsuit, the same database used for that publication has been reanalyzed by independent researchers and published in the British Medical Journal (BMJ).[2] The reanalysis reports the same results but reaches the opposite interpretation: paroxetine was found to be ineffective.

Why the difference?

Keep in mind that the data were the same and the results were the same. The distinction was a matter of interpretation, or how the results were presented. The original paper was written by the pharmaceutical company. All companies are designed to make profits. Naturally, they presented their results in as positive a light as possible.

The reanalysis brings to light, for the first time in detail, exactly how the nuances of positive interpretation were used to present an overall impression that was misleading.

Here is how the original 2001 paper was presented in its abstract:

"RESULTS: Paroxetine demonstrated significantly greater improvement compared with placebo in HAM-D total score < or = 8, HAM-D depressed mood item, K-SADS-L depressed mood item, and CGI score of 1 or 2. The response to imipramine was not significantly different from placebo for any measure. Neither paroxetine nor imipramine differed significantly from placebo on parent- or self-rating measures. Withdrawal rates for adverse effects were 9.7% and 6.9% for paroxetine and placebo, respectively. Of 31.5% of subjects stopping imipramine therapy because of adverse effects, nearly one third did so because of adverse cardiovascular effects.

"CONCLUSIONS: Paroxetine is generally well tolerated and effective for major depression in adolescents."

Notice that the results section begins with a positive result: Paroxetine demonstrated improvement on four outcomes (HAM-D total score ≤8, HAM-D depressed mood item, K-SADS-L depressed mood item, and CGI score of 1 or 2). What it does not say is that the primary statistical analysis originally planned for the study did not involve those four items. It was an overall analysis (ANOVA) comparing paroxetine versus placebo across all time points of the 8-week trial. A small benefit was seen (less than 2 points overall on the depression rating scale), and it was statistically nonsignificant. The four items described above were "post-hoc" analyses, done after the fact, as part of probably dozens of comparisons of outcomes.

It is statistically common for a few "positive" outcomes to emerge by chance out of dozens of analyses. That is the whole meaning of "statistical significance," which is defined as a P value of .05 or less: if 20 independent outcomes are analyzed under the null hypothesis, on average one will appear significant by chance alone (20 × .05 = 1). In the original paper, the authors did not report that these results were post hoc, nor did they report the denominator of how many post-hoc analyses were conducted to produce these few positive outcomes. Nor did they report that the major a priori analysis of the main study outcome was not statistically significant. And in the conclusion of the abstract, there is no intimation that the medication was anything but effective.
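The arithmetic behind this point can be sketched in a few lines of Python. This is a hypothetical illustration of the multiple-comparisons problem in general, not a reanalysis of the study's data: it shows that if 20 outcomes are each tested at the .05 level when no true effect exists, a "significant" finding is more likely than not to appear by chance.

```python
# Illustrative sketch of the multiple-comparisons problem (not the study's data).
# With 20 independent null tests at alpha = .05, expect about one false positive,
# and the chance of at least one false positive is roughly 64%.
import random

alpha = 0.05
n_tests = 20

# Expected number of chance "significant" results among 20 null tests
expected_false_positives = n_tests * alpha  # 20 * 0.05 = 1.0

# Probability that at least one test is "significant" by chance
p_at_least_one = 1 - (1 - alpha) ** n_tests  # 1 - 0.95**20 ≈ 0.64

# Monte Carlo check: under the null hypothesis, p-values are uniform on [0, 1]
random.seed(0)
trials = 100_000
hits = sum(
    any(random.random() < alpha for _ in range(n_tests))
    for _ in range(trials)
)

print(expected_false_positives)   # 1.0
print(round(p_at_least_one, 2))   # 0.64
print(hits / trials)              # typically close to 0.64
```

This is why the denominator matters: without knowing how many comparisons were run, a handful of "positive" post-hoc results cannot be distinguished from chance.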

