Veracity of Clinical-Trial Publications Comes Under Fire

Patrice Wendling

June 08, 2017

CHICAGO, IL — New research by a UK anesthesiologist with a track record of exposing fraudulent research suggests baseline data distributions reported in dozens of clinical-trial publications are improbable[1].

The data in question are from such well-known trials as PREDIMED, SPARCL, and AASK and were published in the New England Journal of Medicine (NEJM), Journal of the American Medical Association (JAMA), and six anesthesia journals.

"This screening tool raises question about data in some studies, which on full investigation may turn out to involve misinterpretation, statistical error, or plain simple mistakes. However, on the basis of previous studies it is likely that some of the data highlighted in this latest research may have been deliberately falsified. At a minimum, it is clear that the reporting of some randomized, controlled trials may be seriously flawed," author Dr John Carlisle (Torbay Hospital, Torquay, UK) said in a statement[2] released by his own journal Anaesthesia, which published Carlisle's research June 5, 2017 and is among the journals implicated in the study.

JAMA editor in chief Dr Howard Bauchner told heartwire from Medscape by email, "We are aware of the article and the allegations. We receive numerous concerns about various aspects of the articles we publish."

He continued, "We will review the allegation, its validity, and then decide what to do next. However, we never assume that allegations are true, they need to be verified, and we believe authors have a right to respond to any allegation."

NEJM editor in chief Dr Jeffrey Drazen wrote to heartwire that "the analysis is complex and multifaceted, and we are still working on fully understanding it. Until then, it is premature to comment."

Carlisle has used the same statistical approach to detect aberrations in baseline data published by Drs Joachim Boldt and Scott Reuben and to help identify fabricated data in trials reported by Dr Yoshitaka Fujii[3], who tops the list for scientific misconduct with 183 retractions.

For the present study, Carlisle cast a wider net, analyzing the distribution of 72,261 means of 29,789 baseline variables among participants in 5087 trials published in the three aforementioned journals as well as Anesthesia and Analgesia, Anesthesiology, British Journal of Anaesthesia, Canadian Journal of Anesthesia, and European Journal of Anaesthesiology. The primary outcome was the distribution of P values, calculated for differences between means, for individual variables, and when combined within trials.

Carlisle's method has been previously detailed but essentially "identifies papers in which the baseline characteristics (eg, age, weight) exhibit either too narrow or too wide a distribution than expected by chance, resulting in an excess of P values close to either one or zero," Drs John Loadsman and Timothy J McCulloch (University of Sydney, Australia) explain in an accompanying editorial[4], provocatively titled: "Widening the search for suspect data—is the flood of retractions about to become a tsunami?"

Based on Carlisle's calculations, the distribution of the 72,261 baseline means "was largely consistent with random sampling," but each journal had an excess of trials with "baseline means that were similar (P value near 0) or dissimilar (P value near 1)." In all, 15.6% of trials had P values that were within 0.05 of 0 or 1. "That is 5.6% more than expected, or one in 18 trials," he noted.
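
To see why roughly 10% of trials would be expected to land in those two tails, and where the "one in 18" figure comes from, here is a minimal Python sketch. It is not Carlisle's method, which is considerably more involved. It assumes a simple two-sided z-test with a known standard deviation, and the arm size, means, and function names are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, mean

def two_sided_p(group_a, group_b, sd):
    """Two-sided z-test P value for a difference in means, assuming a known SD."""
    se = sd * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_extreme_fraction(n_trials=4000, n_per_arm=50, seed=1):
    """Fraction of simulated trials whose baseline P value is within 0.05 of 0 or 1."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_trials):
        # Both arms come from the same population, as honest randomization implies.
        a = [rng.gauss(60, 10) for _ in range(n_per_arm)]
        b = [rng.gauss(60, 10) for _ in range(n_per_arm)]
        p = two_sided_p(a, b, sd=10)
        if p < 0.05 or p > 0.95:
            extreme += 1
    return extreme / n_trials

simulated = simulate_extreme_fraction()
print(f"simulated fraction in the two tails: {simulated:.3f}")  # close to 0.10

# Arithmetic behind the quoted figures: 15.6% observed vs 10% expected.
observed, expected = 0.156, 0.10
excess = observed - expected
print(f"excess: {excess:.3f}, or about one trial in {1 / excess:.1f}")
```

Under random allocation the P values are spread evenly between 0 and 1, so about 5% fall in each tail. The excess of 5.6 percentage points works out to roughly one trial in 17.9, which rounds to the quoted one in 18.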

Retracted trials had a higher proportion of P values in the extreme 10% of the expected distribution than unretracted trials (43% vs 15%).

"The association of extreme distributions with trial retraction suggests that further investigation of uncorrected, unretracted trials and their authors will result in most trials being corrected and some retracted," Carlisle said.

While fabricated data could explain some of the anomalies, there are more mundane explanations such as correlated baseline variables, statistician Dr F Perry Wilson (Yale School of Medicine, New Haven, CT) said in an interview. Indeed, the paper points out that correlated variables might explain the P value of 9.6 × 10⁻⁴ calculated for the CARISA trial, in which six of the nine baseline variables from patients with severe chronic angina were derived from the same exercise test.
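
The effect of correlated variables can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the analysis in the paper: it invents six baseline variables that share one underlying measurement, combines their z-scores with Stouffer's method as if they were independent, and counts how often the combined P value falls below 0.01. All names, sizes, and the correlation strength are hypothetical.

```python
import math
import random
from statistics import NormalDist

N_VARS = 6       # hypothetical: six variables fed by one shared measurement
N_PER_ARM = 30   # hypothetical arm size

def simulate_arm(shared_weight, rng):
    """One arm of participants; each variable has unit variance by construction."""
    noise_weight = math.sqrt(1 - shared_weight ** 2)
    rows = []
    for _ in range(N_PER_ARM):
        shared = rng.gauss(0, 1)  # e.g. a single test that drives several variables
        rows.append([shared_weight * shared + noise_weight * rng.gauss(0, 1)
                     for _ in range(N_VARS)])
    return rows

def column_mean(rows, j):
    return sum(row[j] for row in rows) / len(rows)

def combined_p(shared_weight, rng):
    """Stouffer-combined two-sided P value that wrongly assumes independence."""
    a = simulate_arm(shared_weight, rng)
    b = simulate_arm(shared_weight, rng)
    se = math.sqrt(2 / N_PER_ARM)
    z_scores = [(column_mean(a, j) - column_mean(b, j)) / se for j in range(N_VARS)]
    z = sum(z_scores) / math.sqrt(N_VARS)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def fraction_below(threshold, shared_weight, n_trials=1000, seed=7):
    rng = random.Random(seed)
    hits = sum(combined_p(shared_weight, rng) < threshold for _ in range(n_trials))
    return hits / n_trials

independent = fraction_below(0.01, shared_weight=0.0)
correlated = fraction_below(0.01, shared_weight=0.9)
print(f"independent variables: {independent:.3f} of trials below P=0.01")
print(f"correlated variables:  {correlated:.3f} of trials below P=0.01")
```

With independent variables, about 1% of honest trials cross the threshold. With strongly correlated variables the share is many times higher, even though no data were fabricated.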

Some discrepancies could also be due to simple transcription errors or replacing standard deviations with standard errors or vice versa, Wilson said.

"Reporting something as a standard error when it's really a standard deviation makes your baseline variables look like they're crazy balanced, that these groups are too well-matched, and the opposite of that is true as well. So the simple reporting of something as a standard deviation instead of a standard error would explain a lot of these findings."

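Wilson's point about mislabeled standard deviations and standard errors can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: a two-arm trial with 100 participants per arm, a true standard deviation of 10, and a z-test on the published summary figures. The numbers and function name are hypothetical.

```python
import math
from statistics import NormalDist

def p_from_summary(mean_a, mean_b, sd, n_per_arm):
    """Two-sided z-test P value computed from published summary statistics."""
    se_diff = sd * math.sqrt(2 / n_per_arm)
    z = (mean_a - mean_b) / se_diff
    return 2 * (1 - NormalDist().cdf(abs(z)))

n = 100
mean_a, mean_b = 61.0, 60.0
true_sd = 10.0
true_se = true_sd / math.sqrt(n)  # standard error of each arm's mean

# Correctly labeled: an unremarkable P value.
p_correct = p_from_summary(mean_a, mean_b, true_sd, n)

# An SD printed under an "SE" label: a reader converts it back to an SD by
# multiplying by sqrt(n), so the spread looks 10 times too large.
p_sd_labelled_se = p_from_summary(mean_a, mean_b, true_sd * math.sqrt(n), n)

# An SE printed under an "SD" label: the spread looks 10 times too small.
p_se_labelled_sd = p_from_summary(mean_a, mean_b, true_se, n)

print(f"correct labels:      P = {p_correct:.2f}")
print(f"SD reported as SE:   P = {p_sd_labelled_se:.2f}  (groups look too well matched)")
print(f"SE reported as SD:   P = {p_se_labelled_sd:.1e}  (groups look wildly different)")
```

In this toy example a single labeling slip moves an ordinary P value to one near 1 or near 0, which is the kind of extreme value the screening tool flags.
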
Carlisle acknowledged that "some P values were so extreme that the baseline data could not be correct." For instance, for 43/5015 unretracted trials, the probability that random allocation would result in the baseline mean distributions was less than 1 in 10¹⁵, or "equivalent to one water drop in 20,000 Olympic-sized swimming pools."
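
The swimming-pool comparison is easy to sanity-check. The sketch below rests on two assumptions that are not in the paper: a pool of 50 m × 25 m × 2 m (2.5 million liters) and a drop volume of 0.05 mL (20 drops per milliliter).

```python
# Assumed figures, chosen for illustration only.
POOLS = 20_000
LITERS_PER_POOL = 2_500_000   # 50 m x 25 m x 2 m
ML_PER_LITER = 1_000
DROPS_PER_ML = 20             # one drop taken as 0.05 mL

total_drops = POOLS * LITERS_PER_POOL * ML_PER_LITER * DROPS_PER_ML
print(f"{total_drops:.0e} drops")  # one drop among these is odds of 1 in total_drops
```

Under those assumptions, 20,000 pools hold 10 to the power of 15 drops, matching the quoted odds.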

Notably, the analysis revealed no difference in the distributions of baseline means for the 1453 trials in NEJM and JAMA and the 3634 trials in the anesthesia journals (P=0.30).

The rate of retracted articles from the two elite journals (6/1453), however, was one-quarter the rate from the six anesthesia journals (66/3634) (relative risk 0.23; 99% CI 0.08–0.68).
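
The relative risk and its confidence interval can be reproduced from the four counts given. The following Python sketch uses the standard log-scale normal approximation for a risk ratio. Whether the paper used exactly this approach is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def relative_risk_ci(events_a, total_a, events_b, total_b, level=0.99):
    """Risk ratio of group A vs group B with a log-normal approximate CI."""
    rr = (events_a / total_a) / (events_b / total_b)
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for a 99% interval
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Retractions: 6 of 1453 trials (NEJM and JAMA) vs 66 of 3634 (anesthesia journals).
rr, lower, upper = relative_risk_ci(6, 1453, 66, 3634)
print(f"relative risk {rr:.2f} (99% CI {lower:.2f} to {upper:.2f})")
```

With these inputs the calculation lands on a relative risk of about 0.23 with an interval of roughly 0.08 to 0.68, in line with the reported figures.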

"It is unclear whether trials in the anesthesia journals have been more deserving of retraction or perhaps there is a deficit of retractions from JAMA and NEJM," Carlisle said.

The Carlisle screening tool is freely available online and "should be used by medical publications around the world," Anaesthesia editor in chief Dr Andrew Klein said in the journal's statement.

He added, "The editors, owners, and publishers of these journals have an ethical duty to do everything in their power to ensure accuracy in scientific publishing. Patients expect accuracy and safety in the research on which their care is based."

Wilson countered that "there are papers retracted all the time for people making up data and we need tools to recognize that, but this is a screening test, not a diagnostic test."

The editorialists took issue with the study's "arbitrary and nonspecific" threshold of P<0.01 and questioned whether prospective use of Carlisle's screening tool will prevent publication of fraudulent material in the longer term. They added, however, that "for the time being at least, we have the benefit of a new tool."

Loadsman and McCulloch also urged relevant stakeholders to consider the wider implications of the study, noting that if a prospective screening system were put in place, "there will need to be a debate about the value of and justification for the sharing of information so derived with other journals."

In a widely published editorial[5] earlier this week, however, the International Committee of Medical Journal Editors (ICMJE) backed off a 2016 proposal to mandate universal data sharing. Instead, as of January 2018, clinical-trial manuscripts submitted to ICMJE journals need only contain a "data-sharing statement."

Although these statements may factor into publication decisions, the ICMJE editors write, "Over the past year . . . we have learned that the challenges are substantial and the requisite mechanisms are not in place to mandate universal data sharing at this time."

Carlisle is an editor of Anaesthesia. Wilson reports no relevant financial relationships.  


