Cancer Research Faces a Reproducibility Problem, New Study Shows

Donavyn Coffey

December 15, 2021

The ability to replicate research findings remains a mainstay of the scientific process, allowing experts to assess and challenge an evolving evidence base.

But a growing body of research shows that "nonreplicability may be occurring at higher rates than recognized, potentially undermining credibility and self-correction," Tim Errington, PhD, director of research at the Center for Open Science, Charlottesville, Virginia, and colleagues write.

To understand the scope of the problem, Errington and colleagues at the Center for Open Science embarked on a project to investigate the reproducibility of preclinical cancer research.

And the findings were bleak. 

Overall, Errington and colleagues could repeat experiments from only 23 of the 50 studies initially assessed (46%), and only 46% of those replication efforts were deemed successful, that is, in line with the original findings.

Equally troubling: none of the original papers contained enough details to repeat the experiment. When Errington and colleagues asked the original authors for additional information or clarification, responses for 41% of experiments were "extremely or very helpful," but for 32% of experiments, the original authors either did not respond or their responses were "not at all helpful."  

"People are busy and we tried to give them time, but the flat-out, 'I won't help you' was disappointing," Errington, lead author of the study, told Medscape Medical News.

The study and its companion article, both published in eLife on December 7, are the culmination of 8 years of work. Starting in 2013, the researchers examined 50 major studies, which included 193 experiments evaluating preclinical cancer research. All studies were published in high-impact journals between 2010 and 2012.

Overall, the authors repeated 50 experiments from 23 papers, generating data about the replicability of 158 effects — 136 reported as positive results in the original work and 22 null effects. The researchers then used seven methods to assess replicability.

When focusing on one method — effect sizes — Errington and colleagues found that 79% of their replication efforts on the positive findings were also in the positive direction, though 92% of replication effects were smaller than the original.

However, when using between three and five methods to assess the experiments, only 40% of replications on positive results were considered successful (39/97); for null effects, that success rate was higher at 80% (12/15). Combining positive and null effects, the success rate was 46%.
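The headline figures above follow directly from the counts the authors report. A minimal sketch reproducing that arithmetic (the counts 39/97 and 12/15 are taken from the article; the rounding to whole percentages is ours):

```python
# Replication success counts reported in the eLife study,
# split by whether the original effect was positive or null.
positive_success, positive_total = 39, 97
null_success, null_total = 12, 15

positive_rate = positive_success / positive_total
null_rate = null_success / null_total
combined_rate = (positive_success + null_success) / (positive_total + null_total)

print(f"positive effects: {positive_rate:.0%}")  # ~40%
print(f"null effects:     {null_rate:.0%}")      # 80%
print(f"combined:         {combined_rate:.0%}")  # ~46%
```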

The authors cautioned that "a successful replication does not definitively confirm an original finding or its theoretical interpretation. Equally, a failure to replicate does not disconfirm a finding, but it does suggest that additional investigation is needed to establish its reliability."

However, Errington noted, researchers rarely share data or protocols with one another.

From many researchers "we got the equivalent of 'the dog ate my homework,'" he said. Researchers who did respond often couldn't help for numerous reasons: the server where they'd been storing the data was wiped clean; everyone who worked on the study was no longer at that university; or there was just so much data they didn't know where to look. 

"It's understandable but disheartening," Errington said. In the digital age, he said, researchers need a better way to store and share information; the unwillingness or inability to share data indicates that reproducibility isn't a priority.

Sharing data, methods, and reagents is "more of an issue than many of us would want it to be," said Nima Sharifi, MD, an oncologist at the Cleveland Clinic, Ohio, who leads a research lab focusing on prostate cancer and was not involved in the research. "Generally, we should all want the same thing: once the information is public, to make methods and reagents readily available to anyone who wants to replicate the finding or take it to the next level."

But even among the studies that were reproducible, Errington found no obvious commonality to indicate why they were easier to replicate. His qualitative assessment is that when an original finding was well described, had a large effect, and used a good sample size, it tended to hold up when repeated. Small effects in small samples with uncertain explanations tended to be shakier, he said.

This isn't the first time the Center for Open Science has identified a problem with reproducibility. In 2015, Errington and colleagues highlighted similar issues in psychology, though the two fields are hard to compare because the research is so different. Psychology tends to involve more complex datasets with many conditions, which lends itself to p-hacking — when researchers misuse data analyses to find statistically significant patterns when no real underlying effect exists.
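The mechanism behind p-hacking can be made concrete with a short simulation: the more conditions a researcher tests on data with no real effect, the more likely at least one comparison crosses the conventional p < 0.05 threshold by chance. A minimal sketch (the sample size of 30 per group and the normal approximation to the t-test are illustrative assumptions, not details from the study):

```python
import math
import random

random.seed(42)

def two_sample_p(a, b):
    """Approximate two-sided p-value for a difference in means,
    using a normal approximation (adequate for n ~ 30 per group)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))); two-sided tail probability:
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def any_significant(n_conditions, n=30, alpha=0.05):
    """Test n_conditions independent comparisons where NO real effect
    exists; return True if at least one comes out 'significant'."""
    for _ in range(n_conditions):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        if two_sample_p(a, b) < alpha:
            return True
    return False

trials = 2000
for k in (1, 5, 20):
    rate = sum(any_significant(k) for _ in range(trials)) / trials
    print(f"{k:2d} conditions -> chance of a 'significant' finding: {rate:.2f}")
```

With a single condition the false-positive rate stays near 5%, but with 20 conditions a spurious "significant" result turns up most of the time, which is why complex, many-condition datasets are more vulnerable.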

Problems like p-hacking are less prevalent in preclinical cancer research where the experimental design is simpler. Selective reporting and lack of detail are more often the problems in these studies, Errington explained. 

For example, let's say a cell line experiment was performed 20 times but only worked once. That one success may ultimately be what's published.
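The odds of that lone "success" arising by chance alone are easy to work out: if each run carries a false-positive rate under a true null, repeating the experiment many times makes at least one apparent hit likely. A minimal sketch (the 5% false-positive rate and the 20 runs are illustrative assumptions matching the hypothetical above, not figures from the study):

```python
# Illustrative assumption: each run has a 5% chance of a false positive
# under a true null (the conventional p < 0.05 threshold).
alpha = 0.05
runs = 20

# P(at least one false positive) = 1 - P(no false positives in any run)
p_at_least_one = 1 - (1 - alpha) ** runs
print(f"Chance of >=1 'significant' run in {runs} tries: {p_at_least_one:.0%}")
```

Under these assumptions a "positive" run appears roughly two times out of three, even when the underlying effect is zero, which is why publishing only the one run that worked is so misleading.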

Overall, "it is unlikely that the challenges for replicability have a single cause or a single solution. Selective reporting, questionable research practices, and low-powered research all contribute to the unreliability of findings in a range of disciplines," the authors explain.

And, Sharifi noted, "it's not [always] a matter of what's published, but what's not included in the paper."

Publication bias favors positive findings over negative ones, which means researchers may be less likely to include data that don't support their main thesis.

And the demand for innovative, new findings and clean stories may incentivize researchers to leave out the messier details, Errington added.

Although Errington and colleagues focused on replicating preclinical data, these early results matter: the project helps expose inefficiencies in the clinical trial pipeline.

"The success of phase 2 clinical trials in oncology isn't great," in part because some of the preclinical findings won't translate to the clinic, Errington said. The bottom line, he stressed, is that we may be rushing to clinical trials too early when we could be failing faster and cheaper in the preclinical space.

But there's a way forward.

Errington envisions a more rigorous evaluation or trial approach at the preclinical stage to make sure we are getting things right earlier before investing in clinical studies.

The authors hope that shining a light on the replication problem will "spur one of science's greatest strengths, self-correction." After all, "science pursuing and exposing its own flaws is just science being science."

eLife. Published online December 7, 2021. Research article, Feature article

