How Did This Pass Peer Review? Thoughts on the Lancet and NEJM COVID-19 Retractions

Andrew D. Althouse, PhD


June 15, 2020

On June 4, two of our most prestigious medical journals (The New England Journal of Medicine and The Lancet) retracted two papers based on "data" from a previously little-known healthcare analytics company called Surgisphere. The short version of the story is that this tiny company claimed to house a system of fully integrated data from the electronic health records of at least 671 hospitals across six continents. However, it seems unlikely that the data exist as advertised.

In the aftermath, some may wonder why peer review did not identify the articles as questionable, given that peer review is the main quality-control system for the integrity of the medical literature. How did the reviewers miss what, in retrospect, seems obvious?

Flawed vs Fraud

First, we must distinguish research that is merely flawed from research that is fraudulent. Research may be "flawed" when a study was performed and real data exist but the researchers applied inappropriate statistical techniques or made concluding statements that are not fully supported by the study data. We should acknowledge that all research is flawed to some extent. There is no such thing as a perfect study, but as long as the data are presented honestly, scientists may discuss strengths, weaknesses, and proper interpretation. In contrast, outright fraudulent research presents data that are doctored in some way or perhaps fabricated entirely.

Fraudulent papers may have clues in the form of impossible or contradictory statistics: an average that cannot exist, a confidence interval that's impossibly narrow, or a P value that doesn't match the rest of the data. Fabricating summary statistics without any underlying data is fiendishly difficult to do without leaving a few such clues. Some scientists have made a hobby of creating techniques and tools to identify potentially suspicious data (see GRIM, SPRITE, RIVETS). Many were inspired by their investigation into a series of suspicious papers from a high-profile nutrition researcher. But these tools are most efficiently deployed on papers that already have aroused suspicion; it would be cumbersome to apply them to every paper as part of the routine peer-review process in its current form.
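As a rough illustration of the kind of check these tools perform, here is a minimal Python sketch of a GRIM-style test (granularity-related inconsistency of means). It assumes the underlying values are whole numbers, such as counts or single-item scale responses, and asks whether any whole-number total divided by the sample size could round to the reported mean. The function name and the example figures are hypothetical, and this is a simplified sketch rather than the published tools' implementation.

```python
from fractions import Fraction


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if n whole-number values could average to reported_mean.

    reported_mean is passed as a string so the number of reported decimal
    places is preserved (for example "5.19" versus "5.190").
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")

    # Number of decimal places the paper reported.
    decimals = len(reported_mean.partition(".")[2])
    target = Fraction(reported_mean)

    # Any true mean within half a unit of the last reported digit would be
    # rounded to the reported value (ties are accepted either way).
    half_unit = Fraction(1, 2 * 10**decimals)

    # Only whole-number totals near target * n can possibly match.
    nearest_total = round(target * n)
    candidates = (nearest_total - 1, nearest_total, nearest_total + 1)
    return any(abs(Fraction(total, n) - target) <= half_unit
               for total in candidates)


# Hypothetical example: 28 whole-number responses.
print(grim_consistent("5.18", 28))  # 145 / 28 = 5.1786, so 5.18 is possible
print(grim_consistent("5.19", 28))  # no whole-number total gives 5.19
```

A failed check does not prove fraud; a typo, a misreported sample size, or non-integer data can all produce the same result. It only marks a number that deserves a closer look.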

Furthermore, the two retracted Surgisphere papers did not seem to include any such impossibilities (or at least my admittedly incomplete efforts failed to find a striking example). Clever fraudsters can create data that would pass the basic checks named above if they first create a realistic-looking database by simulation and then perform "statistical analysis" on the simulated data, thereby assuring that all statistics in the paper appear internally consistent. If this is done, a reviewer focused solely on the manuscript could easily believe that the results came from a real database.
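To see why simulated data defeat this kind of check, consider the following minimal Python sketch. It draws hypothetical whole-number scores from a random number generator, reports their mean to two decimals, and then runs a simple GRIM-style consistency check on that mean. The scale (1 to 7), sample size, seed, and function names are all assumptions made for illustration; nothing here reflects how the retracted papers were actually produced.

```python
import random
from fractions import Fraction


def mean_is_possible(reported_mean: str, n: int) -> bool:
    """GRIM-style check: can n whole-number values average to reported_mean?"""
    decimals = len(reported_mean.partition(".")[2])
    target = Fraction(reported_mean)
    half_unit = Fraction(1, 2 * 10**decimals)
    nearest_total = round(target * n)
    return any(abs(Fraction(total, n) - target) <= half_unit
               for total in (nearest_total - 1, nearest_total, nearest_total + 1))


def simulate_and_report(n: int, seed: int) -> str:
    """Simulate n hypothetical scores on a 1-7 scale; return the mean to 2 decimals."""
    rng = random.Random(seed)
    scores = [rng.randint(1, 7) for _ in range(n)]
    return f"{sum(scores) / n:.2f}"


# The reported mean comes from actual (simulated) values, so the
# consistency check has nothing to catch.
reported = simulate_and_report(n=37, seed=2020)
print(reported, mean_is_possible(reported, 37))
```

Because the summary statistic is computed from a concrete set of values, it is consistent by construction, whatever the seed. Checks of this kind catch numbers that were invented directly, not numbers derived from an invented dataset.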

Context Is Key

These papers would not have been easily flagged as suspicious by hunting for contradictory statistics without some outside knowledge, which is where the large number of interested readers came through in postpublication review. Suspicions were raised about the Surgisphere papers thanks to knowledge of the spectrum of coronavirus pandemic data and, more generally, doubts about the plausibility of a small company claiming to have integrated electronic health records from so many hospitals. The NEJM paper allegedly included data from 169 hospitals across three continents; the Lancet paper allegedly included data from 671 hospitals across six continents. Certainly this feels like something that could exist in the era of Big Data, but anyone familiar with electronic health records integration knows that it would be a tremendous undertaking requiring substantial time, effort, and support staff to create and maintain such a database, to say nothing of navigating the complex legal and ethical issues that arise from allowing a private company access to hospital data, no matter what "de-identification" practices are in place. Yet Surgisphere has a minimal footprint: few employees who could be located, and seemingly none with a background suitable for this work. No hospitals have come forward to state that they partnered with Surgisphere; that does not mean that they do not exist, only that no proof has yet been given that they do exist.

The description of the study database in each manuscript was replete with impressive-sounding terms but left many readers asking about important details that were glossed over or omitted. Each manuscript had a vague passage about a "manual data entry" process for quality assurance that seemed at odds with a database supposedly compiled automatically from electronic health records. Astute readers pointed out that it would be especially curious for hospital data from around the world to use categories of "race" consistent with American-style research when that would be unusual (or perhaps even illegal) to record outside the United States.

Others noted that much of the data seemed inconsistent with publicly available data on coronavirus patients. For example, the average age of hospitalized patients in both studies was only about 50 years, while publicly available data suggest that this would be higher. Suspiciously, the number of deaths reported for the five Australian hospitals in the Lancet paper exceeded the number of deaths reported in all of Australia to that point. Tepid explanations were offered for a few of these things at first; others were not addressed by the authors.

At the time of writing, we do not know exactly what happened. The retraction notices were vague, essentially saying that the non-Surgisphere authors no longer stood behind the data because they could not verify that the data exist, while the lone Surgisphere author refused to share the data with his co-authors, citing agreements with clients and confidentiality.

So, back to where we started: Why didn't the peer reviewers catch this?

Truthfully, peer review in its current form is ill-equipped to catch outright frauds — especially if the fraudsters are reasonably competent. Writing for The Guardian, James Heathers has already explored this in some detail (you should definitely read his article). I will add that most of us are inherently trusting and approach every peer review under the basic assumption that the study data actually exist, even if we might quibble with how the data are analyzed or the correct interpretation. It would be a substantive change to require reviewers to prove that the data exist before beginning their critique. Reviewers are expected to critique the science put in front of them, but not necessarily to question whether it was real in the first place.


Peer Review Imperfect but Useful

It's an unpopular position right now, but I believe that peer review did do something useful in this case, unsatisfying as the final result may have been. The author group also posted a preprint (since removed) allegedly showing superior outcomes in patients treated with ivermectin. The preprint was quite underdeveloped in comparison to the peer-reviewed papers that appeared in the NEJM and the Lancet. If we infer that the initial journal submissions looked like the ivermectin preprint, quite a bit of supplemental content was added, presumably at the request of the peer reviewers. This information was later used by observant commentators to raise questions about the veracity of the data, such as the suspicious similarity of nearly all patient characteristics across six continents and the implausible number of deaths reported in Australia. So even if the review process (including the journal editors) did not actually prevent the Surgisphere "data" from being published, it may have given the end readers the details they needed to probe further and flag the paper as questionable.

The peer-review process is not perfect, nor does it have to be for it to be useful. But this episode makes it clear that postpublication review and discussion (often facilitated by social media) are now essential components of the scientific process. Papers do not become unassailable just because they have gone through peer review; that is a step in the process but not the end of the process. If legitimate concerns about the integrity of a paper are raised after publication, it is incumbent on both the authors and the journal to see to it that nonspecious concerns are addressed; otherwise, they risk losing credibility.

Andrew Althouse is assistant professor of medicine at the University of Pittsburgh as well as statistical editor for Circulation: Cardiovascular Interventions. His primary research interests are designing and analyzing clinical trials that hopefully may be used to inform medical practice. In his spare time, he enjoys lifting weights, cooking (and eating), and sampling craft beer.
