AI Assistance Does Not Worsen Assessments of Bias in Health Research

By Linda Carroll

June 01, 2022

NEW YORK (Reuters Health) - Reviewers assisted by the open-access RobotReviewer platform, which uses machine learning and natural-language processing to partially automate assessment of potential bias in health-research papers, fared no worse than human reviewers working alone, according to a new report.

"Systematic reviews are an essential tool in summarizing the breadth of scientific knowledge; in short they aim to take in everything we know on a subject, and then to summarize it in an actionable way," said first author Dr. Anneliese Arno of the EPPI-Centre at University College London, U.K.

"'Everything' is the key word here," Dr. Arno, also at the School of Public Health and Preventive Medicine, Monash University, in Melbourne, Australia, told Reuters Health by email. "The increasingly rapid rate of evidence production offers the opportunity to know more, but it's also a challenge in collecting all that knowledge in a reliable way. Automation offers one solution to that problem, but many are justifiably concerned it would bias the science produced by human judgement, making systematic-review conclusions less reliable instead of more reliable."

"In the case of RobotReviewer, the AI predicts each study's risk 'score' (i.e. high/unclear or low) based on the language used to report that study," she explained. "To do this, it must first know what patterns of language are associated with which score; that is, it must be 'trained'. RobotReviewer is trained using existing risk assessments conducted manually by expert systematic reviewers."

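The idea of learning which wording goes with which score can be illustrated with a toy example. The sketch below is not RobotReviewer's actual model; it is a minimal word-count (naive Bayes) classifier, and the four training sentences, the label strings and the function names are all hypothetical, chosen only to show how labeled examples let a program score new text.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; a deliberately crude tokenizer.
    return text.lower().split()

def train(examples):
    # Count how often each word appears under each risk label.
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    # Pick the label with the highest log-probability (add-one smoothing).
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical, hand-written training sentences (not from the study).
TRAINING = [
    ("participants were randomized using a computer-generated sequence", "low"),
    ("allocation was concealed using sealed opaque envelopes", "low"),
    ("patients were assigned to treatment by the investigators", "high/unclear"),
    ("the method of randomization was not described", "high/unclear"),
]

model = train(TRAINING)
prediction = predict(model, "a computer-generated sequence was used")
print(prediction)
```

With these four sentences, the new sentence shares most of its words with the "low" examples, so the toy model assigns it that label. A real system would be trained on many thousands of expert assessments.
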
To assess whether RobotReviewer would save time without being inferior to reviews performed by humans alone, Dr. Arno and her colleagues collected data from seven review teams, covering 145 studies, 290 individual assessment forms and 1,160 risk-of-bias (RoB) judgments across the four of seven Cochrane RoB domains that RobotReviewer assesses.
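The reported counts are consistent with one another, as a quick check shows. The sketch below assumes two assessment forms per study, a figure inferred here from 290 divided by 145 rather than stated in the report; the variable names are illustrative.

```python
studies = 145
forms_per_study = 2      # assumption: inferred from 290 forms / 145 studies
domains_assessed = 4     # four of the seven Cochrane RoB domains

forms = studies * forms_per_study       # individual assessment forms
judgments = forms * domains_assessed    # one judgment per form per domain

print(forms, judgments)
```
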

The researchers found that the proportion of accurate assessments was 0.89 with RobotReviewer assistance, compared with 0.90 for human-only assessments. Dr. Arno and her colleagues also measured person-time for both approaches. The mean time to complete an RoB assessment was 8.97 minutes with RobotReviewer assistance and 10.36 minutes with human-only effort, a difference that was not statistically significant.
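For a sense of scale, the gaps between the two arms can be computed directly from the reported figures. This is simple arithmetic on the published point estimates, not a reproduction of the study's statistical analysis, and the variable names are illustrative.

```python
accuracy_assisted, accuracy_human = 0.89, 0.90
minutes_assisted, minutes_human = 8.97, 10.36

# Difference in the proportion of accurate assessments (assisted minus human-only).
accuracy_gap = round(accuracy_assisted - accuracy_human, 2)

# Minutes saved per assessment, and that saving as a share of human-only time.
minutes_saved = round(minutes_human - minutes_assisted, 2)
percent_saved = round(100 * minutes_saved / minutes_human, 1)

print(accuracy_gap, minutes_saved, percent_saved)
```

The point estimates work out to a one-percentage-point accuracy gap and about 1.4 minutes (roughly 13%) saved per assessment, a time difference the study could not distinguish from chance.
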

"These results are insufficient to conclude that the person time required for RobotReviewer-assisted RoB assessments is less than that required for human only RoB assessments," the researchers note.

Dr. Noemie Elhadad, an associate professor of biomedical informatics at Columbia University Vagelos College of Physicians and Surgeons in New York City, welcomed the new findings.

"Because of the steep increase in biomedical publications, getting help from a tool like RobotReviewer when reviewing evidence could be a game changer," said Dr. Elhadad.

"That's exactly the promise of artificial-intelligence techniques, like natural-language processing and machine learning: to support humans in complex, time-consuming tasks," Dr. Elhadad told Reuters Health by email. "This study is exciting, because it takes a rigorous approach to assessing the effectiveness of an AI tool. It suggests the potential of semi-automated approaches to reviewing evidence and provides a great framework to validate and continue building technology in the service of high-quality health evidence."

The study had no commercial funding.

SOURCE: Annals of Internal Medicine, online May 30, 2022.