Biased Algorithms Affect Healthcare for Millions

Nicola M. Parry, DVM

October 29, 2019

Algorithm-based healthcare is not free of bias and can lead to substantial inequality in patient care, according to a study published online October 24 in Science.

"We show that a widely used algorithm, typical of this industry-wide approach and affecting millions of patients, exhibits significant racial bias," write Ziad Obermeyer, MD, from the University of California, Berkeley, and colleagues.

The scale of this bias's impact is large: it affects healthcare decisions for tens of millions of patients every year, Obermeyer said in an interview with Medscape Medical News.

Most healthcare systems use high-risk care management programs to provide extra resources to patients with complex health needs, Obermeyer explained. These programs aim to reduce emerging health problems and their associated extra healthcare costs.

However, providing this extra help is expensive, he said, so health systems use algorithms to identify the patients who most need this help.

But there is mounting concern that, because humans design these algorithms and choose their inputs, such machine-learning programs are not unbiased. Rather, the algorithms may encode the racial and gender biases of the people who develop them.

To find out if that was happening in practice, Obermeyer and colleagues tested a commercially available algorithm called Impact Pro, from Optum, on a large patient dataset at an academic hospital. Their main sample comprised 6079 patients who self-identified as black and 43,539 patients who self-identified as white. Overall, 71.2% of the sample were commercially insured, and 28.8% were enrolled in Medicare.

The researchers followed the patients over 11,929 and 88,080 patient-years, respectively, and obtained algorithmic risk scores for each patient-year.

They found that, at a given risk score, black patients were considerably sicker than white patients, as demonstrated by signs of uncontrolled illnesses.

Removing the bias in this algorithm would more than double the number of black patients who would be eligible for a program that provides extra medical help to the highest-risk patients, said Obermeyer — raising it from 17.7% to 46.5% in this particular health system.

Cost as Input Measure

Obermeyer explained that this bias arises not necessarily in the algorithm itself but in the problem the algorithm is asked to solve.

Although identifying patients who need the most care seems straightforward, companies must choose a variable in the dataset that will accomplish this, he said.

Cost is typically an easy variable to choose for this purpose, Obermeyer explained, because it is frequently used as a proxy for health.

However, that proxy is not always reasonable, such as when comparing black patients with white patients. Black patients tend to generate lower costs at the same level of health, he said. Inequalities in access to healthcare contribute significantly to this: for example, it can be much harder to get to a doctor's appointment if one cannot pay for transportation or take a day off from work.

"And in our paper, we find that subtle distinction gets magnified by algorithmic prediction tools, leading to perverse consequences," Obermeyer said.

But some hopeful news has emerged from these findings, said Obermeyer.

Their team contacted the algorithm manufacturer to discuss their study findings. The manufacturer's technical teams then obtained similar findings when they replicated the study using their own dataset.

This has led Obermeyer's team and the manufacturer to begin to collaborate on solutions.

"Our early results show that simply by changing the label being predicted, we were able to achieve an 84% reduction in bias," he said.

The teams are continuing to work together to develop better methods of predicting multidimensional measures of health. Ultimately, the manufacturer plans to build new insights into the algorithm, according to Obermeyer.

"We appreciate the researchers' work, including their validation that the cost model within Impact Pro was highly predictive of cost, which is what it was designed to do," Optum said in a statement emailed to Medscape Medical News.

"It's also important to note that the tool applies complementary analytics from over 600 clinical measures to identify gaps in care based on well-established, evidence-based clinical guidelines. These gaps, often caused by social determinants of care and other socio-economic factors, can then be addressed by the health systems and doctors to ensure people, especially in underserved populations, get effective, individualized care," the statement continued.

Pervasive Problem

The researchers in this study had a unique data source ― the inputs for the AI algorithm, the outputs produced by the algorithm, and the actual events that occurred after the predictions were made ― notes Gabriela Schmajuk, MD, from the University of California, San Francisco.

"This allowed them to see how the algorithm performed and to detect potential biases introduced or exacerbated by the algorithm," said Schmajuk, who has written about potential bias in AI and healthcare.

"They were able to confirm our fears about AI algorithms having the ability to exacerbate biases — black patients have to be sicker compared to white patients in order to qualify for referral to a targeted 'high-risk patient' program."

Schmajuk believes that this study is a demonstration of a pervasive problem with AI. "Such AI algorithms are already 'live' in healthcare as well as in other sectors — therefore, we must be vigilant about how they may exacerbate biases and work to reduce them."

Indeed, the researchers note that such systems are being widely used in healthcare. "The Society of Actuaries conducted a comprehensive evaluation of the ten most widely-used algorithms, including the particular algorithm we study," Obermeyer and colleagues write in supplemental materials published with their article. "The accuracy metric used to measure algorithmic performance was cost prediction.... [T]he enthusiasm for cost prediction is not restricted to industry: similar algorithms are developed and used by non-profit hospitals, academic groups, and governmental agencies, all with the goal of predicting cost and resource utilization. This approach is likewise described in academic literature on targeting population health interventions."

This needs to change, according to Schmajuk. "Insurers, healthcare systems, and academic centers must find ways to assess and reduce bias in whatever algorithms they use, whether they be designed to predict costs, readmissions, no-shows in clinic, or specific clinical events," she said.

"The European Commission has developed Ethics Guidelines for 'Trustworthy AI' ― as a first step, organizations should adhere to these guidelines."

The study was supported by the National Institute for Health Care Management Foundation. The authors and Schmajuk have disclosed no relevant financial relationships.

Science. Published online October 24, 2019.
