Algorithmic Fairness: Mitigating Bias in Healthcare AI

Elle Lett, PhD, MA, MBiostat

July 25, 2022

Healthcare is undergoing an artificial intelligence (AI) revolution. Facilitated by improved infrastructure for handling large quantities of data, AI-derived tools are quickly becoming integrated in everyday clinical practice. Scientists and clinicians are enthused by the potential to make huge advances in the efficiency and quality of healthcare in this era, and with good reason.

But healthcare data are generated in a society shaped by discrimination. And few AI models account for the fact that the data pool is inherently limited, resulting in performance that is optimized for the majority and poorer for marginalized minority groups.

In response to this algorithmic injustice, the field of algorithmic fairness has developed to build "fairness-aware" models that optimize within-subgroup performance across social groups, such as race or gender. Within this field are approaches that target equitable model solutions before, during, and after model training.

However, because the process for building AI-assisted healthcare tools is not regulated, this attention to fairness is not required. Some models are deployed without an awareness of bias and can result in a cost to marginalized groups that is not detected until after the fact.

High Risks of Bad Math

Evidence of this already exists. One study of a commercial risk prediction model commonly used to identify high-risk patients systematically estimated lower risk for Black patients compared with White patients. That algorithm used health costs as a predictive measure. Because of systemic inequities in access to healthcare in the US, more money is spent on the health of White patients than on Black patients. This led the algorithm to falsely conclude that White patients have higher risk levels.

The model implied that Black patients needed fewer additional resources than White patients, despite the same level of illness. That bias creates an environment in which Black and White patients with similar illnesses are not recommended for the same support.

This translates into a real-world situation in which a White patient might be assigned a primary care follow-up or be referred to a dedicated nurse team to track their patient care — but a Black patient might just be sent home.

This "algorithmic bias," or differential performance of AI, was rooted not only in how the model was made, but in the data used to make it.

Marginalization Before the Math

The data we use in healthcare AI models are generated from a system that discriminates against individuals from marginalized groups. This discrimination happens interpersonally, between healthcare staff and practitioners, and societally, due to barriers driven by structural discrimination.

In order to be included in the electronic health records (EHRs) that become the data basis for AI systems, a person must become a patient — and be a patient within an EHR system that will be used for research purposes. For many, either or both of these events can be a challenge.

Consider a hypothetical Black transgender woman who is trying to obtain regular primary care. Her path to healthcare is complicated before she even attempts to make an appointment. More than half of all Americans get insurance coverage through their jobs, but transgender people disproportionately face employment discrimination, which can create both severe economic hurdles and barriers to insurance coverage. 

Without employer-provided insurance, Medicaid is an option for our would-be patient. But Medicaid is subject to state-level regulations, and many states explicitly exclude some transgender-related healthcare needs.

Our aspiring patient must conquer these social and economic barriers in order to even have a primary care appointment and be present in healthcare data.

Even in the doctor's office, she is potentially subject to discrimination at the hands of healthcare providers and is at risk of being misdiagnosed or denied care by practitioners not well-versed in trans-competent care.

Adding to that is her intersectional experience as a Black transgender woman: Racial bias negatively affects the quality of healthcare and can degrade the related data.

As recently as 2016, half of a sample of White residents and medical students believed that "Black people's skin is thicker than White people's skin," according to a study that examined racial bias in pain management. Furthermore, the participants who held those incorrect beliefs about biological differences between Black and White people underrated Black patients' pain compared with White patients' pain and made less accurate treatment recommendations.

These experiences of discrimination are integrated into healthcare data and map directly onto challenges that produce algorithmic bias. Structural discrimination, underrepresentation, and biased care all lead to lower-quality data for individuals from marginalized groups relative to others.

AI models that are "fairness-aware" attempt to equalize that data by incorporating approaches that mitigate bias into the model-building pipeline.

Calibrating for Equality

The field of algorithmic fairness is still in its infancy but is expanding rapidly, with new approaches being developed all the time. These approaches vary widely but are united by the aim of optimizing model performance in subpopulations within the data, in addition to (or sometimes at the expense of) overall model performance.

Generally, there are three types of approaches to building fairness-aware models: (1) preprocessing, where you adjust the training data used to build the model; (2) in-processing, where you use a model-fitting algorithm that accounts for subgroup-specific performance; and (3) postprocessing, where you adjust the outputs of the machine learning model to be more fair.

One preprocessing approach is "re-weighting" data from specific groups. Suppose a model to predict cardiovascular health outcomes was built on a dataset of 9100 participants, but only 100 Black individuals were included (~1%). One option would be to count (or duplicate) the data from each of those Black individuals 10 times, creating a "new" dataset that is 10% Black. A model built on that dataset should (hopefully) have better predictive performance for Black individuals than one built on the original data.
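The duplication step can be sketched in code. This is a toy illustration, assuming a pandas DataFrame with a hypothetical `race` column and the cohort sizes from the example above; in practice, most fairness toolkits achieve the same effect with per-sample weights rather than literal row duplication.

```python
# Toy sketch of naive re-weighting by duplication.
# Assumptions: a synthetic cohort mirroring the article's example
# (9100 participants, 100 of whom are Black, ~1%).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

df = pd.DataFrame({
    "race": ["Black"] * 100 + ["White"] * 9000,
    "risk_score": rng.normal(size=9100),  # placeholder feature
})

# Duplicate each Black participant's record 10x (original + 9 extra copies).
minority = df[df["race"] == "Black"]
reweighted = pd.concat([df] + [minority] * 9, ignore_index=True)

share = (reweighted["race"] == "Black").mean()
print(f"Black share after re-weighting: {share:.1%}")  # 10.0%
```

An equivalent, and usually preferable, approach is to pass a weight of 10 for each minority record to the model's fitting routine (e.g., a `sample_weight` argument), which avoids inflating the dataset.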

Re-weighting can help, but it has many limitations. More sophisticated approaches to building fairness-aware models improve upon this simple approach, each with their own strengths and limitations. 

One example is multicalibration, in which you iteratively update the model's predictions to drive the error within each subgroup below a certain threshold.

The method takes predictions from a model and then "audits" them for groups defined by specific attributes. The "auditor" can be built to examine model performance across subgroups such as race, gender, and education status.

Suppose a model for identifying patients who are likely to adhere to a medication program has strong performance (low error rates) for White men with college degrees, poor performance (high error rates) for Asian women without college degrees, and intermediate performance for Black men with some college education but no formal degree.

The multicalibration procedure would randomly select subgroups based on the prespecified attributes (race, gender, and education). Within those groups, it would update the predictions to reduce that group's error rate. This would repeat until every group's error rate fell below a prespecified threshold.

In this way, you could ensure that in the final model, the error rate for everyone — Black men with some college, Asian women without college degrees, White men with college degrees, and any other combination of those attributes — was at most 10%. 
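The audit-and-update loop described above can be sketched in code. This is a deliberately simplified sketch, not a production multicalibration implementation: it assumes binary outcomes, probability-valued predictions, a fixed list of subgroup masks (real multicalibration audits many, possibly overlapping, intersectional subgroups and level sets of the predictor), and it measures error as each group's calibration gap.

```python
# Simplified multicalibration-style post-processing sketch.
# Assumptions: binary outcomes (0/1), predictions in [0, 1],
# subgroups supplied as boolean masks over the cohort.
import numpy as np

def multicalibrate(preds, outcomes, groups, threshold=0.10, max_iters=100):
    """Shift predictions until each subgroup's calibration gap
    (|mean outcome - mean prediction| within the group) is <= threshold."""
    preds = preds.astype(float).copy()
    for _ in range(max_iters):
        worst_gap = 0.0
        for mask in groups:
            gap = outcomes[mask].mean() - preds[mask].mean()
            worst_gap = max(worst_gap, abs(gap))
            if abs(gap) > threshold:
                # Nudge the group's predictions toward its observed rate.
                preds[mask] = np.clip(preds[mask] + gap, 0.0, 1.0)
        if worst_gap <= threshold:
            break  # every audited group now passes
    return preds

# Hypothetical demo: two subgroups, one of which starts out with
# systematically underestimated adherence scores.
rng = np.random.default_rng(1)
n = 400
outcomes = rng.integers(0, 2, size=n)             # 1 = adhered to medication
group_a = np.zeros(n, dtype=bool)
group_a[:200] = True
groups = [group_a, ~group_a]
preds = rng.uniform(size=n)
preds[group_a] = rng.uniform(0.0, 0.3, size=200)  # miscalibrated subgroup

calibrated = multicalibrate(preds, outcomes, groups, threshold=0.10)
```

After the loop finishes, every audited subgroup's calibration gap sits at or below the 10% threshold, mirroring the guarantee described above; the full algorithm extends this idea to a much richer class of intersectional subgroups.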

Going Beyond Fair Algorithms

Although fairness-aware model building processes show tremendous promise, it is important to know that they will always be limited by the circumstances of the data used to build these tools. 

AI-derived healthcare tools remain completely unregulated in the US, so there are no constraints on how models are developed and there are no federal requirements on performance, transparency, or equity. 

Independent companies are capable of building proprietary algorithms and selling them to healthcare systems. These are often referred to as "black-box" algorithms because buyers are unable to look under the hood — there is no way for practitioners to know whether the model was designed for the patients they're serving. 

Often, the harms caused by use of specific tools are only uncovered after they have been deployed, sometimes after years. Therefore, in tandem with improving our technology and regulatory infrastructure, we have to eliminate the systemic discrimination that harms the health of oppressed groups in our society.

AI is an inevitable component of healthcare provision. As practitioners and stakeholders we are morally obligated to use these technologies responsibly and prevent them from exacerbating inequities that are experienced by marginalized groups in our healthcare system.

Elle Lett, PhD, MA, MBiostat, is a Black, transgender woman, statistician-epidemiologist, and physician-in training. Through her work, she applies the theories and principles of Black Feminism to understanding the health impacts of systemic racism, transphobia, and other forms of discrimination on oppressed groups in the United States. She holds a PhD in Epidemiology from the University of Pennsylvania, master's degrees in Statistics and Biostatistics from The Wharton School and Duke University, respectively, and a bachelor's degree in Molecular and Cellular Biology from Harvard College. To date, her work has focused on intersectional approaches to transgender health and the health impacts of state-sanctioned violence and other forms of systemic racism. Now, she is turning her focus to algorithmic fairness in clinical prediction models and mitigating systems of inequity in health services provision. She is engaging in this new arm of research through a postdoctoral fellowship at the Boston Children's Hospital Computational Health Informatics Program (CHIP), before returning to finish her clinical training.
