Machine Learning in Rheumatology: Going Beyond Clinical Instincts

Karmela Kim Chan, MD


April 05, 2022

Editorial Collaboration

Medscape &

"Oh right, it's medicine, not science." That's what a dear scientist friend said to me once as we were discussing our research involving my patients.

I think instinct plays a role in how we practice medicine. Before you dismiss this idea as unscientific, hear me out: Although we may not always know how to articulate what we know or how we know it, instinct is actually informed by our experiences superimposed on the background of medical training.

But each of us, alone, has incomplete data and insufficient time.

This seems to me to be the essence of machine learning: imagine each individual practitioner, with all the discrete data points gathered, consciously or otherwise, from each patient encounter, multiplied by a thousand, or tens of thousands. Imagine having that massive amount of information to draw on for the benefit of the next patient. Artificial intelligence does exactly that: it can recognize patterns that you might miss if you relied only on your own small patient population. Rajkomar and colleagues, in a review article in The New England Journal of Medicine, describe it thus: "Machine learning… is the fundamental technology required to meaningfully process data that exceed the capacity of the human brain to comprehend."

One area in which machine learning has made remarkable advances is the detection of diabetic retinopathy. In one study, a deep-learning system was trained to recognize diabetic retinopathy using 76,370 images; in the subsequent validation set, the system had 100% sensitivity (and 91.6% specificity) for detecting vision-threatening diabetic retinopathy. Another study, by a different group, demonstrated that its algorithm performed just as well as US board-certified retina specialists in detecting diabetic retinopathy. Practically speaking, this could improve outcomes for the roughly 460 million people globally who have diabetes, especially those in remote areas where access to care is limited. Deep-learning methods have also been used successfully in oncology, in the detection of melanoma and of lymph node metastases in breast cancer, among others.
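For readers less familiar with the two metrics quoted above, here is a minimal sketch of how sensitivity and specificity are calculated from a screening tool's results. The counts are hypothetical, chosen only so that the arithmetic reproduces the percentages mentioned; they are not the study's data, and the names `ConfusionCounts`, `sensitivity`, and `specificity` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing a screening tool against a reference standard."""
    true_pos: int   # disease present, tool says present
    false_neg: int  # disease present, tool says absent
    true_neg: int   # disease absent, tool says absent
    false_pos: int  # disease absent, tool says present


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of people with the disease whom the tool correctly flags."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of people without the disease whom the tool correctly clears."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical validation set (NOT taken from any study):
# 50 eyes with vision-threatening disease, 1000 eyes without it.
counts = ConfusionCounts(true_pos=50, false_neg=0, true_neg=916, false_pos=84)

print(f"sensitivity = {sensitivity(counts):.1%}")
print(f"specificity = {specificity(counts):.1%}")
```

In this made-up example, no diseased eye is missed, but 84 of 1000 healthy eyes are flagged and would need a follow-up examination. That trade-off is the usual price of tuning a screening tool to avoid missed cases.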

In rheumatoid arthritis, strides are also being made using machine learning. For example:

  • Orange and colleagues used unbiased consensus clustering on gene expression data from 45 synovial samples to find three distinct subtypes (high-, mixed-, and low-inflammatory subtypes). They then created a supervised learning model where the subtypes were the labels, and the histologic features were the inputs. Using this model, they found that certain features (such as lymphocytes, plasma cells, and neutrophils) were predictive of the high-inflammatory subtype; mucin, detritus, and synovial lining giant cells were predictive of a low-inflammatory subtype. 

  • Guan and colleagues built a probabilistic supervised regression model to predict response to anti–tumor necrosis factor therapy on the basis of demographic, clinical, and genetic markers of 1892 patients from 13 different cohorts. The model correctly classified responses in 78% of patients.

  • Tao and colleagues examined gene expression and DNA methylation patterns in immune cells of 80 patients with rheumatoid arthritis before and 6 months after treatment with either etanercept or adalimumab. Using differentially expressed genes and differentially methylated DNA positions in responders vs nonresponders, they trained a supervised machine-learning algorithm and subsequently demonstrated that the algorithm could predict response to these agents.
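The two-stage design in the first example above (cluster one kind of data to derive labels, then train a supervised model to predict those labels from a different kind of data) can be sketched in a few lines. This is a toy illustration under loud assumptions: it swaps in a simple one-dimensional k-means for consensus clustering and a nearest-centroid classifier for the supervised model, and every number and name below is hypothetical rather than drawn from the study.

```python
from statistics import mean

SUBTYPES = ["low", "mixed", "high"]  # cluster index -> subtype name


def kmeans_1d(values, initial_centers, iterations=20):
    """Simple k-means on scalars; returns one cluster index per value."""
    centers = list(initial_centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Move each center to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels


def fit_centroids(features, labels):
    """Supervised step: average the feature vectors within each label."""
    centroids = {}
    for lab in set(labels):
        rows = [f for f, l in zip(features, labels) if l == lab]
        centroids[lab] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, sample):
    """Return the label whose centroid is closest to the sample."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, sample))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


# Stage 1: hypothetical per-sample inflammation scores from gene expression.
expression = [0.1, 0.2, 0.15, 1.0, 1.1, 0.9, 2.0, 2.1, 1.9]
labels = kmeans_1d(expression, initial_centers=[0.0, 1.0, 2.0])

# Stage 2: hypothetical histology scores (lymphocytes, mucin) for the
# same nine samples, used as inputs to predict the stage 1 labels.
histology = [
    (0.1, 0.9), (0.2, 0.8), (0.0, 1.0),
    (0.5, 0.5), (0.6, 0.4), (0.4, 0.6),
    (0.9, 0.1), (1.0, 0.2), (0.8, 0.0),
]
centroids = fit_centroids(histology, labels)

# A new biopsy with many lymphocytes and little mucin.
print(SUBTYPES[predict(centroids, (0.85, 0.15))])
```

The appeal of the design is that once the model is trained, a new sample needs only the second, more readily obtained data type (here, histology) to be assigned a subtype.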

Work is also underway in other rheumatic diseases:

  • Burlina and colleagues compared a supervised algorithm and a fully automated deep-learning tool to identify dermatomyositis, polymyositis, and inclusion-body myositis; both methods performed well, though the fully automated method performed better, demonstrating that "computer algorithms can leverage, detect, and quantify image biomarkers and features which an operator may not always be able to do in a consistent fashion."

  • In the knee osteoarthritis literature, underserved populations report more pain even after adjustment for Kellgren-Lawrence grade (KLG), a physician-determined radiographic measure of knee osteoarthritis severity that was developed in the 1950s and is based on 85 urban dwellers in the United Kingdom. Pain in underserved populations is therefore often blamed on psychosocial factors. Pierson and colleagues applied a deep-learning algorithm to a dataset of more than 25,000 radiographs from 2877 patients across race, income, and education. They generated a new measure of radiographic disease severity that accounted for pain much better than KLG does. This work has the potential to reduce disparities in the use of knee arthroplasty.

  • To aid in distinguishing immunoglobulin (Ig) G4–related disease from its mimics on the basis of readily available laboratory tests, Yamamoto and colleagues developed supervised learning models using clinical data from 602 patients with IgG4-related disease and 204 patients with IgG4-related disease mimics. Both models performed extremely well when tested against the validation cohort. This is especially important for a rare, underrecognized disease that often requires tissue sampling for diagnosis.

All of these advances have huge practice-changing potential. When done correctly, machine learning can lower barriers to prompt treatment and cut the time and resources spent on treatments that won't work.

These methodologies are, of course, not without fault. I have never heard the dictum "garbage in, garbage out" as often as I have in the time I spent reading up for this piece. And it's true: the models are only as good as the inputs they are built on. These tools will greatly aid clinical medicine, but the ultimate responsibility lies with the human team, which needs to understand how the artificial intelligence came up with its results and how to integrate them into care.

All of this brings us closer to the goal of precision medicine, where diagnostics and therapeutics are tailored to each patient. Clinical and molecular data, subjected to machine learning, could make easy work of diagnosis, disease classification, prognostication, treatment selection, prediction of treatment response, or even drug repurposing, allowing medicine to approximate science just a bit more closely. Again, as Rajkomar and colleagues note in their paper, "the wisdom contained in the decisions made by nearly all clinicians and the outcomes of billions of patients should inform the care of each patient."

Karmela Kim Chan, MD, is an assistant professor at Weill Cornell Medical College and an attending physician at Hospital for Special Surgery and Memorial Sloan Kettering Cancer Center in New York City. Before moving to New York City, she spent 7 years in private practice in Rhode Island and was a columnist for a monthly rheumatology publication, writing about the challenges of starting life as a full-fledged rheumatologist in a private practice.
