AI Uncovers Individual Disease Risk in EHR Data

Diana Phillips

June 30, 2020

An artificial intelligence (AI)–based strategy that mines patients' medical records over time can identify patterns associated with specific diseases, researchers show.

The approach may help identify patients at risk of developing certain conditions and could inform management on the basis of disease and outcome models, Hossein Estiri, PhD, Laboratory of Computer Science, Massachusetts General Hospital, Boston, and colleagues report in an article published online in Patterns, an open-access journal published by Cell Press.

By combining a sequential pattern-mining algorithm with a machine-learning pipeline, the process extracts temporal patient health information from electronic health records (EHRs) and translates large volumes of clinical data into actionable knowledge.

"Often the diagnosis code that a patient receives in their electronic chart may not truly match their health condition; or a patient might have the disease but hasn't yet received the appropriate diagnosis code," according to study coauthor Zachary Strasser, MD, a postdoctoral fellow in the Laboratory of Computer Science at Massachusetts General Hospital. "Therefore, the challenge is how to build a model that identifies patients with a particular disease when the codes themselves may be inaccurate," he explained in an interview.

The transitive sequencing approach described in the article utilizes a model built on sequential pairings of diagnoses and medications. "The sequence as a label offers greater accuracy than just the diagnosis alone," Strasser explained. "For example, if a patient has a congestive heart failure [CHF] diagnosis and then takes the appropriate medication, that sequence is likely to be more accurate for identifying whether the patient has heart failure than the diagnosis on its own."
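To make the idea of sequential pairings concrete, here is a minimal Python sketch of how ordered pairs of diagnoses and medications might be derived from one patient's dated records. It is an illustration only, not the authors' published code: the function name `sequence_pairs`, the `DX:`/`RX:` code labels, the sample patient, and the choice to skip same-day events and self-pairs are all assumptions made for this example.

```python
from itertools import combinations

def sequence_pairs(events):
    """Return the set of ordered (earlier, later) code pairs for one patient.

    `events` is a list of (ISO date string, code) tuples. Pairing is
    transitive: each code is paired with every code recorded later,
    not only with its immediate successor.
    """
    ordered = sorted(events)  # ISO dates sort chronologically as strings
    pairs = set()
    for (d1, c1), (d2, c2) in combinations(ordered, 2):
        # Assumption: same-day events have no defined order, and a code
        # is not paired with itself.
        if d1 < d2 and c1 != c2:
            pairs.add((c1, c2))
    return pairs

# Hypothetical patient record, invented for illustration.
patient = [
    ("2019-01-10", "DX:heart_failure"),
    ("2019-02-03", "RX:furosemide"),
    ("2019-03-15", "DX:copd"),
]

for earlier, later in sorted(sequence_pairs(patient)):
    print(earlier, "->", later)
```

Each resulting pair can then serve as a single feature for a classifier, in place of, or alongside, the individual codes.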

Because the process of identifying sequences of all the diagnoses and medications in multiple patients' charts can be very computationally demanding, the researchers use "an algorithm that only selects the most relevant and useful sequences for creating the model," Strasser continued.
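The article does not spell out how the selection algorithm works, so the following Python sketch shows only one plausible way to prune sequences, under stated assumptions: first discard sequences seen in too few patients, then rank the rest by mutual information with the disease label. The function names, the `min_patients` and `top_k` parameters, and the toy data are hypothetical and should not be read as the authors' method.

```python
import math
from collections import Counter

def mutual_information(has_feature, labels):
    """Mutual information (in bits) between two equal-length binary lists."""
    n = len(labels)
    joint = Counter(zip(has_feature, labels))
    px = Counter(has_feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def select_sequences(patient_pairs, labels, min_patients=2, top_k=2):
    """Drop rare sequences, then keep the top_k most label-relevant ones.

    `patient_pairs` is a list with one set of sequence pairs per patient;
    `labels` holds 1 (has the disease) or 0 for the same patients.
    """
    counts = Counter(pair for pairs in patient_pairs for pair in pairs)
    frequent = [p for p, c in counts.items() if c >= min_patients]
    scored = [
        (mutual_information([p in pairs for pairs in patient_pairs], labels), p)
        for p in frequent
    ]
    scored.sort(reverse=True)  # highest mutual information first
    return [p for _, p in scored[:top_k]]

# Toy cohort, invented for illustration.
patient_pairs = [
    {("HF", "diuretic"), ("HF", "COPD")},
    {("HF", "diuretic")},
    {("HF", "COPD")},
    {("COPD", "statin")},  # appears in one patient only, so it is dropped
]
labels = [1, 1, 0, 0]

print(select_sequences(patient_pairs, labels, min_patients=2, top_k=1))
```

In this toy cohort the sequence `("HF", "diuretic")` occurs only in patients with the disease and is therefore ranked first, whereas `("HF", "COPD")` is split evenly between the two groups and carries no information about the label.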

To test the strategy, the researchers used it to mine temporal sequences of EHR medication and diagnosis observations from a cohort of patients with an ICD-9-CM code for CHF. They then compared its classification and prediction performance with the conventional approach of aggregating discrete EHR observations for downstream machine-learning algorithms.

"We found that data representations mined from sequences of EHR events are better phenotype 'differentiators' and predictors than the 'atemporal' EHR records that are widely used as the primary data representations in machine learning," the authors report.

Among the examples of the stronger signals associated with the approach, the CHF probabilities associated with the diagnostic codes for heart failure and chronic obstructive pulmonary disease and with the medication code for benzodiazepines are, individually, 45%, 47%, and 63%, respectively, the authors report. Yet when these features are analyzed in sequence, the probability of heart failure increases. For instance, the temporal sequence of a heart failure code and a benzodiazepine code has a 64% likelihood of heart failure, and the sequence of a heart failure code and an "other chronic obstructive pulmonary disease" code has a 78% likelihood.
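The comparison between a single code and a sequence can be pictured as a comparison of conditional probabilities. The Python sketch below estimates the share of patients with confirmed heart failure among those who carry a given feature. The function name and the eight-patient cohort are invented for illustration; the figures do not come from the study, and the authors may have derived their probabilities differently.

```python
def conditional_probability(has_feature, labels):
    """Estimate P(label = 1 | feature present) from per-patient flags."""
    with_feature = [y for f, y in zip(has_feature, labels) if f]
    if not with_feature:
        return None  # feature never observed in this cohort
    return sum(with_feature) / len(with_feature)

# Made-up cohort of eight patients (1 = confirmed heart failure).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
hf_code = [True, True, True, True, False, False, False, False]
hf_then_copd = [True, True, False, False, False, False, False, False]

print(conditional_probability(hf_code, labels))       # 0.75
print(conditional_probability(hf_then_copd, labels))  # 1.0
```

In this toy example the sequence narrows the group to patients who are more likely to have the disease, which is the pattern the reported percentages describe.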

The authors describe many possible clinical uses of this approach, including the ability to compute real-time CHF probabilities for patients who have not been diagnosed with heart failure. The strategy can also help develop alternative diagnoses, inform a medication recommendation, and offer clinical decision support on the basis of real-time probabilities for different sequences of diagnoses and medications, which, the authors write, "could be especially useful for generating recommendations for patients with complex histories, multiple providers, and health records that span many years."

At the population level, the tool can more accurately identify appropriate patients for clinical trials, quality assessment, and biomedical research; it could help identify patients at risk for any number of other diseases; and it could offer insight into new trajectories for a given disease. For example, Strasser said, "applying this method to an emerging disease, such as COVID-19, could help us understand how the disease progresses by analyzing common sequences."

The sequential pattern-mining/machine-learning strategy extends the transformational potential of AI in healthcare and fills an important gap related to extracting value from EHR data, according to Thomas Pollard, PhD, research scientist in the Laboratory for Computational Physiology, Institute for Medical Engineering and Science at Massachusetts Institute of Technology, Cambridge. "Better methods for understanding the time-varying aspects of electronic health records are certainly needed. And what's great here is that the authors have publicly shared their code, allowing the details of the approach to be reviewed and expanded upon," he said in an interview with Medscape Medical News.

Before algorithms such as this are moved into clinical practice, however, "it is important that we understand potential to create or compound inequities in care," Pollard stressed. "For example, it's important to know whether the algorithm can identify congestive heart failure equally well across gender and across black, white, and Asian patients."

The study authors, Pollard, and Strasser have disclosed no relevant financial relationships.

Patterns. 2020;1:100051.
