The Year in Cardiovascular Medicine 2021: Digital Health and Innovation

Panos E. Vardas; Folkert W. Asselbergs; Maarten van Smeden; Paul Friedman


Eur Heart J. 2022;43(4):271-279. 

Big Data and Prognostic Models for Cardiovascular Risk Prediction

Machine Learning for Risk Prediction

Clinical risk prediction modelling based on machine learning has been an active field of research. During the first months of the pandemic, hundreds of such models were developed.[42] Clinical prediction models are commonly developed to inform physicians about the probability that a certain disease is present (diagnosis), or that a certain health state will occur in the future (prognosis), for individual patients, and to use that knowledge in the care of those patients.[43] Machine learning techniques can exploit complex relationships between predictors and outcome without the modeller having to pre-specify them; the expectation is that this will improve predictive accuracy compared with traditional risk prediction modelling approaches and make application at the bedside less labour-intensive.
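To make the notion of an individual risk prediction concrete, the sketch below computes a probability from a simple logistic model of the kind machine learning aims to relax; all coefficients and predictor values are hypothetical, chosen only to illustrate the pre-specified form, and do not come from any of the cited studies.

```python
import math

# Illustrative only: a traditional prediction model with pre-specified,
# made-up terms. Machine learning methods instead learn such relationships
# (including interactions and non-linearities) from the data.
def predicted_risk(age, systolic_bp, smoker):
    """Return a predicted probability from a hypothetical logistic model."""
    # linear predictor: intercept plus one pre-specified term per predictor
    lp = -7.0 + 0.06 * age + 0.015 * systolic_bp + 0.65 * smoker
    # logistic link maps the linear predictor to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-lp))

p = predicted_risk(age=65, systolic_bp=140, smoker=1)
```

The output is a probability for one individual, which is what both diagnostic and prognostic models deliver; only how the terms are obtained differs between traditional and machine learning approaches.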

Improvements in predictive accuracy are, however, not guaranteed.[44] For instance, a study that developed machine learning models to predict the risk of death after acute myocardial infarction (AMI) found that machine learning models were not uniformly superior to a traditional logistic regression approach in a cohort of 755 402 AMI patients.[45] Of the three machine learning models used, two were superior to the logistic regression model for risk stratification. Those two models were also much better calibrated across patient groups defined by age, sex, race, and mortality risk, and thus better suited for risk prediction. In contrast, the third model, based on a neural network, was inferior to the logistic regression model used in the study. The reasons for this inferiority are not fully clear, but are probably related to the methodology used and, in particular, to the sample sizes of the study populations.
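The two properties on which such models are compared, discrimination and calibration, can be sketched with toy numbers; the helper functions and data below are illustrative and are not taken from the cited study.

```python
# Discrimination (c-statistic / AUC): the probability that a randomly chosen
# patient with the event received a higher predicted risk than a randomly
# chosen patient without the event.
def c_statistic(pred, outcome):
    pairs = concordant = ties = 0
    for p_i, o_i in zip(pred, outcome):
        for p_j, o_j in zip(pred, outcome):
            if o_i == 1 and o_j == 0:  # one event/non-event pair
                pairs += 1
                if p_i > p_j:
                    concordant += 1
                elif p_i == p_j:
                    ties += 1
    return (concordant + 0.5 * ties) / pairs

# Toy data: five patients, two of whom died (outcome 1)
pred    = [0.9, 0.8, 0.3, 0.2, 0.6]
outcome = [1,   1,   0,   0,   0]

auc = c_statistic(pred, outcome)
# Calibration-in-the-large: mean predicted risk minus observed event rate;
# values near zero indicate predictions are right "on average".
calibration_in_the_large = sum(pred) / len(pred) - sum(outcome) / len(outcome)
```

A model can discriminate perfectly yet still be miscalibrated (as in this toy set, where risks are systematically overestimated), which is why the cited study assessed calibration separately within patient groups.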

Nonetheless, in other settings, machine learning approaches have yielded promising results. One such study developed models to predict the risk of death, myocardial infarction, and major bleeding after an acute coronary syndrome (ACS). The machine-learning-based models were developed from a cohort of 19 826 adult ACS patients and, on external validation, predicted these risks with high AUCs at 1 year (AUCs: 0.81–0.92) and 2 years (AUCs: 0.84–0.93).[46]

Early Warning Systems

Early warning systems are prognostic prediction models that aim to inform physicians about important future health outcomes. Often, these early warning systems are used to monitor patients and to update their predictions over time. For instance, to predict circulatory failure in patients admitted to intensive care, a machine learning model was developed that made a new prediction for every patient every 5 min.[47] The early warning systems developed yielded high AUCs, between 0.88 and 0.94. However, these models also produced two to three alarms per patient per day. This may result in so-called alarm fatigue, which can lead to inadequate responses and may even compromise patient safety.[48] Hence, for these early warning systems and other risk prediction models used to guide clinical decisions, it is essential to ensure safety and effectiveness in improving patient outcomes, for instance through a randomized controlled trial (RCT) comparing the early warning system with standard of care. One such RCT evaluated a machine-learning-based early warning system for pending intraoperative hypotension.[49] Based on the arterial pressure waveform, this system updates the probability of a hypotensive event in the next 15 min every 20 s, warning when the estimated probability exceeds 85%.[50] In an RCT with 60 adult elective non-cardiac surgery patients, the early warning system, combined with a haemodynamic diagnostic guidance and treatment protocol, reduced the median total time of hypotension per patient from 32.7 min under standard of care to 8 min.
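The alerting logic described above amounts to a simple thresholding rule applied to a repeatedly updated probability. The sketch below illustrates that rule only; the probability stream is a hypothetical placeholder for the actual waveform-based model, which is not public in this form.

```python
# Minimal sketch of threshold-based alerting, assuming the underlying model
# is a black box that emits a fresh probability every 20 s.
WARN_THRESHOLD = 0.85  # warn when predicted probability exceeds 85%

def should_warn(probability, threshold=WARN_THRESHOLD):
    """True when the predicted probability of a hypotensive event
    in the next 15 min exceeds the warning threshold."""
    return probability > threshold

# Hypothetical stream of model outputs, one value per 20-s update:
stream = [0.10, 0.40, 0.86, 0.90, 0.30]
alarms = [should_warn(p) for p in stream]
alarm_count = sum(alarms)
```

The threshold trades sensitivity against alarm burden: lowering it catches more events but raises the alarm count per patient per day, the mechanism behind the alarm fatigue discussed above.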

Big Data: Representativeness and Algorithmic Fairness

Access to large and diverse databases with electronic health records creates important new research opportunities. Such large databases include the Clinical Practice Research Datalink (CPRD), with highly detailed data from >5 million individuals representative of the UK population. Using the CPRD data, one interesting study developed and validated several machine-learning-based risk prediction models for predicting the risk of familial hypercholesterolaemia in primary care patients.[51] These prediction models were shown to have high AUCs of around 0.89. The large scale and representativeness of such databases also allow for studying specific groups that may otherwise be difficult to study. For instance, one study compared cardiovascular disease incidences and outcomes in homeless individuals using a linkage between CPRD, Hospital Episode Statistics, and Office for National Statistics mortality data.[52] This study showed that homeless individuals have a 1.8 times higher risk of developing cardiovascular disease and are 1.6 times more likely to die within 1 year after cardiovascular disease diagnosis, compared with similar individuals who are not homeless. Finally, large and diverse databases, in which minority groups are well represented, are essential to ensure that the algorithms developed are fair,[53] i.e. do not systematically disadvantage certain groups of individuals. This requires evaluating the performance of the algorithms in important subgroups. For instance, a recent study on atherosclerotic cardiovascular disease risk prediction showed a comparable performance of existing pooled cohort equations and newly developed machine-learning-based models in Asian and Hispanic subgroups, for which the performance was so far uncertain.[54]
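A basic fairness check of the kind described, evaluating discrimination separately in each subgroup, might look like the following sketch; the subgroups, predictions, and the auc_by_subgroup helper are all hypothetical illustrations, not part of the cited analyses.

```python
from collections import defaultdict

def c_statistic(pred, outcome):
    """AUC as the proportion of event/non-event pairs ranked correctly."""
    pairs = concordant = ties = 0
    for p_i, o_i in zip(pred, outcome):
        for p_j, o_j in zip(pred, outcome):
            if o_i == 1 and o_j == 0:
                pairs += 1
                if p_i > p_j:
                    concordant += 1
                elif p_i == p_j:
                    ties += 1
    return (concordant + 0.5 * ties) / pairs

def auc_by_subgroup(pred, outcome, group):
    """Compute discrimination separately within each subgroup label."""
    by_group = defaultdict(lambda: ([], []))
    for p, o, g in zip(pred, outcome, group):
        by_group[g][0].append(p)
        by_group[g][1].append(o)
    return {g: c_statistic(ps, outs) for g, (ps, outs) in by_group.items()}

# Toy example: a model that discriminates well in group "A" but poorly in "B".
pred    = [0.9, 0.2, 0.6, 0.7, 0.3]
outcome = [1,   0,   1,   0,   0]
group   = ["A", "A", "B", "B", "B"]
subgroup_auc = auc_by_subgroup(pred, outcome, group)
```

A large gap between subgroup AUCs, as in this toy example, is one signal that an algorithm may systematically disadvantage a group, which is why well-represented minority groups in the development data matter.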