Harnessing Big Data for Identifying Atrial Fibrillation

Robert K. Altman; Jonathan S. Steinberg


Europace. 2019;21(9):1283 

The last decade has seen significant strides in the prevention of systemic embolism and stroke due to atrial fibrillation (AF) in the form of safe, convenient, and effective antithrombotic therapy. However, in part because many patients have undiagnosed AF and thus no protective anticoagulant therapy, the arrhythmia remains a significant cause of stroke. Up to 37% of patients younger than 75 years old presenting with a first stroke and no prior history of cardiovascular disease have stroke as the initial manifestation of AF.[1] Observations such as this suggest an opportunity to pre-identify patients with AF before the potentially devastating effects of stroke.

Except in those patients who have already had an embolic event,[2] population screening for AF has had mixed results,[3] in part because defining at-risk patient populations has proved challenging. For instance, a model derived from the Framingham Heart Study population had only modest predictive value, with a C statistic of 0.67; that is, the model would assign a higher risk to a randomly selected patient with AF than to a randomly selected patient without AF only about two-thirds of the time.[4] This is comparable to most other prediction models for AF developed thus far.
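To make the meaning of a C statistic concrete, the short sketch below computes it as a concordance probability: the fraction of (patient with AF, patient without AF) pairs in which the patient with AF received the higher risk score, with ties counted as one half. The function name `c_statistic` and the six risk scores are hypothetical and chosen only for illustration; they are not taken from any of the cited models.

```python
from itertools import product


def c_statistic(scores, labels):
    """Concordance probability: the share of (case, non-case) pairs in which
    the case has the higher score. Tied scores count as half a pair."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical risk scores: three patients with AF (label 1), three without (label 0).
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

# 7 of the 9 possible pairs are ranked correctly, so the C statistic is 7/9.
print(round(c_statistic(scores, labels), 2))  # prints 0.78
```

A value of 0.5 corresponds to ranking by chance and 1.0 to perfect ranking. Because the statistic describes ranking rather than a classification threshold, it does not translate directly into a proportion of misclassified patients.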

In this issue of EP-Europace, Hu et al.[5] use a machine learning method to derive a novel prediction model for AF in a large cohort of Chinese patients, incorporating age, gender, co-morbidities, and CHA2DS2-VASc score. The authors leveraged the availability of a large cohort of patients (the population of Taiwan) within a limited but accurate administrative database to develop a prediction model for the presence of AF. A machine learning technique known as the random forest model was used. This technique combines many decision trees into a highly predictive model while limiting 'overfitting' to the training data set, which would make the model less applicable to other similar cohorts of patients. The model classified patients with extremely high accuracy in the binary identification of AF in a large validation data set of Chinese patients. The C statistic of 0.94 from the receiver operating characteristic curve indicates that the model would assign a higher predicted risk to a randomly selected patient with AF than to a randomly selected patient without AF 94% of the time.
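The core idea behind a random forest can be illustrated with a deliberately simplified sketch: many small decision trees are each fitted to a bootstrap resample of the training data using a randomly chosen variable, and their votes are combined. The version below uses one-split trees ('stumps') rather than full decision trees, so it is a toy stand-in and not the model of Hu et al. The function names, the two variables (age and CHA2DS2-VASc score), and the eight training rows are all hypothetical.

```python
import random


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on a single variable: choose the threshold and
    direction that misclassify the fewest training rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above in (0, 1):  # class predicted when value >= threshold
            errors = sum(
                (above if row[feature] >= threshold else 1 - above) != y
                for row, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, above)
    return best[1:]  # (feature, threshold, above)


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap resample and one random variable."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random variable for this tree
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote across all stumps: 1 = AF, 0 = no AF."""
    votes = sum(above if row[f] >= t else 1 - above for f, t, above in forest)
    return 1 if 2 * votes > len(forest) else 0


# Hypothetical training rows: [age in years, CHA2DS2-VASc score]; label 1 = AF.
rows = [[45, 0], [50, 1], [52, 0], [58, 1], [72, 3], [76, 4], [79, 3], [83, 5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict(forest, [85, 6]), predict(forest, [30, 0]))
```

Averaging over many trees, each trained on a slightly different view of the data, is what limits overfitting: the idiosyncrasies of any single resample are outvoted by the rest.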

If the powerful prediction model described by Hu et al. can be replicated in other ethnic groups, and for incident rather than prevalent AF, better screening for AF may result from targeting more precisely defined at-risk populations. Further studies of this kind are needed to develop refined prediction models, built on readily available variables, that can be generalized to a large population.

Marrying potent population prediction tools with wearable technology[6] that embeds algorithms for AF identification is a particularly promising approach. The yield and specificity of opportunistic AF screening would likely improve by focusing on more precisely defined higher-risk populations, enhancing efficiency and cost-effectiveness. Such wearable technology may one day allow for the collection of physiologic data such as changes in heart rate variability, sympathetic skin response, or other measures. These variables may serve as powerful predictors of incident AF when applied to large populations.

The convergence of advances in the analysis of 'big data' with machine learning along with the development of inexpensive, non-invasive, and accurate wearable tools for physiologic data collection and AF detection holds great promise for progress in the prevention of stroke in at-risk individuals.