Artificial Intelligence-augmented ECG Assessment

The Promise and the Challenge

Kelley P. Anderson MD, FACC, FHRS


J Cardiovasc Electrophysiol. 2019;30(5):675-678. 

Since Einthoven developed the electrocardiogram (ECG), investigators and clinicians have sought to expand the diagnostic potential of this convenient, widely available, and relatively inexpensive noninvasive examination.[1] In this issue of the Journal of Cardiovascular Electrophysiology, Attia and colleagues demonstrate the ability of a novel algorithm, based on the standard 12-lead ECG and developed with deep learning artificial intelligence (AI), to detect patients with reduced left ventricular ejection fraction (rLVEF).[2] This form of AI, also called machine learning, allows computer programs to identify relationships directly from data. It would appear to be well suited to ECGs, which consist of digitized data based on time and voltage from various sites on the body surface.[3] Using AI to identify patients with rLVEF might be useful in patients for whom echocardiography or other imaging modalities are not indicated, not available, or not cost-effective. Early detection of rLVEF with the implementation of therapy may inhibit progression of the underlying disorder, prevent the development of symptoms, or reduce mortality. This is supported by a study involving screening based on elevated brain-type natriuretic peptide (BNP) followed by echocardiography and appropriate intervention, which reduced the combined rates of rLVEF, diastolic dysfunction, and heart failure (HF).[4] ECG changes reflecting rLVEF are biologically plausible because disorders that result in rLVEF may alter cardiac electrical impulse creation, propagation, or repolarization in atrial, ventricular, and conduction tissue by multiple mechanisms such as direct injury, remodeling, and alterations in neuroendocrine activity. If these disturbances are of sufficient magnitude, they may be detectable by the ECG.

On the other hand, detection of signals due to rLVEF is not straightforward, because many electrophysiological disorders produce major ECG changes in the absence of rLVEF that could mimic or distort the signals related to rLVEF, such as genetic or acquired conduction and repolarization disturbances, atrial or ventricular arrhythmias, and paced rhythms. Furthermore, surface manifestations of rLVEF could be obscured by external electromagnetic signals, skeletal muscle electrical signals, ECG filters, poor electrode contact, incorrect electrode placement, or unusual body habitus.

Attia et al[2,3] report impressive statistics on the ability of the AI-ECG model (AI-ECG) to detect patients with LVEF ≤35%. Three population samples are reported. The first sample comprised 6008 adult patients who underwent a standard 12-lead ECG at the Mayo Clinic ECG laboratory between September 1 and 30, 2018 and who had a clinically indicated comprehensive echocardiogram within 12 months of the ECG. The AI-ECG detected patients with LVEF ≤35% with accuracy, specificity, and sensitivity of 86.0%, 86.3%, and 81.5%, respectively. The area under the receiver operating characteristic (ROC) curve (c-statistic) was 0.911, which suggests excellent discrimination. The second sample was the subset of patients who had the echocardiogram performed within a month of the ECG (n = 3874). This was associated with slightly greater accuracy, specificity, sensitivity, and c-statistic (86.5%, 86.8%, 82.5%, and 0.918, respectively). Of the 700 AI-ECG test-positive patients, 32% had an LVEF ≤35% by echocardiography (true positives), 27% had an LVEF of 36% to 50%, and the remaining 41% had an LVEF >50%. In more concrete terms, using the test-positive threshold selected by the investigators, if 1000 patients from a similar population were tested, 181 would test positive: 58 would be true positives with an LVEF ≤35%, and the remaining 123 would be false positives, although 48 of these would have an intermediate LVEF of 36% to 50% and the remainder would have an LVEF >50%. This performance reflects a modest positive predictive value (PPV), which was not reported by the authors but can be calculated to be 32% if one assumes a prevalence of 7%. The PPV, which has been considered a useful statistic for assessing the utility of screening tests, is a function of the prevalence, sensitivity, and specificity.
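
The dependence of the PPV on prevalence, sensitivity, and specificity can be made explicit with a short calculation. The following is a minimal sketch, not taken from Attia et al; the function names are illustrative, and the inputs are the second-sample sensitivity (82.5%) and specificity (86.8%) together with the assumed prevalence of 7%.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence                   # diseased and test-positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # healthy but test-positive
    return true_pos / (true_pos + false_pos)


def expected_counts(n: int, sensitivity: float, specificity: float,
                    prevalence: float) -> dict:
    """Expected true and false positives when n patients are tested."""
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = diseased * sensitivity
    false_pos = healthy * (1.0 - specificity)
    return {"true_pos": true_pos, "false_pos": false_pos,
            "test_pos": true_pos + false_pos}


if __name__ == "__main__":
    # Second-sample sensitivity and specificity; the 7% prevalence is the
    # assumption stated in the text.
    sens, spec, prev = 0.825, 0.868, 0.07
    print(f"PPV: {ppv(sens, spec, prev):.1%}")
    for name, value in expected_counts(1000, sens, spec, prev).items():
        print(f"{name}: {value:.0f}")
```

With these inputs the sketch returns the per-1000 figures quoted above (58 true positives, 123 false positives, 181 test positives) and a PPV of approximately 32%.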

The effect of filtering patients using N-terminal pro-brain-type natriuretic peptide (NT-proBNP) was explored in 522 patients in whom this biomarker was assessed.[2] An NT-proBNP threshold of ≥125 ng/mL reduced the number of false-positive cases by five (from 96 to 91) with no increase in the number of false negatives. These improvements reflect a much higher prevalence, calculated to be 16.5% in the patients with NT-proBNP levels, and a higher PPV, calculated to be 44.5%. An NT-proBNP threshold of ≥450 ng/mL reduced false positives by 26 but increased the number of false negatives by three, yielding a higher calculated PPV of 50%. This merely demonstrates that increasing the pretest probability of disease improves the PPV. It is expected that this subset of patients undergoing BNP testing for clinical indications would have a higher prevalence of rLVEF.
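
The calculated PPVs can be reproduced from the false-positive counts alone, provided the number of true positives is known. In the sketch below, the baseline of 73 true positives is an assumption: it is not stated in the text but is the count implied by a PPV of 44.5% with 91 false positives. The function and variable names are illustrative.

```python
def ppv_from_counts(true_pos: int, false_pos: int) -> float:
    """PPV as the fraction of test-positive patients who have the condition."""
    return true_pos / (true_pos + false_pos)


# Assumption: 73 true positives, back-calculated from the reported PPV of
# 44.5% with 91 false positives; this count is not given in the text.
BASELINE_TP = 73
BASELINE_FP = 96  # false positives before NT-proBNP filtering (from the text)

scenarios = {
    "AI-ECG alone": (BASELINE_TP, BASELINE_FP),
    # >=125 threshold: five fewer false positives, no added false negatives
    "NT-proBNP >=125": (BASELINE_TP, BASELINE_FP - 5),
    # >=450 threshold: 26 fewer false positives, three added false negatives
    "NT-proBNP >=450": (BASELINE_TP - 3, BASELINE_FP - 26),
}

if __name__ == "__main__":
    for label, (tp, fp) in scenarios.items():
        print(f"{label}: TP={tp}, FP={fp}, PPV={ppv_from_counts(tp, fp):.1%}")
```

Under this assumption, both reported PPVs (44.5% and 50%) follow from the stated changes in false positives and false negatives, which supports the internal consistency of the figures.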

The third sample assessed by Attia et al[2] comprised 5999 patients who had an ECG obtained but no echocardiogram. Application of the AI-ECG identified 3.5% as having an LVEF ≤35%. The AI-ECG positivity rate was lower by a factor of about five than in the first two samples. There was a strong relation of AI-ECG positivity with age. Between the ages of 20 and 45 years, the rates of positive AI-ECG ranged from 1.6% to 1.9%, whereas the rates in patients older than 75 years ranged from 6.6% to 33.3%. This population of patients who did not have a clinical indication for echocardiography would seem to be an appropriate target for AI-ECG screening, but if the low rate of AI-ECG positivity reflects a large drop in the prevalence of rLVEF, the PPV could be much lower, which would reduce the usefulness of the AI-ECG for screening purposes. On the other hand, this population was not included in the AI-ECG training sample, so the distribution of ECG characteristics could be substantially different. A major feature of the AI approach is that performance might improve significantly after the system incorporates data and outcomes (ie, learns) from the new data set.

A large number of studies have attempted to predict rLVEF using diverse variables in several contexts. Ho et al provide an exemplary study using a long list of clinical, demographic, metabolic, and ECG factors at HF presentation to distinguish HF with rLVEF (HFrEF) from HF with preserved LVEF (HFpEF) among 712 participants in the Framingham Heart Study (FHS) and 4436 patients with HF in the Enhanced Feedback for Effective Cardiac Treatment (EFFECT) study.[5] The ECG factors included heart rate, atrial fibrillation, left bundle branch block (LBBB), right bundle branch block (RBBB), any ST-segment elevation, any ST-segment depression, and any T-wave inversion. In both groups (FHS and EFFECT), the prevalence of all ECG variables differed significantly between patients with HFrEF (defined as LVEF ≤45%) and those with HFpEF, except for RBBB and any ST-segment depression. The latter was significantly less prevalent in patients with HFrEF than in patients with HFpEF in the FHS group but not in the EFFECT group. The multivariate-adjusted model included eight variables, five of which were ECG variables (heart rate, atrial fibrillation, LBBB, any ST-segment elevation, and any T-wave inversion). The other variables were female sex, coronary heart disease, and elevated potassium level. All the variables selected by the multivariate analysis predicted HFrEF except female sex and atrial fibrillation, which favored HFpEF. The investigators developed three multivariate models to discriminate between HFrEF and HFpEF in the two cohorts of patients. The c-statistics were between 0.748 and 0.782. Although Ho et al focused on a population very different from that of Attia et al, their results confirm that ECG variables (heart rate, atrial fibrillation, LBBB, any ST-segment elevation, and any T-wave inversion) contribute to a model that discriminates, with modest power, between low and preserved LVEF in patients with HF.
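
The c-statistic used to compare these models is equivalent to the probability that a randomly chosen patient with the condition receives a higher model score than a randomly chosen patient without it, with ties counted as one half. A minimal sketch of that definition follows; the function name and the scores are invented for illustration and are not drawn from any of the studies cited.

```python
def c_statistic(scores_pos: list, scores_neg: list) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    concordant = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical model outputs, for illustration only.
    with_rlvef = [0.9, 0.8, 0.4]
    without_rlvef = [0.7, 0.3, 0.2, 0.1]
    print(f"c-statistic: {c_statistic(with_rlvef, without_rlvef):.3f}")
```

A value of 0.5 corresponds to chance and 1.0 to perfect separation, which places c-statistics of 0.748 to 0.782 in the range of modest discrimination.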

Cardiac biomarkers have been assessed by several investigators. Maisel et al[6] evaluated the utility of BNP for detecting LV dysfunction in 200 patients without known cardiac dysfunction who were referred for echocardiography. The c-statistic was 0.95. A BNP of 75 pg/mL was 98% specific for detecting LV systolic or diastolic dysfunction on echocardiography. The PPVs varied with BNP threshold from 71% to 98%. Romano et al[7] assessed the value of NT-proBNP levels to detect an LVEF <45% in 134 asymptomatic patients with a history of hypertension of at least 5 years. Echocardiography showed normal LVEF in 40 patients, diastolic dysfunction in 80 patients, and rLVEF in 14 patients. The c-statistic was 0.89 for the detection of systolic dysfunction, with a sensitivity of 83%, a specificity of 80%, and a PPV of 33% at a cut-off value of 114 pg/mL. Bibbins-Domingo et al,[8] in a cross-sectional study of 293 outpatients who had stable coronary disease and no history of HF, compared elevations in plasma BNP levels with echocardiography for the diagnosis of rLVEF (ejection fraction <55%) and diastolic dysfunction (diastolic dominant pulmonary vein flow with LVEF ≥55%). A total of 48 patients (16%) had systolic dysfunction. Among the remaining 245 with preserved LVEF, 31 (13%) had diastolic dysfunction. At the standard cutpoint of 100 pg/mL, an elevated BNP level was only 38% sensitive (80% specific) for systolic dysfunction, with a c-statistic of only 0.59. These studies demonstrate that the cardiac biomarkers BNP and NT-proBNP have variable performance in the identification of patients with rLVEF. There are, however, many new biomarkers that may be found to contribute to models that more accurately identify patients with rLVEF.

The model reported by Attia et al relied purely on ECG data. Reinier and colleagues[9] examined the role of conventional measurements from the standard 12-lead ECG to identify patients with rLVEF. The development group consisted of 1014 subjects (560 cardiac arrest cases, 487 controls). Patients with acute ST-elevation myocardial infarction and paced-ventricular rhythms were excluded. Nineteen percent (195 of 1014 subjects) had LVEF ≤35%. The validation group comprised 7601 subjects in whom clinically indicated echocardiograms and ECGs were performed. The prevalence of LVEF ≤35% was 6.1% in this group. Patients with paced rhythms were included in the validation group. Atrial fibrillation, paced rhythm, and LBBB were considered "major ECG abnormalities." The "expanded" set of ECG variables, which were not assessed in all patients with major ECG abnormalities, included heart rate, P-wave duration, PR interval, QRS duration, QTc interval (Bazett's correction), frontal QRS-T angle (calculated as the absolute difference between the frontal QRS axis and T-wave axis, with values of 0°-180°), delayed QRS transition zone (R-wave amplitude less than S-wave amplitude in lead V4), delayed intrinsicoid deflection (defined as R-peak time ≥50 ms in lead V5 or V6), and left ventricular hypertrophy (LVH; by Cornell voltage or Sokolow-Lyon criteria). In the multivariable model applied to the validation group, heart rate, QTc interval, QRS duration, QRS-T angle, delayed QRS transition zone, and delayed intrinsicoid deflection remained independently associated with LVEF ≤35%, while LVH, prolonged PR interval, and prolonged P wave were not significant. On the basis of the six statistically significant ECG markers, an unweighted expanded ECG panel sum was constructed, ranging from 0 to ≥4 abnormal markers. A one-unit increase in the panel sum was associated with 2.9-fold increased odds of LVEF ≤35% (c-statistic 0.831). The odds ratios were consistent in models stratified by sex and age, ranging from 2.6 to 3.5. In the subset of patients without major ECG markers, a panel sum of ≥4 had a PPV of 32.6%.
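
Two of the conventional measurements listed above are simple derived quantities. The sketch below illustrates the frontal QRS-T angle as defined in the text and Bazett's correction in its usual form (QT divided by the square root of the RR interval in seconds). It is not the code used by Reinier et al; the function names and input values are assumptions made for illustration.

```python
import math


def frontal_qrs_t_angle(qrs_axis_deg: float, t_axis_deg: float) -> float:
    """Absolute difference between frontal QRS and T axes, folded into 0-180 degrees."""
    diff = abs(qrs_axis_deg - t_axis_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def qtc_bazett(qt_ms: float, rr_ms: float) -> float:
    """Bazett-corrected QT in ms: QT divided by the square root of RR in seconds."""
    return qt_ms / math.sqrt(rr_ms / 1000.0)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(frontal_qrs_t_angle(60.0, -30.0))    # axes 90 degrees apart
    print(frontal_qrs_t_angle(-170.0, 170.0))  # axes 20 degrees apart across the +/-180 boundary
    print(round(qtc_bazett(400.0, 640.0), 1))  # QT 400 ms at an RR interval of 640 ms
```

The fold into 0° to 180° matters when the two axes lie on either side of the ±180° boundary, where a plain subtraction would overstate the angle.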

The studies of Reinier et al[9] and Attia et al[3] (the model development study) had similar aims, ie, to develop a model based on the 12-lead ECG that could identify patients with LVEF ≤35%. Both studies included patients with clinical indications for echocardiography and patients with known disorders associated with rLVEF such as HF and cardiac arrest. The groups in both studies had a moderate prevalence of the target LVEF ≤35% (19% for the development group and 6.1% for the validation group of Reinier et al,[9] and 7.8% for the three groups, training, validation, and testing, of Attia et al).[3] Reinier et al excluded patients with ST-elevation myocardial infarction from both groups and patients with pacing from the development group (due to a previous protocol decision). Reinier et al did not exclude patients based on any other clinical, echocardiographic, or ECG characteristic. Attia et al also did not exclude patients based on any clinical, echocardiographic, or electrocardiographic characteristic. Both groups used an echocardiographic estimation of LVEF, the precision and accuracy of which are generally modest and depend on many factors, especially image quality, sonographer skill, and use of contrast agents. One could question the failure to exclude patients with suboptimal imaging. However, the use of a sufficiently low LVEF threshold (≤35%) provides assurance that the rLVEF group was not contaminated with normal patients.

An important limitation of conventional ECG parameters is that they are not reliably determined by current computerized ECG methods and require time-consuming manual measurement or verification by experts.[10] Reinier et al[9] performed a tour de force in their meticulous analysis of 10 789 ECGs. In contrast, the model developed by Attia et al is based on direct computer analysis of the digitized ECG signals and is capable of automatically providing a highly accurate probability of rLVEF without human interaction. Attia et al submitted more than 100 000 ECGs. In addition, the AI process enables continuous feedback and refinement of the model, which can be readily distributed to update the ECG computer systems.

Transparency is a concern for AI systems. Conventional ECG variables can be well defined and can provide indirect physiological information regarding autonomic tone (heart rate), propagation velocity (P-wave duration, PR interval, and QRS duration), repolarization (QT interval), impulse creation (atrial and ventricular ectopy), activation sequence (P and QRS morphology), repolarization sequence (QRS-T angle and T-wave morphology), etc. In contrast, the AI-ECG is a "black box," which could be a significant limitation. Attia et al noted that the ECG is "pseudo cyclical, and its main features are morphologic." To "enable detection of patterns in these features" and to "extract very subtle patterns within each lead," they used "architectures that were based on convolutional layers for feature extraction."[3] There is no information regarding the role of crucial ECG factors such as heart rate, premature ventricular complexes, ventricular pacing, or atrial fibrillation, or of the various forms of ECG noise. Concerns have been expressed that machine learning in other contexts has resulted in findings that are not accurate or reproducible.[11] Because AI systems are programmed to find solutions, problems with the analysis may not be clearly indicated. It appears that the AI-ECG will provide a probability of LVEF ≤35% even if an ECG is too noisy to interpret, rather than rejecting the tracing. New generations of AI systems that will assess the uncertainty and reproducibility of their analyses are under development.[11]
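
For readers unfamiliar with the term, a convolutional layer slides a short weight pattern (a kernel) along the signal and reports how strongly each segment matches it. The sketch below shows that single operation on a toy one-lead trace. It is not the architecture of Attia et al, whose layers, kernel sizes, and weights are not described here; the kernel is hand-picked for illustration, whereas a trained network learns its kernels from data, which is one source of the "black box" concern.

```python
def conv1d_valid(signal: list, kernel: list) -> list:
    """Slide the kernel along the signal and take a weighted sum at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values: list) -> list:
    """Keep positive responses and set negative responses to zero."""
    return [max(0.0, v) for v in values]


if __name__ == "__main__":
    # Toy trace with one sharp deflection (hypothetical values, arbitrary units).
    lead = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
    # Hand-picked kernel that responds to a narrow peak; a trained network
    # would learn its kernel weights rather than have them specified.
    peak_kernel = [-1.0, 2.0, -1.0]
    print(relu(conv1d_valid(lead, peak_kernel)))
```

The output is nonzero only where the trace contains the pattern the kernel encodes. In a deep network, many such learned kernels are stacked, and the resulting features have no guaranteed correspondence to named ECG intervals or axes.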

The AI-ECG models were developed and tested in population samples in whom echocardiograms were clinically indicated and available. The ECG-based rLVEF detector would more likely find utility in the third population studied by Attia et al, ie, patients without indications for echocardiography. How the AI-ECG would perform in this or other populations is unknown and would require further study. The PPVs in the populations examined by Attia et al may not be adequate for many screening purposes. However, a population targeted for screening might have ECG characteristics very different from those of the training and validation samples. It is uncertain how the performance of the AI-ECG would change if the targeted screening population had far fewer ECGs with pacing, LBBB, PVCs, and atrial fibrillation. This underscores the requirement to train and test models in the targeted population.

Despite these concerns, the ability of the AI-ECG to assimilate vast amounts of ECG and clinical data, its ability to analyze ECG data as images rather than as discrete components, and its ability to learn indicate the potential to identify relationships among the ECG signals and other clinical variables well beyond the capacity of conventional methods. AI is permeating all aspects of medical care, highlighting the need to develop tools to scrutinize and critique new models so that improvements to medical care can be implemented efficiently. Many of the current concerns and challenges will likely be overcome by new generations of AI systems. By showing us that the basic 12-lead ECG can reliably identify patients with rLVEF, Attia and colleagues have revealed a new frontier of clinical utility and physiological information based on this convenient and widely available examination. So compelling are the new AI approaches that it seems inevitable that our current process for ECG analysis will be transformed in the near future.