Taking the Pulse of the Apple Heart Study

John M. Mandrola, MD


March 27, 2019

The Positives of Apple Heart

The Apple Heart Study[1] (AHS) showed that it's possible to enroll almost 420,000 people in a study in less than a year. That is remarkable and surely related to the virtual nature of the enrollment. We should not discount the novelty and importance of this. When we look back at history, AHS may stand out as the beginning of a new way of doing research.

Consumer-directed health gadgets are here whether clinicians like it or not. These devices will allow people—for better or worse—to be more involved in their own healthcare. AHS deserves credit for attempting to sort out the ability of the Apple Watch's irregular pulse detection algorithm to identify AF and guide subsequent care. The Food and Drug Administration deemed the Apple Watch a consumer device, so Apple did not need to do this study.

AHS also addresses an important and still unanswered question: What is the base rate of irregular rhythms, such as atrial fibrillation (AF) or premature ventricular contractions, in the population? We know that AF increases with age, but we don't know how often or how much AF the average 40- or 50-year-old has. Before wearable tech, we monitored people only when there was a reason. One important finding of AHS was that 20% of the AF discovered was greater than 24 hours in duration.

Apple earns credit for partnering with proven researchers. The Stanford research team has a pedigree for doing serious science. I hope Apple steps aside and lets the investigators do their work.

The Negatives of AHS

The demographics of AHS are problematic. Entry criteria required having enough income to buy a smart watch paired with an iPhone. The makeup of enrolled individuals reflects the fact that an Apple Watch is a plaything for the young and affluent. More than 80% of the nearly 420,000 participants in AHS were younger than 55 years, and nearly 70% were white. What's more, only 13% of participants had a CHA2DS2-VASc score of 2 or greater.

The pragmatic virtual design of AHS allows for big enrollment numbers, but three serious downsides become obvious. First, since AHS was designed to test the ability of the watch's algorithm to find new AF, the investigators rightly excluded patients with known AF. Yet 15% of those who received notification of an irregular heart rhythm had AF before signing up. Testing a population enriched with patients with AF rather than the general population makes the precision of the algorithm look better than it actually is.

The second problem with the pragmatic design of AHS was an extremely low engagement rate. More than half of the 2161 people with irregular pulse notifications did not connect with a study telehealth doctor. Of those who did, some met study exclusion criteria (prior AF or oral anticoagulant use) and ultimately only 450 returned an electrocardiography patch for analysis. In other words, almost 80% of people with irregular pulse notifications were effectively lost to follow-up or should not have been in the study in the first place.
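The arithmetic behind that "almost 80%" can be checked directly. A minimal sketch in Python, using only the two counts quoted above; the variable names are illustrative.

```python
# Counts taken from the text: people who received an irregular pulse
# notification, and people who returned an ECG patch for analysis.
notified = 2161
patches_returned = 450

# Fraction of notified participants whose patch data could be analyzed.
returned_fraction = patches_returned / notified

# Everyone else was lost to follow-up or met an exclusion criterion.
not_analyzed_fraction = 1 - returned_fraction

print(f"Returned a patch: {returned_fraction:.1%}")      # 20.8%
print(f"Not analyzed:     {not_analyzed_fraction:.1%}")  # 79.2%
```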

The third problem was that the investigators tested the second co-primary endpoint, the positive predictive value (PPV) of the irregular rhythm notification, not in the total population but only in those older than age 65—which made up 6% of the total cohort. Since PPV depends greatly on disease prevalence, the choice to measure PPV only in the group with the highest disease prevalence overestimates the performance of the algorithm.
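To see why prevalence matters so much, here is a minimal sketch of PPV computed from Bayes' rule. The sensitivity, specificity, and prevalence values are hypothetical placeholders chosen for illustration, not AHS results.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(disease | positive test)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics (assumptions, not study data).
SENSITIVITY = 0.50
SPECIFICITY = 0.99

# Same test, applied to groups with increasing AF prevalence.
for prevalence in (0.005, 0.02, 0.06):
    value = ppv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%} -> PPV {value:.1%}")
```

With the test characteristics held fixed, PPV climbs as prevalence rises, which is why a PPV measured only in the oldest group cannot be carried over to a mostly younger cohort.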

My next critique falls more on the American College of Cardiology (ACC) than on Apple or the AHS investigators. Given the stark inequities of care in the United States and globally, I question the morality of dedicating this much attention to a study on a low-risk population, a group that already enjoys relatively excellent health. We live in a free society, so Apple can make its own choices, but shouldn't professional organizations serve loftier ideals? These preliminary unpublished data seem better suited to a poster session than to the featured presentation in the opening late-breaking clinical trial session of a major conference.

Unknowns and Worries of AHS

First some specific uncertainties, then some big-picture questions.

AHS used earlier-generation watches (series 1 to 3) that measure the pulse. A later version of the watch (series 4) can record an actual one-lead rhythm strip when paired with an iPhone app. Unknown is whether this or future iterations of smart watches will lead to better outcomes.

A smart watch has to be charged. Many people charge the watch at night, a frequent time for AF to occur. Apple predicts an 18-hour battery life with an overnight charge. At the ACC meeting, we did not hear data on time worn, but that is important.

Many experts were reassured that only 0.5% of the general population was notified of an irregular pulse detection. This is reassuring only if not being notified meant you did not have a problem. In a detailed Twitter thread, Venk Murthy, MD, from the University of Michigan, calculated the sensitivity of an irregular pulse notification as only 47%.

The tension between sensitivity and specificity is one of the reasons why an observational study leaves more questions than answers. As clinicians who will advise people on the use of consumer-facing technologies, we mustn't forget that the only way to know if a device helps people is to study it in randomized controlled trials that measure real outcomes—not surrogates. A necessary skill of the modern-day clinician will be helping people understand hype.

As a doctor, I worry often about the false sense of knowing. Co-principal investigator of AHS, Marco Perez, MD, from Stanford University, said from the podium in a Q & A session after the presentation, "AHS gives clinicians information they can use. It was designed to help doctors know what to do when a patient shows up with AF."

Nothing could be further from the truth. AHS gives clinicians no reliable information about what to do; in fact, it has made our job harder. What, for instance, am I supposed to do if the watch finds a true-positive AF episode of 20 minutes in a 64-year-old? If this device leads to low-risk people getting oral anticoagulants, this could cause harm because the increase in bleeding may exceed the reduction in stroke. Also, enhanced detection of AF may lead to greater use of antiarrhythmic drugs, a sure bet for increased harm.

Which brings me to my greatest worry. Clinicians learn early on in their training that you do not order a test unless you have a question and a plan for the results. More information hardly equates with gains in knowledge.

Smart watches and digital health will deliver oodles more information. This, along with hype, will surely enrich the makers of devices, and in turn the medical industry.

I remain skeptical that it will improve the human condition. It could even make us sicker.

