Broken PROMISE: CTA for Suspected CAD in Patients With Diabetes

John M. Mandrola, MD


March 05, 2019

Millions of people encounter the healthcare system because of chest pain. The best way to evaluate these patients garners great debate. One way is to look at the anatomy noninvasively, with coronary computed tomographic angiography (CTA); the other way is to assess for ischemia with functional stress testing.

A recent study[1] concluded that in patients with diabetes who present with chest pain, "a CTA strategy resulted in fewer adverse CV [cardiovascular] outcomes than a functional testing strategy. CTA may be considered as the initial diagnostic strategy in this subgroup."

This flawed conclusion from a post hoc analysis of the PROMISE trial[2] deserves critique—both on the specifics and on what it says about adjudication of clinical science through traditional peer review.

First, the Specifics

PROMISE was a randomized controlled trial (RCT) comparing CTA vs functional stress testing in 10,003 patients who presented with stable chest pain.[2] The main results were neutral: The composite primary endpoint of death, myocardial infarction (MI), hospitalization for unstable angina, or major procedural complication occurred in 3.3% of patients in the CTA arm and 3.0% of those in the functional stress testing arm (adjusted hazard ratio, 1.04; 95% confidence interval, 0.83 to 1.29; P = .75).

The most recent analysis set out to assess whether CTA was superior to functional stress testing in reducing the adverse outcomes of CV death or MI among symptomatic patients with diabetes.

Before we get to the results, notice first that the authors' endpoint of interest in this substudy, CV death and MI, is different from the primary endpoint of the main trial: all-cause death, MI, unstable angina, and procedural complication.[3] In an email, corresponding author Pamela Douglas, MD, wrote: "We have been criticized for using 'soft' endpoints in previous papers, so we have gravitated to the 'hardest' composite we can in which all components are clearly cardiac events … hence CV death/MI."

Another issue: This is a post hoc analysis of a subgroup from a neutral trial. Vanderbilt statistics professor Frank Harrell writes on Twitter: "Sample sizes for overall randomized trials are just barely adequate (if that); sample sizes are not adequate for subgroup analyses that effectively reduce the already minimal sample size."

This point is highly relevant because the investigators of the original PROMISE trial estimated that they would need 10,000 patients to detect a 20% difference in the primary endpoint, assuming an event rate of 8% in the functional testing group at 2.5 years.[3] Because the actual rate of events was 3%, even the main trial was underpowered. This substudy of patients with diabetes included only 1908 people, or one fifth of the total population. Notwithstanding the post hoc change in endpoint, you can see the problem of drawing conclusions from such a small subsample of an already underpowered trial.
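The underpowering argument is easy to make concrete. A minimal sketch, using a standard normal-approximation power formula for a two-proportion comparison (the alpha level, two-sided test, and even split per arm are my assumptions, not the PROMISE protocol's exact calculation):

```python
from scipy.stats import norm

def two_prop_power(p1, rel_reduction, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    p2 = p1 * (1 - rel_reduction)  # event rate after a relative reduction
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm) ** 0.5
    z = (p1 - p2) / se
    return norm.cdf(z - norm.ppf(1 - alpha / 2))

# Design assumption: 8% event rate, 20% relative reduction, ~5000 per arm
print(two_prop_power(0.08, 0.20, 5000))  # roughly 0.87: adequately powered
# Observed event rate of ~3% at the same sample size
print(two_prop_power(0.03, 0.20, 5000))  # roughly 0.46: underpowered
# Diabetes subgroup of 1908 patients (~954 per arm)
print(two_prop_power(0.03, 0.20, 954))   # roughly 0.13: severely underpowered
```

Under these assumptions, halving the expected event rate cuts power from about 87% to below 50%, and restricting to a subgroup one fifth the size drops it to near 13% — chance alone dominates any signal at that point.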

In a Twitter thread, Venk Murthy, MD, from the University of Michigan, offered other challenges. He first noted the problem of multiplicity: if you test for an effect in enough subgroups, some will reach significant P values by chance alone.

Peter Sleight, MD, from Oxford famously analyzed the ISIS-2 trial (streptokinase, aspirin, or both in acute MI) by astrological sign and found that aspirin was not beneficial in Geminis or Librans.[4]
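The multiplicity problem is also easy to quantify. As a sketch (the count of 12 subgroups is illustrative, not taken from the PROMISE paper): if k truly null subgroups are each tested at alpha = .05, the chance of at least one spuriously "significant" result is 1 − 0.95^k.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 12            # illustrative number of subgroups, all truly null
n_trials = 100_000

# Under the null hypothesis, p-values are uniform on [0, 1]
p_values = rng.uniform(size=(n_trials, k))
at_least_one_hit = (p_values < 0.05).any(axis=1).mean()

print(at_least_one_hit)   # simulated: about 0.46
print(1 - 0.95 ** k)      # analytic:  about 0.46
```

In other words, testing a dozen null subgroups gives you nearly even odds of at least one "positive" finding with no true effect anywhere.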

The PROMISE authors say that this post hoc analysis was prespecified.

Even so, Columbia University statistics professor Andrew Gelman, PhD, has described the issue of doing analyses after the data are known as the problem of forking paths. His point is that even if prespecified, analyses that are contingent on the data are fragile.

In this case I would say implausible rather than fragile. The authors' conclusion that among patients with diabetes, CTA "resulted" in fewer adverse CV outcomes comes from their observation that patients with diabetes who underwent CTA rather than functional testing had a lower risk for CV death or MI, but they found no such difference in patients without diabetes. They reported a P value of .02 for interaction with diabetes.

The event rates in Table 11 of the supplement (not the main paper) expose the core problem with this conclusion. In patients with diabetes, the rate of MI was only 0.2% in the CTA arm vs 1.3% in patients who had a functional stress test; this difference drove the "positive" finding. But that same table shows that the rate of MI in the CTA arm was 3.5-fold lower in patients with diabetes (0.2%, or 2 of 936) than in those without diabetes (0.7%, or 24 of 3564). Think about that: how is it possible that people with diabetes had 3.5-fold fewer MIs than those without diabetes? It is not.
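The arithmetic behind that implausibility is simple; the counts below are those reported in the article's description of the supplemental table:

```python
mi_dm, n_dm = 2, 936          # MIs in CTA arm, patients with diabetes
mi_no_dm, n_no_dm = 24, 3564  # MIs in CTA arm, patients without diabetes

rate_dm = mi_dm / n_dm            # about 0.21%
rate_no_dm = mi_no_dm / n_no_dm   # about 0.67%
print(f"{rate_dm:.2%} vs {rate_no_dm:.2%}, "
      f"ratio {rate_no_dm / rate_dm:.2f}")
```

Note that the unrounded ratio is about 3.2; the "3.5-fold" figure quoted in the text comes from comparing the rounded percentages (0.7%/0.2%). Either way, diabetes is a risk factor for MI, so a several-fold lower MI rate in the diabetic subgroup defies clinical plausibility.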

The likely reason for this positive association is chance.

Now the Implications

Ben Mazer, MD, and I recently wrote about the problems with traditional peer review. That this post hoc substudy made it into an influential journal helps make our case.

The main PROMISE study showed no advantage for either testing strategy. And this was not an outlier. I was a coauthor on a meta-analysis of 13 trials comparing CTA vs functional stress testing in patients with chest pain.[5] We found that CTA was associated with an increased rate of coronary angiography and revascularization procedures, but no differences in mortality and cardiac hospitalizations compared with functional stress testing. We did note that CTA was associated with a 0.4% reduction in MI, but this difference was driven by inclusion of the SCOT-HEART trial.[6] Originally, we did not include the SCOT-HEART trial in our analysis because the vast majority of patients in that trial (both groups) received stress testing; however, peer reviewers forced us to include it.

The evidence clearly shows there is no difference in outcomes with CTA vs functional stress testing. Given that, the increase in downstream procedures with CTA suggests that anatomic imaging of the millions of patients who present with chest pain fosters low-value care.[7]

Despite these priors and the flaws outlined above, the editors of the Journal of the American College of Cardiology (JACC) and the peer reviewers let the authors use causal language in the conclusion. They also permitted the authors to bury actual event rates in the supplement and did not require an intention-to-treat analysis of the outcomes.

Causal language bothers me the most. Since JACC wields great influence, this paper will foster the impression that CTA is the better test for patients with chest pain who have diabetes. I can't prove this false, but this flawed post hoc analysis surely does not prove it is true.

The good news is that many thoughtful people are critically appraising the paper in the public domain. The bad news is that these discussions will not accompany the study in the indexed literature. There it remains—a flawed study, with overreaching conclusions and a supportive editorial.[8]

Something needs to be done about this. This problem transcends the evaluation of patients with chest pain. It goes to the reliability of clinical science writ large.

