Fragility Index Shows CV Trial Results Often Fragile

Debra L. Beck

December 19, 2019

About 1 in 4 statistically significant clinical trials in cardiovascular disease show limited robustness, as measured by the fragility index (FI), according to two new studies published in the December issue of Circulation: Cardiovascular Quality and Outcomes.

The FI is the minimum number of additional events needed in one arm of a trial for the P-value to cross its significance threshold and "flip" the trial to a statistically nonsignificant result. A low FI indicates that statistical significance would be lost with a small change in outcomes.
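As a rough illustration of that definition, the Python sketch below computes an FI for a two-arm trial with a dichotomous outcome. It is not the code used in either study. It assumes a two-sided Fisher exact test, a threshold of P < .05, and that events are added one at a time to the arm with the lower event rate while arm sizes stay fixed. The function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table with the same margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the tables that are no more likely than the observed one.
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Events to add to the lower-rate arm before P >= alpha.

    Returns None if the trial is not significant to begin with
    (or if significance is never lost, which is not expected in practice).
    """
    if fisher_exact_two_sided(events_a, n_a - events_a,
                              events_b, n_b - events_b) >= alpha:
        return None
    # Assumption of this sketch: modify the arm with the lower event rate.
    if events_a / n_a <= events_b / n_b:
        low_e, low_n, high_e, high_n = events_a, n_a, events_b, n_b
    else:
        low_e, low_n, high_e, high_n = events_b, n_b, events_a, n_a
    flips = 0
    while low_e < low_n:
        low_e += 1   # turn one non-event into an event
        flips += 1
        p = fisher_exact_two_sided(low_e, low_n - low_e,
                                   high_e, high_n - high_e)
        if p >= alpha:
            return flips
    return None


# Hypothetical toy trial: 0/10 events vs 6/10 events.
print(fragility_index(0, 10, 6, 10))  # 1
```

In this toy example a single additional event in the treatment arm is enough to push the result past P = .05, which is what a low FI is meant to flag.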

FI helps readers put the findings of a significant trial in context and, as in the recent case of the PARAGON trial, can also be used to explain how just a few events can flip a trial from nonsignificant to significant. But given its many limitations, how much FI really improves the understanding and clinical application of trial results is a matter of some debate.

According to one of the study authors, Muhammad Shahzeb Khan, MD, of John H. Stroger Jr. Hospital of Cook County in Chicago, Illinois, fragility should be just one consideration when assessing a new trial.

"There is no number by which we can definitively say the trial findings are robust or not very robust, so we need to put the fragility index in context with other factors, including the P-value, effect size estimates, and the numbers lost to follow-up," said Khan in an interview with theheart.org | Medscape Cardiology.

In the first study, Mario Gaudino, MD, Weill Cornell Medicine, New York City, and colleagues found an FI of 3 or lower in 27.5% of trials referenced in the 2018 European Society of Cardiology guidelines for myocardial revascularization and the 2014 American College of Cardiology/American Heart Association guideline for stable ischemic heart disease.

This means that moving three or fewer patients from one outcome group (eg, having a myocardial infarction [MI] by the end of the study period) to the other (no MI) would flip the trial from positive (P < .05) to negative (P > .05).

In a second study, Khan and colleagues looked only at studies with sample sizes of more than 500 patients that were published in high-impact journals and found that 22.8% of studies had an FI between 1 and 4.

Both papers noted that in an even larger proportion of studies (42.5% for Gaudino et al and 30.1% for Khan et al), the FI was lower than the number of patients lost to follow-up.

Khan explained: "Let’s say a trial has a fragility index of 5, meaning that our results are significant based on five events and if we switched five events to the other outcome, the overall findings would no longer be significant. But if there were 100 patients lost to follow-up, we don’t know how many of those 100 patients might have had events, so how can we be sure about our results?" he asked.

"I think this is the most important piece of our study, this idea that we have no idea what might have happened to those 100 patients lost to follow-up."

Even a high FI is no guarantee of robustness, particularly given that FI is related to the study’s sample size. "The fragility index should be quoted in context with the sample size and, if possible, the fragility quotient, which is the fragility index divided by the sample size," explained Khan.

"If the number of patients lost to follow-up exceeds the FI, the results should still be interpreted with caution," he added.
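The two checks Khan describes reduce to simple arithmetic, sketched below in Python. The FI of 5 and the 100 patients lost to follow-up come from his example; the sample size of 1000 is a hypothetical value added only for illustration, as are the function names.

```python
def fragility_quotient(fi, sample_size):
    """Fragility index divided by the total sample size."""
    return fi / sample_size


def lost_exceeds_fi(fi, lost_to_follow_up):
    """True when more patients were lost to follow-up than the FI."""
    return lost_to_follow_up > fi


fi = 5              # from Khan's example
lost = 100          # from Khan's example
sample_size = 1000  # hypothetical, for illustration only

print(fragility_quotient(fi, sample_size))  # 0.005
print(lost_exceeds_fi(fi, lost))            # True
```

Here the unknown outcomes of the 100 missing patients could easily account for the five events that separate a significant result from a nonsignificant one.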

Is FI an Important Value Add?

Testing the fragility of a trial’s findings is established, but not yet mainstream, according to Henry Seligman, MBBS, and Darrel Francis, MB BChir, MD, both from Imperial College London, United Kingdom, and colleagues, who penned an editorial that accompanied the Khan and Gaudino articles.

FI offers little information on top of a careful assessment of the P-value, effect size, and confidence interval, according to the editorialists.

"The fragility index is one of several interesting ways to help evaluate the reliability of evidence from randomized controlled trials," said clinical trials expert Allan Hackshaw, PhD, University College London, UK, in an email exchange with theheart.org | Medscape Cardiology.

"However, it propagates the mindset that statistically significant vs not significant is synonymous with an effect vs no effect, which is not correct. Making a result not statistically significant does not mean that the clinical effect (eg, a relative risk of 0.70) has disappeared. And this is a missing key limitation."

He added that FI is useful for understanding the reliability of a result, "but if investigators want highly reliable data (with high fragility indices) this comes at a cost of having a very large trial."

In addition, FI doesn’t account for the time at which events occur. Khan and Gaudino accommodated this limitation by assessing only trials with statistically significant dichotomous primary outcomes (P < .05) and random allocation to treatment or control in a 2-arm or 2-by-2 factorial design.

Trials with crossover, cluster, or noninferiority designs, or with continuous or nondichotomous outcomes, cannot be assessed using the FI. Seligman suggested that the confidence interval, used properly, "contains all the information that is contained within the P-value and the FI," and, unlike the FI, works with continuous variables and time-to-event data.

Cardiovascular trials do not appear to be special in the proportion of positive trials showing low FI values. According to Khan, cardiology compares well with trials in other fields, and even somewhat better than various other subspecialties, including critical care medicine, spinal surgery, sports medicine, and anesthesiology.

He suggests, though, that for as long as the medical research world remains reliant on P-values, fragility helps clinicians understand trial findings and their clinical application.

"I'm not a big believer in P-value thresholds," he said. "I think we should look at the completeness of the data to drive our decision making, but if we continue to use P-value thresholds for statistical significance, I think the fragility index is a good number just to give an idea of how easily the results can be switched."

The researchers have disclosed no relevant financial relationships.  

Circ Cardiovasc Qual Outcomes. Published online December 11, 2019.
Khan et al article, Gaudino et al article, Editorial
