Non-inferiority Trial Design in Drug Development

A Primer for Cardiovascular Healthcare Professionals

Fabio Angeli; Paolo Verdecchia; Gianpaolo Reboldi


Am J Cardiovasc Drugs. 2020;20(3):229-238. 

Motivating Examples From Cardiovascular Medicine

The unequivocal efficacy of warfarin for prevention of thromboembolism in patients with atrial fibrillation (AF)[29] means a placebo-controlled study would be unethical. In this context, noninferiority trials of new agents would allow clinicians to select one drug over the other based on alternative factors, such as safety, convenience, or cost.[4,5]

We reanalyzed the results of some trials exploring the effect of new oral anticoagulation agents on the risk of stroke and systemic embolism. This reanalysis allows us to illustrate some key aspects of the design and interpretation of noninferiority trials.

Case Studies for the Fixed-margin Method

In the last few years, new classes of direct oral anticoagulant medications (direct thrombin inhibitors and direct inhibitors of factor Xa) have reached the market.[30] Specifically, four direct oral non-vitamin K anticoagulants[31–34] have been proven noninferior or even superior to warfarin in reducing the risk of stroke and systemic embolism in patients with nonvalvular AF.[35,36] Unlike warfarin, direct oral anticoagulants do not need periodic or routine measurements of anticoagulation, are more convenient to administer, and have fewer interactions with drugs and food.[30]

All the trials[31–34] used warfarin as the active comparator, and two trials[31,34] used RR to define the noninferiority margin, setting the noninferiority margin at the same level.

In Randomized Evaluation of Long-term anticoagulant therapY (RE-LY), a 2-year multicenter noninferiority trial, patients with AF and an increased risk of stroke were randomly assigned (with concealed allocation) to receive dabigatran 110 mg twice daily or 150 mg twice daily (blinded) or warfarin (open label).[31]

Rivaroxaban Once Daily Oral Direct Factor Xa Inhibition Compared with Vitamin K Antagonism for Prevention of Stroke and Embolism Trial in Atrial Fibrillation (ROCKET AF) was a multicenter, randomized, double-blind, double-dummy, event-driven trial conducted at 1178 participating sites in 45 countries.[34] It recruited patients with nonvalvular AF, as documented on electrocardiography, who were at moderate to high risk for stroke. Patients were randomly assigned to receive fixed-dose rivaroxaban 20 mg daily, or 15 mg daily in patients with a creatinine clearance of 30–49 mL/min, or adjusted-dose warfarin (target international normalized ratio [INR] 2.0–3.0). Patients in each group also received a placebo tablet to maintain blinding.[34]

In both cases, the minimum noninferiority threshold was selected based on a meta-analysis published in 1999,[29] which quantified the effect of warfarin on the prevention of thromboembolic events versus placebo or absence of treatment as an RR of 0.38 (95% confidence interval [CI] 0.28–0.52). The meta-analysis included six trials involving 2900 patients with a total of 186 strokes: anticoagulation with oral vitamin K antagonists was compared with placebo (five trials[1,3,7–9]) or control (one trial[6]). Of note, to estimate the RR reduction, the combined odds ratio was computed using the modified Mantel–Haenszel (Peto) method (Figure 3).[37]

Figure 3.

Meta-analysis of six trials involving 2900 patients with a total of 186 strokes, comparing anticoagulation with oral vitamin K antagonists versus placebo or control. Data from Hart et al.[29] AFASAK Copenhagen Atrial Fibrillation, Aspirin, and Anticoagulation Study, BAATAF Boston Area Anticoagulation Trial for Atrial Fibrillation, CAFA Canadian Atrial Fibrillation Anticoagulation Study, CI confidence interval, EAFT European Atrial Fibrillation Trial, RCT randomized controlled trial, RR relative risk, SPAF Stroke Prevention in Atrial Fibrillation Study, SPINAF Stroke Prevention in Nonrheumatic Atrial Fibrillation

The procedure for selecting the threshold and evaluating the results was as follows:

  1. the reference category needs to be changed, as if the effect of the "placebo or absence of treatment" was being calculated with respect to that of warfarin. In our case, this effect is the inverse of 0.38, which corresponds to an RR of 2.63 (95% CI 1.92–3.57),

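  2. the lower limit of this CI (1.92) is taken as the most conservative estimate of the effect of warfarin, and therefore as the largest margin that could be considered for the new anticoagulants,

  3. a noninferiority threshold is then chosen that assumes warfarin has a hypothetical effect of just 50% of its real effect, so that at least half of the warfarin effect is preserved,[15]

  4. accordingly, the noninferiority threshold is set at 1.46, that is, 1 + 0.5 × (1.92 − 1),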

  5. to conclude that the test drug is not inferior to active control, the upper limit of the 95% CI of the effect of the new treatment compared with that of warfarin cannot exceed 1.46.
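The arithmetic behind these steps can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the input values are those quoted in the text, and the inverted lower limit is rounded to two decimals (1.92) as in the text before the margin is derived. The log-scale variant is included for comparison and lands close to the value of 1.38 quoted for the log scale.

```python
import math

# Effect of warfarin vs placebo/control, as quoted in the text (RR and 95% CI)
RR_WARFARIN = 0.38
CI_WARFARIN = (0.28, 0.52)


def invert_ratio(rr, ci):
    """Change the reference category: placebo vs warfarin instead of warfarin vs placebo."""
    low, high = ci
    # Inverting a ratio swaps the bounds: 1/upper becomes the new lower limit.
    return 1 / rr, (1 / high, 1 / low)


def margin_linear(m1, preserved=0.5):
    """Margin on the RR scale that preserves a fraction of the excess risk (m1 - 1)."""
    return 1 + (1 - preserved) * (m1 - 1)


def margin_log(m1, preserved=0.5):
    """Same idea, but the fraction is preserved on the log(RR) scale."""
    return math.exp((1 - preserved) * math.log(m1))


rr_inv, (low_inv, high_inv) = invert_ratio(RR_WARFARIN, CI_WARFARIN)
m1 = round(low_inv, 2)          # step 2: lower limit of the inverted CI
m2_linear = margin_linear(m1)   # steps 3-4: preserve 50% of the effect
m2_log = margin_log(m1)         # log-scale alternative

print(f"Inverted RR {rr_inv:.2f} (95% CI {low_inv:.2f}-{high_inv:.2f})")
print(f"M1 = {m1:.2f}, margin (RR scale) = {m2_linear:.2f}, margin (log scale) = {m2_log:.3f}")
```

Under step 5, a new drug would then be declared noninferior only if the upper limit of its 95% CI versus warfarin stays below the chosen margin.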

As depicted in Figure 4, which plots the RRs and 95% CIs of dabigatran and rivaroxaban compared with warfarin, the upper limits of the 95% CIs (1.03, 1.12, and 0.83 for rivaroxaban, dabigatran 110 mg, and dabigatran 150 mg, respectively) all fell below the reference noninferiority margin preserving 50% of the warfarin effect (1.46); thus, noninferiority was demonstrated in these trials. Of note, noninferiority was also demonstrated against the stricter 50% preserved-effect margin computed on the log scale (1.38).

Figure 4.

Steps in defining a noninferiority margin in two trials investigating the effects on stroke and systemic embolism of direct oral anticoagulants compared with warfarin. See text for details. Data from Hart et al.,[29] Connolly et al.,[31] and Giugliano et al.[32] CI confidence interval, ITT intention to treat, RCT randomized controlled trial, RE-LY randomized evaluation of long-term anticoagulant therapy, ROCKET Rivaroxaban Once Daily Oral Direct Factor Xa Inhibition Compared with Vitamin K Antagonism for Prevention of Stroke and Embolism Trial, RR relative risk

Furthermore, the RE-LY trial showed that dabigatran 150 mg twice daily was superior to warfarin at reducing stroke or systemic embolism; indeed, the RR and its CI for this treatment arm sit wholly below 1.
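The decision rule applied to these trials can be written as a short check. The sketch below is illustrative only: the `classify` helper is hypothetical, and the three values quoted in the text (1.03, 1.12, and 0.83) are read here as the upper limits of the 95% CIs of the RR versus warfarin, which is an interpretation rather than something the trials' own analysis plans state in this form.

```python
MARGIN = 1.46  # fixed margin preserving 50% of the warfarin effect (from the text)

# Values quoted in the text, interpreted as upper 95% CI limits of the RR vs warfarin
UPPER_LIMITS = {
    "rivaroxaban": 1.03,
    "dabigatran 110 mg": 1.12,
    "dabigatran 150 mg": 0.83,
}


def classify(upper_limit, margin=MARGIN):
    """Return the conclusion supported by the upper limit of the CI."""
    if upper_limit < 1:
        return "superior"                  # whole CI lies below 1
    if upper_limit < margin:
        return "noninferior"               # CI excludes the margin
    return "noninferiority not shown"      # CI reaches or crosses the margin


results = {drug: classify(upper) for drug, upper in UPPER_LIMITS.items()}
for drug, verdict in results.items():
    print(f"{drug}: upper limit {UPPER_LIMITS[drug]:.2f} -> {verdict}")
```

Because an upper limit below 1 is also below the margin, a superiority finding implies noninferiority; the helper simply reports the stronger conclusion.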

The Use of Different Outcome Metrics

Ximelagatran is an oral direct thrombin inhibitor that has been investigated as a new antithrombotic option for prophylaxis or treatment of thromboembolic disease.[38,39] In this context, SPORTIF V (Stroke Prevention using an Oral Thrombin Inhibitor in Atrial Fibrillation) was a double-blind, randomized, multicenter trial conducted at 409 North American sites, involving 3922 patients with nonvalvular AF and additional stroke risk factors.[40] The efficacy of the oral direct thrombin inhibitor ximelagatran 36 mg twice daily was compared with adjusted-dose warfarin (aiming for an INR of 2.0–3.0) for prevention of stroke and systemic embolism.[40]

As reported in the statistical plan of the trial,[40,41] the noninferiority margin in the primary analysis was based on absolute event rate differences. Noninferiority of ximelagatran to warfarin was accepted if the upper bound of the CI around the estimated difference in primary event rates lay below Δ.[40,41] An absolute Δ of 2% was adopted based on the expected event rate during warfarin therapy and a judgment about the difference between treatments that would be clinically meaningful.[40,41]

During follow-up, 37 primary events were observed among patients assigned to warfarin and 51 primary events among patients assigned to ximelagatran, corresponding to incidence rates of 1.16% and 1.61%/year, respectively (p = 0.13 for a difference between treatments).[40] The p value for noninferiority was < 0.001.[40] The upper bound of the 95% CI surrounding the difference of 0.45%/year was 1.03%, well below the specified margin of 2.0%/year.[40] Hence, noninferiority was concluded.

Nonetheless, the absolute event rate with warfarin treatment seen in the SPORTIF V trial (1.2%/year) was lower than expected.[40] Specifically, the warfarin event rate observed in SPORTIF V[40] was half the rate observed in historical studies (2.4%),[29] suggesting violation of the constancy assumption.[23]

A key assumption required for valid inference in the noninferiority setting is the constancy assumption; it requires that the effect of the active comparator in the noninferiority trial is consistent with the effect that was observed in previous trials.[42]

Unfortunately, violations of the constancy assumption due to intertrial heterogeneity can result in a dramatic increase in the rate of incorrectly concluding noninferiority in the presence of ineffective or even harmful treatments.[43,44] These results highlight the need for statistical methods that use all available information to detect and account for violations of the constancy assumption in noninferiority clinical trials.[43,44]

In our case, a reanalysis by Althunian et al.[24] of the results of the SPORTIF V trial highlighted these aspects, demonstrating that the conclusion changed when a different approach was used to determine the noninferiority margin (point-estimate method).

Briefly, they estimated a pooled risk difference (RD) of −3.75% (95% CI −5.54 to −1.96) from the six placebo-controlled trials of warfarin.[24] However, a high level of heterogeneity among these six trials (I² = 58.1%, p = 0.036) was observed. The authors noted[24] that the large heterogeneity was suggestive of violation of the constancy assumption. The impact of this heterogeneity of the RDs from historical studies became evident in a sensitivity analysis for the SPORTIF V trial[40] restricted to the historical placebo-controlled trials of warfarin with low heterogeneity (i.e., only four of the six placebo-controlled trials were included). The new pooled estimate was −2.62% (95% CI −3.77 to −1.47) with no heterogeneity (I² = 0%, Q = 0.73, p = 0.86). A noninferiority margin (1.31%, M2) was defined for the point-estimate method to preserve 50% of the pooled point estimate (2.62%, M1). This margin was exceeded by the upper limit of the CI from the SPORTIF V trial (1.64%); therefore, noninferiority was not demonstrated (Figure 5).
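The point-estimate calculation on the RD scale reduces to halving the pooled estimate and comparing it with the upper confidence limit. The following is a minimal sketch with hypothetical helper names; all numbers are taken from the text, and the prespecified 2% margin with its 1.03% upper bound is included only to contrast the two conclusions.

```python
def point_estimate_margin(pooled_rd, preserved=0.5):
    """M2 on the risk-difference scale: the share of the historical effect (M1)
    that the new drug may lose while still preserving `preserved` of it."""
    m1 = abs(pooled_rd)              # M1: size of the pooled warfarin effect (%)
    return (1 - preserved) * m1


def is_noninferior(upper_limit, margin):
    """Noninferiority requires the upper CI limit to lie below the margin."""
    return upper_limit < margin


# Pooled RD from the four low-heterogeneity trials (%, from the text)
m2 = point_estimate_margin(-2.62)

# Upper CI limit used in the reanalysis (1.64%) vs the point-estimate margin
reanalysis_ok = is_noninferior(1.64, m2)

# Upper CI limit reported by the trial (1.03%/year) vs the prespecified margin (2%)
prespecified_ok = is_noninferior(1.03, 2.0)

print(f"M2 = {m2:.2f}%")
print(f"Point-estimate margin: noninferior = {reanalysis_ok}")
print(f"Prespecified margin:   noninferior = {prespecified_ok}")
```

The same trial data therefore support opposite conclusions depending on how the margin is derived, which is the point the reanalysis makes.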

Figure 5.

Efficacy of the oral direct thrombin inhibitor ximelagatran compared with adjusted-dose warfarin for the prevention of stroke and systemic embolism in the SPORTIF V (Stroke Prevention using an Oral Thrombin Inhibitor in Atrial Fibrillation) trial; results of three different methods to define the noninferiority margin and to test noninferiority are depicted (see text for details)

To further expand on the results obtained by Althunian et al.,[24] we applied to the SPORTIF V trial[40] the same method used to establish the noninferiority margin in RE-LY[31] and ROCKET AF[34] (Figure 5). Using RR as the outcome metric and setting the noninferiority threshold at 1.46, the risk of stroke and systemic embolism was 38% higher with ximelagatran than with warfarin, and the 95% CI for this estimate (0.91–2.10) straddled both 1 and Δ (1.46). Thus, noninferiority remains unproven (Figure 5).

Results were also similar when using the point-estimate and the synthesis methods.[24] Using the point-estimate method, the noninferiority margin was calculated as follows: the RR of placebo compared with warfarin from the six placebo-controlled trials was 2.77 (1/0.36 = 2.77), and M2 was 1.66 (preserving 50% of the effect on the log scale, i.e., exp[0.5 × ln 2.77]). The upper limit of the CI of the RR of ximelagatran compared with warfarin in the SPORTIF V trial (2.10)[40] exceeded the noninferiority margin of 1.66, so noninferiority of ximelagatran compared with warfarin could not be concluded (Figure 5).

The same margin was used to apply the synthesis method; the CI was adjusted to account for the variability in the point estimates from the six placebo-controlled trials of warfarin.[29] The estimated standard error of the log(RR) of warfarin against placebo was 0.19, whereas the standard error of the log(RR) of ximelagatran against warfarin was 0.22. The indirect estimate of the standard error of the effect of ximelagatran against warfarin was calculated as √[(0.22)² + (0.5 × 0.19)²] = 0.24. This led to an indirect 95% CI around 1.39 of 0.87–2.22, which was wider than the CI based on the data of the noninferiority trial only. Again, the margin M2 was included in the CI; hence, noninferiority cannot be concluded (Figure 5).
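The RR-scale numbers for the point-estimate and synthesis methods can be reproduced with the short sketch below. It is an illustration under stated assumptions: the function names are hypothetical, the 0.5 weight on the historical standard error follows the formula given in the text, and a normal quantile of 1.96 is assumed for the 95% CI.

```python
import math

Z_95 = 1.96  # assumed normal quantile for a two-sided 95% CI


def log_scale_margin(rr_placebo_vs_control, weight=0.5):
    """M2 obtained by keeping `weight` of the historical effect on the log(RR) scale."""
    return math.exp(weight * math.log(rr_placebo_vs_control))


def synthesis_se(se_trial, se_historical, weight=0.5):
    """Combine the trial SE with the weighted SE of the historical estimate."""
    return math.sqrt(se_trial ** 2 + (weight * se_historical) ** 2)


def ratio_ci(rr, se_log, z=Z_95):
    """Confidence interval for a ratio, built on the log scale."""
    center = math.log(rr)
    return math.exp(center - z * se_log), math.exp(center + z * se_log)


m2 = log_scale_margin(2.77)        # placebo vs warfarin RR from the text
se = synthesis_se(0.22, 0.19)      # SEs of log(RR) from the text
low, high = ratio_ci(1.39, se)     # indirect CI around the RR of 1.39

margin_inside_ci = low < m2 < high

print(f"M2 = {m2:.2f}")                               # M2 = 1.66
print(f"Indirect SE = {se:.2f}")                      # Indirect SE = 0.24
print(f"Indirect 95% CI = {low:.2f}-{high:.2f}")      # 0.87-2.22
print(f"Margin inside CI: {margin_inside_ci}")        # True
```

Because the margin of 1.66 lies inside the widened interval, the sketch reaches the same conclusion as the text: noninferiority is not shown.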