Non-inferiority Trial Design in Drug Development

A Primer for Cardiovascular Healthcare Professionals

Fabio Angeli; Paolo Verdecchia; Gianpaolo Reboldi

Am J Cardiovasc Drugs. 2020;20(3):229-238. 

Determining a Noninferiority Margin

The noninferiority margin should be based on statistical considerations and clinical judgement as to how effective a new treatment must be in order to be declared not clinically inferior to the standard treatment.

A number of systematic overviews demonstrated that the method of determining the margin was not mentioned in more than half of the published noninferiority trials.[18,19] More importantly, a review aiming to specifically assess the methodological quality of noninferiority trials showed that the noninferiority margin was defined in most reports (94%) but justified for only one-quarter of the trials.[20] Similarly, Wangge et al.[21] documented that the noninferiority margin was reported in 97.8% of the trials, but only 45.7% of the trials reported the method used to determine the margin. In 22% of the trials, the margin was determined based solely on the investigator's own assumption, and in 8.6% of the trials the margin was stated as an acceptable clinical difference according to the literature.[21]

Methods

Noninferiority margins should be defined based on statistical considerations and clinical judgement.[1,2,6,7] Three methods can be used to define noninferiority margins: the point-estimate method, the synthesis (or putative placebo) method, and the fixed-margin (95–95%) method.[22,23] In all three methods, noninferiority is assessed by comparing the CI from the noninferiority trial with the margin.[12,24,25]

The margin in the point-estimate and synthesis methods is determined based on the pooled point estimate itself. The point-estimate method assumes constant variability in the estimates of the active comparator. In the synthesis method, the CI estimated from the noninferiority trial is adjusted to account for the variability of the estimates of the active comparator. However, the synthesis method is often applied by determining a test statistic that shows whether the new drug retained a fraction (the preserved fraction) of the effect of the active comparator, as described by Holmgren.[22] The synthesis method could also be used to test whether the effect of the new drug is superior to a putative placebo.[26,27]
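As a rough illustration of the preserved-fraction test statistic mentioned above, the sketch below uses one common textbook form of the synthesis statistic on the log hazard ratio scale. It is an assumption that this form matches the one described by Holmgren; the function name, argument names, and all numbers are hypothetical.

```python
import math

def synthesis_z(log_hr_test_vs_control, se_test_vs_control,
                log_hr_control_vs_placebo, se_control_vs_placebo,
                preserved=0.5):
    """Synthesis-method z statistic (sketch, assumed form).

    log_hr_test_vs_control: from the noninferiority trial.
    log_hr_control_vs_placebo: pooled historical estimate
        (negative when the active comparator is beneficial).
    preserved: fraction of the comparator effect to be retained.
    """
    keep = 1.0 - preserved  # fraction of the effect that may be lost
    numerator = log_hr_test_vs_control + keep * log_hr_control_vs_placebo
    # The variability of the historical estimate enters the denominator,
    # which is what distinguishes this from the point-estimate method.
    denominator = math.sqrt(se_test_vs_control ** 2
                            + (keep * se_control_vs_placebo) ** 2)
    return numerator / denominator

# Hypothetical inputs: HR 0.95 (test vs. control), HR 0.70 (control vs. placebo)
z = synthesis_z(math.log(0.95), 0.08, math.log(0.70), 0.10, preserved=0.5)
retains_half = z < -1.96  # one-sided test at the 2.5% level

# preserved=0 corresponds to testing superiority over a putative placebo
z_putative_placebo = synthesis_z(math.log(0.95), 0.08,
                                 math.log(0.70), 0.10, preserved=0.0)
```

Setting `preserved=0` reduces the statistic to an indirect comparison of the new drug with placebo, which is the putative-placebo use of the method.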

The FDA prefers the fixed-margin (95–95%) method, which it considers the most straightforward and readily understood approach.[13]

Margins

The fixed-margin method starts by identifying two different measures (M1 and M2). M1 is the effect of the active control compared with placebo. Determining M1, as the first step in defining a noninferiority margin, can be based on (1) one or more placebo-controlled trials of the active comparator that have a design similar to the current noninferiority trial or (2) a meta-analysis of several placebo-controlled trials. The latter approach is encouraged because it will result in a pooled, more precise effect estimate of the active comparator.[13,14] M1 is then chosen as a conservative estimate (smallest effect size possible) of the effect of the active comparator: the bound of the 95% CI of the pooled effect size that lies closest to no effect (the upper bound when the effect is expressed as a ratio of active comparator to placebo below 1), rather than the point estimate.
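The derivation of M1 from a meta-analysis can be sketched as follows. This is a minimal example that assumes fixed-effect inverse-variance pooling of log relative risks; the trial counts are invented for illustration and the function names are hypothetical. A real analysis would also consider random-effects models and between-trial heterogeneity.

```python
import math

def pooled_log_rr(trials):
    """Fixed-effect inverse-variance pooling of log relative risks.

    trials: list of (events_active, n_active, events_placebo, n_placebo).
    Returns (pooled log RR, standard error of the pooled log RR).
    """
    weight_sum = 0.0
    weighted_sum = 0.0
    for ev_a, n_a, ev_p, n_p in trials:
        log_rr = math.log((ev_a / n_a) / (ev_p / n_p))
        # Large-sample variance of a log relative risk
        variance = 1 / ev_a - 1 / n_a + 1 / ev_p - 1 / n_p
        weight = 1 / variance
        weight_sum += weight
        weighted_sum += weight * log_rr
    return weighted_sum / weight_sum, math.sqrt(1 / weight_sum)

def conservative_m1(log_rr, se, z=1.96):
    """Upper 95% CI bound of RR (active vs. placebo): the bound closest to 1."""
    return math.exp(log_rr + z * se)

# Hypothetical placebo-controlled trials of the active comparator
historical = [(80, 1000, 120, 1000), (150, 2000, 210, 2000), (60, 800, 85, 800)]
log_rr, se = pooled_log_rr(historical)
point_estimate = math.exp(log_rr)
m1 = conservative_m1(log_rr, se)  # closer to 1 than the point estimate
```

If the conservative bound reaches or crosses 1, the historical data do not reliably show an effect of the active comparator, and a margin cannot be derived from them in this way.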

The second step is to calculate M2 from M1 by choosing how much of the treatment effect is judged necessary to be preserved. In other words, M2 represents the largest clinically acceptable difference (degree of inferiority) of the test drug compared with the active control, a consideration that may reflect the seriousness of the outcome, the benefit of the active comparator and the relative safety profiles of the test drug and the comparator. This factor has considerable practical implications. For example, in large cardiovascular studies, it is unusual to seek retention of more than 50% of the effect of the control drug, even if this might be clinically reasonable, because doing so will usually cause the size of the study to become infeasible. Thus, a preserved effect of 50% to determine M2 is commonly used.[13] Choosing a higher percentage to be preserved (e.g., 67%, where M2 is 33% of M1) results in a stricter or more conservative noninferiority margin, meaning it is more difficult to conclude noninferiority.

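When a risk difference (RD) is the outcome measure, M2 is calculated as

$$M_2 = (1 - f) \times M_1,$$

where f is the fraction of the active-comparator effect to be preserved.

For relative risk (RR) and other ratio measures, the preferred formula calculates the margin on the natural-logarithm scale,

$$M_2 = \exp\left[(1 - f) \times \ln(M_1)\right].$$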

Finally, the results of the noninferiority trial are compared with the prespecified noninferiority margin (M2) and noninferiority is concluded if the upper bound of the 95% CI for the effect estimate is smaller than the noninferiority margin.
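The two steps and the final comparison can be sketched in a few lines. The example assumes that M1 for a ratio measure is expressed as placebo versus active comparator (a value above 1, here the reciprocal of a hypothetical upper CI bound of 0.80), so that M2 is also above 1 and can be compared directly with the upper CI bound of the test-versus-comparator ratio. All names and numbers are illustrative.

```python
import math

def m2_risk_difference(m1, preserved=0.5):
    """M2 on the absolute scale: the share of M1 that may be lost."""
    return (1.0 - preserved) * m1

def m2_ratio(m1_ratio, preserved=0.5):
    """M2 on the ratio scale, computed on the natural-log scale.

    m1_ratio is assumed to be oriented as placebo vs. active comparator (> 1).
    """
    return math.exp((1.0 - preserved) * math.log(m1_ratio))

def is_noninferior(ci_upper, margin):
    """Noninferiority is concluded if the upper 95% CI bound is below M2."""
    return ci_upper < margin

# Hypothetical M1: upper CI bound of 0.80 (active vs. placebo), inverted
m1 = 1 / 0.80                              # 1.25, placebo vs. active
margin_50 = m2_ratio(m1, preserved=0.5)    # square root of 1.25
margin_67 = m2_ratio(m1, preserved=0.67)   # stricter margin, closer to 1

# Hypothetical trial result: RR (test vs. comparator) with upper 95% bound 1.10
ci_upper = 1.10
conclusion_50 = is_noninferior(ci_upper, margin_50)
conclusion_67 = is_noninferior(ci_upper, margin_67)
```

With these invented numbers the same trial result meets the margin when 50% of the effect is to be preserved but not when 67% is required, which illustrates why a higher preserved fraction makes noninferiority harder to conclude.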

Outcome Metrics

The choice of the outcome metrics to analyze noninferiority (absolute vs. relative metrics) is another issue worth mentioning.

The noninferiority margin can be based on either an absolute measure, such as the RD, or a relative measure, such as the RR.[13]

Absolute measures of effects, such as RDs, tend to be more subject to heterogeneity in the effect estimates of the active comparator versus placebo than relative measures of effects.[13,27]

Relative metrics are less dependent on the baseline risk, less likely to show heterogeneity between trials and are mathematically more convenient.[13,27]

Of note, in the context of noninferiority trials, the RDs and RRs can yield opposite conclusions regarding noninferiority if the rate of events seen in the active comparator group differs from the assumed rate used to define the noninferiority margin (see Sect. 5).[13]

Furthermore, a simulation study documented how the conclusions of a noninferiority trial changed when the observed risk differed from the expected risk.[28]

Specifically, Xie et al.[28] demonstrated that when the hazard ratio (HR) is the effect measure, the probability of concluding noninferiority increases as the underlying risk in the control group increases. When the difference between two Kaplan–Meier estimates is the effect measure, the probability of concluding noninferiority decreases as the underlying risk in the control group increases.[28]
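A small numerical sketch shows how absolute and relative margins can disagree when the observed comparator event rate differs from the assumed one. The scenario is invented: the margins (RD of 2 percentage points, RR of 1.2) are equivalent at an assumed comparator event rate of 10%, but the hypothetical trial observes a rate of 5%. The CIs use simple large-sample (Wald) formulas, which is an assumption of this example rather than a method taken from the cited studies.

```python
import math

Z95 = 1.96

def rd_upper(ev_test, n_test, ev_ctrl, n_ctrl, z=Z95):
    """Upper 95% bound of the risk difference (test minus comparator)."""
    p_t, p_c = ev_test / n_test, ev_ctrl / n_ctrl
    se = math.sqrt(p_t * (1 - p_t) / n_test + p_c * (1 - p_c) / n_ctrl)
    return (p_t - p_c) + z * se

def rr_upper(ev_test, n_test, ev_ctrl, n_ctrl, z=Z95):
    """Upper 95% bound of the relative risk (test vs. comparator)."""
    log_rr = math.log((ev_test / n_test) / (ev_ctrl / n_ctrl))
    se = math.sqrt(1 / ev_test - 1 / n_test + 1 / ev_ctrl - 1 / n_ctrl)
    return math.exp(log_rr + z * se)

# Margins that coincide at the assumed comparator event rate of 10%
RD_MARGIN = 0.02   # 2 percentage points
RR_MARGIN = 1.20   # 12% vs. 10%

# Hypothetical observed data: comparator rate 5%, test rate 6%
observed = dict(ev_test=300, n_test=5000, ev_ctrl=250, n_ctrl=5000)

noninferior_on_rd = rd_upper(**observed) < RD_MARGIN
noninferior_on_rr = rr_upper(**observed) < RR_MARGIN
```

In this scenario the lower-than-expected event rate makes the absolute margin easier to meet and the relative margin harder to meet, so the two metrics lead to opposite conclusions from the same data.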
