A Standardised, Generic, Validated Approach to Stratify the Magnitude of Clinical Benefit That can be Anticipated From Anti-cancer Therapies

The European Society for Medical Oncology Magnitude of Clinical Benefit Scale (ESMO-MCBS)

N. I. Cherny; R. Sullivan; U. Dafni; J. M. Kerst; A. Sobrero; C. Zielinski; E. G. E. de Vries; M. J. Piccart

Ann Oncol. 2015;26(8):1547-1573. 

Background and Methodology

An ESMO Task Force to guide the development of the grading scale was established in March 2013. The Task Force was co-chaired by Elisabeth de Vries and Martine Piccart, and its members were Richard Sullivan, Nathan Cherny, Urania Dafni, Martijn Kerst, Alberto Sobrero and Christoph Zielinski. A first-generation draft scale was developed and adapted through a 'snowball' method, building on previous work by Task Force members who had independently developed preliminary models of clinical benefit grading. The first-generation scale was sent for review to 276 members of the ESMO faculty and a team of 51 expert biostatisticians.

The second-generation draft was formulated based on feedback from the faculty and biostatisticians and on the conceptual work of Alberto Sobrero on integrating the hazard ratio (HR), prognosis and absolute differences in data interpretation.[33,34] The second-generation draft was applied in a wide range of contemporary and historical disease settings by members of the ESMO-MCBS Task Force, the ESMO Guidelines Committee and a range of invited experts. Results were scrutinised for face validity, coherence and consistency. Where deficiencies were observed or reported, targeted modifications were implemented and the cycle of field testing and review was repeated. This cycle ran through 13 redrafts of the scale before the current version (ESMO-MCBS v1.0). The final version and the field-testing results were reviewed by selected members of the ESMO faculty and the ESMO Executive Board.

The goal of the ESMO-MCBS evaluation was to assign the highest grade to trials with adequate power for a relevant magnitude of benefit, and to adjust the grade appropriately to reflect the observed magnitude of benefit. To achieve this, a dual rule was implemented. First, to account for the variability of the HR estimated from a study, the lower limit of the 95% confidence interval (CI) for the HR is compared with specified threshold values. Second, the observed absolute difference in treatment outcomes is compared with the minimum absolute gain considered beneficial. Different candidate threshold values for the HR and for absolute gains in survival, DFS and PFS were explored through extensive simulations and adjusted to represent the expert opinion of the oncology community as accurately as possible. Table 2 outlines the combined thresholds finally implemented for the HR and for the minimum observed benefit that could merit the highest grade in the curative and non-curative settings.
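The dual rule can be illustrated with a short sketch. The Python below is a minimal illustration, not the ESMO implementation. It assumes the lower CI limit is derived from the reported HR and the standard error of log(HR) under a normal approximation. The function names `hr_lower_limit` and `meets_dual_rule` are hypothetical. The threshold values in the usage lines (0.65, as in Figure 1, and 3 months) are illustrative placeholders, not the Table 2 values.

```python
import math

# Two-sided 95% standard normal quantile.
Z_95 = 1.959964


def hr_lower_limit(hr: float, se_log_hr: float) -> float:
    """Lower limit of the 95% CI for a hazard ratio, assuming a normal
    approximation on the log-HR scale (an assumption, not from the text)."""
    return math.exp(math.log(hr) - Z_95 * se_log_hr)


def meets_dual_rule(hr_lower: float, absolute_gain: float,
                    hr_threshold: float, min_gain: float) -> bool:
    """Hypothetical check of the dual rule: the lower 95% CI limit of the HR
    must not exceed the HR threshold AND the observed absolute gain must
    reach the minimum gain considered beneficial."""
    return hr_lower <= hr_threshold and absolute_gain >= min_gain


# Illustrative values only; the real thresholds are listed in Table 2.
lower = hr_lower_limit(0.70, 0.10)                # about 0.575
print(meets_dual_rule(lower, 3.2, 0.65, 3.0))     # True
```

Because both conditions must hold, a trial with a favourable CI but a small absolute gain in this sketch still falls short of the highest grade.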

In all forms, HR thresholds refer to the lower limit of the 95% CI (Figure 1). The researchers compared two evaluation rules, each combined with the additional rule on the minimum absolute gain in treatment outcome. The first rule used the lower limit of the 95% CI of the HR. The second, simpler rule used a cut-off for the HR point estimate. Simulation results under different HR values and corresponding power favoured the proposed approach of using the lower limit of the 95% CI, which takes the variability of the estimate into account. Figure 2 shows the correspondence between an HR value and the minimum absolute gain considered beneficial according to the ESMO-MCBS, by median survival (OS or PFS) for standard treatment. For example, for a standard treatment median survival of 6 months, an absolute gain of 3 months corresponds to an HR of 0.67, while a gain of 1.5 months corresponds to an HR of 0.8.
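The two worked figures above are consistent with assuming exponential (constant-hazard) survival in both arms. Under that assumption, the HR equals the ratio of control to experimental median survival, HR = m_c / (m_c + gain). The text does not state which model underlies Figure 2, so treat this Python sketch as an illustration of the arithmetic only. The function names are hypothetical.

```python
def hr_for_gain(control_median: float, gain: float) -> float:
    """HR implied by an absolute gain in median survival, assuming
    exponential survival in both arms (constant hazard = ln 2 / median),
    so that HR = control median / experimental median."""
    return control_median / (control_median + gain)


def gain_for_hr(control_median: float, hr: float) -> float:
    """Inverse relation: absolute gain in median survival implied by an HR
    under the same exponential assumption."""
    return control_median / hr - control_median


print(round(hr_for_gain(6, 3), 2))    # 0.67
print(round(hr_for_gain(6, 1.5), 2))  # 0.8
```

The same relation shows why a fixed HR threshold implies a larger absolute gain when the control median is longer.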

Figure 1.

Use of threshold HR in the ESMO-MCBS exemplified for HR threshold of 0.65.

Figure 2.

The correspondence between an HR value and the minimum absolute gain in months considered as beneficial according to the ESMO-MCBS by median survival (OS or PFS) for control.

The ESMO Magnitude of Clinical Benefit Scale (ESMO-MCBS v1.0)

The ESMO Magnitude of Clinical Benefit Scale version 1 (ESMO-MCBS v1.0) (Appendix 1) has been developed only for solid cancers. Given the profound differences between the curative and palliative settings, the tool is presented in two parts. Form 1 is used to evaluate adjuvant and other treatments with curative intent. Form 2 (a, b or c) is used to evaluate non-curative interventions. Form 2a is for studies with OS as the primary outcome. Form 2b is for studies with PFS or TTP as the primary outcome. Form 2c is for studies with QoL, toxicity or response rate (RR) as the primary outcome, and for non-inferiority studies. Form 2a is prognostically sub-stratified by whether the control arm's median OS was ≤1 year or >1 year. Form 2b is sub-stratified by whether the control arm's median PFS was ≤6 months or >6 months.
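The routing between forms can be sketched as a small decision function. This is a hypothetical Python helper, not part of ESMO-MCBS. It assumes curative intent is checked first and that TTP studies use the same 6-month cut-off as PFS (the text names PFS only). The outcome labels and `select_form` are illustrative names.

```python
from typing import Optional


def select_form(curative: bool, primary_outcome: str,
                non_inferiority: bool = False,
                control_median_months: Optional[float] = None) -> str:
    """Hypothetical mapping of a solid-tumour study to an ESMO-MCBS v1.0
    form, following the description in the text."""
    if curative:
        # Adjuvant, neoadjuvant and other treatments with curative intent.
        return "Form 1"
    outcome = primary_outcome.upper()
    if non_inferiority or outcome in {"QOL", "TOXICITY", "RR"}:
        return "Form 2c"
    if outcome in {"OS", "PFS", "TTP"}:
        if control_median_months is None:
            raise ValueError("control-arm median is needed for stratification")
        # OS studies split at 12 months, PFS/TTP studies at 6 months.
        form, cutoff = ("Form 2a", 12) if outcome == "OS" else ("Form 2b", 6)
        stratum = f"<={cutoff}" if control_median_months <= cutoff else f">{cutoff}"
        return f"{form} (control median {outcome} {stratum} months)"
    raise ValueError(f"unsupported primary outcome: {primary_outcome}")


print(select_form(False, "OS", control_median_months=10))
# Form 2a (control median OS <=12 months)
```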

Eligibility for Application of the ESMO-MCBS

The ESMO-MCBS can be applied to comparative outcome studies that evaluate the relative benefit of treatments in solid cancers. These studies may use outcomes of survival, QoL, surrogate outcomes for survival or QoL (DFI, EFS, TTR, PFS and TTP) or treatment toxicity. Eligible studies may be randomised trials, comparative cohort studies[35,36] or meta-analyses that report a statistically significant benefit in one or more of the evaluated outcomes. When more than one study has evaluated a single clinical question, results from well-powered registration trials should be given priority.
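The eligibility criteria can be expressed as a simple screen. This hypothetical Python sketch uses assumed design and outcome labels. It adds DFS and RR to the listed outcomes because Form 1 and Form 2c use them. It does not model the priority given to registration trials when several studies address the same question.

```python
from typing import Set

ELIGIBLE_DESIGNS = {"randomised", "comparative cohort", "meta-analysis"}

# Outcomes listed in the eligibility paragraph; DFS and RR are added because
# Form 1 and Form 2c use them.
EVALUATED_OUTCOMES = {"OS", "QoL", "DFI", "EFS", "TTR", "PFS", "TTP",
                      "toxicity", "DFS", "RR"}


def is_eligible(solid_tumour: bool, design: str,
                significant_outcomes: Set[str]) -> bool:
    """Hypothetical screen: a solid-tumour comparative study or meta-analysis
    reporting statistically significant benefit in at least one evaluated
    outcome."""
    return (solid_tumour
            and design in ELIGIBLE_DESIGNS
            and bool(significant_outcomes & EVALUATED_OUTCOMES))
```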

Studies with pre-planned subgroup analyses involving a maximum of three subgroups can be scored. When statistically significant results are reported for more than one subgroup, each of these should be evaluated separately. Subgroups that do not show statistically significant results are not graded. Findings from unplanned (post hoc) subgroup analyses cannot be graded and can serve only as a foundation for hypothesis generation. The exception is studies that incorporated the collection of tissue samples to enable re-stratification based on new genetic or other biomarkers.
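The subgroup rules can be sketched as a filter. This Python code is hypothetical. The `Subgroup` fields are assumed. The sketch also assumes that when more than three subgroups were pre-planned, none of them is scored. The text does not spell out this case.

```python
from dataclasses import dataclass
from typing import List

MAX_PLANNED_SUBGROUPS = 3


@dataclass
class Subgroup:
    name: str
    pre_planned: bool
    significant: bool
    # Post hoc re-stratification on a new biomarker using tissue samples the
    # study collected: the one exception allowed for unplanned analyses.
    biomarker_restratification: bool = False


def gradable_subgroups(subgroups: List[Subgroup]) -> List[str]:
    """Hypothetical filter returning the subgroups that may each be graded
    separately."""
    planned = [s for s in subgroups if s.pre_planned]
    graded = []
    # Assumption: if more than three subgroups were pre-planned, none is scored.
    if len(planned) <= MAX_PLANNED_SUBGROUPS:
        graded += [s.name for s in planned if s.significant]
    # Unplanned findings count only for biomarker re-stratification.
    graded += [s.name for s in subgroups
               if not s.pre_planned and s.biomarker_restratification
               and s.significant]
    return graded
```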

Form 1

This form is used for adjuvant and neoadjuvant therapies and for localised or metastatic disease treated with curative intent. It is graded A, B or C, where grades A and B represent a high level of clinical benefit (Figure 3). The scale makes allowance for early data demonstrating high DFS without mature survival data. Studies initially evaluated on DFS criteria alone will need to be re-evaluated when mature survival data become available. Some studies are unblinded after compelling early results, and patients then gain access to the superior arm. Hyper-mature data from these studies are contaminated, so their late intention-to-treat (ITT) follow-up data are not evaluable.[37,38] Pathological complete remission after neoadjuvant therapy is not included as a criterion for clinical benefit because of a lack of consistent evidence that it is a valid surrogate for survival in clinical studies.[39–42]
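The evaluability rules for form 1 can be written as three small predicates. This Python sketch is hypothetical. The function names and boolean inputs are assumptions. The A/B/C grading thresholds themselves, which are given in Appendix 1, are not reproduced here.

```python
def form1_endpoint_accepted(endpoint: str) -> bool:
    """Pathological complete remission (pCR) after neoadjuvant therapy is not
    accepted as a clinical-benefit criterion in form 1."""
    return endpoint.strip().upper() != "PCR"


def late_itt_data_evaluable(unblinded_early: bool,
                            access_to_superior_arm: bool) -> bool:
    """Late ITT follow-up after early unblinding with access to the superior
    arm is contaminated and therefore not evaluable."""
    return not (unblinded_early and access_to_superior_arm)


def needs_reevaluation(graded_on_dfs_only: bool,
                       mature_os_available: bool) -> bool:
    """A grade based on DFS alone must be revisited once mature survival
    data become available."""
    return graded_on_dfs_only and mature_os_available
```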

Figure 3.

Visualisation of ESMO-MCBS scores for the curative and non-curative settings. Grades A and B (curative) and grades 5 and 4 (non-curative) represent substantial improvement.

Forms 2

These forms are used for studies of new agents or approaches in the management of cancers without curative intent. This scale is graded 5, 4, 3, 2, 1, where grades 5 and 4 represent a high level of proven clinical benefit (Figure 3).
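The two scales use different labels for high benefit. A small helper can express this. This Python helper is hypothetical. It accepts a letter grade for form 1 or an integer grade for forms 2 and flags the grades the text identifies as high benefit.

```python
from typing import Union


def is_high_benefit(grade: Union[str, int]) -> bool:
    """True for grades A or B (form 1, curative) and 5 or 4 (forms 2,
    non-curative); raises on values outside either scale."""
    if isinstance(grade, str):
        g = grade.strip().upper()
        if g not in {"A", "B", "C"}:
            raise ValueError(f"unknown form 1 grade: {grade}")
        return g in {"A", "B"}
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(grade, bool) or grade not in {1, 2, 3, 4, 5}:
        raise ValueError(f"unknown form 2 grade: {grade}")
    return grade >= 4
```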

Form 2a. This version is used for therapies evaluated using a primary outcome of OS. The form is stratified by the median OS of the control arm (≤12 and >12 months). Preliminary grading takes into consideration the HR and the median survival gain, as well as any late survival advantage, and is reported on a 4-point scale. When the median and late survival gains yield different grades, the higher score prevails. Preliminary scores can be upgraded by 1 point in either of two cases. The first is when the experimental arm demonstrates improved QoL or delayed deterioration in QoL on a validated scale. The second is a substantial reduction in grade 3 or 4 toxicity. A score of 5 can only be achieved when optimal survival outcomes are further enhanced by data indicating reduced toxicity or improved QoL.
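The form 2a combination step can be sketched in a few lines. This Python function is hypothetical. It assumes the preliminary grades (1–4) for median and late survival gain have already been read off the Appendix 1 criteria. It only applies the "higher score prevails" and "+1 for QoL or toxicity" rules described above.

```python
from typing import Optional


def form2a_score(median_gain_grade: int,
                 late_survival_grade: Optional[int] = None,
                 qol_or_toxicity_benefit: bool = False) -> int:
    """Hypothetical form 2a combination step. Preliminary grades are on a
    1-4 scale; when median and late survival gain are graded differently,
    the higher grade prevails. Improved QoL (or delayed deterioration) on a
    validated scale, or a substantial reduction in grade 3-4 toxicity, adds
    one point, so 5 is reachable only from a preliminary 4."""
    grades = [median_gain_grade]
    if late_survival_grade is not None:
        grades.append(late_survival_grade)
    for g in grades:
        if not 1 <= g <= 4:
            raise ValueError(f"preliminary grade out of range: {g}")
    preliminary = max(grades)
    return preliminary + 1 if qol_or_toxicity_benefit else preliminary
```

Capping preliminary grades at 4 is what makes 5 attainable only when optimal survival results are combined with a QoL or toxicity benefit.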

Form 2b. This version is used for therapies evaluated using a primary end point of PFS or TTP. The form is stratified by the median PFS of the control arm (≤6 and >6 months). The maximal preliminary score is capped at 3 because PFS and TTP are surrogate outcomes with a less reliable relationship to improved survival or QoL.[18,20–23] In studies that allow crossover on subsequent therapy, PFS may be the best available evidence of activity, since subsequent therapies may reduce the likelihood of observing a survival benefit.

Preliminary scores derived from PFS studies can be upgraded or downgraded depending on secondary outcomes such as toxicity data, improvement in OS or data derived from QoL evaluation. This form incorporates an adverse-effect criterion for downgrading in cases of severe toxicity compared with the control arm. If an OS advantage is observed as a secondary outcome, scores are upgraded using the scale on form 2a. In PFS studies that evaluate global QoL, positive findings upgrade the evaluation by 1 point. Positive findings here means a statistically significant improvement in global QoL or delayed deterioration in QoL. In the absence of a survival advantage, the absence of a QoL advantage results in downgrading by 1 point.
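The toxicity and QoL adjustments for form 2b can be sketched as follows. This Python function is hypothetical. It assumes the severe-toxicity criterion removes one point, since the text does not state the size of that downgrade. It also assumes scores do not fall below 1, the bottom of the scale. The upgrade through form 2a when OS improves is not modelled. The `os_advantage` flag only suppresses the QoL downgrade.

```python
def form2b_adjust(preliminary: int, severe_toxicity: bool,
                  qol_evaluated: bool, qol_improved: bool,
                  os_advantage: bool) -> int:
    """Hypothetical form 2b adjustment of a preliminary PFS/TTP score
    (maximum 3) for toxicity and global QoL findings."""
    if not 1 <= preliminary <= 3:
        raise ValueError(f"preliminary score out of range: {preliminary}")
    score = preliminary
    if severe_toxicity:
        score -= 1  # assumed one-point downgrade for severe toxicity
    if qol_evaluated:
        if qol_improved:
            score += 1  # improved or delayed-deterioration global QoL
        elif not os_advantage:
            score -= 1  # no QoL advantage and no survival advantage
    return max(score, 1)  # assumption: floor at the bottom of the scale
```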

Form 2c. This form is used for therapies evaluated in non-inferiority (equivalence) studies and for studies in which the primary outcomes are QoL, toxicity or RR.

Field Testing of ESMO-MCBS

ESMO-MCBS has been applied to a wide range of solid tumours by members of the ESMO-MCBS Task Force, the ESMO Guidelines Committee and a range of invited experts (Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12).

Where discrepancies between graders were observed, they were generally related to inaccurate data extraction, variable interpretation of the significance and severity of toxicity data, or errors in applying the data to the correct grading criteria.
