The Oxford Classification for Immunoglobulin A Nephropathy

A Common Language Blurred By Dissonant Voices

Nicolas Maillard; Christophe Mariat


Nephrol Dial Transplant. 2019;34(10):1617-1618. 

The Oxford Classification was developed between 2004 and 2009 to address the heterogeneity of the numerous pre-existing histoprognostic classifications of immunoglobulin A (IgA) nephropathy. The methodology used to reach this goal was intended to be robust and internationally consensus-based. Data from 265 adults and children with primary IgA nephropathy were gathered from 16 centres worldwide to identify the kidney biopsy lesions most strongly associated with severe outcomes. Importantly, patients with a very mild presentation (proteinuria <0.5 g/day), advanced disease [estimated glomerular filtration rate (eGFR) <30 mL/min/1.73 m²] or a follow-up of less than 1 year were excluded. A first step consisted of selecting elementary lesions that were (i) highly reproducible between observers and (ii) statistically independent of each other.[1] In the second step, only lesions associated with progression to renal failure were retained.[2] This prognostic impact had to be significant in univariate models as well as in multivariable models adjusted for validated clinical prognostic variables. This methodology led to the selection of three elementary lesions, namely M for mesangial hypercellularity (M0 if present in ≤50% of glomeruli and M1 if present in >50%), S for segmental glomerulosclerosis (S0 and S1) and T for tubular atrophy/interstitial fibrosis (T0, T1 and T2, depending on the extent of the lesion). Because its strong association with corticosteroid treatment could have biased its prognostic value, the E lesion, for endocapillary hypercellularity (absence, E0, or presence, E1), was finally included by consensus.

A few years after this first classification, a large multicentre study demonstrated the prognostic impact of crescents, even when focal.[3] This finding led to a revised Oxford Classification in 2016, which added crescents as a further elementary lesion (C0, C1 and C2 for absence, presence in <25% of glomeruli and presence in ≥25% of glomeruli, respectively).[4]
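
To make these scoring rules concrete, the following Python function is a minimal sketch that maps per-biopsy measurements to a MEST-C string. It is an illustration, not an official scoring tool: the input fields and the function name `mest_c` are hypothetical, a crescent fraction of exactly 25% is assigned to C2, and the T cut-offs (T0 ≤25%, T1 26-50%, T2 >50% of cortical area) come from the Oxford Working Group publications rather than from this text, so they should be treated as an assumption. In practice each lesion is judged morphologically by the pathologist; the function only encodes the thresholds.

```python
from dataclasses import dataclass


@dataclass
class BiopsyFindings:
    """Hypothetical per-biopsy summary; field names are illustrative only."""
    mesangial_fraction: float           # fraction of glomeruli with mesangial hypercellularity
    endocapillary: bool                 # any endocapillary hypercellularity seen
    segmental_sclerosis: bool           # any segmental glomerulosclerosis seen
    tubulointerstitial_fraction: float  # fraction of cortex with tubular atrophy/interstitial fibrosis
    crescent_fraction: float            # fraction of glomeruli with crescents


def mest_c(b: BiopsyFindings) -> str:
    """Return a MEST-C score such as 'M1 E0 S1 T1 C1' (illustrative thresholds)."""
    for value in (b.mesangial_fraction, b.tubulointerstitial_fraction, b.crescent_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")

    m = "M1" if b.mesangial_fraction > 0.50 else "M0"  # M1 if >50% of glomeruli
    e = "E1" if b.endocapillary else "E0"
    s = "S1" if b.segmental_sclerosis else "S0"

    # Assumed T cut-offs (not stated in this text): <=25% T0, 26-50% T1, >50% T2
    if b.tubulointerstitial_fraction > 0.50:
        t = "T2"
    elif b.tubulointerstitial_fraction > 0.25:
        t = "T1"
    else:
        t = "T0"

    # C0 = no crescents, C1 = <25% of glomeruli, C2 = >=25% (boundary assumed)
    if b.crescent_fraction == 0:
        c = "C0"
    elif b.crescent_fraction < 0.25:
        c = "C1"
    else:
        c = "C2"

    return " ".join((m, e, s, t, c))


print(mest_c(BiopsyFindings(0.6, False, True, 0.3, 0.1)))  # M1 E0 S1 T1 C1
```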

This classification has been externally validated in numerous studies that generally confirm the association of M, E, S, T and C lesions with hard outcomes in univariate models. This association was, however, less consistent across studies after adjustment for clinically relevant variables (proteinuria >1 g/day, hypertension and eGFR at diagnosis).[4,5] Importantly, the Oxford Classification combined with clinical data at biopsy has been shown to provide the same predictive power as monitoring clinical data for 2 years.[6] In addition, it has been suggested that some histopathological lesions might be more responsive to corticosteroids than others and, thus, that the MEST-C score could help guide therapy.[7,8] Overall, the idea that the Oxford Classification can improve the management of patients with IgA nephropathy is clearly gaining credence among nephrologists.

That being said, many other key dimensions of the Oxford Classification still need to be explored in order to definitively establish its real clinical added value. Among them are (i) the fact that, in practice, certain combinations of M, E, S, T and C lesions are observed more frequently than others, with no information on the prognostic consequences of these preferential combinations, (ii) whether changes in the MEST-C score over time (obtained from repeat biopsies) could further improve prediction and (iii) the true reproducibility of this classification under real-life conditions.
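
Because M, E and S are binary and T and C have three grades each, the classification allows 2 × 2 × 2 × 3 × 3 = 72 distinct MEST-C combinations. The sketch below, assuming a hypothetical list of per-patient MEST-C strings, enumerates these combinations and tallies how often each is observed, which is one simple way of looking for the preferential combinations mentioned in point (i). The cohort data and the helper names are invented for illustration.

```python
from collections import Counter
from itertools import product

# Grades of each elementary lesion in the revised Oxford Classification
GRADES = {
    "M": ("M0", "M1"),
    "E": ("E0", "E1"),
    "S": ("S0", "S1"),
    "T": ("T0", "T1", "T2"),
    "C": ("C0", "C1", "C2"),
}

# Every possible MEST-C combination, written as e.g. "M1 E0 S1 T0 C1"
ALL_COMBINATIONS = [" ".join(combo) for combo in product(*GRADES.values())]


def tally_combinations(observed):
    """Count how often each possible combination occurs in a (hypothetical) cohort.

    Returns a dict covering all 72 combinations, including those never observed.
    """
    counts = Counter(observed)
    unknown = set(counts) - set(ALL_COMBINATIONS)
    if unknown:
        raise ValueError(f"Unrecognised MEST-C strings: {sorted(unknown)}")
    return {combo: counts[combo] for combo in ALL_COMBINATIONS}


def most_frequent(observed, k=5):
    """Return the k most frequent combinations with their counts."""
    return Counter(observed).most_common(k)


if __name__ == "__main__":
    # Invented cohort of 9 patients, for illustration only
    cohort = ["M1 E0 S1 T0 C0"] * 5 + ["M0 E0 S0 T0 C0"] * 3 + ["M1 E1 S1 T1 C1"]
    table = tally_combinations(cohort)
    print(len(ALL_COMBINATIONS))                     # 72
    print(sum(1 for n in table.values() if n == 0))  # 69 combinations never observed
    print(most_frequent(cohort, k=2))
```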

This last issue is directly addressed in the present study by Bellur et al.,[9] which additionally aimed to evaluate the potential consequences in terms of treatment allocation and prognostic impact. The authors took advantage of the large European multicentre retrospective VALIGA cohort, in which 1147 patients were independently scored using the Oxford Classification by both a local pathologist and a central expert pathologist. By describing the discrepancies between pathologists and how these translate into outcome prediction, the authors provide a pragmatic evaluation of the Oxford Classification.

The main results of this study are as follows. While M, E, S and C lesions scored by local pathologists were independently associated with the decision to use immunosuppressive treatment (mainly steroids), reproducibility between local and central pathologists was moderate for S and T (kappa coefficients of 0.51 and 0.53, respectively) and poor for M, E and C (0.28, 0.19 and 0.24, respectively). Overall, local pathologists tended to overscore all elementary lesions except S. Discrepancies between local and central pathologists almost systematically impaired the ability of each elementary lesion to predict outcome.
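
For readers less familiar with the kappa statistic, the sketch below computes Cohen's (unweighted) kappa from a local-versus-central cross-tabulation: observed agreement p_o is corrected for the agreement p_e expected by chance from each rater's marginal frequencies, κ = (p_o - p_e)/(1 - p_e). The counts are invented, not taken from VALIGA, and this text does not say whether a weighted kappa was used for the three-grade T and C lesions; the function simply treats the grades as unordered categories.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for two raters.

    table[i][j] = number of biopsies given grade i by the local pathologist
    and grade j by the central pathologist (same grade order on both axes).
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table is empty")
    # Observed agreement: proportion of biopsies on the diagonal
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement from the two raters' marginal distributions
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Invented E-lesion counts (index 0 = E0, index 1 = E1); rows = local, columns = central
e_table = [
    [40, 10],  # local E0: central E0, central E1
    [30, 20],  # local E1: central E0, central E1
]
print(round(cohen_kappa(e_table), 2))  # 0.2
```

In this invented example the two pathologists agree on 60% of biopsies, yet kappa is only 0.2, because half of that agreement is expected by chance alone; the local pathologist also calls E1 in 50 biopsies versus 30 centrally, the kind of overscoring described above.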

Altogether, these results point to a global scoring bias (with significant clinical impact) for M, E and C lesions, although with distinct interpretations. As for mesangial hypercellularity, many local pathologists fail to strictly follow the Oxford Working Group recommendations, for example by scoring from haematoxylin and eosin sections (whereas periodic acid-Schiff (PAS)-stained sections should be preferred) or by focusing on glomerular areas that should not be assessed (e.g. the perihilar area). This should be, if not entirely fixed, at least improved with training and practice. In this regard, an international web-based training programme should soon be available. On the other hand, E and C lesions are very sensitive to sampling differences, because a single segmental lesion is sufficient for positivity. As central assessment was based on a smaller number of slides, local pathologists may have performed better here (as suggested by the fact that locally scored E and C lesions were more strongly associated with outcome). It is worth mentioning that CD68 immunostaining may help refine the assessment of the E lesion, as recently reported.[10] Interestingly, S was the only lesion that was locally (and probably wrongly) underscored, despite the fact that more slides were examined by local pathologists. Here again, reinforced training could be the answer, especially for certain forms of S (e.g. isolated capsular adhesion). Another solution worth mentioning would be to adopt a fully automated scoring process using artificial intelligence, a strategy similar to those being developed in other fields of medicine.[11]
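
The sampling argument for E and C lesions can be illustrated with a deliberately simplified model. Assume, hypothetically, that each glomerulus examined independently shows the focal lesion with probability p; the chance that at least one lesion is seen among n glomeruli is then 1 - (1 - p)^n, which rises with the amount of tissue examined. Real serial sections are not independent samples and p is unknown, so the function names and numbers below are illustrative only.

```python
import math


def detection_probability(p, n):
    """P(at least one lesion-bearing glomerulus among n), assuming independence."""
    if not 0 <= p <= 1 or n < 0:
        raise ValueError("p must be in [0, 1] and n non-negative")
    return 1 - (1 - p) ** n


def glomeruli_needed(p, target=0.9):
    """Smallest n such that detection_probability(p, n) >= target (0 < p < 1)."""
    if not 0 < p < 1 or not 0 < target < 1:
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))


# Hypothetical lesion affecting 5% of glomeruli
for n in (10, 20):
    print(n, round(detection_probability(0.05, n), 2))  # 10 -> 0.4, 20 -> 0.64
print(glomeruli_needed(0.05))  # 45
```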

While the analysis by Bellur et al. raises important concerns about the reproducibility of the Oxford Classification under real-life conditions, this must be balanced against the fact that, in oncology, for instance, other pathological grading scores considered fully operational do not show better concordance than that reported here.[12,13]

Finally, the study by Bellur et al. is a remarkable reminder of the absolute necessity to re-evaluate the robustness of any diagnostic test after its widespread implementation, under the conditions of routine practice. In this respect, the 'real-life context' should certainly also extend to the entity the test aims to diagnose. Applied to the Oxford Classification, it is evident that not all possible combinations of M, E, S, T and C lesions are observed with equal frequency in practice. Interestingly, a certain imbalance in the distribution of elementary lesions was already suggested during the genesis of the Oxford Classification.[1] Whether only a limited number of preferential combinations exists under real-life conditions now needs to be thoroughly examined in large cohorts of patients. If this is confirmed, it would be preferable to evaluate the diagnostic performance and the risk of progression for each combination of lesions, rather than for each elementary lesion. In this scenario, the hazard ratio for poor outcome associated with a given combination of elementary lesions could well differ from that predicted by simply combining the individual lesions' hazard ratios. This approach could allow us to stratify the risk of progression better (and in a more clinically relevant manner) and, ultimately, to better identify subgroups of patients who should preferentially participate in interventional clinical trials and/or who could be offered more aggressive treatment. This would be a first step towards precision medicine for IgA nephropathy.
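
The point about combinations can be made concrete with a Cox proportional hazards model, assumed here purely for illustration (the text does not specify a model). Without interaction terms, log-hazard ratios add, so the hazard ratio of a combination is the product of the individual lesions' hazard ratios; an interaction term for a specific combination multiplies this product by a further factor, which is exactly the departure that combination-level analyses would aim to detect. The function name and all hazard ratio values below are hypothetical.

```python
import math


def combination_hazard_ratio(lesion_hr, lesions, interaction_hr=1.0):
    """Hazard ratio of a lesion combination versus a reference with none of them.

    lesion_hr: per-lesion hazard ratios (exp of hypothetical Cox coefficients).
    interaction_hr: exp of a hypothetical interaction coefficient for this
    combination; 1.0 means no interaction (purely multiplicative effects).
    """
    # Work on the log scale, where Cox model coefficients are additive
    log_hr = sum(math.log(lesion_hr[lesion]) for lesion in lesions)
    log_hr += math.log(interaction_hr)
    return math.exp(log_hr)


hypothetical_hr = {"M1": 1.5, "S1": 2.0, "T1": 3.0}

no_interaction = combination_hazard_ratio(hypothetical_hr, ["M1", "S1"])
with_interaction = combination_hazard_ratio(hypothetical_hr, ["M1", "S1"], interaction_hr=0.8)
print(round(no_interaction, 2))    # 3.0 (1.5 x 2.0)
print(round(with_interaction, 2))  # 2.4
```

Neither value equals 1.5 + 2.0; in a fitted model, a clearly non-null interaction coefficient for a frequent combination would support estimating risk per combination rather than per lesion.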