Prospective External Validation of a New Non-invasive Test for the Diagnosis of Non-alcoholic Steatohepatitis in Patients With Type 2 Diabetes

Thierry Poynard; Valérie Paradis; Jimmy Mullaert; Olivier Deckmyn; Nathalie Gault; Estelle Marcault; Pauline Manchon; Nassima Si Mohammed; Beatrice Parfait; Mark Ibberson; Jean-Francois Gautier; Christian Boitard; Sébastien Czernichow; Etienne Larger; Fabienne Drane; Jean Marie Castille; Valentina Peta; Angélique Brzustowski; Benoit Terris; Anais Vallet-Pichard; Dominique Roulot; Cédric Laouénan; Pierre Bedossa; Laurent Castera; Stanislas Pol; Dominique Valla


Aliment Pharmacol Ther. 2021;54(7):952-966. 



This prospective study examined the association between the NashFibroTest panel and liver histology in a cohort of patients with type 2 diabetes undergoing biopsy for investigation of suspected NAFLD. The results validated the diagnostic performances with the Obuchowski measure, the primary endpoint. To our knowledge, this is the first prospective study focusing on patients with type 2 diabetes in the context of use of diabetology clinics. The present findings confirm the results of several retrospective studies in patients with type 2 diabetes,[12,16,17] and in subsets of patients at risk of NASH, including those with type 2 diabetes.[5–7]

The comparisons between NashFibroTest and other NITs confirmed the similar performance already observed in NAFLD and viral hepatitis for FibroTest vs VCTE in intention to diagnose, with a higher reliability of FibroTest vs VCTE, as well as the higher performance of FibroTest vs FIB-4, especially when the biopsy was more than 15 mm long.[6,31,36,40]


The study has several advantages compared to others evaluating the performance of NITs in NAFLD.

All of the elements in the liver-FIBROSTARD checklist were assessed except for cost-effectiveness (File S2).[23] Although the main limitations have been well known since 2003,[34] few studies have used appropriate methods.

This study takes into account a possible spectrum effect by using the Obuchowski measure as the main endpoint, as recommended.[19,20,23] This was particularly important because our population had a high prevalence of minimal fibrosis, and, for the grading of steatosis, because of the very low prevalence of grade 0. In patients with type 2 diabetes, the influence of the spectrum effect explains the misleading interpretation of binary AUROCs in the absence of face-to-face comparisons between NITs,[12,16,17] even when using the C-statistic,[17,19] that is, only one comparison for fibrosis staging instead of the 10 pairwise comparisons of the Obuchowski measure, or the Harrell C-statistic, which has a risk of overestimation.[19]

In practice, this means that a clinician can prefer a test with a binary AUROC of 0.70 for predicting significant fibrosis or significant NASH because of the methodological quality of its validation. Conversely, the clinician can reject a test with a binary AUROC of 0.90 because its validation studies had not eliminated a spectrum effect or a risk of overestimation due to the uncertainty of biopsy.
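To illustrate why five fibrosis stages lead to 10 pairwise comparisons, the following is a minimal sketch of an unpenalised Obuchowski-type measure, computed as a weighted average of the AUROCs between every pair of stages. The function names, the equal default weights and the example scores are assumptions made for illustration; the weighting and penalty scheme actually used in the study is not described in this section.

```python
from itertools import combinations


def pairwise_auroc(lower_stage_scores, higher_stage_scores):
    """Probability that a score from the higher stage exceeds a score
    from the lower stage, counting ties as one half."""
    wins = 0.0
    for high in higher_stage_scores:
        for low in lower_stage_scores:
            if high > low:
                wins += 1.0
            elif high == low:
                wins += 0.5
    return wins / (len(lower_stage_scores) * len(higher_stage_scores))


def obuchowski(scores_by_stage, weights=None):
    """Weighted average of pairwise AUROCs over all pairs of stages.

    scores_by_stage maps an ordinal stage (0, 1, 2, ...) to the list of
    test scores observed in that stage. Equal weights are an assumption.
    """
    pairs = list(combinations(sorted(scores_by_stage), 2))
    if weights is None:
        weights = {pair: 1.0 for pair in pairs}
    total_weight = sum(weights[pair] for pair in pairs)
    weighted_sum = sum(
        weights[(i, j)] * pairwise_auroc(scores_by_stage[i], scores_by_stage[j])
        for i, j in pairs
    )
    return weighted_sum / total_weight


# Hypothetical scores for stages F0 to F4 (not study data).
example_scores = {
    0: [0.10, 0.20, 0.25],
    1: [0.20, 0.30, 0.35],
    2: [0.30, 0.45, 0.50],
    3: [0.50, 0.60, 0.70],
    4: [0.65, 0.80, 0.90],
}

number_of_pairs = len(list(combinations(sorted(example_scores), 2)))
print("pairwise comparisons:", number_of_pairs)
print("Obuchowski-type measure:", round(obuchowski(example_scores), 3))
```

Because every pair of stages contributes, the result depends less on how many patients happen to fall on either side of a single binary cut-off than a single binary AUROC does.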

The NashFibroTest panel was constructed using the SAF scoring system, which has several advantages compared to the standard CRN scoring system.[6–10,22] A simpler definition of activity was used as a reference: hepatocyte ballooning and lobular inflammation, with at least 1 point for each category. Indeed, this definition does not require the presence of steatosis or the presence of both lobular inflammation and ballooning. This independence among features reduces the risk of false positives/negatives.[21] The NAS score is not appropriate for the construction of a NIT because it adds the grade of steatosis to the grades of ballooning and lobular inflammation.[7–9,22] A NAS score of 4 can correspond to a patient with grade 3 steatosis and grade 1 lobular inflammation as well as to a patient with higher histological activity: grade 2 lobular inflammation, grade 1 ballooning and grade 1 steatosis. Furthermore, the grades of inflammation in the SAF score are more detailed, with less inter-pathologist variability, and ballooning is differentiated from round and clear hepatocytes,[9] which is not the case in the CRN score.[22] Our study used a centralised reading by a single expert, reducing the inter-observer variability.
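The two patients described above can be written out as a small worked example. This is only an illustrative sketch: the function and variable names are hypothetical, and the activity grade is taken here as the sum of the ballooning and lobular inflammation grades, without steatosis.

```python
def nas_score(steatosis, lobular_inflammation, ballooning):
    """NAS adds the steatosis grade to the two activity features."""
    return steatosis + lobular_inflammation + ballooning


def activity_grade(lobular_inflammation, ballooning):
    """Activity taken as ballooning plus lobular inflammation only."""
    return lobular_inflammation + ballooning


# The two patients from the text, both with a NAS score of 4.
patient_a = {"steatosis": 3, "lobular_inflammation": 1, "ballooning": 0}
patient_b = {"steatosis": 1, "lobular_inflammation": 2, "ballooning": 1}

for name, patient in (("A", patient_a), ("B", patient_b)):
    nas = nas_score(**patient)
    activity = activity_grade(
        patient["lobular_inflammation"], patient["ballooning"]
    )
    print(f"patient {name}: NAS = {nas}, activity = {activity}")
```

Both patients share the same NAS score, while the activity grade separates them (1 vs 3). This ambiguity is why the text argues against the NAS score as a reference for constructing a NIT.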

Our results externally validated that a single blood sample provided an independent assessment of the severity of three histological features of NAFLD: the stage of fibrosis, the SAF grades of NASH by NashTest-2, and steatosis by SteatoTest-2, including elementary features of activity. This is an improvement in comparison to our first generation of tests. The sample was analysed in a biochemistry unit and results were obtained within a few hours. These results (Obuchowski measures and NPV) confirmed previous studies (Table 3 and Table 4).[6] In cases of lower prevalence, a NIT of this type with high NPV would be an excellent 'rule out' test, particularly in a context of use with a relatively low prevalence of NASH (Table 5). The risks of false positives/negatives are well known and lower than 2%.[20,28] Another advantage is the numerous studies of FibroTest and SteatoTest, whose diagnostic and prognostic performances have been extensively validated in chronic viral hepatitis and alcoholic liver disease,[6,7,38] which are frequently associated with type 2 diabetes. Furthermore, in comparison to the first-generation test, the NashTest-2 did not include fasting glucose or BMI in its components,[6] which simplifies its use. The SteatoTest-2 also has the advantage of increased reliability, as total bilirubin is no longer included.[7]
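The link between prevalence and the value of a 'rule out' test follows from Bayes' rule. The sketch below shows how NPV rises as prevalence falls for a fixed sensitivity and specificity. The sensitivity, specificity and prevalence values are hypothetical and are not taken from Table 3, Table 4 or Table 5.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV from Bayes' rule: true negatives over all negative results."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical operating point, for illustration only.
assumed_sensitivity = 0.80
assumed_specificity = 0.80

for assumed_prevalence in (0.50, 0.30, 0.10):
    npv = negative_predictive_value(
        assumed_sensitivity, assumed_specificity, assumed_prevalence
    )
    print(f"prevalence {assumed_prevalence:.2f}: NPV = {npv:.3f}")
```

With the same test characteristics, a negative result is more reassuring in a low-prevalence setting than in a high-prevalence one.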


The main limitation for the validation of NITs in NAFLD, including ours, is sampling variability, which is directly associated with specimen length.[18] In our study the median (IQR) biopsy length of 17 (8) mm does not correspond to the recommended ideal of 25 mm.[34] However, 17 mm is also the mean length of the only retrospective study in patients with type 2 diabetes,[17] and is within the range of lengths found in 64 studies in NAFLD.[13] This study confirms the effect of this uncertainty using the appropriate definitions as well as its associated simulation tool.[33] The median biopsy specimen was 17 mm, thus the maximum expected binary AUROC for an ideal NIT decreased to 0.70 due to the 30% misclassification rate of the biopsy (Figure 4). Therefore, binary AUROCs of more than 0.80 using biopsies of around 17 mm as a reference may have been overestimated in past studies.[12,13,17] Another minor limitation is that the NashFibroTest requires fasted samples.
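The ceiling on the AUROC of an ideal test can be reproduced with a simple calculation. The sketch below assumes that biopsy errors are symmetric (the same error rate in both directions) and independent of the true status, and that true prevalence is 50%. These are simplifying assumptions made for illustration and are not necessarily the model behind the simulation tool cited above;[33] the function name is hypothetical.

```python
def max_auroc_of_perfect_test(reference_error_rate, true_prevalence=0.5):
    """AUROC of a test that always matches the true status, when judged
    against a reference that is wrong with probability
    reference_error_rate, independently of the true status."""
    p = true_prevalence
    e = reference_error_rate
    # Probability that the imperfect reference labels a patient positive.
    reference_positive = p * (1.0 - e) + (1.0 - p) * e
    reference_negative = 1.0 - reference_positive
    # Apparent sensitivity and specificity of the perfect test.
    apparent_sensitivity = p * (1.0 - e) / reference_positive
    apparent_specificity = (1.0 - p) * (1.0 - e) / reference_negative
    # For a binary test, AUROC is the mean of sensitivity and specificity.
    return (apparent_sensitivity + apparent_specificity) / 2.0


print(round(max_auroc_of_perfect_test(0.30), 2))  # 0.7
print(round(max_auroc_of_perfect_test(0.00), 2))  # 1.0
```

Under these assumptions, a 30% misclassification rate of the reference caps the apparent AUROC of even a perfect test at 0.70, consistent with the figure quoted in the text.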

There are several other limitations to the present study. Our study only provides an external validation of the NashFibroTest panel in diabetology clinics and not in a general population. As in the construction subset, we also acknowledge that all patients pre-included for biopsy were required to have abnormal transaminases, which is usually recommended by ethics committees in France. However, despite patient selection based on increased liver enzymes, the spectrum of stages was uniform up to stage F3, with a lower prevalence of cirrhosis than in the original study. The performances of NashTest-2 were also similar to those of the original study, with a uniform spectrum of grades.

We found the same high sensitivity of SteatoTest-2 (0.85; 0.80–0.89) as in the T2DM subset of the original validation (0.85) and the same limited specificity vs the original SteatoTest.

A cost-effectiveness analysis should be performed, as in hepatitis C.[41] Face-to-face comparisons between the main NITs in intention to diagnose, with appropriate sample size, are mandatory for an objective ranking. Fibroscan can measure two of the three features here and MRE could do all of them. However, it is not yet clear what the criteria of reliability are for CAP, and for MRE the performances for staging NASH severity are not yet fully validated. Even if our results confirm the performance observed in the United States,[16,17] other validation outside France is needed.

Finally, our results support a simplification of the standard definitions of NAFLD without the mandatory concomitant (temporal) presence of steatosis and inflammation. Most cross-sectional studies assessing NITs in type 2 diabetes only included patients with at least 5% steatosis at MRI-PDFF, and therefore excluded by definition burnt-out fibrosis (fibrosis without inflammation) or burnt-out NASH (inflammation without steatosis).[42] Only very large cohort studies using NITs such as the NashFibroTest panel, without selection on transaminase values, could estimate the true prevalence of burnt-out fibrosis, limited here to 0.7%.

In summary, despite the limitations of biopsy, this study confirms the significant performances of the NashFibroTest panel for the diagnosis of fibrosis stages, NASH grades and steatosis grades in a cohort of patients with type 2 diabetes.