Systematic Review With Meta-analysis

The Appropriateness of Colonoscopy Increases the Probability of Relevant Findings and Cancer While Reducing Unnecessary Exams

Leonardo Frazzoni; Marina La Marca; Franco Radaelli; Cristiano Spada; Liboria Laterza; Rocco Maurizio Zagari; Franco Bazzoli; Cesare Hassan; Marzio Frazzoni; Mario Dinis-Ribeiro; Lorenzo Fuccio


Aliment Pharmacol Ther. 2020;53(1):22-3. 


In this systematic review and meta-analysis, we showed that only 71% of colonoscopies were performed with an appropriate indication, which is far below the 85% threshold proposed by ESGE, and we demonstrated that performing colonoscopy according to appropriateness criteria increases the diagnostic yield while saving unnecessary exams. Indeed, compared with inappropriate examination, it increases the diagnostic yield for relevant findings by 15%, for CRC by 5% and for IBD by 3%, while saving 24% of procedures for relevant findings, 22% of procedures for CRC and 24% of procedures for IBD (see Figure 3). On the other hand, the sensitivity was below 100% for relevant findings, CRC and IBD alike; these criteria therefore need to be integrated with a thorough clinical evaluation of the patient. The certainty of evidence supporting that appropriateness increases colonoscopy diagnostic yield was judged as very low for relevant findings, and low for CRC and IBD; likewise, very low certainty of evidence supports that appropriateness leads to saving unnecessary procedures for relevant findings, CRC and IBD (see Appendix S2).

The diagnostic yield of colonoscopy, in terms of lesion detection rate, has been recognised as a major quality criterion determining its diagnostic usefulness.[5] According to Bayes' theorem, the performance of a diagnostic test is influenced by the pre-test probability, that is, the prevalence of the disease.[37] Therefore, selecting the patients with the highest probability of the disease of interest is crucial to maximise the diagnostic performance of colonoscopy. The suboptimal colonoscopy appropriateness rate of 71% was not an unexpected finding, as US data showed that up to 60% of colonoscopies were performed more often than recommended for follow-up in CRC screening programs.[6,38] According to our results, paying more attention to appropriately selecting patients for colonoscopy would significantly increase its diagnostic yield, discovering on average 16% more relevant findings and 5% more CRCs.
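
The dependence on pre-test probability mentioned above can be written in the standard odds form of Bayes' theorem. This is a generic textbook formulation rather than a formula taken from the article; the symbols $p_{\text{pre}}$, $p_{\text{post}}$ and $LR$ are notation assumed here for illustration only.

```latex
% Odds form of Bayes' theorem for a diagnostic test (generic sketch).
% p_pre  : pre-test probability (e.g. disease prevalence in the selected patients)
% LR     : likelihood ratio of the observed test result (LR+ or LR-)
% p_post : post-test probability
\[
  \text{odds}_{\text{pre}} = \frac{p_{\text{pre}}}{1 - p_{\text{pre}}},
  \qquad
  \text{odds}_{\text{post}} = \text{odds}_{\text{pre}} \times LR,
  \qquad
  p_{\text{post}} = \frac{\text{odds}_{\text{post}}}{1 + \text{odds}_{\text{post}}}
\]
```

For a fixed likelihood ratio, a higher pre-test probability gives a higher post-test probability, which is why patient selection affects how informative the examination is.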

Health care costs are a major issue,[4] especially in the case of an open-access system with finite resources.[39] In general terms, a health care intervention is considered appropriate when the potential benefits outweigh costs and risks.[9] Based on our data, performing only appropriate colonoscopies would save 24% and 22% unnecessary procedures for relevant findings and CRC, respectively, while missing the 12% of relevant findings and 3% of CRCs diagnosed among inappropriate examinations. This means that the current appropriateness criteria are informative on the probability of relevant findings and CRC and might be a starting point towards an optimisation of resources; nevertheless, the drawbacks related to the sensitivity and specificity suggest that they need to be complemented by a thorough clinical evaluation of the patient,[40] and possibly improved. Indeed, the LR+ <10 and the LR- >0.1 imply that appropriateness alone neither confirms relevant findings, CRC or IBD when it is met, nor excludes such diagnoses when it is absent. A suggestion might come from a systematic review on the diagnostic utility of alarm symptoms for CRC including 15 studies for a total of 19,443 patients.[41] Ford et al[41] found that individual alarm symptoms or signs had a suboptimal performance for diagnosing CRC, whereas statistical models based on a combination of alarm features had higher sensitivity but still low specificity for diagnosing CRC.
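
To illustrate how likelihood ratios follow from sensitivity and specificity, the short sketch below uses the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. It is not the authors' analysis code. The input values are the sensitivity (92%) and specificity (24%) reported in this Discussion for the ASGE-2000 criteria and relevant findings; the function names and the 30% pre-test probability are hypothetical and chosen only for illustration.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply the odds form of Bayes' theorem to update a pre-test probability."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Sensitivity and specificity quoted in this Discussion for the ASGE-2000
# criteria (relevant findings): 92% and 24%.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.92, specificity=0.24)

# Hypothetical pre-test probability of a relevant finding (assumption, not from the article).
assumed_pre_test = 0.30

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability if appropriate:   {post_test_probability(assumed_pre_test, lr_pos):.2f}")
print(f"Post-test probability if inappropriate: {post_test_probability(assumed_pre_test, lr_neg):.2f}")
```

With these inputs, LR+ stays well below 10 and LR- stays above 0.1, in line with the statement that appropriateness alone neither confirms nor excludes the diagnoses of interest.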

Over the years, both ASGE and EPAGE appropriateness criteria have been updated, so that four different criteria[8–11] were applied in the included studies. A significant difference in the diagnostic performance for relevant findings and IBD in favour of the ASGE-2000 criteria was found. Indeed, these criteria were more strongly associated with relevant findings and IBD (RR 2.56 and 5.24 respectively), yielding the highest sensitivity for relevant findings and IBD (92% and 94% respectively) while having an intermediate specificity (24% for both). On the other hand, the EPAGE-II had a worse performance for relevant findings (RR 1.34), with a lower sensitivity (89%) and the lowest specificity (16%). Of note, the EPAGE-I performed poorly in terms of sensitivity for IBD (71%). These data may have a twofold explanation. First, the EPAGE-I and EPAGE-II criteria differ from the ASGE-2000 list, as they are algorithm based and exclude specific situations (eg patients presenting with haematochezia and known IBD);[9,11] of note, the EPAGE-II criteria were designed to be slightly less strict than the EPAGE-I,[8,9] thus possibly explaining their much lower specificity and higher sensitivity than EPAGE-I. Second, studies assessing the EPAGE-II criteria were more recent, as the enrolment started after 2007 in all but one study; on the other hand, all studies assessing the ASGE-2000 criteria ended the enrolment by 2007. As the prescription of colonoscopy has steadily increased over the years,[6] this may translate into a selection bias, in the sense that the population undergoing colonoscopy due to clinical indication has progressively narrowed, thus possibly reducing the added value of clinical appropriateness.

As far as CRC is concerned, the highest association, found with the ASGE-1992 criteria (RR 7.49), might be due to a higher burden of CRC in the included populations, as two out of the four studies in this group enrolled patients before the year 2000, when screening programs were not fully operational.[16,17] Interestingly, the EPAGE-I criteria displayed the worst sensitivity (83%) despite having the highest specificity (28%), probably owing to their higher selectivity for considering an indication as appropriate. Of note, the studies by Adler[26] and Eskeland[35] displayed a low sensitivity when applying the EPAGE-I criteria, which markedly improved when considering the ASGE-1992 and EPAGE-II criteria respectively.

Our findings expand the previously published meta-analysis by Hassan et al,[12] who included 12 studies for a total of 14,160 patients. At least four elements of novelty can be discerned. First, eight studies have been published since then, allowing us to obtain more reliable estimates. Second, the EPAGE-II criteria were not assessed by Hassan et al since there were no published studies at that time. Third, we found that including more recent studies yielded a better diagnostic performance in terms of sensitivity for CRC (97% vs. 89%), possibly due to higher-quality colonoscopy, while data on relevant findings remained substantially unchanged. Last, we provided data on the magnitude of association between appropriateness and the probability of relevant findings and CRC at colonoscopy, which might prove useful for health policymakers.

Our analysis has strengths and limitations. First, most of the included studies were prospective, therefore at lower risk of bias; second, we followed the PRISMA recommendations[13] and GRADE approach[14] for conducting systematic reviews and meta-analyses. Third, we performed several subgroup and metaregression analyses, which partly explained the heterogeneity. Among the limitations, our search was restricted to studies already reporting diagnostic yield of colonoscopy according to appropriateness criteria, as their algorithmic and specific design prevented us from applying them to other studies in a post hoc analysis. We decided to include both older and more recent versions of ASGE and EPAGE criteria in the main analysis, in order to obtain more reliable estimates; this did not significantly increase the heterogeneity, and appropriate indications in older and updated criteria were indeed quite similar. Two different tools to judge methodological quality of the included studies were applied, namely the Appraisal tool for Cross-Sectional Studies (AXIS tool) for the primary outcome and the QUADAS-2 tool for the secondary outcome. Applying the QUADAS-2 tool might be questionable, as most of the included studies were not designed to test the diagnostic accuracy of appropriate colonoscopy; however, we decided to use both in order to provide the most complete assessment of methodological quality. Further, the definition of relevant findings was slightly different for some of the included studies, possibly explaining some of the clinical heterogeneity; the analyses regarding this outcome might therefore be considered exploratory rather than conclusive. The metaregression and subgroup analyses might be affected by a lack of statistical power, probably depending on the number of included studies. Last, we could not explain all the heterogeneity for relevant findings.

In conclusion, we found that colonoscopy is performed inappropriately in about one third of the cases, and that performing colonoscopy according to appropriateness guidelines would increase the diagnostic yield while reducing the number of unnecessary exams. Nevertheless, the sensitivity and specificity indicate that these criteria are useful but need to be integrated into a thorough and critical clinical evaluation of the patient. An upgrade to the existing appropriateness criteria, possibly through the development and validation of statistical models based on patients' demographics, clinical presentation and laboratory tests, should be the focus of future research.