AI Accurately Classifies Endoscopic Severity in Ulcerative Colitis

Brandon May

August 12, 2022

A newly developed artificial intelligence (AI) model accurately evaluated endoscopic images from patients with ulcerative colitis (UC), according to new research. The AI could even distinguish between all four Mayo endoscopic subscore (MES) levels of disease activity, which is a first among similar AI models, the researchers noted.

Although management of UC involves disease activity monitoring and prompt response with appropriate therapy, endoscopic assessment has shown significant intra- and interobserver variation, thereby reducing the reliability of individual evaluations. Techniques that use AI may eliminate observer variation and aid in distinguishing between all levels of endoscopic activity with good accuracy.

"However, up until now, only a few computer-assisted diagnostic tools have been available for UC, and none are capable of distinguishing between all levels of endoscopic activity with sufficient accuracy," wrote study authors Bobby Lo, MD, of the Copenhagen University Hospital Hvidovre, and colleagues, who published their findings in The American Journal of Gastroenterology. The researchers believe their new AI could optimize and standardize the assessment of UC severity measured by MES, regardless of the operator's level of expertise.

The researchers extracted 1,484 unique endoscopic images from 467 patients with UC (median age, 45 years; 45.3% male) who had undergone a colonoscopy or sigmoidoscopy. Images of healthy colon mucosa were also extracted from a colorectal cancer surveillance program "to adequately reflect the distribution in the clinic," the researchers wrote.

Two experts blinded to clinical details and other identifying information independently scored all images according to the MES. When their scores disagreed, a third expert, blinded to the first two experts' results, also scored the image. Nearly half of the images (47.3%) were classified as normal, while 26.0% were deemed MES 1 (mild activity), 20.2% were classified as MES 2 (moderate activity), and 6.5% were classified as MES 3 (severe activity).

All endoscopic images were randomly split into a training dataset (85%) and a testing dataset (15%) with stratified sampling. Several convolutional neural network architectures were considered for automatically classifying the severity of UC. The investigators used fivefold cross-validation of the training data to develop and select the optimal final model. They then evaluated that model on the unseen test dataset.
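For readers unfamiliar with the terms, the sketch below illustrates what a stratified 85/15 split followed by fivefold cross-validation can look like. It is not the study's code: the function names, the random seed, and the toy labels (1,000 synthetic scores roughly following the reported class mix) are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.15, seed=0):
    """Split indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(labels, indices, k=5, seed=0):
    """Deal the given indices into k folds, class by class, round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


# Toy data only (assumption): synthetic MES labels, not the study images.
labels = [0] * 473 + [1] * 260 + [2] * 202 + [3] * 65
train_idx, test_idx = stratified_split(labels)
folds = stratified_folds(labels, train_idx)

# In each cross-validation round, one fold is held out for validation
# and the remaining four are used for training.
for held_out in range(5):
    val = folds[held_out]
    train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    print(f"round {held_out}: train={len(train)} val={len(val)}")
```

Stratifying matters here because the rarest class (MES 3) makes up only a small share of the images; a purely random split could leave a fold with very few severe cases.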

The model ultimately chosen was EfficientNetB2, which had the highest mean accuracy during cross-validation. This model, according to the researchers, is able to "process images significantly faster and requires less computing power than InceptionNetV3," which was the other model evaluated in the study.

The test accuracy of the final model in distinguishing between all categories of MES was 0.84. The investigators evaluated the model on binary tasks of distinguishing MES 0 versus MES 1-3 and MES 0-1 versus 2-3. They found the model achieved accuracies of 0.94 and 0.93 and areas under the receiver operating characteristic curves of 0.997 and 0.998, respectively.
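The relationship between the four-level accuracy and the two binary accuracies can be illustrated with a short sketch. The scores below are invented for the example and are not study data; the helper names `accuracy` and `binarize` are likewise hypothetical.

```python
def accuracy(true_scores, predicted_scores):
    """Fraction of images where the predicted score equals the expert score."""
    matches = sum(t == p for t, p in zip(true_scores, predicted_scores))
    return matches / len(true_scores)


def binarize(scores, threshold):
    """Collapse MES 0-3 into two groups: 1 if score >= threshold, else 0."""
    return [int(s >= threshold) for s in scores]


# Invented example scores (assumption), ten images.
true_mes = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1]
pred_mes = [0, 1, 1, 2, 2, 2, 3, 2, 0, 1]

four_level = accuracy(true_mes, pred_mes)
# MES 0 versus MES 1-3
mes0_vs_1to3 = accuracy(binarize(true_mes, 1), binarize(pred_mes, 1))
# MES 0-1 versus MES 2-3
mes0to1_vs_2to3 = accuracy(binarize(true_mes, 2), binarize(pred_mes, 2))

print(four_level, mes0_vs_1to3, mes0to1_vs_2to3)  # 0.7 0.9 0.9
```

Binary accuracy is typically higher than four-level accuracy because a prediction that misses by one level (for example, MES 3 scored as MES 2) still counts as correct whenever both scores fall on the same side of the cutoff.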

The researchers noted that they used 10-fold fewer images than similar studies have used, yet the model achieved an accuracy of around 0.74 "even when using images from another cohort" that had lower image quality. The investigators added that the model could have achieved better results if more data had been available, citing this as a limitation of the study.

"In conclusion, we have developed a deep learning model that exceeded previously reported results in classifying endoscopic images from UC patients. This may automate and optimize the evaluation of disease severity in both clinical and academic settings and ideally in clinical trials," they wrote. "Finally, this study serves as a stepping stone for future projects, including the use of video material and the assessment of long-term outcomes."

The authors reported no relevant conflicts of interest.

This article originally appeared on, part of the Medscape Professional Network.

