Applications of Machine Learning Models in the Prediction of Gastric Cancer Risk in Patients After Helicobacter Pylori Eradication

Wai K. Leung; Ka Shing Cheung; Bofei Li; Simon Y. K. Law; Thomas K. L. Lui


Aliment Pharmacol Ther. 2021;53(8):864-872. 


Abstract and Introduction


Background: The risk of gastric cancer after Helicobacter pylori (H. pylori) eradication remains unknown.

Aim: To evaluate the performance of seven different machine learning models in predicting gastric cancer risk after H. pylori eradication.

Methods: We identified H. pylori-infected patients who had received clarithromycin-based triple therapy between 2003 and 2014 in Hong Kong. Patients were divided into training (n = 64 238) and validation (n = 25 330) sets according to the period in which they received eradication therapy. The data were used to construct seven machine learning models to predict the risk of gastric cancer development within 5 years after H. pylori eradication. A total of 26 clinical variables were entered into these models. Model performance was measured by the area under the receiver operating characteristic curve (AUC).
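The evaluation design described in the Methods (a split by period of eradication therapy, then AUC on the later, held-out cohort) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the `Patient` record, its field names and the cutoff date in the usage lines are hypothetical, and the 26 clinical variables and the models themselves are omitted. AUC is computed here as the probability that a randomly chosen patient who developed cancer receives a higher predicted risk than one who did not, with ties counted as one half.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical patient record; field names are illustrative, not from the study.
@dataclass
class Patient:
    eradication_date: date   # date H. pylori eradication therapy was given
    risk_score: float        # model output; higher means higher predicted risk
    cancer_within_5y: bool   # outcome: gastric cancer within 5 years


def temporal_split(patients, cutoff):
    """Split by period of eradication therapy: earlier -> training, later -> validation."""
    train = [p for p in patients if p.eradication_date < cutoff]
    valid = [p for p in patients if p.eradication_date >= cutoff]
    return train, valid


def auc(scores, labels):
    """AUC = P(score of a random case > score of a random non-case); ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    cohort = [
        Patient(date(2005, 3, 1), 0.9, True),
        Patient(date(2006, 7, 1), 0.2, False),
        Patient(date(2012, 1, 1), 0.8, True),
        Patient(date(2013, 5, 1), 0.3, False),
        Patient(date(2014, 2, 1), 0.1, False),
    ]
    # Hypothetical cutoff; the abstract does not state where the split fell.
    train, valid = temporal_split(cohort, date(2010, 1, 1))
    print(len(train), len(valid))
    print(auc([p.risk_score for p in valid], [p.cancer_within_5y for p in valid]))
```

The pairwise loop is quadratic but easy to read; at cohort scale a rank-based (Mann-Whitney) computation or a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.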

Results: During a mean follow-up of 4.7 years, 0.21% of H. pylori-eradicated patients developed gastric cancer. Of the seven machine learning models, extreme gradient boosting (XGBoost) performed best in predicting cancer development (AUC 0.97, 95% CI 0.96–0.98) and was superior to conventional logistic regression (AUC 0.90, 95% CI 0.84–0.92). With the XGBoost model, 6.6% of patients were classified as being at high risk of gastric cancer, with a miss rate of 1.9%. Patient age, presence of intestinal metaplasia, and gastric ulcer were the most heavily weighted factors in the XGBoost model.
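The two operating figures reported for XGBoost, the proportion of patients labelled high risk and the miss rate, both follow from a single cut-off on the model's predicted risk. The sketch below shows that arithmetic under stated assumptions: the threshold values are hypothetical (the abstract does not report one), the scores and labels are toy data, and "miss rate" is assumed to mean the fraction of patients who developed gastric cancer but fell below the cut-off, i.e. 1 minus sensitivity.

```python
def triage_metrics(scores, labels, threshold):
    """Return (proportion flagged high risk, miss rate) at a given score threshold.

    Assumption: miss rate = cancer cases scoring below the threshold / all cases
    (i.e. 1 - sensitivity). The threshold itself is a hypothetical input.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    flagged = [s >= threshold for s in scores]  # True = referred for surveillance
    n_cases = sum(1 for y in labels if y)
    if n_cases == 0:
        raise ValueError("miss rate is undefined without any cancer cases")
    missed = sum(1 for f, y in zip(flagged, labels) if y and not f)
    return sum(flagged) / len(scores), missed / n_cases


if __name__ == "__main__":
    # Toy predicted risks and outcomes, purely illustrative.
    scores = [0.92, 0.75, 0.40, 0.30, 0.22, 0.15, 0.10, 0.05]
    labels = [True, False, True, False, False, False, False, False]
    for t in (0.8, 0.5, 0.2):  # hypothetical thresholds
        flag_rate, miss_rate = triage_metrics(scores, labels, t)
        print(f"threshold={t}: flagged={flag_rate:.3f}, miss rate={miss_rate:.3f}")
```

Lowering the threshold refers more patients for endoscopy but misses fewer cancers; under the assumed definition, a reported pair such as 6.6% flagged with a 1.9% miss rate corresponds to one point on this trade-off.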

Conclusion: Based on simple baseline patient information, a machine learning model can accurately predict the risk of post-eradication gastric cancer. Such a model could substantially reduce the number of patients who require endoscopic surveillance.


Gastric cancer is still the third most common cause of cancer-related death worldwide, with an estimated 783,000 deaths in 2018.[1] Its incidence varies widely by geography and is highest in Asian countries, which account for more than 70% of cases worldwide. This could be partly explained by the high background prevalence of Helicobacter pylori (H. pylori) infection. Consistent with this, H. pylori eradication has been shown to reduce the risk of gastric cancer development by about 40%.[2,3] However, a considerable proportion of H. pylori-infected subjects would still progress to gastric cancer even after eradication. Because H. pylori-associated gastric carcinogenesis is generally believed to be a multistep progression from chronic gastritis to atrophic gastritis, intestinal metaplasia and dysplasia, H. pylori eradication before these pre-neoplastic lesions develop is generally considered necessary to prevent gastric cancer.[4] On the other hand, we have recently shown that later eradication in older subjects is still effective in reducing the risk of gastric cancer development.[3]

Although the exact mechanisms underlying progression to gastric cancer after H. pylori eradication remain uncertain,[5] we have shown that the use of medications such as aspirin,[6] proton pump inhibitors (PPIs),[7] metformin[8] and statins[9] could modulate the risk of gastric cancer development. In addition to older age and medication use, a family history of gastric cancer, smoking and certain dietary habits could also increase the risk of progression to gastric cancer.

Previous studies have attempted to predict the risk of progression to gastric cancer using clinical risk scores based on composites of clinical, endoscopic, pathological and/or serological parameters.[10–13] However, all of these scoring systems require endoscopic, serological or even histological findings; so far, there is no accurate scoring system based on clinical parameters alone. Artificial intelligence, particularly recent machine learning and deep learning techniques, is increasingly applied to health care management and risk prediction.[14,15] Machine learning has recently been applied to predict the risk of hospital-based intervention or death in patients with upper gastrointestinal bleeding,[16] where it was found to be superior to validated clinical risk scoring systems. Two studies with relatively small sample sizes have also used machine learning algorithms to predict the risk of gastric cancer development.[17,18] However, both studies included baseline endoscopic findings and did not consider the role of H. pylori infection.

With the increasing availability of machine learning algorithms, this study aimed to explore the feasibility of using different machine learning models to predict the future risk of gastric cancer development in patients who had undergone H. pylori eradication, based on simple clinical parameters and medication use. We also explored the potential application of the validated machine learning model in identifying high-risk patients who would require endoscopic surveillance for gastric cancer.