Stand-alone Artificial Intelligence for Breast Cancer Detection in Mammography

Comparison With 101 Radiologists

Alejandro Rodriguez-Ruiz; Kristina Lång; Albert Gubern-Merida; Mireille Broeders; Gisella Gennaro; Paola Clauser; Thomas H. Helbich; Margarita Chevalier; Tao Tan; Thomas Mertelmeier; Matthew G. Wallis; Ingvar Andersson; Sophia Zackrisson; Ritse M. Mann; Ioannis Sechopoulos


J Natl Cancer Inst. 2019;111(9):916-922. 


Abstract and Introduction


Background: Artificial intelligence (AI) systems performing at radiologist-like levels in the evaluation of digital mammography (DM) would improve breast cancer screening accuracy and efficiency. We aimed to compare the stand-alone performance of an AI system to that of radiologists in detecting breast cancer in DM.

Methods: Nine multi-reader, multi-case study datasets, previously used for different research purposes in seven countries, were collected. Together, the datasets comprised DM exams acquired with systems from four different vendors, multiple radiologists' assessments per exam, and ground truth verified by histopathological analysis or follow-up, yielding a total of 2652 exams (653 malignant) and interpretations by 101 radiologists (28 296 independent interpretations). An AI system analyzed these exams, assigning each a level-of-suspicion score for the presence of cancer on a scale from 1 to 10. The detection performance of the radiologists and the AI system was compared using a noninferiority null hypothesis with a margin of 0.05.
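An AUC computed from 1 to 10 suspicion scores can be read as the probability that a randomly chosen malignant exam receives a higher score than a randomly chosen non-malignant one, with ties counted as one half. The minimal Python sketch below computes that empirical (Mann-Whitney) estimate. The function name and the toy scores are hypothetical. The sketch also does not reproduce the multi-reader, multi-case statistical analysis used in the study.

```python
from typing import Sequence


def empirical_auc(malignant: Sequence[float], benign: Sequence[float]) -> float:
    """Mann-Whitney estimate of the area under the ROC curve.

    Over all (malignant, benign) pairs, counts how often the malignant exam
    has the higher suspicion score; tied pairs contribute one half.
    """
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign exam")
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


# Hypothetical level-of-suspicion scores on the 1-10 scale (not study data).
malignant_scores = [9, 7, 10, 4]
benign_scores = [2, 5, 1, 7, 3]
print(empirical_auc(malignant_scores, benign_scores))  # 0.875
```

Because only the ranking of the scores matters, a coarse 1 to 10 scale still yields a full ROC curve. Ties between malignant and non-malignant exams, however, become more frequent than they would be with a continuous score.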

Results: The performance of the AI system was statistically noninferior to that of the average of the 101 radiologists. The AI system had an area under the receiver operating characteristic curve (AUC) of 0.840 (95% confidence interval [CI] = 0.820 to 0.860), and the average AUC of the radiologists was 0.814 (95% CI = 0.787 to 0.841) (95% CI of the difference = −0.003 to 0.055). The AI system had a higher AUC than 61.4% of the radiologists.
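The noninferiority conclusion can be checked with simple arithmetic on the reported interval. With a margin of 0.05, the AI system is declared noninferior when the lower bound of the 95% CI for the AUC difference (AI minus radiologists) lies above −0.05. Here −0.003 > −0.05, so the criterion is met. The interval still includes 0, so superiority is not shown. The sketch below encodes that decision rule under this reading of the margin. The helper name `is_noninferior` is hypothetical, and the confidence interval itself comes from the study's own analysis, which is not reproduced here.

```python
def is_noninferior(diff_ci_lower: float, margin: float = 0.05) -> bool:
    """Noninferiority rule for an AUC difference (new reader minus reference).

    The null hypothesis that the new reader is worse by at least `margin`
    is rejected when the lower 95% confidence bound lies above -margin.
    """
    return diff_ci_lower > -margin


# Values reported in the Results paragraph.
ai_auc, radiologist_auc = 0.840, 0.814
diff_ci = (-0.003, 0.055)

print(round(ai_auc - radiologist_auc, 3))  # point difference: 0.026
print(is_noninferior(diff_ci[0]))          # True, since -0.003 > -0.05
print(diff_ci[0] > 0)                      # False, so superiority is not shown
```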

Conclusions: The evaluated AI system achieved a cancer detection accuracy comparable to that of an average breast radiologist in this retrospective setting. Although promising, the performance and impact of such a system in a screening setting need further investigation.


Breast cancer is the most common cancer in women. Despite important improvements in therapy, it remains a major cause of cancer-related mortality, accounting for approximately 500 000 deaths worldwide each year.[1] Population-based breast cancer screening programs using mammography are regarded as effective in reducing breast cancer-related mortality.[2–5] However, current screening programs are highly labor intensive. This is partly because of the large number of women screened per detected cancer and partly because of the use of double reading, especially in European programs, which also adds to the economic cost. Moreover, despite these measures, up to 25% of mammographically visible cancers are still not detected at screening.[6–9]

Given the increasing shortage of radiologists in some countries, including breast screening radiologists,[10–12] alternative strategies are required to sustain current screening programs. In addition, it is of paramount importance to prevent lesions that are visible on digital mammography (DM) from being overlooked or misinterpreted.

Since the 1990s, computer-aided detection systems have been developed to automatically detect and classify breast lesions in mammograms. The widespread implementation of DM for breast cancer imaging further spurred the development of automated detection techniques. Unfortunately, no studies to date have found that traditional computer-aided detection systems directly improve screening performance or cost-effectiveness, mainly because of their low specificity.[13,14] This low specificity has also precluded their use as stand-alone readers for screening mammography.

However, the field of artificial intelligence (AI) is changing rapidly owing to the success of novel algorithms based on deep learning, in particular convolutional neural networks. These approaches have been very successful in automating cognitively difficult tasks; classic examples include self-driving cars and advanced speech recognition. In medical imaging, deep learning-based AI is also rapidly closing the gap between humans and computers.[15,16] It has therefore been suggested that such algorithms could further improve the benefit-to-harm ratio of breast cancer screening programs.[17] In recent years, several deep learning-based algorithms for the automated analysis of mammograms have been developed. Some of these have already shown very promising results compared with radiologists, although only in very limited and homogeneous scenarios.[18,19]

Therefore, in this study, we compare, at the case level, the cancer detection performance of a commercially available AI system with that of 101 radiologists. These radiologists had scored nine different cohorts of DM examinations from four different manufacturers as part of reader studies previously performed for other purposes.