AI Beats Dermatologists on Skin Lesion Images

Pam Harrison

May 29, 2018

A form of artificial intelligence (AI) known as a deep learning convolutional neural network (CNN) has been shown to arrive at more accurate diagnoses of both benign and malignant skin lesions than expert dermatologists.

A previous study found that AI performed better than pathologists when it came to predicting overall survival in patients diagnosed with glioma.

In the latest study, published online May 29 in the Annals of Oncology, the CNN was found to perform better than even expert dermatologists in assessing benign and malignant skin lesions from dermoscopic images.

"The CNN works like the brain of a child," Holger Haenssle, MD, senior managing physician, Department of Dermatology, University of Heidelberg, Germany, explained in a statement.

"To train it, we showed the CNN more than 100,000 images of malignant and benign skin cancers and moles and indicated the diagnosis for each image," he added.

"The CNN missed fewer melanomas, meaning it had a higher sensitivity than the dermatologists, and it misdiagnosed fewer benign moles as malignant melanoma, which means it had a higher specificity," Haenssle continued.

CNNs are capable of outperforming dermatologists, including extensively trained experts. Dr Holger Haenssle

"These findings show that deep learning convolutional neural networks are capable of outperforming dermatologists, including extensively trained experts, in the task of detecting melanomas," he concluded.

Image Test Sets

Researchers created an image test set consisting of 300 images, 20% of which were melanomas (both in situ and invasive). The remaining 80% were benign melanocytic nevi of different subtypes routinely encountered in clinical practice.

As the researchers explain, only dermoscopic images were used to build the image sets. The lesions were imaged at 10-fold magnification.

Two dermatologists then selected 100 images from the 300-image test set to increase the diagnostic difficulty. This 100-image set (set-100) was used for CNN testing, the results of which were compared to results from dermatologists in a global reader study.

The study involved 58 dermatologists: 52% described themselves as experts in dermoscopy with more than 5 years of experience; 19% indicated they were skilled in dermoscopy, with 2 to 5 years of experience; and 29% indicated they were beginners in dermoscopy, with less than 2 years of experience.

In the reader study level-I test, dermatologists were asked to make a diagnosis of either malignant melanoma or benign nevus from dermoscopic images alone.

Results showed that the dermatologists as a group accurately detected an average of 86.6% of malignant melanomas, corresponding to a mean sensitivity for the dichotomous classification of set-100 lesions of 86.6%.

They also correctly identified an average of 71.3% of the benign nevi, for a mean specificity of 71.3%.

This translated into an average area under the receiver operating characteristic curve (ROC AUC) of 0.79, the investigators point out.
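For readers less familiar with these metrics, sensitivity and specificity follow directly from the confusion matrix, and the ROC AUC can be estimated as the probability that a randomly chosen melanoma receives a higher malignancy score than a randomly chosen nevus. The sketch below uses invented counts and scores for illustration only; none of these numbers come from the study itself.

```python
# Illustrative sketch of the metrics reported in the study. The counts and
# scores below are invented for demonstration, not taken from the paper.

def sensitivity(tp, fn):
    """Fraction of true melanomas correctly flagged as malignant."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of benign nevi correctly flagged as benign."""
    return tn / (tn + fp)

# Hypothetical set of 100 lesions with the study's 20/80 melanoma/nevus split.
# Suppose a reader flags 17 of 20 melanomas and clears 57 of 80 nevi.
sens = sensitivity(tp=17, fn=3)   # 0.85
spec = specificity(tn=57, fp=23)  # 0.7125

def roc_auc(pos_scores, neg_scores):
    """ROC AUC via the rank (Mann-Whitney) statistic: the probability that a
    random melanoma scores higher than a random nevus, ties counting half."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores
               for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Toy malignancy scores for 4 melanomas and 4 nevi.
auc = roc_auc([0.9, 0.8, 0.7, 0.4], [0.6, 0.5, 0.3, 0.2])  # 0.875
```

An AUC of 1.0 would mean every melanoma outranks every nevus; 0.5 is chance-level discrimination.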

"Experts in dermoscopy showed a significantly higher mean sensitivity, specificity and ROC area than beginners," the authors note.

On the same reader study level-I test, the CNN was able to accurately identify 95% of the malignant melanomas (sensitivity) and 63.8% of the benign moles (specificity).

This in turn translated into an ROC AUC of 0.86 for the CNN, researchers indicate.

Level-II Reader Study

After 4 weeks, the dermatologists underwent a reader study level-II test covering the same 100 cases from the level-I test; this time, the dermoscopic images were supplemented with clinical close-up images and additional clinical information.

The additional information improved the dermatologists' performance: the mean sensitivity rose to 88.9%, the mean specificity to 75.7%, and the mean ROC AUC to 0.82 (P < .01).

"These changes were solely based on significant improvements of ‘beginners' and ‘skilled' dermatologists," investigators point out.

In contrast, experts in dermoscopy did not benefit from the supplemental clinical information, they note.

On the other hand, the CNN's performance on the same reader study level-II test showed a sensitivity of 95%, a specificity of 90%, and an ROC AUC of 0.95.

CNN vs Dermatologists

The investigators then used the dermatologists' mean level-I sensitivity of 86.6% as a benchmark, fixing the CNN's operating point at that sensitivity for comparison.

"At this sensitivity, the CNN's specificity was higher (92.5%) than the mean specificity of dermatologists (71.3%)," researchers report, noting that the difference was statistically significant (p<.01).

Furthermore, the CNN's ROC AUC of 0.86 was also greater than the dermatologists' mean ROC AUC of 0.79 in the same level-I test (P < .01), they add.
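The benchmark comparison works by sweeping the CNN's decision threshold until its sensitivity matches the dermatologists' mean, then reading off the specificity at that operating point. A minimal sketch of this idea follows; the scores and the threshold-search routine are illustrative assumptions, not the study's actual implementation.

```python
# Hypothetical sketch of comparing a classifier with a human benchmark at a
# matched operating point. All scores below are invented for illustration.

def specificity_at_sensitivity(pos_scores, neg_scores, target_sens):
    """Find the highest threshold (classify as malignant if score >= t)
    whose sensitivity meets the target; return (threshold, sens, spec)."""
    for t in sorted(set(pos_scores + neg_scores), reverse=True):
        sens = sum(s >= t for s in pos_scores) / len(pos_scores)
        if sens >= target_sens:
            spec = sum(s < t for s in neg_scores) / len(neg_scores)
            return t, sens, spec
    return None

# Toy CNN malignancy scores for 5 melanomas and 5 nevi.
melanomas = [0.95, 0.90, 0.85, 0.70, 0.40]
nevi      = [0.60, 0.35, 0.30, 0.20, 0.10]

# Match the dermatologists' level-I mean sensitivity of 86.6%.
t, sens, spec = specificity_at_sensitivity(melanomas, nevi, 0.866)
```

With these toy scores, hitting the target sensitivity forces the threshold down to 0.40, at which point 4 of the 5 nevi are still correctly cleared; the trade-off between the two error rates is exactly what the ROC curve traces out.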

Under more realistic conditions, in which dermatologists received both images and clinical information, their performance improved, but the CNN still performed better.

Using the dermatologists' mean study level-II sensitivity of 88.9%, "the CNN specificity was 82.5% which was significantly higher than the dermatologists' mean specificity of 75.7%," the investigators report (P < .01).

This was also true for the CNN's ROC AUC of 0.86, which was greater than the dermatologists' mean ROC AUC of 0.82 (P < .01), they observe.

"The results of our study demonstrate that an adequately trained deep learning CNN is capable of a highly accurate diagnostic classification of dermoscopic images of melanocytic origin," the investigators write.

A CNN algorithm may be a suitable tool to aid physicians in melanoma detection irrespective of their individual level of experience and training Dr Holger Haenssle and colleagues

"Our data clearly show that a CNN algorithm may be a suitable tool to aid physicians in melanoma detection irrespective of their individual level of experience and training," they write.

Editorial Comment

In a related editorial, Victoria Mar, MD, Monash University, Melbourne, Australia, and Peter Soyer, MD, University of Queensland, Brisbane, Australia, comment on the potential of AI in diagnostics.

They highlight a recent landmark study in which it was again shown that AI is capable of classifying skin cancer with a level of competence comparable to that of dermatologists.

The use of AI "promises a more standardised level of diagnostic accuracy, such that all people, regardless of where they live or which doctors they see, will be able to access reliable diagnostic assessment," they write.

The benefits of improved diagnostic outcomes are quite sizeable, the editorialists point out. Those benefits include fewer unnecessary procedures, less morbidity for patients, and lower cost to the healthcare system.

Key issues that need to be resolved before physicians can transfer the benefits of AI into clinical practice include the identification of those patients who should be targeted for surveillance and those patients who may require lifelong surveillance.

Ways of using AI to enable patients themselves to participate in surveillance have yet to be developed but could include adding AI to smartphone applications, the editorialists suggest.

A few unanswered questions remain, the editorialists point out. For example, it is not yet known how AI will fare in diagnosing atypical melanomas, which can be difficult to image.

In addition, how AI might assist in a full skin examination, which is a critical component of skin cancer detection, is not clear. A full skin examination relies on both visual and tactile senses.

These and other barriers notwithstanding, Mar and Soyer foresee AI being integrated into routine clinical practice. They see it becoming a tool to help physicians reach appropriate management decisions.

"There is some concern that the use of AI will lead to de-skilling of the workforce," the editorialists acknowledge.

"However, much of the skill comes with knowing what to image, how to interpret the results and what the most appropriate next step is," they point out.

"Provided the most concerning lesions are selected for imaging, images capture the diagnostic features within a lesion (eg. vascular pattern), and the diagnostic algorithm can interpret them correctly, AI will no doubt be an excellent support tool," Mar and Soyer conclude.

The study received no funding. Dr Haenssle has received honoraria, travel expenses, or both from companies involved in the development of devices for skin cancer screening. Those companies include Scibase AB, FotoFinder Systems GmbH, Heine Optotechnik GmbH, and Magnosco GmbH. Dr Soyer is a shareholder and consultant for MoleMap Pty Ltd and is an e-dermatology consultant for GmbH. Dr Mar has disclosed no relevant financial relationships.

Ann Oncol. Published online May 29, 2018. Full text, Editorial


