Smartphone Apps for Suspicious Skin Lesions Unreliable

Liam Davenport

February 10, 2020


Smartphone applications (apps) using so-called artificial intelligence (AI) aimed at the general public for use on suspicious skin lesions are unreliable, say UK researchers reporting a systematic review.

These apps are providing information that could lead to "potentially life-or-death decisions," commented co-lead author Hywel C. Williams, MD, PhD, from the Centre of Evidence Based Dermatology, University of Nottingham, England.

"The one thing you mustn't do in a situation where early diagnosis can make a difference between life and death is you mustn't miss the melanoma," he said in an interview with Medscape Medical News.

"These apps were missing melanomas and that's very worrisome," he commented.

The review included nine studies of skin cancer smartphone apps, including two apps, SkinScan and SkinVision, that have been given Conformité Européenne (CE) marks, allowing them to be marketed across Europe. These apps are also available in Australia and New Zealand, but not in the United States.

The review found that SkinScan was not able to identify any melanomas in the one study that assessed this app, while SkinVision had a relatively low sensitivity and specificity, with 12% of cancerous or precancerous lesions missed and 21% of benign lesions wrongly identified as cancerous.

This means that among 1000 people with a melanoma prevalence of 3%, 4 of 30 melanomas would be missed, and 200 people would be incorrectly told that a mole was of high concern, the authors estimate.
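The arithmetic behind this estimate can be sketched in a few lines. The sensitivity (≈88%) and specificity (≈79%) are inferred from the 12% and 21% figures above; the function name and structure are illustrative, not from the review itself:

```python
def screening_outcomes(n, prevalence, sensitivity, specificity):
    """Expected outcomes when n people self-screen a lesion with an app."""
    diseased = n * prevalence                    # people with melanoma
    healthy = n - diseased                       # people with benign lesions
    missed = diseased * (1 - sensitivity)        # false negatives (melanomas missed)
    false_alarms = healthy * (1 - specificity)   # false positives (benign flagged)
    return diseased, missed, false_alarms

# Figures inferred from the review: 12% of cancers missed -> sensitivity 0.88;
# 21% of benign lesions wrongly flagged -> specificity 0.79.
diseased, missed, false_alarms = screening_outcomes(1000, 0.03, 0.88, 0.79)
print(round(diseased), round(missed), round(false_alarms))  # 30 melanomas, ~4 missed, ~204 false alarms
```

Rounding the 203.7 expected false positives gives the "about 200" figure cited by the authors.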

The research was published by The BMJ on February 10.

A rapid response in the journal notes that the SkinScan app referred to in the paper was an early version of SkinVision and is not related to the current skinScan app produced by TeleSkin.

"Although I was broad minded on the potential benefit of apps for diagnosing skin cancer, I am now worried given the results of our study and the overall poor quality of studies used to test these apps," Williams commented in a statement.

Co-author Jac Dinnes, PhD, from the Institute of Applied Health Research at the University of Birmingham, England, added it is "really disappointing that there is not better quality evidence available to judge the efficacy of these apps."

"It is vital that healthcare professionals are aware of the current limitations both in the technologies and in their evaluations," she added.

The results also highlight the limitations of the regulatory system governing smartphone apps, as they are currently not subject to assessment by bodies such as the UK's Medicines and Healthcare Products Regulatory Agency (MHRA), the authors comment.

"Regulators need to become alert to the potential harm that poorly performing algorithm-based diagnostic or risk monitoring apps create," said co-lead author Jonathan J. Deeks, PhD, also at the Institute of Applied Health Research.

"We rely on the CE mark as a sign of quality, but the current CE mark assessment processes are not fit for protecting the public against the risks that these apps present."

Speaking with Medscape Medical News, Williams lamented the poor quality of the research that had been conducted. "These studies were not good enough," he said, adding that "there's no excuse for really poor study design and poor reporting."

He would like to see the regulations tightened around AI apps purporting to inform decision-making for the general public, suggesting that these devices should be assessed by the MHRA. "I really do think a CE mark is not enough," he said.

The team notes that the skin cancer apps "all include disclaimers that the results should only be used as a guide and cannot replace healthcare advice," through which the manufacturers "attempt to evade any responsibility for negative outcomes experienced by users."

Nevertheless, the "poor and variable performance" of the apps revealed by their review indicates that they "have not yet shown sufficient promise to recommend their use," they conclude.

The "official approval" implied by a CE mark "will give consumers the impression that the apps have been assessed as effective and safe," write Ben Goldacre, DataLab director, Nuffield Department of Primary Care, University of Oxford, England, and colleagues in an accompanying editorial.

"The implicit assumption is that apps are similarly low risk technology" to devices such as sticking plasters and reading glasses, they comment.

"But shortcomings in diagnostic apps can have serious implications," they warn. The "risks include psychological harm from health anxiety or 'cyberchondria,' and physical harm from misdiagnosis or overdiagnosis; for clinicians there is a risk of increased workload, and changes to ethical or legal responsibilities around triage, referral, diagnosis, and treatment." There is also potential for "inappropriate resource use, and even loss of credibility for digital technology in general."

TeleSkin, the manufacturer of the skinScan app (which was not included in the review) says that news reports about this paper have mistakenly assumed that it was this app that was involved and have done "severe damage to us, our application and our plans."  

Commenting more generally about the whole field, Zeljko Ratkaj, CEO of TeleSkin ApS in Denmark, told Medscape Medical News: "I totally agree that this area, mobile application usage in healthcare, needs to be regulated more strictly. It also needs to be very clear what the extent of the app usage is, and what the applications are and are not."

"However, the process of certification is really hard and, for young startups, can be an impossible obstacle to cross, due to the funding limits (process costs), time and effort that need to be put in. Also, some companies are probably not aware that they actually need to be certified. So, while I totally support the efforts to strengthen the regulations (it is human lives that we are talking about, after all), I would also like to see efforts from the medical institutions to help and support the companies in this. One of those efforts should be independent medical studies, for example. Communication between healthcare professionals and people who develop the healthcare solutions (including mobile solutions) is the only way forward," he added.

The manufacturer of SkinVision told Medscape Medical News that research on their device has been published since this review was conducted. "The latest research, not included in the authors' overall assessment, proves that our algorithm can detect 95% of cases of skin cancer. In comparison, the sensitivity of general practitioners ranges from 61% to 66%, while the sensitivity of dermatologists is between 75% and 92%," the company said in an email. "SkinVision has assisted in finding over 40,000 cases of skin cancer already. These objective facts prove that the clinical benefits of the service outweigh the risks."

Details of the Review

For their review, the authors searched the Cochrane Central Register of Controlled Trials, the MEDLINE, Embase, Cumulative Index to Nursing and Allied Health Literature, Conference Proceedings Citation Index, Zetoc, and Science Citation Index databases, and online trial registers for studies published between August 2016 and April 2019.

From 80 studies identified, nine met the eligibility criteria.

Of those, six studies, evaluating a total of 725 skin lesions, determined the accuracy of smartphone apps in risk stratifying suspicious skin lesions by comparing them against a histopathological reference standard diagnosis or expert follow-up.

Five of these studies aimed to detect only melanoma, while one sought to differentiate between malignant or premalignant lesions (including melanoma, basal cell carcinoma, and squamous cell carcinoma) and benign lesions.

The three remaining studies, which evaluated 407 lesions in all, compared smartphone app recommendations against a reference standard of expert recommendations for further investigation or intervention.

The researchers found the studies had a string of potential biases and limitations.

For example, only four studies recruited a consecutive sample of study participants and lesions, and only two included lesions selected by study participants, whereas five studies used lesions that had been selected by a clinician.

Three studies reported that it took five to 10 attempts to obtain an adequate image. In seven studies, it was the researchers and not the patients who used the app to photograph the lesions, and two studies used images obtained from dermatology databases.

This "raised concerns that the results of the studies were unlikely to be representative of real life use," the authors comment.

In addition, the exclusion of unevaluable images "might have systematically inflated the diagnostic performance of the tested apps," they add.

The independent research was supported by the National Institute for Health Research (NIHR) Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham and is an update of one of a collection of reviews funded by the NIHR through its Cochrane Systematic Review Programme Grant.

BMJ. 2020;368:m127, m428.

