Physicians Outperform Algorithms in Diagnostic Accuracy

Tara Haelle

October 11, 2016

In a competition of diagnostic accuracy between humans and machines, physicians handily beat algorithms designed to diagnose conditions based on vignettes of patient history. Specifically, physicians accurately identified the diagnosis in clinical vignettes of patient health histories about twice as often as computer-based symptom checkers did, according to a research letter published online October 10 in JAMA Internal Medicine.

"I think doctors would feel comfortable knowing that they're not going to lose their job to a computer," said Steve Kassakian, MD, an assistant professor of medicine/informatics and medical director of Clinical Informatics at Oregon Health & Science University, Portland, about the study's findings.

"But we've known that for decades," added Dr Kassakian, who was not involved in the study.

"The truth is that differential diagnosis generators and the idea of computer-aided diagnostics has actually been around for several decades," he explained, and the new findings are in line with what prior studies have found.

Research dating back to the 1970s and 1980s determined that creating diagnostic algorithms is tremendously complicated, Dr Kassakian said. Even so, those algorithms do not perform better than physicians and do not provide much utility overall, a recent systematic review also found.

In the current study, Hannah L. Semigran, BA, from Harvard Medical School in Boston, Massachusetts, and colleagues measured the diagnostic accuracy of 23 online or app-based symptom checkers, using 45 clinical vignettes. The vignettes covered 19 uncommon conditions and 26 common conditions, and each included the patient's medical history but not test findings or physical examination notes. One third of the vignettes were high-acuity, one third were medium-acuity, and one third were low-acuity.

The researchers provided these vignettes to 234 physicians participating in an online platform called Human Dx. Just over half (52%) of the participants were fellows or residents, and 90% of them specialized in internal medicine. Each physician solved at least one vignette by writing an open-text diagnosis, and each vignette was assessed by at least 20 physicians.

Physicians' diagnostic accuracy was more than double that of the symptom checkers: physicians listed the correct diagnosis first 72.1% of the time, compared with just 34.0% for the symptom checkers. About half the time (51.2%), symptom checkers included the vignette condition among the top three diagnoses they listed, but physicians outperformed the algorithms there as well, listing the correct diagnosis among their top three 84.3% of the time.
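The two measures reported here, correct diagnosis listed first and correct diagnosis among the top three, are what is commonly called top-k accuracy, with k = 1 and k = 3. The Python sketch below shows how such a figure could be computed. The function name `top_k_accuracy` and the four toy cases are hypothetical and are not data from the study, and exact string matching is an assumption made only to keep the example short.

```python
def top_k_accuracy(ranked_lists, correct, k):
    """Return the fraction of cases whose correct diagnosis is among the first k listed."""
    if len(ranked_lists) != len(correct):
        raise ValueError("need exactly one correct diagnosis per ranked list")
    if not ranked_lists:
        raise ValueError("need at least one case")
    # A case counts as a hit when the correct diagnosis appears in the first k entries.
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, correct) if truth in ranked[:k]
    )
    return hits / len(ranked_lists)


# Hypothetical toy data (NOT from the study): each inner list is one
# respondent's ranked differential diagnosis for one vignette.
responses = [
    ["appendicitis", "gastroenteritis", "kidney stone"],
    ["migraine", "tension headache", "sinusitis"],
    ["bronchitis", "pneumonia", "asthma"],
    ["gastroenteritis", "food poisoning", "appendicitis"],
]
truths = ["appendicitis", "tension headache", "pneumonia", "peptic ulcer"]

print(top_k_accuracy(responses, truths, 1))  # correct diagnosis listed first
print(top_k_accuracy(responses, truths, 3))  # correct diagnosis in the top three
```

In this toy set, only the first case has the correct diagnosis listed first, while three of the four cases include it somewhere in the top three, so the top-three figure is necessarily at least as high as the listed-first figure.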

Physicians "were more likely to list the correct diagnosis first for high-acuity vignettes (vs low-acuity vignettes) and for uncommon vignettes (vs common vignettes)," the authors report. "In contrast, symptom checkers were more likely to list the correct diagnosis first for low-acuity vignettes and common vignettes."

Physicians still provided an incorrect diagnosis in approximately 15% of cases, which is consistent with the 10% to 15% rate of diagnostic error identified in prior research.

"We don't have a good way to get a true diagnosis out of electronic medical records," Dr Kassakian explained, pointing out that patient history in electronic medical records are simply a collection of International Classification of Diseases and Related Health Problems (ICD) codes. "The ICD codes are notoriously inaccurate for capturing a patient's clinical diagnosis, so it's fraught with problems that a lot of people have already defined."

The study did not use lists of ICD codes, however, so the vignettes may not accurately represent the information clinicians would have if they relied only on electronic medical records. The vignettes also lacked physical examination and test results, a major limitation of the study. In addition, the findings may not be generalizable, because the physicians who participated may not be representative of the broader medical community, and computer diagnostic tools other than symptom checkers were not assessed.

Dr Kassakian also noted that challenging diagnostic dilemmas make up a very small proportion of most physicians' day-to-day work, another reason algorithms are not particularly useful or applicable in daily practice.

"Much of medical care is not concerned with difficult diagnosis," Dr Kassakian said. "Making difficult diagnostic decisions is not a big part of most clinicians' practice, because most of it is chronic disease management, follow-up, or preventative care."

One author holds equity in the Human Diagnosis Project. No information regarding external funding was provided. Dr Kassakian has disclosed no relevant financial relationships.

JAMA Intern Med. Published online October 10, 2016.
