Symptom Checkers Not Very Accurate, Study Suggests

Ken Terry

July 21, 2015

The online and mobile symptom checkers many people use to diagnose their ailments and to decide whether to seek medical help are not reliable in either respect, according to a new study published online July 8 in the BMJ. However, their ability to identify emergent symptoms is pretty good, the study found.

Moreover, as the average performance of the symptom checkers is roughly comparable to that of nurse triage lines, and they cost very little to operate, "symptom checkers could potentially be a more cost effective way of providing triage advice than nurse-staffed phone lines," the researchers said.

The study looked at 23 English-language symptom checkers based in the United Kingdom, the United States, the Netherlands, and Poland. (Editor's note: WebMD's symptom checker was one of those evaluated in this study. Medscape and WebMD are both owned by WebMD LLC.) To measure the applications' performance, the researchers chose 45 standardized clinical vignettes from those used to test physicians on their diagnostic abilities and management decisions. Fifteen of the vignettes were for conditions requiring emergent care, 15 were for nonemergent care, and 15 were for self-care. The vignettes included symptoms of both common and uncommon conditions.

Study participants with no clinical training entered the symptoms from the vignettes into the symptom checkers. Performance was assessed on 770 standardized patient evaluations for diagnosis and 532 standardized patient evaluations for triage.

Of the studied symptom checkers, 11 provided both diagnoses and triage advice, eight provided only diagnoses, and four just supplied triage advice. Thirteen of the applications asked users about their age and sex; they performed no better than the ones that did not request demographic data.

The correct diagnosis was listed first in 34% (95% confidence interval [CI], 31% - 37%) of all evaluations, including 24% (95% CI, 19% - 30%) of emergent evaluations, 38% (95% CI, 32% - 34%) of nonemergent evaluations, and 40% (95% CI, 34% - 47%) of self-care evaluations. The correct diagnosis was listed in the first three choices in 51% (95% CI, 47% - 54%) of evaluations and in the first 20 diagnoses in 58% (95% CI, 55% - 62%) of the cases. Diagnostic accuracy was higher for self-care conditions than for emergent conditions and was also higher for common conditions than for uncommon ones.
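
For readers who want to see where an interval like "34% (95% CI, 31% - 37%)" can come from, the following is a minimal sketch, not the study's own method. It assumes a simple normal-approximation (Wald) interval and assumes the 34% figure applies to all 770 diagnostic evaluations; the function name `wald_ci` is hypothetical, and the researchers may have used a different technique.

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: independent evaluations; z=1.96 gives roughly 95% coverage.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se

# Reported figures: correct diagnosis listed first in 34% of 770 evaluations.
low, high = wald_ci(0.34, 770)
print(f"{low:.1%} to {high:.1%}")
```

Under these assumptions the bounds round to 31% and 37%, in line with the interval reported for all evaluations. The subgroup intervals are wider because each rests on fewer evaluations.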

The researchers were unsure how many people pay attention to the top 20 diagnoses, but most consumers probably go beyond the first diagnosis on the list, said lead author Ateev Mehrotra, MD, an associate professor at Harvard Medical School, Boston, Massachusetts, in an interview with Medscape Medical News.

The symptom checkers' diagnostic accuracy was higher for common than for uncommon conditions, he said, because they are designed to serve the general population, which is more likely to have common conditions.

Appropriate triage advice was given in 57% (95% CI, 52% - 61%) of all evaluations. Here, performance was much better for emergent care (80%; 95% CI, 75% - 86%) than for nonemergent care (55%; 95% CI, 47% - 63%) or self-care (33%; 95% CI, 26% - 40%). The rate of appropriate triage advice was higher for uncommon conditions (63%; 95% CI, 26% - 40%) than for common ones (52%; 95% CI, 46% - 57%).

Four symptom checkers always advised people to seek care and never recommended self-care. Excluding those, appropriate triage advice was given in 61% (95% CI, 56% - 66%) of cases.

In two thirds of the cases in which self-care was appropriate, patients were advised to seek medical care, a finding that concerned the researchers. This conservative tendency of symptom checkers could push up medical costs and lead to unnecessary care, they note. However, the same is often true of telephone triage, they add.

It is not surprising that the symptom checkers tended to perform as well as nurse triage lines on treatment advice, Dr Mehrotra pointed out, "because many of the nurse triage lines are based on the same logic rules that the symptom checkers use."

There were some sharp differences in the performance of individual symptom checkers, the study found. Applications developed by physician groups and medical associations generally did better than those built by private companies or health plans. However, Dr Mehrotra observed that none of them performed very well.

The authors urge the builders of symptom checkers to improve their applications in several areas. Among other things, they said, the developers should ask for demographic data and should incorporate local epidemiologic data about the incidence of particular illnesses. In addition, the inclusion of clinical data from electronic health records or claims data could help improve correct diagnosis and triage rates, they note.

Sheldon Greenfield, MD, a professor of medicine and executive codirector of the Health Policy Research Institute at the University of California at Irvine, was not surprised the symptom checkers did so well in identifying emergent symptoms and advising people to seek emergency care.

"It does work in the most serious things, but for everything else, it doesn't work, because unvarnished symptoms are pretty much worthless, except in emergency situations," he told Medscape Medical News.

Dr Greenfield, an expert on health coaching and the use of patient-generated data, noted that a nurse or a physician would ask additional questions to find out whether the patient's symptoms pointed to a serious medical problem. For example, they would ask how severe a headache was and whether it was one-sided, or the amount of shortness of breath or diarrhea a patient had.

"The symptom checker can skim off the most emergent cases, and this study proves that," he said. "What we need is some combination of nurse triage and a symptom checker that deals with the most emergent things: If you're coughing up blood, the symptom checker will tell you what to do."

Reminded that not all patients have access to nurse triage lines, Dr Greenfield conceded, "It's ok to use a symptom checker alone, as long as the symptoms are put in context. You may need two or three symptoms, or the algorithm must be rich enough to deal with it. A checklist is only a checklist, not an algorithm.

"You have to ask a few more questions. A triage nurse can do that, and maybe an algorithm can do that. We have to find ways to avoid the doctor visit, and certainly the [emergency department] visit."

The authors and Dr Greenfield have disclosed no relevant financial relationships.

BMJ. Published online July 8, 2015.

