Errors Common in Notes Produced by Speech Recognition Software

Diana Swift

July 06, 2018

Although computerized speech recognition (SR) may lighten physicians' documentation load, a new study published online July 6 in JAMA Network Open calls into question the accuracy of this time-saving software. The data reveal an error rate of more than seven words per 100 in unedited SR-generated documents, and about one in every 250 words contained a clinically significant error that could affect care.

"With the rapid adoption of SR in clinical settings, there is a need for automated methods based on natural language processing for identifying and correcting errors in SR-generated text," write Li Zhou, MD, PhD, an internist and researcher in medical informatics at Brigham and Women's Hospital and Harvard Medical School in Boston, Massachusetts, and colleagues.

The new findings highlight the need for old-fashioned manual review and editing of SR-generated clinical notes. "While this technology is very useful for our clinical productivity and work flow, it's not perfect, and as we enter dictation into the system, we need to be careful about errors and to carefully proofread our notes," coauthor Foster R. Goss, DO, MMSc, an emergency physician and clinical informaticist at the University of Colorado, Denver, told Medscape Medical News.

To estimate error rate and types of errors, Zhou and colleagues analyzed 217 clinical documents dictated during 2016 at two centers, Partners Healthcare System in Boston and the University of Colorado Health System in Aurora. Of these, 96.3% contained errors at the SR stage.

The overall rate of errors in SR-generated text was 7.4%. After editing of SR texts by professional medical transcriptionists, the error rate dropped to 0.4% and then to 0.3% when corrected and signed by dictating physicians.

Both study centers used back-end dictation, in which an SR system captures audio dictation and converts it to text. The text is then edited by a professional transcriptionist and is sent back to the clinician for review and signed approval.

Another system, known as front-end dictation, skips the transcriptionist stage. The SR-generated text is directly edited and approved by the dictating clinician. Some physicians edit in real time as the dictated words appear, while others do it later. "The back-end system is considered far more accurate because the transcriptionist edits before the notes get entered into the patient's electronic medical record," Goss said. It is also difficult for a physician to concentrate on editing while dictating and perhaps dealing with a patient as well, he added.

A 2017 Australian study found that a front-end system took up to 18% more physician time and had more than four times the errors compared with keyboarded notes.

In the current study, the physician notes that were analyzed contained a mean of 507 words dictated in an average time of 5 minutes, 46 seconds. The average delay to physician review was about 4.5 days. The notes were dictated by 144 different physicians, 44 female (30.6%) and 10 of unknown sex (6.9%). Their mean age was 52 years. Among the 121 physicians for whom specialty information was available, 35 specialties were represented; the specialists included 45 surgeons (37.2%), 30 internists (24.8%), and 46 others (38.0%).

Overall, 96.3% of the unedited SR-generated notes contained errors, as did 58.1% of transcriptionist-edited notes and 42.4% of physician-signed notes. The rate of mistakes involving clinical information was highest in the transcriptionist-edited stage. Specifically, across the three sequential processing stages, 15.8%, 26.9%, and 25.9% of mistakes involved clinical information; of these, 5.7%, 8.9%, and 6.4%, respectively, were considered clinically significant.

The most common SR mistakes were deletions/omissions (34.7%), followed by insertions (27%). Other errors involved incorrect, added, or omitted prefixes, such as incorrectly converting "inadequate" to "adequate." Sometimes erroneous homophones showed up in the text, for example, "cereal" for "serial." Other errors occurred in numbers, with "17-year-old" becoming "70-year-old."
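To make these error categories concrete, the short Python sketch below aligns a dictated phrase with its SR output word by word and counts deletions, insertions, and substitutions. It is an illustration only and not the method used in the study, which this article does not describe in detail. The function name `classify_word_errors` is hypothetical, and the alignment uses the standard library's `difflib.SequenceMatcher`, which approximates rather than guarantees a minimum-edit alignment.

```python
from difflib import SequenceMatcher


def classify_word_errors(dictated: str, recognized: str) -> dict:
    """Count word-level deletions, insertions, and substitutions.

    Hypothetical helper for illustration; not the study's annotation method.
    `dictated` is what the physician said, `recognized` is the SR output.
    """
    ref = dictated.lower().split()
    hyp = recognized.lower().split()
    counts = {"deletions": 0, "insertions": 0, "substitutions": 0}

    # Align the two word lists; autojunk is disabled so common words are kept.
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_ref, n_hyp = i2 - i1, j2 - j1
        if tag == "delete":        # words dictated but missing from the text
            counts["deletions"] += n_ref
        elif tag == "insert":      # words in the text that were never dictated
            counts["insertions"] += n_hyp
        elif tag == "replace":     # words swapped for other words
            counts["substitutions"] += min(n_ref, n_hyp)
            if n_ref > n_hyp:
                counts["deletions"] += n_ref - n_hyp
            else:
                counts["insertions"] += n_hyp - n_ref

    total = counts["deletions"] + counts["insertions"] + counts["substitutions"]
    # Errors per dictated word; can exceed 1.0 when many words are inserted.
    counts["error_rate"] = total / len(ref) if ref else 0.0
    return counts


if __name__ == "__main__":
    # Phrases taken from the examples reported in the article.
    print(classify_word_errors(
        "inadequate evaluation to exclude neoplasia",
        "adequate evaluation to exclude neoplasia",
    ))
    print(classify_word_errors("lamotrigine therapy", "layman will try therapy"))
```

A count like this treats every word equally. It cannot tell whether an error is clinically significant, which in the examples here depends on what the changed word means for the patient.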

More worrisome, the authors note, were errors relating to medical conditions, potential biopsies, and pharmaceutical therapy. In one case, SR-generated notes read "dengue" where the original audio had "DKA" (diabetic ketoacidosis). In another, "lamotrigine therapy" emerged in text as "layman will try therapy."

In another instance, the immunosuppressive drug Rapamune (PF Prism) was mistakenly converted to the calcium channel blocker verapamil. And in a listing of four patient allergies, only one made it into the SR text. A groin mass became a grown mass. Oxycodone was mistakenly prescribed every 4 hours when the original dictation had specified as needed (PRN).

In one case, the SR-generated note completely flipped the clinical meaning, stating, "adequate evaluation to exclude neoplasia," whereas the physician had dictated, "inadequate evaluation to exclude neoplasia."

By document type, the highest mean SR error rate occurred in discharge summaries, in comparison with office, operative, and other notes: 8.9% vs 6.6% (95% confidence interval [CI], 1.0 - 3.6; P <.001). The error rate was lower for operative notes than for other types of notes, at 6.1% vs 7.9% (95% CI, 0.4 - 3.2; P = .01). By specialty, surgeons had lower mean error rates than other clinicians (6.0% vs 8.1%; 95% CI, 0.8 - 3.5; P = .002). No differences in accuracy emerged with respect to the sexes of the physicians.

At one of the two centers, the mean SR error rate was slightly higher, at 7.6% vs 6.6% (95% CI, -0.2 to 2.8; P = .10), but the error rates for both medical transcription and author sign-off were lower at that center: 0.3% vs 0.7% (95% CI, -0.63% to -0.04%; P = .03) and 0.2% vs 0.6% (95% CI, -0.7% to -0.2%; P = .003).

Because many hospitals adopt front-end systems that require clinicians to review and edit their notes themselves, the authors caution, "[f]ully shifting the editing responsibility from transcriptionists to clinicians may lead to increased documentation errors if clinicians are unable to adequately review their notes." For example, a 2016 emergency department study by Goss's group reported higher error rates with a front-end system, with 71% of notes containing at least one error; 14.8% of the errors were judged to be clinically significant.

In addition, a 2011 study revealed that in the radiologic setting, SR-generated breast imaging notes had eight times as many errors as conventionally dictated notes.

"Taken together, these findings demonstrate the necessity of further studies investigating clinicians' use of and satisfaction with SR technology, its ability to integrate with clinicians' existing workflows, and its effect on documentation quality and efficiency compared with other documentation methods," Zhou and colleagues write.

Zhou's center is currently surveying physicians to find out what kind of SR training would serve them better. "And in training sessions, we're emphasizing that they need to edit and carefully proofread before signing off on their notes," Zhou told Medscape Medical News. In addition, a spell check system over and above the embedded SR dictionary is being developed to provide an extra layer of accuracy for notes.

Despite potential errors, Goss said SR systems are still a valuable tool. "SR-generated notes are still a big advantage, since doctors can talk a lot faster than they can type."

The study was funded by the Agency for Healthcare Research and Quality (AHRQ). The AHRQ had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. One coauthor is a coinventor on a patent licensed to the Medicalis Corporation and holds a minority equity position in the privately held company Medicalis. He also serves on the board for SEA Medical Systems, consults for EarlySense, receives cash compensation from CDI-Negev, and receives equity from Valera Health, Intensix, and MDClone.

JAMA Network Open. Published online July 6, 2018.

