Published DSM-5 Field Trial Results Prompt Renewed Criticism

Megan Brooks

November 14, 2012

Publication of the final results of the field trials for the upcoming Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) has prompted renewed criticism from one of its most vocal critics.

Preliminary results, which were first unveiled last May at the American Psychiatric Association's (APA's) 2012 Annual Meeting and were reported by Medscape Medical News at that time, were published online October 30 in 3 separate articles in the American Journal of Psychiatry.

The DSM-5 field trials, which got underway 2 years ago this month, were designed to subject proposed diagnostic criteria for the future DSM-5 to "rigorous, empirically sound evaluation across diverse clinical settings," David J. Kupfer, MD, chair of the DSM-5 Task Force, and Helena C. Kraemer, PhD, DSM-5 Task Force member and the chief methodologist of the field trials, noted in a joint statement sent to Medscape Medical News.

"And now, as the first comprehensive analyses of that effort are published, what's clear is just how well the field trials did their job," they added.

But Allen Frances, MD, former head of the DSM-IV Task Force, who has been one of the DSM-5's greatest detractors, has a different view.

The DSM-5 field trials "asked the wrong question and then answered it with a poorly designed and sloppily conducted study. The right question was what would be the impact of DSM-5 on diagnostic inflation and the consequent risk of overtreatment with medication," Dr. Frances, who is professor emeritus in the Department of Psychiatry at Duke University School of Medicine in Durham, North Carolina, told Medscape Medical News.

"Real-World" Testing

For the DSM-5 field trials, diagnostic interviews using DSM-5 criteria were conducted by 279 clinicians in various disciplines who received training comparable to what would be available to any clinician after publication. Overall, 2246 patients with various diagnoses and levels of comorbidity participated, and 86% completed 2 diagnostic interviews.

The field trials generated a range of reliability coefficients for the categorical diagnoses and dimensional measures.

"The point of testing the preliminary criteria in real-world environments was never to rubberstamp a specific outcome," Dr. Kupfer and Dr. Kraemer noted. "Quite the contrary, we looked to the trials to provide crucial information on which diagnoses and definitions were most effective for and with clinicians — and which missed the mark."

"We selected disorders with high clinical and public health importance, disorders with major possible changes or newly-proposed disorders, and we always expected issues to surface. Indeed, had they not, there would have been legitimate questions as to the quality of the field trials' design and sensitivity," they added.

The field trials tested the criteria for 23 disorders — 15 adult and 8 child/adolescent diagnoses — and asked a "straightforward" question.

"In the hands of regular clinicians, assessing typically symptomatic patients in no different a way than they would during everyday practice, what's the chance that a second, equally expert diagnosis will agree with the first, making a particular diagnosis reliable? A reliability of 1 means that the 2 diagnoses will always agree; a reliability of 0 means that the second is no more likely to agree than it is to disagree," Dr. Kupfer and Dr. Kraemer stated.
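To illustrate what a reliability of 1 or 0 means in practice, here is a minimal sketch of a chance-corrected agreement statistic. It assumes Cohen's kappa for 2 clinicians who each record a yes/no diagnosis for the same patients. This article does not describe the exact statistic the field trials computed, and the function name and sample ratings below are hypothetical.

```python
def cohens_kappa(first, second):
    """Chance-corrected agreement between two raters' yes/no diagnoses.

    Illustrative assumption: plain two-rater Cohen's kappa on binary ratings
    (1 = diagnosis given, 0 = not given). Not the field trials' own code.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(first)

    # Share of patients on whom the two interviews agree.
    observed = sum(a == b for a, b in zip(first, second)) / n

    # Agreement expected by chance, from each rater's own rate of "yes".
    p1 = sum(first) / n
    p2 = sum(second) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for 4 patients.
always_agree = cohens_kappa([1, 0, 1, 0], [1, 0, 1, 0])
chance_level = cohens_kappa([1, 1, 0, 0], [1, 0, 1, 0])
```

In the first hypothetical case the 2 interviews always agree and kappa is 1. In the second they agree on half the patients, which is exactly what chance alone would produce, so kappa is 0.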

Reassuring Findings

Overall, 5 diagnoses were in the "very good" range (kappa = 0.60 - 0.79); 9 were in the "good" range (kappa = 0.40 - 0.59); 6 were in the "questionable" range (kappa = 0.20 - 0.39); and 3 were in the "unacceptable" range (kappa < 0.20). For 8 diagnoses, sample sizes were insufficient to generate precise kappa estimates, the study authors noted.
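The ranges above can be written as a small lookup, sketched here under 2 assumptions: kappa values are rounded to 2 decimal places, and values of 0.80 or higher, which the reported ranges do not name, are returned as a separate label. The function name is hypothetical, and the counts are the ones reported in this article.

```python
def reliability_band(kappa):
    """Map a kappa value to the range labels reported for the field trials.

    Assumptions: kappa is rounded to 2 decimals before comparison, and
    values of 0.80 or more fall outside the ranges named in the article.
    """
    k = round(kappa, 2)
    if k >= 0.80:
        return "above reported ranges"
    if k >= 0.60:
        return "very good"
    if k >= 0.40:
        return "good"
    if k >= 0.20:
        return "questionable"
    return "unacceptable"


# Number of diagnoses reported in each range.
counts = {"very good": 5, "good": 9, "questionable": 6, "unacceptable": 3}

total_diagnoses = sum(counts.values())             # 23
top_two_ranges = counts["very good"] + counts["good"]  # 14
```

The counts sum to the 23 disorders tested, and the 2 top ranges together account for 14 diagnoses.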

Commenting on these findings, Dr. Kupfer and Dr. Kraemer pointed out that 14 diagnoses ranked in the top categories of "good" or "very good" reliability, among them criteria for schizophrenia, attention-deficit/hyperactivity disorder, and posttraumatic stress disorder, as well as for new entries such as somatic symptom disorder and autism spectrum disorder.

The results for the latter, they added, were "gratifying given the concerns of advocates and parents that many children could be adversely affected, and we hope they now feel reassured."

The 3 diagnoses that fell into the "unacceptable" reliability category have since undergone substantial revision or are no longer proposed for inclusion. "That leaves 6 additional diagnoses, which finished with acceptable but low reliability.

"Several already have been revised or, in the case of attenuated psychosis syndrome, recommended to be moved to the section of the manual that stipulates further study is needed," Dr. Kupfer and Dr. Kraemer said.

Results "All Over the Map"

In comments to Medscape Medical News, Dr. Frances noted that "good kappa has historically been over 0.6; acceptable was over 0.5. The DSM-5 results came in remarkably low and all over the map."

"Even an extremely well-established diagnosis like major depressive disorder, that has achieved good kappa in hundreds of studies, came in at only 0.3 in the DSM-5 field trials. The only possible explanation is poor administration of the study," Dr. Frances said.

Dr. Kupfer and Dr. Kraemer acknowledge that some DSM-5 detractors have spotlighted the 6 diagnoses that fell in the "questionable range" as indicative of flaws in the field trials, especially because this group included major depressive disorder and generalized anxiety disorder, 2 of the most commonly diagnosed conditions.

But they charge that "the opposite is closer to the truth. Rather than discrediting the field trials, the outcome here reveals the critical value of how the trials were constructed and conducted and how we are moving forward. Ironically, both major depressive disorder and generalized anxiety disorder were tested not because they were being modified for the next manual, but because they were remaining relatively unchanged and could serve as reference disorders from the DSM-IV trials."

They also pointed out that as part of the DSM-IV process 2 decades ago, "patients were carefully screened, and participating clinicians received special training and explicit direction on how to perform evaluations. In contrast, the DSM-5 field trials accepted patients as they came and asked clinicians to work as they usually did — to mirror the circumstances in which most diagnosing takes place."

Sloppy Work?

Dr. Kupfer and Dr. Kraemer believe that the DSM-5 results represent the "truer picture of the difficulty clinicians may have in reliably diagnosing both conditions, either because they often occur with other conditions or because they are accompanied by symptoms that can fluctuate greatly.

"Regardless of why, we acknowledge that the relatively low reliability of major depressive disorder and generalized anxiety disorder is a concern for clinical decision-making. Strategies need to be developed to address the problem as the manual evolves into a living document that incorporates revisions and additions as research and clinical practices advance.

"The good news is that we're now inherently better prepared for this challenge; the DSM-5 field trials have laid the groundwork for how such strategies and future changes should be judged," they said.

But Dr. Frances is not satisfied.

"The only way to salvage DSM-5 now is to conduct what was planned to be a second stage of quality control field testing. This was canceled because inefficient implementation had led to repeated delays in the first stage," he said.

"Psychiatry is a wonderful and very helpful specialty that deserves better than the sloppy work that has characterized every step of DSM-5. We will have to see how psychiatrists and other mental health workers will react to it," Dr. Frances added.

Am J Psychiatry. Published online October 30, 2012.