"Flaws" Seen in Studies Claiming Added Value to New CV Risk Predictors

December 02, 2009

December 2, 2009 (Chicago, Illinois) — A myth-buster is at it again with another analysis in a high-profile journal aimed at steering clinical research away from fallacy and false hope.

A report in the December 2, 2009, issue of the Journal of the American Medical Association documents what are described as fundamental methodological problems in the design or analysis of dozens of studies that compared a proposed risk marker with the Framingham Risk Score (FRS) in heart-disease risk stratification [1].

The study, from Dr Ioanna Tzoulaki (Imperial College of Medicine, London, UK) and associates, is a new piece in a much larger, long-developing picture. Among the authors is Dr John PA Ioannidis (University of Ioannina, Greece, and Tufts University, Boston, MA), who has expertise in epidemiology and molecular medicine but is known for dozens of papers, similar to the current one, that question the methodologies and interpretations of clinical trials and the ideas they generate. (His most widely read publication may be the 2005 essay "Why most published research findings are false" [2].)

In a message that may reach across many areas of clinical research, the group's findings not only undercut the conclusions of a large number of published studies, they question any further research based on those conclusions. "The majority of these studies claimed that they had found a new risk factor that could add predictive value beyond what the Framingham Risk Score could achieve, but at the same time, most of them had flaws that cast doubt on the validity of their claims," Tzoulaki told heartwire.

Those flaws, she said, included inappropriate use of statistical tools that overestimated the proposed marker's predictive power (while the marker did not add to FRS risk prediction in studies that used the tools appropriately); FRS application to populations with existing coronary heart disease (the FRS was designed for people without manifest CHD); and use in predicting outcomes the FRS wasn't designed for (such as stroke). Inadequate documentation of statistical methods was rampant.

Identification of markers that can sharpen risk stratification based on the FRS is an important goal, "but to do that, we need more rigorous trial design and analysis than we saw in these studies," Tzoulaki said. Risk markers that seemed to add to the FRS in studies her group identified as flawed include carotid plaque thickness, coronary artery calcification, natriuretic peptides, and C-reactive protein (CRP).

The current analysis included all 79 reports from the group's literature scan that compared the predictive value of the FRS for a prespecified outcome with that of some other risk marker. About 80% of the reports concluded that the individual marker added to the predictive value of the FRS.

But according to Tzoulaki et al, many of the reports, and for some problems most of them, showed inadequate calculation and inappropriate application of the FRS (>60% of the studies), focused on a clinical outcome the FRS wasn't designed to predict (>50%) or on a population the FRS wasn't designed for (>40%), or inadequately documented multivariate analyses and analyses of the area under the receiver operating characteristic (ROC) curve.

The study shows in great detail that in their published reports, many researchers aren't documenting their methods appropriately, observed Dr Mark Hlatky (Stanford University, CA), who wasn't involved in the analysis. The problem is well recognized, he told heartwire, and is addressed in an American Heart Association (AHA) statement from earlier this year on the evaluation of new cardiovascular risk markers [3]. Hlatky chaired the expert panel behind the document.

"What they're saying is that in order to evaluate a claim that [a marker] does better, you really need to lay out the evidence in a way that allows people to evaluate that claim on their own," he said. That includes providing details from the statistical model on, for example, its ability to discriminate the clinical end point of interest, its calibration to clinical events in the analysis, and reclassification of patient risk status.

"A lot of these studies [analyzed by Tzoulaki et al] aren’t even reporting some of these measures," Hlatky said. "They're not reporting discrimination area under the curve, they're not reporting the calibration, they're not reporting anything related to reclassification, let alone doing it right. I think our AHA statement said this is what we ought to be doing, and journal editors ought to be looking at this [issue], as well as reviewers when they get a new paper, to see whether it's reporting it completely and really putting [the risk marker] to a fair test."

The authors of such reports, he said, "could and should report this information properly, but there's no uniform way to do it. There's no set of standards for how you're supposed to report an evaluation of a new diagnostic test or risk marker, as compared with, say, a randomized trial. For a clinical trial of a new therapy, it's fairly clear how you're supposed to report it. What we're seeing here is a need for [a reporting standard] for articles that [propose] a new risk marker."
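For readers who want a concrete picture of the three measures Hlatky lists (discrimination, calibration, and reclassification), the following is a minimal sketch in Python. It is not the method used by Tzoulaki et al or by any of the studies they reviewed. The function names, the toy risk values, and the 10% and 20% risk cutoffs are all illustrative assumptions, and a real evaluation would also need confidence intervals and validation in an independent population.

```python
from itertools import product


def auc(risks, events):
    """Discrimination: chance that a random case has a higher predicted
    risk than a random non-case (ties count as one half)."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c, n in product(cases, controls))
    return wins / (len(cases) * len(controls))


def observed_expected_ratio(risks, events):
    """Crude calibration check: observed events divided by the number
    of events the model predicted (the sum of predicted risks)."""
    return sum(events) / sum(risks)


def risk_category(risk, cutoffs=(0.10, 0.20)):
    """Category index: 0 = low, 1 = intermediate, 2 = high.
    The cutoffs are assumed for illustration only."""
    return sum(risk >= c for c in cutoffs)


def net_reclassification(old, new, events, cutoffs=(0.10, 0.20)):
    """Categorical net reclassification improvement: net upward moves
    among cases plus net downward moves among non-cases."""
    up_ev = down_ev = up_non = down_non = 0
    for o, n, e in zip(old, new, events):
        move = risk_category(n, cutoffs) - risk_category(o, cutoffs)
        if e == 1:
            up_ev += move > 0
            down_ev += move < 0
        else:
            up_non += move > 0
            down_non += move < 0
    n_ev = sum(events)
    n_non = len(events) - n_ev
    return (up_ev - down_ev) / n_ev + (down_non - up_non) / n_non


# Hypothetical data: six people, two of whom had an event.
events = [1, 1, 0, 0, 0, 0]
base_risk = [0.25, 0.15, 0.15, 0.08, 0.05, 0.12]    # baseline score alone
marker_risk = [0.30, 0.22, 0.09, 0.07, 0.05, 0.12]  # score plus new marker

print("AUC, baseline:", auc(base_risk, events))
print("AUC, with marker:", auc(marker_risk, events))
print("Observed/expected, baseline:", observed_expected_ratio(base_risk, events))
print("NRI:", net_reclassification(base_risk, marker_risk, events))
```

Reporting all three numbers for both the baseline model and the model with the added marker is what allows a reader to judge a claim of added value independently, which is the point Hlatky makes above.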

An accompanying editorial from Dr Peter WF Wilson (Emory University, Atlanta, GA) is largely a primer on the major statistical "performance criteria" used to evaluate heart-disease risk prediction [4]. "Clinicians should cautiously interpret results that claim importance of new risk factors for initial CHD events. Risk assessment with traditional variables works relatively well at the present time," he writes.

"Looking forward, there is some room for improvement in CHD risk assessment, and studies that include careful consideration of discrimination, calibration, validation, and potentially reclassification will provide the most reliable information."
