ATLAS trial "flaws"? Wall Street Journal alleges stats do not prove noninferiority of Liberté to Taxus Express

Shelley Wood

August 14, 2008

New York, NY - A provocative story in the Wall Street Journal (WSJ) today alleges that the ATLAS trial results, which showed Boston Scientific's Taxus Liberté stent to be noninferior to its first-generation Taxus Express stent, were based on "a flawed statistical equation" that swayed the results in favor of the Liberté stent [1].

"Using a number of other methods of calculation—including 14 available in off-the-shelf software programs—the Liberté study would have been a failure by the common standards of statistical significance in research," reporter Keith J Winstein writes. Massachusetts Institute of Technology mathematician Dr Travis Schedler, asked to confirm the WSJ's analysis, produced results that "agreed" with the paper's, Winstein notes.

The FDA has sent an "approvable" letter to Boston Scientific, suggesting that the agency's review of the data indicates the device can be approved for the US market; according to Winstein, the agency refused to discuss the WSJ's review. Dr Gregory Campbell, director of the FDA's device branch biostatistics division, said Winstein's analysis raised "good questions" but also called Boston Scientific's statistical approach, known as the Wald interval, "a standard methodology."

But according to Winstein, the Wald equation, also used by stent rivals Abbott and Medtronic, has "long been criticized by statisticians for exaggerating the certainty of research results." The WSJ article also quotes statistics professor Dr Lawrence Brown (University of Pennsylvania, Philadelphia) calling the Wald interval "broken" and a "deficient technique" that "should not be used." Similarly, Harvard University statistics professor Dr Brent Coull is quoted by the WSJ saying that the Wald method "overstates the certainty" of clinical results. A third statistician, Dr Jonathan Shuster (University of Florida, Gainesville), told the WSJ that the Wald method is an imperfect method but commonly used. "Most statisticians would accept this approximation," Shuster is quoted in the WSJ. "But since this was right on the border, greater scrutiny reveals that the true, the real, p value was slightly more than 5%."

In defense of ATLAS

When the ATLAS study results were published in April 2007, Dr Mark Turco (Washington Adventist Hospital, Takoma Park, MD) and colleagues reported that the Liberté met the primary end point of noninferiority vs the Express for nine-month target-vessel revascularization: the upper bound of the one-sided 95% CI was 2.98%, below the prespecified noninferiority margin of 3%, with a p value of 0.0487 [2].

But in the WSJ analysis of the data, the p value was actually 0.051 (5.1%), "failing to rule out the possibility that patients getting the Liberté stent will have markedly more artery recloggings than those receiving the Express," Winstein writes. "Although the difference seems small—0.2 of a percentage point—it is the difference between success and failure for a product on which Boston Scientific has spent some tens of millions of dollars."
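For readers who want to see the mechanics, the noninferiority check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the patient counts are hypothetical and are not the ATLAS data, and the Agresti-Caffo adjustment is shown as one well-known alternative to the Wald approximation, not necessarily one of the methods the WSJ used. The test declares noninferiority when the upper bound of the one-sided interval for the difference in event rates falls below the margin, which is equivalent to the p value falling below alpha.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_test(x_new, n_new, x_ref, n_ref, margin, alpha=0.05, pseudo=0):
    """One-sided noninferiority test on a difference in event rates.

    pseudo=0 is the plain Wald approximation; pseudo=1 applies the
    Agresti-Caffo adjustment (one extra event and one extra non-event per arm).
    Returns (rate difference, upper confidence bound, p value).
    Assumes at least one arm has an event rate strictly between 0 and 1.
    """
    size_new = n_new + 2 * pseudo
    size_ref = n_ref + 2 * pseudo
    p_new = (x_new + pseudo) / size_new
    p_ref = (x_ref + pseudo) / size_ref
    diff = p_new - p_ref
    # Unpooled standard error of the difference in proportions.
    se = sqrt(p_new * (1 - p_new) / size_new + p_ref * (1 - p_ref) / size_ref)
    normal = NormalDist()
    # Upper bound of the one-sided (1 - alpha) confidence interval.
    upper = diff + normal.inv_cdf(1 - alpha) * se
    # p value for H0: true difference >= margin (new device is inferior).
    p_value = normal.cdf((diff - margin) / se)
    return diff, upper, p_value


if __name__ == "__main__":
    # HYPOTHETICAL event counts, chosen for illustration only.
    for label, pseudo in (("Wald", 0), ("Agresti-Caffo", 1)):
        diff, upper, p = noninferiority_test(80, 850, 70, 850, margin=0.03, pseudo=pseudo)
        verdict = "noninferior" if upper < 0.03 else "not shown noninferior"
        print(f"{label}: diff={diff:.4f} upper={upper:.4f} p={p:.4f} ({verdict})")
```

When a result sits this close to the margin, small changes in how the standard error is approximated can move the upper bound from one side of 3% to the other, which is the crux of the dispute.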

Boston Scientific is quoted in the article insisting that its methods were sound. "We used standard methodology that we discussed with the FDA up front and then executed," Dr Donald Baim, the company's chief scientific and medical officer, is quoted in the story. "We have no obligation to show that we can meet any arbitrary test."

But Baim also does not dispute the WSJ analysis and is quoted by Winstein saying, "We're not going to say there's anything mistaken in your mathematical calculations, but the issue is really the relevance of that analytical thread."

Contacted about the WSJ story, Boston Scientific spokesperson Paul Donovan reiterated Baim's assertions that the company had done nothing wrong and pointed out, "The FDA has reviewed the trial results and the analysis and has validated both. While this may be an interesting statistical debate, it has no bearing on the performance of the Taxus Liberté drug-eluting stent."

In an interview with heartwire, Turco called the WSJ story "a big to-do over very little" and rejected any suggestion that the trial may have been designed with a more lenient statistical methodology.

"Even if that was the premise from the standpoint of the company, the fact is that the FDA would need to approve that statistical analysis plan and their independent statisticians would need to say it was okay. The FDA has some excellent statisticians who review these trials in a very rigorous manner, and they approved this from the get-go. Sure, you can make the argument that everybody, including the primary investigator, really wants to [have a trial] that's going to be a winner, but you can't beat the system."

And there are bigger issues at stake, says Turco. "My concerns with this story are twofold: first, that we continue to confuse our patients out there with headlines. And second, the FDA overall does a good job with the regulatory process, and if the FDA starts to feel too much pressure and then really sets the bar too high, it's going to continue to slow the iteration of newer products coming to market for us to use to treat our patients. So then we fall way behind our colleagues outside the US in what we have to offer. And that is a problem."



A p value with p-zazz

The WSJ story will no doubt reinvigorate a perennial debate among clinical researchers and others over what constitutes a meaningful p value.

Turco made the point forcefully that cardiologists care about safety, efficacy, and the amount of worldwide experience with a device, "so that I can feel confident using it in my patients." The Liberté, he points out, is the most popular DES outside the US, and Boston Scientific's Olympia registry, which is tracking postmarketing events with the device in countries where it is already approved, now includes data on more than 15 000 patients.

"There have been no specific safety issues that have come to pass, so are we really going to argue about a p value of 0.049 vs 0.051? These arguments are very unclear to me and really have no clinical relevance to what we need to do in everyday medicine."

Dr Sanjay Kaul (Cedars-Sinai Medical Center, Los Angeles, CA) points out that regulatory agencies "worship at the altar of two trials demonstrating a p<0.05 as requirement for approval." He thinks this approach, in itself, is flawed.

"The 0.05 threshold is arbitrary, nothing magical about it," he told heartwire. "Making binary decisions, yes or no, based on whether the p-value threshold is met is rather superficial. For example, a p=0.048 will meet the criterion, but a p=0.052 won't. The difference between the two scenarios is 0.4 percentage points in probability! This line of thought process exposes the tyranny of p values."

But Dr Brian Choi, assistant moderator on theheart.org's Fellow's Corner forum, asks in a post this morning whether the reanalysis in Winstein's story is even something to get excited about. "Is there any real 'gotcha' here? . . . Who cares if [the ATLAS trial] meets statistical significance with the Wald only? It's commonly used, and every other p value in the WSJ analysis had p=0.05. It is interesting to note that no cardiologists (besides Baim of Boston Scientific) were interviewed for the story, and the expert criticism comes from statisticians, a mathematician, and an internist. I think that most interventionalists would still say, 'Give me Liberté!'"

And on the Wall Street Journal's Health Blog, where readers are responding to Winstein's article, a post by Dr Neil Blumberg takes a similar tone:

"Bottom line, both clinically and mathematically, there is no difference whatever in meaning between a p value of 0.04999 and 0.0500000. It's totally arbitrary, and arguing about it is like arguing about how many angels can dance on the head of a pin. I'm perfectly happy as a scientist and clinician accepting as 'significant' and likely correct a p value of 0.06 or 0.07 as one of 0.04."


A role for the lay press?

The WSJ story has sent ripples through the cardiology community, where physicians are debating the role of a "Wall Street Journal of Medicine" to wade into the world of medical research and regulatory affairs.

"How interesting it is to see a journalist at the WSJ serve as the watchdog, FDA-like function," Dr Eric Topol (Scripps Translational Science Institute, La Jolla, CA) commented to heartwire. "A new precedent, indeed."

"I do think there is a role for the lay press to verify the accuracy of medical research because, for one, they have a large readership that comprises the patients in whom the technology or drugs will be used, and two, they have the resources to serve a watchdog role until there is full transparency of the process at the regulatory level," Dr Sunil Rao (Duke University Medical Center, Durham, NC) commented to heartwire. "However, there are some major caveats: the lay press has to employ independent qualified personnel to review the data, similar to what the WSJ did, like statisticians and clinicians with clinical-trial experience. I don't think it's appropriate to have journalists without this type of background review complex trial designs; half the time even those of us in the profession have trouble understanding the designs."

Kaul, who has been an outspoken critic of studies he feels have been built on shoddy statistical methods, agreed. "If something that's very important is being missed by peer review or allegedly by the FDA review, then I think it was perfectly within the purview of the media to examine this, if it's carefully done and vetted by other external experts. I think this is a very important role for the media to play."

Rao and others also pointed out that Winstein's story illuminates problems with the current regulatory process. "Boston Scientific clearly chose the least burdensome route," Rao said. "Those are the rules of engagement, and Boston followed them. We can't criticize them for that, but we can hopefully use stories like these to convince the clinical community that the devices that get approved in this country don't go through a rigorous process like drugs. If the FDA tells the companies that they have to jump one inch, they ain't going to jump any higher!"

Kaul points to another problem with the clinical-trial process: "People can approach the FDA for guidance, and the FDA can make certain recommendations, but there's no guarantee that the FDA can be held to those, because it's not always the case that the people who agreed to the trial design are the same people who are going to be reviewing the data. This has come up many times."

Kaul also congratulated Winstein for "knocking on the right door," calling the comments on the Wald test "reasonable."

"Just because a methodology is standard does not necessarily make it right!" Kaul stated. "Generally speaking, robust results should be reproducible using different statistical methodology."

Rao disclosed receiving research funding from Cordis. Turco has received consulting fees/honoraria from, has served on the speakers' bureau of, and has received research grants from Boston Scientific.
