Shrink the P Value for Significance, Raise the Bar for Research: A Renewed Call

April 02, 2018

The P value of .05 has once again been questioned as a threshold for statistical significance in medical research, this time in a commentary that offers a way to ease toward more clinically relevant alternatives.

"The problem with P values is that if you take their exact definition, what they convey is not something that any clinician would ever be interested in, with very rare exceptions," according to John PA Ioannidis, MD, DSc, Stanford University, California.

The P value, he told Medscape Cardiology, "is a bit of a ritual that has been embedded across the literature. It's misleading and wrong. We just have to get rid of it."

Indeed, scientists and journals should replace the customary P < .05 threshold for significance with one a tenth its size, Ioannidis argues in a viewpoint published March 22 in JAMA.

The new P = .005 standard would be a temporary fix until the field more consistently adopts and ingrains a more clinically relevant statistical test, or several depending on the type of analysis, he proposes.

That P values are currently "misinterpreted, overtrusted, and misused" means that a research finding within the .05 standard "is wrongly equated with a finding or an outcome (eg, an association or a treatment effect) being true, valid, and worth acting on," Ioannidis writes.

"These misconceptions affect researchers, journals, readers, and users of research articles, and even media and the public who consume scientific information. Most claims supported with P values slightly below .05 are probably false (ie, the claimed associations and treatment effects do not exist). Even among those claims that are true, few are worth acting on in medicine and health care," according to Ioannidis.

"Drowning in a Flood of Statistical Significance"

P values are often misunderstood as the probability that a finding arose by chance, which is incorrect and, in any case, says nothing about the finding's clinical relevance, Ioannidis notes.

Rather, "a P value is the chance that you will see such extreme results if the null hypothesis is true and if there is no bias." The two 'ifs' are critical, he noted.
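Both conditions in that definition can be made concrete with a quick simulation (an illustrative sketch, not part of the article): when the null hypothesis really is true and there is no bias, P values are uniformly distributed, so roughly 5% of tests run on pure noise still land below .05.

```python
import math
import random

random.seed(42)

def simulate_null_pvalues(n=50, trials=5000):
    """Two-sample z-tests where both groups are pure noise (a true null)."""
    pvals = []
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        # With known unit variance, the difference in means is normal
        # with standard deviation sqrt(2/n) under the null.
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided p-value
    return pvals

pvals = simulate_null_pvalues()
frac_05 = sum(p < .05 for p in pvals) / len(pvals)
frac_005 = sum(p < .005 for p in pvals) / len(pvals)
print(f"'significant' at .05:  {frac_05:.3f}")   # close to 0.05 despite no real effect
print(f"'significant' at .005: {frac_005:.3f}")  # close to 0.005
```

The point is that P < .05 describes how often noise alone crosses the threshold, not whether an effect exists or matters clinically.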

A better metric, one that would serve the needs of clinicians, would reflect whether there is a treatment effect, one large enough to be clinically meaningful. The P value, Ioannidis said, "is very remote from that. It's so remote from it that people are just misled."

More useful are hazard ratios (or relative risks or odds ratios) with confidence intervals that convey effect sizes that can show whether a treatment outcome may be clinically appealing, he said. Those metrics don't simply dichotomize results in terms of significance vs nonsignificance.
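As a small illustration of the kind of reporting Ioannidis favors (the numbers here are invented, not drawn from any study), an odds ratio with a Wald 95% confidence interval conveys both the direction and the plausible size of a treatment effect, rather than a bare significant/nonsignificant verdict:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table:
    events/non-events in the treated group (a, b) vs control (c, d)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log odds ratio
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical trial: 30/200 events on treatment vs 50/200 on control.
or_, lo, hi = odds_ratio_ci(30, 170, 50, 150)
print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Here the whole interval sits below 1, so a reader sees not just "significance" but a plausible range for how large the benefit might be.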

"We're drowning in a flood of statistical significance," Ioannidis said. "So we need to do something quickly to avoid drowning while we work on some better and more lasting solutions, which include largely perhaps abandoning P values for other metrics in about 80% or 90% of the literature where they're not the appropriate tool of inference."

His viewpoint contends that "Moving the P value threshold from .05 to .005 will shift about one-third of the statistically significant results of past biomedical literature to the category of just 'suggestive.' This shift is essential for those who believe (perhaps crudely) in black and white significant or nonsignificant categorizations."

"What Are We Trying to Answer?"

It's not a new idea, and Ioannidis references a number of past critiques and proposals for moving away from traditional P value thresholds, including a recent one calling for a new standard of .005.

Ioannidis himself has a long history of raising flags about the issue and other standards by which study results, and the publications reporting them, are rated. And it has been 13 years since his own manifesto on the subject, titled "Why Most Published Research Findings Are False."

Sanjay Kaul, MD, Cedars-Sinai Medical Center, Los Angeles, who was not involved in the published viewpoint, expressed support for the idea that the .05 threshold for P should be tightened. He noted for Medscape Cardiology that a few journals have insisted on smaller P value thresholds to gauge the strength of results, while others have discouraged the use of P values altogether.

He said he has supported the use of Bayesian analysis, "which overcomes the shortcomings of P values," as at least one alternative.
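One widely cited calibration of how weak the evidence behind P = .05 actually is (not from this article, but consonant with its argument) is the Sellke-Bayarri-Berger bound, -e * p * ln(p), a lower bound on the Bayes factor in favor of the null hypothesis implied by a given P value:

```python
import math

def min_bayes_factor(p):
    """Sellke-Bayarri-Berger lower bound on the Bayes factor for the
    null hypothesis implied by a p-value; valid for p < 1/e."""
    return -math.e * p * math.log(p)

for p in (0.05, 0.005):
    bf = min_bayes_factor(p)
    # Starting from 50:50 prior odds, the null's posterior probability
    # can be no lower than this:
    post_null = bf / (1 + bf)
    print(f"p = {p}: BF >= {bf:.3f}, P(null) >= {post_null:.2f}")
```

Even at P = .05 the null retains at least roughly a 29% posterior probability from even prior odds; at P = .005 that floor drops to about 7%, which is part of the rationale for the stricter default.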

That method, writes Ioannidis, should be broadly applicable to different types of research, as should the metrics that show effect sizes and uncertainty intervals. Regardless of the metrics used, they should be suitable to the type of research.

"We need to think for each study and each question that we are asking: why are we doing it, and what are we trying to answer? And then we can select the metric and the tool that will specifically look at what we want to answer. And this is very rarely a P value," he said in an interview.

For now, widely adopting P < .005 as the standard for significance would be a step in the right direction, probably "for any type of study design," whether randomized trial, meta-analysis, or observational study, he said — although even that level "is probably very lenient" for observational studies.

"For observational results, like associations of diet or lifestyle with cardiovascular outcomes or cancer or stroke, I'd go with thresholds that are much lower, like 10-6. In genetics, people use thresholds like 10-8."

According to Kaul, "Results of observational studies and meta-analyses are most likely going to benefit from implementing a lower P value threshold." Even the US Food and Drug Administration, he said, has endorsed a P value of <.001 for meta-analyses of safety events, and it would be a good idea "if the journal editors followed the lead of the FDA in this regard."

Ioannidis reports being a member of the panel working on the American Statistical Association statement and an author of the article proposing decreasing the threshold of statistical significance. Kaul has reported being a consultant or advisor for Boehringer Ingelheim, Eli Lilly, and Novo Nordisk.

JAMA. Published online March 22, 2018.

Follow Steve Stiles on Twitter: @SteveStiles2.

