P-Value Statistic Comes under Criticism

Clinical medicine is anything but precise. To practice medicine well, clinicians need to recognize the many influences that can turn, for example, a heart that is working well into one that is in failure. Environment, aging, social influences, and the many unknowns surrounding the workings of the mind on the body all shape this transition from health to disease.

In its essence, clinical medicine involves ambiguity, and practitioners are required to manage, and even be comfortable with, a degree of uncertainty when evaluating and treating a patient. And yet, the research on which practitioners rely to provide good information on how to diagnose and treat patients has, for many years, been based on a measurement that has given at least the illusion of certainty. Use of the p-value to signify that a finding is positive, particularly at an arbitrarily set threshold of <0.05, has governed what is considered “significant” and has strongly influenced which data is accepted for publication and how that data is interpreted for clinical decision making.

But all this is changing. Although many clinicians and researchers over the years have questioned our overreliance on the p-value as a measure that provides sufficient information on which to base the practice of good clinical medicine, a concerted effort to supplement the p-value with other measures more aptly applied to clinical medicine is rapidly evolving.

This evolution is changing the way in which research is conducted, the criteria used for publication, and, importantly, the emphasis on clinical relevance versus statistical significance when interpreting data.

Clinical Relevance

“Many disciplines have come to rely on bright-line thresholds (such as p<0.05) as a means of filtering what is scientifically meaningful from what is not,” said Ronald L. Wasserstein, executive director of the American Statistical Association (ASA), based in Alexandria, Va. “Such thresholds are simple to apply and have the appearance of objectivity. The results, unfortunately, are much less objective than they appear,” he added.

In 2016, the ASA released a statement outlining six principles that clarify the proper use and interpretation of the p-value (Am Stat. 2016;70:129–133; see “Proper Use and Interpretation of the P-Value”). These principles highlight the limitations of relying solely on the p-value when interpreting data.

Timothy Smith, MD, MPH, is director of the Oregon Sinus Center, chief of rhinology and sinus-skull base surgery, and director of clinical research in the department of otolaryngology–head and neck surgery at Oregon Health & Science University in Portland, and a member of ENTtoday’s Editorial Advisory Board. For him, the ASA statement reinforces what he thinks many clinicians have sensed for a long time. “I think clinicians generally are a bit wary of p-values and significant findings,” he said.

Despite this wariness, he emphasized that many, if not most, clinicians are trained with only a superficial knowledge of statistical interpretation. “We’re armed with enough knowledge that we know we should be wary of it, but we’re not necessarily sure what other questions we should be asking or how we should be interpreting the data.”

According to Dr. Smith, clinicians need to start asking questions aimed at the clinical relevance of the data rather than only at its statistical significance. These include questions about the effect size, that is, how large the difference is between, for example, two treatment groups, and about the width of the confidence interval around the observed difference. Once physicians become more comfortable asking these questions, he said, they can begin to draw their own conclusions about how to interpret the data.
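As a rough illustration of the two quantities Dr. Smith mentions, the Python sketch below computes an effect size (the difference in means and Cohen's d) and an approximate 95% confidence interval for two groups. The function name `effect_summary`, the symptom scores, and the large-sample normal approximation are assumptions made for this example; they are not drawn from any study discussed here.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def effect_summary(treated, control, level=0.95):
    """Return (difference in means, Cohen's d, CI lower, CI upper)."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    diff = mean(treated) - mean(control)

    # Cohen's d: the difference expressed in pooled standard deviations.
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = diff / pooled_sd

    # Standard error of the difference, allowing unequal variances.
    se = sqrt(s1**2 / n1 + s2**2 / n2)

    # Normal critical value (about 1.96 for a 95% interval); an assumption
    # that is only reasonable for fairly large samples.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, d, diff - z * se, diff + z * se

# Hypothetical symptom-score improvements in two treatment groups.
group_a = [12.0, 15.0, 9.0, 14.0, 11.0, 13.0, 16.0, 10.0]
group_b = [10.0, 11.0, 8.0, 12.0, 9.0, 10.0, 13.0, 9.0]

diff, d, low, high = effect_summary(group_a, group_b)
print(f"difference = {diff:.2f}, Cohen's d = {d:.2f}, "
      f"95% CI = ({low:.2f}, {high:.2f})")
```

A narrow interval that sits well away from zero points to a precisely estimated difference, while a wide interval that includes clinically trivial values means the data are also compatible with little or no benefit. With samples as small as these, a t-based interval would be more appropriate than the normal approximation used here.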

Zachary M. Soler, MD, associate professor in the department of otolaryngology–head and neck surgery at the Medical University of South Carolina (MUSC) in Charleston, reiterated the importance of asking about the effect size and confidence interval when interpreting a statistical finding in a study based on the p-value.

He acknowledged, however, that sometimes clinicians may not be up to speed when it comes to knowing how best to interpret complex statistical methods. In these cases, he encouraged clinicians to look to sources of post-publication peer review or published commentaries by leading experts for insight and interpretation of a study’s findings. In addition, he emphasized that the most important evidence supporting a specific finding is whether it can be replicated in an entirely new study.

Wasserstein also emphasized questions clinicians should begin thinking about to help determine whether a statistically significant finding is clinically relevant or meaningful (see “How to Determine Clinically Relevant Findings”).

Research and Publication

While clinicians will increasingly need to change how they think about interpreting medical data, the real revolution in shifting from an overreliance on the p-value to other ways of demonstrating clinical relevance is and will be at the research level, said Dr. Smith. In part, this is being driven by editorial decisions at medical journals that are just beginning to establish publication policies that don’t prioritize “positive” data based on the p-value. “More and more journal editors are becoming more sophisticated about statistics and demanding different results reporting from their authors,” he said.

One such journal is JAMA Otolaryngology–Head and Neck Surgery. The publication now states in its editorial policies that the p-value is no longer a sufficient measurement for reporting results. An editorial published in 2016 states the journal’s new guidelines on results reporting (JAMA Otolaryngol Head Neck Surg. 2016;142:937–939).

The Laryngoscope and Laryngoscope Investigative Otolaryngology, both publications of the Triological Society, do not take the hard-line stance that JAMA Otolaryngology–Head and Neck Surgery does. Authors can use any statistics they think are appropriate, and evaluating their relevance is part of the peer-review process. “In my opinion, the p-value still has some utility, and should be reported,” said Michael G. Stewart, MD, editor-in-chief of The Laryngoscope. “It is not sufficient, however, for making decisions about clinical significance or meaningful clinical differences in outcomes or treatments, and other statistical measures, confidence intervals, for example, are better for making those assessments.”

Although it isn’t clear how many other journals are or will be adopting similar changes to results reporting, there is clear interest in rethinking the p-value and how it is used in scientific research. According to Wasserstein, the ASA statement has generated widespread attention among scientific publications, and medical schools are emphasizing the principles in the statement to their students.

Cultural Shift: Embrace Uncertainty

For Wasserstein, the key challenge of shifting from an overreliance on the p-value to other ways of reporting results is cultural. “Naturally, all of us would love research that provides incontrovertible answers that generalize to every relevant situation, [but] actual research is noisier than that,” he said. “The challenge for researchers is to recognize and even embrace uncertainty, recognizing that evidence in difficult problems rarely sorts unambiguously into ‘something important is there’ and ‘nothing important is there.’”

Clinicians, too, may feel a shift, but it may be best characterized by a renewed focus on the inherent ambiguity and uncertainty of medical care as it is practiced in the clinic and the ultimate importance of informed clinical judgment.

“Ultimately, clinicians must consider the study findings in the context of their own clinical practice and make a determination of whether they feel the data is clinically significant such that it impacts their current understanding of disease or the manner in which it should be treated,” said Dr. Soler.

Mary Beth Nierengarten is a freelance medical writer based in Minnesota.

Proper Use and Interpretation of the P-Value

P-values can indicate how incompatible the data are with a specified statistical model.
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
Proper inference requires full reporting and transparency.
A p-value does not measure the size of an effect or the importance of a result.
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

Source: American Statistical Association
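
The fifth principle, that a p-value does not measure the size of an effect, can be illustrated with a small calculation. The Python sketch below assumes a simple two-sided z-test on two equal-sized groups with a known common standard deviation; it holds the effect fixed at one-tenth of a standard deviation and varies only the sample size. The function name `two_sided_p` and all of the numbers are hypothetical.

```python
from math import erfc, sqrt

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (z-test, equal group sizes)."""
    # Standard error of the difference between two group means.
    se = sd * sqrt(2.0 / n_per_group)
    z = abs(diff) / se
    # For a standard normal Z, P(|Z| >= z) equals erfc(z / sqrt(2)).
    return erfc(z / sqrt(2.0))

# Same small effect (0.1 standard deviations), three different sample sizes.
for n in (20, 200, 2000):
    p = two_sided_p(diff=0.1, sd=1.0, n_per_group=n)
    print(f"n per group = {n:5d}, p = {p:.4f}")
```

The effect is identical in every run; only the p-value changes, falling below 0.05 for the largest sample. A "significant" result can therefore reflect a large study as much as a large or clinically important difference.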

How to Determine Clinically Relevant Findings

What was the quality of the study design?
Does the research report on the decisions made in analyzing data that affect the statistical results, such as dealing with outliers or missing data?
Does the research indicate how much the results would differ if other decisions had been made?
What analyses were done but not included in the research report?
How do the results of the research square with prior knowledge?
Have ways of incorporating prior knowledge into the analysis been considered?
Is there a plausible explanation for the outcome?
