Even authors of published articles in a top statistics journal are often confused about the meaning of p-values, especially by treating 0.05, or the range 0.05–0.15, as the location of a threshold. The underlying problem seems to be deterministic thinking. To put it another way, applied researchers and also statisticians are in the habit of demanding more certainty than their data can legitimately supply. The problem is not just that 0.05 is an arbitrary convention; rather, even a seemingly wide range of p-values such as 0.01–0.10 cannot serve to classify evidence in the desired way.
In our article, John and I discuss some natural solutions that won’t, on their own, work:
– Listen to the statisticians, or clarity in exposition
– Confidence intervals instead of hypothesis tests
– Bayesian interpretation of one-sided p-values
– Focusing on “practical significance” instead of “statistical significance”
– Bayes factors
You can read our article for the reasons why we think the above proposed solutions won’t work.
From our summary:
We recommend saying No to binary conclusions . . . resist giving clean answers when that is not warranted by the data. . . . It will be difficult to resolve the many problems with p-values and “statistical significance” without addressing the mistaken goal of certainty which such methods have been used to pursue.
P.S. Along similar lines, Stephen Jenkins sends along the similarly-themed article, “‘Sing Me a Song with Social Significance’: The (Mis)Use of Statistical Significance Testing in European Sociological Research,” by Fabrizio Bernardi, Lela Chakhaia, and Liliya Leopold.