Deborah Mayo on Banning Significance Tests

[Excerpts taken from the article “P-value Thresholds: Forfeit at Your Peril” by Deborah Mayo, forthcoming in the European Journal of Clinical Investigation]
“A key recognition among those who write on the statistical crisis in science is that the pressure to publish attention-getting articles can incentivize researchers to produce eye-catching but inadequately scrutinized claims.”
“We may see much the same sensationalism in broadcasting metastatistical research, especially if it takes the form of scapegoating or banning statistical significance.”
“A lot of excitement was generated recently when Ron Wasserstein, Executive Director of the American Statistical Association (ASA), and co-editors A. Schirm and N. Lazar, updated the 2016 ASA Statement on P-Values and Statistical Significance (ASA I).”
“In their 2019 interpretation, ASA I “stopped just short of recommending that declarations of ‘statistical significance’ be abandoned,” and in their new statement (ASA II) announced: “We take that step here….Statistically significant – don’t say it and don’t use it”.”
“To herald the ASA II, and the special issue “moving to a world beyond p < 0.05”, the journal Nature requisitioned a commentary from Amrhein, Greenland and McShane “Retire Statistical Significance” (AGM). With over 800 signatories, the commentary received the imposing title “Scientists rise up against significance tests”!”
“Getting past the appeals to popularity and fear, the reasons ASA II and AGM give are that thresholds can lead to well-known fallacies, and even to some howlers more extreme than those long lampooned.”
“Of course it’s true: “a statistically non-significant result does not ‘prove’ the null hypothesis (the hypothesis that there is no difference between groups or no effect of a treatment …). Nor do statistically significant results ‘prove’ some other hypothesis.” (AGM)”
“It is easy to be swept up in their outrage, but the argument: “significance thresholds can be used very badly, therefore remove significance thresholds” is a very bad argument. Moreover, it would remove the very standards we need to call out the fallacies.”
“The danger of removing thresholds on grounds they could be badly used is that they are not there when you need them.”
“Ioannidis zeroes in on the problem: ‘With the gatekeeper of statistical significance, eager investigators whose analyses yield, for example, P = .09 have to either manipulate their statistics to get to P < .05 or add spin to their interpretation to suggest that results point to an important signal through an observed “trend.” When that gatekeeper is removed, any result may be directly claimed to reflect an important signal or fit to a preexisting narrative.’”
“ASA II regards its positions “open to debate”. An open debate is very much needed.”
