[Note: This blog post is based on our articles “Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence” (Management Science, 2016) and “Statistical Significance and the Dichotomization of Evidence” (Journal of the American Statistical Association, 2017).]
The null hypothesis significance testing (NHST) paradigm is the dominant statistical paradigm in the biomedical and social sciences. A key feature of the paradigm is the dichotomization of results into the categories “statistically significant” and “not statistically significant” depending on whether the p-value falls, respectively, below or above the size alpha of the test, where alpha is conventionally set to 0.05. Although prior research has often criticized this dichotomization for, inter alia, having “no ontological basis” (Rosnow and Rosenthal, 1989) and for the arbitrariness of the 0.05 cutoff value, the impact of this dichotomization on the judgments and decision making of academic researchers has received relatively little attention.
Our articles examine this question. We find that the dichotomization intrinsic to the NHST paradigm leads expert researchers from a variety of fields (including medicine, epidemiology, cognitive science, psychology, business, economics, and even statistics) to make errors in reasoning. In particular, when presented with a hypothetical study summary with a p-value experimentally manipulated to be either above or below the 0.05 threshold for statistical significance, we show:
 Academic researchers interpret evidence dichotomously primarily based on whether the p-value is below or above 0.05.
 They fixate on whether a p-value reaches the threshold for statistical significance even when p-values are irrelevant (e.g., when asked about descriptive statistics).
 These findings apply to likelihood judgments about what might happen to future subjects as well as to choices made based on the data.
 Researchers’ judgments reflect a tendency to ignore effect size.
We briefly review these findings with a focus, given the audience of this blog, on our results for economists.
Study 1: Descriptive Statements
In our first series of studies, the hypothetical study summary described a clinical trial of two treatments where the outcome of interest was the number of months lived by the patients (averages of 8.2 and 7.5 months for treatments A and B, respectively). Our subjects were asked a multiple-choice question: was the number of months lived by those who received treatment A greater than, less than, or no different from the number of months lived by those who received treatment B, or could this not be determined?
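To make the distinction concrete, here is a minimal Python sketch (an illustration, not code from our articles). It contrasts the NHST label, which depends on the p-value, with the descriptive comparison, which depends only on the observed averages of 8.2 and 7.5 months. The two p-values and the function names `significance_label` and `descriptive_comparison` are assumptions chosen for illustration; they do not come from the study summary described above.

```python
ALPHA = 0.05  # conventional size of the test


def significance_label(p_value, alpha=ALPHA):
    """Dichotomize a p-value following the NHST convention."""
    if p_value < alpha:
        return "statistically significant"
    return "not statistically significant"


def descriptive_comparison(mean_a, mean_b):
    """Compare the observed sample averages; no p-value is involved."""
    if mean_a > mean_b:
        return "greater"
    if mean_a < mean_b:
        return "less"
    return "no different"


# Observed averages (months lived) from the hypothetical study summary.
mean_a, mean_b = 8.2, 7.5

# Illustrative p-values on either side of the threshold (assumed values).
for p_value in (0.01, 0.27):
    label = significance_label(p_value)
    comparison = descriptive_comparison(mean_a, mean_b)
    print(f"p = {p_value}: {label}; observed months lived under A is {comparison}")
```

The label changes with the p-value, but the descriptive comparison of the observed averages is the same in both cases.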