In late February, the National Academy of Sciences published a report summarizing a workshop held the previous year. The report can be freely downloaded here. The workshop convened researchers from a wide variety of disciplines and addressed numerous facets of research reproducibility. Some highlights:
— There is still no consensus about terminology: “reproducibility”, “replicability”, and “robustness” are some (but not all!) of the terms that attempt to parse out the nuances associated with verifying research reliability.
— There is general consensus that a p-value of 0.05 is too high to ensure a reasonable likelihood that the results can be “reproduced.” However, there is no consensus on what should replace it.
— Workshop participants noted that the p-value is itself a sample statistic with variance. This has led to constructs such as the “reproducibility probability,” which reports the probability that “a repeated experiment will produce a statistically significant result.”
— There is progress, but still no consensus, on the appropriate statistical measures to determine when a follow-up study confirms the findings of a previous study. Greater reliance on Bayesian statistics was mentioned.
— TABLE 3.2 in the report provides an illuminating taxonomy of the issues associated with reproducibility.
— Many ideas for incentivizing reproducibility were offered. One innovative idea is a journal policy that gives authors the option to have their article certified as “reproducible,” and allows the reviewers who perform the certification to receive some degree of co-authorship status.
— Economics is far behind other disciplines in seriously addressing this issue.
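The “reproducibility probability” mentioned above can be made concrete with a common frequentist estimate (due to Goodman, and used by Shao and Chow, among others; the report does not commit to this particular formula, so treat it as one illustrative construction): convert the observed two-sided p-value into a z-statistic, and ask how likely an exact repeat of the experiment is to clear the same significance threshold. A minimal sketch, assuming a two-sided test with a normal test statistic:

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal

def reproducibility_probability(p_obs: float, alpha: float = 0.05) -> float:
    """Estimated probability that an identical repeat experiment is
    significant at level alpha, given an observed two-sided p-value.

    Treats the observed z-statistic as the true effect (a strong
    assumption) and computes P(|Z_repeat| > z_crit), ignoring the
    negligible far-tail term.
    """
    z_obs = _N.inv_cdf(1 - p_obs / 2)   # z implied by the observed p-value
    z_crit = _N.inv_cdf(1 - alpha / 2)  # critical value for level alpha
    return _N.cdf(z_obs - z_crit)

# A result observed exactly at p = 0.05 has only an estimated 50%
# chance of replicating at the 0.05 level:
print(reproducibility_probability(0.05))   # 0.5
print(reproducibility_probability(0.005))  # roughly 0.80
```

The p = 0.05 case makes the workshop's point vividly: a result that just barely clears the conventional threshold is, under this estimate, a coin flip to replicate, which is one motivation for proposals to lower the threshold (e.g., to 0.005).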