Predicting Reproducibility

[From the working paper “Predicting the Replicability of Social Science Lab Experiments” by Altmejd et al., posted at BITSS Preprints]
“We have 131 direct replications in our dataset. Each can be judged categorically by whether it succeeded or failed, by a pre-announced binary statistical criterion. The degree of replication can also be judged on a continuous numerical scale, by the size of the effect estimated in the replication compared to the size of the effect in the original study.”
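The two ways of scoring a replication described above can be sketched in a few lines of Python. This is only an illustration. The excerpt does not spell out the binary criterion, so the code assumes a commonly used one: a replication effect in the same direction as the original, with a p-value below 0.05. It also assumes the continuous measure is the ratio of the replication effect to the original effect. The `ReplicationPair` class, its field names, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPair:
    """One original study and its direct replication (hypothetical fields)."""
    original_effect: float      # effect size reported in the original study
    replication_effect: float   # effect size estimated in the replication
    replication_p: float        # p-value of the replication estimate


def replicated(pair: ReplicationPair, alpha: float = 0.05) -> bool:
    """Binary outcome. ASSUMED criterion: same sign as the original and p < alpha."""
    same_direction = pair.original_effect * pair.replication_effect > 0
    return same_direction and pair.replication_p < alpha


def relative_effect(pair: ReplicationPair) -> float:
    """Continuous outcome. ASSUMED measure: replication effect / original effect."""
    if pair.original_effect == 0:
        raise ValueError("original effect size must be non-zero")
    return pair.replication_effect / pair.original_effect


# Made-up numbers, for illustration only.
pair = ReplicationPair(original_effect=0.5, replication_effect=0.25, replication_p=0.01)
print(replicated(pair), relative_effect(pair))
```

A pair can count as a success on the binary measure while still showing a much smaller effect on the continuous one, which is why the two are reported separately.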
“Our method uses machine learning to predict outcomes and identify the characteristics of study-replication pairs that can best explain the observed replication results.”
“We divide the objective features of the original experiment into two classes. The first contains the statistical design properties and outcomes: among these features we have sample size, the effect size and p-value originally measured, and whether a finding is an effect of one variable or an interaction between multiple variables.”
“The second class is the descriptive aspects of the original study which go beyond statistics: these features include how often a published paper has been cited and the number and past success of authors, but also how subjects were compensated.”
“We compare a number of popular machine learning algorithms … and find that a Random Forest (RF) model has the highest performance.”
“Even with our fairly small data set, the model can forecast replication results with substantial accuracy, around 70%.”
“The statistical features (p-value and effect size) of the original experiment are the most predictive. However, the accuracy of the model is also increased by variables such as the nature of the finding (an interaction, compared to a main effect), number of authors, paper length and the lack of performance incentives.”
“Our method could be used in pre- and post-publication assessment, … For example, when a paper is submitted an editorial assistant can code the features of the paper, plug those features into the models, and derive a predicted replication probability. This number could be used as one of many inputs helping editors and reviewers to decide whether a replication should be conducted before the paper is published.”
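The editorial workflow described above (code the features, plug them into a model, read off a probability) might look roughly like the sketch below. It is a toy stand-in, not the authors' trained Random Forest: the "forest" is a handful of hand-written rules, the thresholds are invented for illustration, and the feature names (`p_value`, `effect_size`, `is_interaction`) are hypothetical labels for features mentioned in the excerpts. It only shows the general idea that a forest-style model turns the votes of many simple trees into a probability.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
Tree = Callable[[Features], int]  # returns 1 for "replicates", 0 otherwise

# HYPOTHETICAL hand-written rules standing in for trained trees.
# The thresholds are invented for illustration and are not from the paper.
TOY_FOREST: List[Tree] = [
    lambda f: 1 if f["p_value"] < 0.01 else 0,
    lambda f: 1 if f["effect_size"] > 0.3 else 0,
    lambda f: 0 if f["is_interaction"] else 1,
    lambda f: 1 if f["p_value"] < 0.03 and f["effect_size"] > 0.2 else 0,
]


def predicted_replication_probability(
    features: Features, forest: List[Tree] = TOY_FOREST
) -> float:
    """Share of trees voting "replicates", read as a predicted probability."""
    votes = [tree(features) for tree in forest]
    return sum(votes) / len(votes)


# Features an editorial assistant might code for a submission (made-up values).
submission = {"p_value": 0.04, "effect_size": 0.35, "is_interaction": 1}
print(predicted_replication_probability(submission))
```

In practice the rules would be learned from the study-replication pairs rather than written by hand, and the resulting number would be one input among many, as the authors note.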
“Post-publication, the model could be used as an input to decide which previously published experiments should be replicated.”
To read the paper, click here.
