In our AER Papers and Proceedings paper, “Assessing the Rate of Replication in Economics,” we try to answer two questions. First, how often do economists attempt to replicate results? Second, how aware are we collectively of replication attempts that do happen?
Going into this project, the two of us were concerned about the state of replication in the profession, but neither of us knew for sure just how bad (or good) it might be. To get a better handle on the problem, we set out to quantify how often results produced in subsequent work spoke to the veracity of the core insights in empirical papers (even if this was not the main goal of the follow-up work).
We couldn’t answer this for all work ever done, so we needed to limit the exercise to a meaningful subsample. To do this we chose a base set of papers from the AER’s 100th volume, published in 2010. This volume sample therefore represented important, general-interest ideas in economics, and gave all the papers at least 5 years since publication to accrue replication attempts.
We wanted to be fairly comprehensive on the fields we included, but we also wanted to focus on “replication” in a very broad sense: had the core hypothesis of the previous paper been exposed to a retest and incorporated into the published literature? This broad definition created a problem for the coding, as we wanted the reader of each volume paper to be an expert in the field, providing his or her opinion on whether something was a replication. To solve this, we put together a group of coauthors who possessed expertise across an array of fields (adding James Berry, Rania Gihleb, and Douglas Hanley to the project).
Assigning the volume papers by specialty, we read through and coded just over 1,500 papers citing one of the 70 empirical papers in our volume sample. For each paper we coded our subjective opinions on whether each was a replication and/or an extension for one of the original paper’s main hypotheses. Alongside this, we also coded more-objective definitions on the relationship of the data in each citing paper to the original, allowing us to compare our top-level replication coding to the definitions given by Michael Clemens.
The end results from our study indicate that only a quarter of the papers in our volume sample were replicated at least once, while 60 percent had either been replicated or extended at least once. While the replication figure is still lower than we would want, it was higher than we expected. Moreover, the papers that were replicated were the most important papers in our sample: Every single volume paper in our sample with 100 published citations had been replicated at least once. Given 50 published citations, the paper was more likely to have been replicated than not. While the quantitative rates differ slightly, this qualitative result is replicated by the findings in the session papers by Daniel Hamermesh and Sandip Sukhtankar (examining very well-cited papers in labor economics, and top-5/field publications in development economics, respectively).
While the replication rates that we found were certainly higher than we initially expected, one thing that we discovered from the coding exercise was how hard it was to find replications. Our coding exercise was an exhaustive search within all published economics papers citing one of our volume papers. In total we turned up 52 papers that we coded as a replication, where the vast majority of these were positive replications. But of these 52, only 18 actually explicitly presented themselves as replications. Simply searching for a paper with a keyword such as “replication” isn’t enough, as many of the replications we found were buried as sub-results within larger papers, for which the replication was not the main contribution.
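As a back-of-the-envelope illustration of the gap, the snippet below works through the two counts reported above. It assumes, optimistically and purely for illustration, that a keyword search would catch every paper that explicitly presents itself as a replication and nothing else; the variable names are our own.

```python
# Counts reported in the post.
coded_replications = 52   # citing papers coded as replications
self_described = 18       # of those, papers that explicitly present themselves as replications

# Illustrative assumption: a keyword search finds exactly the self-described replications.
found_share = self_described / coded_replications
missed = coded_replications - self_described

print(f"Share a keyword search could find: {found_share:.0%}")
print(f"Replications it would miss: {missed} of {coded_replications}")
```

Even under that generous assumption, a keyword search would surface only about a third of the replications we coded.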
This hampers awareness of replications. Though one might expect that knowledge of replications is better distributed among the experts within each literature, in a survey we conducted of the volume-paper authors and a subsample of the citing authors, the main finding was substantial uncertainty on the degree to which papers and ideas had been replicated.
Certainly the profession could do a far better job in organizing replications through a market design approach. In a companion paper to this one that we wrote with Muriel Niederle, we set out some modest proposals for better citation and republication incentives to encourage replication work. But much, much more is possible.
Lucas Coffman is a Visiting Associate Professor of Economics at Harvard University. Alistair Wilson is an Assistant Professor of Economics at the University of Pittsburgh. Comments/feedback about this blog can be directed to Alistair at firstname.lastname@example.org.
– Berry, James, Lucas Coffman, Rania Gihleb, Douglas Hanley and Alistair J. Wilson. 2017. “Assessing the Rate of Replication in Economics.” Am. Econ. Rev. P&P, 107 (5): p.27–31.
– Coffman, Lucas, Muriel Niederle and Alistair J. Wilson. 2017. “A Proposal to Incentivize, Promote, and Organize Replications.” Am. Econ. Rev. P&P, 107 (5): p.41–5.
– Clemens, Michael. 2017. “The Meaning of Failed Replications: A Review and Proposal.” J. Econ. Surv. 31 (1): p.326–42.
– Hamermesh, Daniel S. 2017. “What is Replication? The Possibly Exemplary Example of Labor Economics.” Am. Econ. Rev. P&P, 107 (5): p.37–40.
– Sukhtankar, Sandip. 2017. “Replications in Development Economics.” Am. Econ. Rev. P&P, 107 (5): p.32–6.