[This blog originally appeared at the blogsite Development Impact] About a year ago, I blogged on a paper that had tried to replicate the results of 61 papers in economics and found that in 51% of the cases, the authors couldn’t get the same result. In the meantime, someone brought to my attention a paper that takes a wider sample and also makes us think about what “replication” is, so I thought it would be worth looking at those results.
The paper in question is by Maren Duvendack, Richard Palmer-Jones and Robert Reed and appeared last year in Econ Journal Watch (see here). The paper starts with an interesting history of replication in economics. It turns out that replication goes pretty far back. Duvendack and co. cite the introductory editorial to Econometrica, where Frisch wrote “In statistical and other numerical work presented in Econometrica the original raw data will, as a rule, be published, unless their volume is excessive. This is important to stimulate criticism, control and further studies.” That was in 1933.
Various journals have made similar affirmations of the need for replication over the years. The Journal of Human Resources put it in its policy statement in 1990, explicitly saying that it welcomed submissions that replicated studies published in the JHR in the previous five years. But this is missing from the current policy, which focuses more on making data and code available with published papers. The Journal of Political Economy took a different approach, and had a “confirmations and contradictions” section from 1976-1999. These explicit publication opportunities may have declined in recent times, but there has been a sharp surge in a different path to replication: the requirement that authors submit their code and dataset for a given paper. Duvendack and co. find 27 journals that regularly publish data and code, and many of these are top journals. The only development field journal that makes this list is the World Bank Economic Review. In addition, many funders now require that, after a decent interval, the data whose collection they funded be made publicly available in its entirety.
Before we look at Duvendack and co.’s review of replication trends, it’s worth taking a short detour into what exactly replication means. Unfortunately, the term is used imprecisely in many conversations. Michael Clemens has a very nice (and very precise) paper where he lays out a number of distinctions. In this case, precision requires some verbosity, so hang on. Clemens lays out four different types (in two groups):
Replication (both sub-types use the same sampling distribution of parameter estimates and are looking for discrepancies that come from random chance, error, or fraud):
– Verification – uses the same specification, same population and same sample
– Reproduction – uses the same specification, same population but not the same sample
Robustness (both sub-types use a different sampling distribution for parameter estimates and are looking for discrepancies that come from changes in the sampling distribution – as Clemens notes, they need not give identical results in expectation):
– Reanalysis – uses a different specification, the same population and not necessarily the same sample
– Extension – uses the same specification, different population and a different sample.
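For readers who find it easier to see the taxonomy as a lookup, here is a rough Python sketch of the four types above. It is only an illustration of the list as summarized in this post, not anything from Clemens’s paper: the names `Kind`, `classify` and `group` are made up, and returning `None` for combinations the list doesn’t cover is an assumption.

```python
from enum import Enum
from typing import Optional


class Kind(Enum):
    """The four types in the list above (names are illustrative)."""
    VERIFICATION = "verification"
    REPRODUCTION = "reproduction"
    REANALYSIS = "reanalysis"
    EXTENSION = "extension"


def classify(same_specification: bool,
             same_population: bool,
             same_sample: bool) -> Optional[Kind]:
    """Map a follow-up study onto one of the four types.

    Returns None for combinations the list does not cover
    (an assumption of this sketch, not part of the taxonomy).
    """
    if same_specification and same_population:
        # Same specification and population: only the sample decides.
        return Kind.VERIFICATION if same_sample else Kind.REPRODUCTION
    if not same_specification and same_population:
        # Reanalysis: the sample may or may not be the same.
        return Kind.REANALYSIS
    if same_specification and not same_population and not same_sample:
        return Kind.EXTENSION
    return None


def group(kind: Kind) -> str:
    """Return the broader group a type belongs to."""
    if kind in (Kind.VERIFICATION, Kind.REPRODUCTION):
        return "replication"
    return "robustness"


if __name__ == "__main__":
    # Re-running the original code on the original data.
    study = classify(same_specification=True, same_population=True, same_sample=True)
    print(study, group(study))
    # Same regression, run in a different country with new data.
    study = classify(same_specification=True, same_population=False, same_sample=False)
    print(study, group(study))
```

The point of writing it this way is that the replication/robustness split falls out of just two questions: did the specification change, and did the population change? If either did, you are in robustness territory.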
Duvendack and co. are using a broader definition of replication (especially when compared to the paper I blogged on last year): they’re including what Clemens calls robustness. They cast a wide net in looking for replication studies: they search Google Scholar, the Web of Science and the Replication in Economics wiki, and they also draw on suggestions from journal editors, their own collections and a systematic search of the top 50 economics journals. This search gives them 162 published studies. The time trend is interesting, as the figure below (reproduced from their paper) shows what could be an upwards trend: