On the Past and Present of Reproducibility and Replicability in Economics

[Excerpts are taken from the article “Reproducibility and Replicability in Economics” by Lars Vilhuber, published in Harvard Data Science Review]

“In this overview, I provide a summary description of the history and state of reproducibility and replicability in the academic field of economics.”

“The purpose of the overview is not to propose specific solutions, but rather to provide the context for the multiplicity of innovations and approaches that are currently being implemented and developed, both in economics and elsewhere.”

“In this text, we adopt the definitions of reproducibility and replicability articulated, inter alia, by Bollen et al. (2015) and in the report by NASEM (2019).”

“At the most basic level, reproducibility refers to “the ability […] to duplicate the results of a prior study using the same materials and procedures as were used by the original investigator.””

“Replicability, on the other hand, refers to “the ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data are collected”…, and generalizability refers to the extension of the scientific findings to other populations, contexts, and time frames.”

“Much of economics was premised on the use of statistics generated by national statistical agencies as they emerged in the late 19th and early 20th century…Economists were requesting access for research purposes to government microdata through various committees at least as far back as 1959 (Kraus, 2013).”

“Whether using private-sector data, school-district data, or government administrative records, from the United States and other countries, the use of these data for innovative research has been increasing in recent years. In 1960, 76% of empirical AER articles used public-use data. By 2010, 60% used administrative data, presumably none of which is public use.”

“In economics, complaints about the inability to properly conduct reproducibility studies, or about the absence of any attempt to do so by editors, referees, and authors, can be traced back to comments and replies in the 1970s.”

“In the early 2000s, as in other sciences (National Research Council, 2003), journals started to implement ‘data’ or ‘data availability’ policies. Typically, they required that data and code be submitted to the journal, for publication as ‘supplementary materials.’”

“Journals in economics that have introduced data deposit policies tend to be higher ranked…None of the journals…request that the data be provided before or during the refereeing process, nor does a review of the data or code enter the editorial decision, in contrast to other domains (Stodden et al., 2013). All make provision of data and code a condition of publication, unless an exemption for data provision is requested.”

“More recently, economics journals have increased the intensity of enforcement of their policies. Historically being mainly focused on basic compliance, associations that publish journals …have appointed staff dedicated to enforcing various aspects of their data and code availability policies…The enforcement varies across journals, and may include editorial monitoring of the contents of the supplementary materials, reexecution of computer code (verification of computational reproducibility), and improved archiving of data.”

“If the announcement and implementation of data deposit policies improve the availability of researchers’ code and data…, what has the impact been on overall reproducibility? Table 2B shows the reproduction rates both conditional on data availability as well as unconditionally, for a number of reproducibility studies.”

“Data that is not provided due to licensing, privacy, or commercial reasons (often incorrectly collectively referred to as ‘proprietary’ data) can still be useful in attempts at reproduction, as long as others can reasonably expect to access the data…Providers will differ in the presence of formal access policies, and this is quite important for reproducibility: only if researchers other than the original author can access the non-public data can an attempt at reproducibility even be made, if at some cost.”

“We made a best effort to classify the access to the confidential data, and the commitment by the author or third parties to provide the data if requested. For instance, a data curator with a well-defined, nonpreferential data access policy would be classified under ‘formal commitment.’…We could identify a formal commitment or process to access the data only for 35% of all nonpublic data sets.”

“One of the more difficult topics to empirically assess is the extent to which reproducibility is taught in economics, and to what extent in turn economic education is helped by reproducible data analyses. The extent of the use of replication exercises in economics classes is anecdotally high, but I am not aware of any study or survey demonstrating this.”

“More recently, explicit training in reproducible methods (Ball & Medeiros, 2012; Berkeley Initiative for Transparency in the Social Sciences, 2015), and participation of economists in data science programs with reproducible methods has increased substantially, but again, no formal and systematic survey has been conducted.”

“Because most reproducibility studies of individual articles ‘only’ confirm existing results, they fail the ‘novelty test’ that most editors apply to submitted articles (Galiani et al., 2017). Berry and coauthors (2017) analyzed all papers in Volume 100 of the AER, identifying how many were referenced as part of replication or cited in follow-on work.”

“While partially confirming earlier findings that strongly cited articles will also be replicated (Hamermesh, 2007), the authors found that 60% of the original articles were referenced in replication or extension work, but only 20% appeared in explicit replications. Of the roughly 1,500 papers that cite the papers in the volume, only about 50 (3.5%) are replications, and of those, only 8 (0.5%) focused explicitly on replicating one paper.”

“Even rarer are studies that conduct replications prior to their publication, of their own volition. Antenucci et al. (2014) predict the unemployment rate from Twitter data. After having written the paper, they continued to update the statistics on their website (“Prediction of Initial Claims for Unemployment Insurance,” 2017), thus effectively replicating their paper’s results on an ongoing basis. Shortly after release of the working paper, the model started to fail. The authors posted a warning on their website in 2015, but continued to publish new data and predictions until 2017, in effect, demonstrating themselves that the originally published model did not generalize.”

“Reproducibility has certainly gained more visibility and traction since Dewald et al.’s (1986) wake-up call…Still, after 30 years, the results of reproducibility studies consistently show problems with about a third of reproduction attempts, and the increasing share of restricted access data in economic research requires new tools, procedures, and methods to enable greater visibility into the reproducibility of such studies. Incorporating consistent training in reproducibility into graduate curricula remains one of the challenges for the (near) future.”

To read the article, click here.
