The Association for Library Collections & Technical Services, a division of the American Library Association, is hosting a one-day workshop on issues of reproducibility. The workshop will feature “scholars, librarians, and technologists” discussing “tools and techniques to manage data, enable research transparency, and promote reproducible science. Attendees will learn strategies for fostering and supporting transparent research practices at their institutions.” Participants will (i) “learn tools and techniques that can be immediately employed at their institution to foster and support a culture of transparent research practices”; (ii) “know how to develop a research project on an open platform to manage data and other digital objects throughout the research lifecycle”; and (iii) “have access to techniques for organizing empirical research projects in such a way that they can be easily and exactly reproduced.” The workshop will be held in Orlando, Florida, on Friday, June 24th. To learn more, click here.
The replication crisis has elicited a number of recommendations, from betting on beliefs, to open data, to improved norms in academic journals regarding replication studies. In our recent working paper, “A Call for Out-of-Sample Testing in Macroeconomics” (available at SSRN), we argue that a renewed focus on out-of-sample tests would significantly mitigate the problems underlying the replication crisis, and we document that out-of-sample tests are absent from entire literatures in economics.
Our starting point is the observation that the literature on the government spending multiplier contains almost no results supported by an out-of-sample test. For this we use a fairly forgiving definition of “out-of-sample test”: so long as a model is parameterized on data from one period and then applied to new data, we count it. We review 87 empirical papers estimating the multiplier, and out-of-sample tests make an appearance in only a handful of them. Given that this question is perhaps the most important in macroeconomics, with quite literally trillions of dollars on the line, this result is jarring.
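To make the criterion concrete, here is a minimal sketch of what we count as an out-of-sample test: the model’s parameters are estimated on an early window of data, frozen, and then scored on a later window the estimation never saw. The series and model below (a simple AR(1) fit by OLS on simulated data) are purely illustrative assumptions, not the specification of any paper we review.

```python
import numpy as np

# Illustrative data: a simulated AR(1) series standing in for a macro aggregate.
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# Parameterize the model in one "period" (the first 150 observations)...
train, test = y[:150], y[150:]
X = np.column_stack([np.ones(len(train) - 1), train[:-1]])
beta, *_ = np.linalg.lstsq(X, train[1:], rcond=None)  # OLS of y_t on y_{t-1}

# ...then apply the frozen parameters to new data (the last 50 observations).
preds = beta[0] + beta[1] * test[:-1]
rmse_out = np.sqrt(np.mean((test[1:] - preds) ** 2))

# In-sample fit almost always flatters the model; the out-of-sample error
# is the number that disciplines it.
rmse_in = np.sqrt(np.mean((train[1:] - X @ beta) ** 2))
print(f"in-sample RMSE: {rmse_in:.3f}  out-of-sample RMSE: {rmse_out:.3f}")
```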
It was in 1953 that Milton Friedman published “The Methodology of Positive Economics,” urging economists to use predictive success as the criterion for comparing the worthiness of competing theories. Clearly, philosophy of science and practical econometrics have moved beyond this simplistic dictum, but does it make sense to cast aside out-of-sample prediction altogether when comparing theories and models? Are we really so confident that results produced by the methods claiming the throne of the “credibility revolution in empirical economics” will withstand the scrutiny of truly out-of-sample tests?
The primary exception to our finding is a 2007 paper by Frank Smets and Rafael Wouters, who ably perform an out-of-sample test against a series of baseline models. However, in the absence of other papers performing such tests, it is difficult to say how strong their result is. Even more laudable is the lengthy 2012 attempt by Volker Wieland and his colleagues to compare the performance of a number of macroeconomic models, although it is difficult to parse that study to answer the narrower question of the government spending multiplier. Another example is a 2016 paper by Jorda and Taylor, which constructs a “counterfactual forecast,” something similar to, but not quite, an out-of-sample test.
The other two examples we were able to identify were published in 1964 and 1967.
There are clearly additional criteria that economists can and should use for evaluating theories. Nonetheless, the paucity of these tests points to p-hacking, specification searches, and the whole slew of problems associated with the replication crisis in social science. Perhaps macroeconomics is “hard” and things like recessions cannot reasonably be forecast. Fine. Meteorologists cannot forecast more than a week or so ahead, but they still forecast what they can. Which models work best remains an extremely pertinent question, even if all models fail miserably when a recession hits.
Doing away with out-of-sample tests and similar checks, however, does away with the scientific ideal of Conjectures and Refutations, in which scientific knowledge evolves as bold, starkly stated ideas compete for the title of least wrong.
Bob Gelfond is the CEO of MQS Management LLC and the chairman and founder of MagiQ Technologies. Ryan Murphy is a research assistant professor at the O’Neil Center for Global Markets and Freedom at SMU Cox School of Business.
[From the Retraction Watch website] “In January 2014, Psychological Science began awarding digital badges to authors who committed to open science practices such as sharing data and materials. A study published today in PLOS Biology looks at whether publicizing such behavior helps encourage others to follow their lead. The authors summarize their main findings in the paper: ‘Before badges, less than 3% of Psychological Science articles reported open data. After badges, 23% reported open data, with an accelerating trend; 39% reported open data in the first half of 2015.’” To read more, click here.
How does one know when replication has hit the big time? When JOHN OLIVER and LAST WEEK TONIGHT do an entire episode on it. For readers of TRN, much of what he talks about will be familiar. Just a lot funnier. Check it out here.
In a recent article in Slate entitled “The Unintended Consequences of Trying to Replicate Research,” IVAN ORANSKY and ADAM MARCUS of Retraction Watch argue that replications can actually exacerbate research unreliability. The argument rests on the assumption that publication bias favors replication studies that confirm the original findings over those that disconfirm them. To read more, click here. This is the same argument that Michele Nuijten makes in her guest blog for TRN, which you can read here.
Whether this is a real concern depends on the replication policies at journals. At least two economics journals have publication policies that explicitly state they are neutral toward the conclusions of replication studies. In their “Call for Replication Studies”, Burman et al. state: “Public Finance Review will publish all … kinds of replication studies, those that validate and those that invalidate previous research” (see here). And the journal Economics: The Open-Access, Open-Assessment E-Journal states: “The journal will publish both confirmations and disconfirmations of original studies. The only consideration will be quality of the replicating study” (see here).
Further, in their recent study, “Replications in Economics: A Progress Report” (see here), Duvendack et al. find that most published replication studies in economics disconfirm the original research. So while it is possible that replications could make things worse, perhaps this is more a worry in theory than in practice. At least in economics.
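As a rough illustration of the mechanism Oransky, Marcus, and Nuijten describe, consider the toy simulation below. Every number in it (the 5% Type I error rate and the publication probabilities) is our own assumption for illustration, not an estimate from any of the studies above; it simply shows how a biased publication policy can make a false finding look well confirmed in the published record.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000  # replication attempts of FALSE original findings

# Assumption: a replication of a false finding "confirms" it 5% of the
# time (the conventional Type I error rate).
confirms = rng.random(n) < 0.05

def apparent_confirmation_rate(p_pub_confirm, p_pub_disconfirm):
    """Share of *published* replications that confirm the false finding."""
    published = np.where(confirms,
                         rng.random(n) < p_pub_confirm,
                         rng.random(n) < p_pub_disconfirm)
    return confirms[published].mean()

# Biased journals: confirmations are 90% likely to be published,
# disconfirmations only 10% likely.
print(f"biased policy:  {apparent_confirmation_rate(0.90, 0.10):.1%}")
# Neutral policy (as at the journals quoted above): equal treatment.
print(f"neutral policy: {apparent_confirmation_rate(0.50, 0.50):.1%}")
```

Under the biased policy, roughly a third of the published replications of a false finding come out confirming it; under a neutral policy, the published record reflects the underlying 5% rate.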
[From the article “Cancer Research is Broken” in Slate] “The deeper problem is that much of cancer research in the lab—maybe even most of it—simply can’t be trusted. The data are corrupt. The findings are unstable. The science doesn’t work. In other words, we face a replication crisis in the field of biomedicine, not unlike the one we’ve seen in psychology but with far more dire implications.” To read more, click here.
[From the article, “How scientists fell in and out of love with the hormone oxytocin” in Vox: Science & Health] This article recounts how initial laboratory research showing that the hormone oxytocin induces trust between people was eventually demonstrated to be mostly Type I error. In this case, it was the original research team (led by psychologist ANTHONY LANE) who came to realize that their own lab had focused on statistically significant findings while ignoring experiments with insignificant results. Of particular interest: when Lane and colleagues tried to publish their revised, null results, they found that journals were unreceptive to the new, less sensational evidence. To read more, click here. To read a previous, related post from The Replication Network, click here.
[From the article “The Reproducibility Crisis Is Good for Science”] The author, an editor at Nature, reports on ways the reproducibility crisis is promoting change in science. An excerpt: “For what it’s worth, articles about confirmation bias and the misuse of p-values are consistently among Nature’s most-read stories. Opportunities to get credit for careful work that does not yield a flashy, new result are also expanding. In the past year, journals as diverse as the American Journal of Gastroenterology and Scientific Data actively solicited replication studies or negative results. Information Systems, a data science journal, has introduced a new type of article in which independent experts are explicitly invited to verify work from a previous publication. Last November, the U.K.’s Royal Society introduced a system known as registered reports: The decision to publish is made before results are obtained, based on a pre-specified plan to address an experimental question. The F1000 Preclinical and Reproducibility Channel, launched in February, aims to give drug companies an easy way to show which scientific papers promising new paths to drugs might not deliver.” To read more, click here.
[From the article, “New article type verifies experimental reproducibility” at Elsevier Connect] “Information Systems, a data science journal published by Elsevier, has devised a solution to the question of reproducibility by establishing a new article type: the Invited Reproducibility Paper. Authors of selected published articles are invited to co-author, with the journal’s reproducibility reviewers, a report by which the experiment described in their published article is reproduced and verified.” To read more, click here. NOTE: This idea echoes a suggestion that appeared in the National Academy of Sciences’ workshop on scientific reproducibility. To read a summary of that workshop, click here.
[From the article in Retraction Watch] “Can we teach good behavior in the lab? That’s the premise behind a number of interventions aimed at improving research integrity, invested in by universities across the world and even private companies. Trouble is, a new review from the Cochrane Library shows that there is little good evidence to show these interventions work. We spoke with authors Elizabeth Wager (on the board of directors of our parent organization) and Ana Marusic, at the University of Split School of Medicine in Croatia.” To read more, click here.