In a recent opinion piece in the journal Nature, ROBERT MACCOUN and SAUL PERLMUTTER argue that “blind analysis” techniques, originally developed in particle physics, could be useful for the social sciences as well. For example, in testing whether X affects Y, the data would be manipulated by a third party in such a way that the researcher would not be able to determine the true relationship. Instead, the researcher would develop the data and estimation procedures that they judged most likely to produce an accurate result. Only after the researcher signed off on the data and estimation procedures would they be allowed to see the result. To read more, click here.
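For readers who like to see the mechanics, here is a minimal sketch of what third-party blinding might look like in practice. The column names and the perturbation scheme (a hidden sign flip plus a hidden shift of the outcome) are illustrative assumptions, not MacCoun and Perlmutter's specific procedure:

```python
import numpy as np
import pandas as pd

def blind_outcome(df: pd.DataFrame, outcome: str, seed: int):
    """Run by the third party only: perturb the outcome so the analyst
    cannot infer the true X-Y relationship while building the pipeline."""
    rng = np.random.default_rng(seed)
    key = {
        "sign": int(rng.choice([-1, 1])),                  # hidden sign flip
        "shift": float(rng.normal(0, df[outcome].std())),  # hidden additive shift
    }
    blinded = df.copy()
    blinded[outcome] = key["sign"] * blinded[outcome] + key["shift"]
    return blinded, key  # the key stays with the third party

def unblind_slope(blinded_slope: float, key: dict) -> float:
    """Applied only after the analyst signs off on the data cleaning and
    estimation procedure: an additive shift does not affect regression
    slopes, so recovering the true coefficient only requires the sign."""
    return key["sign"] * blinded_slope
```

The analyst builds the cleaning and estimation pipeline on the blinded series; only after committing to it is the key applied, so those choices cannot be steered by the result.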
What’s the International Initiative for Impact Evaluation (3ie) doing in the replication business? 3ie is mostly known in the development community as a funder of impact evaluations and systematic reviews. But our leadership always envisioned a role for replication research within 3ie’s mandate to provide high quality evidence for policymaking. We designed 3ie’s replication programme to encourage internal replication studies of influential, innovative, or controversial development-related impact evaluations. Through two rounds of replication windows we’ve funded 10 replication studies to date, including the highly discussed replication research around deworming treatments in Kenya (for a related news item in TRN, click here). So here’s what 3ie is doing in the replication business.
Our processes are designed to address common criticisms of replication research (this blog borrows from a forthcoming paper we’re writing on the topic). The selection of replication-eligible studies is often met with insinuations of improper selection, with replication researchers supposedly choosing only studies they feel confident they can disprove. We address that criticism through multiple eligibility mechanisms, such as choosing studies from a crowdsourced Candidate Studies List and convening a committee of experts to judge the policy relevance of each replication study proposal.
One of the biggest concerns regarding replication research is researcher incentives. If bias exists for original authors to discover a new result, it can also exist for replication researchers to disprove the established result. To address this concern, we require all replication researchers to post replication plans, which allow readers to know how the researchers intended to undertake their replication study before starting the research. Ideally all robustness tests conducted in the replication paper will be publicly pre-specified in these replication plans.
In an attempt to further defuse replication tensions, we encourage engagement between the replication researchers and the original authors. We require 3ie-funded replication researchers to include a “pure replication,” and to share these results early in the replication process. In the pure replication, the researchers attempt to reproduce the published results using the same data and methods as in the publication. We then require these replication researchers to share their findings with the original authors before completing their study, giving the original authors the opportunity to reply to the direct reproduction of their work before any results are finalized.
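To make the idea concrete, a pure replication of a regression-based study might reduce to a check along these lines (the file name, variable names, published coefficient and tolerance are hypothetical placeholders, not part of 3ie’s requirements):

```python
import pandas as pd
import statsmodels.api as sm

PUBLISHED_BETA = 0.042   # hypothetical: the treatment effect reported in the original paper
TOLERANCE = 1e-3         # how closely a pure replication is expected to match it

def pure_replication(data_path: str):
    """Re-run the published specification on the original data and compare
    the key estimate against the published number."""
    df = pd.read_csv(data_path)
    X = sm.add_constant(df[["treatment", "age", "baseline_score"]])  # hypothetical controls
    fit = sm.OLS(df["outcome"], X).fit()
    beta = fit.params["treatment"]
    return beta, abs(beta - PUBLISHED_BETA) < TOLERANCE
```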
Original authors are understandably sensitive to replication researchers who (in their opinion) solely aim to discredit their work. 3ie’s replication process includes multiple rounds of internal and external refereeing, including reviews of replication plans, pure replications, and draft final replication reports (see our peer reviewing replication research blog for more detail).
Finally, replication researchers are concerned that they will spend a significant amount of time conducting their study and then have no place to publish it. And original authors are worried that they won’t have an opportunity to directly reply to the replication study. While we cannot guarantee publications, we created 3ie’s Replication Paper Series (RPS) to partially address both of these concerns. The RPS provides an outlet for the replicating researchers to publish their work, and for original authors to respond to it. We view the RPS as a repository of replication research, including confirmatory studies that might struggle to find space in a journal.
If you’re interested in replication research, here are a few ways to get involved with 3ie’s replication programme:
– We’re planning another replication window. Send us the titles of recently published, policy-relevant development impact evaluation papers that you think should be considered for future replication to replication@3eimpact.org.
– Apply for a replication award when we open our next window.
– Volunteer to serve as an external reviewer of replication research.
– Submit your replication paper, even if it wasn’t funded by 3ie, to our RPS (here are the instructions).
In the economic sciences, empirically based studies have become increasingly important: according to Hamermesh (2012), the number of journal contributions in which authors use self-collected or externally produced datasets for statistical analyses has increased massively over the last few decades.
With the growing relevance of publications based on empirical research, new questions and challenges emerge. Issues such as integrating research data and the scripts needed to run a data model into the broader context of a published article, so that research can be replicated and scientific results validated, are becoming increasingly important for both researchers and editors of scholarly journals.
Especially in a scientific discipline like economics, flawed research can have a huge impact on society, as the prominent debate over Reinhart and Rogoff’s “Growth in a Time of Debt” (2010) illustrated. Their paper attracted much attention, and its results were cited by US vice presidential candidate Paul Ryan and EU monetary affairs commissioner Olli Rehn to justify austerity policies.
But when Reinhart and Rogoff provided the Excel spreadsheet of their calculations to a student for teaching purposes in 2013, that student, Thomas Herndon, was not able to replicate the results of the paper. Furthermore, he discovered that the spreadsheet contained faulty calculations and selectively omitted data, which cast serious doubt on Reinhart and Rogoff’s findings.
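To see why a spreadsheet range error of this kind matters, consider a toy calculation. The numbers below are invented, not the Reinhart-Rogoff data; they only illustrate how silently omitting rows from an average shifts the result:

```python
# Hypothetical growth rates (percent) for countries in a high-debt category.
growth = {
    "Country A": -0.5,
    "Country B": 2.4,
    "Country C": 1.1,
    "Country D": 2.6,
    "Country E": 3.8,
}

full_mean = sum(growth.values()) / len(growth)

# A formula that accidentally skips the first two rows, analogous to an
# Excel AVERAGE() range that stops short of the full column:
truncated = list(growth.values())[2:]
truncated_mean = sum(truncated) / len(truncated)

print(f"mean over all rows:      {full_mean:.2f}")       # 1.88
print(f"mean over truncated set: {truncated_mean:.2f}")  # 2.50
```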
Although Reinhart and Rogoff’s paper was published in the American Economic Review (AER), a journal with a strict data availability policy, the paper was exempted from that policy.
One could therefore ask how journals in economics and business studies generally handle the challenges associated with empirically based research. At least in theory, journals should serve as quality assurance for economic research; peer review was established to ensure a high quality of published work. But apparently peer review does not extend to the data appendices or other materials associated with empirically based research. According to the US economist B.D. McCullough, journals often fail to ensure the robustness of published results. After investigating the data archives of several economics journals, he concluded: “Despite claims that economics is a science, no applied economics journal can demonstrate that the results published in its pages are replicable, i.e., that there exist data and code that can reproduce the published results. No theory journal would dream of publishing a result (theorem) without a demonstration (proof) that the reader can trust the result.” (McCullough, 2009)
To analyse how journals in economics and business studies deal with the challenge of reproducible research, in 2010 we applied for funding from the German Research Foundation (DFG) for a project called “European Data Watch Extended – EDaWaX”. EDaWaX has several goals. Its main objective is to develop a software application for editors of social science journals that facilitates the management of publication-related research data. To gather some of the functional requirements for the application, we first analysed the number and specifications of existing data policies of economics journals in 2011, and in 2014 we expanded the study. In our recent paper, we analyse the data policies of a sample of 346 scholarly journals, many of them among the top journals of the profession. In contrast to our 2011 study, we also included a large share of journals in business studies in order to compare the two branches of economic research.
For economics journals in particular, we are able to state that things are changing, slowly but steadily: more than a quarter of the economics journals in our sample are equipped with more or less functional data policies. While some journals pay lip service to reproducible research, others effectively enforce their data policy.
In our paper we summarise the findings of this empirical study. We examine both the extent and the quality of journals’ data policies, which should facilitate replication of published empirical research. The paper presents some characteristics of journals equipped with data policies and offers recommendations for suitable data policies in economics and business studies journals. In addition, we evaluate the journals’ data archives to roughly estimate whether these journals enforce data availability.
References:
Herndon, T., Ash, M. & Pollin, R. (2013), Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff, Political Economy Research Institute. Retrieved from: http://www.peri.umass.edu/fileadmin/pdf/working_papers/working_papers_301-350/WP322.pdf.
McCullough, B.D. (2009), Open Access Economics Journals and the Market for Reproducible Economic Research, Economic Analysis and Policy, 39(1), pp. 117-126.
Ryan, P. (2013), The Path to Prosperity: A Blueprint for American Renewal. Fiscal Year 2013 Budget Resolution, House Budget Committee. Retrieved from: http://budget.house.gov/uploadedfiles/pathtoprosperity2013.pdf.
The American Economic Review (2005), Data Availability Policy. Retrieved from: https://www.aeaweb.org/aer/data.php.
Vlaeminck, S. & Herrmann, L.K. (2015), Data policies and data archives: A new paradigm for academic publishing in economic sciences? In: Schmidt, B. & Dobreva, M. (Eds.), New Avenues for Electronic Publishing in the Age of Infinite Collections and Citizen Science: Scale, Openness and Trust. Proceedings of the 19th International Conference on Electronic Publishing, September 2015. Retrieved from: doi:10.3233/978-1-61499-562-3-145.
Science is a community of human beings of the species Homo sapiens: bipeds with the capacity to be self-reflexive. This implies that science as a community is subject to all the same behavioral patterns as any human community, including a plethora of biases at both the individual and the collective level.
Examples of well-known individual-level biases are hubris, confirmatory preference, and desire for novelty (or the reverse: fear of the new). This implies, for instance, that “When an experiment is not blinded, the chances are that the experimenters will see what they ‘should’ see” (The Economist, 2013). Together, these biases lead to Type I and Type II errors in judging research, both our own and that of others. As a result, without correcting mechanisms, published research will be heavily biased in favor of evidence that is in line with the theory.
Science’s first line of defense is the micro-level reviewing process. Regrettably, the reviewing process, double-blinded or not, is anything but flawless; it is itself full of biases. This is not surprising, as the reviewing process is carried out by exemplars of the very same Homo sapiens species that cannot escape all the biases referred to above (plus quite a few others).
Evidence abounds that current reviewing practices fail to provide the effective filtering mechanism they are claimed to provide. Take the revealing study by Callaham and McCulloch (2011). On the basis of a 14-year sample of 14,808 reviews by 1,499 reviewers, rated by 84 editors, they conclude that quality scores deteriorated steadily over time, with the rate of deterioration positively correlated with reviewers’ experience. This is mirrored in the well-established finding that reviewers, on average, fail to detect fatal errors in manuscripts, which reinforces the publication of false positives (Callaham & Tercier, 2007; Schroter et al., 2008).
Hence, given these unavoidable biases associated with the working of the human brain, the scientific community should adhere, as a collective, to a set of macro-level correcting principles as a second line of defense. Probably the most famous among these is Popper’s falsifiability principle. Key to Karl Popper’s (1959) incredibly influential philosophy of science is his argument that scientific progress evolves on the back of the falsification principle.
We, as researchers, should try, time and again, to prove that we are wrong. If we find evidence that our theory is indeed incorrect, we can work on developing new theory that does fit the data. Hence, we should teach the younger generation of researchers that, instead of being overly discouraged, they should be happy if they cannot confirm their hypotheses.
This quest for falsification is critical because, in the words of Ioannidis (2012: 646), “Efficient and unbiased replication mechanisms are essential for maintaining high levels of scientific credibility.” The falsification principle requires a tradition of replication studies in combination with the publication of non-significant and counter-results, or so-called nulls and negatives, backed by systematic meta-analyses.
Current publication practices are overwhelmingly anti-Popperian. No one is really interested in replicating anything, and meta-analyses are few and far between. Indeed, only a tiny fraction of published studies involve a replication effort or meta-analysis. Moreover, journal authors, editors, reviewers and readers are not interested in seeing nulls and negatives in print.
This replication defect and publication bias crisis implies that Popper’s critical falsification principle has effectively been thrown into the scientific community’s dustbin. We, as a collective, violate basic scientific principles by (a) mainly publishing positive findings (i.e., those that support our hypotheses) and (b) rarely engaging in replication studies (being obsessed with novelty). Behind the façade of all these so-called new discoveries, false positives abound, as do questionable research practices.
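A small simulation illustrates the mechanism. The effect sizes, sample sizes and the “publish only if p < 0.05” rule below are stylized assumptions rather than a model of any particular literature, but they show how selective publication fills the record with false positives:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_obs = 10_000, 50
true_effect_share = 0.1      # stylized: only 10% of tested hypotheses are true
effect_size = 0.5            # standardized mean difference when an effect exists

published_true, published_false = 0, 0
for _ in range(n_studies):
    has_effect = rng.random() < true_effect_share
    mu = effect_size if has_effect else 0.0
    treat = rng.normal(mu, 1, n_obs)
    control = rng.normal(0.0, 1, n_obs)
    p = stats.ttest_ind(treat, control).pvalue
    if p < 0.05:                      # "publish" only significant results
        if has_effect:
            published_true += 1
        else:
            published_false += 1

false_share = published_false / (published_true + published_false)
print(f"Share of published findings that are false positives: {false_share:.0%}")
```

Under these stylized assumptions, roughly two in five “published” findings are false positives; replication studies and the publication of nulls and negatives are the obvious correctives.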
In my recently published Manifesto, “What Happened to Popperian Falsification?”, I argue what I believe is wrong, why that is so, and what we might do about it. The Manifesto is primarily directed at the worldwide Business and Management scholarly community. However, Business and Management is clearly not the only discipline in crisis.
If you share the concerns expressed in my Manifesto, I encourage you to signal your support. For that purpose, I have opened a petition webpage at change.org, which can be signed and used to start exchanging ideas.
To kick-start this dialogue, I provide a tentative suggestion regarding a new and dynamic way of conducting, reporting, reviewing and publishing research, for now referred to as Scientific Wikipedia. My hope is that by initiating this dialogue, a few of the measures suggested in the Manifesto will be implemented; and others – perhaps far more effective ones – will be added over time.
Callaham, M. and C. McCulloch (2011). Longitudinal Trends in the Performance of Scientific Peer Reviewers, Annals of Emergency Medicine, 57: 141-148.
Callaham, M. L. and J. Tercier (2007). The Relationship of Previous Training and Experience of Journal Peer Reviewers to Subsequent Review Quality, PLoS Medicine, 4: 0032-0040.
Ioannidis, J. P. A. (2012). Why Science Is Not Necessarily Self-Correcting, Perspectives on Psychological Science, 7: 645-654.
Popper, K. (1959). The Logic of Scientific Discovery. Oxford: Routledge.
Schroter, S., N. Black, S. Evans, F. Godlee, L. Osorio, and R. Smith (2008). What Errors Do Peer Reviewers Detect, and Does Training Improve their Ability to Detect Them?, Journal of the Royal Society of Medicine, 101: 507-514.
In a recent working paper, authors ANDREW CHANG and PHILLIP LI examined 60 published, empirical papers in 13 economics journals to determine whether the research could be replicated. Less than half of the papers could be replicated, even with help from the authors. The main reason articles could not be replicated was that sufficient data and code were not provided. The paper concludes with recommendations for how to improve the state of replicability in economics. To read more, click here.
In a recent editorial, the journal Epidemiology argues that committing to a set of standards known as the Transparency and Openness Promotion (TOP) guidelines would, among other things, inhibit “creativity and novelty.” Instead, the journal says, “We intend to soon ask authors of each new submission to explain whether and how these materials might be made available.” To read the journal’s full editorial, click here. To learn more about the TOP guidelines, click here. (H/t to the blog Political Science Replication for this item.)
In 1981 I was appointed as a professor of economics at the University of Amsterdam. One of my colleagues was Joop Klant, professor of economic methodology. When he retired, in 1986, at the farewell dinner, he reminded us of his opportunity cost: we received a copy of his novel De fiets (The bicycle), a booklet that had brought him literary fame in his youth. He signed my copy with the encouragement: “Test, test, test, Hartog: never stop”. Yes, we shared a belief in the Popperian assignment and I have been trying to test theory whenever I could. In fact, the paper I am now working on is an empirical test of the theory that employers safeguard young graduates from wage reduction when their productivity turns out below standard (the present verdict is: reject!).
In 1994, my friend Jules Theeuwes and I launched the journal Labour Economics. We wanted a balanced mix of theory and empirical work and we opened a separate section for Replications, with a separate editor. As replication submissions were barely forthcoming (one paper in 3 years), we decided to beat the drum. We were firm believers. “The basic premise of econometric research is the existence of stable parameter values in equations that relate economic variables. Yet, we do not have a great deal of information about parameter values and their empirical distributions”, we wrote in 1997 as introduction to a set of invited papers on replication (Labour Economics, 4(2), 99). We guaranteed publication of replication studies, provided they would meet some mild conditions (aim to replicate key findings of an original article in a leading journal, and contain no methodological flaws).
To our regret, we never got a single submission. We understood the reason quite well: replication does not lead to academic prestige. There are some famous cases of replications that failed to reproduce the original findings (Harberger’s tax on capital, the Journal of Money, Credit and Banking project; see our Labour Economics introduction), but basically the profession, and journal editors in particular, were not interested. But the times, they are a-changing.
Interest in the reliability and credibility of empirical work has been mounting. For nine years, I have been a member of LOWI, a national board for research integrity founded by the Dutch universities and The Netherlands Academy of Sciences. It is a board of appeal on university rulings on research integrity. When we started, a decade ago, we got a few cases annually. In 2014 the board got 24 cases, generally much more serious than in the early years, with heavy impact on the accused. In The Netherlands we experienced some spectacular cases of data fraud (not in economics, though), and awareness of the often very shaky basis of econometric results has strongly increased. The dangers of an emphasis on originality, on new methods, new models and new approaches, rather than on the painstaking, patient search for reliable, reproducible results, are now clearly appreciated. Data must be made easily available. I remember a phrase that struck me long ago: “Often, a researcher’s mind is more fruitful than his database”. Data transparency and a new attitude should change that.
Some time ago, it occurred to me that professional organisations or journals should create a replication archive. Visiting Waikato University, I mentioned this to Jacques Poot, who told me: my dear friend, that already exists! An excellent initiative. It proves again that the new day starts in New Zealand: that’s where the sun first rises.
Here’s evidence on my sincerity:
– Arulampalam, J. Hartog, T. MaCurdy and J. Theeuwes (1997), Replication and re-analysis, Labour Economics, 4(2), pp. 99-105.
– Cabral Vieira, L. Diaz Serrano, J. Hartog and E. Plug (2003), Risk compensation in wages: a replication, Empirical Economics, 28, pp. 639-647.
– Mazza, H. van Ophem and J. Hartog (2013), Unobserved heterogeneity and risk in wage variance: Does more schooling reduce earnings risk?, Labour Economics, 24, pp. 323-338.
– Budria, L. Diaz Serrano, A. Ferrer and J. Hartog (2013), Risk Attitude and Wage Growth: Replicating Shaw (1996), Empirical Economics, 44(2), pp. 981-1004.
The upcoming annual meeting of the American Economic Association will be held in San Francisco on January 3-5, 2016. The preliminary program was recently released and features a session on “Replications in Economics.” To learn more, click here.
This article from Discover highlights a self-replication by leading researchers investigating the relationship between oxytocin and trust. These researchers reported not being able to replicate their own, previously published study. Click here to read the article. While the subject itself may interest some, the article leaves unaddressed a very interesting question. SHOULD JOURNALS ALLOW RESEARCHERS TO PUBLISH WORK REFUTING THEIR EARLIER RESEARCH? There are some potentially perverse incentives in play here. But perhaps reputational effects would keep these in check? Some journals, such as Public Finance Review, require the replicating author to be independent of the original author (see Guideline #3 here). However, that may be a necessary requirement given PFR’s policy of publishing both positive and negative confirmations. Does that mean that journals should only publish self-replications if the researchers fail to replicate their original study? Food for thought.
In this article from Times Higher Education, WOLFGANG STROEBE and MILES HEWSTONE ask “What have we learned from the Reproducibility Project?” The short answer is — in their opinion — not much. Instead, they argue that the replicating researchers would have used their time more productively if they had conducted a meta-analysis. To read more, click here.