BROWN: How to Conduct a Replication Study – What Not To Do
Two weeks ago, on Halloween, I wrote a post about how to conduct a replication study using an approach that emphasizes which tests might be run in order to avoid the perception of a witch hunt. The post is based on my paper with Benjamin D.K. Wood, which I recently presented at the “Reproducibility and Integrity in Scientific Research” workshop at the University of Canterbury. When Ben and I first submitted the paper to Economics E-journal, we received some great referee comments (all of which are public) including requests by an anonymous referee and Andrew Chang to include in the paper a list of what not to do – a list of don’ts.
We spent some time thinking about this request. We realized that what the referees wanted was a list of statistical and econometric no-nos, especially drawing on the most controversial replication studies funded by the International Initiative for Impact Evaluation (3ie) while we were both there. However, our role at 3ie was to be a neutral third party, at least as much as possible, and we didn’t want to abandon that now.
At the same time, we did learn a lot of lessons about conducting replication research while at 3ie, and we agreed that some of those lessons would be appropriate don’ts. So we added a checklist of don’ts to the paper that was ultimately published. Here I summarize three of these don’ts. I should note that I’m talking here about internal replication studies, which is when the replication researcher uses the original data from a publication to check whether the published findings can be exactly reproduced and are robust, particularly those findings supporting conclusions and recommendations.
When conducting a replication study, don’t confuse critiques of the original research with the replication tests or findings. Certainly, critiques of the original research can motivate the choice of replication exercises, and it is fine to present critiques in that context. But often there are critiques that are separate from what can be explored with the data. For example, a replication researcher might be concerned that the published findings are biased because the treatment and control groups were unblinded.
This concern about the original research design may be valid, but it is not something that can be tested through replication exercises. Simply identifying this concern is not a replication finding. We saw many examples where replication researchers interspersed their critiques of the motivation or design of the original research with their replication exercises and results. Mixing these two types of analysis contributed to some of the biggest controversies that we witnessed.
Don’t conduct measurement and estimation analysis (which some call robustness testing) before conducting a pure replication. (See here and here for more on terminology and the 3ie replication program.) Often replication researchers begin a study motivated by questions of robustness and may even take for granted that a pure replication (which is applying the published analysis methods to the original data) would reproduce the published results.
Skipping the pure replication may seem like a way to save time, but conducting it often saves time in the end. The pure replication is the best way for the replication researcher to familiarize herself with the data, methods, and findings of the original publication, and missing a problem at the pure replication stage is only going to confuse the measurement and estimation analysis.
Even more to the point, some consider pure replication the only stage of the research that should be called “replication”, and therefore the only results that should be reported as replication results. It is important for a replication researcher to be able to make a clear statement about the results at this stage.
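One way to picture the "pure replication first" ordering is as a simple comparison step that runs before any robustness work. The sketch below is only an illustration: the function names, the tolerance, and the coefficient values are all hypothetical and are not drawn from the paper or from any 3ie study.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    name: str          # label of the estimate, e.g. a coefficient name
    published: float   # value reported in the original publication
    replicated: float  # value obtained by re-running the published methods
    matches: bool      # True if the two agree within the tolerance


def compare_estimates(published, replicated, tol=1e-3):
    """Compare each published estimate with its pure-replication counterpart.

    `tol` is an assumed rounding tolerance; a real study would choose it
    based on the precision reported in the original tables.
    """
    results = []
    for name, pub in published.items():
        rep = replicated.get(name)
        if rep is None:
            # An estimate that could not be reproduced at all counts as a mismatch.
            results.append(Comparison(name, pub, float("nan"), False))
        else:
            results.append(Comparison(name, pub, rep, abs(pub - rep) <= tol))
    return results


def pure_replication_reproduces(results):
    """Return True only if every published estimate was reproduced."""
    return all(r.matches for r in results)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    published = {"treatment": 0.250, "age": -0.012}
    replicated = {"treatment": 0.2504, "age": -0.012}
    results = compare_estimates(published, replicated)
    for r in results:
        print(r.name, r.published, r.replicated, r.matches)
    print("All reproduced:", pure_replication_reproduces(results))
```

A replication researcher could report this comparison on its own as the pure replication result, and move on to measurement and estimation analysis only once any mismatches at this stage are understood.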
Don’t present, post or publish replication results without first sharing them with the original authors. Replication research is, unfortunately, often a contentious undertaking. Replication researchers are advised to take the high road and communicate with original authors about their work – ideally from the beginning, even if the data are already publicly available. We saw cases where the replication researchers made mistakes that the original authors caught, so communication can save face on both sides.
There is a real concern about the original authors scooping a replication study by posting a correction without citing the replication researchers. We have seen this happen. Some approaches to addressing it include publicly posting the replication plan in advance. This research transparency approach serves multiple purposes, but one is putting a name and timestamp on the work that might lead to corrections. Another approach is to document the dates and subjects of communications with original authors and include this information, as an acknowledgement or footnote, in the replication study.
Perhaps one of our most important don’ts is this: don’t label the difference between a published result and a replication result an “error” or “mistake” without identifying the source of the error. Just because the second estimate is different from the first does not make the second right. Ben and I already blogged about this recommendation here on the World Bank Development Impact blog.
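The labeling rule can be made concrete with a small helper that describes a discrepancy neutrally unless its source has been traced. This is a minimal sketch under my own assumptions: the function name, the tolerance, and the wording of the labels are hypothetical, not terminology from the paper.

```python
def describe_discrepancy(published, replicated, source=None, tol=1e-3):
    """Label the gap between a published and a replicated estimate.

    The gap is called a plain "difference" unless `source` names where it
    comes from. Even then, the label does not say which estimate is right.
    """
    if abs(published - replicated) <= tol:
        return "reproduced"
    if source is None:
        return "difference, source not yet identified"
    return f"difference traced to: {source}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(describe_discrepancy(0.25, 0.31))
    print(describe_discrepancy(0.25, 0.31, source="sample restriction applied differently"))
```

The point of the design is that the word "error" never appears by default: a difference is only attributed once someone has identified where it comes from.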
Recent revelations, such as last month’s report of the retraction of 15 articles by the well-known Cornell food researcher Brian Wansink, remind us that replication research is as important to the advancement of the natural and social sciences as ever. My hope is that more researchers accept the responsibility of conducting replication research as part of their contribution to science. The advice presented in the “which tests” paper and summarized in my last post and this one is intended to help them get started.
Annette N. Brown, PhD is Principal Economist at FHI 360, where she leads efforts to increase and enhance evidence production and use across all sectors and regions. She previously worked at 3ie, where she directed the research transparency programs, including the replication program.