(FROM THE CHANNEL’S WEBSITE) “The Preclinical Reproducibility and Robustness channel is a platform for open and transparent publication of confirmatory and non-confirmatory studies in biomedical research. The channel is open to all scientists from both academia and industry and provides a centralized space for researchers to start an open dialogue, thereby helping to improve the reproducibility of studies.” To read more, click here.
(FROM THE ARTICLE “Are Results in Top Journals To Be Trusted?”) A paper recently published in the American Economic Journal, entitled “Star Wars: The Empirics Strike Back”, “analyses 50,000 tests published between 2005 and 2011 in three top American journals. It finds that the distribution of results (as measured by z-score, a measure of how far away a result is from the expected mean) has a funny double-humped shape (see chart below). The dip between the humps represents “missing” results, which just happen to be in a range just outside the standard cut-off point for statistical significance (where significance is normally denoted with stars, though the name may also be something to do with a film recently released—file under ‘economists trying to be funny’). Their results suggest that among the results that are only just significant, 10-20% have been fudged.” To read more, click here.

As reported in a previous blog post, the Economics E-Journal has launched a new replication section. As part of this initiative, we have developed a set of guidelines for replication submissions.
These guidelines seek to strike a reasonable balance among the needs of replicating authors (a fair chance to publish replications), replicated authors (protection against poorly-done replication studies), and readers (who need to know in a timely manner whether or not economics research is robust).
These guidelines are described on the journal's website. This blog post consists of two parts. In the first part, we summarize our current guidelines. In the second part, we ask readers for input.
PART I: Current Guidelines
The guidelines for a replication submission at Economics E-Journal are summarized as follows.
1) An assistant editor determines whether the submission is of sufficient merit to be sent through the refereeing process. If not, the paper is desk-rejected.
2) If a paper passes the first stage, it is sent on to a Co-Editor, who makes a similar determination about merit. If the paper is not of sufficient merit, the paper is desk-rejected.
3) Then, the replication is sent to the original author, who has a chance to reply within 60 days. This reply is then appended to the submission.
4) If the paper passes this stage, the paper (with reply) is published as a discussion paper (which is like a working paper).
5) The replication is then sent on to two or three anonymous referees, none of whom is the original author. The referee reports are posted online, and readers may also post comments during this period.
6) After the referee reports are posted, the author may reply to the referees or even update the paper.
7) After the author replies, a committee of three then makes a decision (generally to publish as a full-fledged journal article with or without specific revisions, or to reject). Any further exchanges between the replicating and original authors are then appended to the published article.
Our complete guidelines for replicators can be found here. While the guidelines are mostly set, we are still seeking input on our procedures, and we will undoubtedly make changes as we gain more experience with replications.
PART II: How You Can Help
We are seeking input on the following items:
– In light of the guidelines for non-replication submissions, do you believe that the current guidelines for replication submissions are appropriate?
– If not, what concrete suggestions do you have for improvement?
– Do you believe that the 60-day embargo on the discussion paper (to wait for the original author’s reply) makes sense, or should the discussion paper be published as soon as a Co-Editor believes it has sufficient merit?
– Should an embargo be placed instead on publication of the journal article? That is, should publication as a journal article wait until the original author has a chance to reply to the final version of the replication study?
In addition, we are keen to hear any other ideas you have for improving the replication policy at Economics E-Journal.
To provide feedback, comment directly on this blog page, or email Claire Boeing-Reicher (Kiel Institute for the World Economy) at Claire.Reicher@ifw-kiel.de.
We look forward to hearing from you.
In a blog for Retraction Watch, LIZ WAGNER argues that it is good when authors provide data and code. But it’s not necessarily the most important thing. Registering a research protocol would do more to prevent data mining and p-hacking. And requiring the former may distract attention from the latter. To read more, click here.
The Economics E-Journal announces the launch of a dedicated replication section. This initiative is a joint effort of the Kiel Institute for the World Economy (IfW) and the German National Library for Economics (ZBW). It provides authors across all fields of economics an outlet for publishing replication studies.
This initiative is motivated by the difficulties that authors have had in submitting replication studies to other journals, and by the culture of secrecy within the profession around failed replications. We hope that our initiative can help begin to change these things.
The journal has several characteristics that may make it an attractive outlet for researchers looking to publish their replication studies. These characteristics are driven by a principle of openness.
This openness shows up in the journal's efforts to open the submission and refereeing processes to public view, to speed up those processes while maintaining high standards, and to make articles accessible to readers outside major academic institutions.
So far, this openness has met with success.
For instance, the journal currently has an impact factor of 0.644 (JCR Social Sciences Edition 2014), which places it between the Southern Economic Journal (IF = 0.683) and Applied Economics (IF = 0.613). Further, Economics E-Journal’s impact factor is likely to be biased downward since the journal is not yet ten years old.
The characteristics of the Economics E-Journal that make it an ideal outlet for replications are as follows.
First of all, the journal's electronic format means that there are no space constraints. The number of replications that can be published is limited only by the quantity and quality of the submitted replication studies.
Secondly, the management of the journal occurs alongside, but independently from, the IfW’s and ZBW’s other journals, and from other journals in the economics profession. This ensures a degree of independence not found in some other journals.
Thirdly, the journal is open access and open evaluation. Open access means that authors can reach a wide audience without running into a paywall. Open evaluation means that reviewers, commenters, and editors adjudicate papers in a fair, transparent, and rapid way.
Fourthly, the journal is a general interest journal, which means that it accepts submissions from all subfields of economics.
Guidelines for replicators can be found here. While the guidelines are mostly set, we are still seeking input on our procedures. In my next installment, I will be asking TRN readers for their comments and suggestions. Stay tuned!
-Claire Boeing-Reicher, Researcher, Kiel Institute for the World Economy.
(FROM THE ARTICLE “All at sea: Ideological divisions in economics undermine its value to the public”): “Sifting out the guff requires transparency, argued John Cochrane of the University of Chicago in another recent blog post. Too many academics keep their data and calculations secret, he reckoned, and too few journals make space for papers that seek to replicate earlier results. Economists can squabble all they like. But the profession is of little use to anyone if it cannot then work out which side has the better of the argument.”
A new initiative calls for journal reviewers to ask editors to request data and supporting code/documentation from authors before the reviewers agree to review a manuscript. From The Peer Reviewers Openness Initiative: “We suggest that … reviewers make open practices a pre-condition for more comprehensive review. This is already in reviewers’ power; to drive the change, all that is needed is for reviewers to collectively agree that the time for change has come.” To learn more about the initiative, click here. To read the supporting paper, click here. (H/T to Political Science Replication)
Project TIER (Teaching Integrity in Empirical Research) is one of the many initiatives launched within the last several years—a number of which have been featured in previous TRN guest blogs—that seek to strengthen standards of research transparency in the social sciences. Its mission statement reads:
Project TIER’s mission is to promote a systemic change in the professional norms related to the transparency and reproducibility of empirical research in the social sciences. It is guided by the principle that providing comprehensive replication documentation for research involving statistical data should be as ubiquitous and routine as it is to provide a list of references cited. Authors should view this documentation as an essential component of how they communicate their research to other scholars, and readers should not consider a study to be credible unless such documentation is available.
We will know this mission has been accomplished when failing to provide replication documentation for an empirical study is considered as aberrant as writing a theoretical paper that does not contain proofs of the propositions, an experimental paper that does not describe the treatment conditions, or a law review article that does not cite legal statutes or judicial precedents.
Project TIER’s approach to promoting research transparency is distinctive in two ways: it focuses on the education of social scientists early in their training, and it emphasizes the things that authors of research papers can do to ensure that interested readers are able to replicate their empirical results without undue difficulty. Both of these features reflect the circumstances that led to the conception of Project TIER and in which the initiative has evolved.
The ideas that eventually grew into Project TIER began taking shape in an introductory course on statistical methods for undergraduates majoring in economics at Haverford College. Richard Ball, an economics faculty member, was the instructor for the course, and Norm Medeiros, a librarian, collaborated closely in the advising of students conducting research projects required for the class. For those projects, students chose the topics they investigated, found statistical data that could shed light on the questions they were interested in investigating, examined and analyzed the data in simple ways, and then wrote complete papers in which they presented and interpreted their results.
When this research project was introduced as a requirement for the course in 2001, the initial results were not encouraging. The papers students turned in were, to put it mildly, less than completely transparent. Their descriptions were incoherent: of the original data they had used and the sources from which those data had been obtained, of how the original data were cleaned and processed to create the final data sets used for the analyses, and of how the figures and tables presented in the papers were generated from those final data sets. In most cases it was impossible to understand the empirical work underlying the papers or to evaluate it in a constructive way.
To address this problem, we began requiring students to turn in not only printed copies of their research papers but also electronic documentation consisting of their data, code, and some supporting information. We found, however, that developing a workable set of guidelines for the required documentation presented some challenges: the guidelines needed to be detailed and explicit enough that students would know unambiguously what was expected of them; general enough to be applicable across the varied types of data and analyses encountered in these projects; and short, simple, and clear enough that it would be realistic to expect students to understand and implement them. It took a number of iterations to formulate guidelines that met these challenges, but over the course of several semesters we arrived at a set of written instructions that proved adequate. In the past ten years or so it has become routine for our students to follow those instructions when constructing replication documentation to accompany their research papers.
Requiring students to turn in comprehensive replication documentation with their research papers has solved the problem that led us to introduce the requirement: if any aspect of the data processing or analysis is not explained adequately in the text of a paper, it is possible to discover exactly what the students did simply by reading and running their code. But a number of other benefits have followed as well. When students know that they have to document their statistical work, and are given some guidance for doing so, they themselves understand better what they are doing. And when they understand what they are doing, the analyses they choose to conduct tend to make more sense, and the explanations they give in their papers tend to be much more coherent. Moreover, throughout the semester in which students work on a project, their instructors can advise them much more effectively than would be possible if students did not keep their data organized and systematically record their work in command files.
Most fundamentally, placing upon students the responsibility to ensure that their work is reproducible (and teaching them some tools for achieving this goal) reinforces the principle that one should not make claims that cannot be verified or whose validity is in doubt; allowing students to turn in a paper based on work that they cannot reproduce undermines this principle. This principle applies broadly across academic disciplines, but it is particularly important to convey to beginning students of statistics, many of whom hold the prior belief that manipulation and obfuscation are inherent in statistical analysis.
After developing a simple and effective way of teaching students to document statistical research and observing the benefits that follow from doing so, we decided it would be worthwhile to share our experiences with others. We began in 2011 by presenting a paper at the first annual Conference on Teaching and Research in Economic Education (CTREE), organized by the American Economic Association Committee on Education, and that paper later appeared in the Journal of Economic Education.[1]
Positive responses to these and other early outreach efforts led us to launch Project TIER in 2013. The activities we have undertaken since then include:
– A series of workshops for social science faculty interested in introducing principles and methods of transparent research in classes they teach on quantitative methods. The next Faculty Development Workshop will take place April 1-2, 2016, on the Haverford College campus. These workshops are offered free of charge; information and applications are now available.
– A program of year-long fellowships, in which faculty who have already made significant contributions collaborate with us in developing and disseminating new curricula and approaches to teaching transparent research methods. We are currently working with five Fellows nominated for 2015-16, and have begun recruiting for the 2016-17 cohort of TIER Faculty Fellows, for which information and applications are also available.
– Workshops for doctoral students in the social sciences, offering practical guidance on research documentation and workflow management in the course of writing an empirical dissertation. We will be conducting a workshop for economics graduate students at Duke University on February 12, 2016. Thanks to a generous grant, we can offer these workshops free of charge, and we would be happy to consider requests from other graduate departments interested in hosting a workshop.
TO LEARN MORE
Please note that we are working on a completely redesigned website, which will have a new URL: www.projecttier.org. At the time this blog is being posted, this URL is not yet active, but it will launch in the spring of 2016.
Follow us on Twitter: @Project_TIER
[1] Ball, R. and N. Medeiros (2012). Teaching Integrity in Empirical Research: A Protocol for Documenting Data Analysis and Management. Journal of Economic Education, 43(2), 182–189.
In a blog posted at BITSS (Berkeley Initiative for Transparency in the Social Sciences), TRN co-founder Bob Reed explains the motivation behind TRN. While there is an increasing (better, slowly increasing!) number of journal outlets for replication studies, the real surprise has been how few economists submit replication research to journals. Reed quotes Joop Hartog’s experience at Labour Economics as an example of “they built it, but nobody came.” To read more, click here.
Lately, there has been a lot of attention to the excess of false-positive and exaggerated findings in the published scientific literature. Many different fields report an impossibly high rate of statistically significant findings, and studies of meta-analyses in various fields have shown overwhelming evidence of overestimated effect sizes.
The suggested solution for this excess of false-positive findings and exaggerated effect size estimates in the literature is replication. The idea is that if we just keep replicating published studies, the truth will come to light eventually.
This intuition also showed up in a small survey I conducted among psychology students, social scientists, and quantitative psychologists. I offered them different hypothetical combinations of large and small published studies that were identical except for sample size – they could be considered replications of each other. I asked how they would evaluate this information if their goal was to obtain the most accurate estimate of a certain effect. In almost all of the situations I offered, the answer was almost unanimous: combine the information from both studies.
This makes a lot of sense: the more information the better, right? Unfortunately this is not necessarily the case.
The problem is that the respondents forgot to take into account the influence of publication bias: statistically significant results have a higher probability of being published than non-significant results. And only publishing significant effects leads to overestimated effect sizes in the literature.
But wasn’t this exactly the reason to take replication studies into account? To solve this problem and obtain more accurate effect sizes?
Unfortunately, there is evidence from multi-study papers and meta-analyses that replication studies suffer from the same publication bias as original studies (see below for references). This means that both types of studies in the literature contain overestimated effect sizes.
The implication of this is that combining the results of an original study with those of a replication study could actually worsen the effect size estimate. This works as follows.
Bias in published effect size estimates depends on two factors: publication bias and power (the probability that you will reject the null hypothesis, given that it is false). Studies with low power (usually due to a small sample size) contain a lot of noise, and the effect size estimate will be all over the place, ranging from severe underestimations to severe overestimations.
This in itself is not necessarily a problem; if you took the average of all these estimates (e.g., in a meta-analysis), you would end up with an accurate estimate of the effect. However, if because of publication bias only the significant studies are published, only the severe overestimations of the effect end up in the literature. If you then calculate an average effect size based on these estimates, you end up with an overestimate.
Studies with high power do not have this problem. Their effect size estimates are much more precise: they will be centered more closely on the true effect size. Even when there is publication bias, and only the significant (maybe slightly overestimated) effects are published, the distortion would not be as large as with underpowered, noisier studies.
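To make this concrete, here is a minimal simulation sketch (my own illustration, not taken from the paper; the true effect size, sample sizes, and significance threshold are all assumptions). It draws many two-group studies of the same true effect, "publishes" only the significant ones, and compares the mean published effect size under low versus high power.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d = 0.2          # assumed true standardized effect size
n_studies = 10_000    # number of simulated studies per condition

def mean_published_d(n_per_group):
    """Mean observed Cohen's d among studies reaching p < .05,
    plus the share of studies that do so (empirical power)."""
    ds, sig = [], []
    for _ in range(n_studies):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_d, 1.0, n_per_group)
        _, p = stats.ttest_ind(treatment, control)
        pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
        ds.append((treatment.mean() - control.mean()) / pooled_sd)
        sig.append(p < 0.05)
    ds, sig = np.array(ds), np.array(sig)
    return ds[sig].mean(), sig.mean()

for n in (25, 250):
    mean_d, power = mean_published_d(n)
    print(f"n = {n:3d} per group: power ~ {power:.2f}, "
          f"mean published d ~ {mean_d:.2f} (true d = {true_d})")

# Typical output under these assumptions: with n = 25 per group, power is
# around 0.10 and the published estimates average roughly three times the
# true effect; with n = 250 per group, power is around 0.60 and the published
# estimates are only mildly inflated.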
Now consider again a replication scenario such as the one mentioned above. In the literature you come across a large original study and a smaller replication study. Assuming that both studies are affected by publication bias, the original study will probably report a somewhat overestimated effect size. However, since the replication study is smaller and has lower power, its effect size will be even more overestimated. Combining the information from these two studies then basically comes down to adding bias to the effect size estimate of the original study. In this scenario, you would obtain a more accurate estimate of the effect by evaluating only the original study and ignoring the replication study.
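A rough sketch of this scenario, under the same illustrative assumptions as above (not the authors' code; a "published" study is simply redrawn until it is significant): combining a large original study with a smaller replication via a sample-size-weighted average yields a more inflated estimate, on average, than the original study alone.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d = 0.2  # assumed true standardized effect size

def one_published_d(n_per_group):
    """Keep drawing studies until one is significant (publication bias),
    then return its observed Cohen's d."""
    while True:
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_d, 1.0, n_per_group)
        _, p = stats.ttest_ind(treatment, control)
        if p < 0.05:
            pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
            return (treatment.mean() - control.mean()) / pooled_sd

n_orig, n_rep = 100, 25          # assumed sample sizes per group
orig_only, combined = [], []
for _ in range(2_000):
    d_orig = one_published_d(n_orig)          # large original study
    d_rep = one_published_d(n_rep)            # small replication
    orig_only.append(d_orig)
    combined.append((n_orig * d_orig + n_rep * d_rep) / (n_orig + n_rep))

print(f"true d:                        {true_d:.2f}")
print(f"original study only:           {np.mean(orig_only):.2f}")   # somewhat inflated
print(f"original + small replication:  {np.mean(combined):.2f}")    # inflated further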
In short: even though a replication increases the precision of the effect size estimate (a smaller confidence interval around it), it adds bias if its sample size is smaller than that of the original study, provided there is publication bias and power is not high enough.
There are two main solutions to the problem of overestimated effect sizes.
The first solution would be to eliminate publication bias; if there is no selective publishing of significant effects, the whole “replication paradox” would disappear. One way to eliminate publication bias is to preregister your research plan and hypotheses before collecting the data. Some journals will even review this preregistration, and can give you an “in principle acceptance” – completely independent of the results. In this case, studies with significant and non-significant findings have an equal probability of being published, and published effect sizes will not be systematically overestimated. Another way is for journals to commit to publishing replication results independent of whether the results are significant. Indeed, this is the stated replication policy of some journals already.
The second solution is to evaluate (and perform) only studies with high power. If a study has high power, the effect size will be estimated more precisely and will be less affected by publication bias. Roughly speaking: if you discard all studies with low power, your effect size estimate will be more accurate.
A good example of an initiative that implements both solutions is the recently published Reproducibility Project, in which 100 psychological effects were subjected to preregistered, highly powered replications. Initiatives such as this one eliminate systematic bias in the literature and advance the scientific system immensely.
However, until preregistered, highly powered replications are the new standard, researchers who want to play it safe should change their intuition from “the more information, the higher the accuracy” to “the more power, the higher the accuracy.”
This blog is based on the paper “The replication paradox: Combining studies can decrease accuracy of effect size estimates” by Nuijten, van Assen, Veldkamp, and Wicherts (2015), Review of General Psychology, 19(2), 172-182.
LITERATURE ON HOW REPLICATIONS SUFFER FROM PUBLICATION BIAS:
Francis, G. (2012). Publication bias and the failure of replication in experimental psychology. Psychonomic Bulletin & Review, 19(6), 975-991.
Ferguson, C. J., & Brannick, M. T. (2012). Publication bias in psychological science: Prevalence, methods for identifying and controlling, and implications for the use of meta-analyses. Psychological Methods, 17, 120-128.