[From the blog post, “What Is Preregistration For?” by Neuroskeptic, published at Discover Magazine]
“The paper reports on five studies which all address the same general question. Of these, Study #3 was preregistered and the authors write that it was performed after the other four had been completed. It was also larger than the others. The results of Study #3 closely matched the other studies’. So far, so good.”
“However, according to Daniël Lakens on Twitter (I’m not sure how he knows this), Study #3 was conducted on the instruction of the editors (during peer review).”
“Now, this is where alarm bells started ringing for me. If Psychological Science asked the authors of this paper to carry out Study #3, the reason, presumably, is that they weren’t fully convinced by the other studies. The journal wanted more evidence for the hypothesis that ‘participants in a dimly lit room or wearing sunglasses tended to estimate a lower risk of catching contagious diseases.’ That’s understandable, but what would the editors have done if the results of Study #3 had come back negative?”
[From the blog post, “Scientists Rarely Admit Mistakes. A New Project Wants to Change That.” by Dalmeet Singh Chawla, published at Undark]
“IN SEPTEMBER 2016, the psychologist Dana Carney came forward with a confession: She no longer believed the findings of a high-profile study she co-authored in 2010 to be true. The study was about “power-posing” — a theory suggesting that powerful stances can psychologically and physiologically help one in high-pressure situations. Carney’s co-author, Amy Cuddy, a psychologist at Harvard University, had earned much fame from power poses, and her 2012 TED talk on the topic is the second most watched talk of all time.”
“Carney, now based at the University of California, Berkeley, had, however, changed her mind. “I do not believe that ‘power pose’ effects are real,” she wrote on her website in 2016.”
“…Carney’s change of heart and the pointed questions surrounding power pose research typify the replication crisis that has dogged fields like social psychology for years. Some researchers have gone so far as to suggest that most published research findings are false.”
“Of course, some researchers have argued that the replication crisis is exaggerated. But even if that is the case, there really is no effective way for scientists to quickly and publicly inform colleagues that they are no longer confident in their published work. Public declarations like Carney’s are one way to go, but they are often difficult to track. So an ambitious new effort, motivated by Carney’s move, is encouraging psychologists to own up to shortcomings in their published work via a website in the form of official loss-of-confidence statements — published at a single online clearinghouse for such confessions called the Loss of Confidence Project.”
[From the blog post, “HARPing: Hedging After a Replication is Proposed” by Rich Lucas at The Desk Reject]
“Now, we are all sensitized to the fact that you’re not supposed to “HARK”—it is problematic to hypothesize after results are known (Kerr 1998). Once we know how things have turned out, it is easy to come up with a post hoc explanation as to why it happened that way. … a closely related problem [is] … the tendency to HARP, or to Hedge After a Replication is Proposed. Once a study has been selected for replication, original authors often suddenly develop skepticism about the importance or quality of the particular study that has been chosen for replication.”
“Why does this matter? … specific study features that were plenty good enough to get these studies published the first time around suddenly become problematic when incorporated into a replication study. For example, my colleagues and I had a replication study rejected from the same journal in which the original study had been published, after the original author reviewed our paper and criticized a critical design feature that was included in the original study! On another occasion, we had a lengthy e-mail discussion with an author about how to replicate one of his previous studies. Although he was more than willing to tell us the specific ways our replication attempt could go wrong, he was never willing to say how we could get it right. In short, he was hedging so strongly about the original study that one could never challenge the original result. This is one of the reasons why I don’t think we should insist that replicators work with original authors when designing replication studies.”
[From the blog post, “Can We Science Our Way out of the Reproducibility Crisis?” by Hilda Bastian at PLOS Blogs]
“Many studies are so thin on details, they’re unverifiable, unusable, or both. Many are too small, badly designed, or otherwise unreliable – and then the results are misinterpreted, the validity exaggerated. Many aren’t published, especially if the results aren’t favorable. It’s the scale of these problems, compounding over the years, that constitutes a reproducibility crisis.”
“Weak science, harmful policies, and counterproductive work practices burrowed us into this hole. It’s all fueled by unexamined assumptions, cherry-picked data, and anecdote-driven beliefs – and even the way we discuss and try to tackle non-reproducibility can be like that. It’s the opposite of the way scientists are meant to approach the world.”
“We need to science our way through this – not just with more rigorous methods in research and reporting it, but with evidence-based policies and evaluation as well.”
It is common for authors of empirical studies to use the conclusion of their paper to summarize their empirical findings without explicitly discussing why their results might differ from those of previous studies or suggesting ways to resolve observed discrepancies. In a recent blog post, Arnaud Vaganay suggests the following; while it is targeted at replications, it applies to any empirical study that addresses a topic covered by previous literature.
“A reproducible discussion includes two main steps. The first step is the systematic comparison of your results with results from previous studies (as mentioned above). Insofar as possible, results should be compared head-to-head using both: (i) Unstandardized values: by comparing the direction and statistical significance of your results with the same quantities in previous studies; (ii) Standardized values: by comparing the magnitude/size of your effect with the magnitude/size of effects in previous studies. Ideally, an additional test should assess whether the difference between these effects is statistically significant.”
“If your results cannot be directly compared (for example because your study analysed the data in a novel way), you should clearly mention it and invite further replications. As previously mentioned, it is through replication that the credibility of a theory can be ascertained.”
“The second step consists in correctly interpreting findings. If your results are in line with previous results, the effect is robust and the theory is corroborated (assuming no p-hacking of course). If the results are significantly different, the plausibility of the following scenarios should be discussed: (i) Your study differs significantly in terms of analysis …; (ii) Your study differs significantly in terms of intervention/independent variable … ; (iii) Your study differs significantly in terms of sample; (iv) Your study differs significantly in terms of social, cultural or institutional context.”
“Ideally: (i) These hypotheses should be tested in subsequent studies; (ii) These two steps should be pre-registered and any change to the original protocol flagged and justified.”
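On the computational side, Vaganay’s suggested test of whether two effects differ is simple to implement. Below is a minimal sketch in Python, assuming each study reports an effect estimate and its standard error and that the two estimates are independent; the function name and numbers are illustrative, not taken from his post.

```python
# A minimal sketch of the head-to-head comparison step described above,
# assuming each study reports an effect estimate and a standard error and
# that the two estimates are independent. Names and numbers are invented.
from math import sqrt, erfc

def compare_effects(b_rep, se_rep, b_orig, se_orig):
    """Z-test for the difference between two independent effect estimates."""
    diff = b_rep - b_orig
    se_diff = sqrt(se_rep ** 2 + se_orig ** 2)
    z = diff / se_diff
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value under normality
    return diff, z, p

# Hypothetical example: replication b = 0.10 (SE 0.04), original b = 0.25 (SE 0.06)
diff, z, p = compare_effects(0.10, 0.04, 0.25, 0.06)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Comparing direction and significance (step i) and magnitudes (step ii) then amounts to reporting the two signs, the two standardized effects, and this difference test side by side.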
[From the opinion article, “Undergrads Can Improve Psychology” by Russell Warne and Jordan Wagge, published at http://www.wsj.com]
“A lot of what we think we know about human psychology is bunk. That’s because experimental psychology has a “replication crisis”: Too many studies, when repeated, fail to produce the same results.”
“Here’s a solution: Enlist students to perform replications as part of scientific training. Almost every undergraduate and graduate student studying psychology must take a course in research methods. They can learn by attempting to replicate earlier studies.”
“The Collaborative Replications and Education Project follows this model. CREP is a crowdsourcing project in which highly cited studies in psychology are selected and posted online for teams of undergraduates to try to replicate. Each step of the process is reviewed by experts, and results from multiple sites are pooled and published as a combined study called a meta-analysis.”
“Students respond positively to performing replications, which is not surprising—studies have shown that research projects positively influence student engagement and retention. What makes CREP unique is its adherence to the principles of good science (such as publicly documenting all aspects of the research process) and the important service to the field that each direct replication provides. Future scholars learn how to do things the right way early, and the results from their projects serve a public purpose beyond pedagogy.”
To read more, click here (NOTE: this article is behind a paywall).
FYI: Readers interested in the meta-analysis aspect of this article may also find the website Curate Science of interest.
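For readers wondering what the pooling step in CREP-style projects looks like mechanically, the sketch below shows a generic fixed-effect (inverse-variance) meta-analysis, assuming each replication site reports an effect estimate and a standard error. It illustrates the arithmetic only; it is not CREP’s actual pipeline, and the numbers are invented.

```python
# A generic fixed-effect (inverse-variance) pooling sketch, assuming each
# replication site reports an effect estimate and a standard error.
# This shows the arithmetic only; it is not CREP's actual pipeline.
from math import sqrt

def pool_fixed_effect(estimates, ses):
    """Inverse-variance weighted average of per-site effect estimates."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

site_estimates = [0.21, 0.05, 0.13]  # hypothetical per-site effect estimates
site_ses = [0.10, 0.08, 0.12]        # hypothetical per-site standard errors
pooled, pooled_se = pool_fixed_effect(site_estimates, site_ses)
print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f})")
```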
[From the article “Randomly auditing research labs could be an affordable way to improve research quality: A simulation study” by Adrian Barnett, Pauline Zardo, and Nicholas Graves, published at PLoS One]
“The “publish or perish” incentive drives many researchers to increase the quantity of their papers at the cost of quality. Lowering quality increases the number of false positive errors which is a key cause of the reproducibility crisis. We adapted a previously published simulation of the research world where labs that produce many papers are more likely to have “child” labs that inherit their characteristics. This selection creates a competitive spiral that favours quantity over quality. To try to halt the competitive spiral we added random audits that could detect and remove labs with a high proportion of false positives, and also improved the behaviour of “child” and “parent” labs who increased their effort and so lowered their probability of making a false positive error. Without auditing, only 0.2% of simulations did not experience the competitive spiral, defined by a convergence to the highest possible false positive probability. Auditing 1.35% of papers avoided the competitive spiral in 71% of simulations, and auditing 1.94% of papers in 95% of simulations. … Audits improved the literature by reducing the number of false positives from 30.2 per 100 papers to 12.3 per 100 papers.”
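To make the “competitive spiral” concrete, here is a drastically simplified toy in the spirit of the paper’s simulation. It is not the authors’ model, which also includes lab effort, inheritance, and detection rules omitted here, and every parameter below is arbitrary.

```python
# A drastically simplified toy of the competitive-spiral dynamic described
# above. NOT the authors' simulation; parameters and rules are invented.
import random

random.seed(1)
N_LABS, GENERATIONS, AUDIT_PROB = 100, 500, 0.2
# Each lab has a probability of publishing a false positive.
labs = [random.uniform(0.05, 0.5) for _ in range(N_LABS)]

for _ in range(GENERATIONS):
    # Selection favours quantity: among a random handful of labs, the one
    # with the highest false-positive probability (whose papers are cheapest
    # to produce) spawns a "child" lab that inherits its rate, displacing a
    # randomly chosen incumbent.
    sample = random.sample(range(N_LABS), 5)
    parent = max(sample, key=lambda i: labs[i])
    child_rate = min(0.5, max(0.0, labs[parent] + random.gauss(0, 0.01)))
    labs[random.randrange(N_LABS)] = child_rate
    # A random audit occasionally detects the worst offender and replaces it
    # with a new, more careful lab.
    if random.random() < AUDIT_PROB:
        worst = max(range(N_LABS), key=lambda i: labs[i])
        labs[worst] = random.uniform(0.05, 0.1)

print(f"mean false-positive rate after {GENERATIONS} generations: "
      f"{sum(labs) / N_LABS:.2f}")
```

Setting AUDIT_PROB to 0 recovers the spiral flavour: the population drifts toward the maximum false-positive rate, which is the convergence the authors report in 99.8% of their unaudited simulations.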
[From the “2018 Economics Replication Project” posted by Nick Huntington-Klein and Andy Gill of California State University, Fullerton]
“In this project, we are asking recruited researchers to perform a “blind” replication of one of two studies. Without telling researchers the methods used by the original study, we will instruct participants to use a particular data set and set of statistical assumptions in order to estimate a single specific causal estimate. Participants will clean the data, construct variables, and make the other minor decisions that go into a statistical analysis, aside from the data source, identifying assumption, and effect of interest, which will be held constant. By comparing the analyses that different researchers perform under these conditions, we will estimate the variability in estimates that occurs as a result of decisions that researchers make.”
“This approach is different from most replication studies in economics. We are not trying to test the validity of the original results. Instead, our aim is to measure the degree of variation in results that can be attributed to generally “invisible” features of analysis. You may have seen similar tests elsewhere, such as in the New York Times’ The Upshot section. Our project is most similar to the “Crowdsourced Data Analysis” project described by Raphael Silberzahn and Eric Uhlmann here, although our goal is slightly different.”
“If you are interested in joining us, we are looking for researchers who have at least one published or forthcoming paper in the empirical microeconomics literature and who are familiar with methods of causal identification. Participants will be offered authorship on the final publication. We are also currently working on securing funding. If we secure it, there may be financial compensation for your time.”
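The project’s quantity of interest is the dispersion of estimates across analysts who all start from the same data and identifying assumption. A minimal sketch of that summary, with invented numbers:

```python
# A minimal sketch of the project's quantity of interest: the spread of
# point estimates across analysts given identical data and identifying
# assumptions. The estimates below are invented for illustration.
from statistics import mean, stdev

analyst_estimates = [0.18, 0.22, 0.15, 0.31, 0.20, 0.19, 0.25]
print(f"mean estimate: {mean(analyst_estimates):.3f}")
print(f"across-analyst SD: {stdev(analyst_estimates):.3f}")
print(f"range: {min(analyst_estimates):.2f} to {max(analyst_estimates):.2f}")
```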
The Journal of Development Economics (JDE) is piloting a new approach in which authors have the opportunity to submit empirical research designs for review and approval before the results of the study are known. While the JDE is the first journal in economics to implement this approach—referred to as “pre-results review”—it joins over 100 other journals from across the sciences.
What is Pre-Results Review?
Pre-results review splits the peer review process into two stages. In Stage 1, authors submit a plan for a prospective research project, typically including a literature review, research question(s), hypotheses, and a detailed methodological framework. This submission is evaluated based on the significance of the research question(s), the soundness of the theoretical reasoning, and the credibility and feasibility of the research design.
Positively evaluated submissions are accepted based on pre-results review. This constitutes a commitment by the journal to publish the full paper, regardless of the nature of the empirical results. Authors then collect and analyze their data and submit the full paper for review and publication (Stage 2). The Stage 2 review provides quality assurance and ensures alignment with the research design peer-reviewed in Stage 1.
Why Pre-Results Review?
In development economics, we have long argued for the use of rigorous evidence to inform decisions about public policies. However, incentives in academia and journal publishing often reward studies featuring novel, theoretically tidy, or statistically significant results. Papers that fail to report such findings often go unpublished, even if the studies are of high quality and address important questions. As a result, we are left with an evidence base composed of papers that tell ‘neat’ and clean stories but may not accurately represent the world. When such research serves as the foundation for public policies, this publication bias can be costly.
In recent years, pre-results review has emerged as a potential alternative model to address publication bias. We hope that this pilot will help us understand the effectiveness of this approach and its sustainability for both the JDE and other social science journals.
What’s in It For You?
– An earlier publication decision in the peer review process;
– Constructive feedback from peer reviewers earlier in the publishing process, with the potential for helpful suggestions on research design before data collection begins;
– Editorial decisions that are not influenced by the results of a study;
– Inclusion of JDE “acceptance based on pre-results review” on authors’ CVs; and
– The chance to be part of an exciting pilot effort in economics!
How to Submit
Submissions should be filed as ‘Registered Reports’ on the JDE’s regular submissions portal.
All submissions in this format will follow existing JDE policies, including the Mandatory Replication Policy. For guidelines specific to pre-results review, please see the JDE Registered Reports Author Guidelines.
Need Help?
The Berkeley Initiative for Transparency in the Social Sciences (BITSS) supports authors with pre-registering their research designs and preparing JDE submissions. Please contact Aleks Bogdanoski at abogdanoski@berkeley.edu with any questions.
Established by the Center for Effective Global Action (CEGA) in 2012, the Berkeley Initiative for Transparency in the Social Sciences (BITSS) works to strengthen the integrity of social science research and evidence used for policy-making. The initiative aims to enhance the practices of economists, psychologists, political scientists, and other social scientists in ways that promote research transparency, reproducibility, and openness. Visit www.bitss.org and @UCBITSS on Twitter to learn more, find useful tools and resources, and contribute to the discussion.
[From the article, “One team’s struggle to publish a replication attempt, part 3” by Mante Nieuwland, published at Retraction Watch]
“The purpose of this post was to provide a transparent, behind-the-scenes account of our replication study and what happened when we submitted our study to Nature Neuroscience. On the one hand, I can understand why Nature journals might be hesitant to publish replication studies. It might open the floodgates to a wave of submissions that challenge conclusions from publications in their journal (although that in itself is not necessarily a bad thing).”
“On the other hand, a few things from this case study stand out by clearly contradicting Nature’s commitment to replication and transparency. Nature Neuroscience triaged our study for lack of general interest, failed to follow their own submission procedure in terms of timeline, failed to follow their own policy on data and materials sharing, failed to correct important omissions in the academic record of the original study, and failed to provide, in my opinion, a fair review process (i.e. by relying on one reviewer who faulted us for the lack of clarity due to the original paper, and on one non-expert reviewer who mostly just questioned our intentions and disagreed with the publication format).”
To read the full account, starting from the beginning, click here.
For economics journals that explicitly state they publish replications, click here.
To see a list of replication studies that have actually been published by economics journals, click here.