Making Replications Mainstream: It Takes a Research Community

Just in case you missed it, the latest issue of Behavioral and Brain Sciences includes an article by Rolf Zwaan, Alexander Etz, Richard Lucas, and Brent Donnellan entitled “Making Replications Mainstream”. It is something of a tour-de-force by four prominent scholars in the area of psychology and cognitive science. The article is organized around six “concerns”, each a common criticism of replication to which the authors then respond.
The six concerns are:
Concern I: Context is too variable
Concern II: The theoretical value of direct replications is limited
Concern III: Direct replications are not feasible in certain domains
Concern IV: Replications are a distraction
Concern V: Replications affect reputations
Concern VI: There is no standard method to evaluate replication results
Together, the discussion of the respective concerns, along with the authors’ responses, provides a thorough introduction to the current state of play of replications in psychology/cognitive science.
But wait! There’s more!
Following the main article are thirty-six (36!) commentaries by a constellation of academic stars all weighing in on various aspects of the appropriate place of replication in psychological science. Many of the names will be familiar to even the most distant observers of the replication debate: Daniel Kahneman, John Ioannidis, Andrew Gelman, and many, many others. How many others? Eighty-one!
Counting the original four authors, this collection of main article and commentaries consists of the combined efforts of 85 scholars. Thus, in both quality and quantity, the discussion testifies to the serious attention that replication is receiving in psychology and cognitive science.
Economists should blush with shame given their meagre efforts in comparison.
To read the article and corresponding commentaries, click here (but note that the article is behind a paywall).

IN THE NEWS: Science (July 31, 2018)

[From the article, “Plan to replicate 50 high-impact cancer papers shrinks to just 18” by Jocelyn Kaiser, published in Science magazine]
“An ambitious project that set out nearly 5 years ago to replicate experiments from 50 high-impact cancer biology papers, but gradually shrank that number, now expects to complete just 18 studies.”
“‘I wish we could have done more,’ says biologist Tim Errington, who runs the project from the Center for Open Science in Charlottesville, Virginia. But, he adds, ‘There is an element of not truly understanding how challenging it is until you do a project like this.'”
To read more, click here.

How to Get Something from Nothing. Or, “Yes, Virginia, You Can Do Ex-post Power Analyses”.

[From the article, “The effect of the conservation reserve program on rural economies: Deriving a statistical verdict from a null finding” by Jason Brown, Dayton Lambert, and Timothy Wojan, recently published in the American Journal of Agricultural Economics]
“This article suggests two methods for deriving a statistical verdict from a null finding, allowing economists to more confidently conclude when “not significant” can be interpreted as “no substantive effect.” The example used to demonstrate the methods is the Economic Research Service’s (ERS) 2004 Report to Congress on the economic implications of the Conservation Reserve Program (CRP).”
“…The ERS study … supported a cautious conclusion of “no evidence of negative employment impacts from CRP.” However, the report correctly noted that the “absence of evidence is not evidence of absence.” Rather, the statistical power of the test was unknown.”
“… we develop approaches to generate ex post statistical power estimates to supplement the interpretation of nonsignificant findings. The first approach uses a bootstrap resampling-with-replacement procedure. The second approach is Bayesian, estimating power based on posterior marginal distributions of posited effect sizes.”
“…In many circumstances, economists do not have the opportunity to conduct ex ante power analysis before research starts. The approaches we suggest can be used to determine ex post power for univariate analyses or multivariate regressions if the data-generating process can be replicated and if the effect size of economic significance or policy relevance is stated.”
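For readers who want to see the mechanics of the first approach, here is a minimal sketch in Python of a bootstrap ex post power calculation. It is an illustration under stated assumptions, not the authors’ code: it assumes a simple linear regression, a posited effect size delta that the researcher deems economically significant, and a 5% test; the function name ex_post_power and all parameters are hypothetical.

```python
# A minimal sketch of a bootstrap ex post power calculation -- an
# illustration only, not the authors' code. The simple linear model,
# the function name, and the parameters (delta, alpha, B) are all
# assumptions made for this example.
import numpy as np
import statsmodels.api as sm

def ex_post_power(X, y, coef, delta, alpha=0.05, B=2000, seed=42):
    """Estimate the power to detect an effect of size `delta` on
    regressor `coef`, given this sample's design and residual noise."""
    rng = np.random.default_rng(seed)
    X = sm.add_constant(np.asarray(X, dtype=float))  # column 0 = constant
    y = np.asarray(y, dtype=float)
    fit = sm.OLS(y, X).fit()
    beta = fit.params.copy()
    beta[coef] = delta  # impose the effect size of economic significance
    rejections = 0
    for _ in range(B):
        # Replicate the DGP: fitted systematic part with the posited
        # effect, plus residuals resampled with replacement.
        y_star = X @ beta + rng.choice(fit.resid, size=len(y), replace=True)
        rejections += sm.OLS(y_star, X).fit().pvalues[coef] < alpha
    return rejections / B

# Example: a null finding on the second regressor. How much power did
# the test have to detect a one-unit effect, had one been present?
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 0.3 * X[:, 0] + rng.normal(size=200)
print(ex_post_power(X, y, coef=2, delta=1.0))
```

The key design choice, following the quoted description, is that the data-generating process is replicated by imposing the posited effect size on the fitted model; the reported number is simply the share of bootstrap samples in which the test rejects.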
To read the full article, click here (but note that it is behind a paywall).

The Truth is Out There. It’s Just Not Very Likely.

[From the paper, “Perceived Crisis and Reforms: Issues, Explanations, and Remedies”, authored by Paul De Boeck and Minjeong Jeon, published in the July issue of Psychological Bulletin]
“…we believe that the OSC [Open Science Collaboration] study allows us to obtain an indication of the true discovery rate (TDR) for the rejections in the original studies and of π0 [the probability of no effect] more in general if the effects investigated in the OSC study are representative.”
“…a cautious conclusion is that a substantial proportion of positive binary inferences are true (35.7% or more) and that another substantial proportion includes either zero or very small effects.”
“…Based on the OSC study the probability of the null hypothesis being true can in fact be estimated if again a number of assumptions are made. Based on the replication studies the TDR for the original studies turned out to be .357 when assuming that the replication study power is .92, while the TDR is .691 when assuming that the replication study power is .50. We now can tentatively estimate π0 for the broader set of studies from which the studies selected for the OSC replications are a representative subset.”
“…Starting from a TDR of .357, …, for a power value of .20 [of the original studies], π0 = .878, and for a power value of .50, π0 = .947.”
“…Starting from a TDR of .691, …, for a power value of .20 of the original studies, π0 = .641, and for a power value of .50, π0 = .817. For higher power levels even higher estimates of π0 are obtained.”
“One cannot expect that significant effects (given that most published effects are significant) replicate well if they relate to unlikely hypotheses, and neither can one expect that they replicate well if they depend on perhaps subtle and not so evident differences in the context and when they depend on a complex interplay of factors.”
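A brief reconstruction may help readers follow the arithmetic. The quoted figures are consistent with a standard application of Bayes’ rule at significance level α = .05; this is our reading of the calculation, and the paper’s exact assumptions may differ. If R is the fraction of replications that are significant and p_rep is the power of the replication studies, then

$$\mathrm{TDR} \;\approx\; \frac{R - \alpha}{p_{\mathrm{rep}} - \alpha}, \qquad \pi_0 \;=\; \frac{p\,(1 - \mathrm{TDR})}{p\,(1 - \mathrm{TDR}) + \alpha\,\mathrm{TDR}},$$

where p is the power of the original studies. With the OSC replication rate R ≈ .36 and p_rep = .92, the first formula gives TDR ≈ (.36 − .05)/(.92 − .05) ≈ .357; plugging TDR = .357, p = .20, and α = .05 into the second gives π0 ≈ (.20 × .643)/(.20 × .643 + .05 × .357) ≈ .878, matching the figures quoted above.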
To access the full article, click here (but note the article is behind a paywall).

REED: How “Open Science” Can Discourage Good Science, And What Journals Can Do About It

In a recent Twitter thread, Kaitlyn Werner shares her experience of having a paper rejected after she submitted it to a journal along with all of her data and code. The journal rejected the paper because a reviewer looked over the data and had “a hunch” that there was a mistake.
Werner states that she was just about to change her stance on open science when, after several checks of her data and code, she realized the reviewer was right. There was a mistake in the coding of the data.
The lesson the author learned from this experience?
“Fortunately, I think this error will actually make my paper a lot stronger. And as upset that I am about the 3 months of review that are now lost, I am happy to know that you didn’t publish a misleading paper. And from now on, I will always share my data.”
To read her full set of tweets, click here.
But there is another lesson here. If papers with data and code are more likely to be rejected (because they give reviewers more things to find fault with), then they face a higher bar for publication. If one believes that making data and code public makes researchers more careful, so that the associated research is higher quality and more likely to be “true”, then “open science” will effectively discriminate against higher-quality research and tilt the playing field towards lower-quality research.
In this particular case, the journal’s actions were not compatible with good science.
If journals don’t want to discourage good science, and if some papers are submitted with data and code while others are not, then at the very least journals should create a level playing field. Papers with data and code should not face a higher threshold of acceptance than papers without them.
One way to do that is to inform reviewers that they should never reject a paper solely on the basis of its data and code. If a reviewer finds a mistake but the rest of the paper seems publishable, the journal should allow the author to resubmit the research with corrected data and code.
Further, if journals wanted to tilt the playing field in favor of good science, they could build in a higher probability of acceptance for papers that supplied data and code. This is a reasonable policy for a journal to follow if one believes that these papers will tend to be higher quality: Researchers who make their data and code transparent know that they run a higher risk of having their mistakes uncovered. As a result, they will go to extra lengths to make sure their research is mistake-free and “true”.
Kaitlyn Werner is a noble scientist who cares about truth more than getting a publication in a prestigious journal. The lesson she drew from her experience made her more committed to open science.
However, if open science is to lead to better science, journals are going to have to figure out how to avoid penalizing open science practices.
Bob Reed is Professor of Economics at the University of Canterbury in New Zealand and co-founder of The Replication Network.  He can be contacted at bob.reed@canterbury.ac.nz.

MENCLOVA: SURE Journal Is Now Open For Submissions!

Is the topic of your paper interesting, your data appropriate and your analysis carefully done – but your results are not “sexy”? If so, please consider submitting your paper to the Series of Unsurprising Results in Economics. SURE is an e-journal of high-quality research with “unsurprising”/confirmatory findings.
This is how it works:
– We accept papers from all fields of Economics…
– Which have been rejected at a journal indexed in EconLit…
– With the ONLY important reason being that their results are statistically insignificant or otherwise “unsurprising”.
SURE is an open-access journal and there are no submission charges.
SURE benefits readers by:
– Mitigating the publication bias and thus complementing other journals in an effort to provide a complete account of the state of affairs;
– Serving as a repository of potential (and tentative) “dead ends” in Economics research.
SURE benefits writers by:
– Providing an outlet for interesting, high-quality, but “risky” (in terms of uncertain results) research projects;
– Decreasing incentives to data-mine, change theories and hypotheses ex post or exclusively focus on provocative topics.
We hope you will consider SURE as an outlet for your work and look forward to hearing from you!
To learn more about SURE, click here.
SURE Editorial board
Karen S. Conway (University of New Hampshire), Hope Corman (Rider University), John Gibson (University of Waikato), David Giles (University of Victoria), John Landon-Lane (Rutgers University), Nicholas Mangee (Georgia Southern University), Andrea K. Menclova (University of Canterbury), W. Robert Reed (University of Canterbury), Steven Stillman (Free University of Bozen-Bolzano), Edinaldo Tebaldi (World Bank), Robert S. Woodward (University of New Hampshire)

IN THE NEWS: NY Times (July 16, 2018)

[From the article “Psychology Itself Is Under Scrutiny” by Benedict Carey, published in the NY Times]
“The urge to pull down statues extends well beyond the public squares of nations in turmoil. Lately it has been stirring the air in some corners of science, particularly psychology.”
“…since 2011, the psychology field has been giving itself an intensive background check, redoing more than 100 well-known studies. Often the original results cannot be reproduced, and the entire contentious process has been colored, inevitably, by generational change and charges of patriarchy.”
“Still, the study of human behavior will never be as clean as physics or cardiology — how could it be? — and psychology’s elaborate simulations are just that. At the same time, its findings are far more accessible and personally relevant to the public than those in most other scientific fields.”
“Psychology has millions of amateur theorists who test the findings against their own experience. The public’s judgments matter to the field, too.”
“It is one thing to frisk the studies appearing almost daily in journals that form the current back-and-forth of behavior research. It is somewhat different to call out experiments that became classics — and world-famous outside of psychology — because they dramatized something people recognized in themselves and in others.”
To read more, click here.

IN THE NEWS: The Guardian (July 11, 2018)

[From the article, “MPs want new watchdog to root out research misconduct” by Ian Sample, published at www.theguardian.com]
“A national watchdog that has the power to punish British universities for failing to tackle research misconduct is needed to ensure that sloppy practices and outright fraud are caught and dealt with fast, MPs say.”
“The new body would rule on whether universities have properly investigated allegations of malpractice and have the authority to recommend research funds be withdrawn or even reclaimed when it finds that inquiries into alleged wrongdoing have fallen short.”
To read more, click here.

Replication and Transparency in Political Science: An Update

[From the blog, “Replication and transparency in political science – did we make any progress?” by Nicole Janz, published at Political Science Replication]
“When a range of top political science journals signed a statement to enforce transparency in 2014 (JETS statement), there was an immediate backlash by qualitative researchers. Hundreds of scholars signed a petition against strict transparency rules asking for clarification. Then the LaCour scandal happened, where a political scientist fabricated a study and pretended to withhold his data because of confidentiality. Another wake-up call. Where is the debate in political science now?”
To read more, click here.

“Cargo-Cult” Statistics. What Can Statisticians Do?

[From the blog “Cargo-cult statistics and scientific crisis” by Philip Stark and Andrea Saltelli, published by Significance magazine]
“Poor practice is catching up with science, manifesting in part in the failure of results to be reproducible and replicable. Various causes have been posited, but we believe that poor statistical education and practice are symptoms of and contributors to problems in science as a whole.”
“The problem is one of cargo-cult statistics – the ritualistic miming of statistics rather than conscientious practice. This has become the norm in many disciplines, reinforced and abetted by statistical education, statistical software, and editorial policies.”
“Statisticians can help with important, controversial issues with immediate consequences for society. We can help fight power asymmetries in the use of evidence. We can stand up for the responsible use of statistics, even when that means taking personal risks.”
“We should be vocally critical of cargo-cult statistics, including where study design is ignored, where p-values, confidence intervals and posterior distributions are misused, and where probabilities are calculated under irrelevant, misleading assumptions. We should be critical even when the abuses involve politically charged issues, such as the social cost of climate change. If an authority treats estimates based on an ad hoc collection of related numerical models with unknown, potentially large systematic errors as if they were a random sample from a distribution centred at the parameter, we should object – whether or not we like the conclusion.”
“We can insist that “service” courses foster statistical thinking, deep understanding, and appropriate scepticism, rather than promulgating cargo-cult statistics.”
To read more, click here.