MENCLOVA: Is it Time for a Journal of Insignificant Results?

It is well known that there is a bias towards publication of statistically significant results. In fact, we have known this for at least 25 years since the publication of De Long and Lang (JPE 1992):
“Economics articles are sprinkled with very low t-statistics – marginal significance levels very close to one – on nuisance coefficients. […] Very low t-statistics appear to be systematically absent – and therefore null hypotheses are overwhelmingly false – only when the universe of null hypotheses considered is the central themes of published economics articles. This suggests, to us, a publication bias explanation of our findings.” (pp. 1269-1270)
While statistically insignificant results are less “sexy”, they are often no less important. Failure to reject the null hypothesis can be interesting in itself, can serve as a valuable data point in meta-analyses, or can indicate to future researchers where they are unlikely to find an effect. As McCloskey (2002) famously puts it:
“[…] statistical significance is neither necessary nor sufficient for a result to be scientifically significant.” (p. 54)
This problem is not unique to Economics, but several other disciplines have moved faster than ours to address it. For example, the following disciplines already have journals dedicated to publishing “insignificant” results:
Psychology: Journal of Articles in Support of the Null Hypothesis
Biomedicine: Journal of Negative Results in Biomedicine
Ecology and Evolutionary Biology: Journal of Negative Results
Is it time for Economics to catch up? I suggest it is and I know that I am not alone in this view. In fact, a number of prominent Economists have endorsed this idea (even if they are not ready to pioneer the initiative). So, imagine… a call for papers along the following lines:
Series of Unsurprising Results in Economics (SURE)
Is the topic of your paper interesting and your analysis carefully done, but your results are not “sexy”? If so, please consider submitting your paper to SURE, an e-journal of high-quality research with “unsurprising” findings.
How it works:
— We accept papers from all fields of Economics…
— Which have been rejected at a journal indexed in EconLit…
— With the ONLY important reason being that their results are statistically insignificant or otherwise “unsurprising”.
To document that your paper meets the above eligibility criteria, please send us all referee reports and letters from the editor from the journal where your paper has been rejected.  Two independent referees will read these reports along with your paper and evaluate whether they indicate that: 1. the paper is of high quality and 2. the only important reason for rejection was the insignificant/unsurprising nature of the results.  Submission implies that you (the authors) give permission to the SURE editor to contact the editor of the rejecting journal regarding your manuscript.
SURE benefits writers by:
— Providing an outlet for interesting, high-quality, but “risky” (in terms of uncertain results) research projects;
— Decreasing incentives to data-mine, change theories and hypotheses ex post, or focus exclusively on provocative topics.
SURE benefits readers by:
— Mitigating publication bias and thus complementing other journals in the effort to provide a complete account of the state of affairs;
— Serving as a repository of potential (and tentative) “dead ends” in Economics research.
Feedback is definitely invited! Please submit your comments here or email me at
Andrea Menclova is a Senior Lecturer at the University of Canterbury in New Zealand.
De Long, J. Bradford and Kevin Lang. 1992. “Are All Economic Hypotheses False?” Journal of Political Economy, 100(6), pp. 1257–1272.
McCloskey, Deirdre. 2002. The Secret Sins of Economics. Prickly Paradigm Press, Chicago.


GELMAN: Some Natural Solutions to the p-Value Communication Problem—And Why They Won’t Work

[NOTE: This is a repost of a blog that Andrew Gelman wrote for the blogsite Statistical Modeling, Causal Inference, and Social Science].
Blake McShane and David Gal recently wrote two articles (“Blinding us to the obvious? The effect of statistical training on the evaluation of evidence” and “Statistical significance and the dichotomization of evidence”) on the misunderstandings of p-values that are common even among supposed experts in statistics and applied social research.
The key misconception has nothing to do with tail-area probabilities or likelihoods or anything technical at all, but rather with the use of significance testing to finesse real uncertainty.
As John Carlin and I write in our discussion of McShane and Gal’s second paper (to appear in the Journal of the American Statistical Association):
Even authors of published articles in a top statistics journal are often confused about the meaning of p-values, especially by treating 0.05, or the range 0.05–0.15, as the location of a threshold. The underlying problem seems to be deterministic thinking. To put it another way, applied researchers and also statisticians are in the habit of demanding more certainty than their data can legitimately supply. The problem is not just that 0.05 is an arbitrary convention; rather, even a seemingly wide range of p-values such as 0.01–0.10 cannot serve to classify evidence in the desired way.
In our article, John and I discuss some natural solutions that won’t, on their own, work:
– Listen to the statisticians, or clarity in exposition
– Confidence intervals instead of hypothesis tests
– Bayesian interpretation of one-sided p-values
– Focusing on “practical significance” instead of “statistical significance”
– Bayes factors
You can read our article for the reasons why we think the above proposed solutions won’t work.
From our summary:
We recommend saying No to binary conclusions . . . resist giving clean answers when that is not warranted by the data. . . . It will be difficult to resolve the many problems with p-values and “statistical significance” without addressing the mistaken goal of certainty which such methods have been used to pursue.
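The claim that even a seemingly wide range of p-values cannot classify evidence is easy to illustrate with a small simulation. The sketch below is not from Gelman and Carlin’s article; the effect size, sample size, and seed are arbitrary choices. It repeatedly runs the same study of one fixed, true effect and shows that the resulting p-values scatter across the entire “significant”/“non-significant” range:

```python
# Minimal simulation (assumed parameters: true effect 0.3, n = 25):
# exact replications of a single true effect yield wildly different p-values,
# so a p-value in, say, 0.01-0.10 cannot cleanly classify the evidence.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect, n, n_replications = 0.3, 25, 200

p_values = []
for _ in range(n_replications):
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)  # test H0: mean = 0
    p_values.append(p)

p_values = np.array(p_values)
print(f"min p = {p_values.min():.4f}, max p = {p_values.max():.4f}")
print(f"share with p < 0.05: {np.mean(p_values < 0.05):.2f}")
```

Every replication here samples from the identical population, yet some runs land comfortably below 0.05 and others far above it, which is the “demanding more certainty than the data can supply” problem in miniature.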
P.S. Along similar lines, Stephen Jenkins sends along the similarly-themed article, “‘Sing Me a Song with Social Significance’: The (Mis)Use of Statistical Significance Testing in European Sociological Research,” by Fabrizio Bernardi, Lela Chakhaia, and Liliya Leopold.
Andrew Gelman is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. He blogs at Statistical Modeling, Causal Inference, and Social Science.

Bayes Factors Versus p-Values

In a recent article in PLOS One, Don van Ravenzwaaij and John Ioannidis argue that Bayes factors should be preferred to significance testing (p-values) when assessing the effectiveness of new drugs.  At his blogsite The 20% Statistician, Daniel Lakens argues that Bayes factors suffer from the same problems as p-values: the combination of small effect sizes and small sample sizes leads to inconclusive results no matter which measure one uses.  The real challenge facing decision-making from statistical studies comes from publication bias and underpowered studies.  Both significance testing and Bayes factors are relatively powerless (pun intended) to overcome these more fundamental problems. To read more, click here.
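A small numerical sketch shows why the two measures can disagree without either settling the question. The data below are made up, and the Bayes factor uses the simple BIC approximation (in the spirit of Wagenmakers, 2007) rather than the exact default Bayes factor used in the articles above:

```python
# Hypothetical small sample: a "significant" p-value can coexist with only
# weak Bayes-factor evidence, leaving the substantive question unresolved.
import numpy as np
from scipy import stats

x = np.array([0.8, -0.2, 0.5, 1.1, -0.4, 0.3, 0.9, -0.1, 0.6, 0.2])
n = len(x)

# Classical one-sample t-test of H0: mean = 0
t_stat, p_value = stats.ttest_1samp(x, popmean=0.0)

# BIC approximation to the Bayes factor: BF01 ~ exp((BIC1 - BIC0) / 2)
sse_null = np.sum(x**2)              # residual sum of squares, mean fixed at 0
sse_alt = np.sum((x - x.mean())**2)  # residual sum of squares, mean estimated
bic_null = n * np.log(sse_null / n)
bic_alt = n * np.log(sse_alt / n) + np.log(n)  # penalty for one extra parameter
bf10 = np.exp((bic_null - bic_alt) / 2)        # evidence for H1 over H0

print(f"p = {p_value:.3f}, approximate BF10 = {bf10:.2f}")
```

With these ten (invented) observations the t-test rejects at the 5% level, yet the approximate Bayes factor is only around 3, conventionally read as weak evidence: an example of Lakens’ point that with small samples neither tool delivers a conclusive answer.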

How to Fix the “Reproducibility Crisis”? Three Solutions

[From the article “The science ‘reproducibility crisis’ — and what can be done about it” from the website]
“Reproducibility is the idea that an experiment can be repeated by another scientist and they will get the same result. It is important to show that the claims of any experiment are true and for them to be useful for any further research. However, science appears to have an issue with reproducibility. A survey by Nature revealed that 52% of researchers believed there was a “significant reproducibility crisis” and 38% said there was a “slight crisis”. We asked three experts how they think the situation could be improved.”
To read more, click here.

How Many Ways Are There to Replicate? The Journal Scientific Data Presents a Collection of Examples

[From the abstract of the article, “Replication data collection highlights value in diversity of replication attempts”, by DeSoto and Schweinsberg in the journal Scientific Data.]  
“Researchers agree that replicability and reproducibility are key aspects of science. A collection of Data Descriptors published in Scientific Data presents data obtained in the process of attempting to replicate previously published research. These new replication data describe published and unpublished projects. The different papers in this collection highlight the many ways that scientific replications can be conducted, and they reveal the benefits and challenges of crucial replication research. The organizers of this collection encourage scientists to reuse the data contained in the collection for their own work, and also believe that these replication examples can serve as educational resources for students, early-career researchers, and experienced scientists alike who are interested in learning more about the process of replication.”
To read the article, click here.

CAMPBELL: On Perverse Incentives and Replication in Science

[NOTE: This is a repost of a blog that Doug Campbell wrote for his blogsite at]
Stephen Hsu has a nice blog post on this topic. He writes about this common pattern:
(1) Study reports results which reinforce the dominant, politically correct, narrative.
(2) Study is widely cited in other academic work, lionized in the popular press, and used to advance real world agendas.
(3) Study fails to replicate, but no one (except a few careful and independent thinkers) notices.
#1 is spot-on for economics. Woe be to she who bucks the dominant narrative. In economics, something else happens. Following the study, there are 20 piggy-back papers which test for the same results on other data. The original authors typically get to referee these papers, so if you’re a young researcher looking for a publication, look no further. You’ve just guaranteed yourself the rarest of gifts — a friendly referee who will likely go to bat for you. Just make sure your results are similar to theirs. If not, you might want to shelve your project, or else try 100 other specifications until you get something that “works”. One trick I learned: You can bury a robustness check which overturns the main results deep in the paper, and your referee who is emotionally invested in the benchmark result for sure won’t read that far.

Hsu then writes: “one should be highly skeptical of results in many areas of social science and even biomedical science (see link below). Serious researchers (i.e., those who actually aspire to participate in Science) in fields with low replication rates should (as a demonstration of collective intelligence!) do everything possible to improve the situation. Replication should be considered an important research activity, and should be taken seriously”

That’s exactly right. Most researchers in Economics go their entire careers without criticizing anyone else in their field, except as an anonymous referee, where they tend to let out their pent-up aggression. Journals shy away from publishing comment papers, as I found out first-hand. In fact, much if not a majority of the papers published in top economics journals are probably wrong, and yet the field soldiers on like a drunken sailor. Often, many people “in the know” realize that many big papers have fatal flaws, but have every incentive not to point this out and create enemies, or to waste their time writing up something which journals don’t really want to publish (the editor doesn’t want to piss a colleague off either). As a result, many of these false results end up getting taught to generations of students. Indeed, I was taught a number of these flawed papers as both an undergraduate and a grad student.

What can be done?

Well, it would be nice to make replication sexy. I’m currently working on a major replication/robustness project of the AER. In the first stage, we are checking whether results are replicable, using the same data sets and empirical specifications. In the second stage, we plan to think up a collection of robustness checks and out-of-sample tests of papers, and then create an online betting market about which papers will be robust. We plan to let the original authors bet on their own work.

Another long-term project is to make a journal ranking system which gives journals points for publishing comment papers. Adjustments could also be made for other journal policies, such as the extent to which a particular journal leeches off the academic community with high library subscription fees, submission fees, and long response times.

The AEA should also come out with a new journal split between writing review articles (which tend to be highly cited), and comment papers (which tend not to be). In that case, they could do both well and good.

As an individual, you can help the situation by writing a comment paper (maybe light up somebody who isn’t in your main field, like I did). You can also help by citing comment papers, and by rewarding comment papers when you edit and serve as a referee. As an editor, do you really care more about your journal’s citations than truth? You could also engage in playful teasing of your colleagues who haven’t written any comment papers as people who aren’t doing their part to make economics a science. (You could also note that it’s also a form of soft corruption, but I digress…)

Doug Campbell is an Assistant Professor at the New Economic School in Moscow, Russia. You can follow him on Twitter at @lust4learning. Correspondence regarding this blog can be sent to him at .

The Peer Reviewers’ Openness Initiative (Part 2): Get the App!

Tom Hardwicke, a post-doctoral research fellow at the Meta-Research Innovation Center (METRICS) at Stanford, has written an app to collect and summarize reviewers’ experiences with encouraging data transparency when they review for journals.  Hardwicke describes his app as follows:
“The PRO Initiative (PRO-I) is a unique grassroots effort to promote open research, and your experiences as a PRO-I reviewer will contain a rich body of information about the barriers you face, the successes you achieve, and the effectiveness of the initiative as a whole. We want to try and capture as much of that information as possible.”
“With this in mind, we have built The PRO-I Reviewing App – a simple web interface where you can quickly compose your PRO-I reviews using an interactive checklist. The checklist dynamically updates to ensure that you are only asked the minimum number of questions needed to establish whether a manuscript meets the PRO-I criteria or not. We hope that the app will save you time by making the reviewing process more efficient.”
“You can also record author and editor reactions to your reviews, helping us to identify areas where additional efforts to promote open research may be needed, such as researcher training.”
“Head over to the app to find out more and give it a go:

The Peer Reviewers’ Openness Initiative (Part 1): Storm Brewing Within the American Psychological Association

[From the article “Peer-review activists push psychology journals towards open data” at]  “An editor on the board of a journal published by the prestigious American Psychological Association (APA) has been asked to resign in a controversy over data sharing in peer review.”
“Gert Storms — who says he won’t step down — is one of a few hundred scientists who have vowed that, from the start of this year, they will begin rejecting papers if authors won’t publicly share the underlying data, or explain why they can’t.”
“The idea, called the Peer Reviewers’ Openness Initiative, was launched by psychologists hoping to increase transparency in a field beset by reports of fraud and dubious research practices. And the APA, which does not ask that data be made available to peer reviewers or shared openly online, seems set to become an early testing ground for the initiative’s influence. With Storms’ situation still unresolved, the society’s council of editors will discuss whether it should change its policies at a meeting in late March.”
To read more, click here.

How Many Carrots Can Brian Wansink Eat?

[From the article “Introducing SPRITE (and the Case of the Carthorse Child)” by James Heathers at the website Hackernoon] “So, if you’re reading this, you’ve probably heard about the recent trouble with a rash of papers from the Cornell Food and Brand Lab. If you haven’t, well, you have some catching up to do. Popular accounts are here (Slate), here (New York Magazine) and here (Andrew Gelman); the problems with specific papers are outlined here (the central pre-print) and here; responses are given or linked here and here. Oh, and now — since yesterday — there’s this. To sum it up quickly: the work from this lab is under a great deal of serious scrutiny at present. There are irregularities present in several papers which are very difficult to explain. But push all that onto the sideboard for now, because we’re going to talk about carrots.”
Besides being a nice summary of the controversies surrounding Brian Wansink and the Cornell Food and Brand Lab, this article features a description of the statistical app SPRITE, which looks very interesting.  Oh, and the article is very funny. To read more, click here.
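The core idea behind SPRITE can be sketched in a few lines. The real tool uses a fast heuristic search, but for a tiny sample one can simply enumerate every possible integer dataset and keep those consistent with a paper’s reported summary statistics; the “reported” mean and SD below are invented for illustration:

```python
# Toy brute-force version of the idea behind SPRITE: enumerate integer
# datasets on a bounded scale that reproduce reported summary statistics.
# The reported values here are hypothetical, not from any actual paper.
from itertools import combinations_with_replacement
from statistics import mean, stdev

reported_mean, reported_sd = 4.0, 1.58  # hypothetical published values
n, scale = 5, range(1, 8)               # n = 5 responses on a 1-7 Likert scale

candidates = [
    combo
    for combo in combinations_with_replacement(scale, n)
    if round(mean(combo), 1) == reported_mean
    and round(stdev(combo), 2) == reported_sd
]

print(f"{len(candidates)} possible datasets, e.g. {candidates[0]}")
```

If the list came back empty, no integer dataset on that scale could produce the published mean and SD, which is exactly the kind of red flag SPRITE raises about implausible summary statistics.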

“What Do Economists Know?” Not As Much As They Could

[From the article “What Do Economists Know?” by Russ Roberts at the website Medium] In this article, Russ Roberts argues that economists don’t know as much as everybody (especially economists!) would like them to know. However, there are some things that would improve the situation. One of those things is for researchers to make their data available to others.
Roberts discusses Jonathan Rothwell’s recent re-analysis of David Autor’s research on the economic impact of US trade with China.  He writes: “Rothwell’s analysis purporting to refute Autor et al was done with Autor et al’s data. Sharing data is honorable because data often takes work before you can use it and sharing allows others to profit without having to do the work. But all empirical claims should be accompanied by sharing of data along with all of the steps that were necessary to get the data into shape. The kitchen of econometrics can be an unsightly place. But everything that goes on in the kitchen should be publicly available in order to reassure the diners. Again, people can lie or hide things, but at least we should make clear what is the ideal.”
Economists may not know that much.  But they could know more if they would get better at sharing their data.  To read Russ Roberts’ article, click here.