FINDLEY, JENSEN, MALESKY, & PEPINSKY: Nothing Up with Acai Berries: Some Reflections On Null Results from a Results-Free Peer Review Process

In the academy and well beyond, null results pose a significant problem. Indeed, discussions of null results have made their way as far as TV commentator John Oliver, whose recent segment on science pointedly notes that people generally do not like to hear about them. And yet, maybe we would all be better off – with more money in the bank – if his headline "Nothing Up With Acai Berries" actually reached the general public and we embraced it (see NIH).
This is not just a problem in health; the sciences in general struggle with how to engage null results, and social scientists are no exception. Our interest in the topic led to a special issue of Comparative Political Studies, a leading political science journal, for which we solicited results-free submissions and conducted the entire review process with all decisions made in the absence of results (see our original call for papers). This meant that reviewers and editors could not condition their decisions on significant results, giving null results a greater opportunity to end up in published manuscripts. The special issue has been accepted, and three results-free papers will be published along with our introduction.
The exercise demonstrated in practice that papers with well-developed theories and designs, but that ultimately ended up with null results, could make it through review and into print. One of the three published papers documented statistically insignificant treatment effects in its main experimental interventions. This in itself was a major success of the special issue: it showed that research may find null results, and that readers can learn from them. But the process was also instructive about the challenges of evaluating work when null results have a greater probability of being published. We found the authors', and especially the reviewers', comments on this process illuminating.
We offer two interrelated suggestions from our pilot to help make null results more prominent in peer-reviewed publications. The first has to do with acclimating reviewers to a new way of thinking about null findings, namely that they may be theoretically meaningful. The second has to do with helping authors frame their prospective work so that null results can be read as meaningful contributions.
The first problem unveiled by our pilot was that reviewers have become conditioned to view null results as empirically suspect. It seemed especially difficult for referees to accept that potential null findings might mean that a theory fails to explain the phenomenon being investigated. Rather, reviewers seemed more comfortable when they could simply interpret null results as evidence of mechanical problems with how the hypothesis was tested (low power, poor measures, etc.). Tellingly, many reviewers described null results as "non-findings," suggesting that one learns nothing from results that are not consistent with a directional hypothesis. Making the distinction between a theory that fails and a test that fails is, of course, one of the main benefits of results-free peer review.
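The intuition behind the "low power" interpretation can be made concrete with a small simulation (our own illustrative sketch, not drawn from any of the submitted papers): when a test is underpowered, even a true effect routinely fails to reach significance, so a non-significant result cannot by itself distinguish a failed theory from a weak test.

```python
# Illustrative simulation: how often does a two-group comparison come back
# non-significant when a true effect of 0.3 SD exists? Only the sample size
# per group differs between the two scenarios.
import math
import random
import statistics

random.seed(1)

def nonsignificant_rate(n, true_effect, trials=2000, crit=1.96):
    """Fraction of simulated experiments whose test statistic is below
    the two-sided 5% critical value despite a real treatment effect."""
    nonsig = 0
    for _ in range(trials):
        control = [random.gauss(0.0, 1.0) for _ in range(n)]
        treated = [random.gauss(true_effect, 1.0) for _ in range(n)]
        diff = statistics.mean(treated) - statistics.mean(control)
        se = math.sqrt(statistics.variance(control) / n
                       + statistics.variance(treated) / n)
        if abs(diff / se) < crit:
            nonsig += 1
    return nonsig / trials

small = nonsignificant_rate(n=20, true_effect=0.3)   # underpowered design
large = nonsignificant_rate(n=200, true_effect=0.3)  # well-powered design
print(f"non-significant with n=20 per group:  {small:.0%}")
print(f"non-significant with n=200 per group: {large:.0%}")
```

With the same true effect, the small-sample design returns a null result in the large majority of simulated trials, while the well-powered design rarely does; this is precisely why a null finding only becomes informative about the theory once the design's power has been established.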
Perhaps the single most compelling argument in favor of results-free peer review is that it allows for findings that a hypothesized relationship is not present in the data. Yet our reviewers pushed back against making such calls. They appeared reluctant to endorse manuscripts in which null findings were possible, or, when they did, to accept that those potential null results might be interpreted as evidence against the existence of a hypothesized relationship. For some reviewers, this was a source of real consternation: reviewing manuscripts without results made them aware of how much they had been making decisions based on the strength of findings, and of how much easier it was to feel "excited" by strong findings.
The second problem that our pilot revealed has to do with how authors discuss their contributions. The main reason for rejection in our results-free review process was that authors failed to explain their projects in ways that made a potential null result theoretically compelling.
Again and again, reviewers posed some version of the question: if the tested hypotheses proved insignificant, would that move debates in this sub-literature forward in any way? In many of the rejected papers, and even one of the accepted papers, the answer was no. Reviewers reached this conclusion for three reasons. First, a null finding would not be interesting when the reviewer found the theory implausible in the first place. Proving that the implausible is in fact implausible is not a recipe for scintillating scholarship.
The second was a variant of Occam's razor: reviewers did not believe that the authors had adequately accounted for a simpler, alternative theory that could explain the underlying puzzle motivating their research. In this instance, a null result would only reinforce the notion that the more parsimonious theory was superior, or that a natural experiment was confounded by unobservable selection.
Third, there was too much distance between the articulated theory and the abstract field, lab-in-the-field, or survey experiment proposed in the paper. The theory invoked a compelling concept, but the proposed research design failed to adequately capture it, or stretched the meaning of the concept to the point of unrecognizability. In this case, a null result would only prove that the empirical test was inadequate for the bigger question.
None of these grounds for dismissing proposed research plans is a new problem or unique to results-free review; they are a standard part of how scholars evaluate research. The interesting implications for results-free review lie in how strategic authors may alter their research agendas to survive the review process. Introducing a laundry list of hypotheses and potential heterogeneous effects will not suffice: our reviewers were quick to spot and reject this type of "hypothesis trolling."
Three author strategies seem most plausible for articulating work that makes null results compelling. First, authors might position their research design as the distinguishing test between two competing theories with contrasting observable implications. For example, does fiscal decentralization decrease or increase corruption? Here, a null finding might rule out one of the competing hypotheses (although concerns about statistical power might still arise). Second, authors may offer their research design as the first, or a better, test of a prevailing theory or logic that has been inadequately tested in the literature. The theory of deliberative democracy, for instance, offers a number of very clear implications about how deliberation should affect the thinking and behavior of citizens, yet most of these have been subjected to only limited empirical testing. If designed properly, such a test would be interesting purely because the target is well known. Again, however, reviewers reacted quite negatively to this type of approach; most referees wanted authors to build on the existing literature in important ways or to thoroughly explain why the observational work of previous generations was flawed. Finally, authors might offer a test of a hypothesis that is the next logical step within a prevailing and well-traveled research paradigm.
There is a clear drawback to the way results-free review prioritizes this type of moderate theoretical progress and testing – what Kuhn referred to as normal science. Knowing that they have to convince a skeptical reviewer that a null finding is interesting, scholars may choose to abjure big questions and paradigm-shifting scholarship in favor of incremental research designs. There is a reasonable debate to be had about the proper balance between normal science and big questions, and outlets certainly need to be available for the next big breakthrough.
Outside the academy, these problems may be magnified. Economists or political scientists working at the Bureau of Labor Statistics or the World Bank, for example, may face an explicit peer review process, or simply scrutiny from policymakers or funders who carry their own mental theoretical models within which they adjudicate the (likely) results from researchers. If such researchers foresee the possibility of null results, they may similarly strategize about the types of phenomena they track and the sorts of programs they endorse or evaluate, or, more basically, avoid potentially innovative policy approaches in favor of incremental programming and evaluation.
In our view, however, the more immediate problem is that publication bias in the social sciences is impeding both normal science and the next big thing. We cannot even make incremental progress on the critical questions of our day without clear documentation of all the research paths that did not prove fruitful. Knowing about failed tests is just as valuable as learning about successes when we envision new research projects. Worse, the proliferation of successful tests makes it hard to identify the truly path-breaking findings and, even more importantly, to trust that we can build upon them.
These observations point to an important conclusion from our exercise: scientists engaged in null hypothesis significance testing lack a coherent framework for thinking about what null results actually mean, and about how to build them into a cumulative scientific enterprise. Bayesians have long criticized null hypothesis significance testing for this very reason. Our exercise proved useful in unexpected ways for bringing this problem to light, and it points to the need for a more robust and honest discussion about why scientists are so eager to dismiss null findings as "non-results," and about the implications for collective efforts across the disciplines.
There is room for disagreement about the best approach. Indeed, there will be a variety of benefits and costs to admitting null results more fully into scholarly and policy debates. We endorse practices that will make null results more central in and out of the academy, but we suspect that a robust discussion lies ahead regarding the complexities of author, reviewer, and editor incentives for producing and evaluating them.
Michael Findley is associate professor in the Department of Government and the LBJ School of Public Affairs (courtesy) at the University of Texas at Austin. Nathan Jensen is professor in the Department of Government at the University of Texas at Austin. Edmund Malesky is professor in the Department of Political Science at Duke University. Thomas Pepinsky is associate professor in the Department of Government at Cornell University.

What is Post-Publication Peer Review?

This short blog by TONY ROSS-HELLAUER does a nice job of distinguishing between (i) “open pre-review” and (ii) “open final version commenting”, both of which are, unfortunately, lumped under the category of “post-publication peer review.” The blog includes references.  To read more, click here.
As an aside, TRN notes that the journal Economics E-Journal allows both.  It publishes submissions as working papers, with open review.  Then, after a paper is accepted, it allows post-publication comment.

Famous AER Article on Racial Discrimination Fails to Replicate. Why?

In his blog Data Colada, URI SIMONSOHN offers a hypothesis for why a recent paper in the AER, Deming, Yuchtman, Abulafi, Goldin, & Katz (2016), failed to replicate Bertrand & Mullainathan's (2004) famous paper on racial discrimination in labor markets.  The latter study focused on callback rates when CVs were mailed out using "black" versus "white" names.  B&M found that callback rates were lower for blacks; DYAG&K found no difference.  Simonsohn argues that socioeconomic status (SES) may be a confounder: in particular, B&M's "black names" were also perceived to be low SES, while DYAG&K's names arguably held SES constant.  So the final answer is…Yet To Be Determined.  To read more, click here.

Two More Findings from Psychology Fail to Replicate

[From the article, “A Worrying Trend for Psychology’s ‘Simple Little Tricks'” from The Atlantic magazine] “In yet another setback for the field, researchers have failed to replicate two studies showing that basic techniques can reduce racial achievement gaps and improve voter turnout.” To read more, click here.

The Rise of Negative Results

This article from Chemical & Engineering News discusses publication bias and ways to fix it that will sound familiar to readers of TRN.  Of particular interest is the need to make space in the literature for negative results:  “The open access movement has given rise to new models of publication that judge research work not on significance but solely on originality and competence. That is now giving us new avenues for publishing negative results, such as PLOS One, F1000 Research, Peer J, Scientific Reports, and the recently announced ACS Omega.”  The author, Stephen Curry, professor of structural biology at Imperial College London, then goes on to explain why he has started to publish negative results in author-pays, open access journals.  To read more, click here.

IN THE NEWS: The Economist (September 7, 2016)

[From the article “Excel errors and science papers”]  “A recent study in the journal Genome Biology looked at papers published between 2005 and 2015, and found spreadsheet-related errors in fully one-fifth of articles on genomics that provided supplementary data alongside their text. Although the papers themselves were not necessarily affected, such bugs can create complications for other scientists trying to replicate or build on previous work.” To read more, click here.

YouTube Video on PLOS and Open Science

This short YouTube video by Dr. Siouxsie Wiles, a microbiologist at the University of Auckland, is a great introduction to PLOS (Public Library of Science) and the promise of open science.  While it seems fair to say that the open science movement can, at times, be naive about the value of scientific journals, this video provides a compelling motivation for the movement and an inspirational example of open science in action.  It is about 10 minutes long.  To watch it, click here.

On the Reproducibility Crisis in….Microbiology?

From obscure to ubiquitous, the reproducibility crisis is now headline news everywhere.  In a blog by American Society of Microbiology (ASM) CEO Stefano Bertuzzi entitled, “ASM Addresses the Reproducibility Crisis in New Academy Report”, 6 areas were highlighted for “restoring rigor to scientific practice”:
  1. The primacy of rigorous evaluation criteria to recognize and reward high-quality scientific research
  2. The importance of training in appropriate statistical approaches and in general
  3. The need for open data as the cornerstone for the scientific enterprise
  4. The elimination of publication bias by encouraging the publication of negative results
  5. The establishment of common criteria among journals for retraction criteria to ensure consistency and transparency
  6. The need for strengthening integrity oversight and training
In addition, Bertuzzi speaks of the importance of replication – "A core concept in scientific research is the ability to replicate empirical results" – though he does not mention the importance of encouraging the publication of replication studies.  All excellent points. American Economic Association, wherefore art thou?  To read more, click here.

Project TIER Offers Free Workshop on Teaching Reproducibility (Nov. 18 & 19)

[FROM THE PROGRAM ANNOUNCEMENT] “This workshop is intended for faculty who are interested in incorporating principles of transparent and reproducible research in their teaching and/or research advising. The workshop will emphasize research methods in the social sciences, but participation is not limited to social science faculty. Instructors from departments of math and statistics, or other fields in which quantitative methods are important, are welcome as well. We are seeking participants who teach classes and/or supervise research involving applied analysis of statistical data, regardless of their disciplinary homes. Faculty from both graduate and undergraduate programs are invited to apply.”  Costs are covered by Project TIER. Application deadline is October 3 or until all spaces are filled.  To learn more, click here.

MAREN DUVENDACK: What are Registered Replication Reports?

Academia has been abuzz in recent years with new initiatives focusing on research transparency, replication, and reproducibility. Notable in this regard are the Berkeley Initiative for Transparency in the Social Sciences and the Reproducibility Initiative, in which PLOS and Science Exchange are involved, but there are many others. Psychology and political science have seen a number of new initiatives that are shaking up the scientific research and publication process.  In economics, there are laudable endeavors by the Institute for New Economic Thinking, which funds the "Replication in Economics" project at Göttingen University, and by 3ie, which launched a replication initiative that includes funding for replication studies.  And, of course, there is The Replication Network, which started a little over a year ago.
In this blog I would like to highlight a particular initiative that addresses the distorted incentive structure of the academic peer-review process. It is well known that the scientific literature rewards novel, ground-breaking findings, in a way that is sometimes at odds with how the scientific research process actually works. Novel findings are exciting, but we can only judge the true effect of something if we amass evidence from a variety of sources, and that evidence will not always be novel or exciting.
This is where the idea of registered reports, and relatedly registered replication reports, comes in. The registered report model is very simple: researchers submit a report setting out the research questions and proposed methodology before embarking on any data collection and analysis. This report is peer-reviewed to ensure certain quality criteria are met.  Once the submission is accepted, publication in the journal is almost guaranteed, assuming researchers have followed through with their registered methodology.
This initiative is the brainchild of Alex Holcombe, Bobbie Spellman, and Daniel Simons, and was started in 2013 in collaboration with the journal Perspectives on Psychological Science.  The first registered replication report was published in 2014.  The Center for Open Science has actively promoted registered reports. According to Daniel Simons, Professor of Psychology at the University of Illinois, "Registered reports eliminate the bias against negative results in publishing because the results are not known at the time of review".  Adds Chris Chambers, chair of the COS-associated Registered Reports Committee: "Because the study is accepted in advance, the incentives for authors change from producing the most beautiful story to producing the most accurate one."  The idea of registered reports has quickly gained traction; Brian Nosek, Professor of Psychology at the University of Virginia, is now piloting registered reports with over 20 journals.
A related initiative is that of “results-free” reviewing (RFR) where studies are reviewed without reviewers knowing the results of the analysis. The journal Comparative Political Studies recently published a special issue that featured a pilot study of RFR.
The move towards registered replication reports somewhat mirrors that of registered trials in the medical sciences, where trials are registered before the study begins in order to minimise reporting biases and enhance transparency and accountability (see here). 3ie has established a similar registry with the aim of registering international development impact evaluations (see here).
All these initiatives are important in the quest for more research transparency.  The medical sciences, psychology, and political science have been at the forefront of these efforts.  It would be good to see similar initiatives in economics.  
Maren Duvendack is a lecturer in development economics at the University of East Anglia and co-organizer of The Replication Network.