RANDALL & WELSER: On the Irreproducibility Crisis of Modern Science
[This post is based on the report “The Irreproducibility Crisis of Modern Science: Causes, Consequences and the Road to Reform,” recently published by the National Association of Scholars.]
For more than a decade, and especially since the publication of a famous 2005 article by John Ioannidis, scientists in various fields have been concerned with the problems posed by the replication crisis. The importance of the crisis demands that it be understood by a larger audience of educators, policymakers, and ordinary citizens. To this end, our new report, The Irreproducibility Crisis of Modern Science, outlines the nature, causes, and significance of the crisis, and offers a series of proposals for confronting it.
At its most basic level, the crisis arises from the widespread use of statistical methods that inevitably produce some false positives. Misuse of these methods easily increases the number of false positives, leading to the publication of many spurious findings of statistical significance. “P-hacking” (running repeated statistical tests until a finding of significance emerges) is probably the most common abuse of statistical methods, but inadequate specification of hypotheses and the tendentious construction of datasets are also serious problems. (Gelman and Loken 2014 provide several good examples of how easily these latter faults can vitiate research findings.)
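To make the arithmetic behind p-hacking concrete, here is a minimal simulation sketch. It is not taken from the report. It assumes a stylized researcher who measures several independent outcomes in a study where no real effect exists, and who reports "significance" if any one outcome reaches p < 0.05. The function names, sample sizes, and the simple z-test with known variance are illustrative assumptions.

```python
import random
from statistics import NormalDist

ALPHA = 0.05  # conventional significance threshold


def p_value_two_sample(a, b, sigma=1.0):
    """Two-sided z-test p-value for equal means (equal group sizes, known sigma)."""
    n = len(a)
    diff = sum(a) / n - sum(b) / n
    z = diff / (sigma * (2.0 / n) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def study_finds_significance(rng, n_outcomes, n_per_group=30):
    """One simulated study with NO real effect; True if any outcome has p < ALPHA."""
    for _ in range(n_outcomes):
        # Both groups come from the same distribution, so any "finding" is spurious.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if p_value_two_sample(a, b) < ALPHA:
            return True
    return False


def false_positive_rate(n_outcomes, n_studies=2000, seed=0):
    """Share of simulated null studies that report at least one significant result."""
    rng = random.Random(seed)
    hits = sum(study_finds_significance(rng, n_outcomes) for _ in range(n_studies))
    return hits / n_studies


def expected_rate(n_outcomes):
    """Analytic rate for independent tests: 1 - (1 - ALPHA) ** n_outcomes."""
    return 1.0 - (1.0 - ALPHA) ** n_outcomes


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, round(false_positive_rate(k), 3), round(expected_rate(k), 3))
```

Under these assumptions, a single test yields a false positive about 5 percent of the time, while ten independent tests yield at least one about 40 percent of the time. Real analyses usually involve correlated tests, so the exact figures will differ, but the direction of the effect is the same.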
Methodological errors and abuses are enabled by too much researcher freedom and too little openness about data and procedures. Researchers’ unlimited freedom in specifying their research designs—and especially their freedom to change their research plans in mid-course—makes it possible to conjure statistical significance even for obviously nonsensical hypotheses (Simmons, Nelson, and Simonsohn 2011 provide a classic demonstration of this). At the same time, lack of outside access to researchers’ data and procedures prevents other experts from identifying problems in experimental design.
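One form of mid-course flexibility is optional stopping: checking the results as data come in and halting as soon as significance appears. The sketch below is an illustration under stated assumptions, not a reproduction of the Simmons, Nelson, and Simonsohn analysis. It compares a fixed-sample design with a hypothetical researcher who tests after every ten additional observations. The starting sample of 20, the cap of 100, and the one-sample z-test are illustrative choices.

```python
import random
from statistics import NormalDist


def z_test_p(sample, sigma=1.0):
    """Two-sided p-value for the hypothesis that the true mean is zero (known sigma)."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def run_study(rng, peek, start=20, step=10, max_n=100, alpha=0.05):
    """Simulate one study with NO real effect; return True if it reports significance.

    peek=False: a single test at the planned sample size max_n.
    peek=True:  a test after every batch, stopping at the first p < alpha.
    """
    sample = [rng.gauss(0.0, 1.0) for _ in range(start)]
    while True:
        at_end = len(sample) >= max_n
        if (peek or at_end) and z_test_p(sample) < alpha:
            return True
        if at_end:
            return False
        sample.extend(rng.gauss(0.0, 1.0) for _ in range(step))


def rejection_rate(peek, n_studies=4000, seed=1):
    """Share of simulated null studies that end with a 'significant' result."""
    rng = random.Random(seed)
    return sum(run_study(rng, peek) for _ in range(n_studies)) / n_studies


if __name__ == "__main__":
    print("fixed sample size:", rejection_rate(peek=False))
    print("optional stopping:", rejection_rate(peek=True))
```

In this setup the fixed design stays near the nominal 5 percent error rate, while the peeking design rejects a true null hypothesis noticeably more often. Committing to a stopping rule in advance, as pre-registration requires, removes this particular degree of freedom.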
Other factors in the irreproducibility crisis exist at the institutional level. Academia and the media create powerful incentives for researchers to advance their careers by publishing new and exciting positive results, while inevitable professional and political tendencies toward groupthink prevent challenges to an existing consensus.
The consequences of all these problems are serious. Not only is a lot of money being wasted—in the United States, up to $28 billion annually on irreproducible preclinical research alone (Freedman et al. 2015)—but individuals and policymakers end up making bad decisions on the basis of faulty science. Perhaps the worst casualty is public confidence in science, as people awaken to how many of the findings they hear about in the news can’t actually be trusted.
Fixing the replication crisis will require energetic efforts to address its causes at every level. Many scientists have already taken up the challenge, and institutions like the Center for Open Science and the Meta-Research Innovation Center at Stanford (METRICS), both in the U.S., have been established to improve the reproducibility of research. Some academic journals have changed the ways in which they ask researchers to present their results, and other journals, such as the International Journal for Re-Views in Empirical Economics, have been created specifically to push back against publication bias by publishing negative results and replication studies. National and international organizations, including the World Health Organization, have begun delineating more stringent research standards.
But much more remains to be done. In an effort to spark an urgently needed public conversation on how to solve the reproducibility crisis, our report offers a series of forty recommendations. At the level of statistics, researchers should cease to regard p-values as dispositive measures of evidence for or against a particular hypothesis, and should try to present their data in ways that avoid a simple either/or determination of statistical significance. Researchers should also pre-register their research procedures and make their methods and data publicly available upon publication of their results. There should also be more experimentation with “born-open” data—data archived in an open-access repository at the moment of its creation, and automatically time-stamped.
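As a rough illustration of what automatic time-stamping could involve, the sketch below builds a record containing a cryptographic fingerprint of a data file and the UTC time at which the record was made. This is a hypothetical design, not one described in the report: the function names and record fields are assumptions, and a real born-open workflow would rely on the repository's own timestamp rather than the researcher's clock.

```python
import hashlib
import json
from datetime import datetime, timezone


def born_open_record(name, data, now=None):
    """Build a record with a SHA-256 fingerprint of the data and a UTC timestamp.

    `now` can be supplied for testing; by default the current UTC time is used.
    """
    stamp = now if now is not None else datetime.now(timezone.utc)
    return {
        "file": name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "recorded_at_utc": stamp.isoformat(),
    }


def matches_record(record, data):
    """True if the data are byte-for-byte identical to what was recorded."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


if __name__ == "__main__":
    raw = b"id,score\n1,0.5\n"  # hypothetical data file contents
    record = born_open_record("trial.csv", raw)
    print(json.dumps(record, indent=2))
    print(matches_record(record, raw))
```

Because any later change to the data changes the fingerprint, a record like this lets outside readers check that the published dataset is the one that was originally deposited.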
Given the importance of statistics in modern science, we need better education at all levels to ensure that everyone—future researchers, journalists, legal professionals, policymakers and ordinary citizens—is well-acquainted with the fundamentals of statistical thinking, including the limits to the certainty that statistical methods can provide. Courses in probability and statistics should be part of all secondary school and university curricula, and graduate programs in disciplines that rely heavily on statistics should take care to emphasize the ways in which researchers can misunderstand and misuse statistical concepts and techniques.
Professional incentives have to change too. Universities judging applications for tenure and promotion should look beyond the number of scholars’ publications, giving due weight to the value of replication studies and expecting adherence to strict standards of reproducibility. Journals should make their peer review processes more transparent, and should experiment with guaranteeing publication for research with pre-registered, peer-reviewed hypotheses and procedures. To combat groupthink, scientific disciplines should ask committees of extradisciplinary professionals to evaluate the openness of their fields.
Private philanthropy, government, and scientific industry should encourage all these efforts through appropriate funding and moral support. Governments also need to consider their role as consumers of science. Many government policies are now made on the basis of scientific findings, and the replication crisis means that those findings demand more careful scrutiny. Governments should take steps to ensure that new regulations which require scientific justification rely solely on research that meets strict standards for reproducibility and openness. They should also review existing regulations and policies to determine which ones may be based on spurious findings.
Solving the replication crisis will require a concerted effort from all sectors of society. But this challenge also represents a great opportunity. As we fight to eliminate opportunities and incentives for bad science, we will be rededicating ourselves to good science and cultivating a deeper public awareness of what good science means. Our report is meant as a step in that direction.
David Randall is Director of Research at the National Association of Scholars (NAS). Christopher Welser is an NAS Research Associate.
Freedman, Leonard P., Iain M. Cockburn, and Timothy S. Simcoe (2015), “The Economics of Reproducibility in Preclinical Research.” PLoS Biology, 13(6), e1002165. doi:10.1371/journal.pbio.1002165
Gelman, Andrew and Eric Loken (2014), “The Statistical Crisis in Science.” American Scientist, 102(6), 460–465.
Ioannidis, John P. A. (2005), “Why Most Published Research Findings Are False.” PLoS Medicine, 2(8), e124. doi:10.1371/journal.pmed.0020124
Simmons, Joseph P., Leif D. Nelson, and Uri Simonsohn (2011), “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science, 22(11), 1359–1366.