BOB REED: On Andrew Gelman, Retractions, and the Supply and Demand for Data Transparency

Posted on 23rd May 2016 by replicationnetwork

In a recent interview on Retraction Watch, Andrew Gelman reveals that what keeps him up at night isn’t scientific fraud, it’s “the sheer number of unreliable studies — uncorrected, unretracted — that have littered the literature.” He then goes on to argue that retractions cannot be the answer. His argument is simple. The scales don’t match. “Millions of scientific papers are published each year. If 1% are fatally flawed, that’s thousands of corrections to be made. And that’s not gonna happen.”

Actually, if 1% of studies are fatally flawed, the problem is probably manageable. Assuming a typical journal publishes 10 articles an issue, 4 issues a year, that means one retraction every two and a half years, which is certainly feasible for a journal. Problems arise only when the percentage rises substantially. Gelman goes on to say that he personally thinks the error rate to be as large as 50% in some journals, where “half the papers claim evidence that they don’t really have.” At that point retractions are not the solution.
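The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the inputs are the post's own assumptions (10 articles an issue, 4 issues a year) rather than data on any actual journal.

```python
def retractions_per_year(articles_per_issue, issues_per_year, flawed_share):
    """Expected number of fatally flawed papers a journal publishes per year."""
    return articles_per_issue * issues_per_year * flawed_share


def years_between_retractions(articles_per_issue, issues_per_year, flawed_share):
    """Average number of years between retractions, if every flawed paper is retracted."""
    return 1 / retractions_per_year(articles_per_issue, issues_per_year, flawed_share)


if __name__ == "__main__":
    # Hypothetical journal from the paragraph above: 10 articles an issue, 4 issues a year.
    for share in (0.01, 0.50):
        per_year = retractions_per_year(10, 4, share)
        gap = years_between_retractions(10, 4, share)
        print(f"flawed share {share:.0%}: {per_year:.1f} retractions per year, "
              f"one every {gap:.2f} years")
```

At a 1% flaw rate the sketch reproduces the figure above, one retraction every two and a half years. At Gelman's 50%, the same journal would face 20 retractions a year, which is why the scale of the problem changes the answer.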

If revealed preference is any indication, hopes for a solution appear centered on “data transparency.” Data transparency means different things to different people, but a common core is that researchers make their data and programming code publicly available.

The Center for Open Science, Dataverse, and EUDAT are but a few examples of the high-profile explosion in efforts to make research data more “open,” transparent and shareable. In a recent guest blog at The Replication Network (reblogged from BITSS), Stephanie Wykstra promotes the related topic of data re-use.

In an encouraging sign, these efforts appear to have had an impact. A recent survey article by Duvendack et al. reports that, of 333 journals categorized as “economics journals” by Thomson Reuters’ Journal Citation Reports, 27, or a little more than 8 percent, regularly publish data and code to accompany empirical research studies. As some of these journals are exclusively theory journals, the effective rate is somewhat higher.

Noteworthy is that many of these journals only recently instituted a policy of publishing data and code. So while one can argue whether the glass is, say, 20 percent full or 80 percent empty, the fact is that the glass used to contain virtually nothing. That is progress.

But making data more “open” does not, by itself, address the problem of scientific unreliability. Researchers have to be motivated to go through these data, examine them carefully, and determine if they are sufficient to support the claims of the original study. Further, they need to have an avenue to publicize their findings in a way that informs the literature.

This is what replications are supposed to do. Replications provide a way to confirm or disconfirm the results of other studies. They are scalable to fit the size of the problem. With so many studies potentially unreliable, researchers would prioritize the most important findings that are worthy of further analysis. The self-selection mechanism of researchers’ time and interests would ensure that the most important, most influential studies are appropriately vetted.

But after obtaining their results, researchers need a place to publicize their findings.

Unfortunately, on this dimension, the Duvendack et al. study is less encouraging. They report that only 3 percent of “economics” journals explicitly state that they publish replications. Most of these are specialty/field journals, so that an author of a replication study has only a very few outlets, maybe as few as one or two, in which they can hope to publish their research.

And just because a journal states that it publishes replication studies doesn’t mean that it does so very often. Duvendack et al. report that 6 journals account for 60 percent of all replication studies ever published in Web of Science “economics” journals. Further, only 10 journals have ever published more than 3 replication studies. In their entire history.

Without an outlet to publish their findings, researchers will be unmotivated to spend substantial effort re-analysing other researchers’ data. Or to put it differently, the open science/data sharing movement only addresses the supply side of the scientific market. Unless the demand side is addressed, these efforts are unlikely to be successful in providing a solution to the problem of scientific unreliability.

The irony is this: The problem has been identified. There is a solution. The pieces are all there. But in the end, the gatekeepers of scientific findings, the journals, need to open up space to allow science to be self-correcting. Until that happens, there’s not much hope of Professor Gelman getting any more sleep.

Bob Reed is Professor of Economics at the University of Canterbury in New Zealand, and co-organizer of The Replication Network.
