On p-Hacking, Retractions, and the Difficult Enterprise of Science

This article on FiveThirtyEight.com is a great read for lots of reasons.  The leitmotiv is that while science has its share of fraudsters and academic scammers, the underlying problem is that the scientific enterprise is inherently very, very difficult.  To prove the point, the article includes an online, interactive data analysis that studies the relationship between political parties and economic performance.  Words cannot do it justice.  Do the “research” and believe.  To read the article, click here.

B. D. MCCULLOUGH: The Reason So Few Replications Get Published Is….

When preparing to give a talk at a conference recently, I decided to update some information I had published a few years ago.  In McCullough (2009), I estimated that 16 economics journals had a mandatory data/code archive (archives that require only data do not support replication — see McCullough, McGeary and Harrison (2008)).  Vlaeminck (2013) counted 26 journals with a mandatory data/code archive.  This is a non-trivial increase, since in 2004 only four economics journals had such a policy.  One might think that this increase bodes well for replicability in economics, but such is not the case.  It is all well and good to make data and code available for replication, but if there is no place for researchers to publish replications, then all the mandatory data/code archives in the world amount to little more than window dressing.

The problem is that editors do not want to admit that they publish unreplicable research, nor do they want to be bothered to ensure that the research they publish is replicable.  The fact is that very few journals will publish replications, and the top-ranked journals publish only an infinitesimal number of them.  Consequently, any editor is largely immune to the embarrassment that would arise if several of the articles he published were found not to be replicable.  Hence, editors have no incentive either to ensure the replicability of the articles they publish or to publish replications of them.  If researchers can’t get their replication articles published in decent journals, they won’t write the articles in the first place.  And this seems to be the present equilibrium, sub-optimal though it may be.  Worse, there seems to be tacit collusion among editors, in that one editor will not publish an article that exposes another editor as having published unreplicable research.

Prima facie evidence of this sad state of affairs is the fact that Liebowitz’s failed replication of the JPE paper by Oberholzer-Gee and Strumpf still hasn’t been published, not by the JPE and not by any other journal.  Anyone interested in replication should go to SSRN and read Liebowitz’s papers on this topic.  In “How Reliable is the Oberholzer-Gee and Strumpf Paper on File Sharing”, Liebowitz capably demonstrates fatal flaws in the data handling and analysis of the Oberholzer-Gee and Strumpf paper.  Actually, time is precious; just take my word for it so that you don’t have to read it: Liebowitz demolishes the Oberholzer-Gee/Strumpf paper.  In “Sequel to Liebowitz’s Comment on the Oberholzer-Gee and Strumpf Paper on File Sharing”, Liebowitz describes his efforts to get his paper published in the JPE.  This is the paper to read.  So Kafkaesque was Liebowitz’s ordeal that journalist Norbert Haring, writing in the German financial newspaper Handelsblatt (the German equivalent of the Wall Street Journal), said, “Steven Levitt, Editor of the Journal of Political Economy, uses a questionable tactic to block an undesired comment.  The subject of the criticised article was a hot topic.  On closer look, everything about the case was unusual.”  One might think that another journal with an interest in file sharing would publish Liebowitz’s paper….

No one can read these papers by Liebowitz and think that “truth will out” in the economics journals.  Yet there is cause for hope.

Third-party organizations dedicated to replication have emerged in the past few years, such as 3ie (the International Initiative for Impact Evaluation), BITSS (the Berkeley Initiative for Transparency in the Social Sciences), and EDaWaX (European Data Watch Extended).  These organizations support replication even when there is no prospect of publication.  If these organizations can demonstrate that top journals are publishing non-replicable research, then the top journals might be embarrassed into admitting that their efforts to ensure replicability are insufficient.  And then Liebowitz’s article might finally get published.

References
===========
N. Haring, Handelsblatt, 23 June 2008

B. D. McCullough
“Open Access Economics Journals and the Market for Reproducible Economic Research”
Economic Analysis and Policy 39(1), 117-126, 2009

B. D. McCullough, Kerry Anne McGeary and Teresa D. Harrison
“Do Economics Journal Archives Promote Replicable Research?”
Canadian Journal of Economics 41(4), 1406-1420, 2008

Sven Vlaeminck
“Data Management in Scholarly Journals and Possible Roles for Libraries – Some Insights from EDaWaX”
EconStor Open Access Articles, ZBW – German National Library of Economics, 49-79, 2013

RICHARD ANDERSON: Replication and the Zen of Home Repair

This summer is the first since my retirement from government that I find myself without academic obligations here or abroad.  Instead, I am focused on starting to rehab a tattered house that I recently purchased jointly with one of my children.
Surprising parallels exist between repairing a house and pursuing scientific research, at least for persons with an active imagination. First, it is important to understand the basic structure of the problem: removing an incorrect wall might lead to collapse of the house.  Pursuit of an uninteresting hypothesis might doom many months of research to becoming a permanent resident in your file cabinet. Both are tragedies.
There also is the issue of “what was done before.” Is it important to discern the architect’s original location for a window or a door? Is it important to discern precisely how the investigator in a previous study specified his regression in EViews? Surprisingly, the answer to both is yes. Approximate guesses are not adequate. Cutting through framing that supports a hidden beam can lead to poor results, as can guessing at the exact specification used by a previous investigator.
Opening the door on an older house and opening a new academic study are quite similar in a challenging way: neither typically comes with adequate documentation. In a house, you open the door to adventure: no document reveals the modifications and flaws, there is candy and danger for you to discover. Your mind’s vision of the completed project is its advertisement. Similarly, as Bruce McCullough phrases it, opening a new published article is but an advertisement for the underlying research. What data, precisely, were used? If a regression was used, how was it specified and what options (or defaults) were used in its estimation? What statistical package was used?  If a hypothesis test was used, precisely how was the test statistic calculated?
Danger stalks both restored houses and scientific research.  An incorrectly modified house can risk human life (or at least the value of the property). An incorrect scientific study risks a poorly designed public policy, or creating a “bandwagon” that leads others in pursuit of flawed results.
Fortunately, the answer in economic research  (both empirical and DSGE-style simulation studies) is easier than in old houses: the profession should expect authors to furnish code and data as part of the output of their research. It is an enduring mystery that professional economists – and the persons who pay their salaries – see no value in such transparency. An old house is unable to reveal clearly its history and current flaws; most are sold “as is” for that reason. How much longer will published economic research similarly be sold “as is” to its consumers?

Headline News: Two Economists Make Their Data Available

FROM THE ARTICLE: “Every year hundreds of millions of children in the developing world are given deworming tablets, whether they have worms or not….This “deworm everybody” approach has been driven by a single, hugely influential trial published in 2004 by two economists, Edward Miguel and Michael Kremer. This trial, done in Kenya, found that deworming whole schools improved children’s health, school performance, and school attendance. ….A decade later, in 2013, these two economists did something that very few researchers have ever done. They handed over their entire dataset to independent researchers on the other side of the world, so that their analyses could be checked in public. What happened next has every right to kick through a revolution in science and medicine.”  To read more, click here.

It’s So Easy to Do: Small Coding Error Leads to Retraction

This article from the Washington Post is noteworthy only because it highlights how a small coding error can cause a major change in a study’s results.  The original study claimed that men were more likely than women to divorce a spouse who fell ill.  The study was based on a longitudinal survey.  A subsequent replication study found that the result was driven by a coding mistake, in which people who left the study were incorrectly coded as having become divorced.  To read more, click here.
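To see how easily such a slip can happen, here is a minimal sketch in Python with pandas.  The data and variable names are invented for illustration and are not from the actual study; the point is only the mechanism: a recode that treats everyone who is no longer observed as married (including dropouts) as divorced will inflate the divorce rate.

```python
# Hypothetical sketch of the bug: attrition miscoded as divorce.
# All data below are invented for illustration, not taken from the study.
import pandas as pd

df = pd.DataFrame({
    "status": ["married", "divorced", "left_study", "married",
               "left_study", "divorced", "married", "left_study"],
})

# Buggy recode: anyone not observed as "married" is counted as divorced,
# silently lumping attrition ("left_study") in with real divorces.
df["divorced_buggy"] = (df["status"] != "married").astype(int)

# Correct recode: attrition is missing data, not a divorce.
df["divorced_ok"] = df["status"].map(
    {"married": 0, "divorced": 1, "left_study": None}
)

print(df["divorced_buggy"].mean())  # inflated rate: dropouts count as divorces
print(df["divorced_ok"].mean())     # rate among couples actually observed
```

With these toy data, the buggy recode reports a divorce rate of 0.625 versus 0.4 when dropouts are treated as missing — a single line of data handling changes the headline result, which is exactly the kind of error a replication with the original data and code can catch.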

What’s In a Name? Economist Argues for A Better Definition of Replication

In his post at The Impact Blog, economist Michael Clemens argues that vagueness about what constitutes a replication is harming the reputations of reputable researchers and hindering the progress of replication research.  Clemens proposes a classification system to eliminate confusion between a replication and a robustness check.  To read more, click here.

Retraction Watch Publishes a “Leaderboard” of Top Retractors

The website Retraction Watch is approaching its 5th birthday.  Among other things, it publishes a “leaderboard” that tracks the researchers with the most retractions.  The leaderboard ranks the Top 30 researchers, with links to the individual cases.  Perhaps reassuringly, only one of the current Top 30 is an economist.  To read more, click here.

A Call for a Journal of Insignificant Results?

FROM THE ARTICLE: “Negative results are an important building block in the development of scientific thought, primarily because most likely the vast majority of data is negative, i.e., there is not a favorable outcome. Only very limited data is positive, and that is what tends to get published, albeit alongside a sub-set of negative results to emphasize the positive nature of the positive results. Yet, not all negative results get published.” To read more, click here.

Reblog from Retraction Watch: Replication May Cause More Harm than Good

FROM THE ARTICLE: “Replication is often viewed as the demarcation between science and nonscience. However, contrary to the commonly held view, we show that in the current (selective) publication system replications may increase bias in effect size estimates.” To read more, click here.

The AJPS Replication Policy: A Model for Other Journals?

Before a paper can be published at the American Journal of Political Science (AJPS), the journal checks that all the empirical results from the paper can be reproduced with the data and code that the author has provided.  The paper does not get published until the journal confirms that this can be done.  A full statement of the AJPS replication policy can be found by clicking here.