Science demands transparency. Yet much research in economics and finance uses secret data. The journals publish results and conclusions, but the data and sometimes even the programs are not available for review or inspection. Replication, even just checking what the author(s) did given their data, is getting harder.
Quite often, when one digs in, empirical results are nowhere near as strong as the papers make them out to be.
The solution is pretty obvious: to be considered peer-reviewed “scientific” research, authors should post their programs and data. If the world cannot see your lab methods, you have an anecdote or an undocumented claim, not research. An empirical paper without data and programs is like a theoretical paper without proofs.
On reflection, that instinct is a bit of a paradox. Economists, when studying everyone else, by and large value free markets, demand as well as supply, emergent order, the marketplace of ideas, competition, entry, and so on, not tight rules and censorship. Yet in running our own affairs, the inner dirigiste quickly wins out. In my time at faculty meetings, there were few problems that many colleagues did not want to address by writing more rules.
And with another moment’s reflection (much more below), you can see that the rule-and-censorship approach simply won’t work. There isn’t a set of rules we can write that assures replicability and transparency, without the rest of us having to do any work. And rule-based censorship invites its own type I errors.
Replicability is a squishy concept — just like every other aspect of evaluating scholarly work. Why do we think we need referees, editors, recommendation letters, subcommittees, and so forth to evaluate method, novelty, statistical procedure, and importance, but replicability and transparency can be relegated to a set of mechanical rules?
DEMAND
So, rather than try to restrict supply and impose censorship, let’s work on demand. If you think that replicability matters, what can you do about it? A lot: when you referee papers, write tenure letters, or discuss work in seminars, ask for the programs and data, and treat results based on secret data with the skepticism you would give a theorem stated without proof.
Though this issue has bothered me for a long time, I have not been doing all of the above. I will start now.
Here, some economists I have talked to jump to calls for coordinated action. That is not my view.
I think this sort of thing can and should emerge gradually, as a social norm. If a few of us start doing it, others will notice, think “that’s a good idea,” and feel empowered to do it too. The first person to do it will seem like a bit of a jerk. But after you read three or four tenure letters that say “this seems like fine research, but without programs and data we won’t really know,” you’ll feel better about writing that yourself. Like “would you mind putting out that cigarette.”
Also, the issues are hard, and I’m not sure exactly what is the right policy. Good social norms will evolve over time to reflect the costs and benefits of transparency in all the different kinds of work we do.
If we all start doing this, journals won’t need to enforce long rules. Data disclosure will become as natural and self-enforcing a part of writing a paper as proving your theorems.
Conversely, if nobody feels like doing the above, then maybe replication isn’t such a problem at all, and journals are mistaken in adding policies.
RULES WON’T WORK WITHOUT DEMAND
Journals are competitive too. If the JPE refuses a paper because the author won’t disclose data, the QJE publishes it, and the paper goes on to great acclaim, wins its author the Clark medal and the Nobel Prize, then the JPE falls in stature and the QJE rises. New journals will spring up with laxer policies. Journals themselves are a curious relic of the print age. If readers value empirical work based on secret data, academics will just post their papers on websites, working paper series, SSRN, RePEc, blogs, and so forth.
Replication is not an issue about which we really can write rules. It is an issue — like all the others involving evaluation of scientific work — for which norms have to evolve over time and users must apply some judgement.
Perfect, permanent replicability is impossible. If replication is done with programs that access someone else’s database, those databases change and access routines change. Within a year, if the programs run at all, they give different numbers. New versions of software give different results. The best you can do is to freeze the data you actually use, hosted on a virtual machine that uses the same operating system, software version, and so on. Even that does not last forever. And no journal asks for it.
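One partial remedy, sketched below as a hedged illustration (the function name and the idea of a “freeze manifest” are mine, not any journal’s actual requirement), is for a replication package to record a cryptographic hash of the exact data file used plus the software environment, so a later replicator can at least tell whether the data or the environment silently changed underneath the programs:

```python
# Sketch: identify the exact data file and environment at publication time.
# If a later run of the same programs gives different numbers, the manifest
# shows whether the data file or the environment changed.
import hashlib
import platform
import sys


def freeze_manifest(data_path: str) -> dict:
    """Return a manifest pinning down the data file and software environment."""
    with open(data_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "data_sha256": data_hash,            # changes iff the data file changes
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }
```

This does not solve the problem of vendors changing databases, but it makes the change detectable, which is most of what a replicator needs.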
Replication is a small part of a larger problem, data collection itself. Much data these days is collected by hand, or scraped by computer. We cannot and should not ask for a webcam or keystroke log of how data was collected, or hand-categorized. Documenting this step so it can be redone is vital, but it will always be a fuzzy process.
When told to “post your data,” authors reply that they aren’t allowed to do so, and journal rules allow that response. The author then has only to post programs, and a would-be replicator must arrange access to the underlying data on his or her own. No surprise, very little replication that requires such extensive effort is occurring.
And rules will never be enough.
Regulation invites just-within-the-boundaries games. Provide the programs, but with no documentation. Provide the data, but with no headers. Don’t write down what the procedures are. One can follow the letter and ignore the spirit of any rule.
Demand invites serious effort towards transparency. I post programs and data. Judging by emails when I make a mistake, these get looked at maybe once every 5 years. The incentive to do a really good job is not very strong right now.
A hopeful thought: Currently, one way we address these problems is by endless referee requests for alternative procedures and robustness checks. Perhaps these can be answered in the future by “the data and code are online, run them yourself if you’re worried!”
I’m not arguing against rules, such as those the AER has put in. I just think they will not make a dent until we economists show by our actions some interest in the issue.
PROPRIETARY DATA, COMMERCIAL DATA, GOVERNMENT DATA
Many data sources explicitly prohibit public disclosure of the data. Requiring disclosure of such secret data is beyond current journal policies, or any policies that anyone imagines asking journals to impose. Journals can require that you post code, but then a replicator has to arrange for access to the data. That can be very expensive, or require a coauthor who works at the government agency. No surprise, such replication doesn’t happen very often.
However, this is mostly not an insoluble problem, as there is almost never a fundamental reason why the data needed for verification and robustness analysis cannot be disclosed. Rules and censorship are not strong enough to change things. Widespread demand for transparency might well be.
To substantiate much research, and check its robustness to small variations in statistical method, you do not need full access to the underlying data. An extract is enough, and the extract needed to verify one paper is usually useless for writing other papers. The terms for using posted data could be: you may not use these data to publish new original work, only for verification of and comment on the posted paper. Abiding by that restriction is a lot easier to police than the current replication policies.
Even if the slice of data needed to check a paper’s results cannot be public, it can be provided to referees or discussants, after signing a stack of non-use and non-disclosure agreements. (That is a less-than-optimal outcome of course, since in the end real verification won’t happen unless people can publish verification papers.)
Academic papers take 3 to 5 years or more for publication. A 3 to 5 year old slice of data is useless for most purposes, especially the commercial ones that worry data providers.
Commercial and proprietary (e.g., bank) data sets are designed for paying customers who want up-to-the-minute data. Even CRSP data a month old is not much used commercially, because traders need up-to-the-minute data. Hedge fund and mutual fund data are used and paid for by people researching the histories of potential investments. Two-year-old data is useless to them, so much so that getting the providers to keep old slices of data to overcome survivor bias is a headache.
In sum, the 3-to-5-year-old, redacted, minimalist slice of data needed to substantiate the empirical work in an academic paper is in fact seldom a substantial threat to the commercial, proprietary, or genuine privacy interests of the data collectors.
AUTHOR’S INTEREST
Authors often want to preserve their use of data until they’ve fully mined it. If they put in all the effort to produce the data, they want first crack at the results.
This valid concern does not mean that authors cannot create redacted slices of the data needed to substantiate a given paper. They can also let referees and discussants access such slices, under the strict non-disclosure and non-use agreements described above.
In fact, it is usually in authors’ interest to make data available sooner rather than later. Everyone who uses your data is a citation. There are far more cases of authors who gained renown and long citation counts from making data public early than of authors who jealously guarded data so they would get credit for the magic regression that would appear 5 or more years after data collection.
Still, how to exercise this property right is up to the data collector. Our job is to say “that’s nice, but we won’t really believe you until you make the data public, at least the data needed to see how you ran this regression.” If you want to wait 5 years to mine all the data before making it public, then you may not get the glory of “publishing” the preliminary results. That is, again, why voluntary pressure will work and rules from above will not.
SERVICE
One empiricist with whom I talked about these issues does not want to make programs public, because he doesn’t want to deal with the consequent wave of emails from people asking him to explain bits of code, or claiming to have found errors in 20-year-old programs.
Fair enough. But this is another reason why a loose code of ethics is better than a set of rules for journals.
You should make a good-faith effort to document code and data when the paper is published. You are not required to answer every email from every confused graduate student for eternity after that point. Critiques and replication studies can be refereed in the usual way, and must rise to the usual standards of documentation and plausibility.
WHY REPLICATION MATTERS FOR ECONOMICS
Economics is unusual. In most experimental sciences, once you collect the data, the fact is there or not. If it’s in doubt, collect more data. Economics features large and sophisticated statistical analysis of non-experimental data. Collecting more data is often not an option, and not really the crux of the problem anyway. You have to sort through the given data in a hundred or more different ways to understand that a cause and effect result is really robust. Individual authors can do some of that — and referees tend to demand exhausting extra checks. But there really is no substitute for the social process by which many different authors, with different priors, play with the data and methods.
Economics is also unusual in that the practice of redoing old experiments over and over, common in science, is rare in economics. When Ben Franklin stored lightning in a condenser, hundreds of other people went out to try it too, some discovering that it wasn’t the safest thing in the world. They did not just read about it and take it as truth. A big part of a physics education is to rerun classic experiments in the lab. Yet it is rare for anyone to redo — and question — classic empirical work in economics, even as a student.
Of course everything comes down to costs. If a result is important enough, you can go get the data, program everything up again, and see if it’s true. Even then, a question arises: if you can’t get x’s number, why not? It’s very hard to answer that without x’s programs and data. The whole exercise is a lot less expensive and time-consuming, and thus a lot more likely to happen, if you can use the author’s programs and data.
WHERE WE ARE
The American Economic Review has a strong data and program disclosure policy, and the JPE has adopted it. John Taylor has a good blog post on replication and the history of the AER policy. The QJE has decided not to adopt such a policy; I asked an editor about it and heard very sensible reasons. Sven Vlaeminck has written a very good review article on data policies at journals.
The AEA is running a survey about its journals, and asks some replication questions. If you’re an AEA member, you got it. Answer it. I added to mine, “if you care so much about replication, you should show you value it by routinely publishing replication articles.”
How is it working? See the Report on the American Economic Review Data Availability Compliance Project.
The quest for rules and censorship reflects a world view in which, once we get procedures in place, everything published in a journal will be correct. Of course, once stated, you see how silly that is. Most of what gets published is wrong. Journals are for communication. They should be invitations to replication, not carved-in-stone truths. Yes, peer review sorts out a lot of complete garbage, but the balance of type I and type II errors will remain.
A few touchstones:
Mitch Petersen tallied up the panel-data papers in the top finance journals for 2001–2004. Out of 207 such papers, 42% made no correction at all for cross-sectional correlation of the errors. That is a fundamental error, one that can understate standard errors by a factor of 5 or more: if firm i had an unusually good year, it’s pretty likely firm j had a good year as well. Clearly, the empirical refereeing process is far from perfect, despite the endless rounds of revisions referees typically ask for. (Nowadays the magic wand “cluster” is waved over the issue. Whether it’s being done right is a ripe topic for a similar investigation.)
“Why Most Published Research Findings Are False” by John Ioannidis. Medicine, but relevant.
See also the ongoing controversy over replicability in psychology.
There will be a workshop on replication and transparency in economic research following the ASSA meetings in San Francisco.
I anticipate an interesting exchange in the comments. I especially welcome more links to, and summaries of, existing writing on the subject.