There has been a huge amount of attention focused on “open data.” A casual reading of the blogosphere suggests that Open Data is good and Secret Data is bad.
Remarkably, there has been very little discussion of the property-rights issues associated with open data. The Open Data Movement wants to turn a private good (datasets) into a public good. Economists know something about public goods: they tend to be under-produced. This introduces a trade-off between the propagation of data for use by multiple researchers, a social good (though see here for a discussion where this is not necessarily so), and the disincentive this creates for producing data, a social bad. How best to make this trade-off is unclear.
In a recent blog post entitled “Open data, authorship, and the early career scientist”, MARGARET KOSMALA, a postdoctoral fellow at Harvard University, argues that making one’s data available to others hurts data-producing scholars, particularly younger ones. The argument is not so much that the data-producing scholar will be scooped by other scientists on the associated research. Rather, it is that subsequent research projects that could have resulted in publications for the data-producing scholar will end up being undertaken by other scientists. And while Kosmala does not make this point explicitly, the result is a disincentive for scientists to produce data, if only because younger scholars may not be able to produce sufficient publications to get the funding and tenure they need to continue their careers.
What is really interesting about this blog post is that it led to a discussion between a reader and the author about the ethics of “requiring co-authorship” when authors use data produced by another scientist. Missing from the discussion was the recognition that “requiring co-authorship” provides a potential solution to the problem of open data. It is a way for the data-producing scientist to reap the rewards of data production, while still allowing other authors to use the data.
Of course, there are issues associated with implementing a policy like this. Once data are released, how will the data-producing author be able to ensure that others who use the data will extend co-authorship to them? And suppose the data-producing author does not wish to have their data used in a certain way. Should they have the right to restrict its use? While the answers are debatable, the questions are illuminating, because they make us realise that the debate over open data is just another application of the larger subject of intellectual property rights.