IN THE NEWS: Nature (February 21, 2018)
[From the article “How to make replication the norm” by Paul Gertler, Sebastian Galiani and Mauricio Romero, published in Nature]
“To see how often the posted data and code could readily replicate original results, we attempted to recreate the tables and figures of a number of papers using the code and data provided by authors. Of 415 articles published in 9 leading economics journals in May 2016, 203 were empirical papers that did not contain proprietary or otherwise restricted data. We checked these to see which sorts of files were downloadable and spent up to four hours per paper trying to execute the code to replicate the results (not including code runtime).”
“We were able to replicate only a small minority of these papers. Overall, of the 203 studies, 76% published at least one of the 4 files required for replication: the raw data used in the study (32%); the final estimation data set produced after data cleaning and variable manipulation (60%); the data-manipulation code used to convert the raw data to the estimation data (42%, but only 16% had both raw data and usable code that ran); and the estimation code used to produce the final tables and figures (72%).”
“The estimation code was the file most frequently provided. But it ran in only 40% of these cases. We were able to produce final tables and figures from estimation data in only 37% of the studies analysed. And in only 14% of 203 studies could we do the same starting from the raw data (see ‘Replication rarely possible’).”
This invited comment in Nature, which has already had to be corrected once, is itself not replicable, as large parts of its data are not shared. This is despite the fact that the Berkeley Initiative, which also funded the research, officially has a (though vague) policy on replicability, and despite Nature’s own policy, which states that “authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications” and that “Nature Research titles will be required to include information on whether and how others can access the underlying data.”
Several flaws of the study, such as missing citations to relevant previous literature, undocumented ad hoc data collection, a non-transparently and incorrectly restricted sample of replications, and a lack of impartiality on the part of the authors, are described in detail here: https://ideas.repec.org/p/stg/wpaper/2020_02.html