Replication Failures: The Plot Thickens

[Excerpts taken from the working paper “Replicator Degrees of Freedom Allow Publication of Misleading ‘Failures to Replicate’” by Christopher Bryan, David Yeager, and Joseph O’Brien, posted at SSRN]
“…using data from an ongoing debate, we show that commonly-exercised flexibility at the experimental design and data analysis stages of replication testing can make it appear that a finding was not replicated when, in fact, it was.”
“The present analysis is important, in part, because it provides the sort of direct demonstration that has the potential to spur change.”
“We focus here on the debate about whether a subtle manipulation of language—referring to voting in an upcoming election with a predicate noun (e.g., “to be a voter”) vs. a verb (e.g., “to vote”)—can increase voter turnout.”
“A preliminary analysis of the data from just the day before the election revealed that many of the most obvious model specifications yielded significant replications of the original noun-vs.-verb effect.”
“A closer examination of the analyses reported by Gerber and colleagues (35) in support of their claim of non-replication revealed that the replicating authors chose to include three features…that in combination are known to increase the risk of misleading results.”
“…study results often hinge on data analytic decisions about which reasonable and competent researchers can disagree…we employed an analytical approach that is expressly designed to provide a comprehensive assessment of whether study data support an empirical conclusion when the influence of arbitrary researcher decisions on results is minimized.”
“The primary statistical approach we employ, called “Specification-Curve Analysis,” involves running all reasonable model specifications (i.e., ones that are consistent with the relevant hypothesis, expected to be statistically valid, and are not redundant with other specifications in the set)…”
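[To make the idea concrete, here is a minimal sketch of a specification curve. It is not the authors' code or data: the synthetic data set, the two analytic decisions ("window" and "sample"), and every function name are assumptions made for illustration. Each specification is one combination of choices, and the curve is simply the list of treatment effect estimates, sorted from smallest to largest.]

```python
import itertools
import random


def make_synthetic_data(n=2000, effect=0.05, seed=0):
    """Hypothetical participants; NOT the data from the voter-turnout studies."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        noun = rng.random() < 0.5          # True = "to be a voter" wording
        day_before = rng.random() < 0.5    # surveyed the day before the election
        prior_voter = rng.random() < 0.6   # voted in an earlier election
        p_vote = 0.5 + 0.1 * prior_voter + effect * noun
        rows.append({"noun": noun, "day_before": day_before,
                     "prior_voter": prior_voter,
                     "voted": int(rng.random() < p_vote)})
    return rows


def diff_in_turnout(rows):
    """Effect estimate: turnout rate under noun wording minus verb wording."""
    noun = [r["voted"] for r in rows if r["noun"]]
    verb = [r["voted"] for r in rows if not r["noun"]]
    return sum(noun) / len(noun) - sum(verb) / len(verb)


# Assumed analytic decisions; each option is a (label, row filter) pair.
DECISIONS = {
    "window": [("all days", lambda r: True),
               ("day before only", lambda r: r["day_before"])],
    "sample": [("everyone", lambda r: True),
               ("prior voters", lambda r: r["prior_voter"]),
               ("non-prior voters", lambda r: not r["prior_voter"])],
}


def specification_curve(rows):
    """Estimate the effect under every combination of decisions, sorted."""
    curve = []
    for combo in itertools.product(*DECISIONS.values()):
        labels = tuple(label for label, _ in combo)
        subset = [r for r in rows if all(keep(r) for _, keep in combo)]
        curve.append((labels, diff_in_turnout(subset)))
    return sorted(curve, key=lambda item: item[1])


rows = make_synthetic_data()
curve = specification_curve(rows)
for labels, estimate in curve:
    print(labels, round(estimate, 3))
```

[A real analysis would vary regression models and covariates as well as sample filters; the point of the sketch is only that no single specification is singled out in advance.]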
“An associated significance test for the specification curve, called a “permutation test,” quantifies how strongly the specification curve as a whole (i.e., all reasonable model specifications, taken together) supports rejecting the null hypothesis.”
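[A minimal sketch of how such a test could work, under stated assumptions: the excerpt does not say which summary of the curve the authors use, so this version takes the median estimate across specifications as the test statistic and uses a one-sided comparison. The data are synthetic, and all names here are hypothetical. Shuffling the treatment labels breaks any real link between wording and turnout, so the shuffled curves show what the whole set of specifications would look like under the null hypothesis.]

```python
import random
import statistics


def make_rows(n=1000, effect=0.2, seed=0):
    """Synthetic (treated, voted, prior_voter) tuples; not real study data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treated = int(rng.random() < 0.5)
        prior_voter = int(rng.random() < 0.5)
        voted = int(rng.random() < 0.4 + effect * treated)
        rows.append((treated, voted, prior_voter))
    return rows


def effect_estimate(rows):
    """Turnout rate among treated minus turnout rate among controls."""
    treated = [y for t, y, _ in rows if t]
    control = [y for t, y, _ in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def curve_median(rows):
    """Median estimate over three assumed specifications (sample choices)."""
    specs = [
        rows,                           # full sample
        [r for r in rows if r[2]],      # prior voters only
        [r for r in rows if not r[2]],  # non-prior voters only
    ]
    return statistics.median(effect_estimate(s) for s in specs)


def permutation_p_value(rows, n_shuffles=500, seed=1):
    """Share of label shuffles whose curve median is at least the observed one."""
    rng = random.Random(seed)
    observed = curve_median(rows)
    labels = [t for t, _, _ in rows]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        shuffled = [(t, y, c) for t, (_, y, c) in zip(labels, rows)]
        if curve_median(shuffled) >= observed:
            hits += 1
    # The +1 counts the observed data as one of the possible arrangements.
    return (hits + 1) / (n_shuffles + 1)


rows = make_rows()
p_value = permutation_p_value(rows)
print("observed median effect:", round(curve_median(rows), 3))
print("permutation p-value:", round(p_value, 4))
```

[Because every shuffle reruns the full set of specifications, the resulting p-value describes the curve as a whole rather than any one hand-picked model.]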
“The results … make clear that noun wording had a significant effect on turnout overall…But the specification curve results also strongly suggest that the replicating authors’ data analysis choices might not be the only replicator degree of freedom influencing results. Rather, a design-stage degree of freedom exercised by the replicating authors, regarding the window of time in which the study was conducted, may have further driven the treatment effect estimate downward…”
“Perhaps the clearest, most concrete implication of the present analysis is that specification-curve analysis should be standard practice in replication testing.”
To read the full paper, click here.
