Experts Are Not Know-It-Alls (Experimentally Speaking)

[From the paper “Stability of experimental results: Forecasts and evidence” by Stefano DellaVigna and Devin Pope, an NBER Working Paper]
“How robust are experimental results to changes in design? And can researchers anticipate which changes matter most? We consider a specific context, a real-effort task with multiple behavioral treatments, and examine the stability along six dimensions: (i) pure replication; (ii) demographics; (iii) geography and culture; (iv) the task; (v) the output measure; (vi) the presence of a consent form.”
“The initial task is a typing task employed in DellaVigna and Pope (2018a,b): subjects on MTurk have 10 minutes to alternatively press the ‘a’ and ‘b’ buttons on their keyboards as quickly as possible.”
“We build on this experiment by considering several design variants, covering the six dimensions above and collecting data on nearly 10,000 new MTurk subjects. In each variant we include 15 of the original treatments, following a pre-registered design.”
“Moving from one design to the next, we are interested in the stability of the findings on effort for the 15 treatments. But what is the right metric of stability? … We use rank-order correlation across the treatments as a measure of stability…”
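The stability metric above is a rank-order (Spearman) correlation: rank the 15 treatments by the effort they induce under each design, then correlate the two rankings. A minimal sketch of that computation, using made-up effort numbers (the values and treatment counts below are illustrative only, not from the paper):

```python
# Rank-order (Spearman) correlation between treatment-effect rankings
# under two experimental designs. All numbers are hypothetical.

def ranks(xs):
    """Return 1-based ranks, averaging ranks for ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical mean effort (button presses) per treatment under the
# original design and under one design variant:
original = [2029, 1896, 2132, 1964, 2286]
variant  = [2011, 1905, 2150, 1899, 2240]
print(round(spearman(original, variant), 3))  # prints 0.9
```

A correlation of 1 means the design change preserved the treatment ordering exactly (the paper's "full-stability benchmark" adjusts this for sampling noise); values near 0 mean the ordering was scrambled.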
“Having identified the design changes and the measure of stability, following DellaVigna and Pope (2018b) we collect forecasts. We contact 70 behavioral experts or experts on replication, yielding 55 responses. Each expert sees a description of the task, of the design changes, and an illustration of how rank-order correlation works; whenever possible, we also provide information on the full-stability benchmark. The experts then forecast the rank-order correlation for 10 design changes.”
“… the experts have at best a mixed record in their ability to predict how much design changes affect the results. This contrasts with recent evidence that experts are able to predict quite accurately replication in pure-replication studies (Dreber et al., 2015; Camerer et al., 2016) … This confirms the anecdotal impression that design choices are a somewhat unpredictable part of the experimenter toolbox, and suggests that external validity judgments may be more tentative than we realize.”
The full paper is available from NBER. (NOTE: the paper is behind a paywall.)
