Wondering How to Write a Pre-Analysis Plan? Simulate it.

[From the blog “Better pre-analysis plans through design declaration and diagnosis” by Graeme Blair, Jasper Cooper, Alexander Coppock, and Macartan Humphreys, posted at BITSS]
“Pre-analysis plans (PAPs) are used by researchers to register a set of analyses before they learn about outcomes. PAPs clarify which choices were made before observing outcomes and which were made afterwards. Ironically, the set of decisions that should be specified in a PAP is itself remarkably unclear.”
“We propose an approach that clarifies what might belong in a PAP by placing design declaration at the core of a pre-analysis plan. In particular, we propose that PAPs include, in addition to other details:”
– “A declaration (in code) of the features of a design.”
– “A design diagnosis that analyzes the properties of the design such as power, bias, or coverage via Monte Carlo simulations.”
“By “declaration” we mean a formal statement of the questions a design seeks to answer and the strategies for answering them. There should be enough information to simulate the implementation of the design.”
“In many cases, this means giving information for each of the elements in the “MIDA” (Model-Inquiry-Data Strategy-Answer Strategy) framework:”
– “A model, M, of how the world works. We are used to specifying expected effect sizes and sample sizes for power calculations, but this leaves out many aspects of the design – the pre-treatment variables and their distributions, expected heterogeneity in effect sizes, the structure of clustering in the data, and other details consequential for assessments of power and other properties.”
– “An inquiry, I, about the distribution of variables, perhaps given interventions on some variables. The average treatment effect or the average level of support for a political candidate are common inquiries.”
– “A data strategy, D, is how you as the researcher intervene in the world to change outcomes and generate data. This includes case selection (or sampling decisions) and measurement, as well as interventions such as assignment of treatments.”
– “An answer strategy, A, that uses data to generate an answer. The answer strategy includes not only regression models and other numerical summaries of the data, but also the tables and figures based on those models.”
“If all four of these elements are declared, then it can become possible to “run” the design and to see how well it answers questions.”
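To make the idea of "running" a design concrete, here is a minimal sketch in plain Python. It is not the authors' code and does not use the DeclareDesign package. Every specific in it is an assumption chosen for illustration: a two-arm experiment, standard normal untreated outcomes, a constant effect of 0.5, 100 units, a difference-in-means estimator with a normal-approximation 95% interval, and the function names themselves.

```python
import math
import random
import statistics


def model(n, rng, effect=0.5):
    """M: potential outcomes (assumed: normal noise, constant effect)."""
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y1 = [y + effect for y in y0]
    return y0, y1


def inquiry(y0, y1):
    """I: the average treatment effect among the simulated units."""
    return statistics.fmean([a - b for a, b in zip(y1, y0)])


def data_strategy(y0, y1, rng):
    """D: assign exactly half the units to treatment, then reveal outcomes."""
    n = len(y0)
    treated = set(rng.sample(range(n), n // 2))
    z = [1 if i in treated else 0 for i in range(n)]
    y = [y1[i] if z[i] else y0[i] for i in range(n)]
    return z, y


def answer_strategy(z, y):
    """A: difference in means with an approximate 95% interval."""
    t = [yi for zi, yi in zip(z, y) if zi == 1]
    c = [yi for zi, yi in zip(z, y) if zi == 0]
    est = statistics.fmean(t) - statistics.fmean(c)
    se = math.sqrt(statistics.variance(t) / len(t)
                   + statistics.variance(c) / len(c))
    return est, est - 1.96 * se, est + 1.96 * se


def diagnose(n=100, sims=500, seed=1):
    """Run the design many times and summarize bias, power, and coverage."""
    rng = random.Random(seed)
    errors, significant, covered = [], 0, 0
    for _ in range(sims):
        y0, y1 = model(n, rng)
        estimand = inquiry(y0, y1)
        z, y = data_strategy(y0, y1, rng)
        est, lower, upper = answer_strategy(z, y)
        errors.append(est - estimand)
        significant += (lower > 0 or upper < 0)  # interval excludes zero
        covered += (lower <= estimand <= upper)  # interval contains the truth
    return {
        "bias": statistics.fmean(errors),
        "power": significant / sims,
        "coverage": covered / sims,
    }


diagnosis = diagnose()
print(diagnosis)
```

Each function corresponds to one MIDA element, so changing an assumption (say, adding effect heterogeneity to `model` or clustering to `data_strategy`) and rerunning `diagnose` shows how the design's properties respond before any real data are collected.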
“As an illustration, we describe a design that seems well-defined, but for which diagnosis reveals that the question itself is poorly formed.”
“Designs can be declared in many coding languages. The DeclareDesign software package streamlines this process for R users, as illustrated below.”
“But you don’t have to use our package to do this. To illustrate, we declare the design in three additional code languages: Stata, Python, and via formulae in Excel.”