[From the blog “Power to the Plan” by Clare Leaver, Owen Ozier, Pieter Serneels, and Andrew Zeitlin, posted at BITSS]
“…Our blinded pre-analytical work uncovered two decision margins that could deliver substantial increases in power: changing test statistics used and putting structure on a model for error terms. Because the value of these decisions depends on things that are hard to know ex ante — even using baseline data — they create a case for blinded analysis of endline outcomes. We argue that there are circumstances in which this can be done without risk of p-hacking, and in which the power gains from these decision margins are substantial.”
“…Kolmogorov-Smirnov (KS) tests can be better powered than OLS t-tests by a factor of four, even under additive treatment effects.”
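To make the intuition concrete, here is a minimal simulation sketch, not the authors' procedure. It assumes a skewed, heavy-tailed outcome (lognormal errors), a constant additive treatment effect, and large-sample approximations for both p-values instead of the randomization inference a real analysis might use. The sample size, effect size, error distribution, and all function names are illustrative assumptions.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic Kolmogorov p-value with a standard small-sample correction."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series converges slowly here; the p-value is essentially 1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * s))


def mean_var(x):
    """Sample mean and unbiased sample variance."""
    m = sum(x) / len(x)
    return m, sum((v - m) ** 2 for v in x) / (len(x) - 1)


def t_pvalue(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    z = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_power(effect=0.5, n=100, sims=300, alpha=0.05, seed=0):
    """Share of simulated experiments in which each test rejects the null."""
    rng = random.Random(seed)
    reject_t = reject_ks = 0
    for _ in range(sims):
        # Assumed data-generating process: lognormal noise plus a constant shift.
        control = [rng.lognormvariate(0.0, 1.5) for _ in range(n)]
        treated = [rng.lognormvariate(0.0, 1.5) + effect for _ in range(n)]
        reject_t += t_pvalue(treated, control) < alpha
        reject_ks += ks_pvalue(ks_statistic(treated, control), n, n) < alpha
    return reject_t / sims, reject_ks / sims


power_t, power_ks = simulate_power()
print(f"difference-in-means power: {power_t:.2f}, KS power: {power_ks:.2f}")
```

Under these assumptions the outcome variance is dominated by a few extreme draws, which inflates the standard error of the difference in means, while the KS statistic depends only on the ordering of observations. With closer-to-normal errors the ranking can flip, which is why the choice is hard to make before seeing the (blinded) outcome distribution.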
“…Remember how machine learning is a way of getting a better fit using observables? Imposing structure on error terms is a way of getting a better fit on the *unobserved* sources of variation. That structure can take many forms: it can relate to the correlations between units, the distribution of residuals (normal? Pareto?), or both. Imbens and Rubin (2015, p. 68) observe that test statistics derived from structural estimates, for example by expressly modeling the error term, can improve power to the extent that they represent a “good descriptive approximation” to the data generating process. Blinded endline data allowed us to learn about the quality of such approximations, with substantial consequences.”
“In our setting, when we turned to look at effects on student outcomes, we intended to use a linear model … But there were still a number of potential correlations to consider: some students are observed at more than one point in time; each student has multiple teachers, and schools may have both incumbents and teachers recruited under a variety of contract expectations. Linear mixed-effects (LME) models provide an avenue for implementing this.”
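For readers less familiar with the term, the textbook form of a linear mixed-effects model is shown below. This is the generic specification, not the authors' exact model, and the symbols are introduced here purely for illustration.

$$
y = X\beta + Zu + \varepsilon, \qquad u \sim N(0, G), \qquad \varepsilon \sim N(0, R), \qquad \operatorname{Var}(y) = Z G Z^{\top} + R
$$

Here $X$ holds the fixed-effect regressors (such as a treatment indicator), $Z$ maps each observation to its grouping units (for example pupils, teachers, or schools), and $G$ and $R$ govern the variances of the random effects and the idiosyncratic errors. Observations that share a grouping unit share a draw of $u$, which is what induces the correlations described in the passage above.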
“Our LME model, which assumes normally distributed error terms that include a common shock at the pupil level, delivers an estimator of the effect of interest that has a standard deviation as much as 30 percent smaller than the equivalent OLS estimator. Because normality is a reasonable approximation to these error terms, the structure of LME allows it to outperform traditional random-effects. The gains from LME are conceptually comparable to an increase of 70 percent in sample size.”
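As a rough guide to how a reduction in an estimator's standard deviation translates into "equivalent sample size", the sketch below applies the textbook rule that standard errors shrink in proportion to one over the square root of the sample size. That rule is an assumption here; the excerpt does not state how the authors computed their figure, and the function names are hypothetical.

```python
import math


def sample_size_multiplier(sd_reduction):
    """Factor by which n must grow to shrink the SD by `sd_reduction` (e.g. 0.30),
    assuming the SD is proportional to 1 / sqrt(n)."""
    return 1.0 / (1.0 - sd_reduction) ** 2


def sd_reduction_from_sample_increase(increase):
    """Proportional SD reduction implied by growing n by `increase` (e.g. 0.70),
    under the same 1 / sqrt(n) assumption."""
    return 1.0 - 1.0 / math.sqrt(1.0 + increase)


print(round(sample_size_multiplier(0.30), 2))             # about 2.04
print(round(sd_reduction_from_sample_increase(0.70), 3))  # about 0.233
```

Under this simple rule, a 70 percent larger sample corresponds to a standard deviation roughly 23 percent smaller, while the "as much as 30 percent" upper bound corresponds to roughly doubling the sample. How the two quoted figures map onto each other depends on the specification, which the excerpt does not spell out.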
“…Endline data are often far from normal and correlation structures across units are hard to know ex ante. A blinded endline approach can be a useful substitute for tools like DeclareDesign in cases where baseline data, or a realistic basis for simulating the endline data-generating process, are not available.”
“There is broad consensus that well-powered studies are important, not least because they make null results more informative. Consequently, researchers invest a lot in statistical power. Our recent experience suggests that blinded analyses — whether based on pooled or partial endline data — can be a useful tool to make informed choices of models and test statistics that improve power.”