Pre-Registration as a Severe Testing Device
[Excerpts are taken from the preprint, “The Value of Preregistration for Psychological Science: A Conceptual Analysis” by Daniël Lakens, posted at PsyArXiv Preprints]
What is Preregistration For?
“If the only goal of a researcher is to prevent bias, it suffices to verbally agree upon the planned analysis with collaborators as long as everyone will perfectly remember the agreed upon analysis. In the conceptual analysis presented here, researchers preregister to allow future readers of the preregistration (which might include the researchers themselves) to evaluate whether the research question was tested in a way that could have falsified the prediction.”
“Mayo (1996) carefully develops arguments for the role that prediction plays in science and arrives at an error statistical philosophy based on a severity requirement.”
Severe Tests
“A test is severe when it is highly capable of demonstrating a claim is false.”
“Figure 1A visualizes a null hypothesis test, where only one specific state of the world (namely an effect of exactly zero) will falsify our prediction. All other possible states of the world are in line with our prediction.”
“Figure 1B represents a one-sided null-hypothesis test, where differences larger than zero are predicted, and the prediction is falsified when the difference is either equal to zero, or smaller than zero. This prediction is slightly riskier than a two-sided test, in that there are more ways in which our prediction could be wrong, because 50% of all possible outcomes falsify the prediction, and 50% corroborate it.”
“Finally, Figure 1C visualizes a range prediction where only differences between 0.5 and 2.5 support the prediction. Since there are many more ways this prediction could be wrong, it is an even more severe test.”
“If we observe a difference of 1.5, with a 95% confidence interval from 1 to 2, all three predictions are confirmed with an alpha level of 0.05, but the prediction in Figure 1C has passed the most severe test since it was confirmed in a test that had a higher capacity of demonstrating the prediction is false. Note that the three tests differ in severity even when they are tested with the same Type 1 error rate.”
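The comparison of the three predictions can be made concrete with a small sketch. It is an illustration, not the preprint's own procedure: the function name `corroborated` is hypothetical, and it simply treats a prediction as supported when the whole confidence interval lies inside the predicted region (a simplification; a one-sided test at an alpha of 0.05 would strictly use a one-sided bound). The two extra intervals are made-up outcomes chosen to show where the predictions come apart.

```python
def corroborated(ci_low, ci_high):
    """Report which of the three predictions a confidence interval supports.

    Assumption: a prediction counts as supported only when the entire
    interval falls inside the region the prediction allows.
    """
    return {
        "A: effect differs from 0": ci_low > 0 or ci_high < 0,
        "B: effect larger than 0": ci_low > 0,
        "C: effect between 0.5 and 2.5": ci_low > 0.5 and ci_high < 2.5,
    }

# The outcome described in the excerpt: difference 1.5, 95% CI from 1 to 2.
observed = corroborated(1.0, 2.0)

# Two hypothetical outcomes, for comparison only.
larger = corroborated(2.6, 3.6)      # a large positive effect
negative = corroborated(-2.0, -1.0)  # an effect in the opposite direction

for label, result in [("observed", observed), ("larger", larger), ("negative", negative)]:
    print(label, result)
```

All three predictions pass for the observed interval, but only the range prediction would have failed for the large positive effect, and only the two-sided prediction survives the negative effect. The more outcomes that would have counted against a prediction, the more its passing tells us.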
“As far as I am aware, Mayo’s severity argument currently provides one of the few philosophies of science that allows for a coherent conceptual analysis of the value of preregistration.”
Examples of Practices that Reduce the Severity of Tests
“One example of such a practice is optional stopping, where researchers collect data, analyze their data, and continue the data collection only if the result is not statistically significant. In theory, a researcher who is willing to continue collecting data indefinitely will always observe a statistically significant result. By repeatedly looking at the data, the Type 1 error rate can inflate to 100%. In this extreme case the prediction can no longer be falsified, and the test has no severity.”
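The inflation caused by optional stopping can be seen in a short simulation. This is a minimal sketch under stated assumptions, not an analysis from the preprint: the null hypothesis is true, observations are standard normal with known variance (so a z-test applies), and the researcher tests after every batch of 10 observations and stops at the first significant result. The function names, batch size, number of looks, and seed are all illustrative choices.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(n_looks, batch_size=10, alpha=0.05, n_sims=2000, seed=1):
    """Share of simulated studies that reach p < alpha at any look.

    The true effect is zero, so every significant result is a Type 1 error.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        n = 0
        for _ in range(n_looks):
            # Collect one more batch, then test all data gathered so far.
            for _ in range(batch_size):
                total += rng.gauss(0.0, 1.0)
            n += batch_size
            z = total / math.sqrt(n)
            if two_sided_p(z) < alpha:
                hits += 1
                break  # stop collecting once the result is significant
    return hits / n_sims

single_look = false_positive_rate(n_looks=1)
many_looks = false_positive_rate(n_looks=20)
print(f"one look: {single_look:.3f}, twenty looks: {many_looks:.3f}")
```

With a single planned analysis the error rate stays near the nominal 5%, while looking twenty times pushes it well above that. Letting the number of looks grow without limit drives it toward the 100% described in the excerpt.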
“The severity of a test can also be compromised by selecting a hypothesis based on the observed results. In this practice, known as Hypothesizing After the Results are Known (HARKing, Kerr, 1998) researchers look at their data, and then select a prediction. This reversal of the typical hypothesis testing procedure makes the test incapable of demonstrating the claim was false.”
“As a final example…think about the scenario [where a] researcher makes multitudes of observations and selects out of all these tests only those that support their prediction. Choosing to selectively report tests from among many tests that were performed strongly reduces the capability of a test to demonstrate the claim was false.”
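A rough calculation shows how quickly selective reporting erodes a test. The sketch below assumes the tests are independent and that every null hypothesis is true, which is a simplification the excerpt does not state; the function name is hypothetical. Under those assumptions the chance that at least one of k tests comes out significant is 1 - (1 - alpha)^k.

```python
def chance_of_any_significant(k, alpha=0.05):
    """Probability that at least one of k independent tests gives p < alpha
    when no true effect exists (assumes independence between tests)."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(k, round(chance_of_any_significant(k), 3))
```

With twenty tests to choose from, a researcher has roughly a 64% chance of finding at least one "supporting" result even when nothing is there, so a report that shows only the tests that worked says little about whether the claim could have been shown false.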
“…a preregistration document should give us all the information that allows future readers to evaluate the severity of the test. This includes the theoretical and empirical basis for predictions, the experimental design, the materials, and the analysis code. Having access to this information should allow readers to see whether any choices were made during the research process that reduced the severity of a test.”
“Researchers should also specify when they will conclude their prediction is not supported. As De Groot (1969) writes: ‘The author of a theory should himself state…what potential outcomes would, if actually found, lead him to regard his theory as disproven.’”
Preregistration Makes it Possible to Evaluate the Severity of a Test
“The severity of a test could in theory be unrelated to whether it is preregistered. However, in practice there will almost always be a correlation between the ability to transparently evaluate the severity of a test and preregistration, both because researchers can often selectively report results, use optional stopping, or come up with a plausible hypothesis after the results are known, and because theories rarely completely constrain the test of predictions.”
“As this conceptual analysis of preregistration makes clear, the practice of specifying the design, data collection, and planned analyses in advance is based on a philosophy of science that values tests of predictions and puts more trust in claims that have passed severe tests (Lakatos, 1978; Mayo, 2018; Meehl, 1990; Platt, 1964; Popper, 1959).”
To read the article, click here.