[From the blog, “Gazing into the Abyss of P-Hacking: HARKing vs. Optional Stopping” by Angelika Stefan and Felix Schönbrodt, posted at Felix Schönbrodt’s website at http://www.nicebread.de]
“Now, what does a researcher do when confronted with messy, non-significant results? According to several much-cited studies (for example John et al., 2012; Simmons et al., 2011), a common reaction is to start sampling again (and again, and again, …) in the hope that a somewhat larger sample size can boost significance. Another reaction is to wildly conduct hypothesis tests on the existing sample until at least one of them becomes significant (see for example: Simmons et al., 2011; Kerr, 1998). These practices, along with some others, are commonly known as p-hacking, because they are designed to drag the famous p-value right below the mark of .05 which usually indicates statistical significance. Undisputedly, p-hacking works (for a demonstration try out the p-hacker app).”
“… P-Hacking exploits alpha error accumulation and fosters the publication of false positive results which is bad for science. However, we want to take a closer look at how bad it really is. In fact, some p-hacking techniques are worse than others (or, if you like the unscrupulous science villain perspective: some p-hacking techniques work better than others).”
“As a showcase, we want to introduce two researchers: The HARKer takes existing data and conducts multiple independent hypothesis tests (based on multiple uncorrelated variables in the data set) with the goal to publish the ones that become significant.”
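To make the HARKer's advantage concrete, here is a minimal sketch of the textbook alpha-accumulation formula. It assumes that every null hypothesis is true and that the tests are fully independent, as in the excerpt's "multiple uncorrelated variables". The function name `harking_false_positive_rate` is illustrative and does not come from the blog post.

```python
def harking_false_positive_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one significant result among n_tests independent
    tests when every null hypothesis is true (assumption of this sketch)."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Each test stays non-significant with probability (1 - alpha);
    # independence lets us multiply those probabilities.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, round(harking_false_positive_rate(k), 3))
    # prints:
    # 1 0.05
    # 5 0.226
    # 10 0.401
    # 20 0.642
```

Under these assumptions, ten independent tests already give roughly a 40% chance of at least one "finding" to publish.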
“… the Accumulator uses optional stopping. This means that he collects data for a single research question test in a sequential manner until either statistical significance or a maximum sample size is reached.”
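The Accumulator's strategy can be explored with a small simulation. The sketch below is not the authors' simulation: it assumes a two-sided one-sample z-test with known variance on standard-normal data (so the null hypothesis is true), and the function name and the parameters `n_min`, `n_max`, and `step` are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist


def optional_stopping_rate(n_min=10, n_max=100, step=10,
                           alpha=0.05, n_sims=2000, seed=1):
    """Estimate how often optional stopping yields p < alpha under a true null.

    Assumed design (not from the blog post): test after n_min observations,
    then after every further `step` observations, stopping at n_max.
    """
    if not 1 <= n_min <= n_max or step < 1:
        raise ValueError("need 1 <= n_min <= n_max and step >= 1")
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    hits = 0
    for _ in range(n_sims):
        total, n, next_look = 0.0, 0, n_min
        while n < n_max:
            # Collect data up to the next planned look.
            while n < next_look:
                total += rng.gauss(0.0, 1.0)  # null is true: mean 0, sd 1
                n += 1
            z = total / n ** 0.5  # equals sample mean * sqrt(n) when sd = 1
            if abs(z) > z_crit:
                hits += 1  # "significant": the Accumulator stops here
                break
            next_look = min(next_look + step, n_max)
    return hits / n_sims


if __name__ == "__main__":
    single_look = optional_stopping_rate(n_min=100, n_max=100)
    ten_looks = optional_stopping_rate(n_min=10, n_max=100, step=10)
    ten_independent_tests = 1 - (1 - 0.05) ** 10  # about 0.40
    print(single_look, ten_looks, round(ten_independent_tests, 3))
```

With a single look the estimate should sit near the nominal 5%. With ten looks it should rise clearly above that, yet stay below the figure for ten independent tests, because successive looks reuse the same accumulating data and are therefore correlated.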
“… To conclude, we have shown how two p-hacking techniques work and why their application is bad for science. We found out that p-hacking techniques based on multiple testing typically end up with higher rates of false positive results than p-hacking techniques based on optional stopping, if we assume the same number of hypothesis tests.”