The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant

This short paper caught my eye recently while I was scouring the internet for something interesting to (attempt to) explain clearly in my first blog post. When I first read the title I was a bit shocked, and thoughts rushed through my mind such as “All of those modules where I learned about p-values and statistical significance never mentioned this fairly crucial fact!”. After a few breaths I began to read it and, of course, realised the paper is not discounting this widely used method of assessing a variable’s importance in a model; it is simply drawing attention to a common error often made when using significance for comparisons.

Introduction

This common statistical error arises when comparisons are summarised by declarations of statistical significance and results are sharply divided into “significant” and “not significant”. The problem is that changes in statistical significance are not themselves statistically significant: the significance level of an estimate can change dramatically with only a small (non-significant) change in the underlying quantity, such as a mean or regression coefficient.

Quick Example

As a simple example, say we have run two independent studies in different areas to estimate the number of days/nights people spent inside in the last month compared with the same month in 2019, i.e. looking at the effect of lockdown/Covid-19 on the number of days/nights a person spends inside. Suppose we obtained effect estimates of 27 in study 1 and 12 in study 2, with respective standard errors of 12.5 and 12. The first study would be statistically significant (at the usual 5% level) while the second would not. A tempting but naive conclusion is to declare that there is a large difference between the two studies. Unfortunately, the difference between them is certainly not statistically significant: the estimated difference is 15 with a standard error of \sqrt{12.5^2 + 12^2} \approx 17.3 .
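For anyone who wants to check the arithmetic, here is a minimal Python sketch (not from the paper) that computes the z-score of each study and of their difference, assuming independent, roughly normal estimates and the usual two-sided 5% threshold of 1.96:

```python
import math

# Illustrative numbers from the example above; the "significance" cut-off of
# 1.96 assumes roughly normal estimates and a two-sided 5% level.
est1, se1 = 27.0, 12.5   # study 1: effect estimate and standard error
est2, se2 = 12.0, 12.0   # study 2

z1 = est1 / se1                        # ≈ 2.16 > 1.96, "significant"
z2 = est2 / se2                        # = 1.00, "not significant"

# The comparison that actually matters: the difference between the studies.
diff = est1 - est2                     # 15
se_diff = math.sqrt(se1**2 + se2**2)   # ≈ 17.3 (the studies are independent)
z_diff = diff / se_diff                # ≈ 0.87, nowhere near significant

print(f"study 1: z = {z1:.2f}   study 2: z = {z2:.2f}")
print(f"difference: {diff:.1f} ± {se_diff:.1f}  (z = {z_diff:.2f})")
```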

The paper also explains why it can be problematic to compare estimates based on different amounts of information. Say a third, independent study was conducted with a far larger sample size and obtained an effect estimate of 2.7 with a standard error of 1.2. This study reaches roughly the same significance level as study 1, yet the two estimates are far apart: the estimated difference is 24.3 with a standard error of \sqrt{12.5^2 + 1.2^2} \approx 12.6 . If we focussed just on significance, we might say studies 1 and 3 replicate each other, but looking at the estimated effects, clearly this is not true.
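Continuing the same back-of-the-envelope calculation, here is a hypothetical sketch of the study 1 vs study 3 comparison, under the same assumptions as before:

```python
import math

est1, se1 = 27.0, 12.5   # study 1, as above
est3, se3 = 2.7, 1.2     # study 3: much larger sample, hence the small standard error

z1 = est1 / se1                        # ≈ 2.16
z3 = est3 / se3                        # ≈ 2.25, a similar significance level to study 1

diff = est1 - est3                     # 24.3
se_diff = math.sqrt(se1**2 + se3**2)   # ≈ 12.6

print(f"study 1: z = {z1:.2f}   study 3: z = {z3:.2f}")
print(f"difference in estimates: {diff:.1f} ± {se_diff:.1f}")
# Near-identical z-scores, yet the estimates themselves are an order of
# magnitude apart: matching significance levels do not imply replication.
```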

This is dangerous because “significance” often drives decision making: one might act on the first study and disregard the second, when in fact the two do not differ significantly from one another. As the paper explains, one way of interpreting a lack of statistical significance is that further information might change the conclusion/decision.

Conclusion

In essence, the paper urges caution when interpreting significance. Comparing levels of statistical significance directly is not a good idea: one should look at the significance of the difference, not the difference in significance.

I hope you found this post interesting. If you’d like to read the full paper, see the link below and feel free to leave a comment (even if just to say you never want to hear the word “significance” again!).

Gelman, A., and Stern, H. (2006). The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant. The American Statistician, 60(4), 328–331.