To make sense of variability in their data, biologists use statistical tests to determine whether experimental groups differ significantly from one another. Judging statistical significance is a cornerstone of biological research, and yet biologists at all career stages often make fundamental errors in the way they perform statistical comparisons. One of the most common mistakes is to conclude that two effects differ from one another because one group is significantly different from a control while the other group is not. Sander Nieuwenhuis, Birte Forstmann and Eric-Jan Wagenmakers at Leiden University recently set out to quantify how often this mistake is made in neuroscience articles published in the world's leading scientific journals, and reported their findings in a recent issue of Nature Neuroscience.

Nieuwenhuis and colleagues first explain the mistake itself in detail. It tends to happen when neuroscientists want to claim that one effect is bigger or smaller than another effect relative to control data. To do so, they simply report that one effect is statistically significantly different from controls (i.e. an effect that large would be expected to arise by chance less than 5% of the time, P<0.05), while the other is not (P>0.05). On the surface this sounds reasonable, but it is flawed because it says nothing about how different the two effects are from one another. To make that comparison, researchers need to test the difference between the two effects directly, for example by testing for a significant interaction. Nieuwenhuis and his co-workers sum up the solution concisely: ‘...researchers need to report the statistical significance of their difference rather than the difference between their significance levels.’
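The distinction is easy to demonstrate with simulated data. The short Python sketch below is not taken from the paper; the group sizes and effect sizes are invented for illustration. It sets up a case where group A may well differ significantly from control while group B does not, yet the comparison that actually matters is the direct test of A against B.

```python
# A minimal sketch (hypothetical data, not the authors' analysis) of the
# 'difference of significance' fallacy: two groups compared with a control,
# followed by the direct test of the difference between the groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, 30)   # control group, mean 0
group_a = rng.normal(0.6, 1.0, 30)   # assumed larger effect
group_b = rng.normal(0.4, 1.0, 30)   # assumed smaller effect

# Each group versus control: with these settings, A is likely to reach
# P < 0.05 while B may not, but that pattern alone proves nothing about
# whether A and B differ from each other.
_, p_a = stats.ttest_ind(group_a, control)
_, p_b = stats.ttest_ind(group_b, control)

# The question the fallacy skips: do A and B themselves differ?
_, p_ab = stats.ttest_ind(group_a, group_b)

print(f"A vs control: P = {p_a:.3f}")
print(f"B vs control: P = {p_b:.3f}")
print(f"A vs B (the test that matters): P = {p_ab:.3f}")
```

Whatever the individual P values turn out to be, only the final test (or, in a factorial design, the interaction term) licenses the claim that one effect is bigger than the other.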

The team suspected that this type of error was widespread in the neuroscience community. To test this idea, they went hunting for ‘difference of significance’ errors in neuroscience articles published in five highly prestigious journals (Nature, Science, Nature Neuroscience, Neuron and Journal of Neuroscience), evaluating the statistical tests used in over 500 neuroscience papers in total. They found that 31% of behavioural, systems and cognitive studies contained situations where the authors could potentially make the error, and in half of these cases the authors did indeed compare significance levels rather than testing the significance of the difference. The team then went on to look at cellular and molecular neuroscience articles published in Nature Neuroscience in 2009 and 2010. Incredibly, of the 120 articles sampled, not a single publication used the correct procedure to compare effect sizes, and at least 25 papers erroneously compared significance levels either implicitly or explicitly.

The work of Nieuwenhuis, Forstmann and Wagenmakers is a sobering self-evaluation. It shows clearly that a large number of neuroscientists at the very highest levels make basic errors in the way they statistically analyse data. To be fair, the group points out that many of the mistakes they found probably don't invalidate the main conclusions of the publications they examined. But this should not be seen as a reason for neuroscientists to be lax about ‘stats’. Clearly, the community has a responsibility to make sure that all the conclusions they put in print are backed up by valid statistics. After all, small mistakes eventually add up.

Nieuwenhuis, S., Forstmann, B. U. and Wagenmakers, E. J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nat. Neurosci. 14, 1105-1107.