Disastrous false positives.

Where's your significance now?

Edward G. Robinson in The Ten Commandments

I do not have a statistical mind. I took, and dropped, statistics three times in college, once a year for three years. Once they got past the bell-shaped curve, my brain froze. My undergraduate degree was in physics, so it wasn't the math; it was the concepts. So I am not the best writer on the topic.

Others do it better, although the problem is that I cannot retain the information past reading the article, try as I might. However, might I suggest the excellent On the hazards of significance testing over at DC's Improbable Science. DC is David Colquhoun, not the comics; he is a Professor of Pharmacology and an excellent source of information and education.


I am going to quote him at length, but I advise you to wander over to DC's Improbable Science and read the original.

Over at Science Based Medicine I wrote 5 out of 4 Americans Do Not Understand Statistics about issues with the p value and what constitutes significance. There are ongoing arguments that the threshold for a significant p value should be not 0.05 but 0.005. One of the papers I discussed listed 12 misconceptions about the p value, one of which is "If P = .05, the null hypothesis has only a 5% chance of being true."

Dr. Colquhoun continues, in part, in this vein:

It’s very common for people to claim that an effect is real, not just chance, whenever the test produces a P value of less than 0.05, and when asked, it’s common for people to think that this procedure gives them a chance of 1 in 20 of making a fool of themselves. Leaving aside that this seems rather too often to make a fool of yourself, this interpretation is simply wrong.

The purpose of this post is to justify the following proposition.

If you observe a P value close to 0.05, your false discovery rate will not be 5%. It will be at least 30% and it could easily be 80% for small studies.

Which suggests that most, if not all, positive clinical studies of pseudo-medicines are false.

The argument just presented should be quite enough to convince you that significance testing, as commonly practised, will lead to disastrous numbers of false positives.

I think that defines the literature of acupuncture etc quite nicely. Disastrous false positives.
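The arithmetic behind that claim can be sketched in a few lines. The specific numbers below are illustrative assumptions of mine, not from the post: suppose 10% of the hypotheses being tested are real effects, and the studies have 80% power at a 0.05 significance cutoff.

```python
# Sketch of the false-discovery-rate arithmetic behind the "at least 30%" claim.
# The inputs are assumptions for illustration: 10% of tested hypotheses are
# real effects, tests run at alpha = 0.05 with power = 0.8.

def false_discovery_rate(prior_real, alpha, power):
    """Fraction of 'significant' results that are actually false positives."""
    false_pos = alpha * (1 - prior_real)   # true nulls wrongly declared significant
    true_pos = power * prior_real          # real effects correctly detected
    return false_pos / (false_pos + true_pos)

fdr = false_discovery_rate(prior_real=0.10, alpha=0.05, power=0.8)
print(f"False discovery rate: {fdr:.0%}")  # prints 36%, not the 5% people expect
```

With those assumed inputs, more than a third of "significant" results are false positives; the rarer real effects are in the field being studied (and for pseudo-medicines the prior is close to zero), the worse the false discovery rate gets.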

He also suggests

If you want to avoid making a fool of yourself most of the time, don’t regard anything bigger than P < 0.001 as a demonstration that you’ve discovered something. Or, slightly less stringently, use a three-sigma rule.

which is

The three sigma rule means using P= 0.0027 as your cut off. This, according to Berger’s rule, implies a false discovery rate of (at least) 4.5%, not far from the value that many people mistakenly think is achieved by using P = 0.05 as a criterion.
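The "Berger's rule" referred to here appears to be the Sellke-Berger-Berry bound, which says the Bayes factor in favor of the null can be no smaller than -e * p * ln(p). A minimal sketch, assuming even prior odds on the null (an assumption of mine, not stated in the quote):

```python
import math

def min_false_discovery_rate(p, prior_odds_null=1.0):
    """Lower bound on the false discovery rate for an observed p value,
    via the Sellke-Berger-Berry bound -e * p * ln(p) on the Bayes factor.
    Assumes even prior odds by default; valid for p < 1/e."""
    bayes_factor = -math.e * p * math.log(p)   # best case for the alternative
    posterior_odds = bayes_factor * prior_odds_null
    return posterior_odds / (1 + posterior_odds)

print(f"p = 0.05   -> FDR at least {min_false_discovery_rate(0.05):.1%}")    # ~29%
print(f"p = 0.0027 -> FDR at least {min_false_discovery_rate(0.0027):.1%}")  # ~4%
```

Under those assumptions the bound lands near 29% for p = 0.05 and in the 4-5% range for p = 0.0027, consistent with the figures quoted above.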

He points out that this is in a perfect world with perfect experiments.

All of the problems discussed so far concern the near-ideal case. They assume that your sample size is big enough (power about 0.8 say) and that all of the assumptions made in the test are true, that there is no bias or cheating and that no negative results are suppressed. The real-life problems can only be worse.


So for a result to be significant, i.e. for it to be a discovery of a real effect rather than the authors making fools of themselves, a p of 0.0027 is the bare minimum, not 0.05.

It is no wonder that much of the positive medical literature is not true, and probably all of the pseudo-medical literature. The "significant" statistics are not significant, and suggest that for pseudo-medicines there is no there there.

But I do not do Dr. Colquhoun justice. Read the original and his blog. I'll read it again and again, and one day it may stick.
