More On Statistics

Yes, I am aware that, when applied to me, the first two words in the title of this entry should be pronounced as one word. That does not keep me from trying, and there was a nice paper entitled The faulty statistics of complementary alternative medicine (CAM).

The paper discusses how to interpret a p-value. As the authors note,

One common error is interpreting the p-value as the probability of H0 (the null hypothesis) given the data i.e. Pr(H0|data). For example, in a clinical trial comparing a treated group with a control group, a p-value of 0.01 is not a 0.01 probability of H0 being true but, as mentioned, the probability of obtaining the same result repeating the experiment given that H0 is true. If we infer this we commit a logical fallacy called “fallacy of the transposed conditional.”

So a p-value is a measure not of the probability that an effect is real, but of the probability of obtaining the same result under the exact same experimental conditions, given that the null hypothesis is true.
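The transposed conditional matters in practice. A toy calculation makes the point; the specific numbers here (a 90% prior on the null for implausible interventions, 80% power) are my own illustrative assumptions, not figures from the paper:

```python
# Toy calculation: what fraction of "significant" results are false positives?
# Assumed (hypothetical) numbers: 90% of trials of implausible interventions
# test a true null; alpha = 0.05; power = 0.8 when a real effect exists.
prior_h0 = 0.9
alpha, power = 0.05, 0.8

false_pos = prior_h0 * alpha        # null is true, yet p < alpha
true_pos = (1 - prior_h0) * power   # real effect correctly detected
prob_h0_given_sig = false_pos / (false_pos + true_pos)
print(f"P(H0 | p < 0.05) = {prob_h0_given_sig:.2f}")  # → 0.36
```

Even at the conventional 0.05 threshold, under these assumptions more than a third of "significant" findings would still be null, which is exactly what Pr(H0|data) captures and the p-value does not.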

They point out the value of a Bayesian approach, especially as it concerns the interpretation of diagnostic tests. The problem is that in clinical trials we may not be able to quantify prior plausibility, except for CAM:

Possibly, the most important criterion of credibility is conformity to science. In this respect CAM interventions, as those proposed by homeopathy, acupuncture, iridology, Bach Flower Remedies, Ayurveda, anthroposophy, etc., have a very low prior probability of specific efficacy because their asserted modes of action imply a violation of basic science, anatomy and physiology (besides the rules of common sense).

Another important requisite of validity is “falsifiability”, according to which science differs from pseudoscience in that it aims at the production of falsifiable hypotheses. For example, acupuncture is not falsifiable since there is currently no satisfactory simulated procedure to serve as a control for blinded clinical studies.

There are conversions between p-values and Bayesian posterior probabilities, and they point out, as others have, that a p of 0.01 is not likely to be meaningful, especially for interventions with little prior plausibility. It is another argument that 0.001 is a better standard for 'significant' in biomedical studies, i.e. for concluding that the effect measured is real.

As a rule of thumb, assuming a “neutral” attitude towards the null hypothesis (odds = 1:1), a p-value of 0.01 or, better, 0.001 should suffice to give a satisfactory posterior probability of 0.035 and 0.005 respectively.

and they conclude

Since the achievement of meaningful statistical significance as a rule is an essential step in the validation of medical interventions, unless some authentic scientific support to complementary alternative medicines is in the meantime provided, we have to conclude that these practices cannot be considered as evidence-based.
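The rule of thumb quoted earlier (p-values of 0.01 and 0.001 yielding posterior probabilities of about 0.035 and 0.005 at 1:1 prior odds) appears consistent with the minimum Bayes factor exp(-z²/2); that conversion choice is my inference, since the paper's quoted passage does not spell out the formula. A sketch:

```python
from math import exp
from statistics import NormalDist

def posterior_prob_h0(p, prior_odds=1.0):
    """Lower bound on the posterior probability of H0 given a two-sided
    p-value, using the minimum Bayes factor exp(-z^2/2) (an assumed,
    commonly used p-to-Bayes conversion)."""
    z = NormalDist().inv_cdf(1 - p / 2)   # two-sided z-score for p
    min_bf = exp(-z * z / 2)              # minimum Bayes factor for H0
    post_odds = prior_odds * min_bf       # posterior odds of H0
    return post_odds / (1 + post_odds)    # odds -> probability

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: posterior P(H0) >= {posterior_prob_h0(p):.3f}")
```

With 1:1 prior odds this reproduces roughly the 0.035 figure for p = 0.01 and a value near 0.005 for p = 0.001, while p = 0.05 leaves the null with better than a 1-in-8 chance of being true.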

Even outside of pseudo-medicines I am skeptical that a p of 0.01 is meaningful for an intervention, especially if there is overlap in the confidence intervals.

Clinical trials are remarkably complex, and there are numerous opportunities to generate statistically significant false-positive results, even for outcomes every bit as ludicrous as homeopathy or reiki:

In Study 2, we sought to conceptually replicate and extend Study 1. Having demonstrated that listening to a children’s song makes people feel older, Study 2 investigated whether listening to a song about older age makes people actually younger. An ANCOVA revealed the predicted effect: According to their birth dates, people were nearly a year-and-a-half younger after listening to “When I’m Sixty-Four” (adjusted M = 20.1 years) rather than to “Kalimba” (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040.

The p is indeed < 0.05. Statistically significant, practically ridiculous.
