Biostats for the Non-Biostatistician
Why Do We Compute p-values?
SO—I know you think I know EVERYTHING and that I’m the SMARTEST person you have EVER known…BUT…I’m here to break some sad news: I do not know everything and my consultants are really the ones that know everything and are the smartest people you will ever know (except for yourself of course)!
We all have our niche, right? For those who have worked with McCormick LifeScience Consultants, LLC, you know we’re a team of professionals with a wide range of expertise in the Pharma/Biotech/Medical Device Industry. In some of my newsletters, I will be including some of my team’s insight and expertise across various topics. Although I’m not an expert biostatistician, I can tell you that without biostatistical analysis, our nonclinical and clinical data mean diddly squat! Hence why I hire people like Allen Fleishman, Ph.D., PStat® to work with my clients.
So please…without further ado, let me introduce to you, part of the McCormick LifeScience Team, Allen I. Fleishman, Ph.D., PStat®, a professional biostatistician for the pharma/biologic/med device industry (having only [haha] 30 years’ experience). Allen has worked in large and small companies and CROs, and is currently a private statistical consultant.
Allen: please save my friends and colleagues from my biostats lesson and take it from here! 😉
(The following is an abridged version of Allen’s second blog post, found here: http://allenfleishmanbiostatistics.com/Articles/; the first, third and fourth posts are also alluded to.)
Believe it or not, most statisticians feel that the p-value, which tests the null hypothesis (Ho), is a near-meaningless concept. This is based on:
- The likelihood that a difference between two different treatments will be exactly zero is zero. Theoretically, the Ho cannot be true.
- Scientists do everything in their power so the difference will never be zero. Scientifically, the Ho should not be true.
- With any non-zero true difference, a large enough study will reject the Ho. Practically, the Ho will not be true.
- We can never accept the Ho, we can only reject it. Philosophically, the Ho is never true.
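The third point above (a large enough study will reject the Ho) can be illustrated with a minimal sketch. It assumes a two-arm study with equal group sizes, a known standard deviation, and a normal (z) approximation in place of the t-test. The function name and the true effect of 0.02 standard deviations are hypothetical choices for illustration only.

```python
from math import erfc, sqrt

def expected_two_sided_p(effect_size, n_per_group):
    """Two-sided p-value when the observed z statistic equals its expected value.

    Assumes two equal arms and a normal approximation:
    expected z = effect_size * sqrt(n_per_group / 2).
    """
    z = effect_size * sqrt(n_per_group / 2)
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

# A tiny, fixed true difference (0.02 standard deviations, hypothetical).
for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n:>9,}: expected p = {expected_two_sided_p(0.02, n):.3g}")
```

The true difference never changes in this sketch; only the sample size does, and that alone is enough to move the expected p-value from clearly non-significant to far below 0.05.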
So, why do we compute the p-value for such an unbelievable theory (Ho)? Three reasons: tradition, a primarily false belief that p-values indicate the importance of an effect, and the fact that a significant p-value tells us ‘something’ happened.
Journals, our colleagues, and regulatory agencies require that we present p-values. For example, for the FDA, any statistically demonstrated difference, no matter how small, can support approval.
One of the biggest blunders made by non-statisticians is believing that if p < 0.05, then the results are significant in the everyday sense, that is, meaningful. Some make the even worse error of concluding that if p > 0.05, the treatment wasn’t useful.
Statistical significance has little to do with clinical significance. An effect that did not achieve statistical significance needs more information (typically more data) before clinical significance can be demonstrated, even though the magnitude of the non-significant (ns) effect might turn out to be quite clinically meaningful.
The reason statisticians still feel justified in providing their clients with p-values is that if the p-value is < 0.05, they can be reasonably confident that the observed direction of the difference (ideally, in favor of the treatment) is real. Is it better by a millimeter or a mile? The p-value alone cannot answer that question.
p-values only indirectly measure clinical importance. Let us assume that we are doing the analysis in exactly the same way for a variety of dependent variables, and that the N’s are identical for all of them. Then the parameters that are statistically significant have larger (relative) mean differences than the non-significant ones. If one t-test were significant and another wasn’t, it would mean that (Mean1 – Mean2)/s is larger for the first. That ratio is the ‘effect size’: how many standard deviations apart the two treatments are. To rephrase, within a study, if one dependent variable has a larger (e.g., statistically significant) t statistic than another, then its effect size is larger. It is the effect size that indicates clinical significance, not the p-value.
If one study of 100,000 patients had a p-value of 0.003 and a second study of 10 patients had a p-value of 0.04, then the ten-patient study indicated a larger effect size. If I were investing in one of the two companies, I’d invest in the company that had the N=10, p=0.04 result, not the N=100,000, p=0.003 result.
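To see roughly how far apart those two studies are, here is a minimal sketch that backs out the effect size implied by a p-value and a sample size. It assumes each study is a two-arm comparison with N split evenly between the arms (the article does not specify the design), and it uses a normal approximation rather than the t distribution, which understates the effect size for the ten-patient study. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def implied_effect_size(p_value, total_n):
    """Effect size (Mean1 - Mean2)/s implied by a two-sided p-value.

    Assumes two equal arms of total_n / 2 patients and a normal
    approximation: z = d * sqrt(n_per_group / 2), solved for d.
    """
    n_per_group = total_n / 2
    z = NormalDist().inv_cdf(1 - p_value / 2)  # z statistic matching the p-value
    return z * sqrt(2 / n_per_group)

big_study = implied_effect_size(0.003, 100_000)
small_study = implied_effect_size(0.04, 10)
print(f"N=100,000, p=0.003: implied effect size = {big_study:.3f} SD")
print(f"N=10,      p=0.04 : implied effect size = {small_study:.2f} SD")
```

Under these assumptions the large study's effect is on the order of 0.02 standard deviations, while the small study's is above one full standard deviation, even though the large study has the more impressive p-value.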
- If you’re serious about looking at the effects of your studies, the best thing to look at is the above ‘effect size’.
- As an alternative to p-values, I strongly suggest you present and focus on confidence intervals. The lower end answers the question ‘Can the effect be zero?’ This gives the same conclusion as a significant or non-significant p-value, except that the interval also tells you by how much.
- If people intuitively understand the dependent variable, the upper end of the confidence interval answers the question ‘How clinically important might the effect be?’
- If people don’t intuitively understand the dependent variable, then one can get confidence intervals for the unit-free ‘effect size’ or related measures of effect magnitude.
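The confidence intervals recommended above can be sketched as follows, both for the raw mean difference and for the unit-free effect size. This is only an illustration under stated assumptions: the data are made up, the function name is hypothetical, the intervals use a normal critical value (a t critical value would give slightly wider intervals at small N), and the standard error of the effect size uses a common large-sample approximation.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def mean_difference_ci(treated, control, confidence=0.95):
    """Approximate CIs for the mean difference and the effect size d.

    Returns (lower, estimate, upper) tuples. Uses the pooled standard
    deviation and a normal critical value (an approximation).
    """
    n1, n2 = len(treated), len(control)
    diff = mean(treated) - mean(control)
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    d = diff / sqrt(pooled_var)  # (Mean1 - Mean2) / s
    # Large-sample approximation to the standard error of d.
    se_d = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))

    return {
        "difference": (diff - z * se_diff, diff, diff + z * se_diff),
        "effect_size": (d - z * se_d, d, d + z * se_d),
    }

# Made-up measurements, for illustration only.
treated = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7, 13.4, 15.0]
control = [11.0, 12.5, 11.8, 13.1, 12.2, 11.5, 12.9, 12.0]

for name, (lower, estimate, upper) in mean_difference_ci(treated, control).items():
    print(f"{name}: {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

If the lower end of the interval is above zero, the result agrees with a significant p-value; the upper end shows how large the effect could plausibly be.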