When can you use "trend toward significance" in a research paper?


ChessMaster3000

I am in the process of writing a manuscript and have calculated a statistic with a p value of 0.064. I have seen the phrase "trending toward statistical significance" before in various situations, and am wondering when/where this is an appropriate phrase. The comparison, with the p value noted above, while not meeting the 0.05 threshold we commonly associate with statistical significance, is pretty close, and I would love to highlight that somehow. Does anyone know when I can use the phrase "trending toward significance"?

 
I think the concept is fairly bogus and used by non-statistically trained people. It suggests cherry-picking of the data. Of course, p<0.05 is an arbitrary concept too.

You are free to talk about a trend.

You may also want to re-think what statistical methods you are using. I would recommend talking to a real statistician. Some universities have drop-in statistical consulting labs that are free for students. You may also want to cough up the money for an hour of a statistical consult.
 
I am in the process of writing a manuscript and have calculated a statistic with a p value of 0.064. I have seen the phrase "trending toward statistical significance" before in various situations, and am wondering when/where this is an appropriate phrase. The comparison, with the p value noted above, while not meeting the 0.05 threshold we commonly associate with statistical significance, is pretty close, and I would love to highlight that somehow. Does anyone know when I can use the phrase "trending toward significance"?

When there is a trend toward significance. 0.05 is arbitrary. The estimate of the number of patients needed in a trial is not always accurate, so a p of 0.064 might have been below 0.05 if more subjects were included.
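For illustration only (made-up effect size and SD, not the OP's numbers), here's a quick R sketch of that point: the same observed difference between groups slides from "non-significant" to "significant" purely as the number of subjects grows.

d <- 0.4    # assumed observed difference in means (hypothetical)
s <- 1.2    # assumed pooled standard deviation (hypothetical)
for (n in c(50, 65, 80)) {                 # subjects per group
  t_stat <- d / (s * sqrt(2 / n))          # two-sample t statistic, equal n and equal sd
  p      <- 2 * pt(-abs(t_stat), df = 2 * n - 2)
  cat("n per group =", n, " p =", round(p, 3), "\n")
}
# prints roughly p = 0.099, 0.060, 0.037 as n goes from 50 to 65 to 80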
 
Yeah, it's hard to say without knowing more. It's more than just a p=0.064. If you have an n=3497, it may just not be significant by normal standards. On the other hand, if you have n=47, your study may have been underpowered and that's why you didn't detect a difference. I feel like those are the cases where I've said something about being underpowered yet appearing to trend toward significance.

All that said, my biggest pet peeve when reviewing papers is people who don't know the difference between statistical significance and actual significance. I don't know what kind of statistical test you were running, but do make sure that your result itself is meaningful aside from the p-value.
 
Pretend you began with a p = .10

Huehuehue
 
I am in the process of writing a manuscript and have calculated a statistic with a p value of 0.064. I have seen the phrase "trending toward statistical significance" before in various situations , and am wondering when/where this is an appropriate phrase. The comparison with a p value as noted above, while not meeting the threshold of 0.05 that we commonly associated with statistical significance, is pretty close and I would love to highlight that somehow. Does anyone know when I can use the phrase "trending toward significance"?
Present as-is. If the p value is 0.064, it's 0.064. It doesn't invalidate your findings, just indicates you may need more data, and that's fine.
 
Just don't be surprised by questions from reviewers about a power analysis or sample size calculation.

Power calculations are only for designing or proposing research. It makes no sense to calculate power after the fact. Once you have run your analysis, you already know whether or not you saw a positive result.
 
Yeah, it's hard to say without knowing more. It's more than just a p=0.064. If you have an n=3497, it may just not be significant by normal standards. On the other hand, if you have n=47, your study may have been underpowered and that's why you didn't detect a difference. I feel like those are the cases where I've said something about being underpowered yet appearing to trend toward significance.

All that said, my biggest pet peeve when reviewing papers is people who don't know the difference between statistical significance and actual significance. I don't know what kind of statistical test you were running, but do make sure that your result itself is meaningful aside from the p-value.
Are you referring to clinical significance when you say "actual significance" ...or something else? Thanks.
 
I am in the process of writing a manuscript and have calculated a statistic with a p value of 0.064. I have seen the phrase "trending toward statistical significance" before in various situations, and am wondering when/where this is an appropriate phrase. The comparison, with the p value noted above, while not meeting the 0.05 threshold we commonly associate with statistical significance, is pretty close, and I would love to highlight that somehow. Does anyone know when I can use the phrase "trending toward significance"?

Is this a study on human subjects, rodents, in vitro cultures from immortalized lines, in vitro from primary human cells, primary mouse cells? I think a higher threshold for variation between groups is accepted with human work, and that is the only context in which I have personally seen mentions of trends like the one you're proposing (but I am not a journal editor). I had this issue when working with wild mice.
 
Are you referring to clinical significance when you say "actual significance" ...or something else? Thanks.

It's more like this: let's say you do a correlation and find r=.09 with a p=.03.
Now, that is statistically significant, but that is a tiny r value. This is where stats becomes more of an art, as I would want to see the raw scatter plot to see what the shape of the data actually is. I would also want to see what sort of data we are even talking about. Maybe there are some fields where a tiny but statistically significant correlation means something, but that again becomes a judgement call. My concern is that some people only see the p value without thinking about what their actual result means.
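To make that concrete (simulated numbers, not from any real dataset), a tiny correlation can come out "significant" just because n is large; a minimal R sketch:

set.seed(42)
n <- 2000
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)   # true correlation is only about 0.1
cor.test(x, y)            # r comes out near .1 with p well below .05
plot(x, y)                # yet the scatter plot is basically a cloud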
 
It's more like this: let's say you do a correlation and find r=.09 with a p=.03.
Now, that is statistically significant, but that is a tiny r value. This is where stats becomes more of an art, as I would want to see the raw scatter plot to see what the shape of the data actually is. I would also want to see what sort of data we are even talking about. Maybe there are some fields where a tiny but statistically significant correlation means something, but that again becomes a judgement call. My concern is that some people only see the p value without thinking about what their actual result means.
Ok, so do you expect the r value to be reported with every p value in a submitted manuscript?
 
It's not statistically significant. Trending towards significance is a bogus term, and you know it, or else you wouldn't have made this thread.

I agree that it is bogus, but only slightly. If a p value is 0.064, then that means there is a 6.4%, rather than a 5%, chance that the result was obtained by random chance. It would be bogus if someone reviewing a paper didn't have an open mind about that. This is a simple retrospective study, not some phase III trial with a life-threatening intervention.
 
Is this a study on human subjects, rodents, in vitro cultures from immortalized lines, in vitro from primary human cells, primary mouse cells? I think a higher threshold for variation between groups is accepted with human work, and that is the only context in which I have personally seen mentions of trends like the one you're proposing (but I am not a journal editor). I had this issue when working with wild mice.
Splenomegastar (great name by the way), this is a retrospective human subject study. Sorry, I should have made that clear because you're right, it's different in different models.
 
On the other hand, if you have n=47, your study may have been underpowered and that's why you didn't detect a difference. I feel like those are the cases where I've said something about being underpowered yet appearing to trend toward significance.

Yeah this is the kind of thing where I have seen it used, especially in some educational studies.

Power calculations are only for designing or proposing research. It makes no sense to calculate power after the fact. Once you have run your analysis, you already know whether or not you saw a positive result.

Doesn't mean a reviewer won't ask about it.
 
Ok, so do you expect the r value to be reported with every p value in a submitted manuscript?
Uh, if you're doing a correlation or a regression then the r value is your actual result. The p only tells you if that result is statistically significant.

Someone else mentioned power, and that is one of the most misunderstood concepts in stats. It matters after the fact only if your result is not significant, because you want to know if this is a result of being underpowered (and therefore you may have a type II error). If you have a p<.05, then power analysis is fairly meaningless because you are rejecting the null and can't possibly have a type II error by definition.

As a reviewer I've never asked to see a power analysis for significant results. What I do ask for are confidence intervals. This is where small yet significant papers will show their weakness. It's not usually a reason to reject the manuscript, but they need to be there and be addressed accordingly.
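To put rough numbers on the power and confidence-interval points (all values assumed for illustration, not from the OP's study), a quick R sketch:

# design-stage side: subjects per group needed to detect an assumed 0.5 SD
# difference with 80% power at alpha = .05 (comes out to roughly 64 per group)
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)

# and the confidence interval a reviewer would want reported next to the p value
set.seed(7)
g1 <- rnorm(60, mean = 0.0, sd = 1)   # hypothetical group 1
g2 <- rnorm(60, mean = 0.3, sd = 1)   # hypothetical group 2 with a small true shift
t.test(g1, g2)$conf.int               # an interval hugging zero says more than p alone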
 
One more thought I had for the OP:

If you've done a study and your ONLY result is not statistically significant, then that may be a sign you need to increase your sample size before submitting. If the rest of the study results are significant (or their lack of significance is still notable and worth publishing) and this is part of a subgroup analysis or something, then you're probably fine. You see this all the time in major clinical trials where they do all sorts of post-hoc subgroup analyses, and oftentimes these aren't quite significant because they are underpowered after paring down to a smaller sample size, but they are still worth noting.
 
A lot of people who say "trending toward statistical significance" have a fundamental misunderstanding of statistics.

I agree that it is bogus, but only slightly. If a p value is 0.064, then that means there is a 6.4%, rather than a 5%, chance that the result was obtained by random chance. It would be bogus if someone reviewing a paper didn't have an open mind about that. This is a simple retrospective study, not some phase III trial with a life-threatening intervention.

That is absolutely not what a p-value is. This is a very common misconception. The p-value tells you the likelihood of getting the results you did assuming that the null hypothesis is true. It has nothing to do with random chance.

Wikipedia probably explains it in the simplest terms:

Calculating the p-value is based on the assumption that every finding is a fluke, that is, the product of chance alone. Thus, the probability that the result is due to chance is in fact unity.

Edit: As a side-note, I really think that med schools need to put more emphasis on statistics and study design. I would actually make the argument that it's one of the most important subjects to learn in med school. I'm hopeful that residency, at least, spends more time emphasizing this. You simply cannot accurately critique research without a good grasp of this material.
 
As a side-note, I really think that med schools need to put more emphasis on statistics and study design. I would actually make the argument that it's one of the most important subjects to learn in med school. I'm hopeful that residency, at least, spends more time emphasizing this. You simply cannot accurately critique research without a good grasp of this material.
QFT. This is absolutely the thing that I've found most disappointing about the medical school curriculum so far (early into M2).
 
When your p value is between .05 and .10 and you have no other interesting things to report and have a report due.

It means nothing...
 
That is absolutely not what a p-value is. This is a very common misconception. The p-value tells you the likelihood of getting the results you did assuming that the null hypothesis is true. It has nothing to do with random chance.

But p < 0.05 means "findings have a < 5% chance of being due to chance" when your null hypothesis is "there is no association between x and y", right? Because the only factor influencing findings in the absence of an association is chance, correct?

If not, then the distinction is going over my head.

As a side-note, I really think that med schools need to put more emphasis on statistics and study design. I would actually make the argument that it's one of the most important subjects to learn in med school. I'm hopeful that residency, at least, spends more time emphasizing this. You simply cannot accurately critique research without a good grasp of this material.

Absolutely. I had a single lecture on biostats in medical school; that was it.
 
Let's not forget that statistics is very much an art and there are few inviolable rules. No, p values don't exactly mean the probability of getting the results by chance, but that definition is good enough and simple enough to give people the gist. I find that when statisticians quibble about definitions, it often turns out to be a distinction without a difference.

Yes, schools absolutely need to teach stats better, but finding people who are both good at stats and capable of talking about them like a normal human being = difficult. Usually it's a PhD who gets up there and starts using words like "Gaussian, null, alpha, beta, rho, parametric, etc." and loses most of the class. I've been lucky to meet and work with biostatisticians who are good at teaching this stuff and they are worth their weight in gold.
 
QFT. This is absolutely the thing that I've found most disappointing about the medical school curriculum so far (early into M2).

Just wait until you get into the clinic. Like 90% of what you learn you will never use again.
 
Uh, if you're doing a correlation or a regression then the r value is your actual result. The p only tells you if that result is statistically significant.

Someone else mentioned power, and that is one of the most misunderstood concepts in stats. It matters after the fact only if your result is not significant, because you want to know if this is a result of being underpowered (and therefore you may have a type II error). If you have a p<.05, then power analysis is fairly meaningless because you are rejecting the null and can't possibly have a type II error by definition.

As a reviewer I've never asked to see a power analysis for significant results. What I do ask for are confidence intervals. This is where small yet significant papers will show their weakness. It's not usually a reason to reject the manuscript, but they need to be there and be addressed accordingly.

Got it, thanks. It's not a correlation as far as I understand, but your perspective was very helpful. Definitely am all over the confidence intervals!
 
A lot of people who say "trending toward statistical significance" have a fundamental misunderstanding of statistics.



That is absolutely not what a p-value is. This is a very common misconception. The p-value tells you the likelihood of getting the results you did assuming that the null hypothesis is true. It has nothing to do with random chance.

Wikipedia probably explains it in the simplest terms:



Edit: As a side-note, I really think that med schools need to put more emphasis on statistics and study design. I would actually make the argument that it's one of the most important subjects to learn in med school. I'm hopeful that residency, at least, spends more time emphasizing this. You simply cannot accurately critique research without a good grasp of this material.

Totally agree--was thinking too fast on that one. Glad I can flesh these things out on an anonymous forum rather than in real life!
 
Statistical significance calculation is the doom of modern medical research. It is on par with the Inquisition in terms of impairing scientific advancement.
 
All the studies I have projected rely on leveraged data acquisition, so no statistical significance calculation is needed. Alas, none of them have been approved.
 
The .05 cutoff widely used for p-values is totally arbitrary and a dichotomous summary of a study as "non-significant" or "significant" is a huge flaw in the classical (aka "frequentist") interpretation of studies. Fisher, who developed the p-value concept, only intended for p-values to be a continuous, pseudo-subjective indicator of strength of evidence. He would be horrified that cutoffs are now used to do hypothesis testing. Reporting your outcome statistic of interest (e.g. absolute risk reduction, hazard ratio, correlation coefficient, etc) with its confidence interval is a much better way of reporting results.

An even better approach is with Bayesian inference. The whole concept of "pre-test" and "post-test" probabilities for diagnostic testing is just Bayesian statistics with a different name. If someone with zero risk factors for HIV (low pre-test probability) tests positive on screening, the probability that they actually have HIV is still extremely low (i.e. low positive predictive value). Bayesian statistics is all about taking a "prior" probability and updating it with new data to calculate a "posterior" probability.

In terms of clinical trials, by the time we have a phase III trial, there are numerous previous phase I and II trials and usually some other phase III trials (it doesn't have to be a trial...really any related study). These provide ample data to form a prior probability of a treatment's effect. If past studies show an effect, then it is nonsensical to adopt a null hypothesis that there is no difference between it and placebo. It makes far more sense to adopt a prior probability of effect based on previous data, and update that probability with the results of YOUR study. In Bayesian statistics, a significant/non-significant dichotomy doesn't exist, so it is a far more intuitive and elegant way to interpret results. An effect size that is non-significant by traditional testing could actually increase your posterior probability of an effect.

Here is a Bayesian re-analysis of the GUSTO trial of thrombolytics in acute MI that was published in JAMA: http://www.ncbi.nlm.nih.gov/pubmed/7869558
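To make the pre-test/post-test idea concrete, here is a minimal Bayes-rule sketch in R. The sensitivity, specificity, and prevalence figures are made-up round numbers for illustration, not real test characteristics.

prevalence  <- 0.0001   # assumed pre-test probability in a patient with zero risk factors
sensitivity <- 0.999    # assumed screening test sensitivity
specificity <- 0.995    # assumed screening test specificity

p_positive <- sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
post_test  <- sensitivity * prevalence / p_positive   # Bayes' rule: P(disease | positive screen)
post_test   # roughly 0.02: even after a positive screen, only about a 2% post-test probability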
 
The .05 cutoff widely used for p-values is totally arbitrary and a dichotomous summary of a study as "non-significant" or "significant" is a huge flaw in the classical (aka "frequentist") interpretation of studies. Fisher, who developed the p-value concept, only intended for p-values to be a continuous, pseudo-subjective indicator of strength of evidence. He would be horrified that cutoffs are now used to do hypothesis testing. Reporting your outcome statistic of interest (e.g. absolute risk reduction, hazard ratio, correlation coefficient, etc) with its confidence interval is a much better way of reporting results.

An even better approach is with Bayesian inference. The whole concept of "pre-test" and "post-test" probabilities for diagnostic testing is just Bayesian statistics with a different name. If someone with zero risk factors for HIV (low pre-test probability) tests positive on screening, the probability that they actually have HIV is still extremely low (i.e. low positive predictive value). Bayesian statistics is all about taking a "prior" probability and updating it with new data to calculate a "posterior" probability.

In terms of clinical trials, by the time we have a phase III trial, there are numerous previous phase I and II trials and usually some other phase III trials (it doesn't have to be a trial...really any related study). These provide ample data to form a prior probability of a treatment's effect. If past studies show an effect, then it is nonsensical to adopt a null hypothesis that there is no difference between it and placebo. It makes far more sense to adopt a prior probability of effect based on previous data, and update that probability with the results of YOUR study. In Bayesian statistics, a significant/non-significant dichotomy doesn't exist, so it is a far more intuitive and elegant way to interpret results. An effect size that is non-significant by traditional testing could actually increase your posterior probability of an effect.

Here is a Bayesian re-analysis of the GUSTO trial of thrombolytics in acute MI that was published in JAMA: http://www.ncbi.nlm.nih.gov/pubmed/7869558

But...what is the p-value then?
 
The .05 cutoff widely used for p-values is totally arbitrary and a dichotomous summary of a study as "non-significant" or "significant" is a huge flaw in the classical (aka "frequentist") interpretation of studies. Fisher, who developed the p-value concept, only intended for p-values to be a continuous, pseudo-subjective indicator of strength of evidence. He would be horrified that cutoffs are now used to do hypothesis testing. Reporting your outcome statistic of interest (e.g. absolute risk reduction, hazard ratio, correlation coefficient, etc) with its confidence interval is a much better way of reporting results.

An even better approach is with Bayesian inference. The whole concept of "pre-test" and "post-test" probabilities for diagnostic testing is just Bayesian statistics with a different name. If someone with zero risk factors for HIV (low pre-test probability) tests positive on screening, the probability that they actually have HIV is still extremely low (i.e. low positive predictive value). Bayesian statistics is all about taking a "prior" probability and updating it with new data to calculate a "posterior" probability.

In terms of clinical trials, by the time we have a phase III trial, there are numerous previous phase I and II trials and usually some other phase III trials (it doesn't have to be a trial...really any related study). These provide ample data to form a prior probability of a treatment's effect. If past studies show an effect, then it is nonsensical to adopt a null hypothesis that there is no difference between it and placebo. It makes far more sense to adopt a prior probability of effect based on previous data, and update that probability with the results of YOUR study. In Bayesian statistics, a significant/non-significant dichotomy doesn't exist, so it is a far more intuitive and elegant way to interpret results. An effect size that is non-significant by traditional testing could actually increase your posterior probability of an effect.

Here is a Bayesian re-analysis of the GUSTO trial of thrombolytics in acute MI that was published in JAMA: http://www.ncbi.nlm.nih.gov/pubmed/7869558

Preach it, brother!

People not reporting confidence intervals is a huge pet peeve of mine as they tell you a lot more than a p value.
 
But p < 0.05 means "findings have a < 5% chance of being due to chance" when your null hypothesis is "there is no association between x and y", right? Because the only factor influencing findings in the absence of an association is chance, correct?

No.

For example, if p = 0.03, the probability of a type I error, assuming the null hypothesis is true, is 3%. The p-value does not tell you if a result was due to chance. At least, that's how I've learned it (though perhaps I could be the one misunderstanding it!).

Like @vokey588 and @operaman state, though, p-values are not that great (especially in the context of how commonly people, including myself, misunderstand them). Confidence intervals are excellent and, like operaman, I hate seeing results without confidence intervals.

Let's not forget that statistics is very much an art and there are few inviolable rules. No, p values don't exactly mean the probability of getting the results by chance, but that definition is good enough and simple enough to give people the gist. I find that when statisticians quibble about definitions, it often turns out to be a distinction without a difference.

While I definitely agree with what you're getting at, we can probably say the same thing about a lot of stuff that we learn in medicine. A common one I see on the wards, for example, is regarding COPD and the "hypoxic respiratory drive." It's inaccurate, since the change in PaCO2 levels we see in patients receiving O2 supplementation during a COPD exacerbation is, in reality, due to V/Q mismatch and the Haldane effect. But approaching it from the "hypoxic ventilatory drive" point of view doesn't drastically affect clinical management either.

Here's a quote I like from Evidence-Based Diagnosis regarding understanding p-values:

To those teachers and students who insist that one can get along just fine not really understanding what P-values mean, we would point out that you also can get along fairly well believing that the sun revolves around the earth. It is much more satisfying, however, to learn (and teach) what is right.

The general idea is that I really like understanding the mechanisms behind things. It makes it easier for me to learn the material and retain that information. It's more, like the quote puts it, a satisfaction issue. Like I said though, I think you're completely right that we can do just fine with a rough idea of what a p-value is, even if it's not entirely accurate. So, I've got nothing against that approach. It doesn't significantly affect practical aspects of clinical medicine.

Yes, schools absolutely need to teach stats better, but finding people who are both good at stats and capable of talking about them like a normal human being = difficult. Usually it's a PhD who gets up there and starts using words like "Gaussian, null, alpha, beta, rho, parametric, etc." and loses most of the class. I've been lucky to meet and work with biostatisticians who are good at teaching this stuff and they are worth their weight in gold.

I both agree and disagree. I agree that it's hard to find someone who can teach the subject well and teach us how it applies to clinical medicine. Part of it, I think, does have to do with throwing those terms (Gaussian, null, alpha, beta, etc.) at students too fast to digest. With that being said, I still think that schools should go out of their way to find a good teacher who can teach this topic well to med students. In my personal opinion, the only subjects in med school possibly more important than statistics and study design are probably physiology and pathology/pathophysiology. It sucks only getting one or a few lectures on statistics. I would prefer a journal club type situation where, during the course of the year, you go through some landmark papers while learning about statistics and study design. I think most students will be better able to understand and appreciate the information if it's presented slowly rather than everything tossed together into one lecture. I don't know if that makes sense or if I'm just rambling now. :laugh:
 
No.

For example, if p = 0.03, the probability of a type I error, assuming the null hypothesis is true, is 3%. The p-value does not tell you if a result was due to chance. At least, that's how I've learned it (though perhaps I could be the one misunderstanding it!).

Like @vokey588 and @operaman state, though, p-values are not that great (especially in the context of how commonly people, including myself, misunderstand them). Confidence intervals are excellent and, like operaman, I hate seeing results without confidence intervals.



While I definitely agree with what you're getting at, we can probably say the same thing about a lot of stuff that we learn in medicine. A common one I see on the wards, for example, is regarding COPD and the "hypoxic respiratory drive." It's inaccurate, since the change in PaCO2 levels we see in patients receiving O2 supplementation during a COPD exacerbation is, in reality, due to V/Q mismatch and the Haldane effect. But approaching it from the "hypoxic ventilatory drive" point of view doesn't drastically affect clinical management either.

Here's a quote I like from Evidence-Based Diagnosis regarding understanding p-values:



The general idea is that I really like understanding the mechanisms behind things. It makes it easier for me to learn the material and retain that information. It's more, like the quote puts it, a satisfaction issue. Like I said though, I think you're completely right that we can do just fine with a rough idea of what a p-value is, even if it's not entirely accurate. So, I've got nothing against that approach. It doesn't significantly affect practical aspects of clinical medicine.



I both agree and disagree. I agree that it's hard to find someone who can teach the subject well and teach us how it applies to clinical medicine. Part of it, I think, does have to do with throwing those terms (Gaussian, null, alpha, beta, etc.) at students too fast to digest. With that being said, I still think that schools should go out of their way to find a good teacher who can teach this topic well to med students. In my personal opinion, the only subjects in med school possibly more important than statistics and study design are probably physiology and pathology/pathophysiology. It sucks only getting one or a few lectures on statistics. I would prefer a journal club type situation where, during the course of the year, you go through some landmark papers while learning about statistics and study design. I think most students will be better able to understand and appreciate the information if it's presented slowly rather than everything tossed together into one lecture. I don't know if that makes sense or if I'm just rambling now. :laugh:

No, you're absolutely spot on about what p-values are. Maybe I'm just lazy and don't like trying to explain them to people who don't understand - the random chance thing gets them close enough in a short time. But I think you're right: we can do better and teach it right.

I think 3rd year has a lot of room to incorporate more journal-based stats and study analysis. I think of all the didactics we got that were nothing more than rehashing M1/M2 material and wish we could have replaced them with something new and interesting. Studying the landmark papers for each field would be a great way to kill two birds with one stone.
 
No.

For example, if p = 0.03, the probability of a type I error, assuming the null hypothesis is true, is 3%. The p-value does not tell you if a result was due to chance. At least, that's how I've learned it (though perhaps I could be the one misunderstanding it!).

Like @vokey588 and @operaman state, though, p-values are not that great (especially in the context of how commonly people, including myself, misunderstand them). Confidence intervals are excellent and, like operaman, I hate seeing results without confidence intervals.



While I definitely agree with what you're getting at, we can probably say the same thing about a lot of stuff that we learn in medicine. A common one I see on the wards, for example, is regarding COPD and the "hypoxic respiratory drive." It's inaccurate, since the change in PaCO2 levels we see in patients receiving O2 supplementation during a COPD exacerbation is, in reality, due to V/Q mismatch and the Haldane effect. But approaching it from the "hypoxic ventilatory drive" point of view doesn't drastically affect clinical management either.

Here's a quote I like from Evidence-Based Diagnosis regarding understanding p-values:



The general idea is that I really like understanding the mechanisms behind things. It makes it easier for me to learn the material and retain that information. It's more, like the quote puts it, a satisfaction issue. Like I said though, I think you're completely right that we can do just fine with a rough idea of what a p-value is, even if it's not entirely accurate. So, I've got nothing against that approach. It doesn't significantly affect practical aspects of clinical medicine.



I both agree and disagree. I agree that it's hard to find someone who can teach the subject well and teach us how it applies to clinical medicine. Part of it, I think, does have to do with throwing those terms (Gaussian, null, alpha, beta, etc.) at students too fast to digest. With that being said, I still think that schools should go out of their way to find a good teacher who can teach this topic well to med students. In my personal opinion, the only subjects in med school possibly more important than statistics and study design are probably physiology and pathology/pathophysiology. It sucks only getting one or a few lectures on statistics. I would prefer a journal club type situation where, during the course of the year, you go through some landmark papers while learning about statistics and study design. I think most students will be better able to understand and appreciate the information if it's presented slowly rather than everything tossed together into one lecture. I don't know if that makes sense or if I'm just rambling now. :laugh:

That isn't quite correct either. A p-value is the probability that, if the null hypothesis is true, you could get a result from your experiment at least as extreme as the result you observed. It does not represent the Type I error rate. The Type I error rate is always alpha (usually .05). You can demonstrate that with just a little code:

> p.values=data.frame(p=seq(1000)) #making a dummy data frame to store my p values
>
> for(i in seq(1000)){ ##this will repeat the experiment 1000 times
+ sample1=rnorm(1000,mean=10,sd=1) ##generate two random samples from two populations with the same mean and sd.
+ sample2=rnorm(1000,mean=10,sd=1)
+ p.values[i,1]=t.test(sample1,sample2)$p.value ##run a t test and save the p value generated by it.
+ }
> length(p.values[p.values$p<.05,1])/1000 #proportion of p values that are less than the alpha level.
[1] 0.049 ##Almost exactly 5% of the t tests generated a Type I error (p value less than .05).
> min(p.values$p) ##this is the smallest p value generated by the experiment
[1] 0.001635109 #if your experiment generated this p value, according to your definition you would have concluded the Type I error rate was 0.2%!


This long discussion is a perfect example of another flaw of "frequentist" (classical) statistical inference: the interpretation of the tests is totally non-intuitive and very difficult to grasp. In contrast, the interpretation of statistical tests in Bayesian inference is very intuitive. For example, you can precisely calculate the probability that a treatment is better than placebo (you can't do that with classical stats). Also, the 95% CI in Bayesian statistics is interpreted as: there is a 95% chance that the "true" treatment effect size lies within this interval (the definition of the 95% CI in classical stats is very confusing). Bayesian analysis is catching on in meta-analysis, and I hope trial authors begin to incorporate it into their publications.

Edit: I should add that all of this isn't to say that study results presented with classical stats are garbage. If a p-value is really low (like <.001), there most likely IS a real effect. The problem is that classical statistics leads many people to put too much weight on weakly significant p values (especially in the face of contrary past evidence) and OTOH to discard study results if the p value is not significant, even when the prior probability of an effect is high.
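Not the GUSTO re-analysis itself, but a toy beta-binomial sketch in R (made-up event counts, flat priors) of the Bayesian quantities described above: a direct probability that treatment beats control, and a credible interval for the risk difference.

set.seed(1)
events_trt <- 30; n_trt <- 200   # hypothetical: 30/200 events on treatment
events_ctl <- 45; n_ctl <- 200   # hypothetical: 45/200 events on control

# posterior event rates with uniform Beta(1,1) priors, drawn by Monte Carlo
post_trt <- rbeta(1e5, 1 + events_trt, 1 + n_trt - events_trt)
post_ctl <- rbeta(1e5, 1 + events_ctl, 1 + n_ctl - events_ctl)

mean(post_trt < post_ctl)                      # P(treatment event rate is lower than control's)
quantile(post_ctl - post_trt, c(.025, .975))   # 95% credible interval for the risk difference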
 
That isn't quite correct either. A p-value is the probability that, if the null hypothesis is true, you could get a result from your experiment at least as extreme as the result you observed. It does not represent the Type I error rate. The Type I error rate is always alpha (usually .05). You can demonstrate that with just a little code:

> p.values=data.frame(p=seq(1000)) #making a dummy data frame to store my p values
>
> for(i in seq(1000)){ ##this will repeat the experiment 1000 times
+ sample1=rnorm(1000,mean=10,sd=1) ##generate two random samples from two populations with the same mean and sd.
+ sample2=rnorm(1000,mean=10,sd=1)
+ p.values[i,1]=t.test(sample1,sample2)$p.value ##run a t test and save the p value generated by it.
+ }
> length(p.values[p.values$p<.05,1])/1000 #proportion of p values that are less than the alpha level.
[1] 0.049 ##Almost exactly 5% of the t tests generated a Type I error (p value less than .05).
> min(p.values$p) ##this is the smallest p value generated by the experiment
[1] 0.001635109 #if your experiment generated this p value, according to your definition you would have concluded the Type I error rate was 0.2%!


This long discussion is a perfect example of another flaw of "frequentist" (classical) statistical inference: the interpretation of the tests is totally non-intuitive and very difficult to grasp. In contrast, the interpretation of statistical tests in Bayesian inference is very intuitive. For example, you can precisely calculate the probability that a treatment is better than placebo (you can't do that with classical stats). Also, the 95% CI in Bayesian statistics is interpreted as: there is a 95% chance that the "true" treatment effect size lies within this interval (the definition of the 95% CI in classical stats is very confusing). Bayesian analysis is catching on in meta-analysis, and I hope trial authors begin to incorporate it into their publications.

Edit: I should add that all of this isn't to say that study results presented with classical stats are garbage. If a p-value is really low (like <.001), there most likely IS a real effect. The problem is that classical statistics leads many people to put too much weight on weakly significant p values (especially in the face of contrary past evidence) and OTOH to discard study results if the p value is not significant, even when the prior probability of an effect is high.

Good call! I confused myself enough to give an incorrect description. You're absolutely right that it's another misconception to equate the p-value to the type I error rate.

Also, completely agree that the 95% CI in Bayesian statistics is easier to understand than the same in frequentist inference. Ultimately, I think both frequentist and Bayesian inference can be useful in certain situations. However, I don't know enough to know when and where one tool is "better" than the other.

This has been a good thread. It reminds me how much more I need to learn about statistics and study design.
 
Power calculations are only for designing or proposing research. It makes no sense to calculate power after the fact. Once you have run your analysis, you already know whether or not you saw a positive result.
1) Yes. And reviewers will ask if it was calculated and what it showed.

2) I've also been asked multiple times for post-hoc power analyses. It's a thing, regardless of what extra value it adds.
 
No.

For example, if p = 0.03, the probability of a type I error, assuming the null hypothesis is true, is 3%. The p-value does not tell you if a result was due to chance. At least, that's how I've learned it (though perhaps, I could be the one misunderstanding it!).

Like @vokey588 and @operaman state, though, p-values are not that great (especially in the context of how commonly people, including myself, misunderstand them). Confidence intervals are excellent and, like operaman, I hate seeing results without confidence intervals.



While I definitely agree with what you're getting at, we can probably say the same thing about of a lot of stuff that we learn in medicine. A common one I see on the wards, for example, is regarding COPD and the "hypoxic respiratory drive." It's inaccurate, since the change in PaCO2 levels we see in patients receiving O2 supplementation during a COPD exacerbation is, in reality, due to V/Q mismatch and the Haldane effect. But approaching it from the "hypoxic ventilatory drive" point of view doesn't drastically affect clinical management either.

Here's a quote I like from Evidence-Based Diagnosis regarding understanding p-values:



The general idea is that I really like understanding the mechanisms behind things. It makes it easier for me to learn the material and retain that information. It's more, like the quote puts it, a satisfaction issue. Like I said though, I think you're completely right that we can do just fine with a rough idea of what a p-value is, even if it's not entirely accurate. So, I've got nothing against that approach. It doesn't significantly affect practical aspects of clinical medicine.



I both agree and disagree. I agree that it's hard to find someone who can teach the subject well and teach us how it applies to clinical medicine. Part of it, I think does have to do with throwing those terms (Gaussian, null, alpha, beta, etc) at students too fast to digest. With that being said, I still think that schools should go out of their way to find a good teacher who can teach this topic well to med students. In my personal opinion, the only subjects in med school possibly more important than statistics and study design are probably physiology and pathology/pathophysiology. It sucks only getting 1 or a few lectures on statistics. I would prefer a journal club type situation where, during the course of the year, you go through some landmark papers while learning about statistics and study design. I think most students will be better able to understand and appreciate the information if it's presented slowly rather than everything tossed together into 1 lecture. I don't know if that makes sense or if I'm just rambling now. :laugh:

My institution has just such a program, though we had to apply for it and only a few people per year are selected (it funds a full year of research as well). The professor who leads the journal club-style seminars is a clinical research methodology guru and personally curated a set of 20 papers that give an excellent overview of fundamental statistics, tricks study authors use to make their results seem more impressive, special types of trials (non-inferiority, factorial), stopping rules for trials, creating good composite outcomes, selective reporting of outcomes, and several other topics. I pasted the list of references below. If you could read only one, I'd suggest the "HARLOT plc" paper by Sackett et al. It's satirical but lays out many of the big ways trial authors will try to trick you. If you want to read them all, I'd start from the bottom and go up.

References

[1] Dekkers Olaf M., Egger Matthias, Altman Douglas G., Vandenbroucke Jan P. Distinguishing case series from cohort studies. Annals of Internal Medicine. 2012;156:37–40.
[2] Chan An-Wen W., Hróbjartsson Asbjørn, Haahr Mette T., Gøtzsche Peter C., Altman Douglas G. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291:2457–2465.
[3] Mathieu Sylvain, Boutron Isabelle, Moher David, Altman Douglas G., Ravaud Philippe. Comparison of registered and published primary outcomes in randomized controlled trials. JAMA. 2009;302:977–984.
[4] Goodman Steven N. Stopping at nothing? Some dilemmas of data monitoring in clinical trials. Annals of Internal Medicine. 2007;146:882–887.
[5] D'Agostino Ralph B., D'Agostino Ralph B. Estimating treatment effects using observational data. JAMA. 2007;297:314–316.
[6] Mueller Paul S., Montori Victor M., Bassler Dirk, Koenig Barbara A., Guyatt Gordon H. Ethical issues in stopping randomized trials early because of apparent benefit. Annals of Internal Medicine. 2007;146:878–881.
[7] Foster E. Michael. Propensity score matching: an illustrative analysis of dose response. Medical Care. 2003;41:1183–1192.
[8] Sackett David L., Oxman Andrew D., HARLOT plc. HARLOT plc: an amalgamation of the world's two oldest professions. BMJ (Clinical Research Ed.). 2003;327:1442–1445.
[9] Morton Veronica, Torgerson David J. Effect of regression to the mean on decision making in health care. BMJ (Clinical Research Ed.). 2003;326:1083–1084.
[10] Freemantle Nick, Calvert Melanie, Wood John, Eastaugh Joanne, Griffin Carl. Composite outcomes in randomized trials: greater precision but with greater uncertainty? JAMA. 2003;289:2554–2559.
[11] Kaul Sanjay, Diamond George A. Good enough: a primer on the analysis and interpretation of noninferiority trials. Annals of Internal Medicine. 2006;145:62–69.
[12] Spruance Spotswood L., Reid Julia E., Grace Michael, Samore Matthew. Hazard ratio in clinical trials. Antimicrobial Agents and Chemotherapy. 2004;48:2787–2792.
[13] McAlister Finlay A., Straus Sharon E., Sackett David L., Altman Douglas G. Analysis and reporting of factorial trials: a systematic review. JAMA. 2003;289:2545–2553.
[14] Zhang J., Yu K. F. What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA. 1998;280:1690–1691.
[15] Katz Mitchell H. Multivariable analysis: a primer for readers of medical research. Annals of Internal Medicine. 2003;138:644–650.
[16] Sterne J. A., Davey Smith G. Sifting the evidence - what's wrong with significance tests? BMJ (Clinical Research Ed.). 2001;322:226–231.
[17] Guyatt G., Walter S., Shannon H., Cook D., Jaeschke R., Heddle N. Basic statistics for clinicians: 4. Correlation and regression. CMAJ. 1995;152:497–504.
[18] Jaeschke R., Guyatt G., Shannon H., Walter S., Cook D., Heddle N. Basic statistics for clinicians: 3. Assessing the effects of treatment: measures of association. CMAJ. 1995;152:351–357.
[19] Guyatt G., Jaeschke R., Heddle N., Cook D., Shannon H., Walter S. Basic statistics for clinicians: 2. Interpreting study results: confidence intervals. CMAJ. 1995;152:169–173.
[20] Guyatt G., Jaeschke R., Heddle N., Cook D., Shannon H., Walter S. Basic statistics for clinicians: 1. Hypothesis testing. CMAJ. 1995;152:27–32.
 