Biostats - First Aid - Power


Phloston

FA2012, on the top of p. 58, says that power "depends on:

1) total number of endpoints experienced by population

2) difference in compliance between treatment groups (differences in the mean values between groups)

3) size of expected effect"


Could someone please explain what you think each of those three points means?

I have my own interpretation, but I would benefit from someone else's perspective.


I'm guessing that, for #1, as the total number of endpoints increases, power decreases, because it is therefore less likely that any one given conclusion could be drawn.

I feel, with #2, power decreases with an increased difference in compliance between Tx groups, because we're more likely to draw the correct conclusion if compliance is maximized across all participants.

For #3, I'm a bit confused by the wording here. I'm interpreting "size" as synonymous with "significance," or "inclination to cause impact." I would guess that as the size of the expected effect increases, the power also increases, because conclusions that are more tangible or significant to begin with are more likely to be "discovered."
 
I know you probably don't want to hear this, but FA doesn't provide decent biostats information. If you don't have a strong statistics background, their summary of concepts is quite minimal. A very simple way to understand power is the following: It is the probability of detecting an association if it exists in reality.

I can't talk about power without discussing type II error. Type II error (beta) is the chance of declaring there is no association when in fact there is one (versus a type I error (alpha), which is saying there is an association when in fact there is not one). Power is defined as 1 - beta, which is to say that as you minimize type II error, you maximize power.
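To make that concrete, here is a minimal simulation sketch (assuming Python with numpy and scipy, and made-up effect and sample-size numbers) that estimates beta and power by repeating a two-group experiment many times when a real difference exists:

```python
# Hypothetical illustration: estimate beta and power by simulation.
# A true difference between the groups is built in, so every "fail to reject"
# is a type II error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, true_diff, alpha = 30, 0.5, 0.05     # made-up per-group N, effect, and alpha
n_experiments = 5000

failed_to_reject = 0
for _ in range(n_experiments):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_diff, 1.0, n)
    _, p = stats.ttest_ind(treated, control)
    if p >= alpha:                      # missed the real effect: type II error
        failed_to_reject += 1

beta = failed_to_reject / n_experiments
print(f"beta  ~ {beta:.2f}")            # chance of missing the real difference
print(f"power ~ {1 - beta:.2f}")        # power = 1 - beta
```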

If I want to avoid saying there is no association of a risk factor X with development of disease Y, then I need to consider three major things.
1) Sample Size - If I increase the number of samples in my study from 5 to 5,000 and I still do not see an association of risk factor X with development of disease Y, then I am more likely to be seeing reality and thus less likely to be committing a type II error (saying there is no association when there is). Consequently, increasing N increases power (by minimizing beta).

2) Effect Size - This is a slippery concept, but as an example, it is easier to detect an elephant escaping from a zoo than a deer mouse. I might erroneously miss the escape of a deer mouse, but I am unlikely to miss the escape of an elephant. A medical example: drug A can either kill someone through acute toxicity (a large effect), or kill someone 40 years down the road through some chronic effect that may be murky and difficult to link directly to the original treatment. The "size of expected effect" thus influences the likelihood of making a type II error and therefore also influences power.

3) The final thing that affects power is the alpha-criterion, which is the amount of type I error you are willing to accept. In medicine we generally accept a type I error of 5%; that is, we think it is OK if 5% of the time we erroneously conclude that drug A is associated with effect X when, in reality, it is not. If I increase the alpha-criterion from 5% to 50%, I am more likely to reject the null hypothesis and conclude that drug A is associated with effect X (a type I error), but at the same time I am NOT concluding that drug A is NOT associated with effect X, so I avoid making a type II error and thereby increase power. In short, if I increase type I error, I decrease type II error and therefore increase power.

In summary, power is your illumination of reality; it is the capacity to detect a difference if there is in fact a difference to be detected. Power depends on type II (beta) error and is related by power = 1 - beta. Finally, if you increase 1) sample size (N), 2) effect size, or 3) type I (alpha) error, you will increase power (the quick sketch below runs through all three).
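A quick way to see all three knobs at once is a standard two-sample power calculation. Here is a small sketch (assuming Python with statsmodels installed, and made-up effect sizes, sample sizes, and alpha levels) that nudges each factor and prints the resulting power:

```python
# Hypothetical numbers throughout; only the direction of change matters here.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

baseline = dict(effect_size=0.3, nobs1=50, alpha=0.05)
print("baseline      :", round(analysis.power(**baseline), 2))

# 1) Increase sample size (N) -> power goes up
print("larger N      :", round(analysis.power(effect_size=0.3, nobs1=200, alpha=0.05), 2))

# 2) Increase effect size (the elephant vs. the deer mouse) -> power goes up
print("larger effect :", round(analysis.power(effect_size=0.8, nobs1=50, alpha=0.05), 2))

# 3) Relax the alpha-criterion (accept more type I error) -> power goes up
print("larger alpha  :", round(analysis.power(effect_size=0.3, nobs1=50, alpha=0.20), 2))
```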
 
This is great. I still struggle with stats, so I appreciate hearing it worded differently each time. They love the "easy to spot an elephant" line in class.
 

So let me get this straight:

If we increase alpha, then we are willing to reject H0 on weaker evidence, so we are more likely to conclude H1 even when H0 is actually true (a type I error). Since beta is the chance of failing to reject H0 when it is actually false, being more willing to reject H0 means beta must decrease, and power (1 - beta) must increase.

I agree that FA is not enough for biostats. Most of what I've learned so far has been through questions. I'm going to make it a focus to get through HY Biostats next month (it's a short text but not a quick read, judging from the pieces of it I've glanced at). I'm still working on BRS Behavioral at the moment, though.

Thanks for the summary btw.

I should also throw in that I had encountered a Kaplan QBook question that asked which value represents rejecting the null hypothesis when it is true. They had both type-I error and alpha as answer choices, and type-I error was correct. Apparently, they like the distinction that alpha and beta are the probabilities of committing type-I and type-II errors, respectively, and that there's not actually such a thing as an "alpha error" or "beta error."
 
Right. Since we can never prove something to be true, only disprove it, we set up the null hypothesis (H0) and try to disprove it in favor of the alternative hypothesis (H1). Increasing alpha does make a type I error more common, because we are willing to reject H0 on weaker evidence. The medical community has agreed that making a type I error is OK as long as the probability of making that error is less than or equal to 5%; that is, we arbitrarily agree that 95% certainty that drug A is associated with effect X is acceptable (i.e., p <= 0.05).

However, if I wanted to make a stronger statistical claim, I could say that a 5% risk of a type I error is unacceptable and that I will accept only a 1% chance of making a type I error. If my data yield a p value of < 0.01, then I am 99% sure that drug A is associated with effect X. But what if it turns out that I get a p value of 0.03? Before I started the experiment I said that my alpha-criterion would be 1%; therefore, a p value of 0.03 fails to satisfy my alpha-criterion and I have failed to disprove H0. By failing to disprove H0 I am now at risk of a type II error, because the data still suggest that drug A does produce effect X, but by setting my standards too high, I might miss it. That is, if your alpha-criterion is too strict or rigorous, you increase the risk of saying there is no association when in fact there is.
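That decision really is just a comparison of the p value against the alpha-criterion chosen before the study. A trivial sketch of the p = 0.03 example (in Python, for illustration only):

```python
# Illustration of the decision rule using the p = 0.03 example from above.
p_value = 0.03

for alpha in (0.05, 0.01):
    if p_value < alpha:
        print(f"alpha = {alpha}: reject H0 (risking a type I error if H0 is true)")
    else:
        print(f"alpha = {alpha}: fail to reject H0 (risking a type II error if H0 is false)")
```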

It is somewhat similar to sensitivity and specificity in that there is an inverse relationship. When I decrease the alpha-criterion from 5% to 1%, I am much less likely to make a type I error (saying that there is a difference when there is not one) when my data turn out a p-value below the alpha-criterion. But I am now also more likely to say that there is no difference when there is in fact one (a type II error), because if I fail to meet my alpha-criterion, I cannot reject the null hypothesis.
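To put rough numbers on that inverse relationship, here is a minimal normal-approximation sketch (assuming Python with scipy, and made-up group means, SD, and sample size) showing that tightening alpha from 5% to 1% lowers power, i.e., raises beta, for the same study:

```python
# Hypothetical numbers; normal-approximation sketch of the alpha/beta trade-off.
from scipy.stats import norm

def approx_power(diff, sd, n_per_group, alpha):
    """Approximate two-sided, two-sample power (upper tail only)."""
    se = sd * (2.0 / n_per_group) ** 0.5   # standard error of the group difference
    z_crit = norm.ppf(1 - alpha / 2)       # stricter alpha -> larger critical z
    return norm.cdf(abs(diff) / se - z_crit)

for alpha in (0.05, 0.01):
    p = approx_power(diff=5.0, sd=15.0, n_per_group=50, alpha=alpha)
    print(f"alpha = {alpha}: power ~ {p:.2f} (beta ~ {1 - p:.2f})")
```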

Rejecting the null hypothesis when it is true is exactly a type I error (you are saying there is an association when there actually isn't). Alpha is not exactly equal to type I error; it is easier to think of it that way, but in reality alpha and beta are criteria that you establish before you even conduct your study. They are related to type I and type II errors, but if the answer choices were "type I error" and "alpha error" and they made a big deal about that distinction, that is nonsense. If they just said "type I error" and "alpha," then type I error is the better answer, since alpha, in and of itself, is really the amount of error you are willing to accept when initiating a study.
 

That helps a lot. Thanks.
 