Databases and p-values


TheBoneDoctah

I have a stats question and wanted to see if anyone can drop some knowledge.

I am using a national database and am looking at mortality rates for different procedures at different hospitals. The database I am using approximates a 20% stratified sample of discharges from hospitals.

When I am analyzing the data, I have a question regarding p-values.

For example:
If I am comparing mortality for a procedure in hospital X and hospital Y.

Hospital X: 50 procedures, 5 deaths = 10% mortality
Hospital Y: 70 procedures, 5 deaths = 7% mortality
p = 0.06

Since the number of procedures and deaths taken from the database is a 20% sample, should I weight/multiply these numbers by 5 to obtain the nationwide estimate and get a higher n?

Hospital X: 250 procedures, 25 deaths = 10% mortality
Hospital Y: 350 procedures, 25 deaths = 7% mortality
p = 0.001

I know this is probably a basic concept for most of you, but I am new to research/stats and want to know if this is something that is done with databases.

Thanks!

No, you can't multiply the sample size.
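A quick sketch of why, using the made-up counts from the example above (the exact p-values below won't match the illustrative 0.06 and 0.001 in the post, but the direction of the effect is the same): multiplying every count by 5 leaves the proportions untouched while the test behaves as if five times as many patients were observed, so the p-value shrinks for free. Discharge weights in databases like this are meant to make point estimates nationally representative; inference is supposed to account for the survey design, not run on inflated raw counts.

```python
# Hypothetical sketch (counts taken from the example in the post) of why
# multiplying every count by the inverse sampling fraction is invalid:
# the proportions are unchanged, but the test behaves as if 5x as many
# patients were observed, so the p-value shrinks artificially.
import math

def two_prop_p(d1, n1, d2, n2):
    """Two-sided pooled two-proportion z-test p-value."""
    p1, p2 = d1 / n1, d2 / n2
    pooled = (d1 + d2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p from the standard normal CDF, built with math.erf
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

p_observed = two_prop_p(5, 50, 5, 70)      # the actual 20% sample
p_inflated = two_prop_p(25, 250, 25, 350)  # same proportions, counts x5
print(p_observed, p_inflated)              # p_inflated is much smaller
```

With these numbers the z-test gives roughly 0.58 for the observed sample and roughly 0.21 after inflation, not the 0.06 and 0.001 in the post, but the point is the direction: identical proportions, five times the nominal n, much smaller p.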
 
No, that's bad statistics. With such small sample sizes and disparate patient populations, direct comparisons are very difficult. Do you have access to a statistician for this project?

Also - I challenge you to find a procedure that carries a 10% mortality rate... what you'll find is a low volume, high risk, complex surgery. Generalizing results for something like that is a very difficult thing to do.
 

Oh I know. That data I put was purely an example. I was just arbitrarily using 10%.
 
[meme image: "That's not how this works"]

What a p-Value Tells You about Statistical Data - dummies
 
Unfortunately, the author doesn't tell you what a p-value means.

"You randomly sample some delivery times and run the data through the hypothesis test, and your p-value turns out to be 0.001, which is much less than 0.05. In real terms, there is a probability of 0.001 that you will mistakenly reject the pizza place’s claim that their delivery time is less than or equal to 30 minutes."

This is incorrect. If using an alpha of .05, there is a .05 probability of incorrectly rejecting a true null hypothesis. The p-value is not the probability of incorrectly [insert anything here]; p-values are not error probabilities. This is a common and egregious misinterpretation. The p-value of .001 means that there is a .001 probability (0.1% chance) of seeing a result at least as extreme as the observed one, assuming the null is true.
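To make that definition concrete, here is a minimal simulation (a made-up coin example, not from the thread): the p-value is just the long-run fraction of null-generated datasets that are at least as extreme as what was observed, nothing about the probability of being wrong.

```python
# Simulated two-sided p-value: P(result at least as extreme | null true).
# Made-up example: the null is a fair coin, and we observed 60 heads in
# 100 flips. Nothing here is a probability of "being wrong."
import random

random.seed(0)
observed_heads = 60
n_flips, trials = 100, 20_000

extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    # two-sided: at least as far from the expected 50 as the observed 60
    if abs(heads - 50) >= abs(observed_heads - 50):
        extreme += 1

p_sim = extreme / trials
print(p_sim)  # close to the exact two-sided value of about 0.057
```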

Edited: carried a decimal
 
Whatever the author is saying is still better than literally multiplying your sample size to change p-values, as the OP had proposed. Perhaps you should send the author an explanation of why she is wrong.
 
I disagree that one gross misinterpretation that increases false security is necessarily better than inappropriate application of the idea.
I don't think it's on me to correct the author (I've given you all you need to do it; I can even post a source if you'd like); I didn't post the inaccurate resource.
 
You have the problem with the example, so you should be the one to correct the professor. You had enough of a problem with it to resurrect a 6-month-old thread. I'm sure the professor of stats and the editors are eagerly awaiting your input.
 
Fair point on the 6-month part, but I didn't look at how far back it was; it could've been 2 years old! I came at it from the perspective that it's an undergraduate stats definition and should probably be corrected, since medical people (this forum, not necessarily the author) generally don't know what they're doing or what things mean with stats (like the people who think a hazard or hazard ratio is a probability or probability ratio).

I just want to be clear, because you seem defensive of your post. Do you agree that it is incorrect what's written in that article or do you think it is correct? I find it odd to be so defensive if you actually think it is incorrect.

I'm surprised the author does have a stats degree (although I can't find a CV, so who knows if it's stats or "stats education," which is apparently her focus), but it's a blog-type post and a "for dummies" book, so I wouldn't be surprised if someone else wrote the post and the author skimmed it and agreed, or if it's intentionally written wrong; either way, it's incorrect. I can start posting sources from people with master's and PhD stats degrees if you'd like; this is an undergraduate definition and concept, though.

I appreciate your attempt at a backhanded comment, too, given that you're upset your post was shot down. Luckily, I don't hang my hat of pride on my internet persona :laugh:
 
I am eagerly awaiting your reply as to whether you think the link you posted is correct or incorrect. Although, we can probably infer you wouldn't have posted it to correct someone if you thought it was incorrect. This would shed light on the nature of your reply.
 
The initial post linked something Wikipedia-equivalent so the person could get a better understanding. The example at the bottom could be wrong, but it's just an example. The rest of the content is accurate. I don't have an opinion on the correctness of the example because, frankly, I don't care. Furthermore, as I have stated above, you can contact the professor of stats who is the author to correct them, considering it bothers you to this extent.
 
The main thrust of the p-value concept is missed, which seems to be the emphasis of the title. The sad part about all of this is that Wikipedia accurately represents these p-value misconceptions (they have a section dedicated to it), whereas what was posted propagated a misconception.

I think it’s obvious you thought the content was correct and so you shared it. This isn’t a challenge of you; it’s a correction to what the author wrote. Your unwillingness to take a definitive stance and instead appeal to authority suggests you are not sure of the concept and aren’t sure how to approach it (which is okay if that’s the case, but you could just say as much). Again, if you want to appeal to authority, which isn’t a good way to approach a discussion, I can provide you with a long list of authoritative sources from well-regarded people in statistics who disagree with what was written, although an undergraduate statistics textbook can tell you this much. If you want to evaluate the merits of my argument about why the author is incorrect, that would be a reasonable approach.

Just remember you’re responsible for posts you make, so it is a cop-out to say you don’t care about an issue for which you posted.

Also, I only made a correction on the thread for anyone who reads it in the future so they aren’t misled. The reason this discussion is still going is because you’re disagreeing without engaging in a discourse on why you think your source is correct.
 
The main thrust is that I did not verify the content; I assumed it was correct. I don't have the inclination to parse through whether it is in fact erroneous. The appeal to authority is not a good argument, but I'm more inclined to trust the author than any anonymous contributor to the forum. I am responsible for my content, but if you haven't noticed, idgaf; the post was meant as a humorous reply rather than a serious discussion of p-values, if you didn't catch that from the meme and the literal "for dummies" title. You seem to be the only person with a problem with it, so maybe you should contact the professor.
 
You’re not fooling anyone, except maybe yourself. The meme was for laughs; the link was to help the OP. The "for dummies" title is a series of books on various topics, not meant to be a humorous read. You’ve defended the posting a few times up to this point, so it seems highly unusual if your initial post was totally meant to be a joke.

I’m not asking you to trust me over the author. I’m asking you to evaluate what both the author and I have said and think critically about this fundamental concept (if you’re disputing it).

Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations

Pretty well-known list of authors in biostats. Tons of misconceptions they discuss, including that a p-value tells us the probability of some kind of “mistake.”

Misunderstandings of p-values - Wikipedia

The first misconception is a rewording of what the author in your post said.

There are many more sources I can post if you’re looking to blindly follow an authority.

It’s disturbing how frequently people in medicine make these kinds of misinterpretations or flat-out incorrect claims from statistical methods, considering much of clinical practice rests on them. Again, I provided a correction, and you’re sort of disputing it, so you have some sort of investment here. It’s okay to disagree, and disagreement is definitely welcome, but you should at least back it up with a reasonable discussion of the points.
 
Contact the author. Also, thanks for clarifying the intent of my comment to me.
 
Will do. Again, it would seem incredibly odd to have tried legitimately defending it for a while if your intention was just a joke. Seems like clarifying that off the bat would have been easiest. But hey, people do weird things, so maybe you wanted to keep it a secret for a while. Your meme said “that’s not how this works” then you posted an explanatory article related to the post. Totally a joke...

Let me know what you think of those rebuttals and your post!
 
Thanks once again for clarifying my intent. Thanks for letting me know that a "for dummies" article I listed may contain an erroneous example. Thanks also for resurrecting a 6-month-old thread to do so.
 
I can’t take all the credit for clarifying it! You laid a good trail of breadcrumbs, which made it easier. I wasn’t just letting you know the article had a problem, it was for future readers so they can avoid a common mistake that is pervasive in medical literature and resources.
 