Shrink the P Value for Significance, Raise the Bar for Research: A Renewed Call


Lawpy

From Medscape

JAMA Viewpoint: The Proposal to Lower P Value Thresholds to .005

Some takeaways:

“The problem with P values is that if you take their exact definition, what they convey is not something that any clinician would ever be interested in, with very rare exceptions,” according to John PA Ioannidis, MD, DSc, Stanford University, California.

The new P = .005 standard would be a temporary fix until the field more consistently adopts and ingrains a more clinically relevant statistical test, or several depending on the type of analysis, he proposes.

That P values are currently "misinterpreted, overtrusted, and misused" means that a research finding within the .05 standard "is wrongly equated with a finding or an outcome (eg, an association or a treatment effect) being true, valid, and worth acting on," Ioannidis writes.

A better metric, one that would serve the needs of clinicians, would reflect whether there is a treatment effect, one large enough to be clinically meaningful. The P value, Ioannidis said, "is very remote from that. It's so remote from it that people are just misled."

More useful are hazard ratios (or relative risks or odds ratios) with confidence intervals that convey effect sizes that can show whether a treatment outcome may be clinically appealing, he said. Those metrics don't simply dichotomize results in terms of significance vs nonsignificance.
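As a rough illustration of the kind of summary that paragraph describes (the counts below are made up, not taken from the article or any study), an effect estimate with a confidence interval carries both the magnitude and the precision that a bare "significant/not significant" label hides:

```python
# Minimal sketch with hypothetical counts: a risk ratio and its 95% CI say how
# big the effect is and how precisely it was estimated, rather than collapsing
# the result into significant vs. nonsignificant.
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: events / non-events in the treated and control arms
a, b = 30, 70    # treated: events, non-events
c, d = 50, 50    # control: events, non-events

rr = (a / (a + b)) / (c / (c + d))                       # risk ratio
se_log_rr = np.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))   # SE on the log scale
z = stats.norm.ppf(0.975)                                # ~1.96 for a 95% CI
lo, hi = np.exp(np.log(rr) + np.array([-z, z]) * se_log_rr)

print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")     # RR = 0.60, CI ~0.42 to 0.86
```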

What are your thoughts?

 
P = 0.005 is quite the aggressive change
 
I have a PhD. I’ve said it before and I’ll say it again: doctors aren’t researchers.

Hazard ratios or odds ratios help in certain situations, but aren't applicable to some study designs. Confidence intervals are fine, but you still have to understand the test that derived those confidence intervals and the study that produced those results. To the standard physician, these details either don't matter or are not considered. Also, I'm not sure how a proposal to make p = 0.005 helps - it just means the type I error is low without doing anything to protect the quality or applicability of the data being produced.

I really hate how medical schools try to teach evidence-based medicine to the masses, and then students engage in a crappy chart review to stack the CV for the match, and suddenly people think they know research. If you want to understand research, do a PhD - as the author of the viewpoint has done. Obviously that would be a stupid idea for the majority of physicians, so I think that in addition to the standard scientific abstract, papers should have a clinical abstract that briefly describes what the study was (without statistical or basic/applied-science jargon), what was found (without all the jargon of p-values, sensitivity analyses, etc.), and what the strengths and limitations are (like how strong the data are, what some of the clinical unknowns are) - this too would be subject to peer review. Basically like the summaries you see in clinical meta-analyses or guideline articles, but for an individual study.
 
Agree 110% with the premise.

So much misunderstanding on the p value! Statistical and clinical significance are two entirely separate entities.

Not sure I like the 0.005 idea because it doubles down on a flawed measure. I love the idea of coming up with better definitions of significance that incorporate both statistical and clinical meaning.
 
Agree 110% with the premise.

So much misunderstanding on the p value! Statistical and clinical significance are two entirely separate entities.

Not sure I like the 0.005 idea because it doubles down on a flawed measure. I love the idea of coming up with better definitions of significance that incorporate both statistical and clinical meaning.

Agreed. We need better education on statistical significance and statistical tests, and how they relate to clinical significance.
 
The thing is, at least as far as I'm aware, a P <0.05 implies that "if the null hypothesis were true, the probability of finding the observed or a more extreme value by random chance would be <5%". When we're discussing RR, that is practically synonymous in my mind with "a 95% confidence interval centered around the measured value does not cross 1".

How does the author on one hand argue for P <0.005 and on the other hand argue that confidence intervals are more useful? Is he implying that the standard should be a 99.5% confidence interval not crossing unity? Or should the statistics be decoupled? I already look primarily at the measurement and the confidence interval, but I can think of a few studies off the top of my head where a clinically meaningful outcome was measured, with a clinically meaningful result and a confidence interval to match, and the P was, say, 0.01 or 0.02.
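For concreteness, here is a rough sketch of that correspondence with made-up counts. For a ratio measure, the Wald p-value and the confidence interval are built from the same log-scale estimate and standard error, so a p between .005 and .05 is exactly the situation where the 95% CI excludes 1 but a 99.5% CI would not:

```python
# Rough sketch (hypothetical counts): the same estimate and SE drive both the
# p-value and the CI, so "95% CI excludes 1" lines up with p < 0.05 and
# "99.5% CI excludes 1" lines up with p < 0.005.
import numpy as np
from scipy import stats

a, b = 42, 158   # treated: events, non-events
c, d = 65, 135   # control: events, non-events

log_rr = np.log((a / (a + b)) / (c / (c + d)))
se = np.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))

p = 2 * stats.norm.sf(abs(log_rr / se))        # two-sided Wald p-value
print(f"p = {p:.4f}")                          # ~0.011 for these counts

for conf in (0.95, 0.995):
    z = stats.norm.ppf(0.5 + conf / 2)
    lo, hi = np.exp(log_rr + np.array([-z, z]) * se)
    print(f"{conf:.1%} CI: {lo:.2f}-{hi:.2f}, excludes 1: {hi < 1 or lo > 1}")
# Here the 95% CI excludes 1 (significant at .05) while the 99.5% CI crosses it.
```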

Edit: Oh, and after I opened the Medscape article, I noticed who the author was: it's Ioannidis, who for the last 13 years (since I graduated high school!) has been harping on how, per his analysis, half of all published results are flat-out false. This is just his regular hobbyhorse.
 
I think the premise that we all live and die by the .05 sword is true and not a desirable position. However, making .005 the new sword isn’t going to change our sword-loving ways. All that will happen is that all the research will magically make it under the new threshold.
 
What will happen to Type 2 error? I guess the world will never know
 
Changing the p value threshold isn’t the answer. Just a tradeoff between type I and type II error.

The bigger issues are that 1) most academic doctors cannot give an accurate definition of what a p-value means, and 2) people overlook the importance of effect size and clinical significance at an alarming rate.
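To put a number on that tradeoff (the event rates below are assumed purely for illustration): holding the effect and power fixed, dropping alpha from .05 to .005 inflates the required sample size by roughly 70% in a simple two-proportion comparison:

```python
# Back-of-the-envelope sample-size sketch for two proportions (normal
# approximation). The assumed event rates are made up for illustration.
import numpy as np
from scipy import stats

def n_per_group(p1, p2, alpha, power=0.80):
    """Per-group n for a two-sided comparison of two proportions."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int(np.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2))

for alpha in (0.05, 0.005):
    print(f"alpha = {alpha}: n per group ~ {n_per_group(0.30, 0.40, alpha)}")
# Lowering alpha buys a smaller type I error at the cost of many more patients
# (or, at a fixed n, a larger type II error).
```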
 
I have a PhD. I’ve said it before and I’ll say it again: doctors aren’t researchers.

Hazard ratios or odds ratios help in certain situations, but aren't applicable to some study designs. Confidence intervals are fine, but you still have to understand the test that derived those confidence intervals and the study that produced those results. To the standard physician, these details either don't matter or are not considered. Also, I'm not sure how a proposal to make p = 0.005 helps - it just means the type I error is low without doing anything to protect the quality or applicability of the data being produced.

I really hate how medical schools try to teach evidence-based medicine to the masses, and then students engage in a crappy chart review to stack the CV for the match, and suddenly people think they know research. If you want to understand research, do a PhD - as the author of the viewpoint has done. Obviously that would be a stupid idea for the majority of physicians, so I think that in addition to the standard scientific abstract, papers should have a clinical abstract that briefly describes what the study was (without statistical or basic/applied-science jargon), what was found (without all the jargon of p-values, sensitivity analyses, etc.), and what the strengths and limitations are (like how strong the data are, what some of the clinical unknowns are) - this too would be subject to peer review. Basically like the summaries you see in clinical meta-analyses or guideline articles, but for an individual study.


If you want to 'understand' research, all you need to do is take a basic class in statistics. You do not need a PhD.

I hate it when people make generalizations like this. There are a lot of physicians who ARE researchers more than they are clinicians.

I agree with the rest of your post though.
 
If you want to 'understand' research, all you need to do is take a basic class in statistics. You do not need a PhD.

I hate it when people make generalizations like this. There are a lot of physicians who ARE researchers more than they are clinicians.

I agree with the rest of your post though.

And this is how the fallacy propagates. Go through a PhD candidacy/comprehensive exam or defense and then tell me that.

Firstly, the percentage of physicians doing any kind of research is extremely low. It's about 1.4% of ALL physicians, which includes MD/PhDs. So no, there are not a lot of physicians who do research more than clinical work. NIH RePORT - Physician Scientist-Workforce Report 2014 - Size and Composition of the Physician-Scientist Workforce in 2012

Just because a physician does research, it doesn't mean they are always capable of understanding the appropriate way to generate a hypothesis, design a study to properly test that hypothesis, use appropriate methods to assess results, understand the clinical, scientific and statistical intricacies to interpret the results, critically appraise their data in reference to the strengths/limitations of the study and also in reference to what is already known so that an appropriate conclusion can be drawn, let alone do all this when reviewing someone else's study that they had no role in.

I'm not dismissing physician roles in research - they are absolutely critical to getting good clinical research done, can be very important parts of the team, and can even lead the team. But there needs to be someone who knows all the nitty-gritty details and can make sure every word in the paper is airtight. The same could be said of business - the CEO of many companies probably has a general idea of how their accounting, marketing, tech, innovation, manufacturing and HR departments work, but if you asked them to describe every detail they almost certainly could not, and it's important that there's someone on the team who can.

Again, doctors are trained clinically. They can do research, but they aren't usually researchers (i.e., they are not usually trained scientists, MD/PhD).
 
And this is how the fallacy propagates. Go through a PhD candidacy/comprehensive exam or defense and then tell me that.

Firstly, the percentage of physicians doing any kind of research is extremely low. It's about 1.4% of ALL physicians, which includes MD/PhDs. So no, there are not a lot of physicians who do research more than clinical work. NIH RePORT - Physician Scientist-Workforce Report 2014 - Size and Composition of the Physician-Scientist Workforce in 2012

Just because a physician does research, it doesn't mean they are always capable of understanding the appropriate way to generate a hypothesis, design a study to properly test that hypothesis, use appropriate methods to assess results, understand the clinical, scientific and statistical intricacies to interpret the results, critically appraise their data in reference to the strengths/limitations of the study and also in reference to what is already known so that an appropriate conclusion can be drawn, let alone do all this when reviewing someone else's study that they had no role in.

I'm not dismissing physician roles in research - they are absolutely critical to getting good clinical research done, can be very important parts of the team, and can even lead the team. But there needs to be someone who knows all the nitty-gritty details and can make sure every word in the paper is airtight. The same could be said of business - the CEO of many companies probably has a general idea of how their accounting, marketing, tech, innovation, manufacturing and HR departments work, but if you asked them to describe every detail they almost certainly could not, and it's important that there's someone on the team who can.

Again, doctors are trained clinically. They can do research, but they aren't usually researchers (i.e., they are not usually trained scientists, MD/PhD).
There's a difference between being trained to perform the nitty-gritty, get-your-hands-dirty parts of research and being trained to critically appraise someone else's research. Given that the prior poster specifically talks about the minimum threshold to "understand" (not perform) research, I would think he means the latter.

My wife is an MD/PhD. She can do work I wouldn't want to do in a million years. Could I design experiments to further her work? Hell no. Not without significant additional training. But I can still sit down and read/understand the work that was done. Even the basic science research that's a million miles from what I currently do.

You cannot tell me that I am not trained to appraise the worth of research completed within my own field and *that* is the primary skill related to research that is important to me as a clinician. To evaluate the population that a trial was done in and see if it's relevant to me. To read the protocol to see if it's feasible for implementation. And to evaluate the effect size to see how meaningful it is. None of that requires me to spend four years doing PCRs/western blots.
 
There's a difference between being trained to perform the nitty-gritty, get-your-hands-dirty parts of research and being trained to critically appraise someone else's research. Given that the prior poster specifically talks about the minimum threshold to "understand" (not perform) research, I would think he means the latter.

My wife is an MD/PhD. She can do work I wouldn't want to do in a million years. Could I design experiments to further her work? Hell no. Not without significant additional training. But I can still sit down and read/understand the work that was done. Even the basic science research that's a million miles from what I currently do.

You cannot tell me that I am not trained to appraise the worth of research completed within my own field and *that* is the primary skill related to research that is important to me as a clinician. To evaluate the population that a trial was done in and see if it's relevant to me. To read the protocol to see if it's feasible for implementation. And to evaluate the effect size to see how meaningful it is. None of that requires me to spend four years doing PCRs/western blots.

Of course, I agree there's a difference between getting the basics and knowing the details. But that's exactly the problem. You yourself even state that while you can read and generally understand what your wife does, you would not be able to provide a meaningful contribution or suggestion to what she does without significant extra training.

Your example is basic science, but I think the more relevant thing for clinicians is stuff like RCTs and epidemiology. So using that example, I'm sure an MD/PhD in epidemiology would rip many of the papers that you base your daily practice on a new one. I think a lot of people automatically assume basic science MD/PhDs when they hear clinician-scientist - but I wouldn't want the molecular geneticist running my RCT, and I wouldn't want the epidemiologist running my PCR. So doing 4 years of PCR wasn't specifically what I was referring to.

What I am trying to say is that physicians are probably sufficiently trained to sit down, read a CLINICAL paper in their field and get the basics of it. I also fully expect that about half of the methods section in any given RCT or population cohort study will be complete gibberish to most physicians, even more so in the basic and applied sciences. The topical study assessments they teach in medical school are exactly that - topical. The intent is that a busy community physician should be able to pick up NEJM, read a study (or more likely just the abstract) and get a quick idea of whether it's something they should modify their practice on, which it sounds like is exactly what you do. This is no different from the Cochrane criteria for evaluating studies - it's a checklist that anyone can follow. But getting back to the point of this thread, even if all the Cochrane boxes are checked as low risk of bias on an RCT, and the p-value is significant, that does not automatically mean the study is good or useful, and knowing that often requires consideration of a lot more than "is this population relevant to me" types of questions. Here's an example:

I did my MD/PhD in engineering. Artificial intelligence is a hot topic now, especially as it's applied to medicine. Doctors see seemingly well-designed studies with significant results and get super jazzed, but haven't the faintest clue what machine learning actually is or does, how it works, or what its limitations are. Likewise, if I were to read a large epidemiology study, there are aspects in there that I would have no idea about, even if the content was in my specialty.

My point is that medicine teaches us to understand the basics, but in no way does it train us to actually conduct those studies or properly interpret and critically appraise every aspect of them. The details are far more complicated than most people think, and that's where scientific expertise comes in. To this end, it's not like a PhD grants you infinite wisdom in all areas of science, but it gives you (a) specific content expertise in a certain area, and (b) the tools, resources and experience to know how to go about finding answers in something the world knows nothing about.

So again getting back to the point of the thread, expanding to consider other fancy statistics goes beyond the "basics" of what is taught in medical school. Obviously peer review needs to be held to a high standard so that doctors are getting high quality science and don't have to worry about the fatal flaws in research designs and analyses, but that's not always the case - especially in journals outside of NEJM, JAMA, Lancet, BMJ, etc. This is where having content expertise in some area of science is useful. Since that's not practical or necessary for physicians, there just needs to be a better way of communicating results to audiences without specific content expertise in epidemiology, public health, basic science etc.
 
If you want to 'understand' research, all you need to do is take a basic class in statistics. You do not need a PhD.

I hate it when people make generalizations like this. There are a lot of physicians who ARE researchers more than they are clinicians.

I agree with the rest of your post though.

You obviously have no idea what you're talking about if you think one basic stats course at the med student level covers what you need to understand clinical research. Some examples of essential basic epidemiological concepts you may not know the answer to:

1) When is it appropriate to adjust for covariates? If a covariate is related to the outcome but not the variable of interest, should we put it in the model? What exactly qualifies as a confounder? If something is a confounder, should we stratify by it and make a subgroup or control for it, and what's the difference? Do we include confounders in RCTs if the design by nature eliminates confounding?

2) How about this real example?
To investigate the association between estrogen and cancer, Yale investigators considered the possibility of ascertainment bias, where estrogens lead to vaginal bleeding that accelerates the diagnosis of existing cancer (more likely to detect it if bleeding occurs). Therefore, they stated that we can look at only vaginal bleeding cases whether they are taking estrogen or not; if these patients all have bled, they must have the same likelihood of being diagnosed with cancer. If estrogen still leads to cancer among these patients, they stated that we can say it's causal. What was the serious flaw in this methodology? (Why do we find an association between estrogen and cancer even among women who bleed, if there is no real association?) How about in Belizan et al 1997? (Hint: similar concept as the prior)
Belizán JM, Villar J, Bergel E, et al. Long-term effect of calcium supplementation during pregnancy on the blood pressure of offspring: follow up of a randomised controlled trial. BMJ. 1997;315(7103):281-285.

3) What assumptions are made with Cox regression? What if proportionality assumptions are not met (very common); how should this be interpreted? What should be used as the time scale for your Cox regression, and how does this affect the analysis? (age, follow-up time, etc)

I can provide countless other examples...even in this thread, the p-value and 95% CI are not synonymous and corresponding concepts as the above poster alluded. These are not obscure topics that will never pop up in real life. They are the bare essentials to any clinical researcher that are guaranteed to come up (but many physicians ignore them unknowingly) and anyone who has taken real graduate level introductory epidemiology classes would know the answers. This lack of fundamental understanding is one of the main reasons why so many observational studies fail to stand up to the rigor of clinical trials. With proper design, cohort studies can closely mirror the results of RCTs, and case control studies should yield effect estimates as valid as cohort studies. However, that is rarely the case in reality.
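Not an answer key, but a toy simulation of the confounding idea behind question 1: a common cause Z drives both the exposure and the outcome, the crude estimate shows an association that is not really there, and adjusting for Z removes it. All parameter values are invented for illustration:

```python
# Toy confounding simulation: Z causes both the exposure X and the outcome Y,
# while X has no effect on Y at all. The crude regression of Y on X is biased;
# adjusting for the confounder Z recovers the (null) truth.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)                 # confounder (think: age)
x = 0.8 * z + rng.normal(size=n)       # exposure partly driven by Z
y = 1.5 * z + rng.normal(size=n)       # outcome driven by Z, not by X

crude = sm.OLS(y, sm.add_constant(x)).fit()
adjusted = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

print("crude X coefficient:   ", round(crude.params[1], 3))    # ~0.73, spurious
print("adjusted X coefficient:", round(adjusted.params[1], 3)) # ~0.00
```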
 
Man I didn't think it was controversial to say "some people who really work to learn stats/epi as MDs can be good at research, and some MDs who don't study it at all are going to be bad at even interpreting the literature."

I'm not going to sit and list all my qualifications for anonymity, but I have graduate degrees from two institutions in biostats. Even though it would benefit me to protect the value of my expertise, I just think you absolutely do NOT need a PhD to do research. That's like saying you need a graduate CS degree to be a good programmer. With the availability of information today, becoming competent isn't dependent on having the letters after your name.

Now, that being said, 98% of MDs (not scientifically determined) do not take the onus upon themselves to really learn proper stats/epi. The peer review process is filled with those same individuals, which is equally troubling. So the point overall - that there is an issue with what we're populating the literature with, and then with allowing untrained docs to try to interpret that mess - I do agree with. Throw in special interests (like the American Urological Association recommending PSA screening because...well, they're urologists, etc.) and the field is far less evidence based than most who understand would like, and most who don't understand would believe.

(PS @LoGo love the AI example - seen it myself and totally agree)
 
You obviously have no idea what you're talking about if you think one basic stats course at the med student level covers what you need to understand clinical research. Some examples of essential basic epidemiological concepts you may not know the answer to:

1) When is it appropriate to adjust for covariates? If a covariate is related to the outcome but not the variable of interest, should we put it in the model? What exactly qualifies as a confounder? If something is a confounder, should we stratify by it and make a subgroup or control for it, and what's the difference? Do we include confounders in RCTs if the design by nature eliminates confounding?

2) How about this real example?
To investigate the association between estrogen and cancer, Yale investigators considered the possibility of ascertainment bias, where estrogens lead to vaginal bleeding that accelerates the diagnosis of existing cancer (more likely to detect it if bleeding occurs). Therefore, they stated that we can look at only vaginal bleeding cases whether they are taking estrogen or not; if these patients all have bled, they must have the same likelihood of being diagnosed with cancer. If estrogen still leads to cancer among these patients, they stated that we can say it's causal. What was the serious flaw in this methodology? (Why do we find an association between estrogen and cancer even among women who bleed, if there is no real association?) How about in Belizan et al 1997? (Hint: similar concept as the prior)
Belizán JM, Villar J, Bergel E, et al. Long-term effect of calcium supplementation during pregnancy on the blood pressure of offspring: follow up of a randomised controlled trial. BMJ. 1997;315(7103):281-285.

3) What assumptions are made with Cox regression? What if proportionality assumptions are not met (very common); how should this be interpreted? What should be used as the time scale for your Cox regression, and how does this affect the analysis? (age, follow-up time, etc)

I can provide countless other examples...even in this thread, the p-value and 95% CI are not synonymous and corresponding concepts as the above poster alluded. These are not obscure topics that will never pop up in real life. They are the bare essentials to any clinical researcher that are guaranteed to come up (but many physicians ignore them unknowingly) and anyone who has taken real graduate level introductory epidemiology classes would know the answers. This lack of fundamental understanding is one of the main reasons why so many observational studies fail to stand up to the rigor of clinical trials. With proper design, cohort studies can closely mirror the results of RCTs, and case control studies should yield effect estimates as valid as cohort studies. However, that is rarely the case in reality.

Look LoGo, there’s a difference between understanding research and being a full blown statistician. Assuming MDs have a statistician on their team to guide them, there’s no reason why they can’t perform reliable research and be researchers.
 
Of course, I agree there's a difference between getting the basics and knowing the details. But that's exactly the problem. You yourself even state that while you can read and generally understand what your wife does, you would not be able to provide a meaningful contribution or suggestion to what she does without significant extra training.

Your example is basic science, but I think the more relevant thing for clinicians is stuff like RCTs and epidemiology. So using that example, I'm sure an MD/PhD in epidemiology would rip many of the papers that you base your daily practice on a new one. I think a lot of people automatically assume basic science MD/PhDs when they hear clinician-scientist - but I wouldn't want the molecular geneticist running my RCT, and I wouldn't want the epidemiologist running my PCR. So doing 4 years of PCR wasn't specifically what I was referring to.

*I* can rip many of the papers that I base my daily practice on a new one. We did journal clubs all throughout our clinical training to evaluate the quality of major papers in our fields. That said, I have to treat my patients with something, and the perfect is the enemy of the good.

What I am trying to say is that physicians are probably sufficiently trained to sit down, read a CLINICAL paper in their field and get the basics of it. I also fully expect that about half of the methods section in any given RCT or population cohort study will be complete gibberish to most physicians, even more so in the basic and applied sciences. The topical study assessments they teach in medical school are exactly that - topical. The intent is that a busy community physician should be able to pick up NEJM, read a study (or more likely just the abstract) and get a quick idea of whether it's something they should modify their practice on, which it sounds like is exactly what you do. This is no different from the Cochrane criteria for evaluating studies - it's a checklist that anyone can follow. But getting back to the point of this thread, even if all the Cochrane boxes are checked as low risk of bias on an RCT, and the p-value is significant, that does not automatically mean the study is good or useful, and knowing that often requires consideration of a lot more than "is this population relevant to me" types of questions. Here's an example:

I did my MD/PhD in engineering. Artificial intelligence is a hot topic now, especially as it's applied to medicine. Doctors see seemingly well-designed studies with significant results and get super jazzed, but haven't the faintest clue what machine learning actually is or does, how it works, or what its limitations are. Likewise, if I were to read a large epidemiology study, there are aspects in there that I would have no idea about, even if the content was in my specialty.

The thing is, that doesn't matter. At all. I think you acknowledge that too.

For example, the AI/algorithm argument that most applies to my own field is insulin pump controllers. There are volumes of literature on what the best way to approach an automated insulin pump would be, whether it's model predictive control algorithms, proportional-integral-derivative algorithms, or some form of fuzzy logic. I have *some* understanding of what those words mean in terms of how the algorithms approach their goals, but if you were to ask me to formally describe what "fuzzy" logic means? I'd fail.

In fact, I don't read that literature. There's no point. If you asked me to critically evaluate the description of the algorithm, I might find it interesting, but it's totally useless to me.

But I can grab the paper where they take a pump with whatever algorithm on board, slap it onto a couple dozen kids at summer camp, and at the end of the two weeks their diabetes control is better. I can critically evaluate the population and their control before and after the intervention. The statistician can tell me if the number of hypoglycemic episodes is statistically significant, but the clinician can actually look at the numbers in the context of their individual patient and see if it makes a difference. (I don't treat kids - that's just the first trial that immediately came to mind - but there are lots of insulin pump trials in adults.) In the end, I'm not going to be programming the fundamentals into the pump: I'll be setting some basic parameters for the algorithm to start with, and I should know enough to troubleshoot.

My point is that medicine teaches us to understand the basics, but in no way does it train us to actually conduct those studies or properly interpret and critically appraise every aspect of them. The details are far more complicated than most people think, and that's where scientific expertise comes in. To this end, it's not like a PhD grants you infinite wisdom in all areas of science, but it gives you (a) specific content expertise in a certain area, and (b) the tools, resources and experience to know how to go about finding answers in something the world knows nothing about.

So again getting back to the point of the thread, expanding to consider other fancy statistics goes beyond the "basics" of what is taught in medical school. Obviously peer review needs to be held to a high standard so that doctors are getting high quality science and don't have to worry about the fatal flaws in research designs and analyses, but that's not always the case - especially in journals outside of NEJM, JAMA, Lancet, BMJ, etc. This is where having content expertise in some area of science is useful. Since that's not practical or necessary for physicians, there just needs to be a better way of communicating results to audiences without specific content expertise in epidemiology, public health, basic science etc.

I don't know every aspect of the studies I read. But I see nothing wrong with how the studies I read communicate their results to me. As a clinician, I can put it in an applicable context.

I can provide countless other examples...even in this thread, the p-value and 95% CI are not synonymous and corresponding concepts as the above poster alluded. These are not obscure topics that will never pop up in real life. They are the bare essentials to any clinical researcher that are guaranteed to come up (but many physicians ignore them unknowingly) and anyone who has taken real graduate level introductory epidemiology classes would know the answers. This lack of fundamental understanding is one of the main reasons why so many observational studies fail to stand up to the rigor of clinical trials. With proper design, cohort studies can closely mirror the results of RCTs, and case control studies should yield effect estimates as valid as cohort studies. However, that is rarely the case in reality.

They are not absolutely synonymous, but they are functionally so. A relative risk in an RCT with a 95% confidence interval that doesn't cross unity will basically always be statistically significant at the 0.05 level, but not necessarily at the 0.005 level.
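A quick sketch of that point: from a reported ratio and its 95% CI you can back out an approximate Wald p-value, and an interval that only just misses 1 corresponds to a p only just under .05, which can still be a long way from .005 (the RR and CI below are made up, not from any trial):

```python
# Back out an approximate p-value from a reported ratio and its 95% CI.
import numpy as np
from scipy import stats

rr, lo, hi = 0.80, 0.65, 0.98                 # hypothetical reported RR (95% CI)

se = (np.log(hi) - np.log(lo)) / (2 * 1.96)   # CI width on the log scale -> SE
p = 2 * stats.norm.sf(abs(np.log(rr) / se))
print(f"approximate p = {p:.3f}")             # ~0.03: under .05, well above .005
```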
 
Look LoGo, there’s a difference between understanding research and being a full blown statistician. Assuming MDs have a statistician on their team to guide them, there’s no reason why they can’t perform reliable research and be researchers.

I'm not LoGo, and these are not concepts to become a full blown statistician, they are basics one needs to learn before knowing how to interpret or design studies. And no, a statistician doesn't solve the problem, I work on teams with great statisticians. They often do not have the clinical expertise relevant to your study, and if you don't know the right questions to ask, they cannot help. They can't do magic. If they could, we would not have so many flawed studies out there by physicians. MDs can do research easily, I published many papers before I knew what I was doing. Doing good research, however, is a different story.
 
Man I didn't think it was controversial to say "some people who really work to learn stats/epi as MDs can be good at research, and some MDs who don't study it at all are going to be bad at even interpreting the literature."

I'm not going to sit and list all my qualifications for anonymity, but I have graduate degrees from two institutions in biostats. Even though it would benefit me to protect the value of my expertise, I just think you absolutely do NOT need a PhD to do research. That's like saying you need a graduate CS degree to be a good programmer. With the availability of information today, becoming competent isn't dependent on having the letters after your name.

Now, that being said, 98% of MDs (not scientifically determined) do not take the onus upon themselves to really learn proper stats/epi. The peer review process is filled with those same individuals, which is equally troubling. So the point overall - that there is an issue with what we're populating the literature with, and then with allowing untrained docs to try to interpret that mess - I do agree with. Throw in special interests (like the American Urological Association recommending PSA screening because...well, they're urologists, etc.) and the field is far less evidence based than most who understand would like, and most who don't understand would believe.

(PS @LoGo love the AI example - seen it myself and totally agree)

No one's saying an MD who really works to understand stats/epi can't be qualified, but it involves more than 1 simple stats class. The information is definitely out there for those who take the time and effort to do the proper reading and become educated.
 
You obviously have no idea what you're talking about if you think one basic stats course at the med student level covers what you need to understand clinical research. Some examples of essential basic epidemiological concepts you may not know the answer to:

1) When is it appropriate to adjust for covariates? If a covariate is related to the outcome but not the variable of interest, should we put it in the model? What exactly qualifies as a confounder? If something is a confounder, should we stratify by it and make a subgroup or control for it, and what's the difference? Do we include confounders in RCTs if the design by nature eliminates confounding?

2) How about this real example?
To investigate the association between estrogen and cancer, Yale investigators considered the possibility of ascertainment bias, where estrogens lead to vaginal bleeding that accelerates the diagnosis of existing cancer (more likely to detect it if bleeding occurs). Therefore, they stated that we can look at only vaginal bleeding cases whether they are taking estrogen or not; if these patients all have bled, they must have the same likelihood of being diagnosed with cancer. If estrogen still leads to cancer among these patients, they stated that we can say it's causal. What was the serious flaw in this methodology? (Why do we find an association between estrogen and cancer even among women who bleed, if there is no real association?) How about in Belizan et al 1997? (Hint: similar concept as the prior)
Belizán JM, Villar J, Bergel E, et al. Long-term effect of calcium supplementation during pregnancy on the blood pressure of offspring: follow up of a randomised controlled trial. BMJ. 1997;315(7103):281-285.

3) What assumptions are made with Cox regression? What if proportionality assumptions are not met (very common); how should this be interpreted? What should be used as the time scale for your Cox regression, and how does this affect the analysis? (age, follow-up time, etc)

I can provide countless other examples...even in this thread, the p-value and 95% CI are not synonymous and corresponding concepts as the above poster alluded. These are not obscure topics that will never pop up in real life. They are the bare essentials to any clinical researcher that are guaranteed to come up (but many physicians ignore them unknowingly) and anyone who has taken real graduate level introductory epidemiology classes would know the answers. This lack of fundamental understanding is one of the main reasons why so many observational studies fail to stand up to the rigor of clinical trials. With proper design, cohort studies can closely mirror the results of RCTs, and case control studies should yield effect estimates as valid as cohort studies. However, that is rarely the case in reality.

Can you tell us the answers?
 
From Medscape

JAMA Viewpoint: The Proposal to Lower P Value Thresholds to .005

Some takeaways: [...]

What are your thoughts?

Couldn’t one just use effect size?
I understand the reduction in p value if someone is running a MANOVA, but I am pretty sure there are post hoc corrections that escalate p-value scrutiny the more comparisons you are testing (Bonferroni?).
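For what it's worth, a minimal sketch of that kind of correction (the p-values are made up; multipletests comes from statsmodels): the more hypotheses you test, the stricter each per-test threshold has to be to keep the overall type I error at .05.

```python
# Bonferroni / Holm adjustments on a set of made-up p-values.
from statsmodels.stats.multitest import multipletests

pvals = [0.004, 0.012, 0.03, 0.21, 0.44]

for method in ("bonferroni", "holm"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, [round(p, 3) for p in p_adj], list(reject))
# With 5 tests, Bonferroni effectively demands p < 0.01 per test, similar in
# spirit to a blanket .005 threshold but scaled to how much you actually tested.
```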

At the end of the day, I think an increase in statistical scrutiny is certainly warranted in medical research, and I believe a certain degree of clinical judgement when examining results is completely absent in researchers and clinicians.

However, I don’t think a change in P value is the best way to do it. P value is a meh analysis made by a guy who wanted to sell beer. It certainly serves a purpose, but it should not be the statistic we hang our hat on, and to try to revise the implementation of a meh test seems like potentially reducing the problem, but not addressing the issue.

I am almost certain as well that there is already an answer to this statistical quandary. There are so many statistical analyses out there; it's like that's all that these nerds do. So offering a change in degree of the same test, instead of offering different, potentially better, tests, is a bit puzzling to me.

All that being said, I haven’t actually read the article, just the excerpts from above, so forgive me if I sound like a tool.
 
I think all this is going to do is propagate more fraudulent data being reported. When you are essentially forcing medical students into a ridiculous publish-or-perish model just to be competitive for some specialties, it's only going to reward bad actors. And honestly, I couldn't blame the kid who's wanted to be a neurosurgeon his entire life, who is told he needs honors, leadership roles in clubs, multiple publications, AOA, and a LOR from someone prominent in the field, for completing an experiment and then shifting the books around if the numbers don't quite fit, in order to pad that CV.
I wish a priority would be placed on experimental design instead of outcome.
I just remember talking to an older physician on a rotation who told me I shouldn’t do research unless I wanted to, and that it’s not necessary to be competitive. Oh to be a baby boomer.
 
You obviously have no idea what you're talking about if you think one basic stats course at the med student level covers what you need to understand clinical research. Some examples of essential basic epidemiological concepts you may not know the answer to:

1) When is it appropriate to adjust for covariates? If a covariate is related to the outcome but not the variable of interest, should we put it in the model? What exactly qualifies as a confounder? If something is a confounder, should we stratify by it and make a subgroup or control for it, and what's the difference? Do we include confounders in RCTs if the design by nature eliminates confounding?

2) How about this real example?
To investigate the association between estrogen and cancer, Yale investigators considered the possibility of ascertainment bias, where estrogens lead to vaginal bleeding that accelerates the diagnosis of existing cancer (more likely to detect it if bleeding occurs). Therefore, they stated that we can look at only vaginal bleeding cases whether they are taking estrogen or not; if these patients all have bled, they must have the same likelihood of being diagnosed with cancer. If estrogen still leads to cancer among these patients, they stated that we can say it's causal. What was the serious flaw in this methodology? (Why do we find an association between estrogen and cancer even among women who bleed, if there is no real association?) How about in Belizan et al 1997? (Hint: similar concept as the prior)
Belizán JM, Villar J, Bergel E, et al. Long-term effect of calcium supplementation during pregnancy on the blood pressure of offspring: follow up of a randomised controlled trial. BMJ. 1997;315(7103):281-285.

3) What assumptions are made with Cox regression? What if proportionality assumptions are not met (very common); how should this be interpreted? What should be used as the time scale for your Cox regression, and how does this affect the analysis? (age, follow-up time, etc)

I can provide countless other examples...even in this thread, the p-value and 95% CI are not synonymous and corresponding concepts as the above poster alluded. These are not obscure topics that will never pop up in real life. They are the bare essentials to any clinical researcher that are guaranteed to come up (but many physicians ignore them unknowingly) and anyone who has taken real graduate level introductory epidemiology classes would know the answers. This lack of fundamental understanding is one of the main reasons why so many observational studies fail to stand up to the rigor of clinical trials. With proper design, cohort studies can closely mirror the results of RCTs, and case control studies should yield effect estimates as valid as cohort studies. However, that is rarely the case in reality.

Holy **** dude, calm down. Yes you need more than one stats course, but you do not need a PhD either. You can become an EM doc in 7 yrs; a sufficient understanding of statistics does not need the same time investment. To run an academic lab, yeah, probably. But to know enough as a clinician to interpret which studies are bull****? Certainly not. Even to do your own research, just have a good base of knowledge and consult a statistician for the nitty-gritty. We do consults all the time in medicine. Take about 5 chill pills, and coming from me, that is saying a lot lol
 
I'm not LoGo, and these are not concepts to become a full blown statistician, they are basics one needs to learn before knowing how to interpret or design studies. And no, a statistician doesn't solve the problem, I work on teams with great statisticians. They often do not have the clinical expertise relevant to your study, and if you don't know the right questions to ask, they cannot help. They can't do magic. If they could, we would not have so many flawed studies out there by physicians. MDs can do research easily, I published many papers before I knew what I was doing. Doing good research, however, is a different story.
Most doctors don't consult statisticians, and again, I don't think the reason for all the ****ty research is pure ignorance. It's because you're having kids throw together as much research as they can during medical school. Could I have designed a higher quality RCT? Yes. Would I be able to get that published in time? Probably not - so time to mine an old data set...
 
Holy **** dude, calm down. Yes you need more than one stats course, but you do not need a PhD either. You can become an EM doc in 7 yrs; a sufficient understanding of statistics does not need the same time investment. To run an academic lab, yeah, probably. But to know enough as a clinician to interpret which studies are bull****? Certainly not. Even to do your own research, just have a good base of knowledge and consult a statistician for the nitty-gritty. We do consults all the time in medicine. Take about 5 chill pills, and coming from me, that is saying a lot lol
I never said you needed a PhD, I said you needed more than 1 simple stats course to understand good vs bad lit. If you don't have a PhD, then you need to read. If you can tell what studies are flawed, then what's the major bias in the studies I stated? (1 sentence is all it takes to explain it to me). Even if you know the answer, many evidently do not. Like I said, the examples I gave are not crazy sophisticated statistical theories for advanced mathematicians. They will be covered thoroughly in intro epi courses, and will come up in your clinical research. They don't even involve any math, which is what the PhDs are learning (they go very advanced in the derivation of statistical models, which physicians don't necessarily need to know).
Regarding your second comment, I'm not even talking about the stuff that kids publish, I'm referring to papers by adults, as cited above.
 
I don't think we should get rid of the p value, but I think there needs to be better education on what it actually means and its clinical relevance.
This is the only solution. Education and utilizing statisticians is the way forward, not a crappy bandaid that allows physicians to grossly misuse and misunderstand the statistical methods and ideas they utilize. People who argue for removal of the p value either don't understand it or are strictly Bayesian and don't realize the limitations of Bayesian methodologies.

I have a PhD. I’ve said it before and I’ll say it again: doctors aren’t researchers.

Hazard ratios or odds ratios help in certain situations, but aren't applicable to some study designs. Confidence intervals are fine, but you still have to understand the test that derived those confidence intervals and the study that produced those results. To the standard physician, these details either don't matter or are not considered. Also, I'm not sure how a proposal to make p = 0.005 helps - it just means the type I error is low without doing anything to protect the quality or applicability of the data being produced.

I really hate how medical schools try to teach evidence-based medicine to the masses, and then students engage in a crappy chart review to stack the CV for the match, and suddenly people think they know research. If you want to understand research, do a PhD - as the author of the viewpoint has done. Obviously that would be a stupid idea for the majority of physicians, so I think that in addition to the standard scientific abstract, papers should have a clinical abstract that briefly describes what the study was (without statistical or basic/applied-science jargon), what was found (without all the jargon of p-values, sensitivity analyses, etc.), and what the strengths and limitations are (like how strong the data are, what some of the clinical unknowns are) - this too would be subject to peer review. Basically like the summaries you see in clinical meta-analyses or guideline articles, but for an individual study.
I agree with the gist that doctors don't typically understand what they're doing with the statistical aspects of research. In fact, most think the stats section is a quick calculation that you just "redo" if you don't like the results (or that you get more data to show what you want). They miss the point that their conclusions hinge on the appropriateness of the methodology employed and the correct interpretation of results. That being said, I don't think a PhD is the answer. Frankly, as far as a PhD would go to solve this issue:
biostats/statistics (not something else with a "concentration") >>> applied mathematics (assuming the application is statistics) / epidemiology with courses from a TOP biostats department / a Masters in Applied Statistics > epidemiology with lots of applied stats >> public health. Quantitative slant is directly related to actually understanding the methodologies and creating new methodologies, as well as knowing when violated assumptions or theoretical caveats aren't as important. This is on average, of course.

Changing the p value threshold isn’t the answer. Just a tradeoff between type I and type II error.

The bigger issues are that 1) most academic doctors cannot give an accurate definition of what a p-value means, and 2) people overlook the importance of effect size and clinical significance at an alarming rate.
Agree. The former is a direct result of poor education and standards in terms of statistical knowledge for physicians.

Of course, I agree there's a difference between getting the basics and knowing the details. But that's exactly the problem. You yourself even state that while you can read and generally understand what your wife does, you would not be able to provide a meaningful contribution or suggestion to what she does without significant extra training.

Your example is basic science, but I think the more relevant thing for clinicians is stuff like RCTs and epidemiology. So using that example, I'm sure an MD/PhD in epidemiology would rip many of the papers that you base your daily practice on a new one. I think a lot of people automatically assume basic science MD/PhDs when they hear clinician-scientist - but I wouldn't want the molecular geneticist running my RCT, and I wouldn't want the epidemiologist running my PCR. So doing 4 years of PCR wasn't specifically what I was referring to.

What I am trying to say is that physicians are probably sufficiently trained to sit down, read a CLINICAL paper in their field and get the basics of it. I also fully expect that about half of the methods section in any given RCT or population cohort study will be complete gibberish to most physicians, even more so in the basic and applied sciences. The topical study assessments they teach in medical school are exactly that - topical. The intent is that a busy community physician should be able to pick up NEJM, read a study (or more likely just the abstract) and get a quick idea of whether it's something they should modify their practice on, which it sounds like is exactly what you do. This is no different from the Cochrane criteria for evaluating studies - it's a checklist that anyone can follow. But getting back to the point of this thread, even if all the Cochrane boxes are checked as low risk of bias on an RCT, and the p-value is significant, that does not automatically mean the study is good or useful, and knowing that often requires consideration of a lot more than "is this population relevant to me" types of questions. Here's an example:

I did my MD/PhD in engineering. Artificial intelligence is a hot topic now, especially as it's applied to medicine. Doctors see seemingly well-designed studies with significant results and get super jazzed, but haven't the faintest clue what machine learning actually is or does, how it works, or what its limitations are. Likewise, if I were to read a large epidemiology study, there are aspects in there that I would have no idea about, even if the content was in my specialty.

My point is that medicine teaches us to understand the basics, but in no way does it train us to actually conduct those studies or properly interpret and critically appraise every aspect of them. The details are far more complicated than most people think, and that's where scientific expertise comes in. To this end, it's not like a PhD grants you infinite wisdom in all areas of science, but it gives you (a) specific content expertise in a certain area, and (b) the tools, resources and experience to know how to go about finding answers in something the world knows nothing about.

So again getting back to the point of the thread, expanding to consider other fancy statistics goes beyond the "basics" of what is taught in medical school. Obviously peer review needs to be held to a high standard so that doctors are getting high quality science and don't have to worry about the fatal flaws in research designs and analyses, but that's not always the case - especially in journals outside of NEJM, JAMA, Lancet, BMJ, etc. This is where having content expertise in some area of science is useful. Since that's not practical or necessary for physicians, there just needs to be a better way of communicating results to audiences without specific content expertise in epidemiology, public health, basic science etc.
A few points: to be fair, if I had access to a real statistician (PhD in (bio)stats), I'd choose the statistician over the epidemiologist to run the trial from the stats perspective. But sans someone with a true stats degree, epi would be the one.

I think you have some good points. Journals and schools need to recognize who the specialists are in this area: the statisticians. Hire them and use them to raise the bar. Train students, residents, and clinicians better with the help of statisticians.


After all, the significance threshold, alpha, is supposed to vary from study to study depending on the aims and the risks associated with committing a Type I or Type II error. The one-size-fits-all idea of a "standard for the field" is nonsense and promotes cookbook thinking. When was the last time you read ANY study and heard even the slightest discussion of why the alpha level was set to .1, .05, or .01, for example? I can't remember the last biomedical study where they used something other than an alpha of .05.

It all comes back to education and putting people who are qualified in the right places. Clinicians should review submitted literature within their respective expertise, and statistical methodology and theory are not typically that expertise. That's where biostatisticians are needed to review the submission with the clinical experts. Any contrary response of "that takes too long," "that's too expensive," or "that stats stuff is unimportant" demonstrates a primary misunderstanding of the issues at hand.
 
I never said you needed a PhD, I said you needed more than 1 simple stats course to understand good vs bad lit. If you don't have a PhD, then you need to read. If you can tell what studies are flawed, then what's the major bias in the studies I stated? (1 sentence is all it takes to explain it to me). Even if you know the answer, many evidently do not. Like I said, the examples I gave are not crazy sophisticated statistical theories for advanced mathematicians. They will be covered thoroughly in intro epi courses, and will come up in your clinical research. They don't even involve any math, which is what the PhDs are learning (they go very advanced in the derivation of statistical models, which physicians don't necessarily need to know).
Regarding your second comment, I'm not even talking about the stuff that kids publish, I'm referring to papers by adults, as cited above.
I agree with the overall thrust of what you're saying. It's hilarious when MD/PhDs suggest checking a model assumption is "unnecessary minutiae for field X." I think a PhD in stats is the best solution, but an adequate one is probably a masters in applied statistics or a PhD in epidemiology with tons of statistics courses (taught mostly by PhD statisticians). Only reason for that last caveat is I've found a few well known schools where the epi people are getting basic stats stuff wrong (i.e. saying a 95% CI of 2 to 10 has a 95% chance of capturing the true parameter).
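For anyone following along, a quick toy simulation of why that reading is the classic mistake: the 95% describes the long-run behavior of the interval-building procedure over repeated samples, not the probability that one realized interval like 2-to-10 contains the truth.

```python
# Coverage simulation for a textbook t-interval: about 95% of intervals built
# this way cover the true mean, but any single realized interval either does
# or does not. Toy normal data with an arbitrary true mean and SD.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mu, sigma, n, reps = 5.0, 3.0, 30, 20_000
t_crit = stats.t.ppf(0.975, df=n - 1)

covered = 0
for _ in range(reps):
    sample = rng.normal(loc=true_mu, scale=sigma, size=n)
    half_width = t_crit * sample.std(ddof=1) / np.sqrt(n)
    covered += (sample.mean() - half_width) <= true_mu <= (sample.mean() + half_width)

print(covered / reps)   # ~0.95
```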
 
I agree with the overall thrust of what you're saying. It's hilarious when MD/PhDs suggest checking a model assumption is "unnecessary minutiae for field X." I think a PhD in stats is the best solution, but an adequate one is probably a masters in applied statistics or a PhD in epidemiology with tons of statistics courses (taught mostly by PhD statisticians). Only reason for that last caveat is I've found a few well known schools where the epi people are getting basic stats stuff wrong (i.e. saying a 95% CI of 2 to 10 has a 95% chance of capturing the true parameter).

Yes, I myself am only a med student who did an MPH, which is why I say these are not PhD level concepts. Like some above, I also thought I was great at research during UG when I got an A+ in my intro stats class and designed multiple first author studies. But only after seeking further education did I realize how much I had wrong. I definitely couldn't answer the questions above before my MPH, yet they were crucial to every epidemiological study I've published. Even now, I still need to consult experts for important questions I'd never have known to ask before. (Note: I'm not saying degrees are necessary, merely that understanding good vs bad research is a lot more complex than many med students think. Any information from a degree can be acquired online by anyone dedicated enough.)
 
  • Like
Reactions: 1 users
Yes, I myself am only a med student who did an MPH, which is why I say these are not PhD level concepts. Like some above, I also thought I was great at research during UG when I got an A+ in my intro stats class and designed multiple first author studies. But only after seeking further education did I realize how much I had wrong. I definitely couldn't answer the questions above before my MPH, yet they were crucial to every epidemiological study I've published. Even now, I still need to consult experts for important questions I'd never have known to ask before. (Note: I'm not saying degrees are necessary, merely that understanding good vs bad research is a lot more complex than many med students think. Any information from a degree can be acquired online by anyone dedicated enough.)
I agree that you don't need the degree, but on average, it ain't happening. I think you're miles ahead of most medical people in the sense that you have some background and know when you're out of your element and need a real statistician. Most of the researchers I know, MD or MD/PhD, don't do that for some reason.

As with many subjects or fields, you don't know what you don't know. If you get good education from a non-stats program, you can learn a lot. Part of the issue is "biostats for medical students" type books are written by people who don't know what they're doing when it comes to stats, but the people evaluating them don't know any better. :(
 
  • Like
Reactions: 2 users
Yes, I myself am only a med student who did an MPH, which is why I say these are not PhD level concepts. Like some above, I also thought I was great at research during UG when I got an A+ in my intro stats class and designed multiple first author studies. But only after seeking further education did I realize how much I had wrong. I definitely couldn't answer the questions above before my MPH, yet they were crucial to every epidemiological study I've published. Even now, I still need to consult experts for important questions I'd never have known to ask before. (Note: I'm not saying degrees are necessary, merely that understanding good vs bad research is a lot more complex than many med students think. Any information from a degree can be acquired online by anyone dedicated enough.)

Important point to emphasize. People claiming that an MD is all you need to perform and fully interpret good research don't know what they don't know.
 
I never said you needed a PhD, I said you needed more than 1 simple stats course to understand good vs bad lit. If you don't have a PhD, then you need to read. If you can tell what studies are flawed, then what's the major bias in the studies I stated? (1 sentence is all it takes to explain it to me). Even if you know the answer, many evidently do not. Like I said, the examples I gave are not crazy sophisticated statistical theories for advanced mathematicians. They will be covered thoroughly in intro epi courses, and will come up in your clinical research. They don't even involve any math, which is what the PhDs are learning (they go very advanced in the derivation of statistical models, which physicians don't necessarily need to know).
Regarding your second comment, I'm not even talking about the stuff that kids publish, I'm referring to papers by adults, as cited above.

Ok, you can't even answer number 1 with one sentence (count the number of question marks). Second of all, I know the answers to the majority of your ******* questions in number 1; I didn't look at 2 or 3 bc I'm not taking a ****ing exam here. All your questions tell me is that you don't know anything about test writing, standardization, or psychometrics. Yes, you can ask people random questions if you like, but that does not tell you jack **** unless you have enough items to make the scales discernible, you aren't tapping into multiple factors, and your questions are not written by an angry ferret on adderall with a chip on his shoulder. Yes, I at least know the answers to the majority of the questions in question 1. Does that give you any useful information? Again, chill pills.
 
Last edited:
This is the only solution. Education and utilizing statisticians is the way forward, not a crappy bandaid that allows physicians to grossly misuse and misunderstand the statistical methods and ideas they utilize. People who argue for removal of the p value either don't understand it or are strictly Bayesian and don't realize the limitations of Bayesian methodologies.

I agree with the gist that doctors don't typically understand what they're doing with the statistical aspects of research. In fact, most think the stats section is a quick calculation that you just "redo" if you don't like the results (or that you get more data to show what you want). They miss the point that their conclusions hinge on the appropriateness of the methodology employed and the correct interpretation of results. That being said, I don't think a PhD is the answer. Frankly, as far as a PhD would go to solve this issue:
biostats/statistics (not something else with a "concentration") >>> applied mathematics (assuming the application is statistics) / epidemiology with courses from a top biostats department / Masters in Applied Statistics > epidemiology with lots of applied stats >> public health. Quantitative slant is directly related to actually understanding the methodologies and creating new methodologies, as well as knowing when violated assumptions or theoretical caveats aren't as important. This is on average, of course.

Agree. The former is a direct result of poor education and standards in terms of statistical knowledge for physicians.

A few points: to be fair, if I had access to a real statistician (PhD (bio)Stats), I'd choose the statistician over the epidemiologist to run the trial from the stats perspective. But sans someone with a true stats degree, epi would be the one.

I think you have some good points. Journals and schools need to recognize who the specialists are in this area: the statisticians. Hire them and use them to raise the bar. Train students, residents, and clinicians better with the help of statisticians.


After all, picking a significance threshold, alpha, is supposed to vary from study to study depending on the aims and the risks associated with committing a Type I or Type II error. The one-size-fits-all idea of a "standard for the field" is nonsense and promotes cookbook thinking. When was the last time you read ANY study and heard even the slightest discussion of why the alpha level was set to .1, .05, or .01, for example? I can't remember the last biomedical study that used something other than an alpha of .05.

It all comes back to education and putting people who are qualified in the right places. Clinicians should review submitted literature within their respective expertise, and statistical methodology and theory is not typically that expertise. That's where biostatisticians are needed to review the submission with the clinical experts. Any contrary response of "that takes too long," "that's too expensive," or "that stats stuff is unimportant" demonstrates a fundamental misunderstanding of the issues at hand.

Whenever I think about a statistician, I get an image in my head of someone screaming into the wind.
My subconscious mind is right about some things
 
Ok, you can't even answer number 1 with one sentence (count the number of question marks). Second of all, I know the answers to the majority of your ******* questions in number 1; I didn't look at 2 or 3 bc I'm not taking a ****ing exam here. All your questions tell me is that you don't know anything about test writing, standardization, or psychometrics. Yes, you can ask people random questions if you like, but that does not tell you jack **** unless you have enough items to make the scales discernible, you aren't tapping into multiple factors, and your questions are not written by an angry ferret on adderall with a chip on his shoulder. Yes, I at least know the answers to the majority of the questions in question 1. Does that give you any useful information? Again, chill pills.
If you read what I wrote, I said the major issue with the studies I raised, which would be #2, and can be stated with a single sentence (or even fragment). Even if you do know the answer, most do not.
 
Whenever I think about a statistician, I get an image in my head of someone screaming into the wind.
My subconscious mind is right about some things
So, are you saying that they're shouting into the wind because what they're saying isn't valuable, or that people aren't listening when they should be listening?

And to be fair, you mentioned earlier that people should "consult the statistician" for some quick details. Part of the issue in medicine is researchers call the statistician when there's an issue or after data have been collected because they don't get that statisticians should be involved from the very beginning of formulating research questions and deciding how the needed variables can and should be measured and then analyzed. There's a great quote from R.A. Fisher, "To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of." This is all too common.
 
  • Like
Reactions: 1 user
Most doctors don't consult statisticians, and again, I don't think the reason for all the ****ty research is pure ignorance. It's because you're having kids throw together as much research as they can during medical school. Could I have designed a higher quality RCT? Yes. Would I be able to get that published in time? Probably not - so, time to mine an old data set...
Nobody is saying you have to design a perfect RCT - that's the whole point of having proper knowledge of study design and analysis. There are many situations where RCTs are not feasible due to costs, ethics, etc. Understanding the basis of your research methods and how to make the best of them can go a very long way toward quality clinical studies, even if they are not RCTs. There are a lot of excellent studies out there using retrospective data (yielding effect estimates comparable to RCTs), but unfortunately even more terrible ones. Learning how to do a good study will not only let you actually advance science, but will also greatly benefit your CV as a med student (a first-author paper in a top journal is guaranteed to impress more than a random case report). I do acknowledge, however, that 10 crappy papers still help for residency applications, and that is part of the problem.
 
I think the p value problem can be fixed by 1) making authors report the exact value instead of p < 0.05 or p > 0.05 and 2) teaching people to think about p values more. Changing the threshold itself doesn't help if you still can't think about a p value. You can artificially drive the p value down a lot, and just because it's low doesn't mean that it's a clinically significant result. Conversely, just because it's higher, say 0.10, doesn't mean that there's no clinically significant result there either. It's much higher yield to teach the next generation of scientists and clinicians to evaluate p values rather than just lowering to another arbitrary threshold.
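
To put a number on the statistical-vs-clinical distinction, here is a minimal simulated sketch (invented numbers, not from any study in this thread): with a huge sample, a ~0.3 mmHg "benefit" produces a vanishingly small p-value while the interval shows the effect is clinically meaningless.

```python
# Minimal sketch: a clinically trivial effect can still yield a tiny p-value
# once the sample is large enough, which is why the exact p-value and the
# effect size (with a confidence interval) both need to be reported.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200_000                                            # very large trial arms
control = rng.normal(loc=120.0, scale=15.0, size=n)    # e.g., systolic BP
treated = rng.normal(loc=119.7, scale=15.0, size=n)    # a 0.3 mmHg "benefit"

res = stats.ttest_ind(treated, control)
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1)/n + control.var(ddof=1)/n)
lo, hi = diff - 1.96*se, diff + 1.96*se

print(f"p = {res.pvalue:.2e}")          # typically far below .005, let alone .05
print(f"difference = {diff:.2f} mmHg, 95% CI {lo:.2f} to {hi:.2f}")
# The p-value screams "significant"; the interval shows an effect of ~0.3 mmHg,
# which no clinician would consider meaningful.
```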
 
  • Like
Reactions: 2 users
I think the p value problem can be fixed by 1) making authors report the exact value instead of p < 0.05 or p > 0.05 and 2) teaching people to think about p values more. Changing the threshold itself doesn't help if you still can't think about a p value. You can artificially drive the p value down a lot, and just because it's low doesn't mean that it's a clinically significant result. Conversely, just because it's higher, say 0.10, doesn't mean that there's no clinically significant result there either. It's much higher yield to teach the next generation of scientists and clinicians to evaluate p values rather than just lowering to another arbitrary threshold.
Right, this gets at my point about justifying your alpha threshold: if failing to give a drug because the calculated p-value missed significance carries far more risk than a false positive does, you might set .1 as your significance level rather than .01.
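
A minimal simulation of that tradeoff (hypothetical trial numbers, nothing from the article): with the sample size fixed, tightening alpha from .05 to .005 lowers the false-positive rate but sharply raises the chance of missing a real, modest effect.

```python
# Minimal sketch (hypothetical numbers): with n fixed, tightening alpha trades
# Type I error for Type II error, which is why the threshold should depend on
# the relative costs of each mistake rather than on habit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, effect, sd, sims = 60, 0.5, 1.0, 5000    # modest effect, modest trial size

pvals = np.empty(sims)
for i in range(sims):
    a = rng.normal(0.0, sd, n)
    b = rng.normal(effect, sd, n)
    pvals[i] = stats.ttest_ind(a, b).pvalue

for alpha in (0.10, 0.05, 0.005):
    power = np.mean(pvals < alpha)
    print(f"alpha = {alpha:<5}  power = {power:.2f}  Type II error = {1 - power:.2f}")
# If missing a real treatment effect is the costlier error, the looser threshold
# can be the more defensible choice -- but that argument has to be made explicitly.
```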

Anyone with a basic understanding of p-values and the theory behind significance testing understands that one-size-fits-all thresholds don't make any sense. Alpha and p-values weren't meant to be used as they are today, and most people in medicine don't know that, which is part of why they incorrectly think p-values are post-data Type I error rates (or any error rate) or that significance is black and white.

The ORIGINAL intention behind calculating a p-value was to cause the researcher to flag a data set as "unusual", at which point, the researcher should question which of the assumptions might not be valid and how to redesign the experiment to remove potential bias. If bias was accounted for and assumptions reasonably valid, only upon repeating the experiment many times and seeing a "low" p-value would the researcher begin to question the assumption regarding the null hypothesis. Basically, the LAST element to be questioned after repetition would be the assumption of "no effect", for example.

In the current application, nearly every researcher is first concluding that the null hypothesis is false and should be rejected-- even in observational studies where p-values are far less meaningful relative to a well designed RCT. You should be able to see why you frequently hear that "new research contradicts the old research on X." (Also a laughable offense when a researcher claims two studies are in disagreement because the endpoint in one reached significance while the other did not-- yet the effect estimates are reasonably similar in magnitude and direction.)
 
  • Like
Reactions: 1 users
It seems that almost every article today runs multiple statistical tests on the data present. I personally think it's equally important to understand the how/why/when/what of adjusting for multiple comparisons in the particular field as it is to understand what a p-value is. A psychiatry researcher may be working on (or reading) a GWAS study or an fMRI study - both study types make many thousands (if not millions) of individual comparisons, and yet they correct for multiple comparisons in very different ways. In both study types, understanding how the authors corrected for multiple comparisons and why they decided on the method they did for this correction is key to understanding how valid or invalid the authors' inferences (or better yet, your own inferences) are.

Oh ya- forgot to mention that different methods for correcting for multiple comparisons actually lead to different inferences you can make based on the analysis.
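
As a toy illustration of that last point (simulated p-values, not GWAS or fMRI data): running the same set of tests through a family-wise correction and an FDR correction yields different numbers of "discoveries", and a different guarantee attached to each list.

```python
# Minimal sketch: the same p-values, two standard corrections, different calls.
# Bonferroni controls the family-wise error rate; Benjamini-Hochberg controls
# the false discovery rate, so it is typically far less conservative.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
m_null, m_real = 950, 50                     # mostly null tests, a few real effects
null_p = rng.uniform(size=m_null)            # p-values under the null are uniform
real_z = rng.normal(3.0, 1.0, size=m_real)   # real effects give larger test statistics
real_p = 2 * stats.norm.sf(np.abs(real_z))
pvals = np.concatenate([null_p, real_p])

bonf = multipletests(pvals, alpha=0.05, method="bonferroni")[0]
bh   = multipletests(pvals, alpha=0.05, method="fdr_bh")[0]

print(f"uncorrected < .05 : {np.sum(pvals < 0.05)}")
print(f"Bonferroni        : {bonf.sum()}")
print(f"Benjamini-Hochberg: {bh.sum()}")
# The counts differ substantially, and so does the guarantee behind each list,
# which changes what you can legitimately infer from the "hits".
```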
 
It seems that almost every article today runs multiple statistical tests on the data present. I personally think it's equally important to understand the how/why/when/what of adjusting for multiple comparisons in the particular field as it is to understand what a p-value is. A psychiatry researcher may be working on (or reading) a GWAS study or an fMRI study- both of these two study types are making many thousands (if not millions) of individual comparisons, and yet they correct for multiple comparisons in very different ways. In both study types, understanding how the authors corrected for multiple comparisons and why they decided on the method they did for this correction is key to understanding how valid or invalid the authors' inferences (or better yet, your own inferences) are.

Oh ya- forgot to mention that different methods for correcting for multiple comparisons actually lead to different inferences you can make based on the analysis.
This is another good point. Your last point is also not necessarily true (not guaranteed), as far as I know, but isn't uncommon given that the adjustment to the p-values or the alpha/confidence level will usually be different with different methods. It's also important to justify when adjustments for multiple comparisons aren't necessary such as in hypothesis generating investigations that will be used to plan a proper study to evaluate that potential item of interest.

This also reminds me of yet another issue with medical research, particularly the chart reviews. I've seen numerous occasions of the PIs or other researchers on a paper not reporting the tons of analyses they do before finding "something interesting to present" or just something they think will be published because it's significant (or switching out the ones they report, or adding data until something is significant). It really illustrates a lack of understanding, and if understanding is present, it's deceitful to present it as the prespecified analyses.
 
This is another good point. Your last point is also not necessarily true (not guaranteed), as far as I know, but isn't uncommon given that the adjustment to the p-values or the alpha/confidence level will usually be different with different methods. It's also important to justify when adjustments for multiple comparisons aren't necessary such as in hypothesis generating investigations that will be used to plan a proper study to evaluate that potential item of interest.

This also reminds me of yet another issue with medical research, particularly the chart reviews. I've seen numerous occasions of the PIs or other researchers on a paper not reporting the tons of analyses they do before finding "something interesting to present" or just something they think will be published because it's significant (or switching out the ones they report, or adding data until something is significant). It really illustrates a lack of understanding, and if understanding is present, it's deceitful to present it as the prespecified analyses.

good ol' p-hacking!

Another reason the changes that the author proposes in JAMA probably would be less effective than people might think....
 
Last edited:
So, are you saying that they're shouting into the wind because what they're saying isn't valuable, or that people aren't listening when they should be listening?

And to be fair, you mentioned earlier that people should "consult the statistician" for some quick details. Part of the issue in medicine is researchers call the statistician when there's an issue or after data have been collected because they don't get that statisticians should be involved from the very beginning of formulating research questions and deciding how the needed variables can and should be measured and then analyzed. There's a great quote from R.A. Fisher, "To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of." This is all too common.

No, I mean statisticians often have the right answers and no one listens lol

And of course I mean get the nerds in on the ground floor. I am well aware statistics are not a magic wand to wave over ****ty data. Please don’t assume I am offering the weakest form of an argument
 
Can you tell us the answers?

For those interested, I'll explain some of these from my post, #16:
For 1) and 2), the concepts here deal with causal inference, which can be explained by diagrams called directed acyclic graphs (an excellent book, available free, explains these concepts more thoroughly: google Miguel Hernan, causal inference book - can't post links yet). In short, the problem with the Yale study in point #2 is that they conditioned on a collider, and for the BMJ study, a mediator.

A collider is basically a common effect of two variables, and it normally blocks the flow of association between its causes. For example, genetics and environmental factors both cause cancer. Cancer is a collider because genetics are related to cancer and environmental factors are related to cancer, yet there is no flow of association between genetic and environmental factors in the normal population. However, if we condition on cancer, we open an association between environment and genetics (this is selection bias). If we know someone has cancer, and it is not due to the environment, then we now know that it is more likely due to genetics. So if we are looking only at cancer patients, there is now a relationship between environment and genetics. For the Yale study, uterine bleeding is a collider (common effect) of estrogens and cancer. So even if there was no prior association of estrogen and cancer, conditioning on uterine bleeding will open up an association. To resolve this issue, you would instead design a study where the association between uterine bleeding and cancer dx doesn't exist (screen all women in the study for cancer whether they bleed or not).
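
A minimal simulation of that mechanism (made-up variables, not the Yale data): two causes generated independently become correlated once you restrict to those with the common effect.

```python
# Minimal sketch of collider (selection) bias: genetics and environment are
# generated independently, cancer depends on both, and conditioning on cancer
# induces a spurious association between the two causes.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
genetics = rng.normal(size=n)               # independent causes
environment = rng.normal(size=n)
p_cancer = 1 / (1 + np.exp(-(-3 + 1.5*genetics + 1.5*environment)))
cancer = rng.uniform(size=n) < p_cancer     # common effect (the collider)

print(f"corr(genetics, environment), everyone   : {np.corrcoef(genetics, environment)[0, 1]: .3f}")
print(f"corr(genetics, environment), cases only : {np.corrcoef(genetics[cancer], environment[cancer])[0, 1]: .3f}")
# ~0 in the full population, clearly negative among cases: knowing a case had a
# strong environmental exposure makes a genetic explanation less likely, and
# vice versa -- an association appears with no causal link between the two.
```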

A mediator is a variable that lies in the middle of a causal path, and controlling for one of these does the opposite; it blocks the flow of association. Confusion between mediators and confounders is one of the most common errors in epidemiology. A simple example is smoking causing cell damage causing cancer. In this case, cell damage is the mediator. Normally, we know there is an association between smoking and cancer, mediated by cell damage. However, if we condition on those who just have cell damage, further learning they are a smoker adds no extra information if smoking causes cancer entirely through cell damage (in reality, this isn't entirely true, but you can see how it diminishes the association). Therefore, in RCTs, you want to make sure the variables you adjust for are pre-treatment variables, to avoid controlling for something on the pathway from exposure to outcome and blocking effects. The BMJ study I cited had controversy because they adjusted for the child's BMI, a post-randomization variable; while there can be reasons to want to do this, usually the negatives far outweigh them, as you end up introducing strong assumptions and invalidating much of the advantage of randomization. As for why we care to adjust for baseline covariates in RCTs even though confounding is not an issue with proper randomization - controlling for covariates strongly related to the outcome actually increases your precision significantly. In a linear model, this will decrease your standard error; in a logistic model, it could increase the standard error but also the point estimate, and the net result is still an increase in power.
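
And a companion sketch (simulated data, not the BMJ analysis): in a randomized setting where the treatment works through a mediator, adjusting for a baseline covariate tightens the estimate, while adjusting for the post-randomization mediator largely wipes out the effect you set out to measure.

```python
# Minimal sketch: a randomized treatment works entirely through a mediator.
# Adjusting for a baseline covariate sharpens the estimate; adjusting for the
# post-randomization mediator largely erases the effect we want to measure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5000
baseline = rng.normal(size=n)                        # pre-treatment covariate
treat = rng.integers(0, 2, size=n)                   # randomized 0/1
mediator = 1.0*treat + rng.normal(size=n)            # caused by treatment
outcome = 2.0*mediator + 1.5*baseline + rng.normal(size=n)   # true total effect of treat = 2.0

def fit(covariates):
    X = sm.add_constant(np.column_stack(covariates))
    res = sm.OLS(outcome, X).fit()
    return res.params[1], res.bse[1]                 # coefficient and SE for treat

for label, covs in [("unadjusted", [treat]),
                    ("+ baseline covariate", [treat, baseline]),
                    ("+ post-randomization mediator", [treat, mediator])]:
    b, se = fit(covs)
    print(f"{label:<32} treat effect = {b:5.2f}  (SE {se:.2f})")
# Expect ~2.0, ~2.0 with a smaller SE, and ~0 once the mediator is "controlled".
```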

I hope this was helpful to those who were curious.
 
  • Like
Reactions: 3 users
In the current application, nearly every researcher is first concluding that the null hypothesis is false and should be rejected-- even in observational studies where p-values are far less meaningful relative to a well designed RCT. You should be able to see why you frequently hear that "new research contradicts the old research on X." (Also a laughable offense when a researcher claims two studies are in disagreement because the endpoint in one reached significance while the other did not-- yet the effect estimates are reasonably similar in magnitude and direction.)

I think this is where meta-analyses come in and do a better job of reaching these goals that we want in scientific studies. While one study might not be able to recruit enough people to get adequate power, these meta-analyses can provide a better, adequately powered view of whether an effect really exists or not.
 
This also reminds me of yet another issue with medical research, particularly the chart reviews. I've seen numerous occasions of the PIs or other researchers on a paper not reporting the tons of analyses they do before finding "something interesting to present" or just something they think will be published because it's significant (or switching out the ones they report, or adding data until something is significant). It really illustrates a lack of understanding, and if understanding is present, it's deceitful to present it as the prespecified analyses.

I think it's important to note as well that whether this is acceptable or not depends on the purpose of the study. If the purpose of the study is to test whether X effect is associated with use of Y drug because you know that Y hits some receptor that is known to have an effect on X, then you need to make sure you're not doing multiple comparisons and that you're not cherry picking your data or changing your hypothesis. But if you're looking for any effects of Y drug on the human body because you're trying to understand its side effects, then you have to do multiple comparisons because there's no other way of doing it. You can do all sorts of corrections to try to eliminate the chance that you saw an effect just because of multiple comparisons but at the end of the day, you have to realize (and so do the reviewers) that you aren't setting out to prove that Y causes X. You set out to see what is associated with use of Y drug, accepting that the associations you find might be coincidental or confounded. The next study is what needs to set up a rigorous test of causality. That's how science should work.
 
  • Like
Reactions: 1 user
I think this is where meta-analyses come in and do a better job of reaching these goals that we want in scientific studies. While one study might not be able to recruit enough people to get adequate power, these meta-analyses can provide a better, adequately powered view of whether an effect really exists or not.
In an ideal world, meta-analyses would be a good solution, but as they are in practice, many are trash. At least from what I have seen, too many clinicians think these are the gold standard. Meta-analysis has many issues that people don't try to address, so like any other methodology, it's garbage-in-garbage-out as a process. Even with MA, physicians describe "conflicting results" purely by significant and nonsignificant. Looking at the magnitude and direction of estimated parameters is a much more valid method of assessing agreement and disagreement.
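
For what it's worth, a bare-bones random-effects pool (invented effect sizes, standard DerSimonian-Laird formulas) shows what "agreement by magnitude and direction" looks like in practice: the individual studies straddle .05, but the estimates are consistent and the pooled interval is informative.

```python
# Minimal sketch with invented numbers: three studies whose individual p-values
# straddle .05 but whose effect estimates agree in direction and magnitude.
# A DerSimonian-Laird random-effects pool summarizes them by size and precision
# instead of by "significant vs not".
import numpy as np

log_or = np.array([np.log(1.6), np.log(1.4), np.log(1.5)])   # study effect estimates
se     = np.array([0.18, 0.25, 0.31])                        # their standard errors

w_fixed = 1 / se**2
fixed_mean = np.sum(w_fixed * log_or) / np.sum(w_fixed)
q = np.sum(w_fixed * (log_or - fixed_mean)**2)               # heterogeneity statistic
df = len(log_or) - 1
c = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (q - df) / c)                                # between-study variance

w = 1 / (se**2 + tau2)
pooled = np.sum(w * log_or) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
lo, hi = pooled - 1.96*pooled_se, pooled + 1.96*pooled_se

print(f"pooled OR = {np.exp(pooled):.2f}  (95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")
# Whether any single study cleared an arbitrary threshold matters far less than
# whether the estimates are consistent and the pooled interval is informative --
# and none of this rescues a pool built from biased studies.
```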

I think it's important to note as well that whether this is acceptable or not depends on the purpose of the study. If the purpose of the study is to test whether X effect is associated with use of Y drug because you know that Y hits some receptor that is known to have an effect on X, then you need to make sure you're not doing multiple comparisons and that you're not cherry picking your data or changing your hypothesis. But if you're looking for any effects of Y drug on the human body because you're trying to understand its side effects, then you have to do multiple comparisons because there's no other way of doing it. You can do all sorts of corrections to try to eliminate the chance that you saw an effect just because of multiple comparisons but at the end of the day, you have to realize (and so do the reviewers) that you aren't setting out to prove that Y causes X. You set out to see what is associated with use of Y drug, accepting that the associations you find might be coincidental or confounded. The next study is what needs to set up a rigorous test of causality. That's how science should work.

I think it's completely unacceptable, in any circumstance, to run analyses and fail to report them (excluding accidentally run analyses). I personally know researchers who don't report the 10+ models they fit because they weren't as "interesting" (large p-values) as something else. These same people don't temper their conclusions-- I've seen several suggest changes in clinical practice. To make it worse, some of these have been published in highly respected journals for that field. I think this stems from a lack of knowledge. I can report that I used an adjustment for multiple comparisons, but that doesn't mean squat if you don't know that I did this for 4 different models and only reported one model.
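
A quick simulation of how badly that practice distorts the error rate (pure noise data, hypothetical setup): fit ten candidate predictors with no real signal, report only the smallest p-value, and the nominal 5% false-positive rate climbs to roughly 40%.

```python
# Minimal sketch: many looks at pure noise, report only the smallest p-value.
# The chance of at least one "significant" finding is far above the nominal 5%,
# which is why unreported analyses make reported p-values uninterpretable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, n_models, sims = 100, 10, 2000
false_positive = 0

for _ in range(sims):
    y = rng.normal(size=n)                  # outcome with no real signal
    best_p = min(
        stats.pearsonr(rng.normal(size=n), y).pvalue   # each "model" = a new candidate predictor
        for _ in range(n_models)
    )
    false_positive += best_p < 0.05

print(f"nominal alpha: 0.05   actual rate after cherry-picking: {false_positive/sims:.2f}")
# With 10 independent looks, roughly 1 - 0.95**10 ~ 0.40 of studies will have
# something "significant" to report even when nothing is going on.
```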

My point is that every single paper should be held accountable for detailing each and every analysis run, why it was run, and why the reported analyses were chosen (exploratory analysis is fine, and often beneficial for hypothesis generation, but then you may not lean on p-values and CIs as much, or you use more lenient significance cutoffs). Each paper should also justify each part of the statistical methods employed rather than treating it as an afterthought. When's the last time you saw someone use an alpha different from .05 or try to justify why alpha was set at .05? Few and far between, but that should be the standard.

Science should work by structured, high-quality inquiry that first asks what kinds of bias or error could be driving the generated data, rather than jumping to the conclusion that the hypothesis is disproved on the basis of a few studies. The "science" many practice today (many more than we want to admit) is far more haphazard and held to low standards. The appearance of rigor is created by incomplete reporting, either from lack of education about what's being done or from dishonesty (I think much is due to the former).
 
  • Love
Reactions: 1 user
I think it's completely unacceptable, in any circumstance, to run analyses and fail to report them (excluding accidentally run analyses). I personally know researchers who don't report the 10+ models they fit because they weren't as "interesting" (large p-values) as something else. These same people don't temper their conclusions-- I've seen several suggest changes in clinical practice. To make it worse, some of these have been published in highly respected journals for that field. I think this stems from a lack of knowledge. I can report that I used an adjustment for multiple comparisons, but that doesn't mean squat if you don't know that I did this for 4 different models and only reported one model.

My point is that every single paper should be held accountable for detailing each and every analysis run, why it was run, and why the reported analyses were chosen (exploratory analysis is fine, and often beneficial for hypothesis generation, but then you may not lean on p-values and CIs as much, or you use more lenient significance cutoffs). Each paper should also justify each part of the statistical methods employed rather than treating it as an afterthought. When's the last time you saw someone use an alpha different from .05 or try to justify why alpha was set at .05? Few and far between, but that should be the standard.

Have you worked with population-level data before? Reporting every single analysis performed is infeasible and distracting. If I have no idea what causes increased obesity rates, I'm going to have to go looking for variables. I don't have any preconceived notion of a variable in my head. I know that this population has increased obesity but I have no idea why so I'm going to go into this in an unbiased manner and see what variables might be associated with it. In some aspects, this is better than going into this with a specific hypothesis because then you run the risk of having coincidental or confounded findings. If I look at all the variables in a population that are measured, I would probably find a few variables associated with obesity, e.g. hypertension, soda company revenue, etc. Hell, I might even report a p value for these comparisons. The key difference is that I know what kind of study this is - to discover a model that can be used to understand obesity. Others who don't have my training probably wouldn't understand and ding me for running regression on so many variables.

I think you would be hard-pressed to discover a more effective way of answering such research questions. In your world, how would you answer this same question? Remember, you have no idea what causes obesity here. Or, if you want a better example, what causes SIDS. There's no model from which you can derive testable hypotheses. How would you proceed?
 
Have you worked with population-level data before? Reporting every single analysis performed is infeasible and distracting. If I have no idea what causes increased obesity rates, I'm going to have to go looking for variables. I don't have any preconceived notion of a variable in my head. I know that this population has increased obesity but I have no idea why so I'm going to go into this in an unbiased manner and see what variables might be associated with it.
Right, this is called exploratory data analysis, but it doesn't preclude you from stating what you did, especially when you're presenting p-values. Understanding what a p-value is and isn't, and the assumptions in calculating a p-value should make this apparent. It's a sorry excuse to say clear and complete reporting is "distracting"-- I've heard the same argument numerous times from respected researchers when they're presented with important aspects such as model assumption verification and missing data methodology.

In some aspects, this is better than going into this with a specific hypothesis because then you run the risk of having coincidental or confounded findings. If I look at all the variables in a population that are measured, I would probably find a few variables associated with obesity, e.g. hypertension, soda company revenue, etc. Hell, I might even report a p value for these comparisons.
So you're making the argument that blindly swinging in the dark until you hit something is less likely to uncover a coincidental finding than trying to use current knowledge to formulate a specific hypothesis or set of hypotheses? That's an odd argument to make. Again, I'm not totally against data mining or exploration, but I'm against people reporting what they deem relevant out of the pile of analyses they ran just to present "new association X" with a p-value. You don't need p-values (or CIs) to show association that might be worth further investigation, especially in a hypothesis generating or exploratory study. An understanding of many assumptions used in the methods makes this clear that observational p-values don't mean nearly as much as from randomized experiments, but you can still come up with new ideas without using inferential statistics such as p-values and CIs.

The key difference is that I know what kind of study this is - to discover a model that can be used to understand obesity. Others who don't have my training probably wouldn't understand and ding me for running regression on so many variables.
You may know for your paper, but others reading might not. Good science is clearly detailed so the reader knows what was done and how. Earlier you say you're just shaking the tree to generate ideas, now you're using that 1 out of 18 models to understand the disease.

Since you're bringing this to the table, what is your training? Genuinely curious, because I can think of quite a few people with anywhere from a bachelor's to a PhD in statistics who have dinged people for not reporting the random things they tried, finding something by mining, and then drawing far stronger conclusions than the study warrants. It's perfectly okay to data mine, but you have to report it as such, and you definitely have to limit your conclusions because of the nature of what was done.

I think you would be hard-pressed to discover a more effective way of answering such research questions. In your world, how would you answer this same question? Remember, you have no idea what causes obesity here. Or, if you want a better example, what causes SIDS.
I can think of several ways that are more effective and are more open about what's being done. First and foremost, visualizing data and looking at descriptive measures get you pretty far in understanding relationships in the data, and this doesn't open you up to risks of Type I or Type II errors. For formal tests, it's real easy to state you're using a lenient alpha such as .15 for variable screening (which is pretty reasonable since you're digging for something and won't be pushing strong conclusions, just generating a future study question) - problems arise when people reviewing for journals don't know stats and think every significance threshold needs to be .05, so you rarely see any real judgment exercised in selecting alpha. For your question, I would consider utilizing a LASSO regression, report what I did, and temper the conclusions I try to draw. At the very least, it's not hard to say "14 regression models were fit but only the two with significant omnibus tests are presented." Whatever I end up using, though, the important part is clear reporting and appropriate conclusions in the context of the analyses performed. It comes across as though you're arguing against the basic idea of clear and complete reporting.
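
For the LASSO idea, a minimal sketch (simulated data and illustrative variable counts, not any real registry): a cross-validated penalty zeroes out most noise predictors, and the handful that survive become hypotheses to report transparently, not conclusions.

```python
# Minimal sketch: LASSO with a cross-validated penalty as an openly reported
# screening step -- it shrinks most coefficients of noise predictors to exactly
# zero, and the surviving variables become hypotheses for a properly designed
# follow-up study.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n, p_noise = 500, 40
X_signal = rng.normal(size=(n, 3))                   # three variables that matter
X_noise = rng.normal(size=(n, p_noise))              # dozens that do not
X = np.hstack([X_signal, X_noise])
y = X_signal @ np.array([1.0, -0.8, 0.5]) + rng.normal(size=n)

X_std = StandardScaler().fit_transform(X)
model = LassoCV(cv=5, random_state=0).fit(X_std, y)

selected = np.flatnonzero(model.coef_ != 0)
print(f"variables kept: {selected}")                  # mostly the first three columns
print(f"non-zero coefficients: {np.round(model.coef_[selected], 2)}")
# Reporting "all 43 candidate variables entered a cross-validated LASSO; these
# survived" is a one-sentence disclosure that keeps the exploration honest.
```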

There's no model from which you can derive testable hypotheses. How would you proceed?
You don't need a model to come up with a hypothesis, you need subject matter expertise, and when you have none for a specific issue, you can bridge concepts that are better described-- this is scientific thinking. You don't need to think to have a computer iterate through tons of models and set aside results from a subset.

There's a reason this whole argument of lowering the significance threshold is even in play, and what you're describing is a large contributor to the underlying issue. It's not the clinical trials with preregistered study protocol and far better detailing of methods as much as it is everything else.
 
Last edited by a moderator: