Fallibility of statistics


NeuroTrope

Hey y'all,

I've been in talks with various statisticians. I'll keep this short so as not to bore people, but I had a basic question that had a somewhat distressing answer.

My question: when you include multiple predictors with shared variance in a regression model, what determines which predictors "win" (i.e. remain significant) in explaining the criterion?

The answer: The variable with the lowest standard error wins.

The problem: Behavioral scientists often have an almost dogmatic perspective on regression, such that the significant predictors are treated as the ones that matter. But in reality, the predictors with the best reliability win. Remember Stats 101, where one assumption of regression is that predictors are measured without error (i.e., perfect reliability)? Most of us acknowledge this is impossible in behavioral science, so we dismiss it. But reliability goes a long way toward determining which predictors stay significant in the model, because unreliable measurement inflates a predictor's standard error and attenuates its estimated effect. And reliability does not equal validity, as we all know...

Bottom line: If you are just trying to predict, it's not a huge issue, I suppose. But if you are trying to identify the constructs that genuinely matter in psychology, it is a huge issue. Our methods of collecting data are highly variable, and sometimes the most important variables happen to be the least reliably measured (e.g., self-report, fMRI), so: garbage in, garbage out.

BTW, I call this a fallibility of "statistics" and not just "regression" because the regression model underlies 90% of what we do anyway (ANOVA, chi-square, DFA, PCA) and generalizes all the way up to canonical correlation analysis.
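To make the horse race concrete, here's a minimal simulation sketch (Python with numpy and statsmodels; the variable names and noise levels are made up for illustration, not from any real dataset). Two predictors share a latent influence and have identical true effects on the outcome, but the one measured with less error ends up with the smaller standard error, and therefore tends to keep the significance:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)                        # shared influence -> correlated predictors
x1_true = latent + rng.normal(scale=0.5, size=n)
x2_true = latent + rng.normal(scale=0.5, size=n)
y = x1_true + x2_true + rng.normal(size=n)         # both predictors have the same true effect

# Observed versions: x1 measured cleanly, x2 with heavy measurement error
x1_obs = x1_true + rng.normal(scale=0.2, size=n)   # high reliability
x2_obs = x2_true + rng.normal(scale=1.5, size=n)   # low reliability

X = sm.add_constant(np.column_stack([x1_obs, x2_obs]))
fit = sm.OLS(y, X).fit()
print(fit.summary())   # x1_obs typically shows the smaller SE and p-value; x2_obs may "lose"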

Food for thought. Anyone want to weigh in?
 
Oh man, we could stretch a thread out by hundreds of posts about the misuse, misinterpretation, etc. of stats. I think this is a major issue, although I see over-reliance on p < .05 as a bigger issue in my day-to-day clinical life. Just another reason a solid statistics and research foundation is necessary for high-level clinical work.
 
This is why sound theory and research design are inseparable from good data analysis. The 'horse race' approach to multiple regression is abused frequently. IMO it is a technique best applied to (a) large data sets and/or (b) relatively constrained and tightly conceptualized models. Of course unstable parameter estimates are going to drop out of a multiple regression model, especially if you base your sample size on a mindless "10 observations per predictor" type guideline. Some of the strategies to manage severe multicollinearity are only tenable with larger data sets. However, I think sloppy hypotheses and model building are an even bigger problem. As you say, garbage in, garbage out.
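(Not part of the post above, but for anyone who wants a quick severity check before interpreting a horse-race model: variance inflation factors are a common multicollinearity diagnostic, and they're only a few lines in Python with statsmodels. The simulated data and variable names here are purely illustrative.)

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 300
a = rng.normal(size=n)
b = 0.9 * a + rng.normal(scale=0.3, size=n)   # strongly overlapping with a
c = rng.normal(size=n)                        # mostly independent predictor

X = sm.add_constant(np.column_stack([a, b, c]))
for i, name in enumerate(["a", "b", "c"], start=1):   # index 0 is the constant
    print(name, variance_inflation_factor(X, i))      # values well above ~5-10 flag redundancy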
 
Definitely have seen a lot of blind "model building" in my field. Occasionally it's justifiable, but often it seems to make little sense. If the variables realistically SHOULD be orthogonal save for some oddity of the sample (say, gender and race happened to be associated in a particular study), then examining them together seems logical - though of course reliability of measurement is less of an issue for something like that.

When I'm testing a fair number of predictors with some correlation, I've taken to just running them separately for this very reason. I'm doing this as we speak for my dissertation: testing out 26 different moderators per DV and applying a massive FDR correction. I'm not sure what would be gained by combining the significant ones into a single model, though it's probably the standard approach. I can take the zero-order correlations and interpret based on those; often some latent construct accounting for several of them is the driving force. I'm not sure what I'd gain from knowing that, say, my alcohol craving scale "beats" my alcohol withdrawal scale when they're entered together. Reliability, response biases, variability/range restriction, etc. all play an enormous role in determining which one wins.
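For anyone who hasn't run one before, the FDR step itself is nearly a one-liner; here's a minimal sketch using statsmodels' Benjamini-Hochberg implementation (the p-values below are invented for illustration, not actual dissertation results):

from statsmodels.stats.multitest import multipletests

# One raw p-value per separately tested moderator (illustrative numbers only)
p_values = [0.001, 0.004, 0.012, 0.030, 0.049, 0.20, 0.41, 0.77]
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p, q, keep in zip(p_values, p_adj, reject):
    print(f"raw p={p:.3f}  FDR-adjusted p={q:.3f}  significant after correction: {keep}")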

Ultimately though, I agree with wisneuro. This is really just the tip of the iceberg when it comes to statistical issues in this field (and many others - we aren't alone by any means!).
 
Oh boy. Two of the prereqs that I have left for my undergrad are Stats 1 and Stats 2, so right now it seems like you guys are speaking a totally different language. lol
 
Yes, we are definitely not alone here. I was consulting on an epilepsy paper with neurologists once, vainly trying to explain why corrections for multiple comparisons needed to be done and why reporting appropriate effect size figures matters. Neither of these things was done when the paper went to publication. The healthcare field in general has problems with both general statistics and biostatistics. Too much reliance on teaching people to memorize things, and not enough emphasis on teaching them to conceptualize and think.
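For what it's worth, reporting an effect size next to the p-value costs almost nothing; here's a sketch for a simple two-group comparison (scipy/numpy, simulated data only, assuming equal group sizes for the pooled SD):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(loc=0.0, scale=1.0, size=40)
group_b = rng.normal(loc=0.5, scale=1.0, size=40)

t, p = stats.ttest_ind(group_a, group_b)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)  # equal-n pooled SD
d = (group_b.mean() - group_a.mean()) / pooled_sd                     # Cohen's d
print(f"t = {t:.2f}, p = {p:.4f}, Cohen's d = {d:.2f}")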
 
I agree that regression is just the tip of the iceberg... it was depressing to learn what wins the horse race (though intuitively I knew it was something like that - otherwise what, the computer "knew" which variable was the most relevant?). I think we're due for another article highlighting misconceptions and misuse of statistics (e.g., Adams, Brown, & Grant, 1985). Don't even get me started on covariates... and yet I constantly face an uphill battle when reviewers suggest I control for x, y, and z in a quasi-experimental design...
 
Been following Uri Simonsohn's work at all? He's published quite a bit in this area recently, and I think the issue you point out is subsumed under several of the issues he identifies, albeit his take and emphasis are quite different (i.e., it's not so much a statistical issue as a methodological and ethical one).
 
Mostly unrelated question, but are there any resources you guys would suggest for learning introductory stats for the behavioral sciences?
 
I'm sure there are several UG courses that would help. I find statistics too dry to just read and understand, especially at the introductory level. I needed the back and forth and discourse to make it more interesting and real. By the time I got my BA, I had taken three levels of behavioral science statistics: two for the degree within the psych program and one senior-level class from the STAT program for my abandoned statistics minor. Not to mention the capstone lab courses where they were put into practice.

If you're like me, that's the best place to start. Good luck!
 
Oh yeah, I will be taking two undergrad courses of it this upcoming year that are a requirement for my degree: Quant 1 for behavioral sciences and Quant 2. There is a more advanced version that is optional. But since I'm pretty bad at math, I wanted to work on it a bit before the start of the year.
 
I'd suggest Andy Field's SPSS books. At the early grad level, he does one of the best jobs I've seen of explaining the concepts of basic stats with very good examples, and then he shows you how to do it in SPSS and correctly interpret the results. Good place to start, anyway.
 
IME, grad school stats is much more logically and conceptually focused than calculation-focused. For example, understanding WHY an ANOVA is calculated the way it is, and the impact of that on your findings, is more important than being able to run the calculations themselves.
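One concrete way to see that point: a one-way ANOVA is just a regression on dummy-coded group membership, so the F test falls straight out of the same variance decomposition. A minimal sketch (simulated data, assuming pandas and statsmodels are available; group names and effect sizes are made up):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 30),
    "y": np.concatenate([rng.normal(0.0, 1, 30),
                         rng.normal(0.4, 1, 30),
                         rng.normal(0.8, 1, 30)]),
})
fit = ols("y ~ C(group)", data=df).fit()   # regression with dummy-coded groups
print(sm.stats.anova_lm(fit, typ=2))       # same F test as a classic one-way ANOVA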
 