Fallibility of statistics


NeuroTrope

Hey y'all,

I've been in talks with various statisticians. I'll keep this short so as not to bore people, but I had a basic question that had a somewhat distressing answer.

My question: when you include multiple predictors with shared variance in a regression model, what determines which predictors "win" (i.e. remain significant) in explaining the criterion?

The answer: The variable with the lowest standard error wins.

The problem: Behavioral scientists often have an almost dogmatic perspective on regression, such that the significant predictors are treated as the ones that matter. But in reality, the predictors with the best reliability win. Remember Stats 101, where one assumption of regression is that predictors are measured without error (i.e., perfect reliability)? Most of us acknowledge this is impossible in behavioral science, so we dismiss it. But reliability goes a long way toward determining which predictors stay significant in the model, because unreliable measurement inflates a predictor's standard error and attenuates its estimated effect. And reliability does not equal validity, as we all know...

Bottom line: If you are just trying to predict, it's not a huge issue, I suppose. But if you are trying to identify the constructs that genuinely matter in psychology, it is a huge issue. Our methods of collecting data are highly variable, and sometimes the most important variables happen to be the least reliably measured (e.g., self-report, fMRI), so: garbage in, garbage out.

BTW, I call this a fallibility of "statistics" and not just "regression" because the regression model underlies 90% of what we do anyway (ANOVA, chi-square, DFA, PCA) and generalizes all the way up to canonical correlation analysis.
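To make the horse race concrete, here's a minimal simulation sketch (Python with numpy and statsmodels; the variable names and noise levels are made up for illustration, not from any real dataset). Two predictors share a latent influence and have identical true effects on the outcome, but the one measured with less error ends up with the smaller standard error, and therefore tends to keep the significance:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)                        # shared influence -> correlated predictors
x1_true = latent + rng.normal(scale=0.5, size=n)
x2_true = latent + rng.normal(scale=0.5, size=n)
y = x1_true + x2_true + rng.normal(size=n)         # both predictors have the same true effect

# Observed versions: x1 measured cleanly, x2 with heavy measurement error
x1_obs = x1_true + rng.normal(scale=0.2, size=n)   # high reliability
x2_obs = x2_true + rng.normal(scale=1.5, size=n)   # low reliability

X = sm.add_constant(np.column_stack([x1_obs, x2_obs]))
fit = sm.OLS(y, X).fit()
print(fit.summary())   # x1_obs typically shows the smaller SE and p-value; x2_obs may "lose"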

Food for thought. Anyone want to weigh in?
 
Oh man, we could stretch a thread out by hundreds of posts about the misuse, misinterpretation, etc. of stats. I think this is a major issue, although I see over-reliance on p < .05 as a bigger issue in my day-to-day clinical life. Just another reason a solid statistics and research foundation is necessary for high-level clinical work.
 
This is why sound theory and research design are inseparable from good data analysis. The 'horse race' approach to multiple regression is abused frequently. IMO it is a technique best applied to (a) large data sets and/or (b) relatively constrained and tightly conceptualized models. Of course unstable parameter estimates are going to drop out of a multiple regression model, especially if you base your sample size on a mindless "10 observations per predictor" type guideline. Some of the strategies to manage severe multicollinearity are only tenable with larger data sets. However, I think sloppy hypotheses and model building are an even bigger problem. As you say, garbage in, garbage out.
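(Not part of the post above, but for anyone who wants a quick severity check before interpreting a horse-race model: variance inflation factors are a common multicollinearity diagnostic, and they're only a few lines in Python with statsmodels. The simulated data and variable names here are purely illustrative.)

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 300
a = rng.normal(size=n)
b = 0.9 * a + rng.normal(scale=0.3, size=n)   # strongly overlapping with a
c = rng.normal(size=n)                        # mostly independent predictor

X = sm.add_constant(np.column_stack([a, b, c]))
for i, name in enumerate(["a", "b", "c"], start=1):   # index 0 is the constant
    print(name, variance_inflation_factor(X, i))      # values well above ~5-10 flag redundancy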
 
Definitely have seen a lot of blind "model building" in my field. Occasionally it's justifiable, but often it seems to make little sense. If the variables realistically SHOULD be orthogonal save for some oddity of the sample (say, gender and race happened to be associated in a particular study), then examining them together seems logical - though of course reliability of measurement is less of an issue for something like that.

When I'm testing a fair number of predictors with some correlation, I've taken to just running them separately for this very reason. I'm doing this as we speak for my dissertation: testing out 26 different moderators per DV and applying a massive FDR correction. I'm not sure what would be gained by combining the significant ones into a single model, though it's probably the standard approach. I can take the zero-order correlations and interpret based on those; often some latent construct accounting for several of them is the driving force. I'm not sure what I'd gain from knowing that, say, my alcohol craving scale "beats" my alcohol withdrawal scale when they're entered together. Reliability, response biases, variability/range restriction, etc. all play an enormous role in determining which one wins.
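For anyone who hasn't run one before, the FDR step itself is nearly a one-liner; here's a minimal sketch using statsmodels' Benjamini-Hochberg implementation (the p-values below are invented for illustration, not actual dissertation results):

from statsmodels.stats.multitest import multipletests

# One raw p-value per separately tested moderator (illustrative numbers only)
p_values = [0.001, 0.004, 0.012, 0.030, 0.049, 0.20, 0.41, 0.77]
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p, q, keep in zip(p_values, p_adj, reject):
    print(f"raw p={p:.3f}  FDR-adjusted p={q:.3f}  significant after correction: {keep}")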

Ultimately though, I agree with wisneuro. This is really just the tip of the iceberg when it comes to statistical issues in this field (and many others - we aren't alone by any means!).
 
Oh boy. Two of the prereqs that I have left for my undergrad are Stats 1 and Stats 2, so right now it seems like you guys are speaking a totally different language. lol
 
Yes, we are definitely not alone here. I was consulting on an epilepsy paper with neurologists once, vainly trying to explain why corrections for multiple comparisons needed to be done and why reporting appropriate effect size figures matters. Neither of these things was done when the paper went to publication. The healthcare field in general has problems with both general statistics and biostatistics. Too much reliance on teaching people to memorize things, and not enough emphasis on teaching them to conceptualize and think.
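For what it's worth, reporting an effect size next to the p-value costs almost nothing; here's a sketch for a simple two-group comparison (scipy/numpy, simulated data only, assuming equal group sizes for the pooled SD):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(loc=0.0, scale=1.0, size=40)
group_b = rng.normal(loc=0.5, scale=1.0, size=40)

t, p = stats.ttest_ind(group_a, group_b)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)  # equal-n pooled SD
d = (group_b.mean() - group_a.mean()) / pooled_sd                     # Cohen's d
print(f"t = {t:.2f}, p = {p:.4f}, Cohen's d = {d:.2f}")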
 
I agree that regression is just the tip of the iceberg... it was depressing to learn what wins the horse race (though intuitively I knew it was something like that - otherwise what, the computer "knew" which variable was the most relevant?). I think we're due for another article highlighting misconceptions and misuse of statistics (e.g., Adams, Brown, & Grant, 1985). Don't even get me started on covariates... and yet I constantly face an uphill battle when reviewers suggest I control for x, y, and z in a quasi-experimental design...
 
Been following Uri Simonsohn's work at all? He's published quite a bit in this area recently, and I think the issue you point out is subsumed under several of the issues he identifies, albeit his take and emphasis are quite different (i.e., it's not so much a statistical issue as a methodological and ethical one).
 
Mostly unrelated question, but are there any resources you guys would suggest for learning introductory stats for the behavioral sciences?
 
I'm sure there are several UG courses that would help. I find statistics too dry to just read and understand, especially at the introductory level. I needed the back and forth and discourse to make it more interesting and real. By the time I got my BA, I had taken three levels of behavioral science statistics: two for the degree within the psych program and one senior-level class from the STAT program for my abandoned statistics minor. Not to mention the capstone lab courses where they were put into practice.

If you're like me, that's the best place to start. Good luck!
 
Oh yeah, I will be taking two undergrad courses of it this upcoming year that are a requirement for my degree: Quant 1 for behavioral sciences and Quant 2. There is a more advanced version that is optional. But since I'm pretty bad at math, I wanted to work on it a bit before the start of the year.
 
I'd suggest Andy Field's SPSS books. At the early grad level, he does one of the best jobs I've seen of explaining the concepts of basic stats with very good examples, and then he shows you how to do it in SPSS and correctly interpret the results. Good place to start, anyway.
 
IME, grad school stats is much more logically and conceptually focused than calculation-focused. For example, understanding WHY an ANOVA is calculated the way it is, and the impact of that on your findings, is more important than being able to run the calculations themselves.
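One concrete way to see that point: a one-way ANOVA is just a regression on dummy-coded group membership, so the F test falls straight out of the same variance decomposition. A minimal sketch (simulated data, assuming pandas and statsmodels are available; group names and effect sizes are made up):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 30),
    "y": np.concatenate([rng.normal(0.0, 1, 30),
                         rng.normal(0.4, 1, 30),
                         rng.normal(0.8, 1, 30)]),
})
fit = ols("y ~ C(group)", data=df).fit()   # regression with dummy-coded groups
print(sm.stats.anova_lm(fit, typ=2))       # same F test as a classic one-way ANOVA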
 