Hey y'all,
I've been talking with various statisticians. I'll keep this short so as not to bore people, but I had a basic question that got a somewhat distressing answer.
My question: when you include multiple predictors with shared variance in a regression model, what determines which predictors "win" (i.e. remain significant) in explaining the criterion?
The answer: The variable with the lowest standard error wins.
The problem: Behavioral scientists often have an almost dogmatic perspective on regression, such that the significant predictors are the ones that matter. But in reality, the predictors with the best reliability win. Remember Stats 101, where one assumption of regression is that the predictors are measured with perfect reliability (i.e. without measurement error)? Most of us acknowledge this is impossible in behavioral science, so we dismiss it. But in reality, reliability is everything in determining which predictors stay significant in the model. And reliability does not equal validity, as we all know...
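To make the point concrete, here is a rough simulation sketch (my own toy setup, not something the statisticians gave me). It assumes two correlated true scores that matter equally for the criterion, then measures one with reliability 0.9 and the other with reliability 0.5. The function name, sample size, effect sizes, and reliabilities are all made-up illustration values.

```python
import numpy as np

def simulate_t_stats(n=5000, rel_x1=0.9, rel_x2=0.5, latent_r=0.6, seed=0):
    """Return OLS t-statistics (intercept, x1, x2) for two error-laden predictors.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)

    # Two correlated true scores with unit variance (the shared variance).
    cov = np.array([[1.0, latent_r], [latent_r, 1.0]])
    true_scores = rng.multivariate_normal([0.0, 0.0], cov, size=n)

    # Both true scores matter equally for the criterion.
    y = 0.5 * true_scores[:, 0] + 0.5 * true_scores[:, 1] + rng.normal(size=n)

    # reliability = var(true) / (var(true) + var(error)), with var(true) = 1,
    # so var(error) = (1 - reliability) / reliability.
    rel = np.array([rel_x1, rel_x2])
    err_sd = np.sqrt((1.0 - rel) / rel)
    observed = true_scores + rng.normal(size=(n, 2)) * err_sd

    # Ordinary least squares on the observed (error-laden) predictors.
    X = np.column_stack([np.ones(n), observed])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta / se

t = simulate_t_stats()
print(f"t(x1, reliability 0.9) = {t[1]:.1f}")
print(f"t(x2, reliability 0.5) = {t[2]:.1f}")
```

Under these assumptions the more reliable predictor should come out with the larger t-statistic even though both true scores contribute equally, and swapping the two reliabilities should flip which one looks stronger.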
Bottom line: If you are trying to predict, not a huge issue I suppose. But in identifying essentialist aspects of psychology, it is a huge issue. Our methods of collecting data are highly variable and sometimes the most important variables happen to be the least reliable (e.g. self-report, fMRI...) and so garbage in, garbage out.
BTW I call this a fallibility of "statistics" and not "regression" because the regression model underlies 90% of what we do anyway (ANOVA, chi-square, DFA, PCA) and generalizes all the way up to canonical correlation analysis.
Food for thought. Anyone want to weigh in?