Checking for normality


Do you check for normality, and if so, what procedures do you use?

  • I do not routinely check for normality: 2 votes (7.7%)
  • QQ plots: 0 votes (0.0%)
  • Histograms: 2 votes (7.7%)
  • Boxplots: 0 votes (0.0%)
  • Skewness/kurtosis statistics: 10 votes (38.5%)
  • Shapiro-Wilk (or similar) test: 2 votes (7.7%)
  • Other procedures: 0 votes (0.0%)
  • Some combination of the above: 10 votes (38.5%)

  Total voters: 26

Ollie123

Full Member
15+ Year Member
Joined
Feb 19, 2007
Messages
5,651
Reaction score
3,953
Posting this out of curiosity, as I have once again gotten caught up in being overly "thorough" with my data screening to the point that I can't seem to get past it and move on to the actual analysis. This was spurred by seeing distributions that are not even remotely normal on some widely used self-report measures in a quite typical protocol. After a thorough literature search I was unable to find ANYONE who did anything to address normality, and I find it extremely unlikely that ours is the first study to see a non-normal distribution on a measure that has been used hundreds, if not thousands, of times.

My colleagues seem convinced I obsess over these statistical issues when most people do not even look at them, so I guess I'm looking for some empirical evidence to either justify my (likely ridiculous) behavior or convince me I'm far more obsessive than is healthy. For the record, I do each of the poll options, will typically run a Box-Cox and/or anywhere from 2-8 additional transformations (along with inclusion/exclusion of outliers), and will likely run any given analysis on several different versions of the data to examine convergent validity of the outcomes and to test the limits of the data. I'm increasingly convinced most people just plug in the primary variable, maybe look for weird values if they don't like the results, and call it a day. Not sure I'm ever going to be comfortable with the latter, but I could probably stand to find more of a middle ground.

Including a poll since I figured some may not want to admit publicly if the answer is that they don't check (though that also perhaps shows my bias :laugh:), but would also love to discuss the merits and costs/benefits of it. This is where I typically get caught up when writing. Once I have outcomes I'm confident in, I write fairly quickly, but can spend eons cleaning the data and getting it ready for analysis.
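Since several of the poll options come up repeatedly in this thread, here is a minimal sketch (not anyone's actual protocol) of what those checks plus a Box-Cox transformation might look like in Python with scipy/matplotlib; the `scores` variable and the toy data are invented for illustration.

```python
# Minimal sketch of the screening steps discussed above (toy data).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.gamma(shape=2.0, scale=3.0, size=200)  # deliberately skewed toy variable

# Descriptive checks: skewness/kurtosis plus a formal Shapiro-Wilk test
print("skew:", stats.skew(scores))
print("excess kurtosis:", stats.kurtosis(scores))   # Fisher definition: normal = 0
print("Shapiro-Wilk:", stats.shapiro(scores))       # W statistic and p-value

# Visual checks: histogram and QQ plot against the normal
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].hist(scores, bins=20)
stats.probplot(scores, dist="norm", plot=axes[1])
plt.show()

# Box-Cox transformation (requires strictly positive values)
transformed, lam = stats.boxcox(scores)
print("Box-Cox lambda:", lam, "post-transform skew:", stats.skew(transformed))
```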
 
FWIW, I use histograms and skewness/kurtosis stats. Then I'll transform variables if one or both of these indicate it is necessary. So far this hasn't happened too often, knock on wood. I find that checking for normality is the most anxiety-provoking part of analysis.
 
psycscientist - Sounds like your biostatisticians are more helpful than ours! I'm also at an AMC and have found them utterly useless for most issues we encounter with our research. They don't "get" things that aren't clinical trials, and we've largely given up including them on our protocols.

I guess my issue is just making the "close enough" judgment - I typically see improvement with transformation, but not enough for me to feel comfortable calling the result normal. For example, one of our variables is zero-inflated (essentially a floor effect), so even with intensive transformation it ends up bimodal. I'm debating dichotomizing it (zero vs. non-zero), but I've generally found reviewers are barely receptive to transformation and react very poorly to things like that (one of the things that has helped convince me a huge portion of researchers don't even check).

That said, it is great to hear I'm not alone in my approach to these things.
 
Unfortunately, it's a complex design (a multi-factorial, within-subjects experimental design) and, to my knowledge, no one has developed a non-parametric method for data of this variety. If we had fewer levels I could do a rank transform (which makes me gloriously uncomfortable, as I'm not sure enough studies have been done to show that is acceptable), but alas, we do not.

I'm actually analyzing the data using GEE, which is incredibly flexible. The problem is that GEE isn't non-parametric in the sense of "no particular assumptions about the distribution"; it's more "you get to specify a link function" (semi-parametric, really). Which returns me to the original issue of what on earth to call this thing. It's probably closer to a zero-inflated Poisson than a normal distribution, but it's somewhere in the middle. Mayhaps I'll just run both and see how they compare.

I'm going to email Liang and Zeger and tell them they need to develop a link function for "weird". They covered Gaussian, Poisson, logit, and all these other distributions I never actually see, but left out the most common one!
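For what it's worth, a hedged toy sketch of the "run both and compare" idea with statsmodels GEE, including a Binomial GEE on the dichotomized (zero vs. non-zero) version mentioned a few posts up; the subject/condition/outcome columns are hypothetical, not the actual protocol.

```python
# Toy comparison of GEE families on the same within-subject outcome.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subj, n_trials = 40, 6
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_trials),
    "condition": np.tile(np.arange(n_trials), n_subj),
})
df["y"] = rng.poisson(lam=0.8 + 0.3 * df["condition"])   # zero-heavy, count-like outcome
df["any_y"] = (df["y"] > 0).astype(int)                   # dichotomized sensitivity check

def fit_gee(formula, family):
    # Each model gets its own exchangeable working-correlation instance.
    return smf.gee(formula, groups="subject", data=df,
                   family=family, cov_struct=sm.cov_struct.Exchangeable()).fit()

gauss = fit_gee("y ~ condition", sm.families.Gaussian())      # identity link
pois = fit_gee("y ~ condition", sm.families.Poisson())        # log link, count variance
logit = fit_gee("any_y ~ condition", sm.families.Binomial())  # zero vs. non-zero

for name, res in [("Gaussian", gauss), ("Poisson", pois), ("Binomial", logit)]:
    print(name, res.params.to_dict())
```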
 
It's possible that I may misunderstand some of what you are saying, but at least according to my multivariate/psychometrics professor, the results of normality tests generally do not go into journals. It's more something that you're expected to be able to present when it's asked for. So if you're doing a literature search to see what normality tests people use, I don't think you would find anything.

I know that at my school we are supposed to use multiple methods of normality testing. Certainly when it comes to the dissertation, that is checked.
 
I'd imagine this is likely the case, although I also wouldn't be surprised if many (hopefully not most) researchers eschew normality testing in general...or just glance at skewness and kurtosis values before running their analyses regardless.

Then again, if a transformation occurs, that generally should be reported in an article. So if Ollie isn't finding any mention of this happening, then yeah, that'd indicate that normality testing (or even data screening in general) isn't occurring as often as it should. The amount of mean substitution that still seems to occur wouldn't necessarily convince me otherwise, either.
 
Yeah - to clarify, I wasn't expecting people to report WHAT they did to assess normality. Rather, since I got such an incredibly skewed distribution (and presumably my sample is not THAT different from everyone else's), I would have expected prior papers to report having conducted some kind of transformation, used a non-parametric test, etc. I haven't found a single one.

Mean substitution actually doesn't bother me as much, since there is new evidence coming out suggesting we wayyyy overdo it with MI and that it only really makes a difference when there is a significant amount of missing data (something that has been talked about for years but that no one had produced a paper on)...so running MI is great when you are missing 10% of your questionnaires, but totally unnecessary when you are missing a handful of items across your entire sample.

That said, what bothers me is that very few people ever report how it was handled. I find it hard to believe all these researchers have zero missing data across all variables. Which leads me to wonder if some people just ignore the blank values (fine for computing means...very, very bad for computing sums).
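A quick toy illustration of the means-vs-sums point, assuming a short questionnaire with made-up item names:

```python
# Why skipping blanks is fine for mean scores but bad for sum scores (toy data).
import numpy as np
import pandas as pd

items = pd.DataFrame({
    "item1": [4, 3, np.nan],
    "item2": [4, np.nan, np.nan],
    "item3": [5, 4, 2],
})

# Mean scoring: skipping blanks just averages the items a person did answer.
print(items.mean(axis=1, skipna=True))

# Sum scoring: skipping blanks silently shrinks the total for anyone with
# missing items, so missingness masquerades as a lower score.
print(items.sum(axis=1, skipna=True))

# One common workaround: prorate the sum from the person-mean,
# perhaps only when enough items were answered.
answered = items.notna().sum(axis=1)
prorated = items.mean(axis=1) * items.shape[1]
print(prorated.where(answered >= 2))
```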
 
Agreed. I actually found some of the research you're likely talking about while looking up what to do with my own missing data (which was <1%), and I also found little guidance in the existing literature beyond listwise deletion (with no mention of MCAR). I ended up going with EM myself, based on its ease and the seemingly negligible incremental improvement I'd have gotten from something like MI.

But yes, it does seem odd that if you're working in an area with a frequently-used set of measures and are running into very non-normal data that no one before you would've mentioned how they handled things.
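For anyone curious, a hedged sketch of what multiple imputation can look like in Python using statsmodels' MICE (the EM-based single imputation mentioned above isn't shown); the variables and missingness rate are invented for illustration.

```python
# Sketch of chained-equations multiple imputation with pooled estimates.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["y", "x1", "x2"])
df.loc[rng.random(200) < 0.05, "x1"] = np.nan   # ~5% missing on one predictor

imp = mice.MICEData(df)                          # imputation model for the dataset
model = mice.MICE("y ~ x1 + x2", sm.OLS, imp)    # analysis model run per imputed dataset
results = model.fit(n_burnin=10, n_imputations=20)  # pools across the imputations
print(results.summary())
```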
 
I think it's good you are checking for all this. My advisor made a point about this on all the papers we reviewed together for journals. My stats professor recommended skewness and kurtosis; he didn't recommend transformations or mean imputation. We had to learn Mplus for the complex stuff (which is most of what we do) and used bootstrapping or MLR for SEM models.
 
Interesting question, and I can totally see why you're asking. It's like a dirty little secret or something. It's not like you can look at a selection of papers and see how they checked or normalized their data. And I'm literally shocked that some don't even bother checking for normality!! Holy smokes. I feel more competent than I thought I was!! Either that, or I'm disappointed that anyone can basically make their data look better by normalizing or not normalizing...or anything else.
 
I'm actually surprised how many people do...in talking to other researchers I've definitely gotten the impression that many don't bother unless they have a reason to be suspicious. And this is a fairly basic step...I'm frightened to think how rarely some things (e.g., confirming normality of the residuals) are done.

PsychResearch - out of curiosity, what did he recommend in lieu of transformations? I've never heard of someone being against transformation before. Bootstrapping should theoretically work for most anything, but I'm not sure I've ever even seen it attempted with the sort of complex experimental work that we do.

Every thread where we discuss these issues makes me want to do a quant post-doc!
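On the residuals point, here is a tiny sketch (toy data, made-up variables) of screening the residuals of a fitted model rather than the raw outcome:

```python
# Check the residuals of a fitted model, not the raw DV (toy data).
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=150)
y = 2.0 + 0.5 * x + rng.gamma(shape=2.0, scale=1.0, size=150)  # skewed errors

fit = sm.OLS(y, sm.add_constant(x)).fit()
resid = fit.resid

# The usual normality assumption concerns these residuals.
print("residual skew:", stats.skew(resid))
print("Shapiro-Wilk on residuals:", stats.shapiro(resid))
stats.probplot(resid, dist="norm", plot=plt.gca())
plt.show()
```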
 
Ollie, bootstrapping and MLR (which is supposed to be robust to non-normality) for the SEM models. The professor who taught the class is a famous methodologist. I felt like we almost went overboard sometimes and obsessed over the details, so everything we did, we had to do twice or with multiple methods. For example, for missing data we would use both full information maximum likelihood (FIML) and multiple imputation strategies based on the Expectation-Maximization algorithm. At one point, students who were at the dissertation stage (and ahead of me) had to create 5 imputed datasets and then merge those, which was a nightmare for them. At some point they were talking about 20 imputed datasets to then pool back into 1!

I think it is a fine line that one walks here. It is good to check for all of this (and we do, even if at the end of the day we end up with 2 lines saying how we checked the preliminary stuff -- non-normality, missing data, missing data bias, etc. -- and how we handled it). I also think now that it is important to be pragmatic about these things, because in my case I took forever with this stuff for my first paper.

Now we have another professor who is a clinical psychologist by training with some advanced background in methodology, and I've heard that his students do transformations. They also use the Baron and Kenny method for mediation, which is another method the first professor told us not to use. At some point, I began to remember what not to use rather than what to use 🙂 I think specializing in quant methods is a great idea!!
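To illustrate the resampling logic behind the bootstrapping mentioned above (nothing here is Mplus- or MLR-specific, and this is not anyone's actual analysis), a by-hand percentile bootstrap of a group mean difference on toy data:

```python
# By-hand percentile bootstrap of a mean difference between two skewed samples.
import numpy as np

rng = np.random.default_rng(4)
group_a = rng.gamma(shape=2.0, scale=2.0, size=60)
group_b = rng.gamma(shape=2.0, scale=2.5, size=60)

observed = group_b.mean() - group_a.mean()
boots = np.empty(5000)
for i in range(boots.size):
    # Resample each group with replacement and recompute the statistic
    boots[i] = (rng.choice(group_b, size=group_b.size, replace=True).mean()
                - rng.choice(group_a, size=group_a.size, replace=True).mean())

lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"difference = {observed:.2f}, 95% bootstrap CI = [{lo:.2f}, {hi:.2f}]")
```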
 