What's the reasoning behind the sample size minimum of 10?


loveoforganic (joined Jan 30, 2009)
I'm working on a study with a sample of slightly more than 100 adolescents from a detention center. They were interviewed using the PCL, and I'm breaking them down into smaller samples based on their scores for two of the factors (high/high, high/low, low/low, low/high, middle/middle). At any rate, as might be expected, there aren't that many subjects scoring low in one factor and high in the other (n=4, n=6).

Analyzing those five samples with ANOVA, I get some statistically significant results (all p < .05, a few approaching p < .01). However, I'm being told that the psychology community generally doesn't accept samples smaller than 10 as valid, and even 10 is pushing it. I thought that, since the tests are designed to require more extreme results (F values in this case) for smaller sample sizes, sample size wouldn't really be a factor as long as statistically significant results were produced. What is the reasoning behind this?

Thanks 🙂
 
If your cell sizes are too small, the statistical procedures will break down because you'll likely start violating a number of assumptions (homoscedasticity/equality of variances, normality).

This could be exacerbated by your coding. I'm not familiar with the PCL (this is the psychopathy measure, right?), but if it doesn't have an established cutoff for high/low and you're basing your cut on sample statistics, that compounds the problem.
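(A small simulation, with made-up numbers, illustrates the point about assumptions: with only 4 scores per cell, a cell's variance estimate bounces around wildly from sample to sample, so ANOVA's equal-variance assumption is hard even to check, let alone satisfy.)

```python
import numpy as np

rng = np.random.default_rng(0)
true_sd = 10.0  # every simulated cell is drawn from the SAME population

def spread_of_sd_estimates(n, n_sims=10_000):
    """Range of sample SDs across many simulated cells of size n."""
    sds = rng.normal(0, true_sd, size=(n_sims, n)).std(axis=1, ddof=1)
    return sds.min(), sds.max()

small = spread_of_sd_estimates(4)
large = spread_of_sd_estimates(100)
print(f"n=4:   sample SD ranged from {small[0]:.1f} to {small[1]:.1f}")
print(f"n=100: sample SD ranged from {large[0]:.1f} to {large[1]:.1f}")
```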
 
What JockNerd said is all correct. Basically, small samples have greater variability and are less likely to reflect what the underlying population actually looks like, so you have a much larger chance of heterogeneity of variance, etc.

Also, JockNerd's second point is important: if your measure (which I'm also not familiar with) is continuous, simply splitting scores into 'high vs. low' is statistically incorrect and is likely messing things up even further.
 
think of it this way... an average score for 4 people is a lot less reliable than an average score for over 100 people.
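(The point above is easy to verify with a quick simulation, using invented scores rather than the study's data: means of 4 scores scatter far more widely around the true population mean than means of 100 scores do.)

```python
import numpy as np

rng = np.random.default_rng(42)

def sd_of_sample_means(n, n_sims=20_000):
    """Draw many samples of size n and see how much their means vary."""
    means = rng.normal(loc=50, scale=10, size=(n_sims, n)).mean(axis=1)
    return means.std()

se4 = sd_of_sample_means(4)      # theory: 10 / sqrt(4)   = 5.0
se100 = sd_of_sample_means(100)  # theory: 10 / sqrt(100) = 1.0
print(f"SD of sample means, n=4:   {se4:.2f}")
print(f"SD of sample means, n=100: {se100:.2f}")
```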
 
Thank you, I hadn't considered the homogeneity of variance 🙂 Yes, the PCL is a psychopathy measure. I'm pretty sure there isn't a pre-established cutoff score for the individual PCL factors designating high/low values. The samples were broken down as follows:

1) Top 1/3 factor 1 + Top 1/3 factor 2
2) Top 1/3 factor 1 + Bottom 1/3 factor 2
3) Bottom 1/3 factor 1 + Bottom 1/3 factor 2
4) Bottom 1/3 factor 1 + Top 1/3 factor 2
5) Middle 1/3 factor 1 + Middle 1/3 factor 2
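(For concreteness, here is a sketch of that tertile-split grouping on simulated scores; the variable names and distributions are assumptions, not the actual PCL data. Note that anyone in, say, the middle third on one factor and the top third on the other falls into none of the five cells.)

```python
import numpy as np

rng = np.random.default_rng(1)
f1 = rng.normal(20, 5, size=108)  # "factor 1" scores (simulated)
f2 = rng.normal(20, 5, size=108)  # "factor 2" scores (simulated)

def tertile(x):
    """Label each score 0/1/2 for bottom/middle/top third of the sample."""
    lo, hi = np.quantile(x, [1/3, 2/3])
    return np.where(x < lo, 0, np.where(x < hi, 1, 2))

t1, t2 = tertile(f1), tertile(f2)
groups = {
    "high/high":     (t1 == 2) & (t2 == 2),
    "high/low":      (t1 == 2) & (t2 == 0),
    "low/low":       (t1 == 0) & (t2 == 0),
    "low/high":      (t1 == 0) & (t2 == 2),
    "middle/middle": (t1 == 1) & (t2 == 1),
}
for name, mask in groups.items():
    print(f"{name:>13}: n = {mask.sum()}")
total = sum(int(mask.sum()) for mask in groups.values())
print(f"kept {total} of 108; the rest fall outside all five cells")
```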

I had received conflicting info from my advisors as to whether it was OK to analyze the data in this manner, and the general consensus was that, though the cutoff scores were arbitrary, it was still an acceptable way to view trends. However, given the sample size problem I ran into, I'm now going to analyze the data continuously and by correlation.

Thanks for the info 🙂
 
Splitting data into groups like that is "okay" in the sense that it will work (assuming you do it correctly) and can potentially provide information of interest. In this case, if you were going to analyze it using splits, you should probably have grouped it differently. Did you just drop people who were in the middle third on one factor and the lower or upper third on the other? If you were going to use splits, you should probably group separately by each factor and run it as a 3 x 2 instead. However, unless you are explicitly interested in the categories themselves (i.e., validating widely used cutoffs), it is always best to leave continuous variables continuous; to do otherwise is just not utilizing the information you have. I've done splits for posters/presentations where it was just a preliminary report, but if you split the data like that in something written up for publication or more rigorous peer review, you're probably going to get critiqued for it at most major journals.
 
Understandable, thanks for the heads up. Hopefully the statistically significant stuff sticks around! Assuming it doesn't, or that different things become significant, would you just report the findings from the continuous analysis, or would you note discrepancies/similarities with the ANOVA?

To your edit:

did you just drop people who were in the middle third on one factor and the lower or upper third on another?

Yes. Edit again: I was going to ask for an opinion on grouping another way, but my same sample size problems would persist, so I won't waste your time. I'll consider grouping in that manner if it becomes relevant for another project. Thanks for the advice.
 
I wouldn't bother noting the discrepancy. It's a less-than-ideal way of analyzing the data, so it wouldn't surprise anyone that the results differ. The idea behind leaving variables continuous is that you get a more accurate picture of the data, so it would be a bit awkward to explain differences between that and a less ideal form of analysis. Occasionally I see things analyzed two different ways and the results compared, when one could argue the analyses are equally valid or provide slightly different information. In this case, it's pretty cut and dried that analyzing them as continuous variables is "better," so I think you'd be hard-pressed to explain why you analyzed the data twice.
 
I'm working on a study with a sample of slightly more than 100 adolescents from a detention center. They were interviewed using the PCL

Completely OT, but did you do all the PCLs yourself? I hear "100 PCLs" and cringe at the time involved; the PCL-R interview I do with inmates usually takes 1.5 to 2 hours each, plus an additional hour or so to do the record review and scoring.
 
Thank you for the help Ollie 🙂

And no, the PCLs were previously done :luck:
 