Statistics analysis help!!


deleted4401

I'm statistically inept, and need some help ... maybe someone can help me out.

I'm analyzing prostate cancer RT toxicities and their relation to the age of the patient.

If I have this data, for example:

Men Under 60: 33/87 had grade 1 toxicity (38%)
Men 60 to 70: 77/230 had grade 1 toxicity (33%)
Men 70 and above: 60/249 had grade 1 toxicity (24%)

How can I tell if those values are statistically significantly different from each other? Is it possible? What test would I use?

Man, this stuff makes my head spin! Any advice would be appreciated.

Simul
 

hi!

sounds like a terrific project! for your situation, i think you should use ANOVA. do you have a statistics package (Stata, Minitab, SPSS, etc.)? cuz ANOVA by hand ain't fun. read up on the topic in your fave stats text, but the software will give you a significance value in table form, and if it is below your chosen cutoff (commonly .05) you're in business.

*edit* specifically, read up on 1 way anova. 2 way anova would be for a case where you have the same data for both males and females of the same age groups and again you want to test for statistical significance! stats is sooooo freaking cool, rite?!

i'm actually interested if someone could tell me whether they could simply work with the %ages to determine statistical significance. i was always under the assumption that you need the actual numbers to do ANOVA accurately. is there a way out of that?

less recommended, but another thing is to do a t test for each pair of groups: men under 60 vs. men 60-70, <60 vs. >70, and 60-70 vs. >70. if all 3 show a statistically significant difference, then i'd feel safe saying there is a significant difference between the 3 groups. BUT that's only if you don't want to learn ANOVA, which you should cuz it's nice and elegant (what r you gonna do with 35 groups?!)
 
A t-test is actually a form of ANOVA ... just the simplest kind (two groups). The more individual t-tests you run on a set of data, the higher the risk of a type 1 error (rejecting a true null hypothesis).

The problem with using percent scores is that, as in this case, you may have unequal sample sizes (if the sample sizes are equal then percents and raw scores would be equivalent 🙂 )
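A rough sketch of why the counts matter and the percentages alone are not enough: the standard error of a proportion depends on the group size, so the same percentage is more or less precise depending on n. The counts below are from the original post; the 330/870 group is a hypothetical added purely for illustration.

```python
import math

# Counts from the original post: (men with grade 1 toxicity, group size)
groups = {
    "under 60": (33, 87),
    "60 to 70": (77, 230),
    "70 and above": (60, 249),
}

def proportion_se(events, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    p = events / n
    return math.sqrt(p * (1 - p) / n)

standard_errors = {name: proportion_se(e, n) for name, (e, n) in groups.items()}
for name, (events, n) in groups.items():
    print(f"{name}: {events / n:.1%} +/- {standard_errors[name]:.1%} (n = {n})")

# Same 38%, but in a hypothetical group ten times larger: the estimate is
# sqrt(10) times more precise, which a bare percentage cannot tell you.
se_small = proportion_se(33, 87)
se_large = proportion_se(330, 870)
print(f"SE with n = 87: {se_small:.4f}, SE with n = 870: {se_large:.4f}")
```

This is why any significance test on these groups needs the raw numerators and denominators, not just the three percentages.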

If you need a really quick, rough way to check whether the groups are likely sig. different (and don't have access to statistics software), I'd suggest running 3 t-tests (one between each pair of groups). MS Excel has a data analysis tool you can use for this purpose (but you do need the scores that make up the group means).

If you need to be sure - presenting the data, publishing, etc. you'll need to account for the unequal sample sizes...which is far from the first thing to learn about stats. Either way CaptainJack's advice is great - you really should know some basic stats to interpret research findings.

Also you need to decide what p-value you will consider significant ... opinions differ on this: in the basic sciences p < 0.05 is often the standard, but when treatments are involved, the burden of proof is often required to be much higher (e.g. p < 0.01 or p < 0.001).

Good luck,
Jaegwon
 
Thanks for the help ... Okay, I looked up ANOVA earlier today and I ran it through, but I realized that it is a comparison of means/averages rather than of event counts (i.e. for the <60 group - 33 events of grade 1 toxicity). Is that wrong, or am I misunderstanding ANOVA? It stands for 'analysis of variance', so it seemed to me that we need a mean with variances (i.e. if I had a few different groups of <60 year old men and I had a mean). Also, can this be considered a 'bell curve', or do I have to use tests that are not based on the normal distribution?

Aaarrgh ... I've spent too much time on this!

I'll keep plowing through...

Thanks again,
Simul

 
You're right - ANOVA = ANalysis Of VAriance. The means/averages that you are talking about come from the individual data points....

The amount of variance within each group is compared with the amount of variance between groups - if the between-groups variance is relatively very large compared to the within-groups variance, then there is a significant difference between the groups.
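To make the between-groups vs. within-groups comparison concrete, here is a minimal sketch of the one-way ANOVA F statistic. The scores are hypothetical continuous measurements made up for illustration; they are not the toxicity data from this thread, which are yes/no counts.

```python
from statistics import mean

# Hypothetical continuous scores for three groups (NOT data from this thread).
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
]

def one_way_f(groups):
    """F = (between-groups mean square) / (within-groups mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # How far each group mean sits from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # How far each observation sits from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

f_stat = one_way_f(groups)
print(f"F = {f_stat:.2f}")
```

A large F means the group means are spread out relative to the scatter inside each group. Turning F into a p-value needs the F distribution, which a stats package would supply.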

You need the individual data points to calculate the means and standard deviations (variance) for each of the groups...and one of the assumptions of using an ANOVA is that the data come from a normal distribution (bell curve).

I think your best bet is to ask someone for help. It's one thing to describe stats theory online, but even once you have that down, you'll need to know how to feed your data into a stats program....

I hope some of that helped...
 

Watch out with the multiple t-tests--each successive one you do increases the error. If you do this, there are corrections you need to make (Tukey, SNK, Bonferroni are some examples)--these differ in their level of conservativeness, with Bonferroni being the most conservative (i.e. likely to miss small differences) and SNK the least.
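A quick back-of-the-envelope sketch of how the error grows and what the Bonferroni correction does about it. The 0.05 level is an assumed choice, and the calculation treats the three comparisons as independent, which pairwise tests on shared groups are not, so take the number as a rough guide only.

```python
alpha = 0.05        # per-test significance level (assumed for illustration)
n_comparisons = 3   # <60 vs 60-70, <60 vs 70+, 60-70 vs 70+

# Chance of at least one false positive across all tests, if every null
# hypothesis is true and the tests are treated as independent.
familywise_error = 1 - (1 - alpha) ** n_comparisons

# Bonferroni: hold each individual test to alpha / (number of comparisons).
bonferroni_alpha = alpha / n_comparisons

print(f"Uncorrected familywise error rate: {familywise_error:.3f}")
print(f"Bonferroni per-test threshold:     {bonferroni_alpha:.4f}")
```

So three uncorrected tests at 0.05 carry roughly a 14% chance of at least one false positive, and Bonferroni compensates by requiring p below about 0.0167 for each pair.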
 
YOU HAVE CATEGORICAL DATA. NO NEED FOR T-TESTS OR ANOVAS! You will need to conduct a 3X2 chi-square test to examine the association between age and Grade 1 toxicity.

Try this website (http://www.georgetown.edu/faculty/ballc/webtools/web_chi.html). Columns will be the age groups. Rows will be Yes or No Grade 1 Toxicity.

Here's your result: Age is significantly associated with Grade 1 Toxicity, X2 = 8.06, df = 2, p ≈ 0.018.
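For anyone who wants to check the arithmetic without the website, here is a minimal sketch of the same 3X2 Pearson chi-square test in plain Python. The counts are from the original post. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2); other table sizes would need a stats library.

```python
import math

# Observed counts from the original post: (grade 1 toxicity, no grade 1 toxicity)
observed = {
    "under 60": (33, 87 - 33),
    "60 to 70": (77, 230 - 77),
    "70 and above": (60, 249 - 60),
}

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    rows = list(table)
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            # Expected count if age group and toxicity were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof

chi2, dof = chi_square_independence(observed.values())

# Closed form valid only for dof == 2: P(X > x) = exp(-x / 2).
assert dof == 2
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```

It reproduces the X2 of about 8.06 for this table.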
 
Good call Public Health! Chi-square is quicker...

SimulD - keep in mind that chi-square is a non-parametric test, and non-parametric tests generally have less power than parametric ones when the parametric assumptions hold. If your outcome were a continuous measurement that met those assumptions, a parametric test (e.g. ANOVA) would be the better choice ... but chi-square is the right tool when the data don't meet them, as with a yes/no outcome like yours.
 
I figured out this morning that a chi-square test on a contingency table was what I needed. I'm glad you all confirmed it ...

I ended up getting a Lange biostats book that I'm going to go through - it seems important enough to have at least a loose grasp of all this stuff.

Thanks again,
Simul
 
Yes, a chi-square analysis is the best for that! Did you finish your project?
 
Chi square test or Z tests (for proportions) will do just fine. The Z test is pretty cool because it will tell you exactly where the significant differences are, not simply that there is one somewhere in your 3 groups.
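A minimal sketch of the pairwise Z tests, using the counts from the original post. The pooled standard error, the two-sided p-value, and the Bonferroni threshold of 0.05/3 are assumptions chosen for illustration; other corrections are possible.

```python
import math
from itertools import combinations
from statistics import NormalDist

# Counts from the original post: (men with grade 1 toxicity, group size)
groups = {
    "under 60": (33, 87),
    "60 to 70": (77, 230),
    "70 and above": (60, 249),
}

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for the difference of two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Three pairwise comparisons, so each is held to alpha / 3 (Bonferroni).
threshold = 0.05 / 3

results = {}
for (a, (xa, na)), (b, (xb, nb)) in combinations(groups.items(), 2):
    results[(a, b)] = two_proportion_z(xa, na, xb, nb)

for (a, b), (z, p) in results.items():
    verdict = "significant" if p < threshold else "not significant"
    print(f"{a} vs {b}: z = {z:.2f}, p = {p:.4f} ({verdict} at {threshold:.4f})")
```

Because this runs three tests on the same data, the per-pair threshold has to be tightened, which is the multiple-comparisons issue raised earlier in the thread.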
 
I know this is old, but commenting for anybody seeing this in the future. If individual ages are available, I believe it would be a better analysis to use that continuous data instead of condensing it into three categories. Converting continuous data to categorical data loses a lot of information.
 