Any chi-square experts out there?

Dr JPH

I have a question for anyone well versed in chi squares.

I have a set of data split into two groups, A and B.

A = left handed people (10% of population)
B = right handed people (90% of population)

My data is showing the chance that a person in a given age group will have a heart attack, and then I am breaking it down into groups A and B.

(not my real data, but simplified example).

SO...I plug in the RAW data.

But, when I go to get my expected data I think there is a problem.

Let's say there are 100 people between the ages of 60-70 that have MIs.

I would EXPECT 10 of them to be left handed (group A) and 90 of them to be right handed (group B).

But when I do the equation for the chi square analysis (column A x row 60-70y)/total , my number comes out too high in my mind.

So what I am asking...HOW do I alter the equation for inputting EXPECTED data into a chi square to reflect the population trend?

It seems the data I am getting in my EXPECTED is assuming that both A and B are equally likely in the population, which is NOT the case.

Can anyone help?

 
I think you are using the wrong tool - Chi square is a very poor method to use for that type of analysis. It has a high Beta error and does not do well with a stratified sample/analysis like you are suggesting in your example (by age and characteristics) - I would say SPSS with a cross tabs analysis would give you more defensible results. The package is sold at most university book stores and does a ton of other stats.

Also you get your expected results from a normalized table (bell shaped). Generally, when you are searching for a physical characteristic to be related to an event - left handed - right handed - eye color - height - etc with an MI - CVA - etc you would do better with a regression type analysis and looking at the r value coefficients. Hope this helps.
 
I agree with james... use regression (linear might be okay in this case since the data is probably linear unless you have infants or some factor that gives you more heart attacks at younger ages). You're gonna get a p value if there is a relationship and an r that shows you how well your line fits. You can then use a Fisher test to check for r difference and.... *bah too much to explain that part*.

Moral of the story: Chi-square is good for Yes/No results not a range of ages.
 
Yeah, but JPHazelton has dichotomous data. Chi square would be ok, as long as the "expected" cells aren't too low. (The rule of thumb is that you can't have any expected cells under 5). If you have 100 in the Age Group 60-70 with AMI and that breaks down to expected of 10 and 90, you'd be ok in that row. Of course, you haven't mentioned your other row(s) yet. If you do have cells with sparse numbers, you have to use Fisher's exact test.

If you have multiple age rows, you can still use chi-square for general associations (hypothesis: are the proportions of L-handed and R-handed equal across ALL age groups?) and you can also do tests for trend (hypothesis: do the proportions of handedness increase/decrease as age group increases?).

But, if you go for regression methods (which you'd need to if you have more than 1 or 2 other factors you'd want to adjust for), you should use logistic (not linear) regression. You have categorical data (nominal and ordinal), not continuous (unless you convert age groups back to years, and your hypothesis has to do with predicting AGE, not predicting handedness).

(everyone has an AMI, right?)
 
Hey JPHazelton, You have gotten some darn good advice here. Let us know which direction you head in and what you get in your results.
 
Can anyone recommend a good book for getting up to speed on how to use this sort of thing correctly?
 
Thank you for all the advice.

I am still collecting information and I am trying to read up on all the different methods of statistical analysis.

Hopefully in the next few months I will be able to sit down and lay the data out to break it down. Right now with interviews it's tough!

Thanks again!

BTW, here is a preliminary set of data:

Raw Data:

Collected from patients who have had heart attacks; the question asked was "are you right handed or left handed".

AGE      LEFT   RIGHT   TOTAL
50-59       3       6       9
60-69       7       6      13
70-79       9       8      17
80-89       3       5       8
TOTAL      22      25      47

Again...I would "expect" that only 25% of my heart attack patients would be left handed (assuming that 25% of the population is left handed), though this does not seem to be the case. The 25% may not be the correct number...just used for illustration purposes.

Chi Square of this data gives me a significance of 0.694 when I calculate the "expected". But if I insert the expected data as to what it should be by percentages (25% of the total goes in the LEFT column, 75% in the RIGHT) then my significance is 0.002.

That's my reason for initially questioning whether Chi Square is the most appropriate test.
 

Chi-square is the correct test... but your problem is as you stated... the population normally doesn't have equal proportions of left and right handed people.

The calculation will depend on what you are trying to show... a) In a regular population of patients with MI there are more lefties with MI than righties? or ... b) Lefties in general are more likely to have MI than righties.

Very subtle difference.
 
You have to be careful to state your hypothesis in terms of the variables you have. If you perform a chi-sq on the table that you set up, you're testing Age (rows) vs. Handedness (columns). All of the patients have AMI and you don't have a reference group, so it's not quite a "Lefties have more AMI than expected" question yet.

The strict null hypothesis for a two variable chi-square is "[Row variable] is not associated with [Column variable]" (and of course, the alternative hypothesis is that they are associated). The "expected" values come from your calculations of the row total * the column total divided by the total sample. So, it's ok that lefties and righties aren't 50/50. The test will be significant if you get shifts in the leftie/rightie ratio that vary with the rows (age groups) (in other words, if you saw 10% lefties in the young group and 80% lefties in the older age group, the chi-sq test might pick that up).
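For anyone who wants to check that arithmetic by hand, here is a minimal Python sketch using only the standard library and the preliminary counts posted earlier in the thread. The function names are made up for illustration, and the p-value helper is a closed form that only applies to 3 degrees of freedom, which is what a 4 x 2 table has: (4 - 1) * (2 - 1) = 3.

```python
import math

# Preliminary counts from the thread: age group -> (left, right).
observed = {
    "50-59": (3, 6),
    "60-69": (7, 6),
    "70-79": (9, 8),
    "80-89": (3, 5),
}

def expected_counts(table):
    """Expected cell = row total * column total / grand total."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    return [[sum(row) * c / grand_total for c in col_totals] for row in rows]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    rows = list(table.values())
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for row, exp_row in zip(rows, exp)
        for o, e in zip(row, exp_row)
    )

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

stat = chi_square_statistic(observed)
p_value = chi_square_sf_df3(stat)

# Rule of thumb mentioned in the thread: expected cells under 5 are a warning sign.
small_cells = sum(e < 5 for row in expected_counts(observed) for e in row)

print(f"chi-square = {stat:.3f}, p = {p_value:.3f}, expected cells under 5: {small_cells}")
```

Because the expected counts are built from the table's own column totals, the 22/25 split is already baked in, so this test only asks whether the left/right mix changes from one age group to the next.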

You can also test the proportion of lefties versus a particular number (which sounds like the direction JPHazelton was going). If you aren't as interested in what happens with the age groups, you can just test the overall hypothesis that "Proportion(lefties) equals 0.25 [or whatever your expected percent of lefties is]" (null) vs. "Proportion(lefties) is significantly different from 0.25" (alternative). You could use a chi-sq, or a binomial test, for that hypothesis.
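A rough sketch of both options in plain Python, assuming the 25% figure that was given purely as an illustration and the 22 left / 25 right totals from the preliminary table. The variable names are invented for the example, and the two-sided binomial p-value uses one common convention (add up every outcome that is no more likely than the observed one).

```python
import math

left, right = 22, 25      # column totals from the preliminary table
n = left + right
p_left = 0.25             # illustrative population share of lefties, not a confirmed figure

# Option 1: chi-square goodness of fit, 1 degree of freedom.
expected = [n * p_left, n * (1 - p_left)]
stat = sum((o - e) ** 2 / e for o, e in zip([left, right], expected))
p_chi2 = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square with df = 1

# Option 2: exact binomial test of Proportion(lefties) = p_left.
def binom_pmf(k, trials, p):
    """Probability of exactly k lefties among `trials` patients."""
    return math.comb(trials, k) * p ** k * (1 - p) ** (trials - k)

p_obs = binom_pmf(left, n, p_left)
p_binom = sum(
    binom_pmf(k, n, p_left)
    for k in range(n + 1)
    if binom_pmf(k, n, p_left) <= p_obs * (1 + 1e-9)   # small tolerance for float ties
)

print(f"expected counts = {expected}")
print(f"chi-square = {stat:.3f}, p = {p_chi2:.5f}")
print(f"exact binomial two-sided p = {p_binom:.5f}")
```

Here the expected counts come from the outside population figure rather than from the table's own totals, which is the difference between this test and the age-by-handedness one.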


Books: It's hard to get something that's comprehensive and still easy enough to use without having course work in the topic. But, one option is PDQ Statistics. It's set up for clinical researchers who need to know the basics of statistical tests.
 
By the way, I went ahead and ran your chi-sq in SAS and got the same p-value you did (p=0.694). That's because it is testing the association between age and handedness, and does not account for your different "expected" proportion of handedness. Also, the numbers will still be too low for chi-sq until you get the sample counts higher (Fisher's exact test yielded a p of 0.74).
 
Same results here when I did it last night.

Again you come back to the same points I made earlier... what is the hypothesis..
 
Starting to make more sense.

I am not so concerned about the age groups as I am about the entire population, LEFT vs NON LEFT as a whole.

So the chi square cannot account for the different population variance?

Does the Fisher exact test account for this?

I borrowed a friend's stats book and I am going to flip through later.

Oh why didn't I pay attention in stats in college?!?! 😛
 
All Fisher's exact test does extra is adjust for populations below 100.


What you need to do is this.

Since you don't care about the age groups.. sum up all the people who are left handed who had an MI... then sum the right handed people.

That would be 22 vs 25

Take the same number of people who didn't have an MI.. the total number of people is 47.

In a regular population of 47 people.. 25% are left handed ~12 and the rest are right handed ~35.

Now you can do a Fisher's exact test... and the answer is. p = 0.0526....oooooooh so close. (Round down and it becomes significantly different).

Congrats... more left handies got MI than what you expect from a regular population. You can put my name on the manuscript at the end... that's Faebinder with an F. hehe..
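For reference, a 2x2 Fisher's exact test can be sketched in plain Python as below. The function name is invented for the example, and the two-sided p-value uses one common convention (sum the probability of every table that is no more likely than the observed one), which may not match every stats package. The example call mirrors the setup in the post above, where the second row holds the counts expected from the population split rather than a separately sampled comparison group.

```python
import math

def fisher_exact_2x2(a, b, c, d):
    """Return (one_sided, two_sided) p-values for the table [[a, b], [c, d]].

    one_sided: chance of a top-left count of at least `a`, margins held fixed.
    two_sided: total probability of all tables no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    one_sided = sum(prob(x) for x in range(a, hi + 1))
    two_sided = sum(
        prob(x) for x in range(lo, hi + 1)
        if prob(x) <= p_obs * (1 + 1e-9)   # small tolerance for float ties
    )
    return one_sided, two_sided

# Row 1: observed left/right among MI patients; row 2: the ~12/~35 split from the post.
one_tail, two_tail = fisher_exact_2x2(22, 25, 12, 35)
print(f"one-tailed p = {one_tail:.4f}, two-tailed p = {two_tail:.4f}")
```

The one-tailed value only counts tables with at least as many lefties in the first row as observed, while the two-tailed value also counts equally unlikely tables in the other direction, so it is never smaller than the one-tailed value.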
 
Ok...wait...let me get this down...F-A-E...

😀

I really appreciate all the help from you folks. It will make this a whole lot easier!
 
Quick update.

With some more research I found my population variation to be closer to 23% vs 77% (Left vs Right)

With my data above I calculate a p value (using a 2-tail Fisher exact test) of 0.0299

I am expecting more data on Tuesday and will enter more numbers then.

Can someone explain the difference between one-tail and two-tail in the Fisher test?

My 2x2 matrix looks like this:

            Left   Right
Actual:       22      25   (ACTUAL DATA COLLECTED)
Expected:     11      36   (EXPECTED DATA BASED ON POPULATION VARIATION)
 