# biostat q

#### MudPhud20XX

5+ Year Member
Hi all, so I noticed that "Pearson correlation coefficients" are in FA. I got this q and need some help. Can anyone explain the concepts we gotta know for Step 1 in layman's terms? Also, if you could help me with this q, I would really appreciate it. Thank you.

An investigator is evaluating the health benefits of daily activity with use of the Minnesota Activity Survey (MAS). The MAS yields a score ranging from 100 to 0, with 100 indicating the highest activity level and 0 indicating the lowest. One such study shows MAS values compared with age and weight obtained during the routine health maintenance examinations. The results of this study are shown as Pearson correlation coefficients: (see the attached image)

Based on the data shown, which of the following is the most appropriate conclusion?

A. Age and weight are positively correlated
B. Age is a stronger predictor of MAS than is weight
C. Both age and weight are associated with activity level
D. Higher daily activity levels lead to reduced weight
E. The relationship of age to MAS is best conceptualized as linear


#### Keto

Pearson correlation tells us the degree to which 2 variables are correlated. It is used for 2 interval-level (continuous) variables [1] (e.g., height, weight, blood pressure, drug dosage, ...). Pearson looks only at linear relationships.
So you could, for example, answer the question: is there a linear correlation between exercise and blood pressure? Or between age and sleep duration?

• Then you plot the 2 variables and calculate your correlation. You get an r value.
• r ranges from -1 to +1.
• r = -1: perfect inverse (negative) linear relationship, like in the image posted above (more pack-years of smoking = shorter lifespan)
• r = +1: perfect positive linear relationship
• r = 0: no linear relationship (there can still be a nonlinear one)

If a p value is reported with a Pearson correlation, it tells you whether the correlation is statistically significant. By convention, p < 0.05 indicates a significant correlation.

If r^2 is reported, it's called the coefficient of determination, which answers the question: how much of the variation in y is explained by x?
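To make r and r^2 concrete, here is a minimal Python sketch that computes Pearson's r from scratch. The data are made up for illustration (not from the question), and `pearson_r` is a hypothetical helper, not a library function. The second data set is a perfect but U-shaped relationship that still gives r = 0, which is why r = 0 means no linear relationship rather than no relationship at all.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: pack-years vs. age at death, a perfectly straight downward line
pack_years = [0, 10, 20, 30, 40]
age_at_death = [85, 80, 75, 70, 65]
r = pearson_r(pack_years, age_at_death)
print(f"r = {r:.2f}, r^2 = {r ** 2:.2f}")  # prints "r = -1.00, r^2 = 1.00"

# Hypothetical U-shaped data: y is completely determined by x, yet r is 0
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
print(f"r = {pearson_r(x, y):.2f}")
```

The same r and r^2 logic applies to any pair of interval variables; for ordinal data you would rank the values first (Spearman).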

But always remember: Correlation does not mean causation.

With that being said, try to answer your question from above and we can discuss it afterwards!

Have a great day!

[1] If you want to compare 2 ordinal-level variables, use Spearman correlation instead.


#### MudPhud20XX

5+ Year Member
Dang. Thanks so much!!!

Okay, so looking at the table, we have two correlations: MAS vs. weight with r = -0.43 and MAS vs. age with r = +0.31. So weight is weakly negatively associated with MAS and age is weakly positively associated with MAS, right? So if I use process of elimination, C looks like the answer. Does this sound correct?

#### Keto

Hi.


A. Age and weight are positively correlated: No, weight is negatively correlated.
B. Age is a stronger predictor of MAS than is weight: No, weight is more strongly correlated (|r| closer to 1).
C. Both age and weight are associated with activity level: There is a significant relationship between MAS and weight (negative) and between MAS and age (positive).
D. Higher daily activity levels lead to reduced weight: This is a causation statement; Pearson only tells us about correlation.
E. The relationship of age to MAS is best conceptualized as linear: Not clear, as another function might represent the relationship better than a straight line.
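Taking the two r values quoted in this thread (r = -0.43 for MAS vs. weight, r = +0.31 for MAS vs. age), a short Python sketch shows why B fails: strength is judged by |r| (or r^2), while the sign only gives direction. The values are the ones the OP read from the table; the table itself is not reproduced here.

```python
# r values as quoted in the thread: MAS vs. weight and MAS vs. age
correlations = {"weight": -0.43, "age": 0.31}

for variable, r in correlations.items():
    direction = "negative" if r < 0 else "positive"
    # r^2: share of the variation in MAS explained by a straight-line fit on this variable
    print(f"MAS vs. {variable}: r = {r:+.2f} ({direction}), r^2 = {r ** 2:.3f}")

# Strength of a correlation depends on |r|, not on its sign
stronger = max(correlations, key=lambda name: abs(correlations[name]))
print(f"Stronger correlate of MAS: {stronger}")
```

Weight explains roughly 18% of the variation in MAS and age roughly 10%, so both are associated with activity level but neither comes close to explaining it fully.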

I would also go with C

Have a great day


#### MudPhud20XX

5+ Year Member
Keto you da best!


#### MudPhud20XX

5+ Year Member
Alright, so I got these 3 biostat questions and was wondering if I could get some help and feedback.

1. A study is conducted to evaluate intelligence quotient (IQ) scores for pts with different types of schizophrenia. Results for the 100 pts who complete the test show an average IQ of 110 with a standard deviation of 20. Which of the following is the best estimate of the 95% confidence interval for the mean in this sample?

A. 70 to 130
B. 70 to 150
C. 85 to 115
D. 90 to 130
E. 105 to 115
F. 106 to 114

2. At a local medical center, the incidence of needlestick injuries among nurses in their first year of employment is 3 times higher than among nurses working 3 years or more. Wishing to reduce the incidence of needlestick injuries, the hospital administration hires a consultant to determine if the hospital should hire only those nursing-school graduates whose training includes formal instruction in universal precautions. To test this idea, 40 nurses who reported needlestick injuries during the first year of employment are compared with 40 nurses who reported no needlestick injuries during their first year. The schools of all 80 nurses are contacted and asked about formal training programs for universal precautions. The results of the study are presented below:

A. (19/53)/(21/27)
B. (19/40)/(34/40)
C. (19X34)/(21X6)
D. (21/27)/(19/53)
E. (21/40)/(6/40)
F. (21X34)/(6X19)
G. (19/53) - (21/27)
H. (21/27) - (19/53)

3. In a city with a population of 1 million, 10,000 individuals have AIDS. There are 1,000 new cases of AIDS and 200 deaths each year from the dz. There are 2,500 deaths per year from all causes. Assuming no net emigration from or immigration to the city, what is the incidence of AIDS in this city?

#### W19

2+ Year Member
#1) 2 SDs below and above the average give you the 95% CI. Since the average is 110 and the SD is 20, that's (110 - 40) to (110 + 40), so 70 to 150 is the 95% CI.

I'll let someone else take a bite at the others... I took biostats the first week of MS1 and I've forgotten this stuff.


#### MudPhud20XX

5+ Year Member
Thanks for the feedback. That's exactly what I did but I got it wrong.

#### W19

2+ Year Member
Would you mind sharing the right answer?


#### MudPhud20XX

5+ Year Member
I will but let's wait until we get some more feedback on this. Thank you.


#### Patau

2+ Year Member
CI = range from [mean - Z(SEM)] to [mean + Z(SEM)]

For 95% Z = 1.96 or approximately 2

SEM = SD/square root(N)

So plugging in the numbers: [110 - 2(20/10)] to [110 + 2(20/10)], i.e., 110 - 4 to 110 + 4

106 to 114

I think you guys might have missed the idea of what a confidence interval is: a range of values constructed so that there is a specified probability (here 95%) that it contains the true value of a parameter.

Another way of saying that is, "I am 95% confident that the true value of the parameter (the average IQ of schizophrenia patients) is in our confidence interval (106 to 114)."

I think the +40/-40 you calculated is the 95% range of the data, i.e., about 95% of the individual IQ scores will fall into that range. It describes the spread of individual scores, not the precision of the mean.
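Here is a small Python sketch of the formula above, using the normal approximation (z = 1.96, or 2 rounded) as in the post. The function names are illustrative. It also computes mean ± 2 SD for contrast, to show that 70 to 150 describes individual scores rather than the uncertainty in the mean.

```python
import math


def ci_for_mean(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a population mean."""
    sem = sd / math.sqrt(n)  # standard error of the mean = SD / sqrt(N)
    return mean - z * sem, mean + z * sem


def range_of_individuals(mean, sd, k=2):
    """Rough range holding about 95% of individual observations (mean +/- 2 SD)."""
    return mean - k * sd, mean + k * sd


low, high = ci_for_mean(110, 20, 100)          # z = 1.96: about 106.08 to 113.92
low2, high2 = ci_for_mean(110, 20, 100, z=2)   # rounded z = 2: 106 to 114
spread = range_of_individuals(110, 20)         # 70 to 150: individuals, not the mean
print(f"95% CI (z = 1.96): {low:.2f} to {high:.2f}")
print(f"95% CI (z = 2):    {low2:.0f} to {high2:.0f}")
print(f"~95% of individual scores: {spread[0]} to {spread[1]}")
```

Because the SEM shrinks with sqrt(N), quadrupling the sample size would halve the width of the CI, while the ±2 SD range of individual scores would stay about the same.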


#### MudPhud20XX

5+ Year Member
Dang Patau, you're good. Many thanks for the explanation. Could you or anyone help us with the other 2 questions?

#### Patau

2+ Year Member
For number two, I do not see a question...

For number three:

First you have to know what incidence means: the rate at which new cases occur in a population. Sometimes we think we know what that means but in reality we do not, so rely on the formula to help you remember.

As a formula: Incidence = number of new cases in a period / total number of persons at risk.

Now you must understand what the formula is telling you. A certain number of new events (cases of disease) occur in a year (studies typically use years, but it could be months, decades, etc.), and we divide these new cases by the total number of persons at risk, because they are the ones who can be affected.

So now that we know that the number of new cases in a period divided by the at-risk population gives us incidence, we can plug in numbers. What is the number of new cases (in this case, of AIDS)? That is 1,000 (it says so word for word in the question). What is the at-risk population? That is 990,000: the total population of 1 million minus the 10,000 people who already have AIDS. The reason you subtract 10,000 is that people who already have AIDS can't be at risk of getting AIDS again. I think of it as a conversation between two buddies: Friend #1: "Bruh, you're at risk for getting AIDS." Friend #2: "Oh, you didn't know? I'm not at risk because I already have AIDS." The point is the same: you cannot be at risk for AIDS if you already have AIDS. AIDS can be substituted for any disease, such as herpes: you cannot be at risk for herpes if you already have herpes.

1,000/990,000 ≈ 0.00101, or about 1.01 new cases per 1,000 persons at risk per year
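The same arithmetic as a minimal Python sketch. Following the reasoning in this thread, deaths are left out of the denominator and only the 10,000 existing cases are subtracted. The helper name is illustrative.

```python
def incidence(new_cases, population, existing_cases):
    """Incidence = new cases in the period / persons at risk.

    People who already have the disease are removed from the denominator,
    since they cannot become new cases.
    """
    at_risk = population - existing_cases
    return new_cases / at_risk


inc = incidence(new_cases=1_000, population=1_000_000, existing_cases=10_000)
print(f"Incidence = {inc:.5f} per year, about {inc * 1000:.2f} per 1,000 at risk")
```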


#### MudPhud20XX

5+ Year Member
Thank you again, Patau. Shouldn't we subtract the number of deaths from the denominator? Dead people can't be counted among the persons at risk, right?


#### Patau

2+ Year Member
I see your point, but think of the deaths as a distraction. In any given population (here 1 million) there will be some deaths in a year (2,500 from all causes and 200 from AIDS). Incidence does not take deaths into account, only the population at risk. Even though deaths occur, these people were alive at some point during the year and were at risk of getting the disease (AIDS).

This study takes into account data from previous years and asks you to solve for incidence. Yes, deaths occur; however, they are not reflected in the incidence because those people were alive and at risk when the data were being recorded.

I see a passage for question 2 but no question... can you reread the question and see if you missed copying a line or something?


#### MudPhud20XX

5+ Year Member
Got it, so disregard death in incidence! Thanks so much.

Here is the actual question for #2:

"Which of the following formulas would most likely be used to analyze data from this study?"

#### Patau

2+ Year Member
Okay that makes more sense now...

When you analyze data from different types of studies, there are measures that correspond to each type of study. In this example, a case-control study is analyzed with the odds ratio. You are essentially comparing a group of people with the disease (nurses with needlestick injuries) to a group of people without the disease (nurses without injuries). You then look at a prior exposure or risk factor (formal training). Then you ask the question, "What happened?" (Did they get proper training?)

So you have to realize what type of study this is: Case-Control.
What formula is used for Case-control? Odds ratio.
What is the formula for Odds ratio? (A x D)/ (B x C)
(21 x 34) / (6 x 19)

The odds ratio tells you that nurses who had no formal training had higher odds of needlestick injury than those who had formal training. Keep in mind that this is an association: a case-control study cannot by itself establish that formal training causes fewer injuries.
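The study's results table did not come through in this thread, so the Python sketch below assumes a layout consistent with the answer choices (row totals 27 and 53, 40 cases and 40 controls) and with the reading above that lack of formal training is the exposure. Under that assumption, answer F is the cross-product ratio (A x D)/(B x C).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as:

                     cases   controls
        exposed        a        b
        unexposed      c        d

    OR = (a * d) / (b * c)
    """
    if b * c == 0:
        raise ZeroDivisionError("b or c is zero, so the odds ratio is undefined")
    return (a * d) / (b * c)


# ASSUMED layout reconstructed from the answer choices (not the original table):
# exposure = no formal universal-precautions training
# cases = nurses with needlestick injuries, controls = nurses without
a, b = 21, 6    # no formal training: injured, not injured (row total 27)
c, d = 19, 34   # formal training: injured, not injured (row total 53)
print(f"OR = {odds_ratio(a, b, c, d):.2f}")
```

Options A and D, by contrast, divide risks computed across rows, which a case-control design does not support because the 40/40 split of cases and controls was fixed by the investigators.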


#### MudPhud20XX

5+ Year Member
A study is conducted to determine the efficacy of a new drug in preventing hospitalization of patients from HIV-related pneumonia. Medical records from participating physicians are used to select one group of pts who received the drug in the year preceding the observation period (Group A), and a second group of pts who had not received the drug in the preceding year (Group B). Both groups are followed for a 3 yr period to determine the number of hospitalizations and the number of deaths from all causes. The number of hospitalizations for Group A was lower (p = 0.001) than for the control group. However, the mortality rate from all causes was found to be higher (p = 0.01) in Group A. Based on these results, the researchers conclude that the drug should not be used. This conclusion is most likely invalid because of which of the following scenarios?

A. Expectations of the researchers affected the outcome of the study
B. Knowledge about whether pts had taken the drug or not biased the measurement of the outcome variables
C. Pts who received the new drug may be less healthy than those who did not
D. Subjects fail to accurately recall events in the past
E. The existence of a Hawthorne effect


#### MudPhud20XX

5+ Year Member
Anyone?


#### MudPhud20XX

5+ Year Member
Can anyone help me with this too?

A study is conducted to examine the relationship between alcohol consumption and medical school performance. Participants in this study were classified as abstainers, light drinkers, or heavy drinkers using established Research Diagnostic Criteria. The same participants were also classified as being in the top, middle, or bottom of their class. Results showed that students in the top or the bottom of the class were more likely to be heavy drinkers, with a p-value of < 0.01. Which of the following statistical tests was most likely used to generate this result?

A. Analysis of variance
B. Chi-square
C. Matched pairs t-test
D. Meta-analysis
E. Pearson correlation coefficient
F. Pooled t-test

#### TheNsg300

##### A Neurosurgeon in the making :)
2+ Year Member
I would go with C, selection bias. Because medical records were used to select the patients in group A, the likelihood of having received the experimental drug would be associated with recent admissions and more comorbidities than in group B patients, who didn't receive the drug.


#### MudPhud20XX

5+ Year Member
Officials of a large community hospital report an increased incidence of AML among children age 5-12 yrs. They observe that some households in the community are exposed to chemical waste from a nearby factory and worry that exposure to this waste is responsible for the increased incidence of AML. A case-control study is designed to evaluate the hospital officials' claim that exposure to chemical waste increases the risk for developing AML in childhood. Which of the following populations is most likely to function as the control group?

A. Children who do not have AML and are exposed to chemical waste
B. Children who do not have AML and are not exposed to chemical waste
C. Children who do not have AML, regardless of exposure status to chemical waste
D. Children who have AML and are exposed to chemical waste
E. Children who have AML and are not exposed to chemical waste
F. Children who have AML, regardless of exposure status to chemical waste


#### MudPhud20XX

5+ Year Member
FA says selection bias is "Error in assigning subjects to a study group resulting in an unrepresentative sample. Most commonly a sampling bias." So is selection bias = sampling bias?

#### Prince090

5+ Year Member
I am torn between B and E...

If I had to go with one, it would be option E...

#### Keto

I would go with C. The question describes a case-control study. How are cases and controls selected in a case-control study? We select them by outcome (disease), not by risk factor (exposure). So D-F are wrong (those children have the disease), and A-B are wrong (they select on exposure).
The cases would be children with AML, and the matched [1] controls would be children without AML (regardless of exposure to chemical waste).

In a cohort study (prospective or retrospective), you would select patients by risk factor (exposure) and then follow them up.
The Boston University statistics website [2] explains this also very well:

1. The comparison group ("controls") should be representative of the source population that produced the cases.
2. The "controls" must be sampled in a way that is independent of the exposure, meaning that their selection should not be more (or less) likely if they have the exposure of interest.
Think of your 2x2 table: across the top you have AML+ and AML-, and down the side you have exposure+ and exposure-. You want to fill this table for a case-control study in order to calculate the odds ratio. Say you have 100 AML+ cases (90 exposure+ and 10 exposure-). You cannot select controls by outcome AND exposure (as proposed answers A and B do), because your AML- column would end up as something like 600 exposure+ and 0 exposure-, or 0 exposure+ and 600 exposure-. What you want are matched [1] controls that are selected by disease status (AML-); then you look at their exposures and risk factors (you might find something like 200 exposure+ and 400 exposure-).

[2] http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Case-Control/EP713_Case-Control5.html
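Keto's hypothetical numbers can be plugged into a short Python sketch to show what goes wrong when controls are selected by exposure: the odds ratio is then fixed by the sampling scheme instead of by the data. The counts are the made-up ones from the post above; the function name is illustrative.

```python
import math


def odds_ratio(a, b, c, d):
    """OR = (a * d) / (b * c) for a 2x2 table:

                     AML+ (cases)   AML- (controls)
        exposure+         a               b
        exposure-         c               d

    Returns inf (or nan) instead of raising when b * c == 0, to expose the degenerate cases.
    """
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


cases_exposed, cases_unexposed = 90, 10  # Keto's hypothetical 100 AML+ cases

# Controls sampled by disease status only (answer C): exposure found afterwards
print(odds_ratio(cases_exposed, 200, cases_unexposed, 400))  # 18.0
# Controls chosen as exposed only (answer A): OR is forced to 0
print(odds_ratio(cases_exposed, 600, cases_unexposed, 0))    # 0.0
# Controls chosen as unexposed only (answer B): OR is forced to infinity
print(odds_ratio(cases_exposed, 0, cases_unexposed, 600))    # inf
```

Only the first scheme lets the controls' exposure distribution come from the data, which is what makes the resulting odds ratio interpretable.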


#### MudPhud20XX

5+ Year Member
Researchers are studying the relationship between essential hypertension and a common mutation in the structure of a sodium channel protein. A study population is randomly selected and blood samples are obtained for leukocyte genotyping. The prevalence of hypertension is determined based on mean blood pressure measurements obtained using standardized ambulatory blood pressure monitoring conducted over 1 week. Based on the analysis results, the researchers conclude that the sodium channel structure mutation is associated with hypertension. Which of the following best describes the study design used by the investigators?

A. case-control study
B. cross-sectional study
C. prospective cohort study
D. randomized clinical trial
E. retrospective cohort study