biostat q


MudPhud20XX

Hi all, I noticed that Pearson correlation coefficients are in FA. I got this q and need some help. Can anyone explain the concepts we need to know for Step 1 in layman's terms? Also, if you could help me with this q, I would really appreciate it. Thank you.

An investigator is evaluating the health benefits of daily activity with use of the Minnesota Activity Survey (MAS). The MAS yields a score ranging from 100 to 0, with 100 indicating the highest activity level and 0 indicating the lowest. One such study shows MAS values compared with age and weight obtained during the routine health maintenance examinations. The results of this study are shown as Pearson correlation coefficients: (see the attached image)

Based on the data shown, which of the following is the most appropriate conclusion?

A. Age and weight are positively correlated
B. Age is a stronger predictor of MAS than is weight
C. Both age and weight are associated with activity level
D. Higher daily activity levels lead to reduced weight
E. The relationship of age to MAS is best conceptualized as linear

 

Attachment: pearson.jpg
Pearson correlation tells us the degree to which 2 variables are linearly correlated. Both variables should be measured on an interval scale [1] (e.g., height, weight, blood pressure, drug dosage, ...). Pearson looks only at linear relationships.
So you could, for example, ask whether there is a linear correlation between exercise and blood pressure, or between age and sleep duration.

2yO8guU.png

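  • Then you plot the 2 variables against each other and calculate the correlation. You get an r value.
  • r ranges from -1 to +1.
  • r = -1
    perfect inverse (negative) linear relationship; values close to -1 indicate a strong inverse relationship
    like in the image posted above (more pack-years of smoking = decreased lifespan)
  • r = +1
    perfect positive linear relationship; values close to +1 indicate a strong positive relationship
  • r = 0
    no linear relationship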
pearson-correlation-coefficient-interpretation.jpeg


If a p value is reported with a Pearson correlation, it tells you about the statistical significance of the correlation. By convention, p < 0.05 indicates a significant correlation.

If r^2 is reported, it's called the coefficient of determination, which answers the question of how much of the variation in y is explained by x.
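To make r and r^2 concrete, here is a minimal Python sketch. The `pearson_r` helper and the pack-years/lifespan numbers are made up for illustration (they are not from any study); the last loop just squares the two r values quoted from the question's table.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (numerator of r)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (denominator of r)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, invented for illustration only
pack_years = [0, 5, 10, 20, 30, 40]
lifespan = [85, 82, 80, 74, 70, 66]

r = pearson_r(pack_years, lifespan)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")

# r values quoted from the question's table: MAS vs. weight, MAS vs. age
for label, r_value in [("weight", -0.43), ("age", 0.31)]:
    print(f"{label}: r = {r_value:+.2f}, r^2 = {r_value ** 2:.2f}")
```

The sign of r gives the direction, the absolute value gives the strength, and r^2 gives the share of variation explained, so an r of -0.43 explains more variation than an r of +0.31.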

But always remember: Correlation does not mean causation.

With that being said, try to answer your question from above and we can discuss it afterwards!

Have a great day!


[1] If you want to compare 2 ordinal level variables, look for Spearman correlation.
 
Dang. Thanks so much!!!

Okay, so looking at the table, we have two correlations: MAS vs. weight has r = -.43 and MAS vs. age has r = +.31. So weight is weakly negatively associated with MAS and age is weakly positively associated with MAS, right? So if I use process of elimination, C looks like the answer. Does this sound correct?
 
Hi.



A. Age and weight are positively correlated -> No, weight is negatively correlated.
B. Age is a stronger predictor of MAS than is weight -> No, weight is more strongly correlated (its r is closer to +/-1).
C. Both age and weight are associated with activity level -> There is a significant relationship between MAS and weight (negative) and between MAS and age (positive).
D. Higher daily activity levels lead to reduced weight -> This is a causation statement; Pearson only tells us about correlation.
E. The relationship of age to MAS is best conceptualized as linear -> Not clear, as there might be another function that represents the relationship better than a straight line.


I would also go with C

Have a great day
 
Alright, so I got these 3 biostat questions and was wondering if I could get some help and feedback.

1. A study is conducted to evaluate intelligence quotient (IQ) scores for pts with different types of schizophrenia. Results for the 100 pts who complete the test show an average IQ of 110 with a standard deviation of 20. Which of the following is the best estimate of the 95% confidence interval for the mean in this sample?

A. 70 to 130
B. 70 to 150
C. 85 to 115
D. 90 to 130
E. 105 to 115
F. 106 to 114

2. At a local medical center, the incidence of nurse-related needlestick injuries in the first year of employment is 3 times higher than that of nurses working 3 years or more. Wishing to reduce the incidence of needlestick injuries, the hospital administration hires a consultant to determine if the hospital should hire only those nursing-school graduates whose training includes formal instruction in universal precautions. To test this idea, 40 nurses who reported needlestick injuries during the first year of employment are compared with 40 nurses who reported no needlestick injuries during their first year. The schools of all 80 nurses are contacted and asked about formal training programs for universal precautions. The results of the study are presented below:

upload_2015-9-5_10-21-0.png


A. (19/53)/(21/27)
B. (19/40)/(34/40)
C. (19X34)/(21X6)
D. (21/27)/(19/53)
E. (21/40)/(6/40)
F. (21X34)/(6X19)
G. (19/53) - (21/27)
H. (21/27) - (19/53)

3. In a city with a population of 1 million, 10,000 individuals have AIDS. There are 1,000 new cases of AIDS and 200 deaths each year from the dz. There are 2,500 deaths per year from all causes. Assuming no net emigration from or immigration to the city, what is the incidence of AIDS in this city?
 
#1) 2 SD below and above the average gives you the 95% CI... Since the average is 110 and the SD is 20, then (110 - 40) and (110 + 40)... 70 to 150 is the 95% CI...

Let someone else take a bite at the other ones... I took biostats the first week of MS1... I forget this stuff.
 
Thanks for the feedback. That's exactly what I did but I got it wrong.
 
CI = range from [mean - Z(SEM)] to [mean + Z(SEM)]

For 95% Z = 1.96 or approximately 2

SEM = SD/square root(N)

So plugging in the numbers (N = 100, so square root(N) = 10): [110 - 2(20/10)] to [110 + 2(20/10)]

106 to 114

Know your equations!

I think you guys might have missed the idea of what a confidence interval is: a range of values defined so that there is a specified probability that the value of a parameter lies within it.

Another way of saying that is, "I am 95% confident that the true value of the parameter (average IQ of schizophrenia patients) is in our confidence interval (106 to 114)."

I think the +40/-40 gives the 95% range of the data, meaning about 95% of individual scores fall within 2 SD of the mean (70 to 150).
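Here is a small Python sketch of the arithmetic, assuming a normal approximation. The function name `mean_confidence_interval` is just an illustrative helper, not something from the question.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """Confidence interval for a mean: mean +/- z * SEM, where SEM = SD / sqrt(n)."""
    sem = sd / math.sqrt(n)
    return mean - z * sem, mean + z * sem

# Numbers from the question: mean IQ 110, SD 20, N = 100
low, high = mean_confidence_interval(110, 20, 100, z=2)  # z rounded to 2
print(low, high)  # 106.0 114.0

# For contrast: the range covering ~95% of individual scores (mean +/- 2 SD)
data_low, data_high = 110 - 2 * 20, 110 + 2 * 20
print(data_low, data_high)  # 70 150
```

The only difference between the two ranges is whether you use the SD (spread of individual scores) or the SEM (uncertainty of the mean).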

Dang Patau, you're good. Many thanks for the explanation. Could you or anyone help us with the other 2 questions?
 
For number two, I do not see a question...

3. In a city with a population of 1 million, 10,000 individuals have AIDS. There are 1,000 new cases of AIDS and 200 deaths each year from the dz. There are 2,500 deaths per year from all causes. Assuming no net emigration from or immigration to the city, what is the incidence of AIDS in this city?

First you have to know what incidence means: the rate at which new events occur in a population. Sometimes we think we know what that means, but in reality we do not, so rely on the formula to help you remember.

As a formula: Incidence = number of new events in a period / total number of persons at risk.

Now you must understand what the formula is telling you. A certain number of new events (cases of disease) occur in a period (studies typically measure in years, but it could be months, decades, etc.), and we divide these new events by the total number of persons at risk, because they are the ones who can be affected.

Now we can plug in numbers. What is the number of new events (in this case AIDS)? That is 1,000 new cases (it says so word for word in the question). What is the at-risk population? That is 990,000: the total population of 1 million minus the 10,000 people who already have AIDS. You subtract the 10,000 because people who already have AIDS cannot be at risk of getting AIDS again. I think of it as a conversation between two buddies. Friend #1: Bruh, you're at risk for getting AIDS. Friend #2: Oh, you didn't know? I'm not at risk because I already have AIDS. The same holds for any disease, such as herpes: you cannot be at risk for herpes if you already have herpes.

1,000/990,000
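The same calculation as a short Python sketch; the variable names are my own, and the numbers are taken straight from the question.

```python
# Numbers from the question
population = 1_000_000
existing_cases = 10_000      # people who already have AIDS (prevalent cases)
new_cases_per_year = 1_000   # incident cases

# People who already have the disease are not at risk of getting it again
at_risk = population - existing_cases

incidence = new_cases_per_year / at_risk
print(f"{new_cases_per_year:,}/{at_risk:,} = {incidence * 1000:.2f} per 1,000 at risk per year")
```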
Thank you again Patau. Shouldn't we subtract the number of deaths from the denominator? Since we can ignore dead people from the total number of persons at risk, right?

 
Got it, so disregard death in incidence! Thanks so much.

Sorry about it.

Here is the q for the 2nd q.

"Which of the following formulas would most likely be used to analyze data from this study?"
 
A study is conducted to determine the efficacy of a new drug in preventing hospitalization of patients from HIV-related pneumonia. Medical records from participating physicians are used to select one group of pts who received the drug in the year preceding the observation period (Group A), and a second group of pts who had not received the drug in the preceding year (Group B). Both groups are followed for a 3 yr period to determine the number of hospitalizations and the number of deaths from all causes. The number of hospitalizations for Group A was lower (p = 0.001) than for the control group. However, the mortality rate from all causes was found to be higher (p = 0.01) in Group A. Based on these results, the researchers conclude that the drug should not be used. This conclusion is most likely invalid b/c of which of the following scenarios?

A. Expectations of the researchers affected the outcome of the study
B. Knowledge about whether pts had taken the drug or not biased the measurement of the outcome variables
C. Pts who received the new drug may be less healthy than those who did not
D. Subjects fail to accurately recall events in the past
E. The existence of a Hawthorne effect
 
Anyone?
 
Can anyone help me with this too?

A study is conducted to examine the relationship between alcohol consumption and medical school performance. Participants in this study were classified as abstainers, light drinkers, or heavy drinkers using established Research Diagnostic Criteria. The same participants were also classified as being in the top, middle, or bottom of their class. Results showed that students in the top or the bottom of the class were more likely to be heavy drinkers, with a p-value of < 0.01. Which of the following statistical tests was most likely used to generate this result?

A. Analysis of variance
B. Chi-square
C. Matched pairs t-test
D. Meta-analysis
E. Pearson correlation coefficient
F. Pooled t-test
 
I would go with C: selection bias. Medical records were used to select the patients in Group A, and the likelihood of receiving the experimental drug would be associated with recent admissions and more comorbidities than in the Group B patients, who didn't receive the drug.
 
Officials of a large community hospital report an increased incidence of AML among children age 5-12 yrs. They observe that some households in the community are exposed to chemical waste from a nearby factory and worry that exposure to this waste is responsible for the increased incidence of AML. A case-control study is designed to evaluate the hospital officials' claim that exposure to chemical waste increases the risk for developing AML in childhood. Which of the following populations is most likely to function as the control group?

A. Children who do not have AML and are exposed to chemical waste
B. Children who do not have AML and are not exposed to chemical waste
C. Children who do not have AML, regardless of exposure status to chemical waste
D. Children who have AML and are exposed to chemical waste
E. Children who have AML and are not exposed to chemical waste
F. Children who have AML, regardless of exposure status to chemical waste
 
FA says selection bias is "Error in assigning subjects to a study group resulting in an unrepresentative sample. Most commonly a sampling bias." So is selection bias = sampling bias?
 

I am confused between b and e...


If I had to go with one, it would be option e...
 

I would go with C. The question describes a case-control study. How are cases and controls selected in a case-control study? We select them by outcome (disease), so D-F are wrong, and not by risk factor (exposure), so A-B are wrong.
The cases would be children with AML, and the matched [1] controls would be children without AML (regardless of exposure to chemical waste).

In a cohort study (or retrospective cohort study), you would select patients by risk factor (exposure) and then follow them up.
The Boston University statistics website [2] also explains this very well:

  1. The comparison group ("controls") should be representative of the source population that produced the cases.
  2. The "controls" must be sampled in a way that is independent of the exposure, meaning that their selection should not be more (or less) likely if they have the exposure of interest.

Maybe it helps to think about this question in the following way:
You have your 2x2 table, with AML+ and AML- on top and exposure+ and exposure- on the side. You want to fill in this table for a case-control study in order to calculate the odds ratio. You have your 100 AML+ cases (90 exposure+ and 10 exposure-). You cannot select controls by outcome AND exposure (as in answers A and B), because your table would then contain something like 600 AML- controls who are all exposure+ (600 exposure+ and 0 exposure-) or all exposure- (0 exposure+ and 600 exposure-). What you want is matched [1] controls who are selected by disease status (AML-); only then do you look at their exposure and risk factors (you might find something like 200 exposure+ and 400 exposure-).

Y3rJEXH.png
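With the illustrative counts from the example above (90/10 among cases, 200/400 among controls; these are made-up numbers for the explanation, not study data), the odds ratio can be sketched in Python like this. The `odds_ratio` helper is just a name I picked.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 table: (a * d) / (b * c), the cross-product ratio."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Illustrative 2x2 table from the example above
#              AML+ (cases)   AML- (controls)
# exposure+         90             200
# exposure-         10             400
or_aml = odds_ratio(90, 10, 200, 400)
print(or_aml)  # 18.0
```

Note that this only works because the controls were picked by disease status alone. If the controls were all exposed or all unexposed (answers A and B), one cell of the table would be 0 and the odds ratio would be 0 or undefined.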



[1] http://forums.studentdoctor.net/threads/disadvantages-of-case-control-study.1160004/
[2] http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Case-Control/EP713_Case-Control5.html
 
Researchers are studying the relationship between essential hypertension and a common mutation in the structure of a sodium channel protein. A study population is randomly selected and blood samples are obtained for leukocyte genotyping. The prevalence of hypertension is determined based on mean blood pressure measurements obtained using standardized ambulatory blood pressure monitoring conducted over 1 week. Based on the analysis results, the researchers conclude that the sodium channel structure mutation is associated with hypertension. Which of the following best describes the study design used by the investigators?

A. case-control study
B. cross-sectional study
C. prospective cohort study
D. randomized clinical trial
E. retrospective cohort study
 