Statistical Analysis of the Most Important Factors to gain an Acceptance


1viking

I don't post at all here, but I do read these forums religiously from 1 to 5 AM while my baby is awake. I found myself posting, however, the following and was asked by the readers to make a new thread. The repliers asked me to remove the statistics lingo, but I am leaving it as is since I want people to understand that this wasn't a wimpy study, but a robust scientific analysis. You can believe what you want. Maybe this will start a large discussion about some of the controversial (hahahaha) findings 🙂 . If you want the statistical output from SHAZAM, send me a private message and I will email it to you (I can send it in a Word Doc if you want). Enjoy!

What are the most important factors for medical school acceptance? I had this same question as I undertook a project for my econometrics class this last semester. I decided to run a heteroskedasticity-robust ordinary least squares estimator (I also ran a LOGIT model, since the dependent variable is binary) to answer this question. Not to give too much info--if you want more, I'll give it--but I acquired the data from the Health Pro advisement center, which has collected it since 1994 for all students from my school applying to med schools (osteo and allo). The dependent variable was a binary variable of acceptance or not (who cares if you got accepted to 10 schools; you can only go to one). Variables included all of the MCAT scores by section (WS given a numerical value), Science GPA, All other, Total GPA, # of schools applied to, and whether you took MCAT in August or April.
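For readers who want to see what "heteroskedasticity-robust OLS with a binary dependent variable" involves, here is a minimal sketch, not the SHAZAM run from the post. It assumes a single regressor and the White (HC0) correction; the post does not say which correction was used, and the toy data and the function name `lpm_with_robust_se` are made up for illustration. With a 0/1 outcome the error variance depends on the predicted probability, which is why a robust standard error is the usual choice here.

```python
import math

def lpm_with_robust_se(x, y):
    """Fit y = a + b*x by OLS; return (a, b, classical_se_b, robust_se_b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Classical standard error: assumes one common error variance.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    classical_se = math.sqrt(sigma2 / sxx)
    # White (HC0) standard error: each observation keeps its own squared residual.
    meat = sum(((xi - x_bar) ** 2) * e ** 2 for xi, e in zip(x, resid))
    robust_se = math.sqrt(meat / sxx ** 2)
    return a, b, classical_se, robust_se

# Hypothetical toy data (not from the post): verbal score and 0/1 acceptance.
verbal = [6, 7, 8, 8, 9, 9, 10, 10, 11, 12]
accepted = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
a, b, se_c, se_r = lpm_with_robust_se(verbal, accepted)
print(f"slope={b:.3f} classical_se={se_c:.3f} robust_se={se_r:.3f}")
```

The slope is the same under both approaches; only the standard error, and therefore the significance stars, can change.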

Results:

If you divide the OLS estimators by their standard deviations, you get a "standardized coefficient" which is what we want since GPA has a range of 0 to 4 and the MCAT has a range from 1 to 15. So the results are the standardized coefficients. Since our dependent variable takes a value of zero or one, you can assume these coefficients are percentages (i.e. Increase your verbal MCAT score by 1 point and you will increase your probability of being accepted by x amount--be aware that I divided by the standard deviation, so this intuition doesn't necessarily work out; just a help).
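The post does not spell out the exact rescaling, so the sketch below assumes the common textbook convention: multiply the raw slope by the standard deviation of the predictor and divide by the standard deviation of the outcome. The data and function names are hypothetical and only show the mechanics.

```python
import statistics

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def standardized_coefficient(b, x, y):
    """Assumed convention: b * sd(x) / sd(y), i.e. the slope in SD units."""
    return b * statistics.stdev(x) / statistics.stdev(y)

# Hypothetical toy data (not from the post).
gpa = [2.8, 3.0, 3.2, 3.4, 3.6, 3.8]
accepted = [0, 0, 1, 0, 1, 1]

raw = ols_slope(gpa, accepted)
beta = standardized_coefficient(raw, gpa, accepted)
print(f"raw slope={raw:.3f} standardized={beta:.3f}")
```

The raw slope answers "what does one full GPA point do", while the standardized value answers "what does one standard deviation of GPA do", which is why the two lists in this thread rank the variables differently.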

**Verbal MCAT=.137
**Physical MCAT=.080
Bio MCAT=.0449
Writing Sample=.0276
*Science GPA=.1943
All other GPA=.0527
Total GPA=.0285
**# Schools applied to=.1333 (duh)
*MCAT April=.0672
MCAT August=.0000

sample N=1564 R squared=.21 (awesome for a binary model!)

** denotes the coefficient is significantly different from 0 at the 1% level
* denotes the coefficient is significantly different from 0 at the 5% level

Interpretation:
Ace your science classes, since Science GPA accounts for almost 20% of your admittance according to this model. Study your verbal section of the MCAT. Forget the bio section; it wasn't even significant at the 15% level. The writing sample is even worse, not even at the 25% level. Don't apply to just three schools; you'll regret it. And don't even think about missing the April MCAT.

I hope this puts the debate about WS to rest. It adds a new debate, though: the bio score.
By the way, I got an A.
 
That's pretty cool. Now I wish I understood it
 
Good God! I thought I wasted time on this site!!! 😎
Actually, pretty interesting! Thanks!
 
If you have any questions about the analysis, just ask. I'll gladly explain.
 
Is there a way to look at the numbers in terms of some percentage or relative value?
I have no idea what they mean or how they stack up against each other.
Thanks!
 
Hey Viking,

So the 0.21 R-squared means this model explains 21% of the variability in acceptance using those variables listed? Is that right?
 
Wow, that's an amazingly adept analysis. I always knew about the link between VR and admissions, but I had no idea it was that strong. And I also expected the link between overall GPA to be stronger, but science GPA's correlation was very strong. Good work. 👍
 
Viking-

You've stumped the SDN cognoscenti. I'm impressed.

Now make it English. I assume your data shows, for example, that Verbal scores are 13% or so of the weight in the admissions process. So can you assess what score would get that number higher or anything like that?

dc
 
sacc,
that is right. 80% of the variance isn't explained by the model. This seems like a huge amount, but for an OLS with a binary dependent variable it is great. Econometrics journals publish models with an R squared this low (high?).

TarHokie_08,
I will post the non-standardized coefficients. I just have to format it better. This will show what a 1 unit change would cause. Be careful, though, since a change of 3.0 to 4.0 Science GPA is a LOT!
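As a rough illustration of the R squared point in this post: R squared is one minus the ratio of unexplained to total variation. The sketch below uses made-up fitted probabilities (not the model's output) to show why a 0/1 outcome leaves a lot unexplained even when the fitted values lean the right way.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical example: outcomes are 0 or 1, fitted values are probabilities.
accepted = [0, 0, 1, 1]
fitted = [0.3, 0.4, 0.6, 0.7]
r2 = r_squared(accepted, fitted)
print(f"R squared = {r2:.2f}")
```

Every fitted value here is on the correct side of 0.5, yet half the variation is still counted as unexplained, because a probability can never match a 0 or a 1 exactly.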
 
:laugh: Interesting analysis...Healthy sample size...

What were your logistic regression results? That may make more sense to people here... Also, depending on what undergraduate school you were studying, you may want to see if there is any effect modification according to *where* you were admitted.. (i.e. state school vs. out-of-state).

-MD08
 
1viking said:
sacc,
that is right. 80% of the variance isn't explained by the model. This seems like a huge amount, but for an OLS with a binary dependent variable it is great. Econometrics journals publish models with an R squared this low (high?).

Ok cool and thanks for the great post!!
 
These are the non-standardized coefficients. They are "percentages" if you increase that variable by one unit holding all others constant. Be careful in saying all that matters is Science GPA--a 1 unit change is huge!
Verbal MCAT .036007
PhysicsMCAT .020628
BiologyMCAT .013005
WritingMCAT .0065489
Science GPA .26175
All Othr GPA .10872
Total GPA .050032
# Schools App .0094769
MCAT April .078839
MCAT August .000023148

By the way, use the statistical significance from above to see what is important and what means squat for your admittance.
 
I can post the logit results, but they are much more difficult to interpret for those that have had little stats background. If you don't know how to interpret this type of model, don't ask cause I won't explain it. It would take a class in regression. Just stick with the OLS. Here they are:

V 0.19824
P 0.098592
B 0.081790
W 0.034109
BCPM 1.3711
AO 0.51861
TOT 0.30014
NUMAPP 0.049497
MCATS 0.43194
MCATF 0.021261
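For anyone who wants a rough reading of the logit numbers above, two standard conversions are sketched here. Exponentiating a logit coefficient gives an odds ratio, and multiplying it by p(1 - p) gives the approximate change in probability per unit at a baseline probability p. The baseline of 0.5 is an assumption for illustration; the post does not report the acceptance rate.

```python
import math

# MCAT section coefficients copied from the logit list above.
logit_coef = {"V": 0.19824, "P": 0.098592, "B": 0.081790, "W": 0.034109}

def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase."""
    return math.exp(beta)

def marginal_effect(beta, p):
    """Approximate change in probability per unit at baseline probability p."""
    return beta * p * (1 - p)

baseline = 0.5  # assumed baseline acceptance probability, not from the post
for name, beta in logit_coef.items():
    print(name, round(odds_ratio(beta), 3), round(marginal_effect(beta, baseline), 4))
```

At that assumed baseline, one extra verbal point works out to roughly five percentage points, the same ballpark as the OLS coefficient.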

MD08,

The single school issue is the biggest drawback to this study, I feel. I wish that I had a more random sample.
 
bigdan said:
Viking-

You've stumped the SDN cognoscenti. I'm impressed.


You forced me to look up the word cognoscenti. 😡
 
This is the most awesome thing I have seen on this forum. I'm gonna PM you for some more info.
 
I'm not surprised that VR is big on predicting if you will get in. It has also been proven to be big in predicting whether or not you will succeed in school.
 
Jalby said:
I'm not surprised that VR is big on predicting if you will get in. It has also been proven to be big in predicting whether or not you will succeed in school.

Ahhhh, yes. I did well here...I'll be sure to relay this to my interviewers... 😎


dc
 
It's pretty much what I expected. Though the numbers help to give an idea of how much more important verbal is compared to the other sections. MCAT courses freely tell their students that verbal is most important because it has the highest correlation with success in medical school and boards.

Edit: Jalby beat me to it
 
So what about bio and total GPA? Why so unimportant?
 
1viking-

Well, maybe there was very little variation in these factors (total GPA, and bio score)..I would guess *most* med school applicants have a GPA of 3.2-3.8. In terms of your model, that is not quite the "one-point" difference you are trying to detect... You probably don't have enough statistical power to detect the chances of admission between someone who has a 3.4 and a 3.6.. Are the mean GPAs of the two groups (accepted vs. non-accepted) similar?

I think that another thing that we could take away from this is that there are obviously other variables related to medical school acceptance besides GPA and MCAT scores (but we knew this, right?) What about volunteer work, clinical experience, or research?

Oh, thanks for the logistic coeff... I can only assume that the same factors were found to be significant in this model, also?

-MD08
 
1viking said:
So what about bio and total GPA? Why so unimportant?
I don't really understand the basis of the study, but assuming this is saying there is little correlation between bio score and admittance...I would say that most people who score a 7 on bio would not apply to medical school, whereas many more with a 7 on verbal will. Thus, you could be comparing scores on a much smaller (and higher) range. I get the feeling that any double-digit score is good enough for adcoms, so a 13B will not be better than a 10B as much as 10V is better than 7V.

Physical Sciences is somewhere in between, because maybe people discount the usefulness of physics in medical school and thus a low score won't discourage an applicant.
 
Sorry about not posting the "English". Think about the model as a linear function, with a slope, an intercept and a dependent variable 👍. You can add as many independent variables (x) as you like to make your model what you want it to be. A perfect model would be one that explained real life--that isn't possible, so we do our best. The y variable is admittance (it can take on a value of 0 or 1) the x variables are those that have been posted above. The numbers next to the x variables are the slope for that specific variable, holding everything else constant (partial differentiation). Because the model is binary, you can think of this slope as a percentage. Therefore, as you increase your Science GPA by 1 unit (3.0 to 4.0) you increase the probability of getting accepted to a medical school by 26.17%. This doesn't depend upon anything else (MCAT Scores or whatever). Be careful and remember that some aspects are not included in the model which are VERY important, such as quality of interview. I just don't know how I would quantify that.
The standardized coefficients are different. You need to think about it as what is most important. It is a little bit trickier to interpret. Maybe I should have just stuck with the non-stand. coefficients. I hope this helps.
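To make the "slope as a percentage" reading concrete, here is a small sketch that plugs changes into the non-standardized coefficients posted earlier in the thread. The dictionary keys and the function name are made up for readability; the numbers are copied from that post.

```python
# Non-standardized OLS coefficients as posted earlier in the thread.
ols_coef = {
    "verbal_mcat": 0.036007,
    "physical_mcat": 0.020628,
    "biology_mcat": 0.013005,
    "writing_sample": 0.0065489,
    "science_gpa": 0.26175,
    "all_other_gpa": 0.10872,
    "total_gpa": 0.050032,
    "schools_applied": 0.0094769,
    "mcat_april": 0.078839,
    "mcat_august": 0.000023148,
}

def change_in_probability(changes):
    """Sum of slope * change, holding every unlisted variable constant."""
    return sum(ols_coef[name] * delta for name, delta in changes.items())

# Example: raise science GPA by 0.2 and the verbal score by 2 points.
delta_p = change_in_probability({"science_gpa": 0.2, "verbal_mcat": 2})
print(f"predicted change in acceptance probability: {delta_p:.3f}")
```

A 0.2 GPA move is a more realistic step than the full 3.0 to 4.0 jump. Keep in mind that a linear model like this can produce predicted probabilities below 0 or above 1 at the extremes.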
 
Just remember kids... There are lies, Damn lies, and Statistics! LOL

But really this assumes that other areas are average or above. If you got a 15 on the VR and 3 on the other 2 sections of MCAT, you will not even get looked at. This also fails to take race into account.
 
1viking,
what school was this study done at?
also, how relevant is standard deviation because if you look at the MSAR, the verbal and writing sections are bimodal.
also, race, gender, and undergraduate institution are not a part of this study, right?
this study also doesn't address correlations between the factors, therefore science gpa and mcat scores might collectively work as one factor.

basically, your model separates each factor. what if two of the factors are linked? for example, what if people with high gpa's always have high verbal mcat scores?

also, geographic region, undergraduate institution, personal statement, essays are not taken into consideration in this essay.

I think your study is entirely interesting and fascinating, however, I feel it simplifies the components. obviously, high scores and gpa are important for getting into medical school, but a statistical analysis is not necessary for this information.

Interesting idea though! It would be pretty cool if someone did this study on a national level 🙂
 
how come Total MCAT wasn't used as a variable?
 
koma said:
how come Total MCAT wasn't used as a variable?
It can't be, because it would generate an error called perfect multicollinearity. This just means that the Matrix failed to invert (remember linear algebra). In OLS, you must leave out a variable that all others can be compared against. Therefore, there is no TOT MCAT, nor BOTH MCAT (a flaw in this is that there are some that take it three times, not too many, though). In all, it doesn't matter, since the Individual scores all add up to the total anyway. Hope this helps.
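The "matrix failed to invert" point can be checked by hand. This is a minimal sketch with made-up section scores: once a total column equal to the sum of the section columns is added, the matrix X'X that OLS has to invert has a determinant of exactly zero. Exact fractions are used so that the zero is not a rounding artifact.

```python
from fractions import Fraction

def determinant(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot means the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def gram(rows):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]

# Hypothetical section scores: (verbal, physical, biological).
sections = [(10, 9, 11), (8, 12, 10), (11, 10, 9), (9, 8, 12), (12, 11, 10)]
without_total = [[1, v, p, b] for v, p, b in sections]
with_total = [[1, v, p, b, v + p + b] for v, p, b in sections]

det_without = determinant(gram(without_total))
det_with = determinant(gram(with_total))
print(det_without != 0, det_with == 0)
```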
 
trauma_junky said:
Just remember kids... There are lies, Damn lies, and Statistics! LOL

But really this assumes that other areas are average or above. If you got a 15 on the VR and 3 on the other 2 sections of MCAT, you will not even get looked at. This also fails to take race into account.
The model is an OLS, which requires PARTIAL differentiation. True, you probably won't get looked at. But this isn't the aim or scope of the model. The idea is to show how increasing (decreasing) one variable will increase (decrease) your chance of getting accepted. So your question isn't an issue. Race was already described. About the lies, well, we know how to lie with stats. This isn't the case (I didn't throw data away). If you want the data, I will send it to you so that you can reproduce my results (true scientist).
 
rerock said:
1viking,
what school was this study done at?
also, how relevant is standard deviation because if you look at the MSAR, the verbal and writing sections are bimodal.
also, race, gender, and undergraduate institution are not a part of this study, right?
this study also doesn't address correlations between the factors, therefore science gpa and mcat scores might collectively work as one factor.

basically, your model separates each factor. what if two of the factors are linked? for example, what if people with high gpa's always have high verbal mcat scores?

also, geographic region, undergraduate institution, personal statement, essays are not taken into consideration in this essay.

I think your study is entirely interesting and fascinating, however, I feel it simplifies the components. obviously, high scores and gpa are important for getting into medical school, but a statistical analysis is not necessary for this information.

Interesting idea though! It would be pretty cool if someone did this study on a national level 🙂
1. the bimodal would be important if this were a population study, which it isn't.
2. Race data was not available (sure would be interesting).
3. Gender wasn't sig. different from zero at the 50% level, so I left it out because it caused (of all things) a formatting issue in the final draft of my paper.
4. I wish that I had more data from other schools. That would be great!
5. The model used is the heteroskedastic robust OLS. This model corrects for serial correlation (High GPA correlated with high Verbal). This isn't an issue. I know what I am doing.
6. It sure would be nice to quantify things such as undergrad school, region, or the strength of your personal essay. Have any ideas?
7. The analysis is used to show which factors are most important of the variables known.
Hope this helps.
 
dhoonlee said:
It is very frustrating how statistics can be misinterpreted.




This study does not prove that the bio section is not important. In general more people do well on the biosection (looking at the average applicant pool, I'm not talking overall because obviously, the grades are assigned on a curve). It may be that the verbal section is what differentiates successful applicants. Therefore the correct piece of advice would be do well on bio to get on equal footing with your peers, THEN kick verbal's ass.

The study doesn't take into account ECs. You can't look at stats and immediately make assumptions as to cause and effect. They are just numbers and don't tell the whole picture. Perhaps applicants with very strong ECs and LORs are more likely to have high verbals and GPAs. Therefore

ECs/LORs =====> acceptance
&
ECs/LORs =====> high verbal/GPA

is a possibility. If this was true (which I'm not sure of, this is just an example of how the OP's analysis may be flawed), the ECs/LORs are the true indicators of applicant success and a high verbal/GPA is just a side effect. In this study, the correlation between verbal/GPA and acceptance may be correct but is NOT a reason for applicant success. Maintaining a high verbal/GPA may have LESS of an impact on your chances than you might think after reading this study's unfounded conclusions.

Anyway, I think the one thing that can be said with any certainty is medical school is very competitive and once any criteria starts to become less useful for separating out applicants, it will be replaced by another. (I.E. research, volunteering abroad) Try your best to separate yourself from the crowd but don't forget about your mundane stats (GPA, MCAT). :laugh:
Hmmm. I'll start from the top. I tried to show to people that may not understand regression an "extreme" situation--high coefficient means important, low means not so important. To put it to rest, bio is not statistically different from zero (aka, increasing your score by one won't help you get accepted to med school). I didn't make up the data, sorry. Talk to the med schools about this. You are right about doing well on Bio, though. I wouldn't risk it. It gives reassurance to my best friend that got an 8 on the bio but a 13 on the verbal.
The EC/LOR statement is an interesting one. It is impossible to quantify this measure, but this isn't included in the model. I am not trying to explain it, so it doesn't matter! Also, in regression, you need to leave out variables. Perhaps these are the most significant left out. They would be explained, then. You could see this through the R squared. Since the R squared is so high for a binary model, I actually believe your EC/LOR are not too important. This supports my thesis. You should take a challenging grad level multivariable linear regression class. It will help keep your foot out of your mouth.
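The EC/LOR scenario from the quoted post can be tried out on simulated data. This is a sketch of that hypothetical only, not evidence about the real applicants: an unobserved "EC" factor drives both the verbal score and a continuous acceptance score, and verbal has no direct effect at all. All names and numbers are invented, and the outcome is continuous rather than 0/1 to keep the arithmetic simple.

```python
import random

def simple_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def residualize(y, z):
    """Remove the part of y that a simple regression on z explains."""
    slope = simple_slope(z, y)
    z_bar, y_bar = sum(z) / len(z), sum(y) / len(y)
    return [yi - y_bar - slope * (zi - z_bar) for yi, zi in zip(y, z)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
ec = [rng.gauss(0, 1) for _ in range(n)]       # unobserved EC/LOR strength
verbal = [e + rng.gauss(0, 1) for e in ec]     # EC drives verbal
outcome = [e + rng.gauss(0, 1) for e in ec]    # EC drives outcome; verbal does not

# Slope of outcome on verbal when EC is left out of the model.
naive = simple_slope(verbal, outcome)
# Slope of outcome on verbal after partialling EC out of both variables.
adjusted = simple_slope(residualize(verbal, ec), residualize(outcome, ec))
print(f"naive slope={naive:.2f} slope controlling for EC={adjusted:.2f}")
```

In this setup the slope on verbal is clearly positive when EC is left out and close to zero once EC is controlled for. Whether anything like this happens in the real data cannot be told from the posted coefficients or from the R squared alone.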
 
MD08 said:
1viking-

Well, maybe there was very little variation in these factors (total GPA, and bio score)..I would guess *most* med school applicants have a GPA of 3.2-3.8. In terms of your model, that is not quite the "one-point" difference you are trying to detect... You probably don't have enough statistical power to detect the chances of admission between someone who has a 3.4 and a 3.6.. Are the mean GPAs of the two groups (accepted vs. non-accepted) similar?

I think that another thing that we could take away from this is that there are obviously other variables related to medical school acceptance besides GPA and MCAT scores (but we knew this, right?) What about volunteer work, clinical experience, or research?

Oh, thanks for the logistic coeff... I can only assume that the same factors were found to be significant in this model, also?

-MD08
1. the bimodal would be important if this were a population study, which it isn't.
2. Race data was not available (sure would be interesting).
3. Gender wasn't sig. different from zero at the 50% level, so I left it out because it caused (of all things) a formatting issue in the final draft of my paper.
4. I wish that I had more data from other schools. That would be great!
5. The model used is the heteroskedastic robust OLS. This model corrects for serial correlation (High GPA correlated with high Verbal). This isn't an issue. I know what I am doing.
6. It sure would be nice to quantify things such as undergrad school, region, or the strength of your personal essay. Have any ideas?
7. The analysis is used to show which factors are most important of the variables known.
Hope this helps.
 
so if i get a 15 verbal and a 10 physics and 0 Bio im set!

you can try to quantify it all you want, but it doesn't take a rocket scientist (or statistician) to tell you that quantifying the app process is impossible. You can shove as many equations showing that mcat Bio is essentially a non issue in admissions as you want, but guess what, no one is going to believe you nor take your equation as advice.

This reminds me of that scene in "A Beautiful Mind" where John Nash is asking his soon to be wife how he knows what "love" is because he doesn't understand unless it is in some equation form.

Looks like you had fun playing with your model though. cheers.


Oh yeah, since you're such a scientist, the next step would be to TEST your model. You should take the MCAT, bomb the bio section, get crappy LORs and enter zero EC's and according to your model, it'll have little impact on your chances of acceptance right?
 
dhoonlee,

EC/LOR was included in the study by not being included. By virtue of the High R2, we can assume the EC/LOR (and the other omitted variables) are nowhere near as important as the other big factors. So, it was included.
 
exmike said:
so if i get a 15 verbal and a 10 physics and 0 Bio im set!

you can try to quantify it all you want, but it doesn't take a rocket scientist (or statistician) to tell you that quantifying the app process is impossible. You can shove as many equations showing that mcat Bio is essentially a non issue in admissions as you want, but guess what, no one is going to believe you nor take your equation as advice.

This reminds me of that scene in "A Beautiful Mind" where John Nash is asking his soon to be wife how he knows what "love" is because he doesn't understand unless it is in some equation form.

Looks like you had fun playing with your model though. cheers.


Oh yeah, since you're such a scientist, the next step would be to TEST your model. You should take the MCAT, bomb the bio section, get crappy LORs and enter zero EC's and according to your model, it'll have little impact on your chances of acceptance right?
You obviously don't understand regression.
 
Well put! Also, my common sense would be to ace my science classes and not worry so much on the bio as the verbal. Obviously, I would study the bio, but I would put more emphasis on the verbal. But still, I would do great on the science classes. You put more into them than the MCAT anyway.
 
1viking said:
You obviously don't understand regression.

now what kind of arrogant response is that? I was pointing out that despite your model nothing will change the way anyone goes about the whole process of medical school admissions.

Now I can't figure out if you were trying to give us an interesting quantitative analysis of medical school admissions (and yes, it was quite interesting) or if you were just trying to show off your statistical prowess.
 
undergrad school is a large factor as well.. i think it was huge for me. have fun trying to quantify things that cannot be. It's like trying to define beauty...we all know about jawlines and cheekbones, but there are ranges that are appropriate and work well together, no set standard.
 
Pretty much confirms what I've always known about the importance of the verbal section. Bravo to the OP for posting this info!
 
pathdr2b,
glad you enjoyed it.
 
1viking said:
MCAT April .078839
MCAT August .000023148

Looks like an August MCAT is a waste of time......
 
exmike said:
now what kind of arrogant response is that? I was pointing out that despite your model nothing will change the way anyone goes about the whole process of medical school admissions.

Now I can't figure out if you were trying to give us an interesting quantitative analysis of medical school admissions (and yes, it was quite interesting) or if you were just trying to show off your statistical prowess.
It is all about the showing off my prowess. You see, people who hear the name 1viking will shudder. When I go to med school this fall, people will ask me, "Are you 1Viking?" Also, I receive fan mail because I have shown how intelligent I am. My name is Zoolander.
 
thewzdoc said:
Looks like an August MCAT is a waste of time......
The coefficient doesn't mean that it is a waste of time, but that taking it in August won't help you any to get into medical school, holding everything else constant. The take home message is this: your advisors say to take the MCAT in April, well this is proof that they know what they are talking about.
 
1viking said:
It is all about the showing off my prowess. You see, people who hear the name 1viking will shudder. When I go to med school this fall, people will ask me, "Are you 1Viking?" Also, I receive fan mail because I have shown how intelligent I am. My name is Zoolander.

especially during biostats!
 
What about the personal statement? How much weight does that hold?
 
1viking said:
The coefficient doesn't mean that it is a waste of time, but that taking it in August won't help you any to get into medical school, holding everything else constant. The take home message is this: your advisors say to take the MCAT in April, well this is proof that they know what they are talking about.

Therefore it's a waste of time :scared: (ps I know what you mean It's just interesting that there is such a difference...I guess the advisors do know what they are talking about.... well this aspect anyway....)
 
1viking said:
The coefficient doesn't mean that it is a waste of time, but that taking it in August won't help you any to get into medical school, holding everything else constant.

I actually kinda get this although I'll be retaking the MCAT in August since I'm looking at MD/PhD. I guess another way to look at my situation is that if they didn't want me in "April" they won't want me in "August" since I'm essentially the "same" person. 😛
 
pathdr2b said:
I actually kinda get this although I'll be retaking the MCAT in August since I'm looking at MD/PhD. I guess another way to look at my situation is that if they didn't want me in "April" they won't want me in "August" since I'm essentially the "same" person. 😛
Be careful w/ the interpretation. Remember, to reference the regression for MCAT date, something had to be left out. That one thing was those who took the MCAT both months. Therefore, those that took the August MCAT had never taken the MCAT before. They got their results back late in the game. I can change the variables in the OLS so that August or April is left out instead of both. Just ask.
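The "something had to be left out" idea is ordinary dummy coding with a reference category. The sketch below is one reading of the setup described here, with made-up function and variable names: April-only and August-only each get a 0/1 indicator, and applicants who took both sittings are the baseline that the two coefficients are measured against.

```python
def mcat_date_dummies(took_april, took_august):
    """Return (april_only, august_only); taking both sittings is the reference (0, 0)."""
    if not (took_april or took_august):
        raise ValueError("every applicant in the sample took the MCAT at least once")
    april_only = int(took_april and not took_august)
    august_only = int(took_august and not took_april)
    return april_only, august_only

print(mcat_date_dummies(True, False))   # April only
print(mcat_date_dummies(False, True))   # August only
print(mcat_date_dummies(True, True))    # both sittings: the left-out group
```

Under this coding, the August coefficient compares August-only takers with people who took both, so changing which group is left out changes what each coefficient means without changing the fit.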
 