Aggregate MDApps data


Euxox

I took some time to scrape all of the 2012 applicant data on MDApplicants. I'm posting it here in the hope that some data whizzes will pick it up and do some interesting analyses. I, for one, would like to see some quantification of the advantage of submitting secondaries early, or of how the likelihood of an interview turning into an acceptance changes with MCAT/GPA.

I screened out some profiles that looked bogus and/or incomplete. This leaves around 800 applicants and 12,000 applications. Happy data trawling!

Download link: http://www.sendspace.com/file/191vs3

PM me if you want a copy of the scraping code or of the raw scraped data (YAML format).
 

Broken link.

The link doesn't seem to work for me but I was able to download the .csv file.
 
The link seems to work for me... I attached the file to the original post in case the link isn't working for anyone else.
 
I updated the spreadsheet with a few minor fixes.

Also, here is a graph I made from the data. Hopefully this will give some inspiration for others to work through the data as well.

0BtEvg6.png


EDIT: For this graph, acceptance rate equals acceptances/applications, so someone who applied to twenty schools and was accepted to five would have an acceptance rate of 25%.
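For anyone recomputing this from the spreadsheet, here is a minimal sketch of that per-applicant definition. The function name is made up for illustration and nothing here reads the actual file.

```python
def acceptance_rate(acceptances: int, applications: int) -> float:
    """Fraction of one applicant's applications that ended in an acceptance."""
    if applications <= 0:
        raise ValueError("applications must be positive")
    return acceptances / applications


# The example from the post: twenty applications, five acceptances.
print(f"{acceptance_rate(5, 20):.0%}")  # 25%
```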
 
I'm interested in going over some of this data.

Perhaps during spring break....
 
Oh my God. I love data. Thank you so much for doing this!!! 🙂
 
Great idea. I'll have some fun playing around with this during spring break.
 
Another graph:

These are great ideas, but we would really need to do a multiple regression on this data to get an accurate sense of what's going on. For example, it seems that later-submitted applications have lower acceptance rates, but is this perhaps because those who submitted later were also more likely to have weaker applications?

Or, in looking at MCAT score and acceptance rate, how do we account for those with low stats applying to different schools than those with high stats? Even if a 14 VR did have the same acceptance rate as an 8 VR, those could be applications to a very different subset of schools. We'd need to start looking at the LizzyM score for those schools, and also look at the timing of those applicants (do high VR applicants apply earlier than low VR applicants?).

Practically, it will be difficult to do this analysis with the limited amount of data in MDapps (so many applicants don't give their scores, or give very vague ranges, etc.). It's great for seeing individual cases, though, since you can probe very deeply into an application (we're much more than just the basic stats). Don't get me wrong, this is great work, and thanks for providing the data for others to look through 👍
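A full regression is more than fits in a post, but the confounding worry can be illustrated with plain stratification, which is a simpler stand-in and not the regression itself. The sketch below uses made-up toy rows (not values from the spreadsheet), and the band and month labels are hypothetical. In the toy data, timing has no effect within a stats band, yet the pooled rates still differ by month because the early group contains more high-stat applications.

```python
from collections import defaultdict


def make_rows(band, month, accepted, rejected):
    """Build toy application rows: (stats_band, submission_month, accepted)."""
    return [(band, month, True)] * accepted + [(band, month, False)] * rejected


# Made-up illustration data, NOT taken from the MDApps spreadsheet.
rows = (
    make_rows("high", "June", 4, 4)
    + make_rows("low", "June", 1, 3)
    + make_rows("high", "August", 2, 2)
    + make_rows("low", "August", 2, 6)
)


def rate_by(rows, key):
    """Acceptance rate (acceptances / applications) for each group given by key(row)."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [acceptances, applications]
    for row in rows:
        tally = tallies[key(row)]
        tally[0] += int(row[2])
        tally[1] += 1
    return {group: acc / total for group, (acc, total) in tallies.items()}


pooled = rate_by(rows, key=lambda r: r[1])              # by month only
stratified = rate_by(rows, key=lambda r: (r[0], r[1]))  # by (band, month)

for month, rate in pooled.items():
    print(f"pooled {month}: {rate:.1%}")
for (band, month), rate in stratified.items():
    print(f"{band} stats, {month}: {rate:.1%}")
```

If the real data showed the same pattern, the pooled "apply early" advantage would shrink or vanish once applicants are compared within the same stats band.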

This thread is also great for those that haven't seen it yet: http://forums.studentdoctor.net/showthread.php?t=888650
 
Hmmm... While this data is interesting and I am a stat-oholic, I think Narmerguy raises some very relevant points. I also think too many people do not update their profiles later in the season, and that it is too early to draw conclusions from the statistics, since there are likely a lot of people still waiting for decisions or sitting on a waitlist who do not otherwise have an acceptance.

Maybe pull the data sometime in June and include the date they last updated their profile to help figure out who is slacking on contributing to the greater good of SDN 😉
 
Hi everyone! I played around with this data a bit and calculated acceptance rates for every school broken down by LizzyM score. Interesting to look at! (Hint: if you download the file, it's color-coded for easier reading 🙂 )

https://docs.google.com/file/d/0B2JPuHQ79Wk0eUxJUVFzTW9VUDA/edit?usp=sharing

This spreadsheet does not include schools that had fewer than 50 applicants. (I do have the data for every school if anyone is interested.)
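For anyone who wants to reproduce this kind of breakdown, here is a rough sketch. The row layout, the function name, and the 5-point LizzyM bands are assumptions for illustration; the post does not say how the spreadsheet was built or binned. Only the 50-applicant cutoff comes from the post.

```python
from collections import Counter, defaultdict


def school_rates(rows, min_applicants=50, bin_width=5):
    """Acceptance rate per (school, LizzyM band), skipping small schools.

    rows: list of (school, lizzym_score, accepted), one row per application.
    bin_width is an assumed band size, not the spreadsheet's actual binning.
    """
    applicants = Counter(school for school, _, _ in rows)
    tallies = defaultdict(lambda: [0, 0])  # key -> [acceptances, applications]
    for school, score, accepted in rows:
        if applicants[school] < min_applicants:
            continue  # too few applicants for a meaningful rate
        band = (int(score) // bin_width) * bin_width
        tally = tallies[(school, band)]
        tally[0] += int(accepted)
        tally[1] += 1
    return {key: acc / total for key, (acc, total) in tallies.items()}


# Made-up toy rows with a tiny cutoff so the filter is visible.
toy = [
    ("School A", 68, True),
    ("School A", 71, False),
    ("School A", 72, True),
    ("School B", 70, False),
]
print(school_rates(toy, min_applicants=2))
```

With the toy rows, School B has a single applicant and is dropped, while School A is split into its 65-69 and 70-74 bands.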

Screen shot of part of the spreadsheet:

n63zpy.jpg
 
A lot of these images are being blocked at my school due to "nudity". What host are you using?

:laugh:

Ahh, I forgot a lot of places block imgur! I switched to a different host, hopefully you can view it now! :laugh:
 
postimage.org, but I just switched to imgur.

Edit: Are my images being blocked too now? Oh, well, I'm too lazy to change, you'll just have to wait until you get home. 🙄
 
You realize that not nearly all decisions (not even close) have been made yet, and many don't plan on posting accurate data till the cycle is over? That makes this data pretty meaningless as of now.
 
Yeah, DAPI already mentioned that. I could rerun the scraper on the 2011 data if anyone is interested in seeing it, otherwise I will rerun the scraper in August or September.
 
Another graph:
6oxS6uJ.png

Is the acceptance rate on this chart the % that got any acceptances, or the % accepted out of schools applied to?

i.e. Does a 10% acceptance rate indicate that 10% of students at that date got at least one acceptance, or that students at that date were accepted (on average) to 10% of the schools to which they applied?
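The two readings can give very different numbers for the same group of applicants. A small sketch with made-up toy applicants (not spreadsheet values) and hypothetical function names:

```python
# Made-up toy applicants: (schools_applied_to, acceptances).
applicants = [(10, 0), (20, 2), (10, 4), (20, 0)]


def share_with_any_acceptance(applicants):
    """Reading 1: fraction of applicants holding at least one acceptance."""
    return sum(1 for _, acc in applicants if acc > 0) / len(applicants)


def mean_per_applicant_rate(applicants):
    """Reading 2: average of each applicant's acceptances / applications."""
    return sum(acc / apps for apps, acc in applicants) / len(applicants)


print(f"any acceptance: {share_with_any_acceptance(applicants):.1%}")
print(f"mean per-applicant rate: {mean_per_applicant_rate(applicants):.1%}")
```

Here half of the toy applicants have an acceptance, while the average per-applicant rate is much lower, so which definition a chart uses matters a lot when reading its y-axis.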
 
Likely the former since there's no percentage above 20%. I know for a fact there are people on MDapps that have higher than 20% acceptances from schools they applied to.
 
somewhat convincing... except likely there are stronger students who are in that first month as well.... compared to the later months.
 
It's actually the opposite: Someone who applied to 10 schools and was accepted to 4 has a 40% acceptance rate on the graph.

I got rid of every datapoint with a >18% acceptance rate because applicants in this range were complete anomalies. They formed a straightish (but sparse) line across the graph. So I guess that means that if your numbers are strong enough, it doesn't matter when you apply. Take that with a grain of salt, though, because there really weren't enough people with >18% acceptances to get any solid trend in that range.

BTW, I'm working on scraping the 2011 data, so stay tuned. I'll probably have it up near the end of the week.
 
Those numbers seem suspiciously low. Only 14% for people who submitted in the first week? ~50% of all applicants get accepted, so assuming a random sample from a uniform distribution, you would expect to see a 50% acceptance rate. Since we know this isn't a uniform distribution, though (earlier applicants obviously have higher acceptance rates), the earlier applicants should actually have an acceptance rate greater than 50%. Also, the drop-off seems way too steep. Conventional wisdom is that while it's good to submit ASAP, the sharp drop-off in acceptance rates doesn't happen until late August/early September. According to that graph, it happens in late June/early July.

Something's not right with those numbers.

EDIT: Just saw the post above mine which answers my question lol. I see, the numbers aren't % of applicants who got accepted but rather the % of schools they got accepted to out of all that they applied to. In that case the numbers make a lot more sense.
 
Never mind. Looking back, what I said didn't make sense either. I hate stats. :laugh:
 
What formula did you use for the LizzyM score? 10(gpa)+mcat - 1 or 10(gpa)+mcat +1 or 10(gpa)+mcat?
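To make the three candidates concrete, here is a tiny sketch with the offset as a parameter. The function name and the sample GPA/MCAT values are made up, and it does not say which variant the spreadsheet actually used.

```python
def lizzym(gpa: float, mcat: int, offset: int = 0) -> float:
    """LizzyM-style score: 10 * GPA + MCAT, plus an optional -1/0/+1 offset."""
    return 10 * gpa + mcat + offset


# Hypothetical applicant: 3.5 GPA, 33 MCAT, under each of the three variants.
for offset in (-1, 0, 1):
    print(offset, lizzym(3.5, 33, offset))
```

The variants only shift every score by one point, so they matter mainly at the edges of whatever score bands are used.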
 
I just wish more people would make an MDApps, so we could get a better pool of data. Even so, most of the overall acceptance rates on somethingdeep's spreadsheet are within +/-1% of the USNews reported data.
 
Be sure to make one yourself when you're applying!
 
Same here!! When this cycle is over, I'm considering shooting an email to my undergrad's pre-med listserv asking everyone to make a profile 😀 And I hadn't had the chance to check the acceptance figures, so I'm glad to hear that!
 
Yep I agree...and make them actually useful (i.e. GPA/MCAT, rough idea of activities, etc) at least after you're accepted. The really vague ones, though sometimes effective at gaining attention, sadly don't do much to help anyone learn from your application.
 
It would be nice if SDN (or MDApps) would send members an email each year in August reminding them to fill out their profile.
 