Data: A Case Study in Bias for Prestigious Undergrads, Yale SOM

EDIT 08/03/2017: Fixed a bug in my program that failed to download all of the correct pages of the school's PDF bulletin. I have fixed the script and regenerated the graphs; the graphs you see in the OP from 08/03/2017 onward are correct. The problem arose because of the way my browser indexed the pages of PDF downloads. I verified the fix by counting two years of graduates by hand, and the counts from my script and by hand matched exactly for both years. I also added five-number summaries for the top 10 feeder schools only, and some plots for people who entered Yale with grad degrees.

I think this analysis is superb, well done so far! To address some of the posters' comments, perhaps a deidentified, retrospective study could be performed at one of these elite institutions, and each applicant could be calibrated to WedgeDawg's WARS to control for ECs, LizzyM, undergrad institution, and research output?

 
Which are the grade-inflating prestigious undergrads?
Yale, Harvard, Duke, Stanford, Brown?

If you're in HS, those are the schools you should be aiming for.

I've never heard of anyone who's experienced grade inflation at Duke, at least not in STEM majors. The others I'd definitely agree with.


 

Yah, the holy grail would be for the AAMC to release all of their raw data for free. Right now you can only request their raw data if you are a researcher at an institution. Even then, I think you have to pay.

I think that the data posted on pg 2 from some of the top LACs / undergrads does add a lot to the first analysis. It's obvious from their internal data, especially the more complete datasets like Amherst, WashU, Penn, Yale and MIT, that students at certain undergrads have far better odds at almost every LizzyM level than those at other schools.

Next, take into account that the proportion of students coming from the Top X undergrads at Yale SOM is not that different for the MD and MD/PhD programs (see the link in the OP for those graphs); in fact, the MD program seems to have slightly higher proportions. If we assume that other programs have comparable proportions of "top feeder" schools in both their MD and MD/PhD programs (admittedly, not an insignificant assumption), then we can go on to say that selection for brand-name undergrads is not an isolated practice at Yale SOM, but occurs at many, if not all, of the top medical schools.

To do a very rough test of this last assumption, I looked at the general stats for Harvard's entering class of 2017 (Harvard is the school I predict, based on MD/PhD data, to have the most perceived selection for prestigious undergrads) and Yale's graduating class of 2017 (i.e., the class that entered in 2012/2013, but it's the latest data set I had). Harvard had a class of 165 with 65 undergrads represented. Yale, meanwhile, had a graduating class of 83 with 39 undergrads represented. As a simple ratio of students to represented undergrads, Harvard comes out to about 2.54:1 and Yale to about 2.13:1.

With that info in hand, we have to ask ourselves: given that Harvard's ratio of class size to represented undergrads is similar to Yale's (a little higher, actually), are all undergrads roughly equally represented at Harvard (each of the 65 sending about 2-3 students that year), or can we expect a distribution like Yale's, where a handful of schools send the majority of the class and most undergrads send only a single student (interestingly enough, the very same distribution you see for high schools at Harvard undergrad, albeit less pronounced)? Luckily for me, I know 3 HMS students IRL across different years, and I know from them that a few schools, especially Harvard, are very well represented in their classes, just as the MD/PhD data predicts.
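The students-per-undergrad ratio is just class size divided by the number of distinct undergrads, and on its own it can't answer that question. A minimal Python sketch: the Harvard and Yale figures are the ones quoted above (rounded to two decimals), while the `even` and `skewed` distributions are hypothetical, made up only to show that very different classes can produce the same ratio.

```python
def students_per_institution(class_size: int, n_institutions: int) -> float:
    """Average number of students per represented undergrad institution."""
    return class_size / n_institutions

# Figures quoted above: Harvard's entering class of 2017, Yale's graduating class of 2017.
print(f"Harvard: {students_per_institution(165, 65):.2f}")  # 2.54
print(f"Yale:    {students_per_institution(83, 39):.2f}")   # 2.13

# Two hypothetical ways to spread 83 students over 39 schools (not real data).
even = [2] * 34 + [3] * 5   # every school sends 2-3 students
skewed = [45] + [1] * 38    # one big feeder, every other school sends 1

for label, counts in (("even", even), ("skewed", skewed)):
    ratio = students_per_institution(sum(counts), len(counts))
    print(f"{label}: ratio {ratio:.2f}, largest feeder {max(counts)}, "
          f"schools sending 1: {counts.count(1)}")
```

Both made-up classes come out to the same 2.13:1 ratio, which is why the per-school counts (or insider knowledge like the HMS students') matter more than the ratio itself.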
 
Any clue on how expensive it would be to access their raw data for research? All of the grants I got as an undergrad were around $1k. It would honestly be a great opportunity for someone who has the funding.
 

The cheapest non-free data you can get from them is the Data Book, for $227. But I'm not that familiar with how much new data would be in there.

For their raw data, you need to petition them directly with your project idea and institutional affiliation. I don't think "we're going to post literally everything you guys know on the internet" would be very attractive to them.
 
Good to know. My examples were based on hearsay.

I mean tbh so are mine. I wasn't saying you're definitely wrong; I'm just basing my opinion on old HS classmates who went to all those schools. For instance, Yalies and Stanford kids reported having their STEM classes curved to B+/A-, while the Duke kids had more traditional C+/B- curves. As for Harvard and Brown, I think we all know at this point that they inflate pretty generously.


One immortal "feature" of SDN is handwringing about how much undergraduate prestige matters and in what situations. It is very hard to gain an objective foothold on this question, and most of us have a sense that "yah, it matters -- but not that much, it won't save an otherwise poor application, and it probably only matters at schools that value that certain je ne sais quoi in their applicants (*cough* Harvard *cough*)." One of the reasons this is a difficult question to answer is that, unlike most MD/PhD programs, MD programs do not post their directories publicly and, if they do, they do not list the undergraduate institutions of their students. Luckily for us, Yale School of Medicine posts a yearly bulletin in which it lists the undergraduate institution of each member of that year's graduating class.

I took the liberty of altering the same Python script I used to look at the possible 'prestige' bias of dual-degree admissions.

This time, the script converts the PDF bulletins to text, and (exploiting the fact that the syntax of graduate listings in every bulletin is identical) searches for the undergraduate institutions of the graduates, counts, and categorizes them.
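The script itself isn't posted, so here is a minimal sketch of just the search-and-count step, assuming the bulletin has already been converted to plain text and assuming a hypothetical listing line such as `Jane Q. Doe, B.S. Yale University 2009; M.P.H. Harvard University 2011`. The real bulletin syntax may differ. Only the bachelor's entry is captured, which is one way to keep graduate institutions out of the counts.

```python
import re

# Hypothetical listing format (an assumption, not verified against the bulletin):
#   "Jane Q. Doe, B.S. Yale University 2009; M.P.H. Harvard University 2011"
# Match "A.B.", "B.A.", "B.S.", or "B.S.E." (but not the "B.A." inside "M.B.A."),
# then the institution name, then a four-digit graduation year.
BACHELOR_RE = re.compile(
    r"(?<![A-Za-z.])(?:A\.B|B\.[AS]|B\.S\.E)\.\s+(?P<school>[^;\d]+?)\s+(?:19|20)\d{2}"
)

def undergrad_institutions(bulletin_text: str) -> list[str]:
    """Return one undergraduate institution per matching graduate listing."""
    schools = []
    for line in bulletin_text.splitlines():
        match = BACHELOR_RE.search(line)  # first bachelor's entry on the line
        if match:
            schools.append(match.group("school").strip())
    return schools

sample = """\
Jane Q. Doe, B.S. Yale University 2009; M.P.H. Harvard University 2011
John Roe, B.A. University of California, Berkeley 2008
Alex Poe, M.B.A. Wharton School 2010"""

print(undergrad_institutions(sample))
# ['Yale University', 'University of California, Berkeley']
```

The resulting list of names is what the counting and categorizing steps would work from; institution names would still need normalizing (e.g. "Harvard College" vs. "Harvard University") before counting.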

I looked at 11 years of data -- graduates of Yale SOM from 2007-2017 -- and split them into two chunks (2007-2012 and 2013-2017) to see if there were any interesting differences. Here are the results. T30, T20, and T10 refer to the Top 30, 20, and 10 national universities according to USNWR in 2017. HYPSM refers to Harvard, Yale, Princeton, Stanford, and MIT.

Table 1: Proportion of Graduating Class Coming from Top X Institution During Z years.
TYxrXEk.png
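Each cell in Table 1 is a simple proportion: graduates whose undergrad falls in the tier, divided by all graduates counted for those years. A minimal sketch, assuming institution names have already been normalized to one spelling. Only the HYPSM set is spelled out here; the Top 10/20/30 sets would come from the 2017 USNWR list and are not reproduced. The four-graduate class is hypothetical.

```python
from collections.abc import Iterable

# HYPSM as defined in the post. The Top 10/20/30 sets would be filled in from
# the 2017 USNWR national university ranking (not reproduced here).
HYPSM = {
    "Harvard University",
    "Yale University",
    "Princeton University",
    "Stanford University",
    "Massachusetts Institute of Technology",
}

def tier_share(schools: Iterable[str], tier: set[str]) -> float:
    """Fraction of graduates whose undergrad institution is in `tier`."""
    schools = list(schools)
    if not schools:
        return 0.0
    return sum(school in tier for school in schools) / len(schools)

# Hypothetical class of four graduates (not real data).
grads = ["Yale University", "Yale University", "Kutztown University", "Stanford University"]
print(tier_share(grads, HYPSM))  # 0.75
```

Table 1b works the same way, with the Top 20/10 liberal arts colleges added to the tier set.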

If we include the Top 20 / 10 Liberal Arts Colleges in these calculations (which are ranked separately in USNWR and therefore do not count towards the above table), the proportions are a little higher:

Table 1b: Table 1 including Top 20, 10 Liberal Arts Colleges

KqBLaTC.png

Next, I looked at how many students from each undergrad graduated from Yale SOM. In other words, I tried to find the likely "feeder" schools and general trends in "undergrad diversity".
7l9zrAa.png

6rUxKer.png


And here are data from both plots aggregated over the past decade:

v4EV9pk.png
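The feeder plots boil down to a per-institution tally. A minimal sketch with `collections.Counter`, assuming one normalized institution name per graduate. The example names and counts are made up.

```python
from collections import Counter

def top_feeders(schools: list[str], n: int = 10) -> list[tuple[str, int]]:
    """The n institutions that sent the most graduates, with their counts."""
    return Counter(schools).most_common(n)

# Hypothetical list of undergrad institutions, one entry per graduate.
schools = (["Yale University"] * 5 + ["Harvard University"] * 3
           + ["Rice University", "Kutztown University"])
print(top_feeders(schools, n=2))
# [('Yale University', 5), ('Harvard University', 3)]
```

The full `Counter` (not just the top n) is also what the distribution statistics below are computed from: one count per represented institution.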

From 2007-2012, the total number of undergraduate institutions represented in Yale SOM's graduating class was 86. From 2013-2017, 81. Overall, from 2007-2017, 116 undergrads were represented in the graduating class.
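These three counts also say how much the two periods overlap. Assuming the 116 is the number of distinct institutions across both periods combined, let $A$ and $B$ be the sets of undergrads represented in 2007-2012 and 2013-2017. Inclusion-exclusion then gives:

```latex
|A \cup B| = |A| + |B| - |A \cap B|
\quad\Rightarrow\quad
|A \cap B| = 86 + 81 - 116 = 51
```

So roughly 51 institutions sent graduates in both periods (35 appear only in 2007-2012 and 30 only in 2013-2017).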

The five-number summaries for the feeder data sets (graduates per undergraduate institution) are as follows (minimum, 25th percentile, median, 75th percentile, maximum):

2007-2012: 1, 1, 1, 3, 99 (mean = 4.62)
2013-2017: 1, 1, 1, 4, 91 (mean = 4.78)
2007-2017: 1, 1, 2, 5, 190 (mean = 6.88)

The mode in every data set was 1.

The five-number summaries for the feeder data sets, taking into account only the top 10 feeder schools, are as follows:

2007-2012: 10, 12, 16, 22, 99 (mean = 27.9, mode = 10)
2013-2017: 10, 14, 16.5, 18, 91 (mean = 26.4, mode = 14)
2007-2017: 23, 26, 29, 39, 190 (mean = 54, mode = 27)

This shows that even the smallest of the top 10 feeder schools sends several times as many students to Yale School of Medicine as a school at the 75th percentile of the overall distribution (10 vs. 3 for 2007-2012, 10 vs. 4 for 2013-2017, and 23 vs. 5 for 2007-2017).

So, in each period, most schools (>50%) that send students to Yale SOM send only one student, while a school at the 75th percentile sends only 3-4 and (as you can also see from the plots of the top 10 feeders) a small number of outliers send dozens. Yale University is, unsurprisingly, the biggest feeder for Yale SOM, given that prestigious schools tend to have much more "inbreeding".
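The summaries above, including the top-10-only version, can be computed from the per-institution counts with Python's `statistics` module. A minimal sketch. The post doesn't say which quartile convention was used, so the `method="inclusive"` choice here is an assumption and quartiles may differ slightly from the posted ones. The counts in the example are hypothetical, not the Yale data.

```python
import statistics

def summarize(counts: list[int]) -> dict[str, float]:
    """Five-number summary plus mean and mode of per-institution graduate counts.

    Quartiles use statistics.quantiles(method="inclusive"); other conventions
    give slightly different q1/q3 values on small, skewed data.
    """
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "min": min(counts), "q1": q1, "median": median, "q3": q3,
        "max": max(counts),
        "mean": statistics.mean(counts),
        "mode": statistics.mode(counts),
    }

def top_n(counts: list[int], n: int = 10) -> list[int]:
    """Counts for the n most-represented institutions only."""
    return sorted(counts, reverse=True)[:n]

# Hypothetical per-institution counts (not the Yale data).
counts = [1, 1, 1, 2, 3, 5, 99]
print(summarize(counts))
print(top_n(counts, 3))
```

A single large outlier (the 99 here) drags the mean far above the median and mode, which is the same pattern as in the real summaries.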

Graduate Degrees

About a third (~33%) of Yale MD grads held graduate degrees upon graduation. Yale offers a competitively funded fifth research year in which students can earn an MHS (Master of Health Sciences) degree before graduating. Working under the assumption that all graduates who did not earn an MHS from Yale earned their graduate degrees before matriculating at Yale SOM, I made another plot of the top 5 "graduate feeders" to Yale. All of these graduate degrees are non-doctoral degrees. Excluding MHS holders, 112 of 922 (12%) Yale grads had additional non-doctoral graduate degrees upon graduation, ~50% of which came from Yale or Harvard.

gy880bC.png

Of the 112 graduate-degree-holding "matriculants" (again, this is more of a speculative term, given that I don't know when people earned these degrees; safe to say, any grad degrees not from Yale were almost certainly earned before medical school), 16 did not go to a Top 30 undergraduate institution (most went to Top 20s). Of those 16, 12 (75%) received a graduate degree from one of the 5 institutions represented in the above graph.
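The percentages in this section are simple shares, but the filtering step decides what gets counted. A minimal sketch, assuming each listing has been parsed into hypothetical `(degree, institution)` pairs and using the post's working assumption that any non-MHS degree predates medical school. The post doesn't say whether "excluding MHS holders" drops the MHS degree or the whole student; this sketch drops only the Yale MHS degree itself. The three graduates are made up; the last line just re-derives the post's own figures.

```python
from dataclasses import dataclass, field

@dataclass
class Graduate:
    undergrad: str
    # (degree, institution) pairs for graduate degrees listed in the bulletin
    grad_degrees: list[tuple[str, str]] = field(default_factory=list)

def outside_degrees(grad: Graduate) -> list[tuple[str, str]]:
    """Graduate degrees other than Yale's in-program MHS.

    Assumption (from the post): any remaining degree was earned before med school.
    """
    return [(deg, inst) for deg, inst in grad.grad_degrees
            if not (deg == "M.H.S." and inst == "Yale University")]

def share(part: int, whole: int) -> float:
    """part / whole, or 0.0 for an empty denominator."""
    return part / whole if whole else 0.0

# Hypothetical graduates (not real data).
grads = [
    Graduate("Yale University", [("M.H.S.", "Yale University")]),
    Graduate("Rice University", [("M.P.H.", "Harvard University")]),
    Graduate("Kutztown University"),
]
holders = [g for g in grads if outside_degrees(g)]
print(len(holders), share(len(holders), len(grads)))

# The post's own figures: 112 of 922 graduates, and 12 of the 16 non-T30 holders.
print(f"{share(112, 922):.1%}, {share(12, 16):.0%}")  # 12.1%, 75%
```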

Not Surprising:
- Top Echelon medical school has mostly students from top echelon undergrads
- Yale SOM favors its own undergrads
- Vast majority of schools only send one student to Yale

Surprising (IMO):
- I did not expect the proportion of T30,20,10 to be so high. HYPSM was actually about where I imagined it would be, around 1/3 of all students, since I've always speculated that HYPSM give the most significant "prestige boost" of all undergrads and my anecdotal knowledge of these schools from experience / friends / etc. tells me that these schools are very well represented at top med schools.
- The skew of representation is pretty severe. The distribution is obviously right-skewed, which pulls the mean up, but even relative to that inflated mean, Yale and Harvard alone are sending 15-20 times as many students as the statistically "average" school.
- The proportion of med students coming from the Top X undergrads has stayed remarkably constant over the past decade, in spite of MCAT scores and GPAs creeping up for years. Remember, these are graduates of Yale SOM, so they entered 4-5 years before the year on the bulletin. 2001-2 was an entirely different world in terms of overall competitiveness for undergraduate and medical schools compared to 2012-13. I think this is compelling evidence that weighting undergraduate institutions is a systematic practice of medical school admissions (at least at Yale, to be fair) and not simply a product of "the most competitive applicants always being at the most competitive undergrads". The latter explanation has some truth to it -- to be sure, the average Harvard applicant is probably a lot better than the average Kutztown applicant -- and it was probably even more true in 2001-2002, when undergrad admissions were significantly less competitive. It is also worth considering that Yale has historically shown a lot of love to non-traditional students, so on average these students might have graduated from college 2 to 3 years before that, even. Just 10 years ago, in 2007, Harvard's acceptance rate was nearly twice what it is today (the same is true for its peer institutions). Not that far back, in 1995, Penn had an acceptance rate of 30%. Combine that with the fact that the costs of college, graduate school, food, and rent have all gone up rapidly in the past decade, and you can't deny that, today, more and more very bright, very competitive students are either not admitted to prestigious private undergrads or not able to comfortably afford them (if they don't qualify for or earn the generous aid these schools might offer). In the case of Yale, no one will deny that Berkeley is an academically comparable institution, and yet over the past decade Harvard students outnumber Berkeley students at Yale SOM by 5-6 times; Yale students, by 7-9 times.

Notes
- International students (rather, anyone who studied at an international institution for undergrad) were excluded from any figures reported here. From visually inspecting the data, I saw a lot of "Oxford", "Cambridge", and big Canadian universities (McGill, Toronto).

- Sooooo many Yale students have graduate degrees. So many, in fact, that making sure I wasn't capturing graduate institutions was probably the biggest hurdle in altering my script. I'm interested to see where these graduate degrees are from. Upon inspection, I feel like a lot of students are getting grad degrees at Yale. I'd like to know what proportion of students from Top X undergrads have graduate degrees from a Top X school before coming to Yale -- might be interesting, might be pointless.


- If anybody knows of similar documents for other schools, I'd be happy to do this for that school.
@efle
@Lawper

EDIT 08/03/2017: Fixed a bug in my program that failed to download all of the correct pages of the school's PDF bulletin. I have fixed the script and regenerated the graphs; the graphs you see in the OP from 08/03/2017 onward are correct. The problem arose because of the way my browser indexed the pages of PDF downloads. I verified the fix by counting two years of graduates by hand, and the counts from my script and by hand matched exactly for both years. I also added five-number summaries for the top 10 feeder schools only, and some plots for graduate-degree-holding MDs.

Care to do a case study on Johns Hopkins Med, which has posted at least a couple of years' worth of class rosters in its class bulletins online? My eyeballs tell me that your results are likely to be similar to Yale Med's, but again, without MCAT data it's difficult to isolate the exact nature of the favoritism toward prestigious undergrads.

Bonus points if you also post links to resources that explain how to use Python to reformat .pdf documents.
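Not the OP's script, but a minimal sketch of the PDF-to-text step using the third-party `pypdf` package (an assumption; Poppler's `pdftotext` command-line tool is a common alternative). The optional page-count check is one way to catch the kind of partial-download problem described in the OP's edit note.

```python
from collections.abc import Iterable
from typing import Optional

def join_pages(page_texts: Iterable[str]) -> str:
    """Join per-page text into one document, one page per line block."""
    return "\n".join(text.strip() for text in page_texts)

def bulletin_text(pdf_path: str, expected_pages: Optional[int] = None) -> str:
    """Extract the text of every page in a PDF with the third-party pypdf package.

    If expected_pages is given, raise when the file has a different page count.
    """
    from pypdf import PdfReader  # pip install pypdf

    reader = PdfReader(pdf_path)
    if expected_pages is not None and len(reader.pages) != expected_pages:
        raise ValueError(f"expected {expected_pages} pages, found {len(reader.pages)}")
    # extract_text() can come back empty for image-only (scanned) pages
    return join_pages(page.extract_text() or "" for page in reader.pages)
```

Usage would look like `text = bulletin_text("bulletin_2017.pdf")`, where the file name is hypothetical. Scanned bulletins with no text layer would need OCR instead.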
 
I actually ran some of the internal data on Hopkins interviewees, which removes the major bias of matriculating to home region. That is, I imagine all the highly competitive applicants from California toss apps and get interviews from schools like Hopkins or Penn, but will tend to prefer to matriculate to Stanford or UCSF/UCLA. Looking at the list of interviewees and their alma maters should be a better glimpse at the phenomenon.

Here's a figure:

f0N37sY.png


I also shared some of the interesting numbers in this comment:

Alright, for those curious, here are some interesting values regarding the 2017-2018 cohort of interviewees at a top private medical school (n > 500):

Attended Canadian college: 3%
Attended a college ranked only regionally by US News: 3%
Was the only interviewee from their college for the year: 13%

Among schools receiving a US News national rank...

US News ranked 1-25: 68% of interviewed students
US News ranked 26-50: 13%
US News ranked 51-75: 8%
US News ranked 76-100: 3%
US News ranked from 101+: 9%
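These bucket percentages are per-interviewee tallies restricted to colleges with a US News national rank. A minimal sketch with made-up ranks. Because each share is rounded separately, the posted figures sum to 101%, which is expected.

```python
from collections import Counter

def rank_bucket(rank: int) -> str:
    """Map a US News national rank to the buckets used above."""
    if rank <= 25:
        return "1-25"
    if rank <= 50:
        return "26-50"
    if rank <= 75:
        return "51-75"
    if rank <= 100:
        return "76-100"
    return "101+"

def bucket_shares(ranks: list[int]) -> dict[str, float]:
    """Share of interviewees in each bucket (nationally ranked colleges only)."""
    counts = Counter(rank_bucket(r) for r in ranks)
    return {bucket: n / len(ranks) for bucket, n in counts.items()}

# Hypothetical national ranks, one per interviewee (not the real data set).
print(bucket_shares([1, 3, 10, 20, 30, 60, 80, 150]))
# {'1-25': 0.5, '26-50': 0.125, '51-75': 0.125, '76-100': 0.125, '101+': 0.125}
```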

The best represented colleges...

Hopkins, Harvard, Duke, Stanford, UCLA, Yale, Penn, Brown, Cornell, Dartmouth, UNC, Princeton, WashU, USC, Berkeley, U Chicago, Columbia. These 17 together provided the majority (54%) of interviewees.

Interesting aside not from this dataset: while the above 17 produced most interviewees, that's still only a very small minority of the yearly premeds generated from these schools. We're talking ~300 interviewees from the above, when in 2016 there were ~6,400 total MD applicants from these same 17 colleges (roughly 5%).

Universities which were ranked in the top 25 but did not provide any interviewees:

Caltech, Carnegie Mellon

Overall pretty similar, with ~70% of interviewed students coming from just a couple dozen feeder undergrads
 

Interesting to see less of a bias for HYPSM at Hopkins compared to the bias for HYPSM at Yale (0.19 vs 0.38), but this might also be due to Yale undergrad...
 
Yeah, bunch of confounders here like Yale undergrad feeding a lot into its own med school, regional bias of New England vs Mid Atlantic, and this being interview invites instead of actual admits or matriculants.

Same general trend though, it feels like almost everybody went to the same few dozen major universities.
 