probability of first choice in match


BKN

Senior Member
15+ Year Member
Joined
Oct 31, 2005
Messages
1,560
Reaction score
4
Well here it is. Thanks to all who contributed.

I had 40 data points: 33 from US allopaths, 4 from DOs and 3 from IMGs. All matched.
Grains of salt:
1. Biased because reporting was voluntary. The "embarrassed" may be less likely to report.
2. Not enough DOs or IMGs to draw many conclusions.
3. The commle is the average of mle 1 and mle 2.

Conclusions:
1. 73% got their first choice :D
2. The MLEs on average looked higher to me than the pool of interviewed applicants (back to the bias thing)
3. DOs were less likely to get their first choice (p=.025)
4. There was a clear trend for higher commles to get their first choice, but it didn't reach statistical significance.

I've got a file that shows probability of matching to first choice (y axis) against average MLE (x). As near as I can see, I can't just insert the file, it's got to be a link to a URL. I'll post it once I get it on a website.

 
BKN said:
I've got a file that shows probability of matching to first choice (y axis) against average MLE (x). As near as I can see, I can't just insert the file, it's got to be a link to a URL. I'll post it once I get it on a website.

How big is the file? You can attach it as a word document or compress it into a zip file. It looks like you can't do excel. I don't know, maybe you have to be a paying member to attach files though, it says I have 0 bytes available. I am sure a paying member on the forum would attach it though.

Me poor, me not pay for forum :( . Bad girl. (But my new avatar still the cutest on the forum :p )
 
BKN is the man.. no one needs to say more!!
 
StudentDoc327 said:
upload the file to www.zshare.net for us


btw thanks for putting that together

I'm letting em junkie try to insert it here, we'll see.

thanks.

BTW the site works faster with no background, I like it.
 
EM Junkie said:
Never got the email BKN.....

Will try again tomorrow! Time for :sleep:

Sent it last night at 7 my time. I'll try again.
 
BKN said:
Sent it last night at 7 my time. I'll try again.

Found the problem, image was too big for the limits. Will edit it at home and post tonight. Trust me. :laugh:
 
I've said it before, and I'll say it again. BKN, you rock :clap: !
 
EM Junkie said:
Here goes nothing....

thanks, Junk. now I'm gonna try to insert it directly.

not so bad.:)
 

[Attachment: prediction graph 1.jpg]
I don't think I understand the graph. The x axis is an average of board scores and the y axis is the probability of getting your first choice, but what are the individual points? Is each point an individual applicant? Then isn't their probability of getting their first choice either 1 if they got it or 0 if they didn't? I must be missing something but I'm too tired right now to figure it out.
 
Also, your equation has a slope of 0.00 but your line doesn't. ???
 
y = mx + b

I think m (slope) = 0.80, but the font is so small it looks like 0.00
 
ERMudPhud said:
Also, your equation has a slope of 0.00 but your line doesn't. ???

Oh, I hadn't wanted to get into this, I just wanted to give you a graph that you could use. But if so, I should have suppressed that function, which was from my stat program automatically fitting a line to a graph that appeared to be linear. In fact it is not; it just happens that the part of the function shown appears to be close to linear.

Remember that the x axis (mle scores) is not bounded above in theory.

All probability functions have the characteristic that no y can be below 0 or above 1. Therefore they aren't linear, since any line of the form y=mx+b (with a nonzero slope) has no boundaries in y.

I can't format this correctly even when I import the equation, so in what follows e^g(x) means e raised to the g(x) power.

The logistic regression model is:

Prob(x) = e^g(x) / (1 + e^g(x))

Note that when g(x) is very negative, Prob(x) approaches 0, and when g(x) is large and positive, Prob(x) approaches 1, so Prob(x) always stays between 0 and 1.

g(x) = -3.8 + .021*commle

What that means is that the whole function starts at a y value of almost 0 near x=0, crosses 0.5 at about x=181, and creeps toward 1 as x goes on up to infinity. With these (rounded) coefficients it is about 0.34 at x=150 and about 0.87 at x=270.

The whole curve looks like a sigmoid similar to the hemoglobin dissociation curve, which you are familiar with. In fact the dissociation curve is a probability function of the proportion of hemes associated against oxygen tension.
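For anyone who wants to see the shape for themselves, here is a minimal Python sketch that evaluates the curve at a few scores. It assumes nothing beyond the rounded coefficients posted above (-3.8 and .021), so the outputs are approximate; the names `g` and `prob_first_choice` are made up for illustration and are not from the stat program.

```python
import math

# Rounded coefficients as posted in the thread; treat results as approximate.
INTERCEPT = -3.8
SLOPE = 0.021


def g(commle):
    """Linear predictor (log-odds) for a given average MLE score."""
    return INTERCEPT + SLOPE * commle


def prob_first_choice(commle):
    """Logistic transform e^g / (1 + e^g); always strictly between 0 and 1."""
    e_g = math.exp(g(commle))
    return e_g / (1.0 + e_g)


if __name__ == "__main__":
    for score in (0, 150, 180, 200, 220, 240, 270):
        print(f"commle={score:3d}  g={g(score):6.2f}  Prob={prob_first_choice(score):.3f}")
```

Over the range where the reported scores actually fall, the printed values climb fairly steadily, which is why the plotted segment looks close to a straight line.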

Aren't you glad you asked?:D
 
So, the points on your curve are calculated probabilities, based on your logistic regression equation, for the individual board scores you were given by the 40 or so people who reported to you, but we don't know how well those calculated results compare to their actual success rates in getting their first choice. I guess to validate the model you would need a bunch of people next year to give you their board scores, predict their success rate, and then see how accurate you were. Can you select a specific MLE cutoff above which you will get your first choice and then calculate ROC curves, or is there not enough data for that?

Sorry if I'm yanking your chain. I'm just using this as an exercise to try to relearn an area of statistics that I hadn't thought about in years.
 
ERMudPhud said:
So, the points on your curve are calculated probabilities, based on your logistic regression equation, for the individual board scores you were given by the 40 or so people who reported to you, but we don't know how well those calculated results compare to their actual success rates in getting their first choice. I guess to validate the model you would need a bunch of people next year to give you their board scores, predict their success rate, and then see how accurate you were. Can you select a specific MLE cutoff above which you will get your first choice and then calculate ROC curves, or is there not enough data for that?

Sorry if I'm yanking your chain. I'm just using this as an exercise to try to relearn an area of statistics that I hadn't thought about in years.

You know I :love: to talk about stats. But I am writing the program information form in prep for our RRC visit in a couple of months. It's over 100 pages usually. So SDN participation will be limited for a while.

First round validation is accomplished just by looking at each individual, seeing what his probability prediction was (<.5 won't match to first choice, >.5 will match) and comparing it to his actual outcome. Count up the correct predictions. In fact, the mle model is no better than saying everybody in the group will get their first choice. Remember that 73% did so and if you look at the graph, even the lowest scoring person had a better than .5 chance (that's the cutpoint). So the model just predicts everybody will match first choice and has the same error rate as just predicting the average. You don't have to do a ROC since the natural cut point is 0.5. To find the score at that cut point, all you do is solve the equation backwards. Looking at the graph it will be about 180-190. But we don't really have much data down there, so I'm not sure I would believe it.
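"Solving the equation backwards" can be sketched in a few lines of Python. Prob = 0.5 exactly when g(x) = 0, so the cut score is just 3.8/.021. This uses the rounded coefficients from earlier in the thread, so the result is approximate, and the helper names are hypothetical.

```python
import math

# Rounded coefficients from the posted model; results are approximate.
INTERCEPT = -3.8
SLOPE = 0.021


def score_at_probability(p):
    """Invert Prob = e^g / (1 + e^g): g = ln(p / (1 - p)), then solve g = a + b*x for x."""
    log_odds = math.log(p / (1.0 - p))
    return (log_odds - INTERCEPT) / SLOPE


def predicts_first_choice(commle, cutpoint=0.5):
    """Predict 'will match first choice' when the modeled probability exceeds the cutpoint."""
    prob = 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * commle)))
    return prob > cutpoint


if __name__ == "__main__":
    print(f"Score where Prob = 0.5: {score_at_probability(0.5):.1f}")
    print(predicts_first_choice(190), predicts_first_choice(175))
```

The cut score lands at about 181, in line with the 180-190 read off the graph.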

The model that I did for D.O. status was significantly better than that. No MLE scores were added, just DO or not. The model then predicted DOs would not match first choice and non-DOs would. Since only 1 of 4 DOs did match first, this model got 3 of the 4 DOs right where guessing the mean got 1, a net gain of 2 correct predictions. Accuracy went from 73% to 77.5%. Since the model was significantly better, if this was a random sample of allopathic and osteopathic applicants (it's not), the reputed anti-DO bias would be confirmed. If significance is seen with only 4 DOs, it's a powerful effect. However, there's just too little data to have much confidence.
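The bookkeeping behind those two accuracy figures can be tallied in a short Python sketch. The 40 respondents, the 4 DOs and the 1 DO who matched first are from the thread. The count of 28 of 36 non-DOs matching first is not stated anywhere; it is backed out from the 77.5% figure (31 of 40) and should be treated as an assumption.

```python
# From the thread: 40 respondents, 4 DOs, 1 DO matched first choice.
# ASSUMPTION: 28 of the 36 non-DOs matched first choice (backed out from 77.5% accuracy).
groups = {
    "non-DO": {"n": 36, "first_choice": 28},
    "DO": {"n": 4, "first_choice": 1},
}

total = sum(grp["n"] for grp in groups.values())


def accuracy(predictions):
    """predictions maps group name -> True (predict first choice) or False (predict not)."""
    correct = 0
    for name, grp in groups.items():
        if predictions[name]:
            correct += grp["first_choice"]             # right for those who got it
        else:
            correct += grp["n"] - grp["first_choice"]  # right for those who did not
    return correct / total


baseline = accuracy({"non-DO": True, "DO": True})    # everybody predicted to get first choice
do_model = accuracy({"non-DO": True, "DO": False})   # DOs predicted not to

if __name__ == "__main__":
    print(f"baseline: {baseline:.1%}, DO model: {do_model:.1%}")
```

With these counts the two accuracies come out to 72.5% and 77.5%, matching the 73% and 77.5% quoted above.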

There's far too little data to do a model with all three factors in it (mle, do, img).

So if you want to learn this, the first book is "Applied Logistic Regression" by Hosmer and Lemeshow. It's all medical data sets and if you can find a copy of SPSS with the advanced stats package, you can go to town.

BTW what's your phud in?
 
Apollyon said:
A quote from 18 years ago (when I started college):

"SPSSx is from HELL!"

Precisely! It certainly was.

But the introduction of GUI versions with radio buttons and dialog boxes has made it a walk in the park. You don't have to write the syntax, just point and click. The problem now is newbies doing the wrong tests. The program will do them for you, but they don't mean anything.
 
BKN said:
First round validation is accomplished just by looking at each individual, seeing what his probability prediction was (<.5 won't match to first choice, >.5 will match) and comparing it to his actual outcome. Count up the correct predictions. In fact, the mle model is no better than saying everybody in the group will get their first choice. Remember that 73% did so and if you look at the graph, even the lowest scoring person had a better than .5 chance (that's the cutpoint). So the model just predicts everybody will match first choice and has the same error rate as just predicting the average. You don't have to do a ROC since the natural cut point is 0.5. To find the score at that cut point, all you do is solve the equation backwards. Looking at the graph it will be about 180-190. But we don't really have much data down there, so I'm not sure I would believe it.

Hmm. It's not Spanish. Hmm. It's not Turkish.... It's definitely not English. Well I'm all tapped out of languages. :oops:
 
trkd said:
Hmm. It's not Spanish. Hmm. It's not Turkish.... It's definitely not English. Well I'm all tapped out of languages. :oops:

:p
 
BKN said:
BTW what's your phud in?

Officially it says "Molecular Biology and Cellular Biophysics" but practically it's molecular immunology and biochemistry. I do the rare t test and even then I go back and read my stats book to decide if I want two tailed or one tailed and equal or unequal variance. Otherwise I leave the stats to someone else. Logistic regression, multivariate analysis, etc. are mostly Turkish to me.
 
ERMudPhud said:
Officially it says "Molecular Biology and Cellular Biophysics" but practically it's molecular immunology and biochemistry. I do the rare t test and even then I go back and read my stats book to decide if I want two tailed or one tailed and equal or unequal variance. Otherwise I leave the stats to someone else. Logistic regression, multivariate analysis, etc. are mostly Turkish to me.

But is the chi-squared Greek to you?
 
The problem, IMHO, with this graph is that there are two rounds of "selection" prior to the match. The first is in the granting of interviews and the second is at the interviews themselves. This graph, as BKN pointed out, shows that the U.S. grad with a USMLE of 190 still has a good chance of matching at their first choice. Which may be true given the ROL as entered. But what would be more beneficial in terms of planning for one's career (as a soon to be M4) would be the probability of being able to "call one's shot" based on any metric (or multiples). At what USMLE score (or grades, or experience, or research, etc.) can a U.S. grad (MD or DO) state at the start of 4th year, "I want to go to program x" and end up there? I know that is unmeasurable, but I really don't buy that the score provides any predictive value at all. The interview process is going to select out the "likely" match for everyone. That is, applicants are, by human nature, going to rank places they feel "want them" more highly. The extension of the interview in the first place indicates that the USMLE score is at least "acceptable" to the program.

What might be interesting to analyze next year is score versus interviews offered as a percentage of applications (in other words, at what score "cutoff" will you likely see >95% of your applications become interview offers). That way, a person with a 190 (or a 200, 210, or 240) USMLE could have an idea of how many applications were needed to reach the "magic" 10 interviews.
 
Squad51 said:
The problem, IMHO, with this graph is that there are two rounds of "selection" prior to the match. The first is in the granting of interviews and the second is at the interviews themselves. This graph, as BKN pointed out, shows that the U.S. grad with a USMLE of 190 still has a good chance of matching at their first choice. Which may be true given the ROL as entered. But what would be more beneficial in terms of planning for one's career (as a soon to be M4) would be the probability of being able to "call one's shot" based on any metric (or multiples). At what USMLE score (or grades, or experience, or research, etc.) can a U.S. grad (MD or DO) state at the start of 4th year, "I want to go to program x" and end up there? I know that is unmeasurable, but I really don't buy that the score provides any predictive value at all. The interview process is going to select out the "likely" match for everyone. That is, applicants are, by human nature, going to rank places they feel "want them" more highly. The extension of the interview in the first place indicates that the USMLE score is at least "acceptable" to the program.

What might be interesting to analyze next year is score versus interviews offered as a percentage of applications (in other words, at what score "cutoff" will you likely see >95% of your applications become interview offers). That way, a person with a 190 (or a 200, 210, or 240) USMLE could have an idea of how many applications were needed to reach the "magic" 10 interviews.

Good post. I agree and I said that this data is biased and post hoc. I also pointed out that just guessing that you would get your first choice was as good as inserting your mles. The people providing the info are generally those who succeeded and there is very little data below 200, which is actually the group of most interest. In short, this data set is for our mutual amusement and shouldn't be taken very seriously.

As for the rest of your post: People may gravitate to places that they think want them, but you really can't tell. None of us are going to be rude to applicants. It's both cruel and against our interests. And in fact, I will be encouraging because I'd be happy to train the vast majority of those I interview.

BTW I don't think that there is an mle score that will guarantee >95% of your applications become interviews. There are too many other things used to make the offer decision (dean's letter, SLORs, transcripts, part of the country, phases of the moon, etc).

But if your desire is to know how many applications it takes to ensure 10 interviews, just ask a faculty member who does a lot of advising (me for instance). Here's my guess, and I imagine if you ask others around the country they won't differ by that much:

1. MLE>230, 20 apps
2. 230>MLE>200, 30 apps
3. MLE<200, 50 apps

If you failed a couple of courses or annoyed somebody, or are a D.O. or an IMG, apply to more places than indicated. If it gets to be Nov 15 and the interviews aren't rolling in, apply to more places.

Other stuff:

Take your EM rotation(s) in academic EM centers in July, August or September. See the chair or the PD as soon as you get there and request enough shifts with them to get a good SLOR in place no later than Oct 10. Arrive early, stay late, convince everybody that you'd be a dream to work with. I suspect that in the end good SLORs carry as much weight as anything.
 
Bumping. Quinn or spyderdoc - this might be a good FAQ link!!!
 
Dr Mom or Quinn,

Seems I hijacked my own thread. Is it possible to take posts #30-33 and make a new thread out of them with a new title? Something like "2006 match stats for EM".
 
BKN said:
It's an attempt to suggest the relationship between # of positions offered by the institutions and the numbers of positions desired. It does not say anything about the numbers of each category who matched. Those numbers are in different tables in the report, which I presume scutwork will post as the interview season fires up.

My post says that 1383 positions were offered and if you divide that by the total number of US seniors who ranked an EM program you get 1.3. If you divide by the total number of IAs who ranked EM you get 3.1, and if by all applicants, 0.9.

I conclude that 1 person in 10 who ranked EM did not get an EM position.

Presumably those unfortunates are mostly those who did not make EM a first choice. That category is almost exactly 1/10 the applicants. The only way that wouldn't be true is if the not ranked EM first group was generally going for more competitive specialties with EM as a backup.

So the end result? There was a position for almost everybody who wanted one. :)

They just had to play their interview season right.

This data is cool (thanks so much BKN), but it seems so surprising to me. As a rising M3 it seems like everyone in the world wants to do EM. Where do they all end up?
 
AmoryBlaine said:
This data is cool (thanks so much BKN), but it seems so surprising to me. As a rising M3 it seems like everyone in the world wants to do EM. Where do they all end up?

I think a lot of this has to do with the mistaken "observation" that EM is a lifestyle specialty. Not sure what experiences they are receiving during school but I'm sure there will be an "abandon ship" call soon.....I hope so....I only have 6 years to wait to apply. :laugh:
 
BKN said:
Dr Mom or Quinn,

Seems I hijacked my own thread. Is it possible to take posts #30-33 and make a new thread out of them with a new title? Something like "2006 match stats for EM".


I moved those posts to a new thread as requested. If anyone wants to post responses to the stats from 2006, use the link. :)
 
DrMom said:
She's on vacation & the Highland cow is filling in :)

I like cows better. :thumbup: Moooooo.
 
Bump for the new interview season
 
reassuring, but I'm still very anxious as my posting at 2:30 AM would attest to.
 