please explain what 3 digit score means in percentiles


deuce

The USMLE site is very vague about this. Their website says the mean is 210-230 depending on the year and the SD is 20. So does that mean a 240 is the 75th percentile and a 260 means you scored higher than 95% of those taking Step 1?

 
Actually 240 is the 85th percentile and 265-266 is the 99th percentile
 
If the mean is 221 and SD is 20, then 241 is 1 SD above mean. 34% of scores fall between the mean & 1 SD above the mean.

so 241 is: 50(mean) +34 (1SD above) = 84 %ile.
http://www.mnstate.edu/wasson/ed602lesson7.htm
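For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It assumes scores are exactly normally distributed with mean 221 and SD 20 (the figures quoted in this thread, not official ones), and the function name `normal_percentile` is just an illustrative choice.

```python
import math

def normal_percentile(score, mean=221.0, sd=20.0):
    """Percentile of `score` IF scores were exactly normal (an assumption)."""
    z = (score - mean) / sd                      # distance from the mean in SDs
    # Standard normal CDF written with the error function, scaled to 0-100.
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

for s in (221, 241, 261):
    print(s, round(normal_percentile(s), 1))     # mean, +1 SD, +2 SD
```

One SD above the mean lands at roughly the 84th percentile and two SDs at roughly the 98th, but only as long as the normality assumption holds.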
Aren't you guys basically saying the same thing? For all intents and purposes, I doubt it'll matter too much whether you're equating 240 with the 85th percentile or 241 with the 84th.


His calculation is correct; the only error is assuming that the scores are normally distributed.
Are they not close enough to approximate usefully? Honest question, by the way. I've never seen a national Step 1 score breakdown.
 
Are they not close enough to approximate usefully?

I've only seen the breakdown for my school, which had moderate negative skew (that is, a long tail stretching to the low scores; median > mean). If it is representative of the nation, using the above method would overpredict your percentile somewhat.

It should still be "useful" for whatever you need a percentile for.
 
McGillGrad, I'm pretty sure nshams knows the difference between "percent" and "percentile". Just because you've been hanging around the Step 1 forum for years like a college kid hanging around a high school campus doesn't make you some sort of grand master authority on the subject, especially when you can't even explain the logic behind your statements. At least nshams attempted to explain his reasoning, whereas all you could come up with is that the sky is blue and space has no color.

Anyway.
This guy. He's the one who created that NBME-USMLE score correlation chart. I think he may be on to something.

One other mathematical fact:
The 2-digit score USMLE provides is not based on any statistical distribution. It is simply a linear representation of the 3-digit score on a different scale.
We know that the minimum passing score is 185, and from score reporting forum threads we know that for the last year or so the minimum 3-digit score that earns a 99 is 236.

My assumption is that 185/236 also represent standard-deviation times some number on a 3-digit statistical distribution graph. For example if SD = 20 and mean = 215, then 185 = mean - 1.5 * SD, and 236 = mean + 1.05 * SD.

But regardless of what they represent, the 2-digit score can be calculated by following simple line-equation:

Y = (Y2 - Y1) / (X2 - X1) * (X - X1) + Y1

Where Y2 = 99, Y1 = 75, X2 = 236, X1 = 185,
So the equation can be simplified to:
Y = (99 - 75) / (236 - 185) * (X - 185) + 75
Y = (0.4706 * (X - 185)) + 75
Y = (0.4706 * X) - 87.06 + 75
Y = (0.4706 * X) - 12.06

So for 236/185 markers we can compute a complete 2-digit table using this equation.
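Here is a small Python sketch of that table. It assumes the two anchor points quoted in this thread (185 maps to 75, 236 maps to 99) and caps everything above 236 at 99. The function name `two_digit` is made up for the example, and none of this is an official USMLE formula.

```python
def two_digit(three_digit, x1=185, y1=75, x2=236, y2=99):
    """Straight line through (x1, y1) and (x2, y2), capped at 99."""
    y = y1 + (y2 - y1) / (x2 - x1) * (three_digit - x1)
    return min(99, round(y))

# Print a short table of 3-digit -> 2-digit scores.
for score in range(185, 241, 5):
    print(score, two_digit(score))
```

Writing the line in point-slope form avoids carrying a rounded slope and intercept around.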

It is a little more complicated than that.
Let me try to explain how exams like USMLE are designed. What I am going to describe is based on general theory, and USMLE most likely does things differently.

Designing a good exam is a science in itself. A lot of statistical calculation goes into finding the right questions, answer choices, and score distribution. Many statisticians work alongside professionals in the field to design professional exams.

USMLE uses statistical normalization to compute and calibrate the score for each question; yes, each and every question. That is also the reason you never know the maximum possible score. How does it work? I don't know how USMLE does it, but here is an example of how it can be done:

Suppose I am to start teaching a class in a college. To test my students I want to develop a question-bank from which I can give the final exam every semester. One of my goals is to keep adding new questions to this bank and keep removing the old ones. When adding the questions, I want to make sure that questions are not very-very easy and they are also not very-very difficult. My other goal is to keep the grading standard consistent. For this I define my own statistical distribution based on either some standard mathematical model or define a new one. My plan is to give 300 questions on final and for this I pick a distribution with mean M = 215 and standard-deviation SD = 20. I will call it SKM95 distribution.

Given all the parameters, now let's work on the problem in a simplest possible way.

I will start by giving a practice test to all my students every week. Let's assume that each question on the test is worth 10 points. After every practice test I will compute mean and standard-deviation for each question. So, let's say that for Question-1 (Q1) the M = 5.2 and SD = 1.5.

Based on the M and SD of every practice test question, I can decide which questions are too difficult and which are too easy, and remove those from my final question-bank. For example, if on a particular practice test question all students get 10/10 or all get 0/10, that question will not make it to the final; I will drop it from the question-bank.

At the end of the semester I will have a full question-bank from which I can randomly pick 300 questions and give the final to my students. Let's assume Q1 is on the final. Now pick 3 students A,B, and C randomly. On the final A gets 10/10, B gets 0/10, and C gets 7/10 on Q1. Now I will compute z-value for these 3 students, for Q1.
Formula for z = (Score - M) / SD.
For A, z = (10 - 5.2) / 1.5 = 3.2
For B, z = ( 0 - 5.2) / 1.5 = -3.47
For C, z = ( 7 - 5.2) / 1.5 = 1.2

For a full 300 question final, the M and SD will be different per question and to compute final z-score I have to add z-values for all 300 questions and divide by 300.

For this example let's assume that final exam has only one question. So, now the next step is to compute final score by mapping this z-score to my SKM95 distribution, and that can be computed by using the same z-value formula, except in this case the unknown is Final Score, so the equation will be:
Final Score = M + (SD * z)
For A, Final = 215 + (20 * 3.2 ) = 279
For B, Final = 215 + (20 * -3.47) = 146
For C, Final = 215 + (20 * 1.2 ) = 239

You can see that even with 100% correct the Max score was 279 and not 300, and with 0% correct the minimum was 146 and not zero.
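The one-question example above can be reproduced with a few lines of Python. This is only a sketch of the make-believe SKM95 scheme from this post (M = 215, SD = 20, and Q1 with M = 5.2, SD = 1.5); the function names are made up, and `mean_z` shows the "add the z-values and divide by the number of questions" step for a longer exam.

```python
def z_score(raw, item_mean, item_sd):
    """How many SDs a raw question score sits above the question's mean."""
    return (raw - item_mean) / item_sd

def mean_z(z_values):
    """Average the per-question z-values into one exam-level z-score."""
    return sum(z_values) / len(z_values)

def final_score(z, target_mean=215.0, target_sd=20.0):
    """Map a z-score onto the chosen reporting distribution."""
    return target_mean + target_sd * z

# Students A, B and C on Q1 (mean 5.2, SD 1.5), one-question final.
for name, raw in (("A", 10), ("B", 0), ("C", 7)):
    z = mean_z([z_score(raw, 5.2, 1.5)])
    print(name, round(z, 2), round(final_score(z)))
```

The ceiling and floor of the reported score depend entirely on each question's mean and SD, which is why a 100% raw score does not map to a fixed maximum.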

Now you know why no one knows the maximum score on the USMLE. And since every question has a different mean and SD, no one can answer the question, "How many questions do I need to answer correctly to get a 236/99?"

In this imaginary scenario I can also repeat some of the questions in the next semester's practice tests. For these questions the M and SD will have to be re-computed based on the new sample size (previous semester class size plus next semester class size). It will skew the per-question distribution curve slightly, so before the next semester's final I will have to re-adjust my SKM95 mean and standard deviation (e.g. I may have to move M from 215 to 214 and SD from 20 to 21) to keep the grading standard consistent.

Now you know why the USMLE SD and M shift over time.

In my make-believe world I can also add some 50 new practice test questions to final and increase the final to 350 questions and call these practice questions experimental questions. These 50 won't be counted towards final score and only I will know about these experimental questions (sound familiar!)

With this scoring system all the students can theoretically get a perfect 270+ on final exam. But probability of that happening is next to zero, why? Because in practice/experimental questions if all students get 10/10 or 0/10, then that question is thrown away. But on paper I can still claim that every one can get a perfect score ;-)

Now this was a very simple explanation of how a scoring system like the USMLE's can be designed. In reality it is much more complex.

USMLE, most likely, repeats the same experimental question for 1 year before converting it to a question that counts, so the sample size is huge (10,000+).
Each experimental question then goes through item-analysis. This is where some of the following functions are performed:

Calculate the p-value. In item analysis this is the proportion of examinees who get the question right (the item's difficulty), not a significance-test p-value. On a five-response multiple choice question, optimum difficulty level is 0.50 for maximum discrimination between high and low achievers.

Calculate the point-biserial correlation (PBC). This measures how well students did on the question compared to their overall test score. A highly discriminating question is one where the students who had high test scores got the question correct, whereas students who had low test scores got it incorrect. The goal for a USMLE-like exam is a PBC of 0.4 or more.

Calculate the reliability coefficient. Using a Kuder-Richardson formula, compute the degree to which the questions on the test consistently measure a single cognitive construct (reliability is a property of the whole test rather than of one question). The goal is 0.9 or above for a USMLE-like exam.

Distracter analysis. On a multiple choice question, if A is the correct answer then B, C, D, and E are distracters (wrong answers). Distracters should appeal to low scorers who have not mastered the material, whereas high scorers should select them infrequently.

Distribution skew. In most professional exams, including the USMLE, the score distribution is usually negatively skewed (a long tail toward the low scores, with most scores bunched toward the high end).
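To make the first three item-analysis statistics concrete, here is a rough Python sketch on a tiny made-up response matrix (6 examinees, 4 questions). The data and function names are hypothetical, the point-biserial is the plain textbook version that leaves the item inside the total score, and KR-20 is used as the Kuder-Richardson formula. Nothing here is claimed to be how USMLE actually does it.

```python
from statistics import mean, pstdev

# Hypothetical data: rows are examinees, columns are questions,
# 1 = correct, 0 = incorrect. Made up purely for illustration.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]  # each examinee's total score

def difficulty(item):
    """Item p-value: proportion of examinees who answered correctly."""
    return mean(row[item] for row in responses)

def point_biserial(item):
    """Correlation between the 0/1 item score and the total score.

    Fails if everyone got the item right (or wrong), which matches the
    rule of discarding such questions.
    """
    p = difficulty(item)
    right = [t for row, t in zip(responses, totals) if row[item] == 1]
    wrong = [t for row, t in zip(responses, totals) if row[item] == 0]
    return (mean(right) - mean(wrong)) / pstdev(totals) * (p * (1 - p)) ** 0.5

def kr20():
    """Kuder-Richardson 20 reliability of the whole test."""
    k = len(responses[0])
    pq = sum(difficulty(i) * (1 - difficulty(i)) for i in range(k))
    return k / (k - 1) * (1 - pq / pstdev(totals) ** 2)

for i in range(4):
    print(i, round(difficulty(i), 2), round(point_biserial(i), 2))
print("KR-20:", round(kr20(), 2))
```

With only four questions the reliability comes out well below the 0.9 target; longer tests with well-discriminating questions push it up.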

So, based on item-analysis USMLE can design each and every test question with very well defined boundary conditions. Ever wondered why Kaplan or UW score estimators predict a wide ranging score compared to NBME tests? That's because NBME questions are taken from actual USMLE exams, and all of them have gone through rigorous item-analysis.

OK! that's enough statistics for the day! Back to studying for Step-1!!!!!

There is no need to panic. I don't think there will be any change in calculation method for the USMLE final score. It will still be derived by mapping the raw score to a pre-defined distribution. There may be slight shift in the predefined distribution, but overall anyone who would have scored 185 in 350-exam will still score 185 in 336-exam and those who would have scored 265+ in 350-exam will still score 265+ in 336-exam.

I don't think USMLE picks questions randomly from their question-bank for each administration of the exam. They have pre-defined exams, and for each exam there is a pre-defined mean and standard-deviation. Let's consider a hypothetical example of 2 exams. Each exam has 300 scored questions (assume 36 out of 336 are experimental). Exam-1 has mostly difficult questions, so based on statistical analysis of each question in that exam, the mean for Exam-1 is M=180 and the standard-deviation is SD=15. Exam-2 has easy questions, where M=240 and SD=25.
Now let's pick 2 students A and B.
During exam preparation student A gets 480 (214 3-digit) on all 6 NBMEs and student B gets 680 (253 3-digit) on all 6 NBMEs.

They both take Exam1 one day and Exam2 the next day.
Student A scores 175 on Exam1 and 235 on Exam2.
Student B scores 205 on Exam1 and 285 on Exam2.

Now let's map this to our USMLE distribution with M=218 and SD=22.

Z for Student A in Exam1 = (175-180)/15 = -0.3333
Z for Student A in Exam2 = (235-240)/25 = -0.2

Z for Student B in Exam1 = (205-180)/15 = 1.6667
Z for Student B in Exam2 = (285-240)/25 = 1.8

USMLE score for student A in Exam1 = -0.3333*22+218 = 211
USMLE score for student A in Exam2 = -0.2*22+218 = 214

USMLE score for student B in Exam1 = 1.6667*22+218 = 255
USMLE score for student B in Exam2 = 1.8*22+218 = 258
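The same mapping as a short Python sketch, using only the hypothetical numbers from this example (Exam1 M=180/SD=15, Exam2 M=240/SD=25, reporting scale M=218/SD=22). The function name `rescale` is made up, and it rounds to the nearest whole number, so it can differ by a point from hand calculations that truncate or round z first.

```python
def rescale(raw, form_mean, form_sd, target_mean=218.0, target_sd=22.0):
    """Convert a raw score on one exam form to the common reporting scale."""
    z = (raw - form_mean) / form_sd          # position within that form
    return round(target_mean + target_sd * z)

forms = {"Exam1": (180.0, 15.0), "Exam2": (240.0, 25.0)}
raw_scores = {"A": {"Exam1": 175, "Exam2": 235},
              "B": {"Exam1": 205, "Exam2": 285}}

for student, results in raw_scores.items():
    for form, raw in results.items():
        m, sd = forms[form]
        print(student, form, rescale(raw, m, sd))
```

Each student ends up within a few points of the same reported score on the hard form and the easy form, which is the whole point of the mapping.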

So, relax! You will still get a score close to what your NBME average is/was (as long as you don't panic during the exam!).

Now, for fun, here are 3 equations to compute your percentile, 2-digit score and 3-digit score from NBME score.

Enter your NBME score in spreadsheet cell A1

Enter following 3-digit equation in cell A2
=IF(AND(200<=A1,A1<=800),ROUND((70.2-0.0000002067*A1^3+0.0000937*A1 ^2+0.30096*A1),0),"Error")

Enter following 2-digit equation in cell A3
=IF(AND(100<=A2,A2<=300),IF(A2 > 236, 99, ROUND((75+(A2-185)*24/51),0)),"Error")

Enter following percentile equation in cell A4
=IF(AND(200<=A1,A1<=800),ROUND(NORMSDIST((A1-500)/100)*100,2),"Error")
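If you would rather not use a spreadsheet, here is a rough Python port of the three cells. The cubic coefficients are copied from the 3-digit formula above. The 2-digit step uses the exact straight line through (185, 75) and (236, 99). The percentile step assumes NBME scores are normal with mean 500 and SD 100, as the spreadsheet does. The function names are made up, and Python's `round` treats exact .5 ties differently from a spreadsheet ROUND.

```python
import math

def nbme_to_three_digit(nbme):
    """Cell A2: cubic fit from NBME score (200-800) to 3-digit score."""
    if not 200 <= nbme <= 800:
        raise ValueError("NBME score must be between 200 and 800")
    return round(70.2 - 0.0000002067 * nbme**3
                 + 0.0000937 * nbme**2 + 0.30096 * nbme)

def three_to_two_digit(three_digit):
    """Cell A3: straight line from 185 -> 75 to 236 -> 99, capped at 99."""
    if not 100 <= three_digit <= 300:
        raise ValueError("3-digit score must be between 100 and 300")
    if three_digit > 236:
        return 99
    return round(75 + (three_digit - 185) * 24 / 51)

def nbme_percentile(nbme):
    """Cell A4: percentile assuming NBME scores are normal(500, 100)."""
    if not 200 <= nbme <= 800:
        raise ValueError("NBME score must be between 200 and 800")
    z = (nbme - 500) / 100
    return round(50 * (1 + math.erf(z / math.sqrt(2))), 2)

nbme = 500
three = nbme_to_three_digit(nbme)
print(three, three_to_two_digit(three), nbme_percentile(nbme))
```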

No wizardry here!
Fact 1.
NBME to 3-Digit USMLE data is provided here (by NBME):
https://apps.nbme.org/nsasweb/doc/sample_CBSSA.pdf
Fact 2.
NBME score data (200-800) in the link above has a normal distribution, therefore the percentile can be easily computed.
Fact 3.
2-Digit USMLE score is a straight line from passing 75 (185) to Min. 99 (236).
Fact 4.
Approx. correct answers data can be derived if you have developed OCD with analyzing NBME/USMLE scoring system!!!!
 



:laugh::laugh::laugh: