Why are step 1 and 2 averages rising?


Newyawk
Step 2 CK now has an average of 244 with an SD of 17, up 2 points from just this past autumn. And while Step 1 has been stable over the past few years, it has risen by around 3 points since 2011 or so.

I'm not a statistician by any means. Why does the USMLE allow this? How is this three-digit number calculated? Like other seemingly arbitrary test scores (SAT, MCAT), why isn't the NBME scale anchored to an established mean based on the population's performance? Wouldn't that make it easier for PDs to compare applicants over time?

My score report said our average was 228 and 242 for c/o 2019.
 
It's multifactorial. First, USMLE averages are rising because studying is becoming more efficient. There are lots of companies out there that have spent years optimizing high-yield study resources so you don't have to. It started with the classic First Aid / UWorld combination, then Pathoma, Sketchy, Anki, MedEd, and USMLERx, not to mention countless other high-yield resources. You no longer have to sift through thousands of lecture notes; heck, you don't even have to waste time sitting through lectures anymore. Just watch at 2x speed in your bedroom and use the saved time for extra board prep. You don't have to agonize through multiple 1,400-page mind-numbing textbooks in a library somewhere to wrap your head around a concept that's giving you trouble; you can just search Wikipedia or UpToDate. Studying for boards is magnitudes more efficient than it was even 10 years ago, and it keeps getting more polished with each new resource and revision released every year.

Second, entry barriers to med school continue to rise, meaning the average matriculant is slightly higher performing than in years past. The result of these two trends is continuously rising average board scores in recent years.
 
This doesn't answer my question, though.

The NBME exam is graded on a curve with a seemingly arbitrary three-digit value. Why is the average changing if it's graded on a curve?
 
I could be wrong, but I think the scale is anchored to a fixed score that is considered "failing," i.e. a minimal-competency threshold, rather than to a fixed mean. More people are getting farther above that failing threshold, and the NBME's minimum level of knowledge is being surpassed by a larger number of people. So the central tendency of the scores can drift upward; an average is just an average...
 
I am not sure why Step 2 has a higher average than Step 1. Both exams should be curved to the same average, plus or minus a few points depending on the distribution.
 
The simple fact is that increased competition has narrowed the base of the applicant triangle. A basically fixed number of slots versus an increasing applicant population has kept demand, and therefore competition, high. That predictably drives the numbers up.
 
The USMLE Step exams are not graded on a curve - they're graded on a scale. So if everyone got 90% of questions right, they'd all get 255 or higher (no one knows what the exact scale is). There is also an equating process where you get added points if you have a hard form and deducted points if you have an easy form. I would guess that the NBME sets a target mean score of 230 (with SD of 20) for step 1 but if students outperform then the average score will go up. In which case they'll probably make the exam harder. Average step 1 scores have been pretty consistent since 2014.
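The scale-versus-curve distinction in the post above can be sketched in code. This is a toy model: the linear conversion and the equating adjustment are made-up numbers, since the real NBME conversion is not public.

```python
def scaled_score(percent_correct, form_adjustment=0.0):
    """Map raw percent correct to a hypothetical three-digit scaled score.

    Scale-based scoring: the mapping is fixed ahead of time, so if every
    examinee answered 90% of items correctly, every examinee would get
    the same high score; no one's result depends on anyone else's.

    `form_adjustment` mimics equating: a harder form adds points, an
    easier form subtracts them. All constants here are illustrative
    guesses, not the actual NBME conversion.
    """
    adjusted = percent_correct + form_adjustment
    # Hypothetical linear map: 60% correct -> 196, 90% correct -> 256
    return round(76 + 2.0 * adjusted)

# Everyone who answers 90% correctly gets the same score, regardless of
# how their peers did:
print(scaled_score(90))                        # -> 256
# A slightly lower raw score on a harder form equates to the same result:
print(scaled_score(88, form_adjustment=2.0))   # -> 256
```

The point of the sketch: under a scale, the whole cohort can shift upward together, which is exactly why the mean can rise year over year even though nothing about the scoring rule changed.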
 
You aren’t just graded against students in your testing cohort, or even your testing year; you’re scored against everyone who has ever taken it. So, as someone above said, studying is getting more efficient, and on average we are now scoring higher than those who took the exam years or even decades ago.

As for how individual questions are scored, I would imagine, as someone above posted, that questions are awarded some value based on their level of difficulty. While I can’t begin to speculate on the exact value algorithm, what we do know is that it’s based on the percent of students who get a given question correct. If you look at the NBME's question-writing standards, they track the percent of students getting a question correct over time. Questions get removed once the percent correct rises above a certain value relative to their original difficulty, on the assumption that students are sharing them with each other.
 
Even if true, that scale is curved and that curve is scrutinized annually by our NBME overlords.

Competition is one thing, but it also cannot be denied that the preparation material is getting better. There was no UWorld or Sketchy 10 years ago.
 
The score is very likely based on a formula of percent correct. This explains the slight year-to-year variations in the average and standard deviation. Otherwise, scores could be reported simply as percentiles (which, you could argue, they should be).

The formula likely assigns a value to each question based on how many students got it right. That allows for a kind of curving/consistency across years, since the value can be updated each year. It would also explain the experimental questions: data collection to increase the sample size for future value calculations.
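A toy version of the difficulty-weighting idea speculated about above. The weighting function and all numbers are invented for illustration; nothing here is the NBME's actual formula.

```python
def item_weight(historical_pct_correct):
    # Invented rule: harder items (lower historical percent-correct)
    # carry more weight, ranging from 1.0 (trivial) up to 2.0 (very hard).
    return 1.0 + (100 - historical_pct_correct) / 100

def weighted_raw_score(responses):
    """responses: list of (answered_correctly, historical_pct_correct).

    Returns a weighted percent correct, where missing a hard item costs
    more than missing an easy one.
    """
    total = sum(item_weight(p) for _, p in responses)
    earned = sum(item_weight(p) for correct, p in responses if correct)
    return 100 * earned / total

# Four items; the examinee misses only the hardest one (30% of students
# historically answered it correctly):
exam = [(True, 95), (True, 60), (False, 30), (True, 45)]
print(round(weighted_raw_score(exam), 1))  # -> 70.2, below the raw 75%
```

Because each item's weight can be re-estimated as new response data comes in, a scheme like this would let question values drift over time, consistent with the experimental-question explanation above.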
 
This makes the most sense. Thanks. I still don't see why it's done this way, though. I guess because testing occurs year-round and many different forms exist?
 
Because of multiple forms, unanticipated response to questions (should a question that 95% of students got correct count the same as one that just 3% got correct?) and because of the need for standardization. NBME realizes program directors, etc, probably pay less attention to specific distributions of scores vs "cut-offs," so scores have to be meaningful in terms of a greater spectrum of time than a single year. A 240 has to mean something specific regardless of the year, for example. It has to always be a pretty good, but not amazing, score.

The NBME has to balance two aspects of score: the percentile in which the tester ends up (how smart are they compared to their peers?) vs the total percent correct (objectively, how well did they do on the test?) For a distribution without effective bounds on either end (almost every test-taker gets between 60-90% of questions correct, without many doing either better or worse), this is an effective approximation.


If this exam is not curved, how can they give a percentile?

Scores may be normally distributed, but not assigned based on a normal distribution. You could get a percentile score based on your 3rd grade spelling test (if enough 3rd graders took the same test). Doesn't mean it's curved.
Put another way, a percentile is a statistical description of a score in a distribution, i.e. "20% of students scored above a 90% correct on the exam." A "curve" is a modification of that score to fit a pre-determined distribution, i.e. "let's give the top 20% of performers on this test an A, regardless of what percent they got correct."
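The descriptive-versus-prescriptive distinction drawn above can be made concrete with a small sketch (made-up scores and a hypothetical top-20%-gets-an-A grading rule):

```python
import bisect

def percentile(score, all_scores):
    """Descriptive: percent of the distribution strictly below `score`.
    No one's score is altered; this merely locates it."""
    ranked = sorted(all_scores)
    return 100 * bisect.bisect_left(ranked, score) / len(ranked)

def curved_grades(all_scores, top_fraction=0.2):
    """Prescriptive curve: the top fraction get an 'A' by rank,
    regardless of what raw percent they actually answered correctly."""
    n_top = max(1, int(len(all_scores) * top_fraction))
    cutoff = sorted(all_scores, reverse=True)[n_top - 1]
    return {s: ("A" if s >= cutoff else "B") for s in set(all_scores)}

scores = [218, 225, 229, 231, 233, 240, 245, 247, 252, 260]
print(percentile(245, scores))     # -> 60.0 (a description of the score)
print(curved_grades(scores)[260])  # -> A    (an outcome assigned by rank)
```

So the USMLE can publish percentile tables for its scores without the scores themselves being curved: the percentile is computed after the fact from the observed distribution, not used to assign anyone's result.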
 
This isn't really how it works, per my understanding. Individual questions aren't assigned a value; your overall test is. Each form is blended so that it contains the same number of easy-, medium-, and challenging-difficulty questions. Everyone who takes a given administration of the exam is compared to the expected performance on that exam, as determined by comparing that particular form with historical forms in relative difficulty. This process is called normalization.

This is not curving: the exam scores ARE NOT changed based on your performance relative to others who took the same form. Rather, the NBME evaluates the exam extremely carefully to ensure that it falls within expectations, and it reviews any questions that prove too challenging or too easy to determine whether they should be reclassified by difficulty, which changes the expected number of correct answers needed to achieve a given normalized score.

If students are suspected of sharing answers to questions, their results are invalidated. If a question that was thought to be challenging turns out not to be so due to new study resources, it is reclassified by difficulty, not removed.
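My reading of the reclassification process described above, as a sketch. The difficulty bands and thresholds are invented for illustration; this is not NBME procedure.

```python
# Hypothetical expected percent-correct range for each difficulty band.
DIFFICULTY_BANDS = {
    "easy": (75, 95),
    "medium": (50, 75),
    "hard": (20, 50),
}

def flag_for_review(items):
    """items: list of (item_id, assigned_band, observed_pct_correct).

    Flag items whose observed performance has drifted outside the band
    they were assigned, e.g. because a new prep resource now covers them.
    Flagged items would be reclassified, not removed.
    """
    flagged = []
    for item_id, band, observed in items:
        lo, hi = DIFFICULTY_BANDS[band]
        if not lo <= observed <= hi:
            flagged.append(item_id)
    return flagged

bank = [("q1", "hard", 35), ("q2", "hard", 68), ("q3", "easy", 88)]
print(flag_for_review(bank))  # -> ['q2']: it now performs like a medium item
```

Under this model, reclassifying an item shifts the expected performance on any future form containing it, which is how the target score mapping could stay stable even as study resources improve.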
 
In the end, it's relative. As test scores rise, programs adjust their parameters.
 
Old docs: "Because millennial students nowadays are soft, weak, fluff cakes. They don't grind as hard as us... Wait... Their test score averages are higher now???"

Cricket... cricket... cricket...

The alternative hypothesis is that the next generation of doctors are better test-takers, not necessarily better doctors, than previous generations. But we are what the system forces us to be, with Step 1 at or near the top of the contributing factors on PD surveys for interview invites and candidate ranking.

Check out the blog posts re: USMLE by Bryan Carmody, MD (pediatric nephrologist at EVMS) at:

The Sheriff of Sodium
 