Why are Step 2 scores so much higher than Step 1?

MedScat

Is Step 2 easier than Step 1? Do shelf exams prepare you for Step 2? What's different about the two exams? I just got my Step 1 score, and although I'm happy with it (around average for most regular specialties, not derm or anything like that), I have a terrible fear that I won't be able to keep up the performance on Step 2. But I am genuinely curious why people tend to score higher on Step 2.

You have to compare the percentile, not the scaled score. It’s a different test with a different average and a different number of questions. Most people will score near the same percentile as on Step 1. On average, people do less dedicated board prep for CK than for Step 1, which probably allows some people to overcompensate and achieve above-average score increases.


Scoring on the exams is not directly comparable.

I thought the actual exam was a little harder than Step 1, but when I got my score back I was shocked at how high it was given how I thought I did. It was higher than my Step 1 score, and I took it without studying, on my one day at home during a week in which I went on 3 interviews. I would guess that the scoring is rather more generous on CK.

The content for CK is basically all the shelf exams combined, so you will have seen it all and studied it all before. This definitely makes it possible to just review a little bit before you take it and go for it. Most people I know studied part-time for 1-10 days some time during fourth year and took it.

If you work hard and pay attention on your clinical rotations (including shelf studying), I predict you will be just fine.
 
The average Step 1 score is 230-something and the average Step 2 score is 240-something. Scoring higher on Step 2 does not necessarily mean doing better. Look at your percentile, like other posts mentioned; you can find the percentiles easily by Googling.
 
Step 2 is an easier and lower-quality exam. You can pretty much just wing it and get a decent score, since no one was particularly "trying" to get good scores in previous years or decades.
 
The scales are different for them. Most people will see a bump on Step 2. My Step 2 score was 13 points higher than my Step 1 score but the percentiles were roughly the same, with Step 2 being maybe 3-4 points higher.
 
Also, for reference, the passing score for Step 1 is 194 and the passing score for Step 2 CK is 209.
 
Step 2 has about 20 Qs more (300 vs 280)

The bare minimum test taker has to get ~60-70% to pass, and the high end are scoring near 100%, so let's just ballpark the typical tester as low 80s.

81% of 280 on Step 1 = 226.8 (actual avg is 226)
81% of 300 on Step 2 = 243 (actual avg is 243)


So there you go. The gap in scores can be explained by the additional Qs, if performance is constant.
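
For anyone who wants to check the arithmetic in this post, here is a minimal sketch. It assumes, as the post does, that the reported score is simply the fraction correct times the number of questions; that is the poster's back-of-the-envelope model, not a documented USMLE scoring method. The function name `naive_score` is made up for illustration.

```python
# Sketch of the post's model only: score = fraction correct * number of questions.
# This is an assumption from the post, NOT the actual USMLE scoring procedure.

def naive_score(fraction_correct: float, num_questions: int) -> float:
    """Hypothetical helper: treat the score as the raw number of questions correct."""
    if not 0.0 <= fraction_correct <= 1.0:
        raise ValueError("fraction_correct must be between 0 and 1")
    return fraction_correct * num_questions

# Question counts (280 and 300) and the 81% figure are the poster's ballpark numbers.
step1 = naive_score(0.81, 280)
step2 = naive_score(0.81, 300)
gap = step2 - step1

print(f"Step 1: {step1:.1f}, Step 2: {step2:.1f}, gap: {gap:.1f}")
```

Under this model the gap between the two exams depends only on the difference in question counts, which is exactly the claim being made.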
 
This is entirely wrong. It's based off of percentiles. The NBME has never explained why the scores are actually different, and you're equating percent correct with the overall score.
 
You've got a lot of reading to do. The Step exams are criterion-referenced, not norm-referenced, so they're probably the first major standardized exam you've taken where percentiles are not the basis of the scaled numerical score. Dr. Carmody has a nice blog that details the history and design of the exam if you want to read a few articles.
 
I was under the impression that 20+% of the questions are experimental (i.e. unscored). Don't think that goes against the point of your post, but the numbers
 
Yeah, it actually turns out a full 80 questions on Step 1 are experimental; that was accidentally revealed by the USMLE when they described their plan to drop those questions and give a 200-item test. That explains why nobody can accurately predict how they did at the end of test day, when nearly a third of the items are unscored.

So I'm sure there's a series of small adjustments, like stretching your percent on the 200 validated items to fit the 280 point scale, or slight adjustments to align different forms' difficulty.
 
You've got a lot of reading to do. The step exams are criterion-referenced, not norm-referenced, so they're actually probably the first major standardized exam you've taken where percentiles are not the basis of the scaled numerical. Dr Carmody has a nice blog that details the history and design of the exam if you wanted to read a few articles.

That's why I was trying to push for the Steps to be more like the MCAT in design, but I got shot down because the Steps should be used for competency and nothing else :(
 
I honestly felt better prepared for Step 2. All those shelf exams you take during the year let you prep for it all year long. I jumped >30 points on CK from Step 1.
 
Step 2 has 318 questions


 
What weird logic. This implies that the score you receive is equivalent to the number of questions you get correct.
 
It must be more nuanced than this because test difficulty has to be taken into consideration. Some versions are definitely easier than others.
 
None of that makes sense. It doesn’t account for experimental questions, it doesn’t account for adjustments made based on difficulty, and it assumes 81% is the average score just because the passing score is supposedly 60-70% (where did you even come up with that?). Also, the Step 1 average is 231 and Step 2 has 318 questions.
 
Interesting. I thought it had a theoretical max of >300 in the past but was being given these days as 300, similar to how Step 1 used to allow more than 280 but is now proctored as 280. Didn't realize it was currently 318. What a weird number.

I do think the ultimate scaling is supposed to represent a value that roughly captures your number correct. Look, for example, at how the NBME provided "equated percent correct" scaled scores on the shelves. The ~60-70% pass threshold is from one of their webpages, and I find it way too coincidental that it converts directly over (194/280 = 69% on Step 1, and 209/318 = 66% on Step 2).
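
The two ratios quoted above are easy to verify. This is only a check of the division, using the passing scores and question counts given in this thread; it does not show that the exams are actually scaled this way, and the variable names are made up for illustration.

```python
# Ratio check for the figures quoted in the thread (passing score / question count).
# Whether the real scale works this way is the poster's speculation, not a known fact.
passing = {
    "Step 1": (194, 280),     # (passing score, question count) as given in the thread
    "Step 2 CK": (209, 318),
}

ratios = {exam: score / total for exam, (score, total) in passing.items()}

for exam, ratio in ratios.items():
    print(f"{exam}: {ratio:.0%}")
```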

It's really easy to account for experimental items. Ready, set, go: 150/200 valid items correct. Scaled: 210/280. Both are 75%; one is your real performance and the other is your performance scaled up to fit the number of Qs as if they had all been valid. Then also adjust for form differences and there's your score.
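
A minimal sketch of the scaling step described here, assuming a plain proportional stretch from the scored items to the full question count. The function name `scale_to_full_form` is hypothetical, and the adjustment for form difficulty mentioned in the post is left out because nothing in the thread says how it would work.

```python
# Proportional scaling sketch: an assumption based on the post, not a published method.
# The form-difficulty adjustment mentioned in the post is deliberately NOT modeled.

def scale_to_full_form(correct_valid: int, num_valid: int, num_total: int) -> float:
    """Hypothetical helper: stretch the percent correct on scored items
    to the full number of questions on the form."""
    if num_valid <= 0 or num_total <= 0:
        raise ValueError("item counts must be positive")
    if not 0 <= correct_valid <= num_valid:
        raise ValueError("correct_valid must be between 0 and num_valid")
    return correct_valid / num_valid * num_total

# Numbers from the post: 150 of 200 scored items correct, 280 questions in total.
scaled = scale_to_full_form(150, 200, 280)
print(f"raw: {150 / 200:.0%}, scaled: {scaled:.0f}/280 = {scaled / 280:.0%}")
```

The percent correct is the same before and after the stretch, which is the point being made: the scaling changes the size of the number, not the performance it represents.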

Since it's criterion-referenced, not norm-referenced, there's really no other basis for them to be building their scale. My money says the real meaning of a 250/280 on Step 1 is "based on their performance on this form, we believe this test taker correctly knows the answer for 89% of our valid test item bank."

This is all speculation though! I didn't hack their servers or anything
 
The question number is weird because there are 6 blocks of 40 questions and 2 blocks of 38, and those two blocks have the long research study design questions. So I guess it's actually 316 questions. But yeah, I agree, I think the score is the expected score you would get on the valid test items.
 