Why are Step 2 scores so much higher than Step 1?


MedScat

Full Member
5+ Year Member
Joined
Jul 5, 2017
Messages
543
Reaction score
782

Is Step 2 easier than Step 1? Do shelf exams prepare you for Step 2? What's different about the two exams? I just got my Step 1 score, and although I'm happy with it (around average for most regular specialties, not derm or anything like that), I have a terrible fear that I won't be able to keep up the performance on Step 2. But I am genuinely curious why people tend to score higher on Step 2.
 

bobjonesbob

Full Member
7+ Year Member
Joined
Sep 17, 2015
Messages
188
Reaction score
231
You have to compare the percentile, not the scaled score. It's a different test with a different average and a different number of questions. Most people will score near the same percentile as on Step 1. On average, people do less dedicated board prep for CK than for Step 1, which probably allows some people to overcompensate and achieve above-average score increases.


 
  • Like
Reactions: 1 user

longhaul3

Full Member
7+ Year Member
Joined
Feb 29, 2016
Messages
1,117
Reaction score
2,269
Scoring on the exams is not directly comparable.

I thought the actual exam was a little harder than Step 1, but when I got my score back I was shocked at how high it was given how I thought I did. It was higher than my Step 1 score and I took it without studying on my one day at home during a week I went on 3 interviews. I would guess that the scoring is rather more generous on CK.

The content for CK is basically all the shelf exams combined, so you will have seen it all and studied it all before. This definitely makes it possible to just review a little bit before you take it and go for it. Most people I know studied part-time for 1-10 days some time during fourth year and took it.

If you work hard and pay attention on your clinical rotations (including shelf studying) I predict you will be just fine.
 
  • Like
Reactions: 1 users

frenchyn

Full Member
7+ Year Member
Joined
Dec 10, 2012
Messages
650
Reaction score
788
The average Step 1 score is 230-something and the average Step 2 score is 240-something, so scoring higher on Step 2 does not necessarily mean doing better. Look at your percentile, like other posts mentioned. You can find the percentile tables easily with Google.
 

7331poas

Full Member
7+ Year Member
Joined
Jun 17, 2015
Messages
3,043
Reaction score
4,018
Step 2 is an easier and lower-quality exam. You can pretty much just wing it and get a decent score, since no one was particularly "trying" to get good scores in previous years or decades.
 
  • Like
Reactions: 1 user

sharkbyte

Take me to the top
7+ Year Member
Joined
Dec 27, 2013
Messages
1,323
Reaction score
1,565
The scales are different for them. Most people will see a bump on Step 2. My Step 2 score was 13 points higher than my Step 1 score but the percentiles were roughly the same, with Step 2 being maybe 3-4 points higher.
 

akinetopsia

some dude
15+ Year Member
Joined
Feb 17, 2008
Messages
547
Reaction score
268
Also, for reference, the passing score for Step 1 is 194 and the passing score for Step 2 CK is 209.
 

efle

not an elf
7+ Year Member
Joined
Apr 6, 2014
Messages
14,141
Reaction score
22,743
Step 2 has about 20 Qs more (300 vs 280)

The bare minimum test taker has to get ~60-70% to pass, and the high end are scoring near 100%, so let's just ballpark the typical tester as low 80s.

81% of 280 on Step 1= 226.8 (actual avg is 226)
81% of 300 on Step 2= 243 (actual avg is 243)


So there you go. The gap in scores can be explained by the additional Qs, if performance is constant.
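
To make the arithmetic above concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a scaled score is just the fraction correct multiplied by the top of the scale; that is this post's ballpark model, not a published scoring method, and the function name `ballpark_score` is made up for the example.

```python
def ballpark_score(fraction_correct, scale_max):
    """Hypothetical model: score = fraction correct x top of the scale."""
    return round(fraction_correct * scale_max, 1)

# Same assumed performance (81%) on both exams, different scale sizes.
step1 = ballpark_score(0.81, 280)  # Step 1, assumed 280-point scale
step2 = ballpark_score(0.81, 300)  # Step 2, assumed 300-point scale

print(step1, step2)
print("gap:", round(step2 - step1, 1))
```

Under this assumption the whole gap comes from the scale size, which is exactly the claim being made. It says nothing about whether the assumption holds.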
 
  • Dislike
Reactions: 1 user

Sriddymopboi

Full Member
2+ Year Member
Joined
May 2, 2020
Messages
60
Reaction score
75
This is entirely wrong. It's based off of percentiles. NBME has never explained why the scores are actually different and you're equating percent correct to the overall score.
 
  • Like
Reactions: 1 users

efle

not an elf
7+ Year Member
Joined
Apr 6, 2014
Messages
14,141
Reaction score
22,743
You've got a lot of reading to do. The step exams are criterion-referenced, not norm-referenced, so they're actually probably the first major standardized exam you've taken where percentiles are not the basis of the scaled numerical. Dr Carmody has a nice blog that details the history and design of the exam if you wanted to read a few articles.
 
  • Haha
Reactions: 1 user

kb1900

Full Member
7+ Year Member
Joined
Oct 4, 2015
Messages
1,516
Reaction score
3,096
I was under the impression that 20+% of the questions are experimental (i.e. unscored). Don't think that goes against the point of your post, but the numbers
 
  • Like
Reactions: 1 user

efle

not an elf
7+ Year Member
Joined
Apr 6, 2014
Messages
14,141
Reaction score
22,743
Yeah, it actually turns out a full 80 questions on Step 1 are experimental; that was accidentally revealed by the USMLE when they described their plan to drop those questions and give a 200-item test. That explains why nobody can accurately predict how they did at the end of test day, when nearly a third of the items are unscored.

So I'm sure there's a series of small adjustments, like stretching your percent on the 200 validated items to fit the 280 point scale, or slight adjustments to align different forms' difficulty.
 
  • Like
Reactions: 1 users

Lawpy

42% Full Member
7+ Year Member
SDN Ambassador
Joined
Jun 17, 2014
Messages
62,565
Reaction score
154,439
You've got a lot of reading to do. The step exams are criterion-referenced, not norm-referenced, so they're actually probably the first major standardized exam you've taken where percentiles are not the basis of the scaled numerical. Dr Carmody has a nice blog that details the history and design of the exam if you wanted to read a few articles.

That's why I was trying to push Steps to be more like the MCAT in design, but I got shot down because Steps should be used for competency and nothing else :(
 

ciestar

All grown up!
7+ Year Member
Joined
Sep 18, 2013
Messages
8,164
Reaction score
11,596
I honestly felt better prepared for step 2. All those shelf exams you take all year allow you to prep well for the entire year. I jumped >30 points on CK from step 1
 
  • Like
Reactions: 1 user

bobjonesbob

Full Member
7+ Year Member
Joined
Sep 17, 2015
Messages
188
Reaction score
231
Step 2 has 318 questions


 

7331poas

Full Member
7+ Year Member
Joined
Jun 17, 2015
Messages
3,043
Reaction score
4,018
What weird logic. This implies that the score you receive is equivalent to the number of questions you get correct.
 
  • Like
Reactions: 1 users

MavFab

Full Member
5+ Year Member
Joined
Oct 17, 2016
Messages
97
Reaction score
144
It must be more nuanced than this because test difficulty has to be taken into consideration. Some versions are definitely easier than others.
 
  • Like
Reactions: 1 user

redsox93

Full Member
5+ Year Member
Joined
Aug 23, 2016
Messages
1,490
Reaction score
3,286
None of that makes sense. It doesn't account for experimental questions, it doesn't account for adjustments made for difficulty, and it assumes 81% is the average score just because the passing score is supposedly 60-70% (where did you even come up with that?). Also, the Step 1 average is 231 and Step 2 has 318 questions.
 
  • Like
Reactions: 1 user

efle

not an elf
7+ Year Member
Joined
Apr 6, 2014
Messages
14,141
Reaction score
22,743
Interesting. I thought it had a theoretical max of >300 in the past but was being given these days as 300, similar to how Step 1 used to allow more than 280 but is now proctored as 280. Didn't realize it was currently 318. What a weird number.

I do think the ultimate scaling is supposed to represent a value that captures your number correct, roughly. Look for example at how the NBME provided "equated percent correct" scaled scores on the shelves. The ~60-70% pass threshold is from one of their webpages, and I find it way too coincidental that it converts directly over (194/280=69% on Step 1, and 209/318 = 66% on step 2).

It's really easy to account for experimental items. Ready set go: 150/200 valid correct. Scaled: 210/280. Both 75%, one is your real performance and the other is your performance scaled up to fit the number of Qs if they had all been valid. Then also adjust for form differences and there's your score.

Since it's criterion referenced, not norm referenced, there's really no other basis for them to be building their scale. My money says the real meaning of a 250/280 on Step 1 is "based on their performance on this form, we believe this test taker correctly knows the answer for 89% of our valid test item bank"

This is all speculation though! I didn't hack their servers or anything
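
For anyone who wants to check the numbers in this post, here is a short Python sketch of the speculated model. The helper names are invented for illustration, and the idea that scores are a percent correct stretched onto the scale is the assumption being tested, not a confirmed scoring rule.

```python
def scale_to_full_form(valid_correct, valid_items, scale_max):
    """Stretch the fraction correct on scored items onto the full scale (assumed model)."""
    return valid_correct / valid_items * scale_max

def percent_of_scale(score, scale_max):
    """Express a scaled score as a whole-number percent of the scale maximum."""
    return round(100 * score / scale_max)

# The worked example: 150 of 200 scored items, mapped onto a 280-point scale.
scaled = scale_to_full_form(150, 200, 280)

# Passing scores as a percent of each assumed scale maximum.
step1_pass_pct = percent_of_scale(194, 280)
step2_pass_pct = percent_of_scale(209, 318)

# The 250/280 example from the last paragraph.
high_score_pct = percent_of_scale(250, 280)

print(scaled, step1_pass_pct, step2_pass_pct, high_score_pct)
```

Both passing scores land inside the quoted 60-70% band under this model, which is the coincidence being pointed out; it is consistent with the guess but does not prove it.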
 

redsox93

Full Member
5+ Year Member
Joined
Aug 23, 2016
Messages
1,490
Reaction score
3,286
The question number is weird because there are 6 blocks of 40 questions and 2 blocks of 38, and those blocks have long research study design questions. So I guess it's actually 316 questions. But yeah, I agree. I think it is the expected score you would get on the valid test items.
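
The block count is quick to verify. A tiny Python check, using only the block sizes stated in this post:

```python
# Block sizes as described above: six blocks of 40 and two blocks of 38.
blocks = [40] * 6 + [38] * 2

total_questions = sum(blocks)
print(len(blocks), "blocks,", total_questions, "questions")
```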
 
  • Like
Reactions: 1 user