How do they grade this beast of a test?


VijayEM

Does anyone have any idea how they grade Step 1? When I took the test, I seriously guessed on MOST of the questions. Honestly... I'm not just saying that to say it. And when I got the score back, I had passed, and did a little better than passing too.

Many of my friends also guessed on the majority of questions and passed. What do they do in the 3 weeks they spend grading this thing? I mean, it's computer-based, so it should be done quickly.

 
VijayEM said:
Does anyone have any idea how they grade Step 1? When I took the test, I seriously guessed on MOST of the questions. Honestly... I'm not just saying that to say it. And when I got the score back, I had passed, and did a little better than passing too.

Many of my friends also guessed on the majority of questions and passed. What do they do in the 3 weeks they spend grading this thing? I mean, it's computer-based, so it should be done quickly.

I wonder about that too. When I took it, there were about 15 questions in each block that I wasn't sure about. I'm still waiting for my score.

I think I read somewhere that they compare your score to the past 4 June administrations of the exam.
 
PhillyGuy said:
I wonder about that too. When I took it, there were about 15 questions in each block that I wasn't sure about. I'm still waiting for my score.

I had way more than 15 in each block, man. I didn't even count... but I still ended up passing.
 
VijayEM said:
Does anyone have any idea how they grade Step 1? When I took the test, I seriously guessed on MOST of the questions. Honestly... I'm not just saying that to say it. And when I got the score back, I had passed, and did a little better than passing too.

Many of my friends also guessed on the majority of questions and passed. What do they do in the 3 weeks they spend grading this thing? I mean, it's computer-based, so it should be done quickly.

They grade it like this:

For EACH question they have a little database that tells them what % of students got it correct.

They then scale the "points" you get towards the final score based on this percentage, so answering a few "hard" questions correctly is better than answering a few "easy" questions correctly.

They then scale your score so that the average student gets about 213-218, and the SD is about 23, I believe.

etc, etc.

The software that does this is obviously very secure and proprietary.
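
A minimal sketch of the scheme as this post describes it, purely for illustration. Nothing here is confirmed as the NBME's actual method. The weight of `1 - fraction correct`, the question IDs, the percentages, and the target mean of 215 are all assumptions; only the SD of 23 and the 213-218 range come from the post.

```python
from statistics import mean, pstdev


def weighted_raw_score(responses, pct_correct):
    """Sum the weights of correctly answered questions.

    Assumed weight: 1 - historical fraction correct, so a question
    that few people get right is worth more than an easy one.
    """
    return sum(1.0 - pct_correct[q] for q, ok in responses.items() if ok)


def scale_scores(raw_scores, target_mean=215.0, target_sd=23.0):
    """Linearly rescale raw scores to the target mean and SD."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    if sigma == 0:
        # Everyone scored the same, so everyone gets the target mean.
        return [target_mean for _ in raw_scores]
    return [target_mean + target_sd * (r - mu) / sigma for r in raw_scores]


# Hypothetical historical fraction correct for four made-up questions.
pct_correct = {"q1": 0.90, "q2": 0.75, "q3": 0.40, "q4": 0.20}

# Three hypothetical test takers, each with two answers correct.
easy_two = {"q1": True, "q2": True, "q3": False, "q4": False}
hard_two = {"q1": False, "q2": False, "q3": True, "q4": True}
mixed = {"q1": True, "q2": False, "q3": True, "q4": False}

raw = [weighted_raw_score(r, pct_correct) for r in (easy_two, hard_two, mixed)]
scaled = scale_scores(raw)
print(raw)
print(scaled)
```

Under these assumptions, all three test takers answer two questions correctly, but the one who gets the two hard questions ends up with the highest scaled score.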
 
So does that mean it matters which day you take the test and who you're taking it with, OR is it compared to all test takers that month/year? And how do you know this?
 
The questions you take have been seen by others many times before (that's why every Step 1 exam has a "trial" block of questions, so the NBME can develop means for the questions before they begin administering them for real). If I had to guess, I'd say each question is compared to the mean as a whole, with a marked emphasis on recent takers. Now what "recent" turns out to be is anybody's guess...
 
LanceArmstrong said:
They grade it like this:

For EACH question they have a little database that tells them what % of students got it correct.

They then scale the "points" you get towards the final score based on this percentage, so answering a few "hard" questions correctly is better than answering a few "easy" questions correctly.

They then scale your score so that the average student gets about 213-218, and the SD is about 23, I believe.

etc, etc.

The software that does this is obviously very secure and proprietary.



LanceA., where did you get this information? Can you please show the source, i.e. a website or publication?

Consider the following scenario:

A newly adopted question with 50/100 people answering correctly
vs. an old question with 15000/20000 people answering correctly.
Now, how do you interpret the "value" of each question?

If the board exam continues to use the newly adopted question, the value of that question might change, i.e. from 50/100 = 50% value to 75/125 = 60%.

Now how would it be fair if someone gets only 50% of the value for a correct answer in 2003, while someone who gets it right in 2006 gets 60% for that same correct answer? Of course, my example illustrates only a small difference in the total score. But I hope you realize that this kind of scoring system is not consistent.
 
Interesting idea, but I think you took it in the wrong direction. The value would be inversely proportional to the number of people who got it correct. In other words, older questions would decrease in value as more people got them right. This would mean reused questions would be less valued in 2006 than in 2003, assuming that more people had the opportunity to see them beforehand and studied the material. This method would actually allow the test to self-correct for question/idea reuse.
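
The two readings above differ only in which direction the question's value moves, which a short worked example makes concrete. This is a sketch under assumptions: `inverse_weight` uses `1 - fraction correct` as one simple decreasing function (not a strict inverse proportion), and neither reading is confirmed as how the exam is actually scored. The counts are the ones from the scenario.

```python
def fraction_correct(n_correct, n_total):
    """Share of test takers who answered the question correctly."""
    return n_correct / n_total


def inverse_weight(n_correct, n_total):
    """Hypothetical value of a question: the share who missed it."""
    return 1.0 - fraction_correct(n_correct, n_total)


# The newly adopted question in 2003 and again in 2006, plus the old question.
p_2003 = fraction_correct(50, 100)        # 0.50
p_2006 = fraction_correct(75, 125)        # 0.60
p_old = fraction_correct(15000, 20000)    # 0.75

# First reading: value goes up with the fraction correct (50% -> 60%).
# Second reading: value goes down as more people get it right.
w_2003 = inverse_weight(50, 100)
w_2006 = inverse_weight(75, 125)

print(p_2003, p_2006, p_old)
print(w_2003, w_2006)
```

With the decreasing weight, the same question is worth less in 2006 than in 2003, which is the self-correcting behavior described above.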

osler said:
LanceA., where did you get this information? Can you please show the source, i.e. a website or publication?

Consider the following scenario:

A newly adopted question with 50/100 people answering correctly
vs. an old question with 15000/20000 people answering correctly.
Now, how do you interpret the "value" of each question?

If the board exam continues to use the newly adopted question, the value of that question might change, i.e. from 50/100 = 50% value to 75/125 = 60%.

Now how would it be fair if someone gets only 50% of the value for a correct answer in 2003, while someone who gets it right in 2006 gets 60% for that same correct answer? Of course, my example illustrates only a small difference in the total score. But I hope you realize that this kind of scoring system is not consistent.
 
Supposedly 60-70% is passing, so if you're "not sure" on 15 questions in each block, you're doing pretty well.
 
AlternateSome1 said:
Interesting idea, but I think you took it in the wrong direction. The value would be inversely proportional to the number of people who got it correct. In other words, older questions would decrease in value as more people got them right. This would mean reused questions would be less valued in 2006 than in 2003, assuming that more people had the opportunity to see them beforehand and studied the material. This method would actually allow the test to self-correct for question/idea reuse.


Very good point!
 
(nicedream) said:
Supposedly 60-70% is passing, so if you're "not sure" on 15 questions in each block, you're doing pretty well.


Maybe 60, but I doubt you have to get anywhere close to 70% to just pass. There is no way I got OVER 60% of the questions correct on my exam; I doubt I even got to the 60% mark.
 
VijayEM said:
Maybe 60, but I doubt you have to get anywhere close to 70% to just pass. There is no way I got OVER 60% of the questions correct on my exam; I doubt I even got to the 60% mark.

Just quoting FA.
 
PhillyGuy said:
I thought FA said that if you get 70%, then you are well above average.

That's what Kaplan says about Qbank. The 2006 FA says that, in the past, answering 60-70% of Step 1 questions correctly has equated to passing.
For Qbank Kaplan says:

>70=high score
65-70=high pass
60-65=pass
55-60=low pass
<55=danger of failure
 
(nicedream) said:
That's what Kaplan says about Qbank. The 2006 FA says that, in the past, answering 60-70% of Step 1 questions correctly has equated to passing.


Of course, that's still a guess too, just like our guess that the NBME grades with a program that looks at how many others got each question right.

I'm more for the latter, since that is the only way to make the test somewhat "fair". How else can you make a test that gives out totally different sets of questions to people "fair"?
 
VijayEM said:
How else can you make a test that gives out totally different sets of questions to people "fair"?

Although I know nothing about how the actual scoring system works, I've wondered if they might just throw out 50 experimental questions and give you the # of remaining questions correct as your 3-digit score.

If you take random samples of 300 questions from the large bank of available questions, the tests should not vary substantially in overall difficulty. It would be like surveying 300 random people -- you usually get a representative slice of the population. Yes, you might score a few points higher or lower depending on the test you get, but the purpose of the test is not to provide a precise way of differentiating between 2 students with slightly different ability. It is to determine if a student is likely to possess what is considered the acceptable minimum, for which a random sampling should be about as effective as any more complicated method.
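
The sampling argument above can be checked with a small simulation. This is only a sketch: the bank size of 3000, the uniform spread of question difficulty between 30% and 90% correct, and the function name are all made-up assumptions, not facts about the real question bank.

```python
import random
from statistics import mean, pstdev


def simulate_form_difficulty(bank_size=3000, form_size=300, n_forms=1000, seed=0):
    """Draw many random test forms and measure how much their average
    difficulty varies from form to form.

    Returns (bank mean, mean of form means, SD of form means), where each
    value is an expected fraction of questions answered correctly.
    """
    rng = random.Random(seed)
    # Hypothetical bank: each entry is a question's historical fraction correct.
    bank = [rng.uniform(0.3, 0.9) for _ in range(bank_size)]
    # Each form is a random sample of questions drawn without replacement.
    form_means = [mean(rng.sample(bank, form_size)) for _ in range(n_forms)]
    return mean(bank), mean(form_means), pstdev(form_means)


bank_mean, avg_form_mean, form_sd = simulate_form_difficulty()
print(f"bank mean difficulty:  {bank_mean:.3f}")
print(f"average form mean:     {avg_form_mean:.3f}")
print(f"SD across forms:       {form_sd:.4f}")
```

Under these assumed numbers, the spread between forms is on the order of one percentage point of expected correct answers, i.e. a few questions out of 300, consistent with the "few points higher or lower" estimate above.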

Is there any official information about how the process is done? I've seen a lot of speculation (this post included) and hearsay-repeated-as-fact (a common SDN theme) around here but nothing authoritative.
 
imo they randomly assign each student a number/score 😀

seriously, this is what they do.
 