> One personal statement about a rotting possum corpse dropped someone to the bottom.
Wait, what?????
> Wait, what?????
And you thought it was only my school!
> Wait, what?????
Someone gave a very vivid description of a possum decaying in the sun on a hot day. Thankfully, we didn't end up recruiting Dexter Morgan.
> Someone gave a very vivid description of a possum decaying in the sun on a hot day. Thankfully, we didn't end up recruiting Dexter Morgan.
Come on - you know Dexter would be too smart to put all that on paper 😉
Yes, it's the new normal. Step 2 is the new Step 1.
It was the predictable outcome of making Step 1 P/F.
Now there is only one bite at the apple and it comes too late to change course if there is an unfortunate score.
> A better scenario would be to make all licensing tests pass-fail, then create a "new" test purely for residency stratification purposes that allows for retakes. I put "new" in quotes because, practically speaking, this would just be a Step 2/3 rip-off.
From a purely psychometric standpoint, it would be reasonable to create a new exam that is designed to stratify. The current Step exams are designed to yield a yes/no result around a central question of minimal competency, which is a different goal.
Of course, this would require someone having to build and administer this new exam, which would be expensive, time-consuming, and add testing and financial burdens to medical students. The question of whether or not the resulting stratification is actually meaningful would likely persist, as well.
> They provide a score and a percentile, though. I know their original intent was to be criterion-based, but the scoring definitely allows for stratification at this time.
> I think the natural answer to this would be the NBME with its decades of experience in building these tests.
> The cynic in me says that the NBME just makes a "new" test which is just a Frankenstein of Step 2+3 to rake in more money.
> The hopeful in me says that the NBME keeps Step 2 scored, allows for retakes, then the individual state licensing boards just deal with the fact that some people will have multiple Step 2 scores.
State boards don't care about the number of Step scores as long as none are failures. Even then, most allow more than 1 fail per step before they care.
Here's what my state has to say about it:
> For the United States Medical Licensing Examination or the Comprehensive Osteopathic Medical Licensing Examination, or the Medical Council of Canada Qualifying Examination, the applicant shall pass all steps within ten years of passing the first taken step. The results of the first three takings of each step examination must be considered by the board. The board may consider the results from a fourth taking of any step; however, the applicant has the burden of presenting special and compelling circumstances why a result from a fourth taking should be considered.
So basically if you haven't passed a Step exam by the 4th try you can't get licensed. But if you take it 6 times to try and get the highest score possible, the state doesn't care as long as you passed one of the first 4 times.
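For the curious, that reading of the rule is easy to encode. A toy sketch in Python: `attempt_results` is a hypothetical list of pass/fail booleans per taking, and the ten-year window and the scores themselves are ignored, so this is an illustration of the logic, not legal advice.

```python
def licensable(attempt_results, board_allows_fourth=False):
    """Toy reading of the rule quoted above: the board must consider the
    first three takings, and may consider a fourth only if the applicant
    shows special and compelling circumstances."""
    considered = 4 if board_allows_fourth else 3
    return any(attempt_results[:considered])

print(licensable([False, False, True]))         # True: passed on the 3rd try
print(licensable([False, False, False, True]))  # False by default
print(licensable([False, False, False, True], board_allows_fourth=True))  # True
```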
> They provide a score and a percentile, though. I know their original intent was to be criterion-based, but the scoring definitely allows for stratification at this time.
This statement sort of gets at the original point of this thread: the fact that scores and percentiles are generated as a byproduct of a pass/fail exam does not make them meaningful.
> Right, but I think they haven't cared because the NBME makes you stop once you pass.
> In my mind "Fail -> Pass" (or even "Fail -> Fail -> Pass") is much more straightforward to interpret from a licensing point of view than "Pass -> Fail" (or "Pass -> Fail -> Pass"). With the former you can say "this person had some learning deficiencies, which they shored up and are now proficient to practice medicine"; with the latter you'd have to say something like "this person achieved proficiency, then... lost proficiency, but they're still good to go anyways."
> Like I said, maybe I'm overthinking these rare edge cases.
Is that a new thing, because that definitely wasn't the case when I was in med school?
> Is that a new thing, because that definitely wasn't the case when I was in med school?
Yeah, you can't retake a Step exam after you've passed it.
> From the USMLE FAQs:
> "If you pass a Step, you are not allowed to retake it, except to comply with certain state board requirements which have been previously approved by USMLE governance."
> As to how recent it is--I'm not sure, but this was definitely the policy 10+ years ago when I took them.
Must have been right after me then, I'm 15 years out from Step 1 and we could take it again if we wanted (basically no one did though).
> I don't think Step 2 could seriously have been considered a "second bite at the apple", especially at the height of Step 1 madness, especially for competitive specialties.
> If there's going to be a high-stakes test for residency stratification purposes, I would much rather have that be something similar to Step 2 and 3 than Step 1; so, although I don't think this was the intention of the Step 1 change, I do agree with the result.
> A better scenario would be to make all licensing tests pass-fail, then create a "new" test purely for residency stratification purposes that allows for retakes. I put "new" in quotes because, practically speaking, this would just be a Step 2/3 rip-off. Alternatively, we just keep using Step 2, allow for retakes, and let state boards figure out what to do in the rare case where someone gets a passing score on Step 2 followed by a retake where they fail. Perhaps this would be as easy as saying "your most recent Step 2 score must be in the passing range".
Whatever the problem is with the current USMLE system, the answer cannot possibly be "make another high stakes test."
> It's tough to imagine, but if Step 1 remains P/F it might be worth it for those aiming for competitive specialties. Imagine spending thousands of dollars setting up away rotations before taking Step 2, and getting a very low score. People who don't match into competitive specialties often forfeit thousands of dollars taking research years to improve their applications.
Yeah but presumably you'd spend thousands of dollars taking this specialty-specific exam. I can't imagine the specialty-specific exam would be any sooner than current Step 2 timeframe, so it's not like you could make useful decisions based on that information. And there just aren't enough months before ERAS opens to allow for yet another dedicated study period, plus sub-I, plus away rotations.
For all of the above reasons I think it is a net negative that Step 1 is now P/F, but now that we are here I think applicants who choose to shoot for a competitive specialty just have to embrace a certain level of risk. If risk makes you uncomfortable, then pick a different specialty. Again--not that I am saying this is by any means FAIR, but I'm not sure there is a good alternative.
> I took Step 1 summer of 2008. At that time you couldn't retake it if you passed. I also don't recall any kerfuffle about a policy change, and I feel like that's something that would have been a hot topic of conversation and much complaining from various students. So I suspect the policy has been around since at least 2006.
> The MCAT you could take a million times.
Basically, you're old @VA Hopeful Dr 🤣
> Basically, you're old @VA Hopeful Dr 🤣
Now now, I'm saying I'm equally as old. 😂 2008 puts me at 15 years out too! Not trying to throw shade at my fellow "seasoned" docs. 😂
> Now now, I'm saying I'm equally as old. 😂 2008 puts me at 15 years out too! Not trying to throw shade at my fellow "seasoned" docs. 😂
Lol, I can't say much--I think mine was 2011 🤣 But by then there was definitely no ambiguity, you only got one shot, and I actually was unaware that at one point you could have retaken the exam if you were a masochist.
> I took Step 1 summer of 2008. At that time you couldn't retake it if you passed. I also don't recall any kerfuffle about a policy change, and I feel like that's something that would have been a hot topic of conversation and much complaining from various students. So I suspect the policy has been around since at least 2006.
I found a thread here from 2008 saying you couldn't retake it, so I'm guessing I heard wrong at the time.
Yeah but you did like eleventy billion PGY years so in doctor years I'm older.Now now, I’m saying I’m equally as old. 😂 2008 puts me at 15 years out too! Not trying to throw shade at my fellow “seasoned” docs. 😂
> Is the issue that the questions are not good enough to accurately assess knowledge, or is there an issue with the scoring method itself that inherently increases the error in a score? I admit I don't know enough about standardized testing methodology--I just assumed that the NBME practices were the best we could do given their long history.
Nor am I a psychometrician, so take this with a grain of salt. But if you're building an exam to assess a minimum level of knowledge, then the basic question for each exam item is "will a minimally competent test-taker get this right?" The exam only needs to be long enough to provide statistical heft to that analysis, and the passing threshold set to minimize false positives.
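To put some numbers behind the "long enough to provide statistical heft" point, here's a rough simulation with invented figures: a 60% cut and two takers whose true accuracy straddles it. Nothing here reflects the real exam's parameters; it only shows how test length affects classification near a cut score.

```python
import random
random.seed(0)

def pass_prob(true_p, n_items, cut=0.60, trials=10_000):
    """Estimated chance that a taker who answers each item correctly with
    probability true_p clears a percent-correct cut score."""
    passes = 0
    for _ in range(trials):
        correct = sum(random.random() < true_p for _ in range(n_items))
        if correct / n_items >= cut:
            passes += 1
    return passes / trials

# Barely competent (0.65) vs. barely not (0.55), at increasing test lengths:
for n in (50, 100, 300):
    print(n, pass_prob(0.65, n), pass_prob(0.55, n))
```

At 50 items the two takers' pass rates overlap badly; by 300 items the test passes the first and fails the second the vast majority of the time, which is the "statistical heft" being described.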
> My point was not whether a 250 vs 260 is statistically different, I'm sure it is. My point was that statistically different might mean absolutely nothing if we were to find out the true value. If I were a PD I would not care whether someone got 5 more questions right on a 300-question exam, regardless of its statistical significance, because the real-world meaning of that is close to 0. When I go to practice clinical medicine I would use that same principle and not choose a diabetes drug that reduces A1c by an additional 0.001% when that clearly has no real effect vs other factors of the drug. When we report scores as they are now with percentiles, we are encouraging PDs to draw conclusions about the score that might not be true. Why not report an equated percent correct without percentiles and let them draw their own conclusions? If a PD sees 85% vs 82% and doesn't really care for the difference, doesn't that mean it's silly to show 260 vs 250 and encourage them to conclude the 260 is a superior student because they are 26 percentile places higher?
> We need to start showing how bunched up students are in terms of raw performance, because 26 percentile places might be very few questions. It's like they're hiding information that would make people take the test less seriously. I am all for standardized testing, but if everyone starts doing well you can't game the system by making some people look bad for being in the 5th percentile even if they are barely doing worse in terms of raw questions correct than someone in the 25th percentile.
> Reveal the raw data and let people draw conclusions for themselves. A score should look like:
> 260 - 80th percentile - In 2022, students who scored a 260 got an average of 270/318 questions correct across all forms
> 250 - 54th percentile - In 2022, students who scored a 250 got an average of 255/318 questions correct across all forms
> With this data you at least give program directors the chance to say, "Hey, I don't really care about an X-question difference." Right now they don't have a choice but to blindly trust that higher is better without knowing the true difference. (I made the raw numbers up as an example.)
Even if what you are saying is correct and score differences are meaningless, how would you concretely suggest PDs stratify a bunch of applicants whose applications are otherwise also very similar? Because that's what it always comes back to for me—complaining about the system doesn't help if you don't have a realistic suggestion for something better.
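The quoted "bunching" claim is easy to sanity-check with made-up numbers of the same kind the poster used. The normal score distribution and the linear scaled-to-raw mapping below are both inventions for illustration:

```python
from statistics import NormalDist

scores = NormalDist(mu=245, sigma=15)  # hypothetical scaled-score distribution

def raw(scaled):
    # toy mapping: ~1.5 raw questions per scaled point, mid-range of 318 items
    return 255 + (scaled - 250) * 1.5

for s in (250, 260):
    print(f"{s}: percentile {scores.cdf(s) * 100:.0f}, raw ~{raw(s):.0f}/318")
```

Under these assumptions a roughly 21-percentile gap corresponds to about 15 questions, which is exactly the poster's point: percentile reporting can make a modest raw difference look dramatic near the middle of the distribution.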
> Regarding psychometrics as described by @Med Ed, although in general that's true and often argued by the USMLE as a reason not to use scores, it's also not really applicable because the USMLE isn't a test designed to assess minimum knowledge.
The USMLE is designed to give medical licensing boards a binary yes/no answer regarding an individual's possession of a minimum level of medical knowledge. It is expressly made for that purpose. All other uses of the score are secondary.
> Not to nitpick, but the standard deviation doesn't tell you whether a score of 250 is different from 260. That's approximated by the standard error of measurement, which is much smaller (about 9, I think). And even that doesn't say that scores within 9 points are "indistinguishable" -- unless you want to make that statement with 66+% certainty.
This is what I get for posting while sleep deprived!
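The 66+% remark checks out under a normal-error model, taking the quoted SEM of ~9 at face value; this is a back-of-the-envelope check, not an official figure:

```python
from statistics import NormalDist

SEM = 9  # approximate standard error of measurement, per the quote above
z = NormalDist()

# Half-width of a confidence band around a single observed score:
for conf in (0.66, 0.95):
    half = z.inv_cdf(0.5 + conf / 2) * SEM
    print(f"{conf:.0%} band: +/- {half:.1f} points")

# Comparing two scores goes through the error of their *difference*:
print(f"SE of a difference: {SEM * 2 ** 0.5:.1f} points")
```

A +/- 1 SEM band is only about a two-thirds confidence statement; a 95% band is nearly +/- 18 points, and the error on the difference between two scores is larger still (~12.7 points).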
> The USMLE is designed to give medical licensing boards a binary yes/no answer regarding an individual's possession of a minimum level of medical knowledge. It is expressly made for that purpose. All other uses of the score are secondary.
This is a USMLE talking point. They say it over and over. It's simply not true. As I mentioned before, if they want to design a test that really tests minimal knowledge, they should do so by building one that has a minimum pass around 85% of the questions correct, and the most common score would be 100%.
Put another way: yes, the USMLE uses the test to determine minimum competence. Using it to assess general medical knowledge is a secondary use. But doing so is completely statistically valid. Whether it reflects anything other than the ability to pick the right answer on an MCQ test is an open question. But the USMLE stating that programs shouldn't use it because it wasn't designed for that is silly. At least from a psychometric viewpoint.
> I 100% agree with you!! There is a reason why they made Step 2 much more difficult when Step 1 became P/F. I think expanding ethics to 15% of the test was a play at adding a CARS-like section that can't be studied for as easily, so to speak...
I wasn't aware of that. Was there a sharp drop in Step 2 scores after Step 1 became pass/fail?
> I 100% agree with you!! There is a reason why they made Step 2 much more difficult when Step 1 became P/F.
Where is the data on this?
> This is a USMLE talking point. They say it over and over. It's simply not true. As I mentioned before, if they want to design a test that really tests minimal knowledge, they should do so by building one that has a minimum pass around 85% of the questions correct, and the most common score would be 100%.
What would be the unintended consequences to such an approach?
> Put another way: yes, the USMLE uses the test to determine minimum competence. Using it to assess general medical knowledge is a secondary use. But doing so is completely statistically valid. Whether it reflects anything other than the ability to pick the right answer on an MCQ test is an open question. But the USMLE stating that programs shouldn't use it because it wasn't designed for that is silly. At least from a psychometric viewpoint.
What statistically valid secondary use is the NBME saying you should avoid?
> Where is the data on this?
Good question. The pass rate on Step 2 has actually crept up to 99% for first-time MD takers. For DO first-time takers it's 97%.
> My suggestion is to include the raw data and let PDs decide for themselves how to interpret it. There's nothing inherently wrong with the exam or the fact that it is used to stratify. I'm not suggesting we make it P/F. My problem is with how scores are reported in a way that encourages conclusions that might not be true. Is there any good reason not to reveal the raw data?
The issue is that there is something inherently wrong with the exam when used for stratification. The test-to-test variability is incredibly high, especially when compared to aptitude tests like the SAT or MCAT. You can take 10 predictive practice tests and still have a predicted score range of ~30 points (e.g., 250 +/- 15). It's meant to maximize accurate prediction right around the passing score, 209. So when most people are scoring in the 240 range on average, it's already nearing the ceiling of the exam.
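One back-of-the-envelope note on the ~30-point practice-test spread: if the per-sitting error really is around 9 points (the SEM figure floated earlier in the thread), a spread of roughly that size across 10 sittings is what retest noise alone would produce, even for a perfectly stable test-taker. A rough simulation under that assumption:

```python
import random
random.seed(1)

SEM = 9           # per-sitting measurement error, per the earlier estimate
TRUE_SCORE = 250  # hypothetical fixed ability

# Typical max-minus-min spread across 10 simulated sittings:
ranges = []
for _ in range(10_000):
    sims = [random.gauss(TRUE_SCORE, SEM) for _ in range(10)]
    ranges.append(max(sims) - min(sims))
ranges.sort()
print(f"median spread over 10 sittings: ~{ranges[len(ranges) // 2]:.0f} points")
```

That lands around 27-28 points, so a 250 +/- 15 range across ten practice tests is consistent with ordinary measurement error rather than proof that the exam is uniquely noisy.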
> I 100% agree with you!! There is a reason why they made Step 2 much more difficult when Step 1 became P/F.
Is there any proof of this, or is Step 2 just harder for people who took Step 1 P/F? As someone who took Step 1 scored but is now going through rotations with students who took it P/F, I've noticed wildly different study habits in this group of students. I'm not saying that's a bad thing either, because Step exams always emphasized the wrong thing (i.e., obscure details over concepts). However, students I work with now are way, way less focused on minutiae and generally operate at a lower level of content mastery. Again, not saying that's a bad thing. This profession has needed to shift away from knowledge and towards interpersonal skills, leadership, and business/management for at least 20 years.
> What would be the unintended consequences to such an approach?
I assume you're asking: what would the unintended consequences be if they changed the USMLE to have a raw score of 85% to pass? It would be similar to reporting a pass/fail score only. Fail would remain a negative as it is today. Pass would be uninterpretable other than knowing that you passed. Since most people would get 95-100% of the questions correct, there would be absolutely no discrimination at that level of performance. There would be a slight difference from just P/F, as those scoring 85-95% would likely be considered differently than those scoring >=95%. Perhaps students wouldn't bother studying very much for the exam - similar to concerns raised about S1 being P/F. Is that what you're getting at?

> What statistically valid secondary use is the NBME saying you should avoid?
Again, not sure what you're asking. I'm saying that a USMLE score of 250 shows that you "know more as assessed on an MCQ test" than people with a 240, and those more than those with a 230. The NBME seems to think that I should just treat anyone with a score higher than passing the same? This makes no sense to me at all. Again, I completely agree that a higher score on the USMLE doesn't necessarily predict that someone will be a better doctor/resident. But to state that it doesn't represent anything seems incorrect.

> The test-to-test variability is incredibly high, especially when compared to aptitude tests like the SAT or MCAT.
Based on what? How certain are we that the SAT doesn't have ranges like this? And the MCAT has a smaller range because the overall score range is smaller. We can fix that with the USMLE if we want -- simply divide the score by 10 and report that. Round it to a whole number if you wish. Now, scores will range from 16-28, pass will be a 20, and inter-test variability will be 1.5. Does that make it better?

> You can take 10 predictive practice tests and still have a predicted score range of ~30 points (e.g., 250 +/- 15).
Who says that these predictive practice tests are actually reflective of the test? Honestly, I think this is the biggest scam of all. The NBME should not be in the business of selling practice exams for its own high-stakes exam. This is all sorts of wrong.
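The divide-by-10 point can be made concrete: rescaling the scores and their error together leaves the signal-to-noise ratio untouched. A two-line check, where the SEM of 6 is the value cited elsewhere in the thread and "distinguishability" is just the gap measured in SEs of the difference:

```python
from math import sqrt

SEM = 6          # assumed standard error of measurement on the 3-digit scale
a, b = 240, 250

z_orig = (b - a) / (SEM * sqrt(2))                    # 3-digit scale
z_div10 = (b / 10 - a / 10) / ((SEM / 10) * sqrt(2))  # divided-by-10 scale
print(z_orig, z_div10)  # identical up to rounding: rescaling changes nothing
```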
> I'm not certain I understand what you're asking.
Do you agree that the group of students entering this year's match cycle that have high numeric Step 1 scores (such as those who may have delayed a year for research or other reasons) will have an advantage over those students with Step 1 scores of "PASS"?
> I'm not certain I understand what you're asking.
It's always baffled me that people actually pretend students with higher board scores don't know more than students with lower scores.
> It's always baffled me that people actually pretend students with higher board scores don't know more than students with lower scores.
That's not really what they are trying to say. The problem with comparing low and high board scores is that:
1) There is a range of error so that a 240 vs. 250 are not that different from each other due to the ranges of error overlapping, yet this range represents the 35-60 percentile and most residency PDs will absolutely treat those two scores differently.
2) Certain school curricula have inherent advantages (6-week dedicated period, board study rotations, etc.) versus schools where students only get a 2-week dedicated period.
3) Boards do not test other skills, such as history taking, communication skills, teamwork, etc. All of these are important for patient outcomes.
4) Boards are broad, not deep. Amassing the large amount of knowledge needed to do well on Step doesn't necessarily translate to having critical thinking and problem-solving skills. A friend of mine was the top bioengineering student in his undergraduate class and got a 521 on the MCAT, yet his Step scores are average because he doesn't do well with memorizing every little detail. But he wiped the floor with me on rotations because he is BRILLIANT.
I do agree that boards are more than just a score, and serve as a proxy for work ethic and ability to think and reason and learn information quickly. All of these are important skills, so I am not saying that boards are worthless. But I still strongly believe that there are a lot of issues with how much of a role they play in residency selection.
> Do you agree that the group of students entering this year's match cycle that have high numeric Step 1 scores (such as those who may have delayed a year for research or other reasons) will have an advantage over those students with Step 1 scores of "PASS"?
No. I don't think it will matter much, really. Programs that are focused on USMLE scores will just use Step 2.
Also, there's this theme here on SDN that somehow USMLE scores are the key factor in evaluating applicants. I doubt this is true for most programs. Some fields it likely will have a bigger impact. I expect that for most programs, they may have a score below which they don't invite people, a borderline range where they look at the rest of the application, and a high enough score where the USMLE is no longer a disqualifying feature and the decision to invite is based upon the rest of the application.
> 1) There is a range of error so that a 240 vs. 250 are not that different from each other due to the ranges of error overlapping, yet this range represents the 35-60 percentile and most residency PDs will absolutely treat those two scores differently.
It is true that the standard error of measurement of the USMLE is around 6, so a 240 and a 250 "overlap" if you +/- the SE. But on average, the person getting the 250 has a better performance than the person getting the 240. Although it's possible that their actual performance is equal and the person with the 250 just had a "good day" and the 240 had a "bad day", it's more likely that the 250 represents a better performance. Programs are willing to accept much less than a 95% certainty.

> 2) Certain school curricula have inherent advantages (6-week dedicated period, board study rotations, etc.) versus schools where students only get a 2-week dedicated period.
Certainly true, but there's 2-3 years to study for these exams. Theoretically, you're learning all the material all along.

> 3) Boards do not test other skills, such as history taking, communication skills, teamwork, etc. All of these are important for patient outcomes.
Agreed. Presumably that's what clinical grades / performance are supposed to measure.

> 4) Boards are broad, not deep. Amassing the large amount of knowledge needed to do well on Step doesn't necessarily translate to having critical thinking and problem-solving skills.
Which is why USMLE should be part of application review.

> I do agree that boards are more than just a score, and serve as a proxy for work ethic and ability to think and reason and learn information quickly.
I think their importance is overstated here. Sure, if you get a 203 on S2, your chances of getting ortho are minimal. But for most applicants to most fields, a decent score is all you need. The Step score insanity is driven mostly by student neuroticism, not reality.
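For point 1, "much less than a 95% certainty" can be pinned down under the same normal-error assumption used above. This is a back-of-the-envelope figure, not an official one:

```python
from math import sqrt
from statistics import NormalDist

SEM = 6  # standard error of measurement cited in the reply above
# Probability the observed 250 reflects genuinely better performance than
# the observed 240, assuming normal measurement error and nothing else
# known about the two test-takers:
p = NormalDist().cdf((250 - 240) / (SEM * sqrt(2)))
print(f"~{p:.0%}")
```

That works out to roughly 88%: the 250 probably is the better performance, but well short of a 95% bar, which is exactly the trade-off programs are described as accepting.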
> That's not really what they are trying to say. The problem with comparing low and high board scores is that: ...
So what? Cutoffs exist because it's too much work to sift through apps otherwise. That doesn't change unless you limit the amount of apps. And as I discussed previously, there's very little difference in the majority of apps besides scores.