This may be a lazy observation, but my assumption is that the people who make these tests have decided that a 93-96% pass rate is ideal. It has nothing to do with being competent or having a certain level of comprehension of the material. PDs never looked at COMLEX the same way they looked at STEP, so that arms race never happened. People try harder on STEP, so the threshold to pass has to go up. Also, I'm aware the populations are different, but I have no idea how to factor that in. STEP has Harvard grads and Caribbean grads. COMLEX is only DO.
It's possible that the USMLE is targeting a 5% failure rate. That would be a norm-referenced cutoff -- define the mean and SD of the exam, then set the minimum pass at mean - 1.65 * SD (which is roughly the 5th percentile if scores are normally distributed). Agreed this is super lazy.
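To make the norm-referenced idea concrete, here is a minimal sketch. It assumes scores are normally distributed, and the mean of 230 and SD of 20 are placeholder numbers for illustration, not published values. The function name is made up.

```python
from statistics import NormalDist

def norm_referenced_cutoff(mean, sd, fail_rate=0.05):
    """Score below which `fail_rate` of a normal score distribution falls."""
    # z is negative for fail rates under 50% (about -1.645 for 5%)
    z = NormalDist().inv_cdf(fail_rate)
    return mean + z * sd

# Placeholder mean and SD, chosen only to illustrate the arithmetic
cutoff = norm_referenced_cutoff(mean=230.0, sd=20.0)
print(f"Minimum passing score for a 5% fail rate: {cutoff:.1f}")
```

The point is that under this approach the cutoff moves whenever the mean moves, so a rising passing score would follow automatically from rising averages.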
The other option is that they systematically determine a criterion-based cutoff (using experts; there are all sorts of ways to do this) and then determine the minimum pass from that. Any criterion-based system is open to bias of all sorts. Assuming that the "smarts" of applicants to medical school (at least as measured on an MCQ exam) remain stable over time, the same percentage of people would be expected to fail over time.
What we are missing are raw scores. If the raw score to pass has gone up over time, then one explanation is that the testing system has become more strict over time to ensure the fail rate remains steady. There are other explanations -- such as that society demands more from physicians today than in the past (hence higher cutoffs). Or the nature / content of the exam has changed such that higher raw scores would be expected.
From my perspective, the most likely explanation for 80% of this phenomenon is just that people took their eye off the ball when the test went P/F. The next class will see this train wreck and make different choices. After about three cohorts we'll have a good idea of what the "new normal" looks like. Until then everyone just needs to take some deep breaths.
Interesting, given that those who run the USMLE mentioned this possibility and there was widespread condemnation of the statement.
True but don’t most PDs only look at USMLE anyways so it doesn’t really matter if the complex exam is easier?
It's COMLEX. Not all PDs ignore it.
I don’t think it’s hard, but many students disagree. A 196 pass threshold is too close to the 200s for an exam with large standard errors. That’s why I kept pushing to lower the pass threshold to 180, so that people scoring in the 200s-210s can feel comfortable that they’ll pass.
Given the current nature of the exam, the mean is 228 with an SD of 18 or so. A 180 is a z-score of (180-228)/18 = -2.67. That equates to roughly the 0.38th percentile, meaning only about 0.4% of test takers would fail. If the fail rate is that low, then there's almost no point in using the exam at all.
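For anyone who wants to check the arithmetic, a quick sketch follows. It uses the mean of 228 and SD of 18 quoted above and assumes scores are normally distributed, which is an approximation.

```python
from statistics import NormalDist

mean, sd = 228.0, 18.0        # figures quoted above ("or so")
proposed_cutoff = 180.0

# z-score of the proposed cutoff relative to the score distribution
z = (proposed_cutoff - mean) / sd

# Fraction of a normal distribution that falls below the cutoff
fail_fraction = NormalDist().cdf(z)

print(f"z = {z:.2f}, expected fail rate = {fail_fraction:.2%}")
```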
I really think scoring a 215 approximates a foundational understanding of phys, path, micro, pharm, stats, biochem, genetics. However, I've seen the argument made that the confidence interval for Step 1 is +/- 20, which puts one at risk for failing.
The confidence interval of the exam is not 20. The Standard Deviation of the distribution of exam scores is about 20. This describes the distribution of everyone's score, not the accuracy of any one score.
The metric you're (usually) looking for is the Standard Error of Measurement. This is a "standard deviation" measure of the accuracy of a test result. The SEM of Step 1 is 6. Therefore, if someone scores a 215, their "true score" is somewhere within 215 +/- 12 (about 2 SEM) 95% of the time.
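Here is a small sketch of that interval calculation, using the SEM of 6 from above. It assumes measurement error is normally distributed, and the function name is just for illustration.

```python
from statistics import NormalDist

def score_interval(observed, sem, confidence=0.95):
    """Symmetric interval around an observed score, given the SEM."""
    # Two-sided critical value, about 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sem
    return observed - half_width, observed + half_width

low, high = score_interval(observed=215, sem=6)
print(f"95% interval: {low:.1f} to {high:.1f}")
```

With an SEM of 6 the half-width comes out just under 12, which is where the "+/- 12" comes from. Plugging in 20 instead (the SD, used incorrectly) would give a half-width near 39.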
And it can get more complicated. You can calculate a Standard Error of the Estimate, which helps answer: given a score on an exam, what is the likely range of scores if the exam (with new items) is given to the same person with the same knowledge? This can get messy quickly, as the confidence intervals on that are often asymmetric around the original score (due to regression to the mean, a repeat result is more likely to fall between the given result and the mean than on the far side of the given result). This quickly exhausts my statistical knowledge, and might be wrong -- anyone is free to chime in.
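One hedged way to sketch the retest question is with classical test theory. Everything here is an assumption layered on the numbers in this thread: reliability is backed out from the SD of about 20 and the SEM of 6, the mean of 228 is the figure quoted earlier, and the retest spread uses what textbooks usually call the standard error of prediction. This is not how the USMLE reports anything, just an illustration of the asymmetry.

```python
from math import sqrt
from statistics import NormalDist

def retest_interval(observed, mean, sd, sem, confidence=0.95):
    """Predicted retest score and interval under classical test theory."""
    # Reliability implied by SEM = SD * sqrt(1 - reliability)
    reliability = 1.0 - (sem / sd) ** 2
    # Regression to the mean: the prediction is pulled toward the mean
    predicted = mean + reliability * (observed - mean)
    # Spread of a second observed score given the first one
    se_prediction = sd * sqrt(1.0 - reliability ** 2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return predicted - z * se_prediction, predicted, predicted + z * se_prediction

low, predicted, high = retest_interval(observed=215, mean=228, sd=20, sem=6)
print(f"Predicted retest: {predicted:.1f} (95% range {low:.1f} to {high:.1f})")
```

The interval is symmetric around the predicted score, but because the prediction sits a bit closer to the mean than the original 215, the range is lopsided relative to the original score. It is also wider than the SEM-based interval, since a retest carries measurement error twice.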
But I'm 100% certain you can't use the SD in the way you suggest. The actual accuracy of the test is much better.