Based on what? How certain are we that the SAT doesn't have ranges like this? And the MCAT has a smaller range of variation because its overall score range is smaller. We can fix that with the USMLE if we want: simply divide the score by 10 and report that, rounding to a whole number if you wish. Now scores will range from 16 to 28, a pass will be a 20, and inter-test variability will be 1.5. Does that make it better?
Where on earth did I say that the absolute size of the score range is indicative of inter-test variability?
Obviously, the absolute range of an arbitrary scoring scale is irrelevant to inter-test variability. The USMLE very likely has higher percentage variability within the relevant scoring range.
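To put rough numbers on the rescaling point: dividing every score by 10 shrinks the range and the variability by the same factor, so variability as a share of the range doesn't move. A minimal Python sketch using the hypothetical figures from the divide-by-10 example in this thread (a 160-280 scale with 15 points of variability), not published USMLE statistics:

```python
# Hypothetical figures taken from the divide-by-10 example in the thread,
# not from any published USMLE data.

def relative_variability(variability: float, low: float, high: float) -> float:
    """Return score variability as a fraction of the total score range."""
    return variability / (high - low)

# Original (hypothetical) scale vs. the same scale divided by 10
original = relative_variability(15.0, 160.0, 280.0)
rescaled = relative_variability(1.5, 16.0, 28.0)

print(f"original: {original:.3f}, rescaled: {rescaled:.3f}")
# prints "original: 0.125, rescaled: 0.125"
```

Any linear rescaling cancels out in the ratio, which is why comparing raw score spans across tests says nothing about their relative noise.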
The AAMC has done actual studies for the MCAT that predict +/- 1 point on each section (~7% of the relevant range, buffered over four sections for smaller overall variability). That's actually quite tight and matches my anecdotal experience. When I took it, going from a 37 on practice tests to a 33 on the real deal was rare. Sure, some 38 hopefuls settled for 35s, and some people with 33 averages managed 35s, but people generally fell within the range you'd expect, especially in the more common 26-34 score range. You could safely bin test takers into at least 6-7 meaningful score ranges.
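On the "buffered over four sections" part: if per-section errors are independent, they add in quadrature, so the combined error grows only with the square root of the number of sections while the total range grows linearly. A rough sketch, assuming (hypothetically) four independent sections, each treated as scored 1-15 with about +/-1 point of error; real section errors need not be independent:

```python
import math

# Assumptions for illustration only: four sections, each on a 1-15 scale,
# each with an independent error of about +/-1 point.
SECTION_MIN, SECTION_MAX = 1, 15
SECTION_ERROR = 1.0
N_SECTIONS = 4

section_range = SECTION_MAX - SECTION_MIN          # 14 points per section
per_section_pct = SECTION_ERROR / section_range    # error as share of one section

# Independent errors combine as root-sum-of-squares
total_error = math.sqrt(N_SECTIONS * SECTION_ERROR ** 2)
total_range = N_SECTIONS * section_range
total_pct = total_error / total_range              # error as share of total range

print(f"per section: {per_section_pct:.1%}, combined: {total_pct:.1%}")
# prints "per section: 7.1%, combined: 3.6%"
```

The relative error shrinks by a factor of sqrt(4) = 2 here; if section errors were correlated, the buffering would be weaker.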
The NBME does not do these sorts of studies, so we can't say for sure how much scores vary. However, the inability of multiple companies, including the NBME itself, to reliably predict your score speaks volumes about the overall quality control. Anecdotes of wild score swings (vs. practice tests, preclinical grades, etc.) were also far more common with the Step exams. There were countless stories of people averaging in the 220s on NBMEs and UWorld self-assessments who wound up scoring in the 250s, and vice versa. A friend of mine scored a 250 on her last practice exam and got a 199 on the real deal. Step exams are notoriously weird and variable. You can probably safely bin 99% of test takers into 3 groups, but any further stratification would be meaningless.
A 32 and a 35 were treated very differently on the old MCAT, and that could literally be 3 questions. Similarly, a 508 and a 512 were treated differently when I applied, and that could be a 4-question difference. People repeat years in med school because they failed by a question or two. A line has to be drawn somewhere.
Eh, a 32 and a 35 were pretty different scores, and it definitely wasn't a 3-question difference. If people made big jumps, it was toward the top end of the test, where there's more variability. Plenty of people went from 38 to 35 or from 36 to 39. In the 26-34 range, scores were pretty tight. Statistically, all 39+ scores were indistinguishable and 36-38 were pretty darn close, so you would see a lot of variability there. Step exams basically have the same issue near the top of their range. However, because they are designed to maximize predictability near 209 rather than near 245, tons of test takers sit near the top of the range, bouncing around wildly.