Are we talking about an evaluation of a medical student (who is getting a grade) or a resident?
The problem with both sets of evaluations is whether the grader actually reads the descriptions or just grades by the numbers blindly.
With regards to resident evaluations, when the ACGME came out with milestones, they also released examples of what they thought evals should look like for the individual milestones they wanted data on. Those evaluations (at least for internal medicine) ended up looking something like this:
Now, that's a simple 5-point scale (though with the half points it's effectively the same as the old 9-point scale). One could easily imagine reading the descriptions and then picking based on the description... except that the formatting makes that very hard to do. The way the descriptions are written, 1/5 is a critical deficiency, 4/5 is "ready for unsupervised practice", and 5/5 describes an experienced attending (what we should aspire to). So, in a three-year field, the ACGME expects interns to be somewhere around a 2/5, pgy2s to be around a 3/5, and residents toward the end of pgy3 to be at 4/5 (with each person above 4 in some categories, but certainly not all of them). And those are the instructions they gave out.
Great.
Except read the actual long description of 2/5: someone who is unable to safely practice (but doesn't flat-out disregard the need for signout). Is that really an appropriate descriptor of most interns? I hope not. Obviously they aren't able to safely practice *on their own*, but reading the plain English of the descriptions, your typical intern who isn't at the bottom of the class probably starts out (for this milestone) at 3/5. But by ACGME expectations, interns should average a 2/5. So you either inflate the numbers by reading the descriptions, or you just grade by "average intern gets a 2, average second year gets a 3," etc. It becomes super inconsistent depending on who is grading you, and if you're evaluating a resident, I don't think it's abnormal to have a mix of some goods (~4?) and some excellents (~5). It's not like the eval is for more than the resident's personal growth.
If you read the actual paper that figure came from, the authors say that this system produces a lot more granularity, showing progression from pgy1->2->3... but that they don't know if that simply reflects PDs implementing the ACGME's suggestion that interns get a 2, etc. Of *course* it does.
----------
On the other hand, student evals are MUCH more important than resident evals (to the student): they get graded, and they get graded relative to their peers. The same problem easily applies, with some people inflating/deflating scores based on the descriptions and others just giving scores based on the numbers. And the even bigger problem is that medical student grades are commonly super inflated relative to the descriptions.

When I was a student, we had evaluations based on the same form the residencies were using: there isn't a student on the planet who should get a 5/5 on the above form (or a 9/9 on the old one). The descriptions just don't make any sense in that context; it would be like saying this student is the equivalent of our chairman when it comes to clinical practice. Except that with some graders, above-average students were routinely getting 8s and 9s all the way down the line. Why? Because those graders just looked at the numbers, didn't read the descriptions, and decided that the average should be somewhere around 7/9 (that's 77%, right?), so if you were an "above average" student, they'd give you an 8 or a 9. The problem was that other graders would honestly try to apply the descriptions, thinking that giving a student a 7 was a major compliment (and it would be, based on the descriptions).
Basically, if you truly think a student is above average (at least at the school I went to), you might have to give them the maximum rating most of the way down, because that's one of the only ways they'd actually get an advantage in their final grade.