Even comparing one USMLE examinee to another, the standard error of estimation for Step 1 is 9 points.
This means that up to an 18-point difference between scores should statistically be considered equal.
Comparing COMLEX with Step will likely have even lower precision and will likely not be statistically significant, even if it is trending in a certain direction.
No, that's not how you should interpret these statistics. That's not what an SEM/SEE means, nor how it should be interpreted. It's a common misconception. Even the USMLE's explanation is lacking.
The unfortunate truth is that every single study I have ever seen comparing USMLE and COMLEX shows that the fail cutoff for COMLEX falls below that for USMLE. The latest study is this one:
A Concordance Study of COMLEX-USA and USMLE Scores. The paper itself didn't include the useful graph, but it's in the attached supplemental materials:
The colored lines are my additions - they are the failing score for COMLEX (400) and USMLE (196). Everyone in the upper right passes both exams. Everyone in the lower left fails both exams.
The problems are the other two quarters. Everyone in the upper left fails COMLEX but passes USMLE - but it appears there are only 3 people who do so. Everyone in the lower right passes COMLEX but fails the USMLE. That's a big swath of people, including that very unfortunate soul who appears to have gotten a 750+ on COMLEX yet managed to get a ~185 on USMLE.
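To make the four quarters concrete, here is a minimal Python sketch that sorts (COMLEX, USMLE) score pairs using the two cutoffs drawn on the graph (400 and 196). The function name and the sample pairs are made up for illustration and are not data from the study, and I'm assuming a score exactly at the cutoff counts as a pass.

```python
COMLEX_PASS = 400  # COMLEX Level 1 cutoff quoted above
USMLE_PASS = 196   # USMLE Step 1 cutoff quoted above


def quadrant(comlex, usmle):
    """Label one examinee by which exams they pass (assumes score >= cutoff passes)."""
    passes_comlex = comlex >= COMLEX_PASS
    passes_usmle = usmle >= USMLE_PASS
    if passes_comlex and passes_usmle:
        return "passes both"            # upper right
    if passes_comlex:
        return "passes COMLEX, fails USMLE"   # lower right
    if passes_usmle:
        return "fails COMLEX, passes USMLE"   # upper left
    return "fails both"                 # lower left


# Hypothetical score pairs, not taken from the paper.
examples = [(550, 225), (380, 185), (390, 200), (450, 190), (750, 185)]
for comlex, usmle in examples:
    print(comlex, usmle, quadrant(comlex, usmle))
```

Counting how many real examinees land in each label would put a number on the "big swath" in the lower right.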
This data doesn't tell us why this is happening. The exams test somewhat different material, which DO schools may not cover as well. DO students may not be as good test takers. I don't know. But if we consider both the USMLE S1 and COMLEX L1 to be tests of minimum competency for the same occupation, we have a problem, since they don't match. What this data also doesn't tell us is which of these two minima is "right" - perhaps the USMLE cutoff is too high.
This is almost certainly statistically significant given the clustering of the data and the high n. They didn't calculate it in the paper, but it's obvious from looking at the data plots. If the exams were measuring the same thing, the regression line would go through the intersection of those two colored lines.
What is interesting is that the regression isn't straight. If you were to ignore the high and low ends and focus on COMLEX scores between 500 and 700, that regression line extended out would go right through the fail intersection. But the curve drops off, suggesting that those students who perform below average on the COMLEX, especially those below 450, do "worse than expected" on the USMLE.
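Here is a rough Python sketch of that check: fit a straight line to the 500-700 range only, extend it down to the COMLEX cutoff, and see how the low scorers sit relative to it. The score pairs are synthetic and deliberately built so the mid-range line passes through (400, 196) while the low end falls below it; they are not the study's data, and `fit_line` is just a hand-rolled least-squares helper.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic (COMLEX, USMLE) pairs invented for illustration only.
scores = [(350, 170), (400, 180), (450, 195),
          (500, 211), (550, 218.5), (600, 226), (650, 233.5), (700, 241)]

# Fit only the middle of the range, then extend the line to the COMLEX cutoff.
mid_range = [(x, y) for x, y in scores if 500 <= x <= 700]
slope, intercept = fit_line(mid_range)
predicted_at_cutoff = slope * 400 + intercept
print("USMLE score predicted at COMLEX 400:", round(predicted_at_cutoff, 1))

# Negative residuals mean "worse than expected" given the mid-range trend.
for x, y in scores:
    if x < 500:
        print(x, "residual:", round(y - (slope * x + intercept), 1))
```

With the real data, a prediction near 196 at COMLEX 400 plus consistently negative residuals below 450 would match what the plot appears to show.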