I hope I don't sound childish, but how is the MCAT standardized between all applicants if the test I took could have been filled with electricity/magnetism stuff and another person's was full of harmonics? I clearly remember my latest test being filled with all the physics/chem topics I was weakest at. I'm not complaining, just saying the test is always different.
Also, how do you factor in the people who have to travel or have something come up on the day of the test? I personally had to drive 2 hours to take mine, leaving early in the morning after a night when I couldn't get much sleep. Not to mention someone very, very close to me died two days before my test date. I know people who had to travel to other states and stay in a hotel. I really don't think a 5-hour test is representative of my abilities or my application. Did I need to study more? Probably, but I only had enough money to take it once and apply broadly. They say it's the great equalizer, but I beg to differ; there are just too many variables. Thank God for DO schools looking past that 5-hour assessment of knowledge I'll likely never need to use again (sans bio). That being said, I'll never forget V = IR.
Well, the MCAT (and any other standardized test) is designed with that in mind. This is a hand-wavy explanation (those who know more, feel free to correct me), but essentially, my understanding is that the questions are all tested beforehand as "experimental questions." From those results, the test makers learn roughly how difficult each question is. This "difficulty weight" of each question is then used (in some complicated statistical process) to produce the raw score conversion scale for each section (PS/BS/VR). For example, if harmonics are really that hard a topic and the PS section was 100% harmonics questions, then the conversion scale would likely be more "generous," meaning fewer correct answers would be needed to earn the same scaled score.
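For the curious, here's a toy sketch in Python of one of the simplest equating methods (mean equating). It assumes each pretested question comes with a hypothetical "p-value" (the fraction of a reference group that answered it correctly). This is NOT the AAMC's actual procedure, which is more involved; the function names and numbers are made up purely for illustration.

```python
# Toy mean-equating sketch. Hypothetical names and numbers; NOT the AAMC's actual method.

def expected_raw(p_values):
    """Expected raw score of a reference test taker: the sum of each
    question's pretested proportion correct (its "p-value")."""
    return sum(p_values)


def mean_equate(raw, form_p, ref_p):
    """Map a raw score on one test form onto the reference form's raw scale
    by shifting it by the difference in expected raw scores between the forms."""
    return raw - (expected_raw(form_p) - expected_raw(ref_p))


# Hypothetical 10-question sections: the reference form is "average" (p = 0.75),
# while the all-harmonics form is harder (p = 0.5).
reference_form = [0.75] * 10
harmonics_form = [0.5] * 10

# 6 correct on the hard form counts the same as 8.5 correct on the reference form.
print(mean_equate(6, harmonics_form, reference_form))  # 8.5
```

Real equating usually also adjusts for the spread of scores (e.g., linear or equipercentile equating) or uses item response theory, but the direction of the effect is the same: a harder form gets a more generous conversion.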
Now, you might argue that some people know some topics better than others. This is true, but the MCAT tests a variety of topics in any given section. There is inherently some luck involved in this respect, but if you only know topics X, Y, and Z, it's extremely unlikely that you will find yourself taking an MCAT that tests only X, Y, and Z.
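To put a rough number on "extremely unlikely," here's a quick Python sketch under a deliberately oversimplified, hypothetical model: each question's topic is drawn independently and uniformly from a fixed list of topics. That's not how the MCAT is actually assembled, and the topic and question counts below are made up.

```python
# Toy probability sketch. Hypothetical model and numbers; not how the MCAT is assembled.

def prob_only_known_topics(known, total, questions):
    """Chance that every question falls in one of your `known` topics,
    assuming each question's topic is drawn independently and uniformly
    from `total` topics."""
    return (known / total) ** questions


# Hypothetical: you know 3 of 20 topics and the section has 10 questions.
p = prob_only_known_topics(known=3, total=20, questions=10)
print(f"{p:.2e}")  # 5.77e-09, i.e. roughly 1 in 170 million
```

A real section has many more than 10 questions, which under this model only makes the number smaller.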
tl;dr: It is very unlikely for you to fluke your way to a high score just because the content varies from test to test.
As for your second argument about there being "too many variables":
1) Yes, things like travel and sickness matter. But let's just be realistic here. If you are capable of scoring a 40, you are most likely going to score in the mid-30s at worst, barring some extraordinary circumstance (e.g., you have a heart attack or stroke). Conversely, if you can only score a 25 on the AAMC practice tests, you are almost certainly not going to score a 40 because you had a good night's sleep.
2) Travel and sickness are variables that adversely affect the utility of the MCAT. However, I would argue that the variables that adversely affect comparisons of something like GPA are probably worse. The difference in course rigor between majors (or between schools) can be extremely significant, but it is hard to compensate for in a statistically meaningful way (e.g., should a 3.3 MIT Math/EE major be viewed as equal to a 3.6 Penn State English major? I really don't know...). The difference in rigor between MCAT administrations is relatively insignificant (and is compensated for by statistically meaningful scaling).
3) The MCAT might only be a few hours long, but that's not the right way to look at it. Like GPA, the MCAT is the culmination of years' worth of classes and knowledge. In fact, with VR you could argue that the MCAT is the culmination of DECADES of your life (after all, many critical reading skills are acquired in MS/HS or earlier, which is why VR scores are so hard to improve).