Soooooooooooo. What you are saying is that people take different "versions" of the test, but their scores have no relation to the difficulty/content of the version they took. I use "versions" loosely because one C/P version may have 4 OChem passages and 0 biochem while another has the complete opposite. Using your drug analogy: you are giving people drug X and trying to assess their responses. The caveat is that there are, say, 20 ingredients in drug X and you can only pack around 6 of them, in random proportions. Each cohort is given a different version. But you never analyze the individual cohorts; you lump all of them together.
Now you go to the tail end of an arbitrarily defined reaction scale, bin the results into arbitrary percentiles, and declare that there is a difference between the 99th percentile and the 95th-98th percentile.
Your analogy is incorrect. It's more like there are 20 ingredients in drug X, but each of those ingredients has an alternative that is supposed to do the same thing (maybe in a slightly different way), just as each passage is supposed to measure similar things, i.e., content knowledge, reasoning from the data, reasoning beyond the text, etc. The only case where your analogy would make sense is the limited number of discrete questions on the exam, which may not all measure the same thing across exams (although I would argue that if one is prepared for all the topics on the exam, it doesn't matter whether one gets an OChem discrete or a biochem discrete).
Also, scores have very much to do with the difficulty of the exam version, just not in the way you think. Scores are curved based on the difficulty of that version (according to a set curve). But your score should not be affected by getting four passages on topics you're "weak" in, because you shouldn't be "weak" in any topic. That's on you. You should gain an adequate understanding to do well, especially in your weak subjects. As I said above, I'd put that at 7/10 personally.
There is at least a statistical difference between the 99th+ percentile and the 95th percentile. From the curve, a 99th+ percentile score corresponds to 523 and up, whereas a 95th percentile score corresponds to 516. Those confidence intervals do not overlap. That's not a matter of opinion.
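A quick arithmetic check of the non-overlap claim, as a minimal Python sketch. It assumes each score carries a symmetric band of plus or minus some half-width; the half-width of 2 used below is an assumption for illustration, not a figure given in this thread, and `bands_overlap` is a hypothetical helper name.

```python
def bands_overlap(score_a: float, score_b: float, half_width: float) -> bool:
    """Return True if the bands score +/- half_width around two scores overlap."""
    low_a, high_a = score_a - half_width, score_a + half_width
    low_b, high_b = score_b - half_width, score_b + half_width
    # Two closed intervals overlap when each one starts before the other ends.
    return low_a <= high_b and low_b <= high_a


# Scores from the post: 523 (99th+ percentile) and 516 (95th percentile).
# HALF_WIDTH is an assumed band size, used only to illustrate the check.
HALF_WIDTH = 2
print(bands_overlap(523, 516, HALF_WIDTH))  # False

# The bands only start to touch once the half-width reaches (523 - 516) / 2 = 3.5.
print(bands_overlap(523, 516, 3.5))  # True
```

In other words, with a 7-point gap the two bands stay separate for any half-width below 3.5 points.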
What is a matter of opinion is whether I think there are finer distinctions between individuals who have overlapping confidence intervals. I do.
How did you magically conjure up those bins? I just asked you what caused the fluctuation; I couldn't care less about variability within a data set of many test takers. (You didn't take the MCAT more than once, right? Certainly not enough times to produce anything resembling a data set, hence the confusion.) So why didn't you score a 528? And on the versions where you scored below 526, why didn't you score 526? These are very simple questions with a similarly simple answer, starting with "R." -wink-
Like I said, I do not believe that if one takes the exam multiple times, ceteris paribus, one would see more than a 1-point deviation in each subsection. That is the only luck or "randomness" I would allow for, as I told another poster above. You're interested in fluctuations in score. I'm not. I'm interested in whether there is any randomness in scoring 520+. That is, is a person scoring 520+ actually different from someone in the 515-520 range, or is it just "luck"? I'm not arguing that there's no statistical randomness associated with measuring someone's true MCAT score. But there's nothing random that put them there as opposed to the next lowest level.
No. I didn't just have adequate depth; I went beyond deep on every single topic in the outline. That is how I know when a word, not even a topic, a word, is not covered. Speak with other high scorers? You cannot seriously think that is a good idea. What do you want me to do? Retake the test, score higher than 520, and declare myself lucky? And stop with the data analysis, man; that stuff is easy.
I believe that if one has adequate reasoning abilities, one can go into much less depth and still do very well on the exam. As I said, you can usually tell by any discrepancy between the CARS score and the other subsection scores. If someone needs to go into excessive depth to do well, then that person is relying more on small details than on their reasoning abilities to reach a high score. That shows up on CARS, because on CARS there's no detail to know. There's only reasoning.
I don't know, man. Since neither of us has absolutely concrete evidence in hand, your guess is as good as mine. But let's go a bit deeper. Here is what I said, which you dodged:
If you look at the 130-132 bins of every single section, you will notice that each one is about 50%-33% of the height of the previous bin. Does that mean I was right that to get 520+ you had to rely on luck? Of course not. I need more data than that. BUT at least I have some numerical basis to lean on.
I believe this has more to do with your statistical and mathematical reasoning than with me. I ignored it because it didn't make sense, and here's why. The first problem: if the 50% and 33% chances of getting a question right have a direct relationship with getting a 130 vs. 131 or a 131 vs. 132, then one would expect each successive bin to be equal in height to, or half the height of, the previous bin, respectively. That is, 100-50%, not 50-33%. For example, suppose 100 people are stuck between 131 and 132. Imagine that the score hinges on one question and all of those people have a 50% chance of getting it right. Then one would expect 50 to get it right and 50 to get it wrong, so 50 would get the 132 and 50 would get a 131. The heights of those bins would be the same. Now imagine the same scenario but with a 33% chance of getting the question right. You would expect about 33 people to get it right and 67 to get it wrong. Therefore, 33 people would get the 132 and 67 would get a 131. The 132 bin would be half the size of the 131 bin.
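The 100-person example can be written out as a small Python sketch. It assumes, as the post does for the sake of argument, that a single question with success probability p decides 131 vs. 132 for everyone in the group. `expected_bins` and `simulate_bins` are hypothetical helper names; the simulation simply draws each person's outcome independently with a seeded random generator.

```python
import random


def expected_bins(n_people: int, p_correct: float) -> tuple[float, float]:
    """Expected (132-bin, 131-bin) counts when one question decides 131 vs. 132."""
    top = n_people * p_correct           # expected number who get it right -> 132
    bottom = n_people * (1 - p_correct)  # expected number who miss it -> 131
    return top, bottom


def simulate_bins(n_people: int, p_correct: float, seed: int = 0) -> tuple[int, int]:
    """One simulated cohort: each person independently gets it right with p_correct."""
    rng = random.Random(seed)
    top = sum(rng.random() < p_correct for _ in range(n_people))
    return top, n_people - top


for p in (0.5, 1 / 3):
    top, bottom = expected_bins(100, p)
    print(f"p={p:.2f}: 132-bin ~ {top:.0f}, 131-bin ~ {bottom:.0f}, ratio {top / bottom:.2f}")
```

The loop prints a 132/131 ratio of 1.00 for p = 0.5 and 0.50 for p = 1/3, i.e. the 100-50% pattern rather than 50-33%. `simulate_bins` shows the same thing with sampling noise for a single cohort.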
Second, the above even gives you the benefit of the doubt that there's a direct, 1:1 relationship between getting a question right and scoring that extra point on the MCAT. The 50% drop in bin height with each successive point from 130 to 132 in the histogram only tells you that 50% fewer test takers score in that bin; that is, fewer and fewer people get those scores. The 50% chance of getting a question correct after eliminating two (or two-and-a-half) answer choices would correlate directly with the drop in frequency of people reaching those top bins only if MCAT scoring did not take the curve into account (by curve, I mean the standardized curve that is already set, lest anybody think I'm referring to a curve set on the day of the exam). If it turns out that only 5-10% of test takers get a question right (factoring in all the people who cannot rule out any answer choice), then that question won't be given much weight by the AAMC, or rather, there will be more leeway on that version of the exam. So there's no 1:1 relationship between bin size and each successive point on the MCAT past 130.
Third, how much weight a question is given (reflected in the leeway allowed on a particular version of the exam) is based not on how many high scorers get it right but on how many of all the people who take the exam get it right. So if a high scorer can eliminate, say, 2 answer choices out of 5, that person now has a 33% chance of guessing correctly. But if a high scorer only has a 33% chance of getting it right, what chance does an average test taker have? 15%? 10%? And what about the half of test takers on the other side of the bell curve? 5%? 1%? So a tough discrete on which a high scorer would have a 33% chance might actually be answered correctly only 5% of the time in the general test-taking population, and thus not be accorded much weight by the AAMC. Your score then becomes driven even more by reasoning ability and less by content knowledge.
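The guessing arithmetic above can be sketched in Python. The 1/(choices - eliminated) formula is the plain blind-guess probability from the post's example; the population split below (group shares and per-group rates) is entirely hypothetical, picked only to show how a share-weighted average pulls the overall correct rate far below a strong test taker's odds. `guess_probability` and `population_correct_rate` are hypothetical helper names.

```python
def guess_probability(n_choices: int, n_eliminated: int) -> float:
    """Chance that a blind guess is right after ruling out some wrong answers."""
    remaining = n_choices - n_eliminated
    if remaining < 1:
        raise ValueError("at least one answer choice must remain")
    return 1 / remaining


def population_correct_rate(groups: list[tuple[float, float]]) -> float:
    """Share-weighted average of per-group success probabilities.

    groups: (share_of_test_takers, p_correct) pairs; shares must sum to 1.
    """
    total_share = sum(share for share, _ in groups)
    if abs(total_share - 1) > 1e-9:
        raise ValueError("group shares must sum to 1")
    return sum(share * p for share, p in groups)


# The post's example: 5 choices, 2 eliminated -> a 1-in-3 chance for a high scorer.
p_high = guess_probability(5, 2)

# Hypothetical population split; these shares and rates are made up for illustration.
groups = [
    (0.10, p_high),  # high scorers
    (0.40, 0.10),    # around-average test takers
    (0.50, 0.05),    # lower half of the bell curve
]
print(f"high scorer: {p_high:.0%}, whole population: {population_correct_rate(groups):.1%}")
```

With these made-up numbers, a question a strong test taker has a one-in-three shot at is answered correctly by under 10% of the whole population, which is the gap the argument relies on.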
So no, you actually don't have good numerical bases to lean on.
As for your CARS thing, geez, I don't know. How do you explain 126/132/127/128 -> retake 128/128/130/130? Did this person suddenly lose the "reasoning ability" that was "required to put them there in their four years of undergraduate education"?
I wonder what the CARS data would look like if we only looked at test takers with humanities majors. I think the curve would shift considerably to the right. Do you wanna bet? What about CARS data for only Canadians, who must rely on CARS to get into med school? I think the same would happen. Wanna bet?
No, but that person probably wasn't taking the test under ceteris paribus conditions the second time around, because that's a rather large change in CARS score. There's a reason there's a consensus that the CARS score is the most difficult to change, and that applies in both directions.
I said that someone with weak reasoning ability would have that exposed in the CARS section even if they do well in the science sections. The reverse doesn't necessarily hold. In other words, just because someone does well in CARS doesn't mean they can reason scientifically, much less that they have the content knowledge at adequate depth to perform that well in the sciences.
And finally, to answer your question about my own score: I did not score a 528 because, although my reasoning abilities were good enough for the exam, my content knowledge in B/BC was only 6/10 at best and my P/S knowledge was 7/10. My C/P knowledge was 9-10/10 because I'm a graduate student in chemistry (the 9 is because I'm not the best at some physics). I had better uses for my time than going over more content so that I could get a couple of extra points at the top end of the scale.