Strict vs. lenient MCAT curves based on test dates...


jringo1984
I know this has been brought up a couple of times, and I read those threads, but apparently nobody knows for sure whether there are certain test dates that have more lenient curves (due to more students having had less time to prep). Does the AAMC say anything about this, or do they want it to be a secret? If there is the same distribution of scores on every administration, I would think that some test dates have to have not just harder or easier tests but harder or easier curves.
 

You are probably referring to the fact that Fall test dates have a slightly lower average score (1 point, I believe). It has nothing to do with the curves being more lenient; more likely it has to do with the fact that there is a broad distinction between prepared and unprepared test-takers, which hurts the normalization of the curve.
 
The curves are already set, so it has nothing to do with whoever took it that day.


Yeah, I get that, but like the above poster said, the average score is slightly lower, so it would seem that the same raw score would be scaled a bit higher, unless I'm missing something (which is entirely possible at this hour of the night). :laugh:
 

Yeah, my brain isn't working quite right either, plus I've got a genetics paper to finish before 10 AM.

I don't think it should change though, and I have this perfect explanation why in my head but I just can't put it into words at this hour.
 
Just do your best on the MCAT.

Regardless of the date you take it, if there is a 1-point difference (which I doubt) due to the day you take your test, 1 point is not going to make much of an impact.
 
I know this has been brought up a couple of times, and I read those threads, but apparently nobody knows for sure whether there are certain test dates that have more lenient curves (due to more students having had less time to prep). Does the AAMC say anything about this, or do they want it to be a secret? If there is the same distribution of scores on every administration, I would think that some test dates have to have not just harder or easier tests but harder or easier curves.
I had contacted them about this. They do not disclose any details, but just know that the curve applies to many different test dates. So even if you take the test in fall, the curve is still going to include all the tests from winter, spring, and likely several years back. At least that's what they told me.

During some tests they also throw in questions to get a "feel" for how students do on those. Depending on the outcome, these questions are not graded, at least not initially. Therefore, on some tests it is possible to miss a question or more and still get a perfect score. I'd attribute any strong seasonal variation in the scores to a combination of factors: several questions are included as floaters, and when most students can't answer them, they don't get graded.
 
The curves are already set, so it has nothing to do with whoever took it that day.

That's a big fat rumor. The AAMC has always been ambiguous about how the curve is set, and many believe test-takers from that day DO play a role. The AAMC has never denied that.
 
They are least prepared in the Fall. AAMC does publish the average MCAT scores. I don't recall where, but I have seen them.

This seems like hearsay. We need a link.

If anything, I would say students who are taking it in April or May, during their finals, etc., may be less prepared.

Students taking it in August may have used their whole summer to study, and thus could be more prepared.

Obviously this is totally subjective, and I actually believe that students representing all levels of preparedness are included in each test.
 
I would think the middle of the fall semester would be when people, on average, are least prepared... since they didn't study over the summer, their science classes are not done yet, and they don't have time to study mid-semester. What do you think?
 
The curves are already set, so it has nothing to do with whoever took it that day.

This is incorrect. I got curious and went digging recently for information on how the MCAT is curved, and found a document called the "MCAT Interpretive Manual," which the AAMC sends to med schools and premed counselors. It explains that the MCAT is curved by "equipercentile equating": if you get the same numerical score on two different MCATs (say, an 8), this means that you scored in the same percentile range on both tests (in this case, around the 50th percentile). Obviously, you can't do percentile rankings unless you know the performance of the people who took the test alongside you. So the curves can't be set in advance.

However, this method requires some adjustment (called "smoothing"), because the number of people who take the same test version on a given day is not quite large enough to expect a smooth bell curve. The MCAT uses a couple of methods for this. One is "same-item performance," which measures how different groups of test-takers performed on identical questions. This provides a benchmark for comparing or "equating" multiple versions of the same test. In assessing same-item performance, the AAMC uses groups of test-takers going back multiple years. That is why past exams do have some influence on the curve. Also, the AAMC tries to compose exams that don't vary too much in difficulty level, but we can all argue whether they succeed in that goal.

Regardless, it is a serious misconception to think that "the curve is set in advance." By far the biggest influence on the curve is how your fellow test-takers perform on the same version you are taking.
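For anyone who wants to see the idea concretely, here's a toy sketch of equipercentile equating in Python. Everything in it is invented for illustration (the score scale and the two raw-score distributions are made up, not real MCAT data); the point is only that a raw score on one form gets mapped to whatever raw score on the other form sits at the same percentile rank.

```python
import numpy as np

# Hypothetical raw-score results from two groups that took two different
# forms of the same test. These numbers are invented for illustration only.
rng = np.random.default_rng(0)
form_a_raw = rng.normal(loc=34, scale=8, size=5000).clip(0, 52).round()
form_b_raw = rng.normal(loc=31, scale=8, size=5000).clip(0, 52).round()  # form B came out "harder"

def percentile_rank(scores, x):
    """Fraction of test-takers scoring at or below x."""
    return np.mean(scores <= x)

def equate_b_to_a(raw_b):
    """Equipercentile equating: find the form-A raw score whose percentile
    rank matches the percentile rank of raw_b on form B."""
    p = percentile_rank(form_b_raw, raw_b)
    return np.quantile(form_a_raw, p)

# A raw 30 on the harder form B sits at the same percentile as a higher raw
# score on form A, so the two get reported as the same scaled score.
print(equate_b_to_a(30))
```

If form B really is harder, the same percentile corresponds to a lower raw score on it, which is exactly the adjustment the equating is supposed to make.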
 
So taking it when others are not as prepared does give an advantage.

Probably.

Back in the good ol' days, when it was only offered twice a year, people said the April administration had a more lenient curve because students would prep all summer exclusively for the August administration.
 
This is incorrect. I got curious and went digging recently for information on how the MCAT is curved, and found a document called the "MCAT Interpretive Manual," which the AAMC sends to med schools and premed counselors. It explains that the MCAT is curved by "equipercentile equating": if you get the same numerical score on two different MCATs (say, an 8), this means that you scored in the same percentile range on both tests (in this case, around the 50th percentile). Obviously, you can't do percentile rankings unless you know the performance of the people who took the test alongside you. So the curves can't be set in advance.

However, this method requires some adjustment (called "smoothing"), because the number of people who take the same test version on a given day is not quite large enough to expect a smooth bell curve. The MCAT uses a couple of methods for this. One is "same-item performance," which measures how different groups of test-takers performed on identical questions. This provides a benchmark for comparing or "equating" multiple versions of the same test. In assessing same-item performance, the AAMC uses groups of test-takers going back multiple years. That is why past exams do have some influence on the curve. Also, the AAMC tries to compose exams that don't vary too much in difficulty level, but we can all argue whether they succeed in that goal.

Regardless, it is a serious misconception to think that "the curve is set in advance." By far the biggest influence on the curve is how your fellow test-takers perform on the same version you are taking.

Interesting. But the approach in the first paragraph seems to conflict with the approach in the second paragraph. So the question is: what does "the same percentile range" mean? Say, 49 to 51 versus 35 to 65?
 
Too much overanalysis. Just take the test when you're ready to do your best.

No kidding.

There is going to be some randomness in your scoring no matter when you take the test... you might get sick the day before the test, or run over a squirrel on the way to the test center, or get a question that you just went over in a biochem class. The only thing that should matter to you when picking a test date is whether the timing works for your schedule AND you can give yourself plenty of time to prep for it.
 
Interesting. But the approach in the first paragraph seems to conflict with the approach in the second paragraph. So the question is: what does "the same percentile range" mean? Say, 49 to 51 versus 35 to 65?

The numerical scores are just arbitrary breakpoints chosen by the test designers for reporting the results. AAMC chose to have 15 score brackets for each section of the MCAT, but it could just as well have been 4 or 24; each system would just be a different aggregation of the same data. The goal of the scoring system is to have the scores follow an approximately normal distribution, and for a theoretical test taker who takes different versions of the MCAT to get approximately the same numerical score each time. (This is the goal of the equipercentile equating system AAMC uses.) But, due to random variation, there is going to be some fluctuation in the same person's exact percentile rank from test to test. So, in order to reduce the effect of these random fluctuations, AAMC uses percentile ranges for each score, and tweaks the ranges if necessary to keep the scores approximately normally distributed.

If you look at the score breakdown for a given MCAT (like the 2008 exam, given here), you'll note that the width of the ranges varies, depending on where you are in the distribution. In the middle (say, around a score of 8), the bands are about 15 percentile points wide, but at the very low or very high ends, they are only about 2 percentile points wide. This primarily reflects the fact that many more people score in the middle than at the extremes.

If you compare score breakdowns from different years (e.g. 2007 and 2008), you'll see that the percentile ranges associated with a given score in a given section tend to stay about the same. For instance, in '07 a 10 in PS meant about 68th-80th percentile, while in '08 it was 71-82. But there has been some "drift" over the years, especially in the bio section, where average scores seem to be going up (and the percentile ranks associated with those scores are accordingly going down). In 2002, a 10 in bio was the 69-84 range, while in '08 it meant the 59-77 range.
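To make the "arbitrary breakpoints" point concrete, here is a toy sketch in Python. The cut-points below are invented, not the AAMC's actual tables; the idea is only that a reported section score is just the band your percentile rank falls into, with wide bands in the middle of the distribution and narrow ones at the tails.

```python
import bisect

# Hypothetical lower-bound percentile cut-points for scaled scores 1..15 in
# one section. Made-up values, not the AAMC's published tables; note the
# wide bands near the middle and the narrow ones at the extremes.
cut_points = [0, 1, 3, 7, 13, 22, 34, 48, 63, 76, 86, 93, 97, 99, 99.9]

def scaled_score(percentile):
    """Return the 1-15 scaled score whose band contains this percentile rank."""
    return bisect.bisect_right(cut_points, percentile)

print(scaled_score(50))    # mid-distribution percentile -> a middle score (8 here)
print(scaled_score(99.5))  # top of the distribution -> a near-maximum score (14 here)
```

With cut-points like these, small percentile shifts near the middle barely change the reported score, while at the tails a couple of percentile points can move you a whole scale point.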
 
if you get the same numerical score on two different MCATs (say, an 8), this means that you scored in the same percentile range on both tests (in this case, around the 50th percentile).

One is "same-item performance," which measures how different groups of test-takers performed on identical questions. This provides a benchmark for comparing or "equating" two multiple versions of the same test. In assessing same-item performance, AMCAS uses groups of test-takers going back multiple years.

I guess I just don't understand; in the first paragraph, they are saying you are curved against the other test-takers that day. In the second, they are saying you are curved against past test-takers.

So I don't get how those two methods, together, determine your score...
 

This is actually quite a complicated topic, and I probably didn't use precisely correct language in my description. Let me try one more time:

The purpose of equipercentile equating is to equate scores between harder and easier forms of the same test. But you also have variations in the test-taking skills and abilities of different groups of test-takers. A group of MCAT takers from, say, MIT would be expected to score higher on the same test than a group from Podunk Community College. But you don't want to penalize someone for taking the test next to smart people vs. dumb people--that wouldn't be fair. So you need some way to adjust the scores to compensate for the differing abilities of the testing groups.

As a statistician would look at it, both groups are just samples from a theoretical "target population," which in this case is all applicants taking the MCAT over the course of a given year (or multiple years). The goal is to compute percentile ranks that reflect where a score would fall in this target population. The problem is that we don't know what the target pop really looks like--we have to use estimation techniques. What we want, ideally, is a way to judge how our test-taking group relates to that target population. The information a test collects to try to estimate this is called the "equating design." (I incorrectly referred to this earlier as "smoothing." Smoothing does take place in score distributions, but it's a separate process from equating.)

The MCAT uses two equating designs, "random groups" and "common item." "Random groups" means that people taking the test in the same place are randomly assigned one of two test forms. This gives you two groups which are both assumed to differ from the target pop by the same amount, since they are randomly selected. This allows the two different test forms to be equated with each other. "Common item," as I said before, uses the same questions taken by different groups of people. The performance of a given group on the common questions is then used as a benchmark for the overall ability of the group compared to a reference standard (which, in this case, includes past MCAT takers). This allows the scores of the group to be adjusted upward or downward to approximate what they would be if they were representative of the whole target pop. THEN the percentile rankings are applied.


If you want a much better explanation of all this, look at the following document: http://www.ets.org/Media/Research/pdf/LIVINGSTON.pdf
This is an informational monograph from the Educational Testing Service about how test equating works. It is not about the MCAT per se, but it explains all the techniques used in scoring the MCAT.
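Purely as a toy sketch (not the AAMC's actual procedure, and much cruder than the Tucker or chained methods described in the ETS monograph above), here is the flavor of a common-item adjustment in Python. All of the numbers, the item counts, and the simple mean shift are made up.

```python
import numpy as np

# Toy sketch of the "common item" design: two groups take different forms
# that share a set of anchor questions. Every number here is invented.
rng = np.random.default_rng(1)

# Proportion correct on the shared anchor items for each group.
anchor_reference = rng.normal(0.62, 0.10, 4000).clip(0, 1)  # reference standard (past test-takers)
anchor_new_group = rng.normal(0.67, 0.10, 3000).clip(0, 1)  # this administration's group

# Total raw scores of the new group on its own form (out of 52 items, say).
new_group_raw = rng.normal(35, 7, 3000).clip(0, 52)
items_on_form = 52

# The new group did better on the identical anchor items, so it is judged to
# be stronger than the reference standard. Shift its raw-score distribution
# down by that gap to approximate what the broader target population would
# have scored on this form (a crude mean shift; real equating also adjusts
# the spread).
ability_gap = anchor_new_group.mean() - anchor_reference.mean()
estimated_target_raw = new_group_raw - ability_gap * items_on_form

def percentile_rank(raw_score):
    """Percentile rank of a raw score against the estimated target population."""
    return 100 * np.mean(estimated_target_raw <= raw_score)

# The same raw score lands at a higher percentile than it would against the
# unadjusted group, so nobody is penalized for testing alongside a strong group.
print(round(percentile_rank(35), 1))
```

The adjustment runs the opposite way if the anchor items show the group was weaker than the reference, which is why a Fall group full of less-prepared test-takers would not get an artificially generous curve either.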
 

Thanks! Interesting and definitely complicated. I'll try to wrap my head around this when I get a chance...
 