MCAT Scoring Theories: How Your Score Is Calculated

johnaddams

Joined
Aug 20, 2011
I was thinking about how the AAMC calculates your score. We all know it is based on how well you do compared to other test takers, which raises the question: if you write in January, will you get a better score? They say no, and I was wondering why, so here are a couple of theories on how your score is calculated.

1) Your score is based on preset numbers for the version of the test you write, just like the AAMC practice tests. The scaling is then readjusted to account for variation in the new group of test takers.

2) Your score is based on a live curve built from the people who write that day. (Problem: not enough students to give a good representation, given the many versions of the test.)

3) Combination of the two above: rescaling of the raw scores based on the current and previous test performances, then your raw score is converted to a scaled score.

The last one would explain why writing on any given day makes no difference. What are your thoughts, theories, comments?
 
it's based on pre-set curve. done. end of ur speculation.
 
why does it take them 30 days to score our essays? i always thought that our "curve" was set based off how the relative population does writing that particular test

😕
 
Nope, not at all. Then the exam would not be standardized, and it would be unfair from one administration to the next. For instance, one particular administration could have a lot of intelligent people and another a lot of less intelligent ones. If you set a curve around each date, it will be unfair.

The curve is pre-set to give as fair an exam as possible. As for why it takes 30 days to grade, nobody really knows. Maybe this is part of the enduring process of becoming a doctor =P
 
I don't know. I have to think there are some mitigating factors they take into account.

For example, have you ever had a class where they would drop 1 of say, 3 midterms? And then for the hardest midterm, half of the class wouldn't show up? And then that TOTALLY ruins the curve. Only the people who feel prepared show up, so you're only being graded against the prepared students (as opposed to normal tests, where you're graded against the entire class). Does that make sense? It's like if the people who would get the D's and F's don't show up, they aren't taken into account in the curve, and the people who would have otherwise gotten C's and B's become the new D's and F's.

LONG STORY SHORT. The MCAT has got to have days like that. For example, I took a particularly hard exam, and a lot of people in my test room left WAY early; clearly they gave up and voided. If an unusually high % of test takers void because a test is quite unlike the AAMC's or for whatever reason, I would hope it would somehow be taken into account. Does that make sense? I know that say, a 35, is always the 98th percentile. But I'd think there would be some kind of mitigating factor when say, a third of the students taking the exam void (as opposed to a normal 10%). Otherwise it's just not fair. On an easier test, even the people who are going to score badly might feel confident enough/feel that the test was fair and not void. On a ludicrously hard exam, you probably lose the bottom third of test takers, meaning that everyone's scores will be artificially deflated.

Sorry if my statistics are totally jacked up or something. But I definitely think they take some things like this into account when setting a curve...
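The worry in this post can be put into numbers. Here is a minimal sketch, assuming a hypothetical "live curve" that ranks you only against the people whose tests are actually scored that day. The cohort size, the raw scores, and the `percentile_rank` helper are all made up for illustration and are not AAMC data or methodology.

```python
def percentile_rank(score, cohort):
    """Percent of the cohort scoring strictly below `score`."""
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)

# Hypothetical cohort: 90 examinees with raw scores 1..90 (invented numbers).
full_cohort = list(range(1, 91))

# Suppose the lowest-scoring third (raw scores 1..30) voids on a hard form.
after_voids = [s for s in full_cohort if s > 30]

my_score = 60
live_full = percentile_rank(my_score, full_cohort)    # ranked against everyone
live_voided = percentile_rank(my_score, after_voids)  # ranked against non-voiders only

print(round(live_full, 1), round(live_voided, 1))  # 65.6 48.3
```

Under this assumed live curve, the same raw score drops from roughly the 66th to the 48th percentile once the bottom third disappears. A scale fixed before test day would not move, because it depends only on the raw score and not on who else sat the exam.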
 
almost the only thing you keep constant from test to test is the pool of test takers. u can assume every sample size is the same which means an equal number of "intelligent" test takers and "non-intelligent" ones.
 
1) Your score is based on preset numbers for the version of the test you write just like AAMC practices. The scaling is then readjusted to account for variation for a new group of test takers.

The first part of this is correct. The bolded, second part is bogus.

2) Your score is based on a live curve based on the people write that day. (Problem: not enough students to give a good representation given the many versions of the test.)

Nope.

3) Combination of the two above: rescaling of the raw scores based on the current and previous test performances, then your raw score is converted to a scaled score.

Nope.

why does it take them 30 days to score our essays? i always thought that our "curve" was set based off how the relative population does writing that particular test

😕

"It takes between 15 and 20 days to score all examinees' responses after each administration."
...
• Each essay is scored twice on a 6-point scale:
-- Human
-- Machine
• This results in four scores (2 scores for each essay)
• These scores are summed (4-24) and then converted to the alphabetic scale

[Source]
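The summing step in the quoted bullets can be sketched as follows. This is only an illustration of the arithmetic: the `essay_total` function name is hypothetical, and the conversion from the 4-24 total to the alphabetic scale is left out because the quoted text does not give that mapping.

```python
def essay_total(essay1, essay2):
    """Sum the human and machine scores for two essays.

    Each argument is a (human, machine) pair of integers on the 6-point scale.
    The result is the 4-24 total described in the quoted bullets.
    """
    scores = [*essay1, *essay2]
    if len(scores) != 4 or any(not 1 <= s <= 6 for s in scores):
        raise ValueError("expected two (human, machine) pairs of scores from 1 to 6")
    return sum(scores)

# Example: essay 1 scored 4 (human) and 5 (machine), essay 2 scored 3 and 4.
print(essay_total((4, 5), (3, 4)))  # 16
```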

Because the MCAT is a standardized test, your performance is being compared to the performance of previous MCAT test-takers. Your current cohort will have nothing to do with the scale. The scale is pre-set when you take the exam. When you hit that final "Submit", your score is pretty much calculated (except for the writing - even half of which is graded by a computer).

How this works (as with every standardized test) is that in every administration the AAMC will include some "experimental items" which will not count towards your score but will be used to collect data about the difficulty level of the item. Once enough data about an experimental item is gathered, the AAMC will include that item in a real MCAT as a scored question. Using the data collected for each item and statistical methods, the AAMC can pre-set the curve for each MCAT with very good accuracy, so that scores fall in a nice standard distribution with the appropriate percentile ranges. All major standardized tests use this approach (MCAT, SAT, LSAT, GMAT, GRE, etc.)

tl;dr: MCAT is pre-scaled. The scale for your MCAT is already set before you take it. Once you submit your answers for the MCAT, your MC score is pretty much set in stone. This is how standardized tests work. Yay.
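A pre-set scale of the kind described in this post can be pictured as a lookup table that exists before anyone sits the exam. The sketch below assumes a hypothetical 52-question section scored 1-15; the cutoff values in `RAW_CUTOFFS` are invented for illustration and are not real AAMC numbers.

```python
from bisect import bisect_right

# Hypothetical table for one test form, fixed in advance: RAW_CUTOFFS[i] is the
# minimum raw score needed for scaled score i + 1. All values are invented.
RAW_CUTOFFS = [0, 8, 14, 19, 24, 28, 32, 36, 39, 42, 44, 46, 48, 50, 52]
MAX_RAW = 52

def scaled_score(raw, cutoffs=RAW_CUTOFFS):
    """Convert a raw score to a scaled score using only the pre-set table."""
    if not 0 <= raw <= MAX_RAW:
        raise ValueError("raw score out of range")
    # Count how many cutoffs the raw score has reached.
    return bisect_right(cutoffs, raw)

print(scaled_score(40))  # 9
print(scaled_score(51))  # 14
```

Nothing in `scaled_score` looks at other examinees, which is the point being made: with a table like this, the result is determined as soon as the raw score is known.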
 
Can anyone provide a citation for any of the information being given here? Not to ruin the party, but I feel like sometimes people read stuff on a web forum from someone they believe to be credible, and it turns out that person did the same thing with someone else, and all of a sudden you have this huge list of inaccuracies building up over time... Not saying that is the case here. Just adding this to the discussion, I suppose...

I don't mean to spoil anything here, but could you please provide a citation on MCAT methodology that proves that the information you're giving is accurate?
 
Well then I'm ******ed. But it's definitely good to know that the scale is determined over all the tests, and not your test date. 👍
 
Why are there so many threads on this nonsense when there is no mystery to this because they tell you exactly what they do?
 
Because as long as people think they deserve a 36 and get a 28, they will continue to look for an excuse outside their own performance.
 
What's the purpose of not disclosing the number of questions you got correct?

I know when I got my SAT scores in the mail, the report would give my score and then the # of questions I got correct and the # I got incorrect. Why don't they do this for the MCAT as well?

Maybe it's because people would remember the questions, figure out which ones they got wrong, and then tell other people? Still not sure, but I feel like it could save people a lot of agony. (but lose the AAMC a lot of rescore $...)
 
Fairly obvious explanation: the presence of experimental items makes it impractical to release the # correct for each section.

The SAT is able to release that info because the SAT has a whole experimental section (which you won't get # correct for).
 
I've studied for and have taken the GRE. (CBT version) It's nothing like how the MCAT works.

There's no curve other than the fact that they label certain questions as "easy", "medium" and "hard."

You get one question at a time and depending on if you get it right or not, the difficulty of the next question varies. (i.e. they start with a medium question, you get it wrong, they will give you an easier question. If you get a medium question right, they will move you up to a harder question. They will keep doing this until you level out and land on a particular score in the 0-800 range) So EVERYONE has a different test if you think about it. Plus, you can schedule a GRE anytime you want. They pretty much have 5-6 days available on any given week. Thus, I doubt two GRE test takers have ever taken the same test as another test taker. It's just highly unlikely on the CBT version.

Also, no curve other than the fact (stated above) about how they label easy-medium-hard questions. I am sure they used experimental questions (like the SAT) and evaluated which one is easy or moderate or hard and then re-distributed them on future tests.

In my opinion, the MCAT is a unique test with its own method. Sure, the percentiles and statistical methods might be somewhat similar, but the GRE and SAT are totally different from the MCAT. There aren't too many tests that used to be ONE DAY long (pre-CBT, paper version), and no other test requires you to completely master 4-5 different subjects.
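The one-question-at-a-time procedure described in this post can be sketched roughly as below. This follows only the simplified easy/medium/hard description given here; the function names are hypothetical, and a real adaptive test would rely on a statistical model of item difficulty rather than three fixed labels. Scoring is not modeled at all.

```python
LEVELS = ["easy", "medium", "hard"]

def next_level(current, answered_correctly):
    """Move up one level after a right answer, down one after a wrong answer."""
    i = LEVELS.index(current)
    if answered_correctly:
        i = min(i + 1, len(LEVELS) - 1)  # stay at "hard" if already there
    else:
        i = max(i - 1, 0)                # stay at "easy" if already there
    return LEVELS[i]

def difficulty_path(answers, start="medium"):
    """Return the difficulty of each question served, given right/wrong answers."""
    level = start
    served = []
    for correct in answers:
        served.append(level)
        level = next_level(level, correct)
    return served

# Miss the first (medium) question, then answer the next three correctly.
print(difficulty_path([False, True, True, True]))
# ['medium', 'easy', 'medium', 'hard']
```

Two examinees with different answer patterns get different sequences, which is why the post says everyone effectively sees a different test.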
 
I don't know, that kind of sounds stupid..
Why would they want to do that? What is the advantage of doing that?
😕
 
I've studied for and have taken the GRE. (CBT version) It's nothing like how the MCAT works.

I wasn't trying to say that the GRE pre-scales like the MCAT. What I was trying to say is that the GRE uses experimental items to determine question difficulty, like all standardized tests do. Starting August 1, 2011, a new GRE ("GRE revised") started being administered. The scores for this administration will not be released until mid-November because the ETS (GRE test makers) will be collecting data about the questions during this initial testing period.

There's no curve other than the fact that they label certain questions as "easy", "medium" and "hard."

You get one question at a time and depending on if you get it right or not, the difficulty of the next question varies. (i.e. they start with a medium question, you get it wrong, they will give you an easier question. If you get a medium question right, they will move you up to a harder question. They will keep doing this until you level out and land on a particular score in the 0-800 range) So EVERYONE has a different test if you think about it. Plus, you can schedule a GRE anytime you want. They pretty much have 5-6 days available on any given week. Thus, I doubt two GRE test takers have ever taken the same test as another test taker. It's just highly unlikely on the CBT version.

There is a curve. The whole purpose of this "adaptive scaling" is for the GRE's statistical system to accurately place you in the appropriate percentile range by using questions of varying difficulty. This goal is the same for the MCAT. However, the MCAT probably chooses to pre-scale and administer the same test to everyone during a certain administration because it is hard to use adaptive scaling with passage-based questions, and for legacy reasons. Maybe you already know this, but the GRE Subject Tests are more like the MCAT in that they don't use adaptive scaling.

Just an interesting aside: If you're curious about GRE's adaptive scaling system, check out this pub: http://www1.ets.org/Media/Research/pdf/RR-93-07-Schaeffer.pdf (It's a report on the field test of the current GRE CBT with the specifics on the math and statistical models behind the scaling system they employ)

Also, the AAMC has been researching adaptive testing for the VR section.
 
Holy crap, 47 pages? No way I'm that curious..
Mathematical model and statistical analysis?? How could a study like that be interesting to anyone??😱
 
LOL. Don't ask me. I took it just to get into a master's program.

They changed it this year and it sounds like it's turning into more of a reasoning test (more graph analysis and interpretation, fewer random individual questions). They are even giving people a basic calculator to use during the test to minimize computations. (This probably means they want you to use your time thinking rather than on your scratch paper trying to multiply and add 3 or 4 numbers.)

I don't even really know how much GRE scores are considered when people apply for master's or PhD programs. It seems like it's more about recommendations, grades, personal statements and interviews. My program didn't have an interview, but I know for sure they weighed my personal statement really heavily over other things.
 