I think people here should read Stephen Jay Gould's "The Mismeasure of Man", if only for the new perspective it introduces. I am sure lots of people here are numbers-oriented, and in large measure numbers do have an important impact on life prospects: whether you get into a great undergrad, whether you get into a good grad program or med school, and whether you pass your licensing exams.
BUT let's consider a few things here:
1. Standardized exams DO NOT perfectly or directly measure what they purport to measure. Say we want to create a test to measure intelligence, or "IQ". In actuality there is no directly observable, quantifiable thing called intelligence. We sometimes say we know it when we see it, but we have no way to measure it directly. So what we do instead is find indicators of intelligence. By an indicator I mean some measurable, observable variable that correlates with the unmeasurable "latent" variable, here intelligence. But we never directly observe or measure the latent variable itself; we simply have a few measures that we think correlate with it (the first sketch after this list makes the idea concrete).
2. Standardizing an exam does not, by itself, make it a valid exam. As Gould points out, there are many ways to tinker with an exam to make it seem relevant. The mere fact that the distribution of scores follows a normal (Gaussian) distribution does not mean that whoever scores in the upper percentiles is the "best". Almost any exam can be analyzed statistically, but that does not mean the top scorers are better than the lower scorers. For one, the exam may not actually measure what we say it measures (as above); and two, even though we tend to feel the exam is a valid predictor of some inherent ability, there is a significant degree of randomness. A student may have a bad day; the exam may contain a particularly difficult passage; the exam room may be 100 degrees. Who is to say that a person who got two more questions wrong, and scored an 11 instead of a 12, is actually less capable than the person who received the 12? It is entirely plausible that one person got a great score because the passages were easier. Even if scores are standardized to set a so-called "curve", the randomness is still present. The 12 examinee may have drawn a passage he was really familiar with (say, a set of genetics questions when he works in a genetics lab), while the 11 examinee drew a passage on a topic he had studied less, or that simply was less typical. To give a real example, many people here have complained that several questions on the July 24 PS section were about angular momentum, a topic rarely, if ever, covered on practice exams. A few people may have gotten those questions right simply because they happened to be physics majors, or happened to have studied the topic in passing. My point is that there is so much randomness involved that, even though random error washes out across the large pool of scores (because it is assumed to be non-systematic), it does not wash out for any individual sitting the exam (the second sketch after this list gives a feel for how large that effect can be).
3. Random error that averages out across the population does not average out for the individual. Thousands of people take the MCAT, so random error washes out of the aggregate statistics, but each of us takes the exam only once or twice, so for us randomness remains a very important factor. This is one reason I feel that the average of multiple exam scores is more indicative of underlying ability: repeated sittings regress toward the examinee's own mean. But few of us take the exam enough times to come up with a truly reflective average, assuming the score reflects anything to begin with.
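To make the "indicator" idea from point 1 concrete, here is a minimal simulation. The indicator names, noise level, and sample size are all my own assumptions, not anything from Gould or the AAMC: an unobservable latent "ability" generates several observable indicators, each equal to the latent value plus independent noise. Every indicator correlates with the latent variable, but none of them *is* the latent variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 10_000

# The latent variable: never observed directly (units and scale are arbitrary).
ability = rng.normal(0.0, 1.0, n_people)

# Observable indicators: each is the latent value plus independent measurement noise.
noise_sd = 0.8
indicators = {
    name: ability + rng.normal(0.0, noise_sd, n_people)
    for name in ("exam_score", "gpa_proxy", "puzzle_task")  # hypothetical indicators
}

# Each indicator correlates with ability, but only imperfectly.
for name, values in indicators.items():
    r = np.corrcoef(ability, values)[0, 1]
    print(f"{name}: correlation with latent ability ~ {r:.2f}")
```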
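And here is a rough sketch of the randomness described in point 2. The section length and success probability below are made-up numbers, not the real MCAT specification: treat each question as a weighted coin flip whose probability reflects the examinee's ability plus how well the passages happen to suit them, and compare two examinees of identical ability.

```python
import numpy as np

rng = np.random.default_rng(1)
n_questions = 52    # hypothetical section length
p_correct = 0.85    # same "true" ability assumed for both examinees
n_trials = 100_000

# Simulate two equally able examinees sitting a section of this length many times.
scores_a = rng.binomial(n_questions, p_correct, n_trials)
scores_b = rng.binomial(n_questions, p_correct, n_trials)

# How often do they end up two or more questions apart, purely by chance?
gap = np.abs(scores_a - scores_b)
print(f"P(equally able examinees differ by >= 2 questions) ~ {np.mean(gap >= 2):.2f}")
```

Under these assumed numbers, a gap of a question or two between two equally able examinees is more likely than not.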
So in the end I feel that:
1. Standardized exams probably correlate only weakly with some underlying variable, whether that is intelligence or ability in a particular discipline.
2. Randomness cannot be ignored for any individual test-taker, even if it averages out at the population level by virtue of sheer numbers.
3. Multiple retakes produce an average score that tends toward a mean more indicative of the underlying variable. The random component works both ways: it can push a score down or up with near equal probability (see the sketch below).
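To illustrate the retake point, a quick sketch under purely illustrative assumptions (a "true" level of 11 and a symmetric random error of about one scaled point per sitting, both numbers mine): a single sitting is often a full point or more off, while the average of several sittings is far less likely to be.

```python
import numpy as np

rng = np.random.default_rng(2)
true_level = 11.0   # hypothetical "true" scaled score for one examinee
sitting_sd = 1.0    # assumed symmetric random error in any single sitting

def average_of_n_sittings(n, trials=100_000):
    """Simulate the average score over n sittings, many times over."""
    sittings = true_level + rng.normal(0.0, sitting_sd, size=(trials, n))
    return sittings.mean(axis=1)

# A single sitting is often a point or more off; averages of several sittings rarely are.
for n in (1, 2, 4, 8):
    avgs = average_of_n_sittings(n)
    off_by_a_point = np.mean(np.abs(avgs - true_level) >= 1.0)
    print(f"{n} sitting(s): P(average is >= 1 point off the true level) ~ {off_by_a_point:.2f}")
```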