Biostats question - First Aid - Error / equivocation? Maybe?


Phloston

P. 57 of FA2012, about 1/3 down on the page, says, "Mode is least affected by outliers in the sample."

I had encountered a practice question in Kaplan QBook last week that asked about which variable would be least affected by outliers, and I put median, and that was correct. I had annotated that into FA just for reinforcement.

I'm just now realizing that FA says mode, not median, for that.

The FA2012 errata does not touch upon this.

By my reasoning, the middle value (median) in a sample is completely unaffected by outliers (if there is an odd number of values).

Any thoughts?
 
Mode is the most common value seen in a sample (as I'm sure you know), so having an outlier or two won't change that at all. However, the median will definitely be changed by adding in outliers, even if only slightly. Basically, it's no change (mode) vs small change (median).

I see what you're saying though: if you have 9 numbers with a median of, say, 50, changing the highest 2 numbers to extreme outliers wouldn't affect that.

I think FA is talking about going from a normal distribution to, say, a positive skew due to outliers; the mode would still be the same.

Hmmm...if this comes up on boards, I guess you'll just have to think about exactly what the question is asking about. Regardless of adding in vs substituting in outliers, the mode will not change...
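To make the adding vs substituting distinction concrete, here is a minimal Python sketch. The 9 values are invented purely for illustration (the only thing taken from the post is "9 numbers with a median of 50"), and it assumes the two outliers are different numbers.

```python
from statistics import median, mode

# Hypothetical 9-value sample with median 50 (values invented for illustration)
sample = [10, 20, 30, 30, 50, 60, 70, 80, 90]

# Substituting: replace the two highest values with extreme outliers
substituted = sample[:-2] + [1000, 2000]

# Adding: keep all 9 values and append the same two outliers
added = sample + [1000, 2000]

for label, data in (("original", sample),
                    ("substituted", substituted),
                    ("added", added)):
    print(f"{label}: median={median(data)}, mode={mode(data)}")
```

With these made-up numbers, substituting leaves the median at 50, adding shifts it to 60, and the mode stays at 30 either way.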
 
Honestly, they both seem more or less equally plausible to me. If I had to choose one, it would be mode, I guess, but neither would be affected at all unless you a) shifted a data point across the median (would change the median) or b) shifted data points away from the most frequent value (would change the mode).

If they're talking about skew, it's definitely mode. With outliers, I don't really see any difference between the two.
 
Just going by the definitions of each, I would go with mode regardless of the situation. I would assume that there was an error with the Kaplan question unless the answer choices did not include mode.

A sample number set for calculations:

1, 2, 2, 3, 4, 5, 6, 6, 6

Your mode is 6 as the number that appears most often in the data set.
Your median is 4 as the number that splits the data set into an upper and lower half.
And your mean is 3.889 (for completeness' sake).

Now you add an outlier to the data set, 100.

1, 2, 2, 3, 4, 5, 6, 6, 6, 100

Your mode is still 6 (stays the same), your median moves up to 4.5 as calculated by the average of the two middle values - 4 and 5 (increases slightly), and your mean is 13.5 (increases the most).

An outlier can change the median slightly but should never change the mode.
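If anyone wants to check the arithmetic, the numbers above can be reproduced with a short Python sketch using the standard library's `statistics` module. The variable names are just for illustration.

```python
from statistics import mean, median, mode

# The sample set from this post, before and after adding the outlier 100
base = [1, 2, 2, 3, 4, 5, 6, 6, 6]
with_outlier = base + [100]

for label, data in (("original", base), ("with 100 added", with_outlier)):
    # mode: most frequent value; median: middle value (or average of the
    # two middle values for an even count); mean: arithmetic average
    print(f"{label}: mode={mode(data)}, "
          f"median={median(data)}, mean={mean(data):.3f}")
```

This only covers the case of a single added outlier; several identical outliers could behave differently.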

Hope that was helpful.
 
The Kaplan question was worded along these lines: it discussed a data set and mentioned that a few of the values were outliers. Then it asked which variable would be least biased with respect to them.

I chose median because it only depends on itself (if an odd # of data points) or on the two central values (if an even # of data points).

My impression was that, based on the question alone, it wasn't possible to deduce whether the outlier values were actually also the mode, so I went with median (e.g. the values could have been 1 2 3 4 5 6 7 8...20 1500 1500, so although the latter two are "outliers," they're still the mode).
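A rough Python sketch of that scenario, assuming the "1 2 3 4 5 6 7 8...20" stands for every integer from 1 to 20 (that reading is an assumption, not something the question stated). It uses `statistics.multimode`, which needs Python 3.8 or later, so that ties show up explicitly.

```python
from statistics import median, multimode

# Assumption: the elided sequence is the integers 1 through 20
typical = list(range(1, 21))
with_outliers = typical + [1500, 1500]

# multimode returns every value tied for most frequent
print("modes without outliers:", multimode(typical))
print("modes with outliers:   ", multimode(with_outliers))

# The median only moves by one position's worth
print("median without outliers:", median(typical))
print("median with outliers:   ", median(with_outliers))
```

Under that assumption the repeated 1500 becomes the only mode, while the median moves from 10.5 to 11.5.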

I understand that it's based on the specific question, where adding an outlier would change the median, but wouldn't necessarily change the mode, but I just wanted clarification. Thanks again,
 
It is certainly possible that your outliers could also be the mode, but it is highly unlikely, because they are, in fact, outliers...which by definition are far removed from the majority of the other values. This is more difficult to illustrate with our little samples of 7 or 8 numbers, but if you had a data set with 100 or 1000 data points, it is not hard to see why the 1 or 2 outliers stand little to no chance of being the mode.

Just out of curiosity, does anyone happen to know the definition of an outlier? 2 standard deviations above the mean? 3 SDs above the mean? I know there is an actual statistical way to determine if a value is an outlier, but I can't remember from undergrad how you determine it.
 
Right. If both of your outliers are the same value, that could be the new mode, but in a large sample the odds of that would be fairly small. If you are only adding a single outlier, as I was referencing, then the mode should not change. If you're adding multiple, anything goes... so you would have to rely on context and any clues they give about whether the outliers are the same number, how many values share the mode, how they compare, etc. Either way, I don't think that First Aid is wrong.
 