Median Split


clin2012orbust

I am working on revisions for a paper on the impact of motivational interviewing vs. standard care on whether patients on an inpatient ward keep their first outpatient appointment. It's actually a secondary data paper, and it's a paper that my professor wrote but basically doesn't have time to refine and publish, so he's giving it to me. Moral of the story: I had little to do with the initial analyses and the initial study, and I'd like to not have to dig up SPSS files from 10 years ago if I don't have to.

Without going into lots of extra details, we looked at a number of factors in a logistic regression. He dichotomized a few variables using the median split method.

The reviewer was cool with that, except that we dichotomized adherence to the CBT outpatient group using the median split, and it happened that the median was right around 55%. The reviewer says that he recognizes we used a median split, but that the cutoff for adherence in this context would be somewhere near 65 or 70%.

Is there any type of defense I can use for how I used the median split? One way I was thinking of framing it is that if we defined high and low adherence as something like >70% and <30%, we would lose a lot of subjects, and that would limit my statistical power, correct?

Any help appreciated!!!
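
To make the three cut strategies in the question concrete, here is a minimal Python sketch. The adherence values are made up for illustration (they are not the study data), and the helper names `split_at` and `extreme_groups` are hypothetical. It only counts how many subjects land in each group under a median split, a fixed 70% cutoff, and a <30% / >70% extreme-groups split.

```python
import statistics

# Hypothetical adherence percentages, invented for illustration only.
adherence = [5, 20, 28, 40, 48, 52, 58, 62, 66, 72, 85, 95]

def split_at(values, cutoff):
    """Return (low, high); values strictly above the cutoff count as high."""
    low = [v for v in values if v <= cutoff]
    high = [v for v in values if v > cutoff]
    return low, high

def extreme_groups(values, low_cut, high_cut):
    """Return (low, high, dropped); the middle of the distribution is dropped."""
    low = [v for v in values if v < low_cut]
    high = [v for v in values if v > high_cut]
    dropped = [v for v in values if low_cut <= v <= high_cut]
    return low, high, dropped

median_cut = statistics.median(adherence)            # data-driven cutoff
med_low, med_high = split_at(adherence, median_cut)  # median split keeps everyone
fix_low, fix_high = split_at(adherence, 70)          # reviewer-style fixed cutoff
ext_low, ext_high, ext_dropped = extreme_groups(adherence, 30, 70)

print("median cutoff:", median_cut)
print("median split  low/high:", len(med_low), len(med_high))
print("70% cutoff    low/high:", len(fix_low), len(fix_high))
print("extreme groups low/high/dropped:",
      len(ext_low), len(ext_high), len(ext_dropped))
```

With these invented numbers, both the median split and the 70% cutoff keep every subject and only change group sizes, while the extreme-groups split discards the middle of the sample.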
 
Does he want an explanation of your rationale included in the paper or is he asking that you present the data in a different way? [In my limited knowledge in this area]...I think addressing statistical power is worthwhile, though he still may want to see it tried in a different way to see if there is a "better" way to present the data.
 
I encountered a similar problem of wanting to do a median split, but having that split not work out because of the skewed data distribution. Likewise, the less-than/greater-than split didn't work, because our greater-than sample was very small. We ended up finding a way to analyze continuously after talking with the statistician, and were told that if that's an option, there really aren't many cases where you're justified in using a discontinuous method.

At any rate, just my measly second hand 2c that I can pass along. Don't have much else to offer.
 
Am I wrong, or isn't the problem with dichotomous splitting when you do something like <25% and >75% and drop out the middle? Here, I am not doing that. I included everyone. I guess they are just saying that we should not have used the median, but should have made the cut between low and high adherence at 70%. We split at the median, which was in the high 50s. I haven't done clinical work yet, but I think having dually diagnosed people come to over half of their outpatient treatment is adherent!!
 
Ahhh, I see. No, splitting is generally not a good idea regardless of how it is done, though there are exceptions (e.g. you have very obviously bimodal data, other weird distribution problems that are not easily corrected, or a theoretical case that can be made for taxa). However, I haven't heard anything that implies that this is one of those exceptions. The reviewer is making an argument that you should be splitting on theoretical grounds rather than at the median. I suppose it's as valid as doing a split in the first place, though I have to say I'm a little confused why it is being analyzed that way at all.

Either way, I suspect you'll have a tough time getting this published without doing what the reviewer wants. You could perhaps argue that the estimate of 65-70% was done with a different population with higher adherence in general (if true), though I suspect even then you would probably want to analyze it anyways and include it in your response to the review.

Truth be told, I would just treat it continuously. I can understand doing a median split to fit something in as a factor in an ANOVA model, but if you are running a logistic regression I'm a little confused what the purpose of dichotomizing is. Think of it this way...you are throwing out information no matter what if you are categorizing things from continuous data. Does it make sense to treat the difference between someone with 54% adherence and someone with 56% adherence the same as the difference between 10% and 90%? Again, sometimes this can be justified, but you need to have a reasonable argument, and not wanting to dig up the SPSS file is probably not going to placate the reviewer 😉
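
The 54% vs. 56% point can be shown with a tiny Python sketch. The four adherence values are hypothetical, chosen only to mirror the numbers in the post, and `median_split` is an illustrative helper, not anything from the original analysis.

```python
import statistics

def median_split(values):
    """Code each value 1 if it is above the sample median, else 0."""
    cut = statistics.median(values)
    return [1 if v > cut else 0 for v in values]

# Hypothetical adherence percentages mirroring the example in the post.
adherence = [10, 54, 56, 90]
codes = dict(zip(adherence, median_split(adherence)))

print(codes)  # {10: 0, 54: 0, 56: 1, 90: 1}

# After the split, 54 vs 56 looks exactly as different as 10 vs 90,
# and 10 vs 54 looks like no difference at all.
print(codes[56] - codes[54], codes[90] - codes[10], codes[54] - codes[10])
```

Once the variable is coded 0/1, the model has no way to tell the 2-point gap from the 80-point gap, which is the information being thrown out.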

As for the power question...that's complicated. Dropping subjects would generally reduce power, but only comparing the two extremes could artificially inflate it as well. Again, unless you have a specific reason to dichotomize, you are almost always better off treating it as continuous.
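
A rough simulation can illustrate both halves of that power point. This is a sketch under loud assumptions: the predictor is simulated as standard normal, the outcome is continuous rather than the binary appointment-kept outcome in the actual study, the slope of 0.3 is arbitrary, and a plain Pearson correlation stands in for effect size. None of it comes from the study data.

```python
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50000
x = [rng.gauss(0, 1) for _ in range(n)]       # simulated continuous predictor
y = [0.3 * xi + rng.gauss(0, 1) for xi in x]  # simulated outcome (assumed slope 0.3)

# 1) Predictor kept continuous.
r_full = pearson(x, y)

# 2) Median split: everyone kept, predictor reduced to 0/1.
cut = statistics.median(x)
r_split = pearson([1.0 if xi > cut else 0.0 for xi in x], y)

# 3) Extreme groups: bottom vs top quartile only, middle half dropped.
q1, _, q3 = statistics.quantiles(x, n=4)
extreme = [(1.0 if xi > q3 else 0.0, yi)
           for xi, yi in zip(x, y) if xi < q1 or xi > q3]
r_extreme = pearson([g for g, _ in extreme], [yi for _, yi in extreme])
n_extreme = len(extreme)

print(f"continuous:     r = {r_full:.3f}, n = {n}")
print(f"median split:   r = {r_split:.3f}, n = {n}")
print(f"extreme groups: r = {r_extreme:.3f}, n = {n_extreme}")
```

Under these assumptions the median split shrinks the observed correlation relative to the continuous predictor, while the extreme-groups comparison uses about half the sample yet shows a larger correlation than the full data, which is the kind of artificial inflation mentioned above.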
 

Yup.
 