Is SPSS really limited...


yeti2213

or am I missing something.

I am using SPSS in full seriousness now and wondering if I just don't know what to do with it. In particular, the data manipulation seems terribly limited. Coming from a database programming environment, I want to do things like run an analysis only on a distinct set of values based on, say, just 10 of the 180 items in my dataset... but there does not seem to be a simple way to do that. I have started using R because it seems much more powerful in this realm... am I missing something... should I go back and give SPSS another chance?
 
SPSS is actually far more "cutting edge" than it used to be (even just a few years ago they hadn't implemented some major statistical techniques), but you are right it is somewhat limited. I keep intending to sit down and learn R and just haven't had the chance yet (have a programming background so it won't take that long, but hard to prioritize something like that).

That said, I'm not entirely certain what you are trying to do but if you are referring to a subsample you just need to use select cases. If you are referring to specific variables I'm not entirely certain what issue you could be encountering...there is a way to get it to iterate through a series of analyses (i.e. run the same model with 10 different DVs by specifying the range) but I've never done it.

That said, the main hurdle for most people is that things like SAS/R are "scary" because they require syntax and offer far more options. If you are comfortable with those, there are likely few (if any) advantages to SPSS beyond it being commonplace: if you do need help with something you are more likely to find it, many datasets you will encounter will be in SPSS format (not that exporting/importing to other software is a big deal), etc.
 
I keep intending to sit down and learn R and just haven't had the chance yet (have a programming background so it won't take that long, but hard to prioritize something like that).

Yeah, it's not that bad. It's a bit of a pain to figure out how to set up your environment properly and understand all the objects you need to do something. But once you get that, it's not bad at all.

That said, I'm not entirely certain what you are trying to do but if you are referring to a subsample you just need to use select cases. If you are referring to specific variables I'm not entirely certain what issue you could be encountering...there is a way to get it to iterate through a series of analyses (i.e. run the same model with 10 different DVs by specifying the range) but I've never done it.

So here is the problem I was trying to solve. Gonna construct a hypo for you. Imagine the data set below

Client || Session || Had Icecream || Enjoyed It (has value if Yes to last q)
1 || 1 || Y || Y
1 || 2 || N ||
1 || 3 || N ||
1 || 4 || Y || N
2 || 1 || N ||
2 || 2 || N ||
2 || 3 || N ||
2 || 4 || N ||

So I need a couple of results. First, I need a list showing whether each client had ice cream at least once in the course of their therapy. It would look like
Client || Had Icecream
1 || Y
2 || N

After that it gets a little more complicated. In actuality, some clients were encouraged to eat ice cream and others were not explicitly told to do so. So we also want to see, for each therapist, how many clients who were encouraged to eat ice cream did, and how many who were not encouraged still did. We would also want to see if there were differences in pre- and post-therapy measures for clients based on whether they had ice cream or not, etc.

I think you see the problem I am having. Each case is a session, but the analysis I need to do treats the entire course of therapy as the "case". Now this is easy enough to do once I manipulate the data with SQL. In pseudocode I can do something like this

Create a variable HadIceCreamOverWholeCourse
If had ice-cream even once set HadIceCreamOverWholeCourse = 1
select a distinct set of Client, HadIceCreamOverWholeCourse

Then I would just run my analysis on the distinct set I selected. I can do this in R with minimal difficulty. But I can't do it in SPSS without manual manipulation of data. I can't figure out if that is because I am conceptualizing the problem like a database programmer and the SPSS solution would have different steps, or if it's because SPSS is simply not suited to this sort of problem. Your thoughts are appreciated.
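For anyone who wants the pseudocode above in runnable form, here is a minimal sketch in plain Python (not SQL, R, or SPSS syntax). The rows mirror the example table; the variable names are just illustrative assumptions.

```python
# Illustrative rows mirroring the example table: (client, session, had_icecream).
sessions = [
    (1, 1, "Y"), (1, 2, "N"), (1, 3, "N"), (1, 4, "Y"),
    (2, 1, "N"), (2, 2, "N"), (2, 3, "N"), (2, 4, "N"),
]

# Build one entry per client: True if any of that client's sessions has "Y".
had_icecream_over_whole_course = {}
for client, _session, had in sessions:
    previous = had_icecream_over_whole_course.get(client, False)
    had_icecream_over_whole_course[client] = previous or had == "Y"

# The "distinct set": one line per client rather than one per session.
for client, flag in sorted(had_icecream_over_whole_course.items()):
    print(client, "Y" if flag else "N")
```

The point is only that the unit of analysis changes from session to client; any client-level analysis would then run on the collapsed result.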
 
Oh...if I understand the problem correctly, that is easily doable in SPSS. It does operate a bit differently from something like SQL, so you may need to learn how to approach problems in SPSS a bit differently (not just a new language but a shift in thinking that needs to occur, like going from an object-oriented language to a procedural one). I'm a "functional" programmer (i.e. little formal training but great at coming up with weird inelegant solutions that achieve what I need them to) so there is probably a prettier way to achieve the same thing, but with things like this there is almost always a way to do it - it just takes a bit of creativity at times. SPSS seems like it was really designed with the assumption that you'd have your final dataset that you need to analyze (with very minimal manipulation left). It's a far cry from something like an Oracle DB, but that's not really what it's intended for.

As I'm sure you are aware coming from a database background, there are always a million different ways to solve any given problem, so keep that in mind too - I'm just going to give the first thing that sprang to mind. I assume Yes/No is dummy coded, so I would just use an aggregate function to get the mean of had ice cream for each individual and then do a simple recode so that 0 = 0 and >0 = 1, so you then have "ever ice cream" as a separate variable and your data is still in long format. Then you could either restructure into horizontal format to take the frequency, use select cases to choose the first session only (assuming everyone had one), or however else you wanted to do it. Truthfully, a simple restructure would likely solve many of your problems. Depending on the analysis you plan to do you may need to do it eventually anyways (i.e. if you left it in that format and ran a simple regression or ANOVA, your results would be invalid, though you could do something like HLM/GEE).
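To make the aggregate-then-recode idea concrete, here is a rough sketch in plain Python rather than SPSS syntax. It assumes had ice cream is dummy coded 0/1, and the names (rows, ever_icecream, and so on) are made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Long-format data, one row per session, with had_icecream dummy coded (1 = yes).
rows = [
    {"client": 1, "session": 1, "had_icecream": 1},
    {"client": 1, "session": 2, "had_icecream": 0},
    {"client": 1, "session": 3, "had_icecream": 0},
    {"client": 1, "session": 4, "had_icecream": 1},
    {"client": 2, "session": 1, "had_icecream": 0},
    {"client": 2, "session": 2, "had_icecream": 0},
    {"client": 2, "session": 3, "had_icecream": 0},
    {"client": 2, "session": 4, "had_icecream": 0},
]

# Step 1 (aggregate): mean of had_icecream for each client.
values_by_client = defaultdict(list)
for row in rows:
    values_by_client[row["client"]].append(row["had_icecream"])
client_mean = {client: mean(values) for client, values in values_by_client.items()}

# Step 2 (recode): 0 stays 0, anything above 0 becomes 1.
# The result is written onto every row, so the data stays in long format.
for row in rows:
    row["ever_icecream"] = 1 if client_mean[row["client"]] > 0 else 0

# Step 3 (select cases): keep session 1 only, then take the frequency.
first_sessions = [row for row in rows if row["session"] == 1]
frequency = Counter(row["ever_icecream"] for row in first_sessions)
print(frequency)
```

Selecting session 1 only works as a one-row-per-client filter if every client actually has a session 1, which is the same assumption noted above.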

That's just one way to do it. You could also restructure into horizontal format from the get go and use compute statements to create that variable (similar to what you wrote in SQL). It sounds like the main issue you were having was getting SPSS to recognize that even though they are on different lines they are all one subject, and this is an ongoing issue I've run into as well. Truthfully I think at least in this particular case they made the right call setting it up that way - it's irritating from a database management perspective, but from an analysis standpoint it makes sense to assume independence unless otherwise specified. As I note above there are prettier ways of handling this, I just haven't had any need to figure them out yet given I could do the above inside of a minute with a GUI and less if I've got syntax handy.
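The restructure-first route could look roughly like this. Again it is a plain Python sketch of the idea, not SPSS syntax, and the column names (had_icecream_1 and so on) are invented for the example.

```python
# Long format: (client, session, had_icecream), dummy coded 0/1.
rows = [
    (1, 1, 1), (1, 2, 0), (1, 3, 0), (1, 4, 1),
    (2, 1, 0), (2, 2, 0), (2, 3, 0), (2, 4, 0),
]

# Restructure to horizontal (wide) format: one record per client,
# one column per session.
wide = {}
for client, session, had in rows:
    wide.setdefault(client, {})[f"had_icecream_{session}"] = had

# Compute the new variable across the session columns of each record.
for record in wide.values():
    session_values = list(record.values())
    record["ever_icecream"] = int(any(session_values))

for client, record in sorted(wide.items()):
    print(client, record)
```

Once each client is a single record, the data view matches the unit of analysis, so a plain frequency or a client-level test can run without any further selection.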

Hope that helps. If I totally misunderstood what you were trying to do or what the issue was, let me know. I don't get to flex my programmer muscles often anymore (not that this is programming...but that style of thinking) so I actually get a kick out of solving issues like this.
 
As I'm sure you are aware coming from a database background, there are always a million different ways to solve any given problem, so keep that in mind too - I'm just going to give the first thing that sprang to mind. I assume Yes/No is dummy coded, so I would just use an aggregate function to get the mean of had ice cream for each individual and then do a simple recode so that 0 = 0 and >0 = 1, so you then have "ever ice cream" as a separate variable and your data is still in long format. Then you could either restructure into horizontal format to take the frequency, use select cases to choose the first session only (assuming everyone had one), or however else you wanted to do it.

Yeah. That is pretty much how I did it in SPSS. Created aggregates, then chose just session 1 and ran a frequency on that. But what a hacky solution! Whichever entry-level programmer on my team suggested that would have gotten a long talk on the importance of readability of the solution, i.e. the person looking at your code should be easily able to understand your intention.

Maybe I just have to get out of thinking like a team lead in addition to a db programmer 🙂

Truthfully, a simple restructure would likely solve many of your problems. Depending on the analysis you plan to do you may need to do it eventually anyways (i.e. if you left it in that format and ran a simple regression or ANOVA, your results would be invalid, though you could do something like HLM/GEE).

That's just one way to do it. You could also restructure into horizontal format from the get go and use compute statements to create that variable (similar to what you wrote in SQL). It sounds like the main issue you were having was getting SPSS to recognize that even though they are on different lines they are all one subject, and this is an ongoing issue I've run into as well.

Interesting. I am realizing I don't understand the power of the restructure as yet. In R they have two functions, melt (makes a row for each variable of each case) and cast (makes a case from a number of rows that share a case ID), that people keep talking about for problems like this. This is going to be my next task, to understand how these restructure ideas can be exploited. If you have a good resource please share.
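As I currently understand the melt/cast idea, it can be sketched in plain Python like this. The two helpers below are hypothetical stand-ins written for illustration; they are not the R functions and do not reproduce their signatures or options.

```python
def melt(records, id_vars):
    """Turn each wide record into one row per (case, variable) pair."""
    long_rows = []
    for record in records:
        ids = {key: record[key] for key in id_vars}
        for name, value in record.items():
            if name not in id_vars:
                long_rows.append({**ids, "variable": name, "value": value})
    return long_rows


def cast(long_rows, id_vars):
    """Rebuild one record per case from rows that share the same ID values."""
    wide = {}
    for row in long_rows:
        key = tuple(row[name] for name in id_vars)
        record = wide.setdefault(key, {name: row[name] for name in id_vars})
        record[row["variable"]] = row["value"]
    return list(wide.values())


# Two sessions taken from the example table (client 1, sessions 1 and 4).
records = [
    {"client": 1, "session": 1, "had_icecream": "Y", "enjoyed_it": "Y"},
    {"client": 1, "session": 4, "had_icecream": "Y", "enjoyed_it": "N"},
]

molten = melt(records, id_vars=("client", "session"))
restored = cast(molten, id_vars=("client", "session"))
print(len(molten), restored == records)
```

Casting on a different set of ID variables (say, client alone, with session folded into the variable name) is what would give the one-row-per-client layout discussed in this thread.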

Truthfully I think at least in this particular case they made the right call setting it up that way - it's irritating from a database management perspective, but from an analysis standpoint it makes sense to assume independence unless otherwise specified.

I agree. This is totally how the data should be stored and collected. How it should be structured for analysis though...that is another question. One of the beauties of memory becoming so cheap is that now we think about saving several copies of the same data in different organizational schemes. One format might be optimized for data collection and storage. Another might be optimized for users to report on. I suspect research data would have similar multi-use characteristics.
 
No real resource suggestions, but the best way to learn SPSS basics that I have found is just to use the context menus and then hit "paste" to generate the syntax. Once you figure it out the first time, problems like the one you describe can be done quickly and easily. I don't recall ever formally "learning" how to do that, just kinda played with it until I figured it out.

It's possible I'm not understanding your real issue with this...why shouldn't it be structured like this for analysis? From a statistical standpoint, this seems the most logical way to do it - your data must be structured according to the analysis you want to run and it maps onto the underlying statistical theory and mathematics pretty neatly. I'm unaware of any stats software (SAS, R, S-Plus) that doesn't operate in this way. I find it very helpful for my data view to visually reflect my analyses - if I want each subject to equal a case, I can make it like that and tell at a glance. For many analyses, I need the data to look like you had it originally and to treat each case separately, and any kind of embedded option "linking" subject IDs would have significant potential to severely mess up analyses in ways that might not always be obvious in the output. How would you want it structured? If I understand how you would set it up, I might either agree and/or be able to provide a rationale for why it's done the way it is. SPSS certainly has some (major) flaws when it comes to certain aspects of data manipulation and analysis, but the issues we seem to be talking about are not something I've ever considered one of them. I want my data to visually represent the analyses I'm conducting, and would vastly prefer to do a quick restructure or case select when I want to conduct analyses like you mentioned rather than have some kind of embedded option allowing me to identify subject IDs as "cases" while retaining the stacked format (just a guess - but I think this is how you would prefer it?).

As for having multiple copies in different formats...this is what I usually do for the final database (albeit we're talking the poor man's version of just having two separate saved files so the fact that they aren't linked obviously has enormous drawbacks).
 