Statistics software

pointodr
Full Member, 7+ Year Member
Joined: Jan 2, 2015 | Messages: 48 | Reaction score: 1
Hi all

I have some experience with statistics and want to brush up and become proficient in one or more useful languages or software packages. Which ones do you all find are used most frequently in basic science and medical research?

Thanks

 
I always use SPSS (my school lets us get free subscriptions), but I've heard of people using R instead.
 
The ones I've seen used most are SAS and Stata. That being said, R is free and seems to be the direction most younger people are moving in.

Sent from my SAMSUNG-SM-G920A using SDN mobile
 
I learned R, and it's handy having a free program that is so powerful. It also has a vast amount of online support and is updated frequently. That being said, IMO it's a pretty tough system to pick up without some computer background. I now use STATA and personally prefer it. I think the way the do-files are set up is slick, and the language feels more intuitive to me. SAS is similar, but I don't have much experience with it.


Sent from my iPhone using SDN mobile app
 
RealStats for Excel is very easy to use. I've used it for many manuscripts; it's pretty robust for a free program. So is Minitab.


Sent from my iPhone using SDN mobile
 

So just to clarify, what are the pros and cons of using R vs Stata vs SAS? I have used R and Stata in research and classwork and don't have much of a preference for either. Is R generally preferred because it's free and regularly updated?
 
Some people seem to think R is a bit more user-friendly than SAS. I don't know all of the capabilities of SAS, but it also seems like you can do some other non-stats stuff with it. I was trained in SAS and am just getting started learning R, but I like it quite a bit. No experience with STATA.

I also like that I can easily run R on my Mac, whereas SAS required me to set up a virtual machine on my Mac to run Windows.

In my grad program (epidemiology), people seem to think R ≥ STATA > SAS.

I used SPSS a bit in undergrad. Thought it was ok.
 
Just as an aside, Coursera has free online training for at least R, from Johns Hopkins and a few other places. I think I've also seen free options for learning SAS and STATA online.
 
IMO they will all be more than sufficient for what a med student researcher needs to do, so it's probably best to just work with the one you prefer.

The place I think SAS really shines over Stata is cleaning and manipulating data. You can have numerous large datasets loaded at once, and it's very easy to play around with them until you build the set of variables you want. For something like Medicare data, where numerous different files contain various pieces of data for the same patients or physicians, it makes it pretty simple to merge them together. In Stata you can only have one open at a time. The SAS language and formatting are kind of picky, but not that hard once you get the hang of it. The other thing SAS has going for it is that the previous generation of epi/biostats people were all trained to work with it, so it's a little easier to communicate with them if you understand SAS commands and output.
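
For comparison, here is a minimal sketch of the same kind of multi-file merge in Python with pandas (one of the options suggested later in this thread), not SAS itself. The three tables, their column names (`patient_id`, `age`, `inpatient_days`, `visit_cost`), and the values are hypothetical stand-ins for separate Medicare-style files, not real data.

```python
from functools import reduce

import pandas as pd

# Hypothetical toy tables standing in for separate files that share a
# patient identifier (the column name "patient_id" is an assumption).
demographics = pd.DataFrame({"patient_id": [1, 2, 3], "age": [71, 80, 66]})
inpatient = pd.DataFrame({"patient_id": [1, 3], "inpatient_days": [4, 9]})
physician = pd.DataFrame(
    {"patient_id": [1, 2, 2], "visit_cost": [120.0, 85.0, 40.0]}
)

# The physician file has several rows per patient, so total it first
# to get one row per patient before joining.
physician_totals = physician.groupby("patient_id", as_index=False)["visit_cost"].sum()

# Left-join each table onto demographics so patients without claims are kept
# (their missing values become NaN).
tables = [demographics, inpatient, physician_totals]
merged = reduce(
    lambda left, right: pd.merge(left, right, on="patient_id", how="left"),
    tables,
)
print(merged)
```

Totaling the physician file before the join keeps the result at one row per patient; joining the raw file directly would duplicate patient 2.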

Stata is a little more user-friendly IMO. The interface is easy to navigate, and the language is more straightforward (at least to me). It also does some things very easily that I think are a pain in the ass in SAS (collapsing across observations, destringing variables). Neither SAS nor Stata makes beautiful base graphics, but they can look fine with some work (Stata IMO is better in this regard).
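
As a rough illustration of the two operations mentioned above, here is a sketch of their approximate pandas equivalents: `pd.to_numeric(..., errors="coerce")` for destringing and `groupby(...).agg(...)` for collapsing. The lab table and its column names are hypothetical, and the mapping to Stata's `destring` and `collapse` is loose, not one-to-one.

```python
import pandas as pd

# Hypothetical lab results where the value column was read in as text.
labs = pd.DataFrame(
    {
        "patient_id": [1, 1, 2, 2, 2],
        "hba1c": ["6.1", "6.5", "7.2", "n/a", "7.8"],
    }
)

# Roughly like Stata's `destring hba1c, replace force`: convert text to numbers,
# turning non-numeric entries such as "n/a" into missing (NaN).
labs["hba1c"] = pd.to_numeric(labs["hba1c"], errors="coerce")

# Roughly like `collapse (mean) hba1c (count) ..., by(patient_id)`:
# one row per patient; mean and count both skip NaN.
per_patient = labs.groupby("patient_id", as_index=False).agg(
    mean_hba1c=("hba1c", "mean"),
    n_results=("hba1c", "count"),
)
print(per_patient)
```

Without `force`, Stata's `destring` will not convert a variable that contains text like "n/a", which is why `errors="coerce"` is the closer analogue here.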

I've used R much less than the other two, so I can't comment much on its pros and cons. New statistical methods show up in R before the others, since users can write their own programs. It also makes nice figures, and as mentioned before, it's becoming much more common to learn, so in the future it may be as useful to know as SAS is now. I believe most bioinformatics people now use R exclusively (via the Bioconductor project).

 
R and Python seem to be the most flexible free solutions thanks to their packages, and that seems to be the way the world is going. There is also a free edX course on Python, R, and machine learning/data analysis.
 
If you don't have 500k-plus rows of data and it's all in one file, Excel is the right answer most of the time.
 
Yeah, edX and Coursera are really picking up a bunch of free courses in bioinformatics-type stuff, stats, and data science, in addition to courses on various programming languages. Pretty cool.
 
I completed the R one on Coursera and it was OK. They lock the quizzes if you don't pay, but honestly, just going through the R tutorial package is good practice for basic operations. I am slogging through the data science/machine learning course in Python after completing the Python basics edX MOOC. The edX stuff seems better to me, and there is no content lockout.
 
Thanks for the info. I was going to do one of those as soon as I finish grad school. I've noticed edX seems to have a wider/better selection available.
 
The best one is R, because (a) it's free and (b) it's what everyone uses for bioinformatics.

Edit: in R, install the ggplot2 package as well. It's used for a lot of the pretty graphs you see, so it's great to learn early.
 
Thanks for the replies! In that case, I'll start relearning and focusing primarily on R for statistics work.
 
I would go for Python with the matplotlib, pandas, and scikit-learn packages.
 
Minitab and SPSS have interfaces that are easy to use (both look and operate similarly to Excel).

If you feel comfortable coding, R and Stata are great options.

Although each of the software packages has different strengths and weaknesses, most biostatisticians use SAS. However, SAS has a steep learning curve that is likely not worth the hassle for most people.
 
Personally, I like SAS the most; haven't used R as much so I can't reasonably comment on it, but it seems to be popular and very good as well. I personally tend to like writing code over using a GUI for stats.

SPSS seems to be the standard in a lot of labs in my field, but I personally find it to be a bit clunky (probably because it runs as sort of a GUI/code hybrid that doesn't always quite fit together, IMHO). It's good for lots of basic stuff though.

If you are going to be joining a research group or working with an attending on a project, it's best to use whatever they already use so that when you run into problems, it's easier to troubleshoot.
 