Best computer science skills/language to learn?


blissworm

I got bored and decided to browse job listings for my specialties of interest. A lot of the clinical postings, especially at research hospitals and university hospitals that combine research opportunities with clinical work, were asking for bioinformatics and data science skills. Guess my dad was right, I should get some CS skills. That makes sense, because my fields of interest intersect with public health and epidemic work. The only issue is that I dropped computer science because I just sucked at it. However, I want to try again.

Which skills and/or language are best for me to learn? I want to do some research alongside my clinical work and want to be proficient in bioinformatics and AI/machine learning, because it seems to be the future of research. Where do I start, and what should my goal be? Python or Java or R? Should I learn Linux, whatever that is?

 
Python. R is mostly just stats, and you can also do stats in Python. There are also great libraries for bioinformatics work in Python.
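To make the "stats in Python" point concrete, here is a minimal sketch that uses only the standard library `statistics` module. The heart-rate numbers are made up, and the helper names `summarize` and `welch_t` are hypothetical, chosen just for this illustration. For real work you would more likely reach for a dedicated stats package, so treat this as a taste rather than a recommendation.

```python
import statistics

# Hypothetical data: resting heart rates (bpm) for two made-up groups.
control = [72, 75, 78, 71, 69, 74, 77, 73]
treated = [68, 70, 66, 71, 65, 69, 72, 67]


def summarize(values):
    """Return the mean, median, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def welch_t(a, b):
    """Welch's t statistic for two independent samples (no p-value)."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / (var_a + var_b) ** 0.5


control_summary = summarize(control)
treated_summary = summarize(treated)
t_stat = welch_t(control, treated)

print(control_summary)
print(treated_summary)
print("Welch t statistic:", round(t_stat, 2))
```

This only computes the test statistic. Turning it into a p-value needs the t distribution, which the standard library does not provide.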
 
I'd say Python as well, followed by R if you delve into a second language. I've seen a few too many people doing stats in Python that would have been much better served (both in simplicity and performance) by using R. I'm very much a fan of using the right tool/language for the task rather than doing everything with the same language.
 
The most important thing is to fully grasp logical flow, data structures, and the important algorithms. If you get these 3 down, then most of the skills transfer between languages, and the learning curve goes down drastically.

I'd say learn Python first to grasp these 3 concepts (the best exercise is to implement data structures and algorithms on your own!), then maybe learn a compiled language like Java. This will allow you to build a surprising number of programs that you can customize to your specific lab needs.
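As an idea of what "implement data structures and algorithms on your own" can look like, here is a small sketch of one data structure (a stack) and one algorithm (binary search). The class and function names are my own choices for illustration, and Python already ships better-tested versions of both ideas (plain lists and the `bisect` module), so the value here is in writing them yourself.

```python
class Stack:
    """A last-in, first-out stack backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        """Add an item to the top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item; fail loudly if empty."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def __len__(self):
        return len(self._items)


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent.

    Assumes sorted_items is already sorted in ascending order.
    """
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


stack = Stack()
stack.push("first")
stack.push("second")
print(stack.pop())                           # the most recently pushed item
print(binary_search([1, 3, 5, 7, 9], 7))     # index of 7 in the list
```

Once these feel easy, the same exercise with a queue, a linked list, or a sorting algorithm covers most of the logical flow an intro course is trying to teach.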

TL;DR: Python to learn DS&A and programming logic -> (maybe learn a compiled language like Java) -> mess around with machine learning libraries like TensorFlow or PyTorch using YouTube/PDFs.

You can learn R to a comfortable level within a week after getting the basics down with Python.
 
Where do I start, what should my goal be? Python or Java or R?

All three are useful.
Should I learn linux, whatever that is?
If you have the time, it would be great. Since most Linux distributions are free, they are popular in the sciences. With that said, finding drivers and getting professional software packages to work can be a pain. Unix-based operating systems are also commonly used in academia. Mac OS X (all variants) and macOS 11 are Unix-based but come with nicely packaged GUIs and software support. If you are pressed for time, switch to Mac (if you aren't already using one). It is quite intuitive and easy to learn.
 
I was a CS major. If you're organizing and running large studies (BIG DATA), then you should focus on R. If you're dealing with proteins, Python should be better suited. Regardless, Python and R are very similar, with Python being slightly more "traditional." If you learn one, you should be able to easily grasp the other.
 
As others have said Python and R. Also my bioinformatics lab has a central project that's done in C++ and distributed to other researchers, but most of the testing is still done in Python and R.
 
Any tips for learning CS if I already tried and it just didn't settle in my head? Ways to go about it, tricks of the trade, etc.
 
The only way to truly learn programming is through doing projects. You can start with Khan Academy projects, if they offer R or Python. Otherwise, you can google programming projects for beginners, preferably ones that have YouTube videos to reference.
 
You can also just use WSL/WSL2 with Windows 10 now and have a native Linux distribution to run native Linux binaries under. I've been able to replace my Linux VM with WSL2 now that Docker has added WSL2 engine support.
 
R is pretty useful
 
Python >> R. Make sure you know numpy (and know about pandas/scipy/sklearn) and then you have everything R has, but in a real language. I will not consider any applicants who only have R on their CV.

Edit: To expand, since it made it seem like I arbitrarily hate students who use R. Given that the majority of ML research is done in Python's C extensions (or Julia in a couple of years, hopefully), they will be at a significant handicap compared to someone who can work in Python. If I ask a student to run some experiment with a recent architecture, etc., I'll point them towards the actual Python code they need to use, and then they just need to modify it to fit their needs. I can't do that if they only know R, and the research will be significantly slowed by the friction of the language mismatch.

To put it in perspective, R only recently got access to PyTorch (Introducing torch for R), which is what the majority of research code at major conferences has used for the past few years.
 
Are you speaking about applicants for employment rather than applicants for medical school? I think that you need to make that clear before some people go right off a cliff after reading this.
 
Are you talking about recruiting post-docs? This is a premed forum...
 
To address both of these: It's regarding med students asking to do research in my lab. I often collaborate with the med school on the ML side of health research so it's not uncommon to get these requests. The point still stands that R is generally a half decade behind in ML methods, which is an important issue for blissworm's question.
 
What resources would you guys recommend to learn Python (or any language) from scratch? It's hard to figure how to dive in when there's so much out there. What's the best way to start learning so that one keeps at it?
 
Really depends on your current time situation. I'm a fan of the MIT open courses (Introduction to Computer Science and Programming in Python) to avoid any significant holes in your learning while doing it at your own pace. Complete the 'projects' along with the course and you will be set as the rest of 'data science' is just math.
 
So if I was a med student wanting to do research in your lab, me knowing Python (but not linear algebra, probability, optimization, ML, etc) would be enough for me to meaningfully contribute to your research? I genuinely ask this bc I may be interested in such research as a med student
 
I really don't want to discourage you, but it would be challenging to get your foot in the door without being conceptually familiar with linear algebra, calc, and probability. The bare minimum would be knowing what derivatives/gradients/inner products/matrix multiplication represent, plus basic knowledge about distributions (which you already know from the required stats stuff).
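For anyone unsure what those terms look like in practice, here is a rough sketch in plain Python, with no libraries. The function names are hypothetical and picked for readability, the gradient is a numerical approximation (central differences) rather than anything a real ML framework does internally, and in actual research code you would use numpy or PyTorch for all of this.

```python
def inner_product(u, v):
    """Sum of elementwise products: how strongly two vectors align."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matmul(A, B):
    """Matrix product: entry (i, j) is row i of A dotted with column j of B."""
    columns_of_B = list(zip(*B))
    return [[inner_product(row, col) for col in columns_of_B] for row in A]


def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x by nudging one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def squared_norm(x):
    """f(x) = x . x, whose exact gradient is 2x."""
    return inner_product(x, x)


print(inner_product([1, 2, 3], [4, 5, 6]))
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(numerical_gradient(squared_norm, [1.0, -2.0]))  # close to [2, -4]
```

The gradient tells you which direction increases the function fastest, which is the idea behind how models get trained: step the parameters in the opposite direction to reduce the error.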

I have an MD colleague who is learning ML (sitting in on lab meetings, reading groups, etc.) and they're really raking in the pubs within the medical domain using ML, so it might be worthwhile depending on what you want to do after med school (fellowship, academic positions, etc.).
 
I'm going from tech to healthcare. I agree that Python or R are good to get you started, and data structures and algorithms can help you think about big data, but you could get away without knowing that much. I feel like the important thing to learn is how to think about logic and how to explain tradeoffs between different decisions. I would definitely talk to people in the field to see what tech stacks they are using, and whether they're using certain libraries or infrastructure. Building ML models can be technical and mathematical, but there are also a lot of non-ML computational problems. A good place to start is learning about linear and logistic regression, the ideas of underfitting and overfitting, cross validation, tuning models, etc. Medium has a lot of good articles with code that you can interact with and easy-to-understand explanations.
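To give a feel for two of the ideas mentioned above, linear regression and cross validation, here is a bare-bones sketch in plain Python. The data is made up (roughly y = 2x + 1 with small alternating wiggles), the function names are hypothetical, and the fold assignment (every k-th point) is a simplification. In practice people use a library such as scikit-learn for this.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between predictions and actual values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_mse(xs, ys, k=4):
    """Average held-out error over k folds; fold i holds out every k-th point."""
    scores = []
    for fold in range(k):
        train_x = [x for i, x in enumerate(xs) if i % k != fold]
        train_y = [y for i, y in enumerate(ys) if i % k != fold]
        test_x = [x for i, x in enumerate(xs) if i % k == fold]
        test_y = [y for i, y in enumerate(ys) if i % k == fold]
        slope, intercept = fit_line(train_x, train_y)
        scores.append(mean_squared_error(test_x, test_y, slope, intercept))
    return sum(scores) / k


# Made-up data: roughly y = 2x + 1 with small deterministic wiggles.
xs = [float(i) for i in range(12)]
ys = [2 * x + 1 + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

slope, intercept = fit_line(xs, ys)
cv_error = k_fold_mse(xs, ys, k=4)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("cross-validated MSE:", round(cv_error, 3))
```

The point of the held-out folds is that the model is always scored on data it did not see while fitting. A model that overfits looks great on its training data and noticeably worse on the held-out folds.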
 