December 17, 2016

While working as a marine biologist, I decided to switch career tracks slightly and pursue data science. I already had experience in mathematics, R programming, and data visualization, but filling in other knowledge gaps led me to search the internet for the best places to learn these skills.

With price tags on both formal education and data science bootcamps ranging from $15,000 to $40,000 (and having already completed both a bachelor’s and a master’s degree), I wanted to look for a more cost-effective solution.

Enter the Open Source Data Science Master’s Program. This fantastic resource lists several books, websites, and free (or low-cost) online courses to help you become a data scientist. While the main project focuses on Python tools and resources, a repository for R tools is also available. You can find other collections like this aimed at helping data-scientists-to-be find the resources most likely to help them succeed in the field.

I am not planning to reinvent the wheel or develop my own data science curriculum, but finding a comprehensive list of places to look for good information has been invaluable to me in learning data science so far, and I want to pay it forward.

So here we go: a list of the classes I have taken so far, along with my comments.

Anyone is welcome to use this list as they wish (**obviously**), but here are a few notes about my level of baseline knowledge before starting this journey.

I already had a solid background in:

- Statistics
- Calculus
- Mathematics
- General programming knowledge

and experience in using R for specific purposes. If you don’t have these background skills yet (particularly statistics), I’d highly recommend starting there.

I’d suggest looking for courses offered at Coursera, DataCamp, Udemy, and Khan Academy to fill in those gaps.

Looking for a general “introduction to the field” class? I took:

**Data Scientist’s Toolbox** by Coursera and Johns Hopkins University

- **My Rating**:
- **Difficulty Level**: Beginner
- **Cost**: $29 (for certificate / free to audit)
- **Time to Complete**: Suggested 4 weeks
- **Comments**: If you’re brand new to the field, take this course. If you’re already familiar with R and general data science topics, you can skip this one.

While I’ve been programming and conducting analysis in R for years, I had only been using it for specific purposes and decided to fill in any knowledge gaps I had with the statistical language. Here’s what I took:

**R Programming** by Coursera and Johns Hopkins University

- **My Rating**:
- **Difficulty Level**: Intermediate
- **Cost**: $49 (for certificate / free to audit)
- **Time to Complete**: Suggested 4 weeks
- **Comments**: The lectures do a good job of familiarizing you with the basics of R programming, but the assignments often felt far too advanced given the basic subject matter of the course.

**Introduction to R** by DataCamp

- **My Rating**:
- **Difficulty Level**: Beginner
- **Cost**: FREE
- **Time to Complete**: 4 Hours
- **Comments**: This interactive platform is excellent for teaching you the very basics.

**Intermediate R** by DataCamp

- **My Rating**:
- **Difficulty Level**: Intermediate
- **Cost**: $29/month for unlimited access to DataCamp’s entire course library
- **Time to Complete**: 6 Hours
- **Comments**: Video lessons mixed in with interactive exercises make these more complicated concepts easier to grasp. There is also an optional second practice course.

**Importing and Cleaning Data** by DataCamp

- **My Rating**:
- **Difficulty Level**: Beginner
- **Cost**: $29/month for unlimited access to DataCamp’s entire course library
- **Time to Complete**: 3 Hours (10 total for both parts and the case studies)
- **Comments**: A great overview of importing any kind of data into R. There is also a Part 2 course and a course with case studies for practice.

**Cleaning Data in R** by DataCamp

- **My Rating**:
- **Difficulty Level**: Beginner
- **Cost**: $29/month for unlimited access to DataCamp’s entire course library
- **Time to Complete**: 4 Hours
- **Comments**: If you need help turning a messy dataset into a clean one, this course will do the trick. The focus is primarily on the `tidyr` package.

**Writing Functions in R** by DataCamp

**Data Manipulation in R with dplyr** by DataCamp

- **My Rating**:
- **Difficulty Level**: Intermediate
- **Cost**: $29/month for unlimited access to DataCamp’s entire course library
- **Time to Complete**: 4 Hours
- **Comments**: If I could give this class 6 stars, I would. It’s well taught and it certainly kicked off my love of the `dplyr` package.

Until recently, I only ever used R in the command interface, but after these courses on using RStudio and RMarkdown, I’m never looking back. **Fun fact**: This website was written using RStudio!

**Working with RStudio IDE** by DataCamp

- **My Rating**:
- **Difficulty Level**: Beginner
- **Cost**: $29/month for unlimited access to DataCamp’s entire course library
- **Time to Complete**: 3 Hours
- **Comments**: The class was simple and helpful in orienting me to some of the basic features of RStudio.

**Reporting with RMarkdown** by DataCamp

- **My Rating**:
- **Difficulty Level**: Beginner
- **Cost**: $29/month for unlimited access to DataCamp’s entire course library
- **Time to Complete**: 3 Hours
- **Comments**: I had no idea how simple RMarkdown could be before this course. Now I use RMarkdown all the time. In fact, each of my blog posts and projects was written using RMarkdown, which makes report-writing simple.

As I’m sure you’ve read by now, machine learning is an often sought-after skill for data scientists, so having a good foundation is key. I’ve taken:

**Introduction to Machine Learning** by DataCamp

- **My Rating**:
- **Difficulty Level**: Intermediate - Advanced
- **Cost**: $29/month for unlimited access to DataCamp’s entire course library
- **Time to Complete**: 6 Hours
- **Comments**: This class gave me a good foundation in *how* to do some machine learning, but the theory behind it seemed a little lacking.

**Machine Learning Toolbox** by DataCamp

- **My Rating**:
- **Difficulty Level**: Intermediate - Advanced
- **Cost**: $29/month for unlimited access to DataCamp’s entire course library
- **Time to Complete**: 4 Hours
- **Comments**: Honestly, I got through this course and didn’t feel like I understood *why* I was being asked to do any of the specific tasks. It was only after completing the course that I realized that the `caret` package (which they focus on throughout the course) is super useful.

**Statistical Learning** by Stanford University Lagunita

- **My Rating**:
- **Difficulty Level**: Intermediate - Advanced
- **Cost**: FREE
- **Time to Complete**: 15+ Hours
- **Comments**: While you can complete assignments for credit in this course, I’ve been enjoying it as a YouTube playlist of videos that accompanies the book *An Introduction to Statistical Learning*. The material is thorough and has helped clear up some misconceptions for me, but the pace is a bit slow for my taste.

**Kaggle R Tutorial for Machine Learning** by DataCamp

- **My Rating**:
- **Difficulty Level**: Intermediate
- **Cost**: FREE
- **Time to Complete**: 1 Hour
- **Comments**: This course walks you through submitting your first entry to a Kaggle competition. It is fun and very helpful for submitting an entry, but probably not worth your time if you aren’t planning on using Kaggle.

While the Importing and Cleaning Data class from DataCamp covers how to import data from a SQL database into R, most job postings require that you have experience with SQL queries. Here are the courses I’ve taken:

**Learn SQL** by Codecademy

- **My Rating**:
- **Difficulty Level**: Beginner
- **Cost**: FREE
- **Time to Complete**: 1-2 Hours
- **Comments**: This quick interactive course introduces the most basic topics needed for querying SQL databases. If you’ve been using `dplyr` in R, this should be an easy transition.

**SQL Table Transformation** by Codecademy

- **My Rating**:
- **Difficulty Level**: Intermediate
- **Cost**: FREE
- **Time to Complete**: 1-2 Hours
- **Comments**: The course builds on what you learn in the Learn SQL course. It’s still relatively simple and interactive.

**SQL Analyzing Business Metrics** by Codecademy

- **My Rating**:
- **Difficulty Level**: Intermediate
- **Cost**: FREE
- **Time to Complete**: 1-2 Hours
- **Comments**: This course also builds on what you learn in the Learn SQL course. It’s still relatively simple and interactive, and the material covers more advanced aggregates.

**Manipulating Time Series Data in R with xts and zoo** by DataCamp

- **My Rating**:
- **Difficulty Level**: Intermediate
- **Cost**: $29/month for unlimited access to DataCamp’s entire course library
- **Time to Complete**: 4 Hours
- **Comments**: While this course does an OK job of explaining how to use the `xts` package to analyze time series, the *why* of it isn’t necessarily explained well.

As I said, this is by no means a complete list of every course that you should take to become a data scientist, and I plan to add new courses to this list as I continue learning. Feel free to comment below about some of your favorite (or least favorite) online courses, or reach out to me with questions.

**Happy learning!**