⚠️IMPORTANT UPDATE⚠️ April 14, 2019

In my original version of this post, I highly recommended several DataCamp courses. I am now rescinding my recommendation of this platform. It has come to my attention that they have very poorly handled an issue of sexual assault by one of their executives. They only publicly acknowledged the incident over a year later, after months of pressure from the instructors on the platform. Their public statement indicated that the harasser received little to no consequences for their actions and the target of the harassment was not given any resources to make them or others feel safer (in fact, their note is filled with victim-blaming language). It has also been discovered that their statement has been non-indexed, meaning that it won’t be surfaced in search engines, AND that at least two employees were fired (and offered money to not speak publicly about this incident) for raising concerns inside the company. This entire situation is completely unacceptable and I no longer recommend using the DataCamp platform. Folks are compiling some alternatives here.

Introduction

While working as a marine biologist, I decided to switch career tracks slightly and pursue data science. I already had experience in mathematics, R programming, and data visualization, but filling in other knowledge gaps led me to search the internet for the best places to learn these skills.

With the price tags on both formal education and data science bootcamps ranging from $15,000 to $40,000 (after I’ve already completed both a bachelor’s and a master’s degree), I wanted to look for a more cost-effective solution.

Enter the Open Source Data Science Master’s Program. This fantastic resource lists several books, websites, and free (or low-cost) online courses to help you become a data scientist. While the main project is focused on using Python tools and resources, a repository for R tools is also available. You can find other collections like this aimed at helping data-scientists-to-be find the resources most likely to help them succeed in the field.

I am not planning to reinvent the wheel or develop my own data science curriculum, but finding a comprehensive list of places to look for good information has been invaluable for me so far in learning data science, and I want to pay it forward.

So here we go. A list of (and my comments on) the classes I have taken so far.

Prerequisites

Anyone is welcome to use this list as they wish (obviously), but here are a few notes about my level of baseline knowledge before starting this journey.

I already had a solid background in:

  • Statistics
  • Calculus
  • Mathematics
  • General programming knowledge

and experience in using R for specific purposes. If you don’t have these background skills yet (particularly statistics), I’d highly recommend starting there.

I’d suggest looking for courses offered at Coursera, DataCamp, Udemy, and Khan Academy to fill in those gaps.

Courses I’ve Taken

Getting Started in Data Science

Looking for a general “introduction to the field” class? I took:

  • Data Scientist’s Toolbox by Coursera and Johns Hopkins University
    • My Rating:
    • Difficulty Level: Beginner
    • Cost: $29 (for certificate / free to audit)
    • Time to Complete: Suggested 4 weeks
    • Comments: If you’re brand new to the field, take this course. If you’re already familiar with R and general data science topics, you can skip this one.

Getting Started in R

While I’ve been programming and conducting analysis in R for years, I had only been using it for specific purposes and decided to fill in any knowledge gaps I had with the statistical language. Here’s what I took:

  • R Programming by Coursera and Johns Hopkins University
    • My Rating:
    • Difficulty Level: Intermediate
    • Cost: $49 (for certificate / free to audit)
    • Time to Complete: Suggested 4 weeks
    • Comments: The lectures do a good job of familiarizing you with the basics of R programming, but the assignments often felt far too advanced given the basic subject matter of the course.
  • Introduction to R by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them. Here’s one of many alternatives.

  • Intermediate R by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

  • Importing and Cleaning Data by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

  • Cleaning Data in R by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

  • Writing Functions in R by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them. Here’s one of many alternatives.

  • Data Manipulation in R with dplyr by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

RStudio Courses

Until recently, I only ever used R from the command-line interface, but after these courses on using RStudio and RMarkdown, I’m never looking back. Fun fact: This website was written using RStudio!

  • Working with RStudio IDE by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

  • Reporting with RMarkdown by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

Machine Learning Courses

As I’m sure you’ve read by now, machine learning is an often sought-after skill for data scientists, so having a good foundation is key. I’ve taken:

  • Introduction to Machine Learning by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

  • Machine Learning Toolbox by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

  • Statistical Learning by Stanford University Lagunita
    • My Rating:
    • Difficulty Level: Intermediate - Advanced
    • Cost: FREE
    • Time to Complete: 15+ Hours
    • Comments: While you can complete assignments for credit in this course, I’ve been enjoying it as a YouTube playlist of videos that accompanies the book An Introduction to Statistical Learning. The material is thorough and has helped clear up some misconceptions for me, but the pace is a bit slow for my taste.
  • Kaggle R Tutorial for Machine Learning by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

SQL Database

While the Importing and Cleaning Data class from DataCamp covers how to import data from a SQL database into R, most job postings require that you have experience with SQL queries. Here are the courses I’ve taken:

  • Learn SQL by Codecademy
    • My Rating:
    • Difficulty Level: Beginner
    • Cost: FREE
    • Time to Complete: 1-2 Hours
    • Comments: This quick interactive course introduces the most basic topics needed for querying SQL databases. If you’ve been using dplyr in R, this should be an easy transition.
  • SQL Table Transformation by Codecademy
    • My Rating:
    • Difficulty Level: Intermediate
    • Cost: FREE
    • Time to Complete: 1-2 Hours
    • Comments: The course builds on the learnings from the Learn SQL course. It’s still relatively simple and interactive.
  • SQL Analyzing Business Metrics by Codecademy
    • My Rating:
    • Difficulty Level: Intermediate
    • Cost: FREE
    • Time to Complete: 1-2 Hours
    • Comments: The course also builds on the learnings from the Learn SQL course. It’s still relatively simple and interactive. The material included covers more advanced aggregates.
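
To give a feel for why the move from dplyr to SQL is an easy one, here is a minimal sketch of a grouped aggregate query. It is not taken from any of the courses above: it runs the SQL through Python’s built-in sqlite3 module so it is self-contained, and the `orders` table, its columns, and its rows are all made up for illustration. The comment shows a roughly equivalent dplyr pipeline.

```python
import sqlite3

# In-memory database; the `orders` table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 10.0), ("ana", 15.0), ("ben", 7.5)],
)

# Roughly the SQL counterpart of this dplyr pipeline:
#   orders %>%
#     group_by(customer) %>%
#     summarise(n_orders = n(), total = sum(amount)) %>%
#     arrange(desc(total))
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()

# Print one line per customer: name, number of orders, total amount.
for customer, n_orders, total in rows:
    print(customer, n_orders, total)

conn.close()
```

The verbs line up almost one to one: `GROUP BY` plays the role of `group_by()`, the aggregate functions in `SELECT` play the role of `summarise()`, and `ORDER BY` plays the role of `arrange()`.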

Other Courses

  • Manipulating Time Series Data in R with xts and zoo by DataCamp

    Review removed due to the DataCamp executives’ unwillingness to keep their community safe (see beginning of post for details). If you can afford to do so, please don’t support them.

Wrap-Up

As I said, this is by no means a total list of every course that you should take to become a data scientist, and I plan to add new courses to this list as I continue learning. Feel free to comment below about some of your favorite (or least favorite) online courses or reach out to me with questions.

Happy learning!