Data Science Specialization by Johns Hopkins University

The first MOOC I've ever taken is the Data Science Specialization by Johns Hopkins University and the classes have exceeded my expectations thus far.

I've always had to analyze business and product data and metrics, run experiments, and do A/B tests but wanted to get into deeper inferential and predictive modeling as well as pick up R as another tool in my belt. This class series offered it all.

Course Overview

In short, this program teaches the following skills:

  • R programming and important R packages
  • Data analysis as a formal and repeatable scientific discipline
  • An end-to-end foundation, from raw data to statistical analysis to formal presentation
  • Specific methodologies in inferential statistics, regression, and machine learning using R

The specialization is composed of nine classes, each four weeks long, and an eight-week capstone project.

Because I have a math and computer science background, as well as practical engineering experience in numerous languages, I elected to take many of the classes concurrently. I took the introductory class by itself, then took the next four courses concurrently -- which turned out to be a mistake, as I'll explain below.

General Observations

I'd like to provide some general observations that you will not get from the description.

  • I picked up R independently through online tutorials and applied it to my work. However, this course taught me to use R more effectively by approaching problems from a perspective different from that of a traditional imperative language.
  • The instructors are hackers and are credible and provide good guidance. They are solid practitioners in programming and in data science.
  • The lectures do not have all the information required to complete tests and projects. You'll need outside resources to solve problems -- online searches, the student forum, etc. If you don't have a programming background, give yourself extra time.
  • The scientific approach is a really great formalism and worthwhile even when operating outside of research.
  • The classes do take effort. If you are new to programming and/or stats, the provided time commitment estimates are likely realistic.
  • If you want to acquire a solid base then you'll want to watch the lectures, do the projects, and do the extra credit projects (such as the swirl courses). So make sure to set aside time if you've got a busy work schedule and family like I do.
  • The instructors are talented and their toolchains and approaches are worth modeling.
  • If you use Linux for your personal machine like I do, give yourself some extra time. Some of the quizzes and projects ask you to install R packages... and some of those packages have external dependencies that you have to apt-get... and many times figuring out how to resolve the dependency errors requires research.
  • The projects are rooted in real-world application, so the practice is relevant.
  • Another good reason to practice is if you program in other languages. My inclination is to build an algorithm that loops through a matrix to clean and summarize data, but in R you can use subsetting and reducing functions to get to a result more concisely. I'd say in R you can get closer to asking the language "what" you want to do instead of "how."
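
To make that last point concrete, here is a minimal sketch in base R. It is not taken from the course materials; the `scores` matrix and all variable names are made up for illustration. It computes column means while ignoring missing values, first with explicit loops and then with a single vectorized call, and finishes with a subsetting example.

```r
# Hypothetical data: a small numeric matrix with some missing values.
scores <- matrix(
  c( 4, NA, 7,
     2,  5, 9,
    NA,  3, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)

# "How": visit every cell, skip the NAs, and accumulate each column mean by hand.
loop_means <- numeric(ncol(scores))
names(loop_means) <- colnames(scores)
for (j in seq_len(ncol(scores))) {
  total <- 0
  count <- 0
  for (i in seq_len(nrow(scores))) {
    if (!is.na(scores[i, j])) {
      total <- total + scores[i, j]
      count <- count + 1
    }
  }
  loop_means[j] <- total / count
}

# "What": ask for the column means, ignoring missing values.
vector_means <- colMeans(scores, na.rm = TRUE)

# Subsetting: keep only the rows that have no missing values.
complete_rows <- scores[complete.cases(scores), , drop = FALSE]

print(loop_means)
print(vector_means)
print(complete_rows)
```

Both approaches should give the same means; the vectorized version simply states the result you want and leaves the iteration to R.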

In Closing

The data science specialization has been an awesome experience so far and a great introduction to MOOCs for me. I'd recommend it and I'm hoping other available courses are just as good. Remember, you'll get out what you put in.