Supervised learning is a general term for any machine learning technique that learns the relationship between a set of observations and their associated labels so that labels can be predicted for new observations. In regression, the labels are continuous numbers. This course focuses on classification, where the labels come from a finite set of values, such as numbers or characters. The prototypical and perhaps best-known example of classification is image recognition: the goal is to take an image (represented by its pixel values) and determine what objects are in it. Is it a dog? A grapefruit? A stop sign?
There are many practical classification tasks, such as determining whether an individual's financial history makes them high risk for a loan, whether there is a defect in a material based on some sensor readings, or whether a new email is spam or not. These problems share the same basic form and can be solved with many different types of mathematical, statistical, and probabilistic models developed by the machine learning community.
In this course, you will explore several powerful and widely used techniques for supervised learning. You will implement each technique in R, a free and open-source statistical programming language, using real-world data sets. The focus is on making these methods accessible for use in your own work.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Understanding Data Analytics
- Finding Patterns in Data Using Association Rules, PCA, and Factor Analysis
- Finding Patterns in Data Using Cluster and Hotspot Analysis
- Regression Analysis and Discrete Choice Models