Introduction to Supervised Learning in Machine Learning: My Journey to Becoming a Data Scientist

When I started learning Machine Learning, I found it confusing, especially when trying to choose among the many algorithms, techniques, and methods that suit different types of datasets. How should I use them? So, I decided to put everything in order. First, let’s talk about what exactly ML does.

What is Machine Learning?

Machine learning (ML) is the study of computer algorithms that improve automatically through experience. ML algorithms classify or predict new data based on information obtained from a training set. In other words, we train the machine with our sample data, and then the machine makes predictions about new data based on what it has learned.

There are four types of machine learning algorithms:

Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning

In this post, I am going to talk about supervised learning and the different types of algorithms and techniques that are used to solve supervised learning problems.

What is Supervised Learning?

Supervised learning works with input and output variables. We use an algorithm to learn the mapping function from the input variables to the output variable. The output is a target variable that is predicted from a given set of independent variables (the inputs). Supervised learning is divided into two groups based on the type of target variable.

Regression: The target variable is continuous. The goal of regression is to predict the target value. There are many types of regression; some of them are listed below.

Linear Regression
Multiple Linear Regression
Polynomial Regression
Ridge Regression
Lasso Regression
Elastic Net Regression
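
To make the idea of predicting a continuous target concrete, here is a minimal sketch of the simplest case, linear regression with one input variable, fitted by ordinary least squares. It uses only the Python standard library. The function names and the tiny dataset are made up for illustration and are not taken from any particular library.

```python
# Minimal sketch: simple linear regression (one input variable) fitted
# with ordinary least squares. Data and names are illustrative assumptions.

def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) that minimize squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the continuous target value for a new input x."""
    return slope * x + intercept


# Hypothetical training set that follows y = 2x + 1 exactly
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(train_x, train_y)
prediction = predict(5.0, slope, intercept)
print(slope, intercept, prediction)
```

The other regression types in the list build on the same idea: they add more input variables, polynomial terms, or a penalty on the size of the coefficients.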

Classification: The target variable is categorical. Classification is a predictive modeling problem where a class label is predicted for new (test) data. There are four main types of classification.

1. Binary Classification: It refers to classification tasks where the target variable has two class labels. Popular algorithms that can be used are:

Logistic Regression
K-Nearest Neighbors
Decision Trees
Support Vector Machine
Naive Bayes
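
As an illustration of binary classification, here is a minimal sketch of K-Nearest Neighbors written with only the Python standard library. The function name `knn_predict`, the choice of k = 3, and the six training points are assumptions made for this example. A real project would more likely use an existing library implementation.

```python
import math
from collections import Counter

# Minimal sketch of K-Nearest Neighbors for a two-class problem.
# All names and data here are illustrative assumptions.

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort the training samples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most frequent label among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training set: class 0 sits near (1, 1), class 1 near (7, 7)
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(points, labels, (1.2, 1.5))
pred_b = knn_predict(points, labels, (6.8, 7.0))
print(pred_a, pred_b)
```

With an odd k and two classes, the vote can never end in a tie, which is one reason odd values of k are a common choice for binary problems.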

2. Multi-Class Classification: It refers to classification tasks where the target variable has more than two class labels. Some popular algorithms are listed below.

K-Nearest Neighbors
Decision Trees
Naive Bayes
Random Forest
Gradient Boosting

Binary classification algorithms can also be used for multi-class problems. To do so, we need to fit multiple binary classification models. There are two strategies:

One-vs-Rest: Fit one binary classification model for each class vs. all other classes.
One-vs-One: Fit one binary classification model for each pair of classes.
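
The two strategies differ in how many binary models they need. The sketch below enumerates the binary problems each one creates for a hypothetical set of four classes, without training any model. The function names and class names are assumptions for illustration.

```python
from itertools import combinations

# Minimal sketch: which binary problems One-vs-Rest and One-vs-One create.
# Class names and function names are illustrative assumptions.

def one_vs_rest_tasks(classes):
    """One binary problem per class: that class (positive) vs. all other classes."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def one_vs_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def relabel_one_vs_rest(labels, positive_class):
    """Turn multi-class labels into 1 (positive class) / 0 (everything else)."""
    return [1 if label == positive_class else 0 for label in labels]


classes = ["cat", "dog", "bird", "fish"]

ovr = one_vs_rest_tasks(classes)
ovo = one_vs_one_tasks(classes)
cat_vs_rest = relabel_one_vs_rest(["cat", "dog", "cat", "fish"], "cat")

print(len(ovr), "One-vs-Rest models")
print(len(ovo), "One-vs-One models")
print(cat_vs_rest)
```

For K classes, One-vs-Rest fits K models while One-vs-One fits K(K-1)/2 models, so One-vs-One grows much faster as the number of classes increases.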

Binary classification algorithms that can use these strategies for multi-class classification include:

Logistic Regression
Support Vector Machine

3. Multi-Label Classification: It refers to classification tasks where there are two or more class labels and more than one class label may be predicted for a single sample. Algorithms that can be used include:

Multi-label Decision Trees
Multi-label Random Forests
Multi-label Gradient Boosting

Classification algorithms designed for binary or multi-class classification cannot be used directly for multi-label classification. Instead, specialized versions of standard classification algorithms, such as those listed above, are used.
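
One way to see why multi-label problems need special handling is to look at the shape of the target. Instead of one label per sample, each sample has a set of labels, which is commonly encoded as a row of 0/1 indicators. The sketch below shows that encoding. The label names and the function `to_indicator_matrix` are assumptions for illustration.

```python
# Minimal sketch: encoding multi-label targets as a 0/1 indicator matrix.
# Label names and the function name are illustrative assumptions.

def to_indicator_matrix(label_sets, all_labels):
    """Return one row per sample with a 1 for every label the sample carries."""
    return [
        [1 if label in labels else 0 for label in all_labels]
        for labels in label_sets
    ]


all_labels = ["politics", "sports", "tech"]

# Hypothetical samples: each one may have zero, one, or several labels
label_sets = [
    {"sports"},
    {"politics", "tech"},
    set(),
]

targets = to_indicator_matrix(label_sets, all_labels)
for row in targets:
    print(row)
```

A binary or multi-class model outputs exactly one label per sample, while here the second sample needs two labels and the third needs none. That difference is what the specialized multi-label algorithms account for.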

4. Imbalanced Classification: It refers to classification tasks where the number of examples in each class is unequally distributed. Cost-sensitive versions of standard algorithms can be used:

Cost-sensitive Logistic Regression
Cost-sensitive Decision Trees
Cost-sensitive Support Vector Machines
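
Cost-sensitive algorithms make mistakes on the minority class more expensive than mistakes on the majority class. The sketch below shows one common heuristic for choosing those costs: weight each class inversely to its frequency. The function names and the 90/10 dataset are assumptions for illustration, and this is only the weighting step, not a full cost-sensitive learner.

```python
from collections import Counter

# Minimal sketch: class weights inversely proportional to class frequency,
# and a misclassification cost that uses them. Names and data are assumptions.

def balanced_class_weights(labels):
    """Weight for each class = n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_error(true_labels, predicted_labels, weights):
    """Sum the weight of the true class for every misclassified sample."""
    return sum(
        weights[true]
        for true, pred in zip(true_labels, predicted_labels)
        if true != pred
    )


# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)

# A lazy model that always predicts the majority class
always_majority = [0] * len(y)
plain_errors = sum(t != p for t, p in zip(y, always_majority))
cost = weighted_error(y, always_majority, weights)

print(weights)
print(plain_errors, cost)
```

The lazy model only gets 10 samples wrong out of 100, which looks acceptable as a plain error count, but each of those errors carries a weight of 5, so the weighted cost is much higher.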

Undersampling the majority class and oversampling the minority class in the training data are two ways to change the class composition of the samples. Examples of specialized techniques for each are listed below.

Random Undersampling
SMOTE Oversampling
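
Here is a minimal sketch of both ideas using only the Python standard library. `random_undersample` randomly keeps as many samples from each class as the smallest class has. `interpolate_between` shows only the core step behind SMOTE, creating a synthetic point on the line between two minority samples. Full SMOTE also picks the second point from the nearest minority neighbors, which is left out here. All names, the seed, and the data are assumptions for illustration.

```python
import random
from collections import Counter

# Minimal sketch of random undersampling and of the interpolation step
# used by SMOTE-style oversampling. Names and data are assumptions.

def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class has as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    minority_size = min(len(group) for group in by_class.values())

    new_samples, new_labels = [], []
    for label, group in by_class.items():
        kept = rng.sample(group, minority_size)  # sampling without replacement
        new_samples.extend(kept)
        new_labels.extend([label] * minority_size)
    return new_samples, new_labels


def interpolate_between(point_a, point_b, rng):
    """Create a synthetic point at a random position on the segment from point_a to point_b."""
    gap = rng.random()  # value in [0, 1)
    return [a + gap * (b - a) for a, b in zip(point_a, point_b)]


# Hypothetical imbalanced dataset: 90 majority samples, 10 minority samples
samples = list(range(100))
labels = ["majority"] * 90 + ["minority"] * 10

balanced_samples, balanced_labels = random_undersample(samples, labels)
print(Counter(balanced_labels))

# Synthetic minority point between two hypothetical minority samples
synthetic = interpolate_between([1.0, 2.0], [3.0, 6.0], random.Random(0))
print(synthetic)
```

Undersampling throws away majority data, while oversampling adds new minority data, so the choice between them usually depends on how much data is available.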

[1]