Regression vs Classification in Machine Learning

What is Machine Learning?

Machine Learning is a subset of Artificial Intelligence that lets a machine discover patterns in data, draw insights from them, and support decision making.

A mathematical model is built and then trained on sample data. The main types of learning include Supervised Learning, Unsupervised Learning, Semi-supervised Learning, Reinforcement Learning, and Self-Learning.

In Supervised Learning, the prediction model is built from a set of labeled training data (i.e. the data contains both the inputs and the expected outputs). Regression and Classification are the two main types of supervised learning.

What is Regression in Machine Learning?

Francis Galton coined the term “regression” in the context of a biological phenomenon. Udny Yule and Karl Pearson later extended the work to a more general statistical context. Regression analysis is a statistical method that estimates the relationship between a dependent variable and one or more independent variables, along with the strength of that relationship. It also indicates the impact of each independent variable on the dependent variable.

More broadly, regression refers to any procedure that tries to find the relationship between variables.

When is Regression used?

Regression is used when we want to predict an output variable that is continuous, i.e. a real value.

For example:

Predicting the price of a house
Predicting the height of a person
Predicting the salary of a person
Predicting temperature

There are many regression modeling techniques. Among them, Linear Regression is the most popular and also the simplest.

Linear Regression is represented by y = ax + b + e, where a is the slope of the line, b is the intercept, and e is the error term.
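As a minimal sketch of how a and b in the equation above can be estimated, the snippet below applies the ordinary least squares formulas in plain Python. The function name fit_line and the house-size data are made up for illustration and do not come from any real dataset.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares estimates of a (slope) and b (intercept) in y = a*x + b + e."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point (x_bar, y_bar)
    b = y_bar - a * x_bar
    return a, b

# Hypothetical data: house size vs. price, both in arbitrary units
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(sizes, prices)

# The error term e for each observation is the residual: observed minus predicted
residuals = [y - (a * x + b) for x, y in zip(sizes, prices)]

print(f"slope a = {a:.3f}, intercept b = {b:.3f}")
```

In practice a library routine would normally be used for this; the hand-written version is only meant to show what "line of best fit" means numerically.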

In the diagram below, the blue dots are the observed data points and the red line is the line of best fit.

There are many different regression algorithms, such as Linear Regression, Polynomial Regression, Lasso Regression, Ordinal Regression, Quantile Regression, ElasticNet Regression, Stepwise Regression, Poisson Regression, and Cox Regression.

In multiple regression, there is more than one independent variable.

What is Classification in Machine Learning?

A classification algorithm is used when we want to predict an output variable that is discrete. The dependent variable is predicted by analyzing the independent variables.

The main goal of classification is to identify the category that each observation belongs to, based on what the model has learned from the training data.

For Example:

Classification of fruits (by analyzing the properties – color, size, texture, etc.)

Classification of Animals (input images)

Face Recognition

Email spam identification

Sentiment Analysis

Common classification algorithms include:

Logistic Regression (Linear Classifier)
Naïve Bayes
Nearest Neighbour
Support Vector Machine
Decision Tree
Boosted Trees
Random Forest
Neural Networks

Logistic Regression estimates the probability of a certain class or event. It models the log-odds (logit) of the event as a linear combination of the inputs, and the logistic function turns that value into a probability. Suppose we want to predict whether a person is diabetic based on age, blood pressure (BP), and sex.

More formally, this can be written as

P(Diabetic | Age, BP, Sex)

In this example, diabetes status is the dependent variable, and age, BP, and sex are the independent variables.
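To make the diabetes example concrete, here is a small sketch of how a logistic regression model turns the three inputs into a probability. The coefficients and intercept are hand-picked for illustration only (they are not fitted to any data), and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(age, bp, sex, coef, intercept):
    """P(Diabetic | Age, BP, Sex) under a logistic regression model."""
    # Linear combination of the inputs gives the log-odds
    z = intercept + coef["age"] * age + coef["bp"] * bp + coef["sex"] * sex
    return sigmoid(z)

# ASSUMPTION: illustrative coefficients, not estimated from real data
coef = {"age": 0.04, "bp": 0.02, "sex": 0.3}
intercept = -5.0

# Sex is encoded as 0 or 1 here (an arbitrary encoding chosen for the sketch)
p = predict_proba(age=50, bp=130, sex=1, coef=coef, intercept=intercept)

# A threshold of 0.5 converts the probability into a class label
label = "diabetic" if p >= 0.5 else "not diabetic"

print(f"P(diabetic) = {p:.3f} -> {label}")
```

In a real project the coefficients would be learned from labeled training data rather than set by hand.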

A problem where the outcome has two classes is known as a binary classification problem.

A problem where the outcome has more than two classes is known as a multi-class classification problem.

A problem where a single data point can be assigned several labels at once is known as a multi-label classification problem.
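The three problem types differ mainly in what the labels look like. The toy sketch below shows one possible representation of each; the example labels and the helper problem_type are hypothetical and exist only to illustrate the distinction.

```python
# Binary: one label per sample, two possible values
binary_labels = ["spam", "not spam", "spam", "not spam"]

# Multi-class: one label per sample, more than two possible values
multiclass_labels = ["rectangle", "round", "triangle", "round"]

# Multi-label: each sample carries a set of labels (possibly empty)
multilabel_labels = [{"sports", "politics"}, {"tech"}, set(), {"tech", "sports"}]

def problem_type(labels):
    """Rough guess of the problem type from the shape of the labels."""
    if any(isinstance(label, (set, frozenset, list, tuple)) for label in labels):
        return "multi-label"
    distinct = len(set(labels))
    return "binary" if distinct <= 2 else "multi-class"

for name, labels in [("binary_labels", binary_labels),
                     ("multiclass_labels", multiclass_labels),
                     ("multilabel_labels", multilabel_labels)]:
    print(name, "->", problem_type(labels))
```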

The types of Logistic Regression are:

Binary Logistic Regression: The outcome has two classes. E.g.: spam or not spam
Multinomial Logistic Regression: More than two outcome classes without any order. E.g.: shape – rectangle, round, triangle
Ordinal Logistic Regression: More than two outcome classes with an ordering. E.g.: grades – Distinction, First class, Second class

To select the right algorithm for modeling, it is very important to understand whether the problem is a classification problem or a regression problem.

Performance Evaluation of Classification and Regression:

It is very important to evaluate the performance of a model. Both classification and regression have their own metrics, formulas, and techniques for evaluating an algorithm.

Performance metrics for Regression:

The different metrics for Regression problems are:

Mean Absolute Error (MAE): The average absolute difference between the predicted values and the actual values.
Root Mean Squared Error (RMSE): The square root of the average squared difference between the predicted values and the observed values. It measures the spread of the residuals.
R-squared: Also known as the coefficient of determination. It gives the proportion of the variance in the dependent variable that the model explains.
Adjusted R-squared: R-squared adjusted for the number of independent variables in the model. It rises only when an added variable improves the fit more than would be expected by chance.
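The four regression metrics above can be sketched in a few lines of plain Python. The function name regression_metrics and the sample values are assumptions made for this example; n_predictors is the number of independent variables used by the model.

```python
import math

def regression_metrics(y_true, y_pred, n_predictors):
    """Compute MAE, RMSE, R-squared and adjusted R-squared for a set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n               # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)    # root of the mean squared error

    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # unexplained variation
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)      # total variation
    r2 = 1.0 - ss_res / ss_tot

    # Penalise R-squared for the number of predictors (requires n > n_predictors + 1)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

    return {"mae": mae, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}

# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 8.0, 12.0]

metrics = regression_metrics(y_true, y_pred, n_predictors=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that MAE and RMSE are in the same units as the target variable, while R-squared and adjusted R-squared are unitless.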

Performance metrics for Classification:

Classification metrics are calculated from four counts: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). These counts are usually laid out in a confusion matrix, which can be computed in Python with the confusion_matrix function from the scikit-learn library.
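To show where the four counts come from, here is a minimal sketch that tallies them by hand for a binary problem. The function name confusion_counts and the label lists are made up for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, tn, fp, fn

# Hypothetical true labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

With scikit-learn installed, confusion_matrix(y_true, y_pred) from sklearn.metrics gives the same information for 0/1 labels, arranged as [[TN, FP], [FN, TP]].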

Based on this information we calculate:

Accuracy: The fraction of all predictions that the model got correct.

Accuracy = Correct Predictions / Total Number of Predictions

Precision: The ratio of correctly predicted positive observations to all predicted positive observations.

Precision = TP / (TP + FP)

Recall: The ratio of correctly predicted positive observations to all actual positive observations. Recall is also known as sensitivity.

Recall = TP / (TP + FN)

Specificity: It measures the true negative rate, TN / (TN + FP).

F1 score: It is the harmonic mean of Precision and Recall, 2 × Precision × Recall / (Precision + Recall).

ROC/AUC curve: The ROC curve shows the performance of the model across classification thresholds by plotting the true positive rate against the false positive rate. AUC is the area under the ROC curve.

Log loss: It measures the performance of a model whose prediction is a probability value between 0 and 1. Confident predictions that turn out to be wrong are penalized heavily.
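The classification metrics listed above can be sketched as follows. All function names and numbers are illustrative assumptions. For brevity, the code does not guard against division by zero (for example when there are no predicted positives), and AUC is computed through its ranking interpretation: the chance that a randomly chosen positive gets a higher score than a randomly chosen negative.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, specificity and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

def log_loss(y_true, probs, eps=1e-15):
    """Average negative log-likelihood of the true 0/1 labels."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def roc_auc(y_true, probs):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical confusion-matrix counts
scores = classification_metrics(tp=40, tn=45, fp=5, fn=10)

# Hypothetical labels and predicted probabilities of the positive class
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.7]
auc = roc_auc(labels, probs)
loss = log_loss(labels, probs)

print(scores)
print(f"AUC = {auc:.2f}, log loss = {loss:.4f}")
```

Accuracy alone can be misleading on imbalanced data, which is one reason precision, recall, and F1 are usually reported alongside it.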

Classification Vs Regression

Parameter	Classification	Regression
Prediction	The output variable is discrete in nature	The output variable is continuous in nature
Finds	Decision boundary	Best-fit line
Output data	Unordered	Ordered
Evaluation	Accuracy	Sum of squared errors, R-squared
Example algorithms	Logistic Regression, Decision Tree, Random Forest, etc.	Linear Regression, Polynomial Regression, etc.

To choose the best model for your specific use case, it is important to understand the difference between classification and regression problems, because the parameters used to train, tune, and evaluate a model differ between the two.

iClass Gyansetu
