Regression Vs Classification In Machine Learning

Gyansetu Team Updated on July 3, 2024 Data Science


What is Machine Learning?

Machine Learning, a subset of Artificial Intelligence, enables machines to uncover patterns, gain insights, and aid in decision-making by analyzing data.

Mathematical models are trained on sample data, using techniques like Supervised, Unsupervised, and Reinforcement Learning.

Gyansetu offers a Machine Learning with R Programming course in Gurgaon for hands-on learning.

What is Regression in Machine Learning

Francis Galton coined the term “Regression” in the context of a biological phenomenon. Karl Pearson and Udny Yule later extended the work to a general statistical context. Regression analysis is a statistical method that estimates the relationship between a dependent variable and one or more independent variables and measures its strength. It also indicates the impact of each independent variable on the dependent variable.

In short, Regression is any procedure that tries to find the relationship between variables.

When is Regression Used?

Regression is used when we want to predict an output variable that is continuous, that is, a real value.

For example:

Predicting the price of a house
Predicting the height of a person
Predicting the salary of a person
Predicting the temperature

There are different Regression modeling techniques. Among them, Linear Regression is the most popular algorithm and the simplest form of Regression.

Simple Linear Regression is represented by y = ax + b + e, where a is the slope of the line, b is the intercept of the line, and e is the error.

Figure: the blue dots are the observed data points and the red line is the line of best fit.
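To make the equation y = ax + b + e concrete, here is a minimal sketch of fitting the line of best fit by ordinary least squares. The data values and the function name fit_line are made up for illustration; they are not from a real dataset.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = y_bar - a * x_bar
    return a, b

# Made-up data: house area vs. price, both in arbitrary units.
areas = [5.0, 7.0, 9.0, 11.0, 13.0]
prices = [11.0, 14.5, 19.5, 22.5, 27.5]

a, b = fit_line(areas, prices)

# The error term e is what remains after subtracting the fitted line.
residuals = [y - (a * x + b) for x, y in zip(areas, prices)]

print(f"slope a = {a:.3f}, intercept b = {b:.3f}")
print("residuals e:", [round(e, 3) for e in residuals])
```

With a least-squares fit that includes an intercept, the residuals sum to zero, which is a quick sanity check on the result.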

There are many different types of Regression algorithms, such as Linear Regression, Polynomial Regression, Lasso Regression, Ordinal Regression, Quantile Regression, ElasticNet Regression, Stepwise Regression, Poisson Regression, Cox Regression, etc.

In multiple regression, there is more than one independent variable.

What is Classification in Machine Learning?

A Classification algorithm is used when we want to predict an output variable that is discrete. The dependent variable is predicted by analyzing the independent variables.

The main goal of classification is to identify the category of the dependent variable based on training data.

For Example:

Classification of fruits (by analyzing the properties – color, size, texture, etc.)

Classification of Animals (input images)

Face Recognition

Email spam identification

Sentiment Analysis

The Different Classification Algorithms are:

Logistic Regression (Linear Classifier)
Naïve Bayes
Nearest Neighbour
Support Vector Machine
Decision Tree
Boosted Trees
Random Forest
Neural Networks etc.


Logistic Regression finds the probability of a certain class or event. It models the log-odds (logit) of the event as a linear function of the inputs and converts the result into a probability with the logistic function. Suppose we want to find whether a person is diabetic or not based on age, blood pressure (BP), and sex.

More formally, it can be written as:

P(Diabetes = 1 | Age, BP, Sex)

In this example, Diabetes is the dependent variable, and Age, BP, and Sex are the independent variables.
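As a rough sketch of how Logistic Regression turns the inputs into a probability, the snippet below applies the logistic (sigmoid) function to a linear combination of age, BP, and sex. The coefficient values and the function name predict_probability are hypothetical and chosen only for illustration; a real model would learn the coefficients from training data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(age, bp, sex, coef, intercept):
    # Linear combination of the inputs; this is the log-odds (logit).
    z = intercept + coef["age"] * age + coef["bp"] * bp + coef["sex"] * sex
    return sigmoid(z)

# Hypothetical coefficients for illustration only (not fitted to any data).
coef = {"age": 0.04, "bp": 0.02, "sex": 0.3}
intercept = -5.0

p = predict_probability(age=50, bp=130, sex=1, coef=coef, intercept=intercept)

# Thresholding the probability at 0.5 gives the predicted class.
label = 1 if p >= 0.5 else 0
print(f"P(Diabetes = 1) = {p:.3f}, predicted class = {label}")
```

The 0.5 threshold is a common default, not a requirement; it can be moved when false positives and false negatives carry different costs.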

A problem where the outcome has two classes is known as a binary classification problem.

A problem where the outcome has more than two classes is known as a multi-class classification problem.

A problem where a data point can be assigned multiple labels at once is known as a multi-label classification problem.
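The three problem types differ mainly in the shape of the labels. The sketch below shows one possible way to represent each; the example labels and the helper problem_type are made up for illustration and are not a standard API.

```python
# Binary: every sample gets one of exactly two classes.
binary_labels = ["spam", "not spam", "spam", "not spam"]

# Multi-class: every sample gets exactly one of three or more classes.
multiclass_labels = ["rectangle", "round", "triangle", "round"]

# Multi-label: a sample may carry several labels at once (or none).
multilabel_labels = [{"sports", "politics"}, {"tech"}, set()]

def problem_type(labels):
    """Rough heuristic that names the problem from the label structure.

    Assumes at least two distinct classes appear in single-label data.
    """
    if any(isinstance(y, (set, list, tuple)) for y in labels):
        return "multi-label"
    return "binary" if len(set(labels)) == 2 else "multi-class"

for name, labels in [("binary_labels", binary_labels),
                     ("multiclass_labels", multiclass_labels),
                     ("multilabel_labels", multilabel_labels)]:
    print(name, "->", problem_type(labels))
```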

The types of Logistic Regression are:

Binary Logistic Regression: The Outcome is of two classes. E.g.: spam or not spam
Multinomial Logistic Regression: More than two outcome classes without any order E.g.: shape – rectangle, round, triangle
Ordinal Logistic Regression: More than two outcome classes with ordering. E.g.: Grades – Distinction, First class, Second class

To select the right algorithm for modeling it is very important to understand whether the problem is a classification problem or a Regression problem.

Performance Evaluation of Classification and Regression:

It is very important to evaluate the performance of the model. Both Classification and Regression have various methods, formulas, and techniques to evaluate the performance of an algorithm.

Performance metrics for Regression:

The different metrics for Regression problems are:

Mean Absolute Error (MAE): the average absolute difference between the predicted values and the actual values.
Root Mean Squared Error (RMSE): the square root of the average squared difference between the predicted values and the observed values. It measures the spread of the residuals.
R-squared: also known as the coefficient of determination, it tells the proportion of the variance in the dependent variable that the model explains.
Adjusted R-squared: a version of R-squared that is adjusted for the number of independent variables. It increases only when an added variable improves the fit more than would be expected by chance.
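The four regression metrics above can be computed in a few lines. This is a minimal sketch using the standard definitions; the function name regression_metrics and the sample values are made up for illustration.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Compute MAE, RMSE, R-squared and adjusted R-squared.

    n_features is the number of independent variables in the model.
    Requires len(y_true) > n_features + 1 and non-constant y_true.
    """
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R-squared compares the model's squared error with that of a
    # baseline that always predicts the mean of y_true.
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # Adjusted R-squared penalizes additional independent variables.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    return {"MAE": mae, "RMSE": rmse, "R2": r2, "Adjusted R2": adj_r2}

# Made-up actual and predicted values from a one-variable model.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 8.0, 12.0]

metrics = regression_metrics(y_true, y_pred, n_features=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```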

Performance Metrics for Classification:

Several metrics are used to measure performance, and most of them are calculated from four counts: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). These counts are arranged in a confusion matrix. In Python, the scikit-learn library provides a confusion_matrix function to compute it.

Based on this information we calculate:

Accuracy: the fraction of predictions our model got correct.

Accuracy = Correct Predictions / Total Number of Predictions

Precision: the ratio of correctly predicted positive observations to the total predicted positive observations.

Precision = TP / (TP + FP)

Recall: the ratio of correctly predicted positive observations to all actual positive observations. Recall is also known as sensitivity.

Recall = TP / (TP + FN)

Specificity: It measures the True Negative Rate, TN / (TN + FP).

F1 score: It is the harmonic mean of Precision and Recall.

ROC/AUC curve: The ROC curve shows the performance of the model at different classification thresholds by plotting the True Positive Rate against the False Positive Rate. AUC is the area under the ROC curve.

Log loss: It measures the performance of a model whose prediction is a probability value between 0 and 1. Confident but wrong predictions are penalized heavily.
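The classification metrics above can be sketched in plain Python for the binary case. The labels and probabilities are made up, and the function names are chosen for illustration. The AUC here is computed with the pairwise-ranking formulation (the share of positive/negative pairs in which the positive sample gets the higher score), which gives the same value as the area under the ROC curve.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    # Assumes each denominator is non-zero (both classes are present
    # and at least one sample is predicted positive).
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(y_true)

def roc_auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Made-up true labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.3, 0.7, 0.1, 0.8, 0.4]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # threshold at 0.5

scores = classification_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
print(f"log loss: {log_loss(y_true, y_prob):.3f}")
print(f"AUC: {roc_auc(y_true, y_prob):.4f}")
```

Accuracy, precision, recall, specificity, and F1 depend on the chosen threshold, while log loss and AUC are computed directly from the predicted probabilities.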

Classification Vs Regression

PARAMETER	CLASSIFICATION	REGRESSION
Prediction	The output variable is discrete in nature	The output variable is continuous in nature
Goal	Find the decision boundary	Find the best fit line
Output Data	Unordered	Ordered
Evaluation	Calculate accuracy	Calculate the sum of squared errors, R-squared
Example Algorithms	Logistic Regression, Decision Tree, Random Forest, etc.	Linear Regression, Polynomial Regression, etc.

To choose the best model for your specific use case, it is important to understand the difference between Classification and Regression problems, because the problem type determines which algorithms, evaluation metrics, and tuning parameters apply.

If you want to learn Machine Learning, join Gyansetu’s Machine Learning with Python Training Course or call us at 8130799520.
