Regression Vs Classification

Janani Nagarajan
2 min read · Apr 4, 2022

Regression analysis and classification are two popular tools for prediction. Both rely on the same idea: use known data (also referred to as training data sets) to make predictions about new data points. Both problems fall into the supervised machine learning category, where the task is to learn a function that maps an input to an output based on example input-output pairs, that is, y = f(x). Mathematically, this is known as the problem of function approximation.

In machine learning, the task involves developing a model that learns from historical data, which enables it to make predictions on new instances. Essentially, we decide whether a task is classification or regression by looking at the output value. In classification the outputs are discrete labels (Male/Female, Yes/No, True/False, etc.), whereas in regression they are continuous numeric values (integers or floats).
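To make the distinction concrete, here is a minimal sketch that guesses the task type from the target values alone. The function name `guess_task` and the sample values are made up for illustration, and the rule itself is only a rough heuristic, not a standard method.

```python
from numbers import Real


def guess_task(targets):
    """Heuristically guess the task type from a list of target values."""
    # Strings and booleans are treated as discrete labels.
    if all(isinstance(t, (bool, str)) for t in targets):
        return "classification"
    # Ints and floats (but not booleans) are treated as numeric quantities.
    if all(isinstance(t, Real) and not isinstance(t, bool) for t in targets):
        return "regression"
    return "unknown"


print(guess_task(["Yes", "No", "Yes"]))            # discrete labels
print(guess_task([250000.0, 310500.5, 199999.0]))  # numeric values
```

Note that this heuristic would get integer-coded class labels (such as 0/1) wrong, so in practice the decision comes from what the output means, not just from its data type.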

Some algorithms are used mainly for regression-style problems, such as linear regression, and some are used mainly for classification tasks, such as logistic regression (which, despite its name, is typically used as a classifier). However, some algorithms, such as decision trees, can handle both kinds of problem once small modifications are made to them.
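As a rough illustration of such a "small modification", the sketch below builds a one-split tree (often called a stump) on a single feature. The only thing that changes between the two tasks is how a leaf turns its training targets into a prediction: majority vote for classification, mean for regression. The function names, the fixed threshold, and the data are all assumptions made for this example; a real decision tree would search for the best split itself.

```python
from collections import Counter
from statistics import mean


def fit_stump(xs, ys, threshold, task):
    """Fit a one-split tree on one feature, using a threshold chosen by hand."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]

    def leaf_value(values):
        # The task only changes how a leaf summarises its targets.
        if task == "classification":
            return Counter(values).most_common(1)[0][0]  # majority label
        return mean(values)  # average of the numeric targets

    return threshold, leaf_value(left), leaf_value(right)


def predict_stump(stump, x):
    """Return the left or right leaf value depending on the threshold."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


xs = [1, 2, 3, 10, 11, 12]
labels = ["No", "No", "No", "Yes", "Yes", "Yes"]   # made-up class labels
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]         # made-up numeric targets

clf = fit_stump(xs, labels, threshold=5, task="classification")
reg = fit_stump(xs, values, threshold=5, task="regression")
print(predict_stump(clf, 2), predict_stump(reg, 11))
```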

The objective in both problems is to approximate the mapping function (f) as accurately as possible, so that whenever there is new input data (x), the output variable (y) can be predicted. In regression, the predicted values are ordered (they fall on a numeric scale), whereas in classification the predicted labels are generally unordered.

Examples of classification problems include predicting whether a person has a disease or not, predicting whether a factory is still profitable or not, predicting whether a disease is more common among males or females, and so on. In each case the output falls under discrete labels that are set based on the pre-existing data values.
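A classifier of this kind can be sketched in a few lines. The example below uses a 1-nearest-neighbour rule, which is just one simple technique chosen for illustration. The features (age, blood pressure), the labels, and the data points are all invented, and `predict_1nn` is a hypothetical helper name.

```python
import math


def predict_1nn(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    distances = [math.dist(p, query) for p in train_points]
    nearest = distances.index(min(distances))  # first index wins on ties
    return train_labels[nearest]


# Invented training data: (age, blood pressure) -> label
points = [(25, 110), (30, 115), (60, 150), (65, 160)]
labels = ["healthy", "healthy", "disease", "disease"]

print(predict_1nn(points, labels, (28, 112)))
print(predict_1nn(points, labels, (62, 155)))
```

Whatever the input, the output is always one of the labels seen in the training data, which is what makes this a classification model.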

Examples of regression problems include predicting the price of a plot of land based on one or more independent variables, predicting the optimal production quantity to maximize the revenue of an industry, and so on. Here the output can be plotted on a graph and generalised as an equation that can then be used for new data sets.
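To show what "generalised as an equation" can look like, here is a minimal sketch of a straight-line fit using ordinary least squares on a single independent variable. The land areas and prices are invented so that they lie exactly on a line, and `fit_line` is a hypothetical helper written for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Invented data: land area vs. price, constructed so that price = area + 50
areas = [100, 200, 300, 400]
prices = [150, 250, 350, 450]

slope, intercept = fit_line(areas, prices)

# The fitted equation can now be reused for an area not in the training data.
new_area = 500
print(slope * new_area + intercept)
```

Unlike the classification case, the prediction is a number on a continuous scale and is not limited to values seen during training.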

Hope this article adds some value to you. Happy learning!
