How Logistic Regression Works for Classification (with Maximum Likelihood Estimation Derivation)

Logistic regression extends the regression method to classification. At the beginning of this machine learning series, we already talked about regression using LSE here. To use a regression approach for classification, we feed the regression output $Y$ into a so-called activation function, usually the sigmoid activation function. See the picture below.

The sigmoid function has an S-shaped curve, like the picture above, and its output ranges from zero to one. Logistic regression was originally intended for binary classification. As the picture above shows, the regression output $Y$ is fed into the sigmoid activation function. We classify the input as $class_1$ when the output is close to 1 (formally, when $output > 0.5$) and as $class_2$ when the output is close to 0 (formally, when $output \leq 0.5$). We can achieve this by maximizing the likelihood using MLE (Maximum Likelihood Estimation).
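The decision rule described above can be sketched in a few lines of Python. This is only a minimal illustration, not the post's own implementation: the function names `sigmoid` and `classify` and the sample values of the regression output are assumptions made for the example, and the sigmoid is taken to be the standard logistic function $1 / (1 + e^{-y})$.

```python
import math

def sigmoid(y):
    """Map a real-valued regression output y to the interval (0, 1).

    Assumes the standard logistic function 1 / (1 + exp(-y)); the two
    branches only avoid overflow in exp() for large negative y.
    """
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)

def classify(y, threshold=0.5):
    """Return 1 (class_1) if sigmoid(y) > threshold, otherwise 2 (class_2)."""
    return 1 if sigmoid(y) > threshold else 2

# Hypothetical regression outputs, chosen only for illustration.
for y in (-3.0, 0.0, 3.0):
    print(y, round(sigmoid(y), 3), classify(y))
```

Note that an output of exactly 0.5 (which happens when $Y = 0$) falls into $class_2$, following the $output \leq 0.5$ rule stated above.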

How k-NN (k-Nearest Neighbors) Works for Classification

We already know that a classification problem is about assigning given input data to a certain class. The simplest and most naive method is the nearest neighbor classifier. Given training data with class labels, the nearest neighbor classifier assigns the input the label of the nearest training point. Nearness can be measured using the Euclidean distance. Here is an illustration.
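As a rough sketch of the nearest neighbor rule described above, the following Python snippet labels a query point with the class of its closest training point under Euclidean distance. The helper names `euclidean` and `nearest_neighbor`, and the tiny two-class dataset, are hypothetical and exist only for this example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(train_points, train_labels, query):
    """Return the label of the training point closest to the query."""
    best = min(range(len(train_points)),
               key=lambda i: euclidean(train_points[i], query))
    return train_labels[best]

# Made-up training data: two points per class, for illustration only.
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_labels = ["A", "A", "B", "B"]

print(nearest_neighbor(train_points, train_labels, (1.2, 1.4)))  # closest to (1.0, 1.0)
print(nearest_neighbor(train_points, train_labels, (5.5, 5.0)))  # closest to (5.0, 5.0)
```

This covers only the single nearest neighbor case (k = 1); for larger k, one would take a vote among the k closest training points.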

Introduction to Classification and Confusion Matrix

In this machine learning and pattern recognition series, we have already talked about the regression problem, in which the predicted output is a continuous value. In machine learning, predicting a discrete output value from a given input is called classification. When there are two possible outputs, we usually call it binary classification. Examples include predicting whether a certain bank transaction is fraudulent or not, whether a cancer is benign or malignant, whether it will rain tomorrow or not, and so on. When there are more than two possible outputs, we call it multi-class classification. Examples include recognizing whether a hand gesture is moving right, left, down, or up, classifying the digits 0 to 9, and so on.

Even though classification is similar to regression, the only difference being that classification output is discrete whereas regression output is continuous, we can't use exactly the same regression method for classification. The reasons are: (1) it performs badly when we classify the given input into many classes, and (2) it lacks robustness to outliers. To use a regression approach for classification, we need a so-called activation function. This method is called logistic regression, and we will talk about it later here. It is called logistic “regression” because we proceed in a similar way to what we did in regression here, but instead of taking the output $y$ as the prediction, we feed $y$ into a logistic function. The logistic function most often used is the sigmoid function. Furthermore, even though its name contains “regression”, it is used for classification problems, not regression problems.