Deriving Polynomial Regression with Regularization to Avoid Overfitting

After discussing polynomial regression here using LSE (least squares error), we know that a higher-order polynomial model has more capacity to fit complex data points, but it is also more prone to overfitting. The picture below illustrates this: the red line (a high-order fit) passes exactly through the blue data points, but it produces a large error elsewhere, for example around x = 0.9. That is what we call overfitting (the model fits the training data too closely). In this case, the green line is better, since it is a more general model for representing those data points.
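To make this concrete, here is a minimal sketch (with assumed toy data, not the figure's actual points) that fits the same noisy samples with a low-order and a high-order polynomial; the high-order fit passes through every training point yet swings wildly between them.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)                                 # training inputs
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)    # noisy targets

# Ordinary least-squares fits at two model complexities
low  = np.polyfit(x, y, deg=3)   # simpler model, smoother curve
high = np.polyfit(x, y, deg=9)   # high-order model, hugs every training point

# Predictions away from the training points can differ wildly
x_test = 0.9
print("degree-3 prediction at x=0.9:", np.polyval(low,  x_test))
print("degree-9 prediction at x=0.9:", np.polyval(high, x_test))
```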

We can avoid overfitting by using so-called regularization. How does it work? A function is usually prone to overfitting when its coefficients (weighting values) have large values and are not well distributed. Thus, we force the training process to keep those coefficients small by adding a penalty term to our cost function. This also makes the coefficients better distributed. Here is our new cost function.
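The formula itself appears in the full post; as a sketch, assuming the usual L2 (ridge) penalty that this description matches, the regularized cost looks like

$$
E(\mathbf{w}) = \sum_{i=1}^{N}\bigl(y_i - \mathbf{w}^{\mathsf T}\mathbf{x}_i\bigr)^2 + \lambda \sum_{j=0}^{M} w_j^2,
$$

where $\lambda$ controls how strongly large coefficients are penalized. Carrying the same LSE derivation through with this extra term gives the closed-form solution $\mathbf{w} = (X^{\mathsf T}X + \lambda I)^{-1} X^{\mathsf T}\mathbf{y}$, which the sketch below implements with NumPy (the function name `ridge_polyfit` is just an illustrative choice, not part of the original post).

```python
import numpy as np

def ridge_polyfit(x, y, degree, lam):
    """Polynomial coefficients minimizing squared error + lam * ||w||^2."""
    X = np.vander(x, degree + 1)     # polynomial design matrix [x^M, ..., x, 1]
    I = np.eye(degree + 1)
    # Regularized normal equations: (X^T X + lam * I) w = X^T y
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)
```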