In machine learning, we usually define a cost function that measures the error between our prediction and the ground-truth data. We then build a good hypothesis function, whether for a classifier or a regressor, by minimizing that cost function. We already discussed here, here and here how we can use LSE (Least Squares Estimation) to minimize our cost function for regression problems. Those are optimization problems: finding the optimal parameters that give the minimal value of the cost function. Now, we will learn another powerful and important method for optimization: Newton's method. I will divide this discussion into three parts: (1) Newton's method, (2) Taylor series, and (3) Newton's method for optimization.
(1) Newton's method – finding the root of an equation
Newton's method was originally intended to find the root of an equation. For a simple equation, say $x^2 - 3x + 2 = 0$, it is easy to find the roots by hand: factoring it into $(x-1)(x-2) = 0$, we get that the roots are $x = 1$ and $x = 2$. But what about a complex equation that has to be solved by computer? We can use a numerical method to find the root of an equation, and one of them is Newton's method. See the picture below.
We have a function $f(x)$ like in the picture above, and we can find the root location using Newton's method. Suppose we start at $x = x_0$. First, we find the tangent line of $f(x)$ at $x_0$; in this picture, it is the red dashed line. This first tangent line intersects the x-axis at $x_1$. Then, we find the next tangent line of $f(x)$ at $x_1$. This tangent line intersects the x-axis at $x_2$. By iterating this process, we approach the root location. That is the idea behind Newton's method for finding a root.
Now, let's try to derive the mathematical formula of Newton's method. The tangent line of $f(x)$ at $x_0$ is a straight line with the same slope as $f(x)$ at $x_0$. From high school mathematics, we already know that such a linear function has the form:

$y = mx + c$, where the gradient $m$ is the same as the gradient of $f(x)$ at $x_0$, that is, $m = f'(x_0)$

To find the value of $c$, we can plug a pair of x-y values of $f(x)$ into the line. In this case, our $x = x_0$ and $y = f(x_0)$. Here we go:

$f(x_0) = f'(x_0)\,x_0 + c \;\Rightarrow\; c = f(x_0) - f'(x_0)\,x_0$
After getting $c$, our tangent line becomes:

$y = f'(x_0)(x - x_0) + f(x_0)$
We know that our tangent line intersects the x-axis at $(x_1, 0)$. Thus, by plugging this coordinate into our tangent line function, we get:

$0 = f'(x_0)(x_1 - x_0) + f(x_0) \;\Rightarrow\; x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$
Thus, our Newton's method works by iterating this equation: $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$.
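As a quick illustration (not part of the original derivation), the iteration above can be sketched in a few lines of Python. The function name `newton_root` and the tolerance/iteration parameters are my own choices:

```python
# Newton's method for root finding: x_{n+1} = x_n - f(x_n) / f'(x_n)
def newton_root(f, df, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)   # assumes df(x) != 0 along the way
        x = x - step
        if abs(step) < tol:   # stop when the update is tiny
            break
    return x

# Example: f(x) = x^2 - 3x + 2, with roots at x = 1 and x = 2
f = lambda x: x**2 - 3*x + 2
df = lambda x: 2*x - 3
root = newton_root(f, df, x0=3.0)  # starting near 3 converges to the root at 2
```

Note that Newton's method converges to whichever root the starting point leads it to, and it can fail if $f'(x)$ is zero (or very small) at some iterate.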
(2) Taylor series – approximating a function
A Taylor series approximates a function using a series expansion. See the picture below.

The picture above illustrates the approximation of the red line $f(x)$ using a Taylor series of some order (blue line) around $x_0$. The higher the order, the more precise the approximation. From this illustration, we can approximate a (smooth) function using its Taylor series. The Taylor series of a function $f(x)$ around $x_0$ can be written as below:

$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f'''(x_0)}{3!}(x - x_0)^3 + \cdots$
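To make the "higher order, more precise" claim concrete, here is a small sketch (my own example, not from the original post) that evaluates the Taylor polynomial of $e^x$ around $x_0 = 0$ at increasing orders:

```python
import math

# k-th order Taylor polynomial of exp(x) around x0 = 0:
# exp(x) ≈ sum over n = 0..k of x^n / n!
def taylor_exp(x, order):
    return sum(x**n / math.factorial(n) for n in range(order + 1))

exact = math.exp(1.0)
err_low = abs(taylor_exp(1.0, 2) - exact)   # 2nd-order approximation
err_high = abs(taylor_exp(1.0, 8) - exact)  # 8th-order approximation
# err_high is much smaller than err_low: higher order, more precise
```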
(3) Newton’s method for optimization
Now that we know how Newton's method finds a root, and how a Taylor series approximates a function, we will extend Newton's method to optimization: finding the minimal value of a function. To see how it works, look at the picture below.
The blue line is our cost function $f(x)$, and we want to minimize it until we reach the optimal point. This is done very similarly to Newton's method for finding a root, which we discussed before. We can say that Newton's method for finding a root is a first-order method: the tangent line can be considered a first-order approximation of $f(x)$. Newton's method for optimization, on the other hand, is a second-order method: we approximate $f(x)$ using its second-order Taylor series. This approximation is shown by the orange and red dashed lines in the picture above.
Suppose we start at $x_0$. First, we find the second-order approximation of $f(x)$ at $x_0$, then we find the location of its minimal value; in the picture above, it would be at $x_1$. This can be done by taking the first derivative and setting it equal to zero. Then, we take the next approximation of $f(x)$ at $x_1$ and find the location of its minimal value. We iterate this process until we reach the optimal point.
Now, it's time to derive the mathematical formula of Newton's method for optimization. 🙂 The second-order approximation of $f(x)$ around $x_n$ is as follows:

$f(x) \approx f(x_n) + f'(x_n)(x - x_n) + \frac{f''(x_n)}{2}(x - x_n)^2$
Then, in order to get the location of the minimal value of the approximation above, we take the first derivative with respect to $x$ and set it equal to zero. Here we go:

$f'(x_n) + f''(x_n)(x - x_n) = 0 \;\Rightarrow\; x = x_n - \frac{f'(x_n)}{f''(x_n)}$
Voila! We just derived Newton's method for optimization. To find the minimal value of our cost function in machine learning, we can iterate using this equation: $x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}$.
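The optimization iteration can be sketched in Python in the same style as the root-finding version; note that it is exactly Newton's root finding applied to $f'(x) = 0$. The function name `newton_minimize` and the example cost function are my own illustrative choices:

```python
# Newton's method for optimization: x_{n+1} = x_n - f'(x_n) / f''(x_n)
def newton_minimize(df, d2f, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)  # assumes d2f(x) != 0
        x = x - step
        if abs(step) < tol:    # stop when the update is tiny
            break
    return x

# Example cost function: f(x) = (x - 2)^2 + 1, with its minimum at x = 2
df  = lambda x: 2 * (x - 2)  # first derivative
d2f = lambda x: 2.0          # second derivative
x_min = newton_minimize(df, d2f, x0=10.0)
```

For a quadratic cost function, the second-order Taylor approximation is exact, so this converges in a single step; for general cost functions it takes several iterations, and a positive second derivative is needed for the stationary point to be a minimum.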