We already learn how we can use newton method to minimize error iteratively here. In this post, we will discuss about how we can use gradient descent method to minimize error interatively. When we later talk about neural network (deep neural network will be deep learning), we can say that understanding gradient descent is half of understanding neural network. And in fact, gradient descent is really easy to understand, likewise neural network. It’s true and not exaggerated 😀
Let’s talk about how gradient descent works first. This is also used in other machine learning methods, such as in logistic regression for binary classification here. Let’s assume we have function shown picture below to be minimized.