Nowadays, even the most sophisticated AI tools rely on the optimization algorithm known as Gradient Descent. From ChatGPT to Midjourney, the concept applies because of a central theme in Machine Learning: minimizing the cost function.

This post aims to give readers a general overview of this tool, which exists beyond the field of Machine Learning but plays a crucial role in it, appearing constantly in tutorials, books, and videos whenever the topic is Artificial Intelligence.

Gradient

Before diving into Gradient Descent, we need to understand what a gradient is. Simply put, it is the slope of a curve at a given point, measured in a specific direction; the gradient itself points in the direction of steepest ascent.

In the univariate case, the gradient is given by the first derivative of the function. In the multivariable case, it becomes a vector of partial derivatives, one per variable, from which the slope in any specific direction can be obtained.
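As a quick illustration, here is a minimal sketch that computes this vector of partial derivatives for the function f(x, y) = x² + 3y², both analytically and with central finite differences (the function, the point, and the step size h are assumptions chosen purely for demonstration):

```python
import numpy as np

def f(v):
    # Example multivariable function: f(x, y) = x^2 + 3y^2
    x, y = v
    return x**2 + 3 * y**2

def analytic_gradient(v):
    # Vector of partial derivatives: (df/dx, df/dy) = (2x, 6y)
    x, y = v
    return np.array([2 * x, 6 * y])

def numerical_gradient(func, v, h=1e-6):
    # Approximate each partial derivative with a central finite difference
    grad = np.zeros_like(v)
    for i in range(len(v)):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (func(v + step) - func(v - step)) / (2 * h)
    return grad

point = np.array([1.0, 2.0])
print(analytic_gradient(point))      # [ 2. 12.]
print(numerical_gradient(f, point))  # approximately [ 2. 12.]
```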

Gradient Descent

Gradient Descent is an algorithm used in the optimization of neural networks, and today it is one of the most popular choices for that task (Ruder, 2017). Understanding how it works allows the reader to grasp how models function at their deepest level, rather than treating them as a black box. This can be advantageous for several reasons, including model improvement and debugging.

How Does Gradient Descent Work?

First, it is necessary to consider two conditions on which gradient descent depends:

  1. The function is differentiable (otherwise the gradient cannot be computed).
  2. The function is convex (otherwise there is no guarantee that the minimum found is the global one).

In their work, Du et al. (2016) prove that it is possible to reach the global minimum even for certain functions that do not satisfy the second condition.
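To make the role of convexity concrete, here is a minimal sketch (the non-convex function f(x) = x⁴ − 3x² + x and all parameter values are assumptions chosen just for illustration) in which the same descent loop, started from two different points, settles into two different minima, only one of which is global:

```python
def grad(x):
    # Derivative of the non-convex function f(x) = x^4 - 3x^2 + x
    return 4 * x**3 - 6 * x + 1

def descend(x, learning_rate=0.01, steps=1000):
    # Plain gradient descent: repeatedly step against the gradient
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Different starting points fall into different basins of attraction
print(descend(-2.0))  # converges to the global minimum, x ≈ -1.30
print(descend(+2.0))  # converges to a local minimum, x ≈ +1.13
```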

In the mathematical field of optimization, the cost function is a metric that maps an event, in our case the model's parameters and data, onto a real number representing the cost. An optimization problem is characterized precisely by the reduction of this real value, and this is where gradient descent plays its fundamental role: starting from an initial point, it repeatedly takes a small step in the direction opposite to the gradient, since that is the direction in which the cost decreases fastest.
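As a minimal sketch of this process, assuming a simple linear model and the mean squared error as the cost (the synthetic data, learning rate, and iteration count below are illustrative choices, not values from the post), the algorithm repeatedly moves the parameters a small step against the gradient:

```python
import numpy as np

# Toy data generated from y = 2x + 1 plus noise (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 2 * X + 1 + 0.1 * rng.normal(size=100)

# Parameters of the model y_hat = w * x + b
w, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(500):
    error = (w * X + b) - y
    # Partial derivatives of the mean squared error cost
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # The update rule: step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should approach 2 and 1
```

The learning rate controls the step size: too small and convergence is slow, too large and the updates can overshoot the minimum.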

We can conclude that gradient descent is an optimization technique that specializes in machine learning, where training consists of minimizing a metric by which we evaluate the performance of statistical models. Even the most modern tools today rest on this simple yet powerful idea.

Appendix

Fig.1 The Gradient for a Function with n Dimensions. (Image Source: https://en.wikipedia.org/wiki/Gradient)

Fig.2 When the function is not convex, there is no guarantee that the global minimum will be found. (Image Source: https://ai.plainenglish.io/navigating-the-terrain-convex-vs-non-convex-functions-in-optimization-86812e9a1989?gi=8ecc65586601)

Fig.3 Comparison of Cost Functions Used in Regression Problems. (Image Source: https://en.wikipedia.org/wiki/Loss_function#/media/File:Comparison_of_loss_functions.png)

Fig.4 Operation of Gradient Descent. (Image Source: https://halfrost.me/post/how-to-understand-gradient-descent/)