A while ago, a friend of mine sent me a small handbook of mathematics for AI & machine learning, and when scanning the book I was quite shocked to see the author equate the gradient with the derivative. Unfortunately, this misunderstanding is not uncommon in the AI/ML community, as I later discovered during conversations with other AI practitioners.
Numerous explanations are available on the internet to help clarify the difference and I will not reinvent the wheel here. However, I find it useful to highlight some important distinctions that I myself have sometimes been guilty of blurring.
The key takeaways are:
- The derivative describes the rate of change of a function of a single variable. When evaluated at a specific value of that variable, the returned value is a number (or, if you prefer a fancier term, a scalar).
- The gradient, on the other hand, is a vector-valued function, also called a vector field. When evaluated at specific values of the variables, the returned quantity is a vector. The gradient is related to the derivative, though. In fact, each component of the gradient vector is the partial derivative with respect to one variable.
If a function has more than one input (or independent) variable, we can calculate its partial derivative with respect to one specific variable while holding the others fixed. Still, when evaluated at a specific point, the returned value is a number.
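To make the distinction concrete, here is a minimal sketch that approximates partial derivatives with central finite differences. The function `f`, the helper names `partial_x`, `partial_y`, and `gradient`, and the step size `h` are all made up for illustration; they are not taken from the handbook or from any library.

```python
def f(x, y):
    # Hypothetical example function: f(x, y) = x^2 + 3xy
    return x**2 + 3 * x * y


def partial_x(func, x, y, h=1e-6):
    # Central difference in x while y is held fixed: returns a single number.
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    # Central difference in y while x is held fixed: returns a single number.
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


def gradient(func, x, y, h=1e-6):
    # Collect the partial derivatives into one vector (here, a tuple).
    return (partial_x(func, x, y, h), partial_y(func, x, y, h))


# Analytically, df/dx = 2x + 3y and df/dy = 3x, so at the point (1, 2)
# the partial derivatives are close to 8 and 3.
df_dx = partial_x(f, 1.0, 2.0)   # a number
df_dy = partial_y(f, 1.0, 2.0)   # a number
grad = gradient(f, 1.0, 2.0)     # a vector with two components
```

Each partial derivative is a plain float, while the gradient bundles them into a vector with one component per input variable.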
The gradient, on the other hand, is a vector-valued function, so when evaluated at specific values of the input variables the returned value is a vector. One useful property of this vector is that it points in the direction of steepest ascent (meaning the function increases most rapidly if you move a tiny amount in that direction). Because of this, the negative of the gradient (its additive inverse) is used to minimize the loss of a machine learning model and search for a good combination of weights. Hence the corresponding algorithm is called gradient descent.
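As a rough illustration of that idea, the sketch below runs plain gradient descent on a toy two-weight loss. The loss function, its minimum at (3, -1), the learning rate, and the number of steps are assumptions chosen purely for this example; a real model would have a far more complicated loss.

```python
def loss(w):
    # Hypothetical toy loss with its minimum at w = (3, -1).
    w1, w2 = w
    return (w1 - 3.0) ** 2 + (w2 + 1.0) ** 2


def grad_loss(w):
    # Gradient of the toy loss: one partial derivative per weight.
    w1, w2 = w
    return (2.0 * (w1 - 3.0), 2.0 * (w2 + 1.0))


def gradient_descent(grad_fn, w_init, learning_rate=0.1, steps=100):
    # Repeatedly step against the gradient, i.e. in the direction
    # of steepest descent.
    w = tuple(w_init)
    for _ in range(steps):
        g = grad_fn(w)
        w = tuple(wi - learning_rate * gi for wi, gi in zip(w, g))
    return w


w_final = gradient_descent(grad_loss, (0.0, 0.0))
# w_final ends up very close to (3, -1), where the toy loss is smallest.
```

Note that the update subtracts the gradient: adding it instead would climb the loss surface rather than descend it.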
I hope this post can shed some light on the matter. If you want to learn more about the directional derivative and the gradient, this free online ebook can provide you with a more in-depth explanation and formal notation.
The book that I mentioned at the beginning of this article is published on a quite popular website about AI/ML in Vietnam. The term in question appears near the very beginning of section 3.1 here and is repeated several times throughout the book. I have contacted them about a correction, and hopefully they will make the change in the near future.