As you’ve probably heard, calculus is essential for Machine Learning. However, differentiation gets far more emphasis than integration, so this series of posts will build from simple derivatives to Jacobians and Hessians. Ideally, at the end of this series, if you read a paper that mentions one of the topics above, you’ll have a rough idea of why the authors chose to do what they did and what their choice means for the results.

Background

If you’ve already taken Calculus or Linear Algebra, feel free to skip ahead to the next tutorial, Hessians and Jacobians.

1) Derivatives 101

The equations below give a straight line and what you get when you take the derivative of that line with respect to its input $x$:

\[\begin{align*} y &= mx + c\\ \frac{d y}{dx} &= m \end{align*}\]

In a calculus class, we’d typically talk about the rate of change of $y$ with respect to $x$. In other words, how much does $y$ change as $x$ changes? In this case, we see that $y$ changes by $m$ units for every unit that $x$ changes. For the moment, we are focused on scalar values, but this concept generalizes to vectors and matrices, which brings us to the next section.
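To make the "rate of change" idea concrete, here is a minimal sketch that estimates the slope of a line numerically by nudging $x$ and measuring how much $y$ moves. The values of `m` and `c`, the step size `h`, and the helper names are illustrative assumptions, not part of the original post.

```python
def line(x, m, c):
    """Evaluate the straight line y = m*x + c."""
    return m * x + c


def slope_estimate(f, x, h=0.5):
    """Estimate dy/dx at x as the change in y divided by the change in x."""
    return (f(x + h) - f(x)) / h


# Illustrative values (assumptions): slope 3, intercept 2.
m, c = 3.0, 2.0


def f(x):
    return line(x, m, c)


# For a straight line the estimate is the same at every x: it is just m.
estimates = [slope_estimate(f, x) for x in (-4.0, 0.0, 10.0)]
print(estimates)  # [3.0, 3.0, 3.0]
```

Note that the intercept `c` cancels out in the subtraction, which is why it does not appear in the derivative.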


2) Linear Algebra 101

Math often deals with the concept of abstraction. For example, we often deal with numbers, e.g., 5 or 100. In Linear Algebra, we are concerned with collections of numbers (vectors), e.g., a collection of (5, 10), or a collection of those collections (matrices), and further abstractions. To make this notion concrete, consider the following example:

2.1) Scalars

Edit: I have no idea if the following examples describe actual streets and avenues, so I’d like to apologize beforehand.

Say that we were somewhere in New York City, which works on a grid system. If I were on 4th and 5th, while you were on 10th and 7th, our (x, y) coordinates could be described as (4, 5) and (10, 7), respectively. Equivalently, our coordinates could be described as the following:

\[\text{My location:=} \begin{pmatrix} 4\\ 5\\ \end{pmatrix}\]

and

\[\text{Your location:=} \begin{pmatrix} 10\\ 7\\ \end{pmatrix}\]

We decide to meet for coffee, but since neither of us drives, we agree to meet in the middle as that is easiest. So, we would meet at:

\[x := \frac{4 + 10}{2} = 7\] \[y := \frac{5 + 7}{2} = 6\]

which corresponds to 7th and 6th (7, 6).

We saw in the computation above that it can be tedious to write out a separate equation for each of our (x, y) coordinates. This complexity only grows as we add more information, e.g., which shop to meet in. What if we had a compact way of representing my location, your location, and the operation of averaging to determine where we should meet? Here I want you to keep two concepts in the back of your mind:

1) The concept of abstraction on scalars.

2) The concept of a coordinate system and what it means for something to be in the coordinate system.

2.2) Abstractions on Scalars: Vectors

At the start of this Linear Algebra review, I said that Linear Algebra is concerned with collections of numbers. So far, we have already discussed one such collection: a pair of coordinates. In that case, my location is described as the collection (4, 5), and yours is described by (10, 7). The top element (4 and 10, respectively) represents the street, and the bottom element (5 and 7) represents the avenue.

Congratulations! We’ve just worked through the concept of a vector, albeit in a particular setting: New York streets and avenues. Let’s take a step back and see our locations for what they are: specific instances of an abstract concept. We could just as well write:

\[X_1:= \begin{pmatrix} a\\ b\\ \end{pmatrix}\] \[X_2 := \begin{pmatrix} c\\ d\\ \end{pmatrix}\]

where $X_1$ can represent my street-avenue, but it could just as well describe my latitude-longitude or my age-height. Whatever the case, if we are looking for the average of these two containers, $X_1$ and $X_2$, we can write it as follows:

\[\text{the middle := } \frac{X_1 + X_2}{2}\]

This equation holds for both the street number and the avenue (our x and y coordinates).

Note, we can add more information, e.g., a Z coordinate, which represents the shop number to meet at, or the corner I’m on, but we do not need to change anything. Our “middle” can still be represented by the same general equation above.
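As a rough sketch of the "middle" equation, the function below averages two vectors element by element using plain Python lists. The function name `midpoint` and the shop numbers in the three-element example are assumptions made up for illustration.

```python
def midpoint(x1, x2):
    """Return (x1 + x2) / 2, computed element by element."""
    if len(x1) != len(x2):
        raise ValueError("both vectors must have the same number of elements")
    return [(a + b) / 2 for a, b in zip(x1, x2)]


# Street-avenue locations from the example above.
me = [4, 5]
you = [10, 7]
print(midpoint(me, you))  # [7.0, 6.0]

# Adding a third element (a made-up shop number) needs no change to midpoint.
me_with_shop = [4, 5, 6]
you_with_shop = [10, 7, 2]
print(midpoint(me_with_shop, you_with_shop))  # [7.0, 6.0, 4.0]
```

The same function handles two, three, or any number of coordinates, which is the practical payoff of writing the operation on vectors rather than on individual scalars.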

2.3) Abstractions on Scalars and Vectors: Matrices

We can then expand on our scalars and vectors to a collection of collections. Say we had two other friends. All of our locations could then be described as

\[\text{Us := } \begin{pmatrix} 4 & 6 & 10 & 12\\ 5 & 7 & 7 & 15\\ \end{pmatrix}\]

which would be a matrix. Phew, that was a mouthful.
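One way to picture this is to build the matrix by placing each person's location vector side by side as a column. The sketch below does that with nested lists; the variable names and the choice of nested lists to represent a matrix are assumptions for illustration.

```python
# One [street, avenue] vector per person, in the column order of the matrix above.
locations = [[4, 5], [6, 7], [10, 7], [12, 15]]

# zip(*locations) groups all streets together and all avenues together,
# turning the list of column vectors into a list of rows.
us = [list(row) for row in zip(*locations)]
print(us)  # [[4, 6, 10, 12], [5, 7, 7, 15]]

# Reading a single column back out recovers one person's location vector.
third_person = [row[2] for row in us]
print(third_person)  # [10, 7]
```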

2.4) The abstracted coordinate system

When we first introduced the idea of vectors, we discussed it in the sense of streets and avenues on New York’s grid system. In that case, our locations would be described by whole numbers (we can’t be at avenue 10.5).

\[\text{My location: } \begin{pmatrix} 4\\ 5\\ \end{pmatrix}\]

However, if we consider latitude and longitude, it makes sense to describe those values as numbers with a fractional part. For example, this random location I picked in New York has a latitude-longitude of (40.712776, -74.005974).

2.4.1) Counting Numbers

The first example, street-avenue, pertains to the Natural numbers. We say that the street and the avenue, individual elements of our collection, exist in $\mathbb{N}$, the natural numbers (also known as the counting numbers).

2.4.2) Decimal point numbers

In the case of latitude-longitude, the individual elements of our collection exist in $\mathbb{R}$, the real numbers (numbers that can have a fractional part). We write that a scalar $s$ belongs to one of these sets as $s \in \mathbb{N}$ or $s \in \mathbb{R}$, respectively.

2.4.3) Collections of Scalars: Vectors

If we talked about the collection, as opposed to elements within the collection, my street-avenue would then be:

\[\text{My location: } \begin{pmatrix} 4\\ 5\\ \end{pmatrix}\]

such that my location can be described as a vector of two natural numbers, $\text{my location} \in \mathbb{N}^2$. My latitude-longitude can likewise be described as an element of $\mathbb{R}^2$, a vector of two real numbers. If we then added another number, e.g., the shop that I’m in, we would then have

\[\text{My location: } \begin{pmatrix} 4\\ 5\\ 6 \\ \end{pmatrix}\]

and my location can thus be represented as $\text{my location } \in \mathbb{N}^3$. This same concept extends to matrices. Consider our group of friends from earlier:

\[\text{Us: } \begin{pmatrix} 4 & 6 & 10 & 12\\ 5 & 7 & 7 & 15\\ \end{pmatrix}\]

Our locations can then be described as $\text{Us} \in \mathbb{N}^{2 \times 4}$, a matrix of natural numbers with 2 rows and 4 columns. And that’s it for the linear algebra you’ll need for the rest of this series!
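As a small sanity check on the notation, the sketch below reports the dimensions of a matrix stored as nested lists and tests whether every entry is a counting number. The helper names `shape` and `all_natural` are hypothetical, and treating the natural numbers as starting at 1 is a convention assumed here (some texts include 0).

```python
def shape(matrix):
    """Return (rows, columns) for a matrix stored as a list of equal-length rows."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if any(len(row) != cols for row in matrix):
        raise ValueError("every row must have the same number of columns")
    return (rows, cols)


def all_natural(matrix):
    """True if every entry is an integer >= 1 (counting-number convention)."""
    return all(isinstance(v, int) and v >= 1 for row in matrix for v in row)


us = [[4, 6, 10, 12], [5, 7, 7, 15]]
print(shape(us), all_natural(us))  # (2, 4) True

# The latitude-longitude example, written as a column with 2 rows.
lat_lon = [[40.712776], [-74.005974]]
print(shape(lat_lon), all_natural(lat_lon))  # (2, 1) False
```

The first result matches $\mathbb{N}^{2 \times 4}$; the second has real-valued entries, so it belongs in $\mathbb{R}^2$ rather than $\mathbb{N}^2$.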

3) Further Readings / References

1) Zico Kolter’s Linear Algebra Review and Reference - great professor at CMU, and I found this guide to be handy.

2) 3Blue1Brown’s channel