Preliminary Mathematics for online MSc programmes in Data Analytics
Unit 2: Differentiation in 1D (minima and maxima)
Differentiation
Introduction to differentiation
We are often interested in the rate at which some variable is changing. For example, we may be interested in the rate at which the temperature is changing in a chemical reaction or in the rate at which the pressure in a vessel is changing. Rapid rates of change of a variable may indicate that a system is not operating normally and is approaching critical values.
Rates of change may be positive, zero, or negative. A positive rate of change means that the variable is increasing; a zero rate of change means that the variable is not changing; a negative rate of change means that the variable is decreasing.
Consider the function for , shown below.
Between and , the function is decreasing rapidly. Across this interval the rate of change of the function is large and negative. Between and the function is still decreasing but not as rapidly as before. Across this interval the rate of change of the function is small and negative. There is a small interval in which the function seems not to change at all. Across that interval the rate of change is zero. Between and the function is increasing rapidly; the rate of change is large and positive.
It is often not sufficient to describe a rate of change as "large and positive" or "small and negative". A precise value is needed. The technique for calculating the rate of change of any function is called differentiation. Use of differentiation provides a precise value or expression for the rate of change of a function.
Average rate of change across an interval
We have already seen that a function can have different rates of change at different points on its graph. Let's first define and calculate the average rate of change of a function across an interval and later on we will also define the rate of change at a point. The figure below shows a function ; two possible argument values, and , and their two respective outputs and .
Consider that is increasing from to . The change in is . As increases from to , then the function increases from to . The change in is . Then the average rate of change of across the interval is
Another way to think of the average rate of change of a function is by visualising it as the slope of a line that passes through two points on the function. This line, called a secant line, can be drawn on a graph of a function so that we can quantify the value of the slope of the line. A secant line passing through the points and has a vertical rise of and a horizontal run of . The slope of the line, between the points and , is (which is exactly the same as the average rate of change).
Let's calculate the average rate of change of across the following intervals
(a) to (b) to
For the first interval the change in is equal to . When , ; while when , . Thus, the change in is . So, the average rate of change across the interval is . What does this mean though? It means that across the interval , on average the value increases by for every unit increase in .
This is a good time for you to try out the second interval. (The average rate of change turns out to be -2.)
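The calculation above can be sketched in a few lines of Python. This is a minimal illustration only: the helper name `average_rate_of_change` and the example function `f(x) = x**2` are assumptions chosen for illustration, and are not necessarily the function used in the worked example.

```python
def average_rate_of_change(f, x1, x2):
    """Slope of the secant line through (x1, f(x1)) and (x2, f(x2))."""
    if x1 == x2:
        raise ValueError("the interval must have non-zero width")
    return (f(x2) - f(x1)) / (x2 - x1)


# Illustrative function (an assumption, not taken from the notes).
def f(x):
    return x ** 2


rate_increasing = average_rate_of_change(f, 1, 3)    # (9 - 1) / (3 - 1)
rate_decreasing = average_rate_of_change(f, -3, -1)  # (1 - 9) / (-1 + 3)
print(rate_increasing, rate_decreasing)
```

A positive result means the function rises on average across the interval; a negative result means it falls.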
Rate of change at a point
We often need to know the rate of change of a function at a point, and not simply an average rate of change across an interval. Let's assume that is really close to . To better reflect this in our notation, we will call what we used to call , , and what we used to call , , with being a very small number.
As mentioned earlier, the average rate of change of across the interval is
What do you think would happen if we assumed that the distance, , between the two points was made increasingly small (in Mathematics notation )?
If we assumed that, it would mean that the second point is really close to . This is exactly what we will assume in order to find the rate of change at the point . Let's say that we assumed that . If we now focus again on the graph above and assume that , the distance between the two points and would get smaller and likewise the difference between their respective outputs, and , would also get smaller. We can define those respective differences as and respectively. The term reads as "delta x" and represents a small change in the direction. In our case and .
Thus, the rate of change at a point is
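In symbols, the limiting process described above is usually written as follows. The names $f$, $x$, $h$, $\delta x$ and $\delta y$ are assumed here to match the description in the text.

```latex
\[
\text{rate of change at } x
  \;=\; \lim_{\delta x \to 0} \frac{\delta y}{\delta x}
  \;=\; \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
\]
```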
Let's look at a couple of examples first and then focus on terminology and notation.
One of the simplest functions to consider is a linear function. Let's assume that we have .
What should we do if we want to find the rate of change at any point of the function? (We want to essentially answer the question "What is the change in the direction when the change in the direction is small")
Let's use the definition we saw earlier and calculate the rate of change at any point of the function (think of it as looking at the two points and with ).
Wait. The rate of change for the function at any point is 2? What does that mean?
It means that the value increases by for every small increase, , in . So it doesn't matter which value we are looking at (e.g. or ); the value will always increase by for every small increase, , in (i.e. or where ).
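As a worked illustration of this result, suppose the linear function is $f(x) = 2x + 1$ (an assumed example with slope 2; the intercept does not affect the outcome). Applying the definition of the rate of change at a point gives:

```latex
\begin{align*}
\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
  &= \lim_{h \to 0} \frac{\bigl(2(x+h) + 1\bigr) - (2x + 1)}{h} \\
  &= \lim_{h \to 0} \frac{2h}{h} \\
  &= 2
\end{align*}
```

The $x$ cancels completely, which is why the rate of change is the same at every point.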
For non-linear functions a one unit increase in the value of leads to different increases in .
Consider a quadratic function .
Before we use the previous definition and calculate the rate of change at any point, let's try something else.
What will happen to the values:
- if and we increase it by unit (i.e. )? The values will increase by (i.e. ).
- if and we increase it by unit (i.e. )? The values will increase by (i.e. ).
- if and we increase it by unit (i.e. )? The values will increase by (i.e. ).
Thus, in a quadratic function a unit increase in leads to different increases in the values.
Let's now use the definition to find out what is happening in the values when is increased by with (instead of being increased by ).
So, the rate of change for the function at a point is . This means that the value increases by for every small increase, , in . Because the rate of change along a quadratic function changes continually (according to the value of we are looking at), it has to be computed separately at each possible value of . The rate of change is thus a local phenomenon: it does not give us any information about the rate of change globally.
Note that the rate of change, , for the function is itself a function of .
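The idea of shrinking the step can also be explored numerically. The sketch below assumes the quadratic is `f(x) = x**2`, which is an illustrative choice and not necessarily the function used above; the helper name `difference_quotient` is likewise hypothetical.

```python
def difference_quotient(f, x, h):
    """Average rate of change of f between x and x + h."""
    return (f(x + h) - f(x)) / h


# Illustrative quadratic (an assumption, not taken from the notes).
def f(x):
    return x ** 2


x = 3.0
# Shrinking h moves the second point closer to x.
quotients = {h: difference_quotient(f, x, h) for h in (1.0, 0.1, 0.001)}
for h, q in quotients.items():
    print(f"h = {h}: {q}")
```

For this assumed function the quotient equals $2x + h$, so the values approach $2x = 6$ at $x = 3$. Repeating the experiment with another value of `x` gives a different limit, which illustrates that the rate of change of a quadratic depends on the point.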
Terminology and notation
The process of finding the rate of change of a given function is called differentiation. The function is said to be differentiated. If (read " is equivalent to ") is a function of we say that is differentiated with respect to . The rate of change of a function is also known as the derivative of the function.
There is a notation for writing down the derivative of a function. If the function is $y = f(x)$, we denote the derivative of $y$ by $\frac{dy}{dx}$, $\frac{df(x)}{dx}$ or $f'(x)$
(read "dee y (by) dee x", "dee f of x dee x" and "f prime").
This is the point where you should start asking yourselves "Wait a minute, do I have to compute every time I need to find the derivative of a function at a point ?". Thankfully, the answer is no.
Table of derivatives
Table 1 lists some of the common functions used in Mathematics and Statistics and their corresponding derivatives. The symbols and are constants while the symbol represents a variable.
Function | Derivative |
---|---|
constant | |
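For reference, a few standard entries of the kind such a table normally contains are sketched below. The constant names $k$ and $n$ are assumptions and may differ from the symbols used in Table 1.

```latex
\[
\begin{array}{l|l}
\text{Function } f(x) & \text{Derivative } f'(x) \\ \hline
k \ (\text{constant}) & 0 \\
x^{n} & n x^{n-1} \\
k x^{n} & k n x^{n-1} \\
e^{kx} & k e^{kx} \\
\ln(kx) & \dfrac{1}{x} \\
\sin(kx) & k \cos(kx) \\
\cos(kx) & -k \sin(kx)
\end{array}
\]
```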
Find the derivative of .
We note that is of the form where . This means that .
Find the derivative of .
This function is constant, hence its derivative is zero.
Find the derivative of .
This function is of the form with and , hence its derivative is .
Find the derivative of .
We first rewrite the function as . This means that the function is of the form with and . This means that .
Find the derivative of .
We first rewrite the function as . This means that the function is of the form with and . This means that .
Find the derivative of .
This function is of the form with , hence its derivative is .
Ok, that is a good start but what do we do with functions like , and ?
The first function involves adding two functions (the first one being of the form while the second one is a constant function).
The second function, , involves multiplying two functions ( and ) while the last one, , involves dividing two functions ( and ).
We need to introduce some simple rules to enable us to extend the range of functions that we can differentiate.
Rules of differentiation
- Differentiation is linear: For any functions and and any real numbers and , the derivative of the function with respect to is
- Product rule: For any functions and the derivative of a function with respect to is
- Quotient rule: For any functions and the derivative of a function , where , with respect to is
- Chain rule: The derivative of a composite function with respect to is
  What is a composite function, you ask? It is a function that takes another function as its argument. So, instead of having a function that has as its input, we have a function which takes as its input. Thus, it becomes .
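Written out in symbols, the four rules take the following standard forms. The names $f$, $g$, $a$ and $b$ are assumed here and may differ from the notation used elsewhere in the unit.

```latex
\begin{align*}
\text{Linearity:} \quad & \frac{d}{dx}\bigl(a f(x) + b g(x)\bigr) = a f'(x) + b g'(x) \\
\text{Product rule:} \quad & \frac{d}{dx}\bigl(f(x)\, g(x)\bigr) = f'(x)\, g(x) + f(x)\, g'(x) \\
\text{Quotient rule:} \quad & \frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^{2}}, \quad g(x) \neq 0 \\
\text{Chain rule:} \quad & \frac{d}{dx} f\bigl(g(x)\bigr) = f'\bigl(g(x)\bigr)\, g'(x)
\end{align*}
```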
Function | Derivative |
---|---|
Find the derivative of .
This function is of the form with , , and . Hence, and , which yields
(We could have also used , , and .)
Find the derivative of .
This function is of the form with , , and . Hence, and , which yields
Find the derivative of .
This function is of the form with , , and . Hence, and , which yields
Find the derivative of .
This function is of the form with and . Hence, , and
Find the derivative of .
This function is of the form with and . Hence, , and
Find the derivative of .
This function is of the form with and . Hence, , and
Find the derivative of .
This function is of the form with and . Hence, , and
Find the derivative of .
We could first expand the power, but this would be extremely time-consuming. It is a lot easier to view with and .
The formula for the derivative of is , i.e. we have to first differentiate both and yielding and .
Next we need to find , i.e. we have to use the value of as the input to . We do this by replacing every occurrence of in by :
Thus,
Find the derivative of .
We will again use the chain rule and write with and . Then and and thus
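A chain rule result can be sanity-checked numerically by comparing it with a finite-difference estimate. The sketch below uses an assumed composite function, $(3x + 1)^5$, which is not one of the examples above; all function names are hypothetical.

```python
def central_difference(func, x, h=1e-6):
    """Numerical estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Illustrative composite function (an assumption, not taken from the notes):
# outer function f(u) = u**5, inner function g(x) = 3*x + 1.
def g(x):
    return 3 * x + 1


def g_prime(x):
    return 3


def f_prime(u):
    return 5 * u ** 4


def composite(x):
    return g(x) ** 5


def chain_rule_derivative(x):
    # Chain rule: f'(g(x)) * g'(x)
    return f_prime(g(x)) * g_prime(x)


x = 0.5
exact = chain_rule_derivative(x)
estimate = central_difference(composite, x)
print(exact, estimate)
```

The two printed values should agree to several decimal places; a large discrepancy would suggest a mistake in the hand-derived derivative.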
Higher-order derivatives
So far we have only looked at the first derivative. The derivative of is known as the second derivative and denoted as
While the first derivative contains information about the rate of change, which corresponds to the slope of a function, the second derivative contains information about the curvature.
Find the first and second derivative of .
The first derivative is . To find the second derivative, we differentiate the first derivative once more.
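As an illustration with an assumed function, $f(x) = x^3$ (not necessarily the one used in the example above), differentiating twice gives:

```latex
\begin{align*}
f(x) &= x^{3} \\
f'(x) = \frac{df}{dx} &= 3x^{2} \\
f''(x) = \frac{d^{2}f}{dx^{2}} &= 6x
\end{align*}
```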
Approximating functions using derivatives
We can use derivatives to approximate functions. We have already seen that the derivative gives the slope of the function at .
The line that corresponds to the slope at is actually a function itself. The purple line in the figure above is the graph of the function
This function provides a local approximation to the function around . The approximation touches the function at so that both functions take the same value and have the same slope at .
Can we do a better job at approximating the function ? The answer is yes: we can include a term involving the second derivative, so that we also match the curvature in .
We can keep adding higher-order derivatives in order to improve the approximation; this is known as a Taylor series approximation (or a Maclaurin series approximation when the expansion is around zero).
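The first- and second-order approximations can be compared numerically. In the sketch below the linear approximation is $f(x_0) + f'(x_0)(x - x_0)$ and the quadratic one adds $\tfrac{1}{2} f''(x_0)(x - x_0)^2$. The function $f(x) = e^x$, the expansion point `x0 = 0` and the helper names are assumptions chosen for illustration.

```python
import math


def linear_approximation(f0, f1, x0, x):
    """First-order approximation using the value f0 and slope f1 at x0."""
    return f0 + f1 * (x - x0)


def quadratic_approximation(f0, f1, f2, x0, x):
    """Second-order approximation, which also matches the curvature f2 at x0."""
    return f0 + f1 * (x - x0) + 0.5 * f2 * (x - x0) ** 2


# Illustrative function (an assumption): f(x) = exp(x), whose first and
# second derivatives are both exp(x).
x0 = 0.0
f0 = f1 = f2 = math.exp(x0)

x = 0.5
true_value = math.exp(x)
linear_error = abs(true_value - linear_approximation(f0, f1, x0, x))
quadratic_error = abs(true_value - quadratic_approximation(f0, f1, f2, x0, x))
print(linear_error, quadratic_error)
```

For this assumed function the quadratic approximation is noticeably closer to the true value than the linear one at `x = 0.5`, and both errors shrink as `x` moves towards `x0`.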
Tasks
Find the derivative of ?
We can differentiate the two terms independently of each other with the second term having a derivative of zero. Thus,
Find the derivative of ?
Let and be functions that are differentiable everywhere.
Suppose that , , , and .
Use this information to determine the value of , where .
The slope of the curve is zero at which value(s) of ?
We first need to find . Setting and solving for yields .
What is the derivative of ?
is of the form . Using the quotient rule gives the derivative .
What is the derivative of ?
Using the product rule,
Self help
Basic differentiation: a refresher
Maximum and minimum values
Local and global extrema
The maximum and minimum values of a function are often very important. For example, we may want to know what value a parameter needs to take so that an algorithm performs best, as measured by an objective function.
It is important to distinguish between two types of maxima and minima.
- Local maxima (minima) are points at which the function takes larger (smaller) values than in its vicinity.
- Global maxima (minima) are points at which the function takes its largest (smallest) value.
If it is clear from the context, local maxima (minima) are often just referred to as maxima (minima), without prefixing them by the word "local".
Differentiation and stationary points
Differentiation can be used to find the maximum and minimum values of a function. Since the derivative provides information about the slope (or gradient) of the graph of a function we can use it to locate points on a graph where the slope is zero. We will see that such points are often associated with the largest or smallest values of the function, at least in their immediate locality.
Consider the function . We may be interested in finding its maximum and minimum values.
We can see in the graph that the slope of the function is in and . Such points, at which the tangent to the graph is horizontal and the slope is therefore zero, are called stationary points. You can also say that the rate of change of a function at stationary points is zero.
At the local maximum at the function takes a larger value than in its vicinity. Note that this is not a global maximum, as the function takes larger values for large (as as ). At the local minimum at the function takes a smaller value than in its vicinity. Again, this is not a global minimum as the function takes smaller values for small (as as ).
You can probably notice by looking at the graph that the curve actually turns at the stationary points. As an example, let's focus on the local maximum at . We can see that the curve goes up right before it reaches the local maximum and then it goes down. The exact opposite happens at the local minimum. Thus, these stationary points are also referred to as turning points.
Drawing a graph of a function as above will reveal its behaviour, but if we want to know the precise location of such points we need to turn to algebra and differential calculus.
We have seen that the local maximum and local minimum are stationary points, i.e. we can find their exact location by solving the equation .
Solving
yields the roots and , which correspond to the local maximum and minimum, respectively.
All turning points are stationary points, but not all stationary points are turning points. Can you draw a graph of a function that has a stationary point that is not a turning point? (Hint: Try to draw a graph that has a slope of zero at a specific point but whose curve behaves the same way, for example keeps increasing, on both sides of that point.)
Distinguishing between stationary points
Think about what happens to the slope of the graph from Example 20 as we travel through the minimum turning point, from left to right, that is as increases. To the left of the local minimum, right before , the slope is negative; then the slope becomes zero, and right after the minimum point the slope becomes positive. In other words, the slope is increasing as increases, which means that the second derivative is positive.
To summarise, if we want to find maximum or minimum values we can:
- locate the position of stationary points, let's say , by looking for points where , and
- calculate the second derivative at those values (i.e. ).
If the second derivative is positive, then the stationary point is a minimum. If the second derivative is negative, then the stationary point is a maximum.
It is possible for the second derivative (at a stationary point) to be equal to zero; in that case the second derivative alone does not give us sufficient information about what kind of stationary point it is.
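The two-step procedure above can be sketched in Python. The cubic $f(x) = x^3 - 3x$ is an assumed example (not necessarily the function plotted earlier), and the helper names are hypothetical.

```python
import math


# Illustrative cubic (an assumption): f(x) = x**3 - 3*x,
# so f'(x) = 3*x**2 - 3 and f''(x) = 6*x.
def f_second(x):
    return 6 * x


def stationary_points():
    """Solve f'(x) = 3*x**2 - 3 = 0 with the quadratic formula."""
    a, b, c = 3.0, 0.0, -3.0
    root = math.sqrt(b ** 2 - 4 * a * c)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def classify(x_star):
    """Second derivative test at a stationary point x_star."""
    curvature = f_second(x_star)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    return "inconclusive"


results = {x_star: classify(x_star) for x_star in stationary_points()}
print(results)
```

For this assumed cubic the sketch reports a local maximum at $x = -1$ and a local minimum at $x = 1$. The `"inconclusive"` branch corresponds to the case where the second derivative is zero.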
Tasks
Can you find the stationary points of the following functions and distinguish between them?
(a)
(b)
(c)
(d)
(a) . Setting yields . , thus there is a local (and also global) minimum at .
(b) . As we have factorised the already, we know that it is zero for and .
Taking the second derivative gives . As , there is a local minimum at . As , there is a local maximum at .
(c) . is thus zero for and .
, thus and . Hence there is a local minimum at . We don't know yet about . Taking the third derivative gives and thus , thus there is a saddle point at .
We can confirm this by plotting the function.
(d) , hence the derivative is zero for , and .
The second derivative is yielding (local minimum at ), (local maximum at ) and (local minimum at ).
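When the second derivative is zero at a stationary point, as in part (c), one alternative to taking further derivatives is to check the sign of the first derivative on either side of the point. The sketch below is a minimal illustration under stated assumptions: the functions $x^3$ and $x^4$ are illustrative and not taken from the tasks, the helper name is hypothetical, and the step `eps` is assumed to be small enough that no other stationary point lies within it.

```python
def classify_by_sign_change(f_prime, x_star, eps=1e-3):
    """Classify a stationary point by the sign of f' just left and right of it."""
    left = f_prime(x_star - eps)
    right = f_prime(x_star + eps)
    if left < 0 < right:
        return "local minimum"
    if left > 0 > right:
        return "local maximum"
    return "saddle point"


# Illustrative derivatives (assumptions, not taken from the tasks):
# f(x) = x**3 has f'(x) = 3*x**2; f(x) = x**4 has f'(x) = 4*x**3.
# Both have f''(0) = 0, so the second derivative test is inconclusive at 0.
cubic_result = classify_by_sign_change(lambda x: 3 * x ** 2, 0.0)
quartic_result = classify_by_sign_change(lambda x: 4 * x ** 3, 0.0)
print(cubic_result, quartic_result)
```

The slope of $x^3$ is positive on both sides of zero, so the point is not a turning point, whereas the slope of $x^4$ changes from negative to positive, which indicates a minimum.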