Lagrange multipliers Part One: A much simpler setting

I am currently in northern Hungary for our annual winter school for some of the strongest young school-aged mathematicians in the UK and Hungary. We’ve had a mixture of lectures, problem-solving sessions and the chance to enjoy a more authentic version of winter than is currently on offer in balmy Oxford.

One of my favourite aspects of this event is the chance it affords for the students and the staff to see a slightly different mathematical culture. It goes without saying that Hungary has a deep tradition in mathematics, and the roots start at school. The British students observe fairly rapidly that their counterparts have a much richer diet of geometry, and methods in combinatorics at school, which is certainly an excellent grounding for use in maths competitions. By contrast, our familiarity with calculus is substantially more developed – by the time students who study further maths leave school, they can differentiate almost anything.

But the prevailing attitude in olympiad circles is that calculus is unrigorous and hence illegal method. The more developed summary is that calculus methods are hard, or at least technical. This is true, and no-one wants to spoil a measured development of analysis from first principles, but since some of the British students asked, it seemed worth giving a short exposition of why calculus can be made rigorous. They are mainly interested in the multivariate case, and the underlying problem is that the approach suggested by the curriculum doesn’t generalise well at all to the multivariate setting. Because it’s much easier to imagine functions of one variable, we’ll develop the machinery of the ideas in this setting in this post first.

Finding minima – the A-level approach

Whether in an applied or an abstract setting, the main use of calculus at school is to find where functions attain their maximum or minimum. The method can be summarised quickly: differentiate, find where the derivative is zero, and check the second-derivative at that value to determine that the stationary point has the form we want.

Finding maxima and finding minima are a symmetric problem, so throughout, we talk about finding minima. It’s instructive to think of some functions where the approach outlined above fails.

In the top left, there clearly is a minimum, but the function is not differentiable at the relevant point. We can probably assert this without defining differentiability formally: there isn’t a well-defined local tangent at the minimum, so we can’t specify the gradient of the tangent. In the top right, there’s a jump, so depending on the value the function takes at the jump point, maybe there is a minimum. But in either case, the derivative doesn’t exist at the jump point, so our calculus approach will fail.

In the middle left, calculus will tell us that the stationary point in the middle is a ‘minimum’, but it isn’t the minimal value taken by the function. Indeed the function doesn’t have a minimum, because it seems to go off to $-\infty$ in both directions. In the middle right, the asymptote provides a lower bound on the values taken by the function, but this bound is never actually achieved. Indeed, we wouldn’t make any progress by calculus, since there are no stationary points.

At the bottom, the functions are only defined on some interval. In both cases, the minimal value is attained at one of the endpoints of the interval, even though the second function has a point which calculus would identify as a minimum.

The underlying problem in any calculus argument is that the derivative, if it exists, only tells us about the local behaviour of the function. At best, it tells us that a point is a local minimum. This is at least a necessary condition to be a global minimum, which is what we actually care about. But this is a change of emphasis from the A-level approach, for which having zero derivative and appropriately-signed second-derivative is treated as a sufficient condition to be a global minimum.

Fortunately, the A-level approach is actually valid. It can be shown that if a function is differentiable everywhere, and it only has one stationary point, where the second-derivative exists and is positive, then this is in fact the global minimum. The first problem is that this is really quite challenging to show – since in general the derivative might not be continuous, although it might have many of the useful properties of a continuous function. Showing all of this really does require setting everything up carefully with proper definitions. The second problem is that this approach does not generalise well to multivariate settings.

Finding minima – an alternative recipe

What we do is narrow down the properties which the global minimum must satisfy. Here are some options:

0) There is no global minimum. For example, the functions pictured in the middle row satisfy this.

Otherwise, say the global minimum is attained at x. It doesn’t matter if it is attained at several points. At least one of the following options must apply to each such x.

1) $f'(x)=0$ ,

2) $f'(x)$ is not defined,

3) x lies on the boundary of the domain where f is defined.

We’ll come back to why this is true. But with this decomposition, the key to identifying a global minimum via calculus is to eliminate options 0), 2) and 3). Hopefully we can eliminate 2) immediately. If we know we can differentiate our function everywhere, then 2) couldn’t possibly hold for any value of x. Sometimes we will be thinking about functions defined everywhere, in which case 3) won’t matter. Even if our function is defined on some interval, this only means we have to check two extra values, and this isn’t such hard work.

It’s worth emphasising why if x is a local minimum not on the boundary and f'(x) exists, then f'(x)=0. We show that if $f'(x)\ne 0$ , then x can’t be a local minimum. Suppose f'(x)>0. Then both the formal definition of derivative, and the geometric interpretation in terms of the gradient of a tangent which locally approximates the function, give that, when h is small,

$f(x-h) = f(x)-h f'(x) +o(h),$

where this ‘little o’ notation indicates that for small enough h, the final term is much smaller than the second term. So for small enough h, $f(x-h)<f(x)$ , and so we don’t have a local minimum.

The key is eliminating option 0). Once we know that there definitely is a global minimum, we are in a good position to identify it using calculus and a bit of quick checking. But how would we eliminate option 0)?

Existence of global minima

This is the point where I’m in greatest danger of spoiling first-year undergraduate course content, so I’ll be careful.

As we saw in the middle row, when functions are defined on the whole real line, there’s the danger that they can diverge to $\pm \infty$ , or approach some bounding value while never actually attaining it. So life gets much easier if you work with functions defined on a closed interval. We also saw what can go wrong if there are jumps, so we will assume the function is continuous, meaning that it has no jumps, or that as y gets close to x, f(y) gets close to f(x). If you think a function can be differentiated everywhere, then it is continuous, because we’ve seen that once a function has a jump (see caveat 2) then it certainly isn’t possible to define the derivative at the jump point.

It’s a true result that a continuous function defined on a closed interval is bounded and attains its bounds. Suppose such a function takes arbitrarily large values. The main idea is that if the function takes arbitrarily large values throughout the interval, then because the interval is finite it also takes arbitrarily large values near some point, which will make it hard to be continuous at that point. You can apply a similar argument to show that the function can’t approach a threshold without attaining it somewhere. So how do you prove that this point exists? Well, you probably need to set up some formal definitions of all the properties under discussion, and manipulate them carefully. Which is fine. If you’re still at school, then you can either enjoy thinking about this yourself, or wait until analysis courses at university.

My personal opinion is that this is almost as intuitive as the assertion that if a continuous function takes both positive and negative values, then it has a zero somewhere in between. I feel if you’re happy citing the latter, then you can also cite the behaviour of continuous functions on closed intervals.

Caveat 2) It’s not true to say that if a function doesn’t have jumps then it is continuous. There are other kinds of discontinuity, but in most contexts these are worse than having a jump, so it’s not disastrous in most circumstances to have this as your prime model of non-continuity.

Worked example

Question 1 of this year’s BMO2 was a geometric inequality. I’ve chosen to look at this partly because it’s the first question I’ve set to make it onto BMO, but mainly because it’s quite hard to find olympiad problems which come down to inequalities in a single variable.

Anyway, there are many ways to parameterise and reparameterise the problem, but one method reduces, after some sensible application of Pythagoras, to showing

$f(x)=x+ \frac{1}{4x} + \frac{1}{4x+\frac{1}{x}+4}\ge \frac{9}{8},$ (*)

for all positive x.

There are simpler ways to address this than calculus, especially if you establish or guess that the equality case is x=1/2. Adding one to both sides is probably a useful start.

But if you did want to use calculus, you should argue as follows. (*) is certainly true when $x\ge \frac{9}{8}$ and also when $x\le \frac{2}{9}$. The function f(x) is continuous, and so on the interval $[\frac{2}{9},\frac{9}{8}]$ it has a minimum somewhere. We can differentiate, and fortunately the derivative factorises (this might be a clue that there’s probably a better method…) as

$(1-\frac{1}{4x^2}) \left[ 1 - \frac{4}{(4x+\frac{1}{x}+4)^2} \right].$

If x is positive, the second bracket can’t be zero, so the only stationary point is found at x=1/2. We can easily check that $f(\frac12)=\frac98$ , and we have already seen that $f(\frac29),f(\frac98)>\frac98$ . We know f attains its minimum on $[\frac29,\frac98]$ , and so this minimal value must be $\frac98$, as we want.

Overall, the moral of this approach is that even if we know how to turn the handle both for the calculation, and for the justification, it probably would be easier to use a softer approach if possible.

Next stage

For the next stage, we assess how much of this carries across to the multivariate setting, including Lagrange multipliers to find minima of a function subject to a constraint.

Eventually Almost Everywhere

A blog about probability and olympiads by Dominic Yeo

Lagrange multipliers Part One: A much simpler setting

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply