# Lagrange multipliers Part Two

My own question on last week’s BMO2 notwithstanding, inequalities seem out of fashion at the moment among mainstream international olympiads. Such problems often involve minimising some function subject to a constraint, and word has, over the years, filtered down to students interested in such things, that there’s a general method for achieving this via Lagrange multipliers. The motivation for my talk in Hungary, summarised by the previous post and this one, is to dispute the claim made by some of the UK students that these are hard to justify rigorously. I dispute this because I don’t think it’s qualitatively much harder to justify Lagrange multipliers rigorously than an unconstrained optimisation problem, whereas I would claim instead that Lagrange multipliers are merely unlikely to work at a computational level on the majority of olympiad problems.

Unconstrained optimisation in two variables

Before we can possibly discuss constrained optimisation, we should discuss unconstrained optimisation. That is, finding minima of a function of several variables. We don’t lose too much by assuming that our function f(x,y) depends on two variables.

Recall that our method in the previous post for justifying the A-level approach to minima was to find a necessary condition to be a local minimum, and also a general reason why there should be a global minimum. That way, if there are finitely many points satisfying the condition, we just check all of them, and the one with the smallest value of f is the global minimum. We’ll discuss the existence of the global minimum later.

If we hold one coordinate fixed, the local variation of a function is equivalent to the one-dimensional case.

$f(x+h,y)-f(x,y)= h\frac{\partial f}{\partial x}(x,y)+O(h^2).$

In general we want to vary both variables, which is fine since

$f(x+h,y+\ell)-f(x+h,y)=\ell \frac{\partial f}{\partial y}(x+h,y) + O(h^2).$

But since we really want everything to be determined by the function at (x,y), we really want

$\frac{\partial f}{\partial y}(x+h,y) \approx \frac{\partial f}{\partial y}(x,y),$

and so we be mindful that we may have to assume that both partial derivatives are continuous everywhere they exist. Once we have this though, we can rewrite as

$f(x+h,y+\ell) - f(x,y)= h\frac{\partial f}{\partial x}(x,y) + \ell \frac{\partial f}{\partial y}(x,y) + O(h^2)$

$= (\frac{\partial f}{\partial x}(x,y) ,\frac{\partial f}{\partial y}(x,y) )\cdot (h,\ell) + O(h\vee \ell^2).$

In particular, if we define grad of f to be $\nabla f(x,y)=(\frac{\partial f}{\partial x}(x,y), \frac{\partial f}{\partial y}(x,y) )$ and apply a similar argument to that which we used in the original setting. If $\nabla f(x,y)\ne 0$, then we can choose some small $(h,\ell)$, such that $f(x+h,y+\ell). Thus a necessary condition for (x,y) to be a local minimum for f is that $\nabla f=0$.

Lagrange multipliers

This is natural time to discuss where Lagrange multipliers emerge. The setting now is that we still want to minimise some function $f(\mathbf{x})$, but only across those values of $\mathbf{x}$ which satisfy the constraint $g(\mathbf{x})=0$.

But our approach is exactly the same, namely we find a necessary condition to be a local minimum subject to the condition. As before, we have

$f(\mathbf{x}+\mathbf{h}) - f(\mathbf{x}) = \mathbf{h}\cdot \nabla f + O(|\mathbf{h}|^2),$

but we are only interested in those small vectors $\mathbf{h}$ for which $\mathbf{x}+\mathbf{h}$ actually satisfies the constraint, namely $g(\mathbf{x}+\mathbf{h})=0$. But then

$0 = g(\mathbf{x}+\mathbf{h})- g(\mathbf{x})=\mathbf{h}\cdot \nabla g + O(|\mathbf{h}|^2).$

From this, we conclude that the set of small relevant $\mathbf{h}$ is described by $\mathbf{h}\nabla g=O(|\mathbf{h}|^2)$. And now we really can revert to the original argument. If there’s a small $\mathbf{h}$ such that $\mathbf{h}\cdot \nabla g=0$ but $\mathbf{h}\cdot \nabla f \ne 0$, then we can find some $\mathbf{h'_+},\mathbf{h'_-}=\pm \mathbf{h}+O(|\mathbf{h}|^2)$ such that at least one of $f(\mathbf{h'_+}),f(\mathbf{h'_-}).

So a necessary condition to be a constrained local minimum is that every vector which is perpendicular to $\nabla g$ must also be perpendicular to $\nabla f$. From which it follows that these two vectors must be parallel, that is $\nabla f(\mathbf{x})=\lambda \nabla g(\mathbf{x})$, where $\lambda$ is the so-called Lagrange multiplier. Of course we must also have that $g(\mathbf{x})=0$, and so we have a complete characterisation for a necessary condition that the constrained optimisation has a local minimum at $\mathbf{x}$, assuming that all the derivatives of both f and g exist with suitable regularity near $\mathbf{x}$.

Technicalities

The point of setting up the one-variable case in the unusual way in the previous post was to allow me to say at this stage: “it’s exactly the same”. Well, we’ve already seen an extra differentiability condition we might require, but apart from that, the same approach holds. Multi-variate continuous functions also attain their bounds when the domain is finite and includes its boundary.

Checking the boundary might be more complicated in this setting. If the underlying domain is $\{x,y,z\ge 0\}$, then one will have to produce a separate argument for why the behaviour when at least one of the variables is zero fits what you are looking for. Especially in the constrained case, it’s possible that the surface corresponding to the constraint doesn’t actually have a boundary, for example if it is the surface of a sphere. Similarly, checking that the objective function gets large as the variables diverge to infinity can be annoying, as there are many ‘directions’ down which to diverge to infinity.

Motivating Cauchy-Schwarz

In practise, you probably want to have such methods in hand as a last resort on olympiad problems. It’s always possible that something will slip through the net, but typically problem-setters are going to trying to ensure that their problems are not amenable to mindless application of non-elementary methods. And even then, one runs the risk of accusations of non-rigour if you don’t state exact, precise results which justify everything which I’ve presented above.

One thing that can be useful, on the other hand, is to observe that the Lagrange multiplier condition looks a lot like the equality condition for Cauchy-Schwarz. So, even if you can’t solve the family of Lagrange multiplier ‘equations’, this does suggest that applying Cauchy-Schwarz to the vectors involved might give you some insight into the problem.

The following inequality from the IMO 2007 shortlist is a good example.

Suppose $a_1,\ldots,a_{100}\ge 0$ satisfy $a_1^2+\ldots+a_{100}^2=1$. Prove that

$a_1^2a_2+a_2^2a_3+\ldots + a_{100}^2 a_1<\frac{12}{25}.$

We shouldn’t be perturbed by the strictness. Maybe we’ll end up showing a true bound in terms of surds that is less neat to write down…

Anyway, applying Lagrange multipliers would require us to solve

$a_{k-1}^2 + 2a_ka_{k+1}=2\lambda a_k,\quad k=1,\ldots,100,$

with indices taken modulo 100. As so often with these cyclic but non-symmetric expressions, this looks quite hard to solve. However, it turns out that by applying Cauchy-Schwarz to the vectors $(a_k),(a_{k-1}^2+2a_ka_{k+1})$ gets us a long way into the problem by classical means. Working all the way through is probably best left as an exercise.

1. Regarding the problem at the end, I don’t quite follow. Let $a_{100} = 1$ and $a_k = 0$ for $k = 1, \dots, 99$. This clearly satisfies the constraint. Moreover, the sum is now 1, which is bigger than 12/25. Did I do something wrong, or is there a typo somewhere?