Chains and antichains

I’ve recently been at the UK-Hungary winter olympiad camp in Tata, for what is now my sixth time. As well as doing some of my own work, have enjoyed the rare diversion of some deterministic combinatorics. It seems to be a local variant of the pigeonhole principle that given six days at a mathematical event in Hungary, at least one element from {Ramsay theory, Erdos-Szekeres, antichains in the hypercube} will be discussed, with probability one. On this occasion, all were discussed, so I thought I’d write something about at least one of them.

Posets and directed acyclic graphs

This came up on the problem set constructed by the Hungarian leaders. The original formulation asked students to show that among any 17 positive integers, there are either five such that no one divides any other, or five such that among any pair, one divides the other.

It is fairly clear why number theory plays little role. We assign the given integers to the vertices of a graph, and whenever a divides b, we add a directed edge from the vertex corresponding to a to the vertex corresponding to b. Having translated the given situation into a purely combinatorial statement, fortunately we can translate the goal into the same language. If we can find a chain of four directed edges (hence five vertices – beware confusing use of the word ‘length’ here) then we have found the second possible option. Similarly, if we can find an antichain, a set of five vertices with no directed edges between them, then we have found the first possible option.

It’s worth noting that the directed graph we are working with with is transitive. That is, whenever there is an edge a->b and b->c, there will also be an edge a->c. This follows immediately from the divisibility condition. There are also no directed cycles in the graph, since otherwise there would be a cycle of integers where each divided its successor. But of course, when a divides b and these are distinct positive integers, this means that b is strictly larger than a, and so this relation cannot cycle.

In fact, among a set of positive integers, divisibility defines a partial order, which we might choose to define as any ordering whether the associated directed graph is transitive and acyclic, although obviously we could use language more naturally associated with orderings. Either way, from now on we consider posets and the associated DAGs (directed acyclic graphs) interchangeably.

Dilworth’s theorem

In the original problem, we are looking for either a large chain, or a large antichain. We are trying to prove that it’s not possible to have largest chain size at most four, and largest antichain size at most four when there are 17 vertices, so we suspect there may some underlying structure: in some sense perhaps the vertex set is the ‘product’ of a chain and an antichain, or at least a method of producing antichains from a single vertex.

Anyway, one statement of Dilworth’s theorem is as follows:

Statement 1: in a poset with nm+1 elements, there is either a chain of size n+1, or an antichain of size m+1.

Taking n=m=4 immediately finishes the original problem about families of divisors. While this is the most useful statement here, it’s probably not the original, which says the following:

Statement 2: in a poset, there exists \mathcal{C} a decomposition into chains, and an antichain A such that |\mathcal{C}|=|A|.

Remark 1: Note that for any decomposition into chains and any antichain, we have |\mathcal{C}|\ge |A|, since you can’t have more than one representative from any chain in the antichain. So Statement 2 is saying that equality does actually hold.

Remark 2: Statement 1 follows immediately from Statement 2. If all antichains had size at most m, then there’s a decomposition into at most m chains. But each chain has size n, so the total size of the graph is at most mn. Contradiction.

Unsuccessful proof strategies for Dilworth

Since various smart young people who didn’t know the statement or proof of Dilworth’s theorem attempted to find it (in the form of Statement 1, and in a special case) in finite time conditions, it’s easy to talk about what doesn’t work, and try to gain intellectual value by qualifying why.

  • Forgetting directions: in general one might well attack a problem by asking whether we have more information than we need. But ignoring the directions of the edges is throwing away too much information. After doing this, antichains are fine, but maybe you need to exhibit some undirected ‘chains’. Unless these undirected chains are much longer than you are aiming for, you will struggle to reconstruct directed chains out of them.
  • Where can the final vertex go?: in a classic trope, one might exhibit a directed graph on nm vertices with neither a chain of size n+1 nor an antichain of size m+1. We attempt to argue that this construction is essentially unique, and that it goes wrong when we add an extra vertex. As a general point, it seems unlikely to be easier to prove that exactly one class of configurations has a given property in the nm case, than to prove no configurations has the same property in the nm+1 case. A standalone proof of uniqueness is likely to be hard, or a disguised rehash of an actual proof of the original statement.
  • Removing a chain: If you remove a chain of maximal length, then, for contradiction, what you have left is m(n-1)+1 vertices. If you have a long chain left, then you’re done, although maximality has gone wrong somewhere. So you have an antichain size n in what remains. But it’s totally unclear why it should be possible to extend the antichain with one of the vertices you’ve just removed.

An actual proof of Dilworth (Statement 1), and two consequences

This isn’t really a proof, instead a way of classifying the vertices in the directed graph so that this version of Dilworth. As we said earlier, we imagine there may be some product structure. In particular, we expect to be able to find a maximal chain, and a nice antichain associated to each element of the maximal chain.

dilworth-thmWe start by letting V_0 consist of all the vertices which are sources, that is, have zero indegree. These are minima in the partial ordering setting. Now let V_1 consist of all vertices whose in-neighbourhood is entirely contained in V_0, that is they are descendents only of V_0. Then let V_2 consist of all remaining vertices whose in-neighourhood is entirely contained in V_0\cup V_1 (but not entirely in V_0, otherwise it would have already been treated), and so on. We end up with what one might call an onion decomposition of the vertices based on how far they are from the sources. We end up with V_0,V_1,\ldots,V_k, and then we can find a chain of size k+1 by starting with any vertex in V_k and constructing backwards towards the source. However, this is also the largest possible size of a chain, because every time we move up a level in the chain, we must move from V_i to V_j where j>i.

It’s easy to check that each V_i is an antichain, and thus we can read off Statement 1. A little more care, and probably an inductive argument is required to settle Statement 2.

We have however proved what is often called the dual of Dilworth’s theorem, namely that in a poset there exists a chain C, and a decomposition into a collection \mathcal{A} of antichains, for which |C|=|\mathcal{A}|.

Finally, as promised returning to Erdos-Szekeres, if not to positive integers. We apply Dilworth Statement 1 to a sequence of m^2+1 real numbers a_0,a_1,\ldots,a_{m^2}, with the ordering a_i\rightarrow a_j if i\le j and a_i\le a_j. Chains correspond to increasing subsequences, and antichains to decreasing subsequences, so we have shown that there is either a monotone subsequence of length m+1.


Lagrange multipliers Part One: A much simpler setting

I am currently in northern Hungary for our annual winter school for some of the strongest young school-aged mathematicians in the UK and Hungary. We’ve had a mixture of lectures, problem-solving sessions and the chance to enjoy a more authentic version of winter than is currently on offer in balmy Oxford.

One of my favourite aspects of this event is the chance it affords for the students and the staff to see a slightly different mathematical culture. It goes without saying that Hungary has a deep tradition in mathematics, and the roots start at school. The British students observe fairly rapidly that their counterparts have a much richer diet of geometry, and methods in combinatorics at school, which is certainly an excellent grounding for use in maths competitions. By contrast, our familiarity with calculus is substantially more developed – by the time students who study further maths leave school, they can differentiate almost anything.

But the prevailing attitude in olympiad circles is that calculus is unrigorous and hence illegal method. The more developed summary is that calculus methods are hard, or at least technical. This is true, and no-one wants to spoil a measured development of analysis from first principles, but since some of the British students asked, it seemed worth giving a short exposition of why calculus can be made rigorous. They are mainly interested in the multivariate case, and the underlying problem is that the approach suggested by the curriculum doesn’t generalise well at all to the multivariate setting. Because it’s much easier to imagine functions of one variable, we’ll develop the machinery of the ideas in this setting in this post first.

Finding minima – the A-level approach

Whether in an applied or an abstract setting, the main use of calculus at school is to find where functions attain their maximum or minimum. The method can be summarised quickly: differentiate, find where the derivative is zero, and check the second-derivative at that value to determine that the stationary point has the form we want.

Finding maxima and finding minima are a symmetric problem, so throughout, we talk about finding minima. It’s instructive to think of some functions where the approach outlined above fails.

20160131_124059In the top left, there clearly is a minimum, but the function is not differentiable at the relevant point. We can probably assert this without defining differentiability formally: there isn’t a well-defined local tangent at the minimum, so we can’t specify the gradient of the tangent. In the top right, there’s a jump, so depending on the value the function takes at the jump point, maybe there is a minimum. But in either case, the derivative doesn’t exist at the jump point, so our calculus approach will fail.

In the middle left, calculus will tell us that the stationary point in the middle is a ‘minimum’, but it isn’t the minimal value taken by the function. Indeed the function doesn’t have a minimum, because it seems to go off to -\infty in both directions. In the middle right, the asymptote provides a lower bound on the values taken by the function, but this bound is never actually achieved. Indeed, we wouldn’t make any progress by calculus, since there are no stationary points.

At the bottom, the functions are only defined on some interval. In both cases, the minimal value is attained at one of the endpoints of the interval, even though the second function has a point which calculus would identify as a minimum.

The underlying problem in any calculus argument is that the derivative, if it exists, only tells us about the local behaviour of the function. At best, it tells us that a point is a local minimum. This is at least a necessary condition to be a global minimum, which is what we actually care about. But this is a change of emphasis from the A-level approach, for which having zero derivative and appropriately-signed second-derivative is treated as a sufficient condition to be a global minimum.

Fortunately, the A-level approach is actually valid. It can be shown that if a function is differentiable everywhere, and it only has one stationary point, where the second-derivative exists and is positive, then this is in fact the global minimum. The first problem is that this is really quite challenging to show – since in general the derivative might not be continuous, although it might have many of the useful properties of a continuous function. Showing all of this really does require setting everything up carefully with proper definitions. The second problem is that this approach does not generalise well to multivariate settings.

Finding minima – an alternative recipe

What we do is narrow down the properties which the global minimum must satisfy. Here are some options:

0) There is no global minimum. For example, the functions pictured in the middle row satisfy this.

Otherwise, say the global minimum is attained at x. It doesn’t matter if it is attained at several points. At least one of the following options must apply to each such x.

1) f'(x)=0,

2) f'(x) is not defined,

3) x lies on the boundary of the domain where f is defined.

We’ll come back to why this is true. But with this decomposition, the key to identifying a global minimum via calculus is to eliminate options 0), 2) and 3). Hopefully we can eliminate 2) immediately. If we know we can differentiate our function everywhere, then 2) couldn’t possibly hold for any value of x. Sometimes we will be thinking about functions defined everywhere, in which case 3) won’t matter. Even if our function is defined on some interval, this only means we have to check two extra values, and this isn’t such hard work.

20160131_124716It’s worth emphasising why if x is a local minimum not on the boundary and f'(x) exists, then f'(x)=0. We show that if f'(x)\ne 0, then x can’t be a local minimum. Suppose f'(x)>0. Then both the formal definition of derivative, and the geometric interpretation in terms of the gradient of a tangent which locally approximates the function, give that, when h is small,

f(x-h) = f(x)-h f'(x) +o(h),

where this ‘little o’ notation indicates that for small enough h, the final term is much smaller than the second term. So for small enough h, f(x-h)<f(x), and so we don’t have a local minimum.

The key is eliminating option 0). Once we know that there definitely is a global minimum, we are in a good position to identify it using calculus and a bit of quick checking. But how would we eliminate option 0)?

Existence of global minima

This is the point where I’m in greatest danger of spoiling first-year undergraduate course content, so I’ll be careful.

As we saw in the middle row, when functions are defined on the whole real line, there’s the danger that they can diverge to \pm \infty, or approach some bounding value while never actually attaining it. So life gets much easier if you work with functions defined on a closed interval. We also saw what can go wrong if there are jumps, so we will assume the function is continuous, meaning that it has no jumps, or that as y gets close to x, f(y) gets close to f(x). If you think a function can be differentiated everywhere, then it is continuous, because we’ve seen that once a function has a jump (see caveat 2) then it certainly isn’t possible to define the derivative at the jump point.

It’s a true result that a continuous function defined on a closed interval is bounded and attains its bounds. Suppose such a function takes arbitrarily large values. The main idea is that if the function takes arbitrarily large values throughout the interval, then because the interval is finite it also takes arbitrarily large values near some point, which will make it hard to be continuous at that point. You can apply a similar argument to show that the function can’t approach a threshold without attaining it somewhere. So how do you prove that this point exists? Well, you probably need to set up some formal definitions of all the properties under discussion, and manipulate them carefully. Which is fine. If you’re still at school, then you can either enjoy thinking about this yourself, or wait until analysis courses at university.

My personal opinion is that this is almost as intuitive as the assertion that if a continuous function takes both positive and negative values, then it has a zero somewhere in between. I feel if you’re happy citing the latter, then you can also cite the behaviour of continuous functions on closed intervals.

Caveat 2) It’s not true to say that if a function doesn’t have jumps then it is continuous. There are other kinds of discontinuity, but in most contexts these are worse than having a jump, so it’s not disastrous in most circumstances to have this as your prime model of non-continuity.

Worked example

Question 1 of this year’s BMO2 was a geometric inequality. I’ve chosen to look at this partly because it’s the first question I’ve set to make it onto BMO, but mainly because it’s quite hard to find olympiad problems which come down to inequalities in a single variable.

Anyway, there are many ways to parameterise and reparameterise the problem, but one method reduces, after some sensible application of Pythagoras, to showing

f(x)=x+ \frac{1}{4x} + \frac{1}{4x+\frac{1}{x}+4}\ge \frac{9}{8}, (*)

for all positive x.

There are simpler ways to address this than calculus, especially if you establish or guess that the equality case is x=1/2. Adding one to both sides is probably a useful start.

But if you did want to use calculus, you should argue as follows. (*) is certainly true when x\ge \frac{9}{8} and also when $x\le \frac{2}{9}$. The function f(x) is continuous, and so on the interval [\frac{2}{9},\frac{9}{8}] it has a minimum somewhere. We can differentiate, and fortunately the derivative factorises (this might be a clue that there’s probably a better method…) as

(1-\frac{1}{4x^2}) \left[ 1 - \frac{4}{(4x+\frac{1}{x}+4)^2} \right].

If x is positive, the second bracket can’t be zero, so the only stationary point is found at x=1/2. We can easily check that f(\frac12)=\frac98, and we have already seen that f(\frac29),f(\frac98)>\frac98. We know f attains its minimum on [\frac29,\frac98], and so this minimal value must be $\frac98$, as we want.

Overall, the moral of this approach is that even if we know how to turn the handle both for the calculation, and for the justification, it probably would be easier to use a softer approach if possible.

Next stage

For the next stage, we assess how much of this carries across to the multivariate setting, including Lagrange multipliers to find minima of a function subject to a constraint.