RMM 2017 – Problems 1, 4 and 5

I’ve recently taken a UK team to the 2017 edition of the Romanian Master of Mathematics competition in Bucharest. The British students did extremely well and we all enjoyed ourselves mathematically and generally. The customary diary may appear shortly, but this time I want to focus mainly on the questions, since that is after all the main point of these competitions! I hope that what follows is interesting, and at least slightly educational to potential future students.

I’ve split this into two posts based on my opinion on difficulty, which is subjective but probably correlates fairly positively with most people’s. The account of Q1 is guest-written by two British students, based on their solutions during the competition.

Problem 1

a) Prove that every positive integer n can be written uniquely in the form

n = \sum_{j=1}^{2k+1} (-1)^{j-1} 2^{m_j},

where k\geq 0 and 0 \leq m_1 < m_2 < \cdots < m_{2k+1} are integers. This number k is called the weight of n.

b) Find (in closed form) the difference between the number of positive integers at most 2^{2017} with even weight and the number of positive integers at most 2^{2017} with odd weight.

Rosie Cates and Neel Nanda:

a) We are trying to express n in terms of powers of 2, so it seems sensible to write it in binary. As 2^{m_1} is the smallest power of 2, this term is responsible for the last 1 in the binary representation of n. Let x = n - 2^{m_1} (ie n with the last 1 removed from its binary expansion). Now if we pair up terms in the sum to get

x = (2^{m_{2k+1}} - 2^{m_{2k}}) + \ldots + (2^{m_3} - 2^{m_2}),

we can see that each bracket looks like 11…100…0 when written in binary. Also, the condition that m_i < m_{i+1} is equivalent to ensuring that we do not break any strings of consecutive 1s that were in the binary expansion of x (so for example 111110 = 110000 + 1110 is not allowed). So writing x in the desired form is the same as writing it as the sum of numbers of the form 11…100…0 without breaking any strings of 1s. For example

1110100110 = 1110000000 + 100000 + 110.

Clearly there is exactly one way of doing this for every x, so (as each n has exactly one x) there is exactly one way to do it for each n as well.

This approach allows k to be understood differently. Write n in binary and remove the last 1; now count the number of groups of consecutive 1s. This is equal to k.
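For readers who like to experiment, here is a short Python sketch (an illustration of the argument above, not something needed in a write-up) which reads off the exponents m_j and the weight k from the binary expansion exactly as described: the last 1 gives m_1, and each maximal block of consecutive 1s in what remains gives one bracket.

```python
def alternating_representation(n):
    """Exponents m_1 < m_2 < ... < m_{2k+1} with n = sum_j (-1)^{j-1} 2^{m_j},
    found from the binary expansion as in the argument above (illustrative sketch)."""
    assert n >= 1
    m1 = (n & -n).bit_length() - 1          # position of the last 1 in binary
    x = n - 2 ** m1                         # n with its last 1 removed
    exponents = [m1]
    bits = bin(x)[2:][::-1]                 # bits of x, least significant first
    i = 0
    while i < len(bits):
        if bits[i] == "1":
            lo = i
            while i < len(bits) and bits[i] == "1":
                i += 1
            # the block of 1s occupying positions lo..i-1 is the bracket 2^i - 2^{lo}
            exponents.extend([lo, i])
        else:
            i += 1
    return sorted(exponents)

def weight(n):
    return (len(alternating_representation(n)) - 1) // 2

# sanity check: exponents strictly increasing, and the signed sum recovers n
for n in range(1, 1000):
    ms = alternating_representation(n)
    assert all(a < b for a, b in zip(ms, ms[1:]))
    assert sum((-1) ** j * 2 ** m for j, m in enumerate(ms)) == n
```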

b) The second half of the problem becomes a lot simpler with the observation that n\leq 2^{m_{2k+1}}, as

n=2^{m_{2k+1}}-(2^{m_{2k}}-2^{m_{2k-1}})-\ldots-(2^{m_2}-2^{m_1}),

and the sequence (m_j) is increasing, so each bracket is positive. As each such increasing sequence (m_j) corresponds uniquely to an integer, this means we just want to count sequences with greatest term at most 2017. The sequence is increasing, so each one corresponds to a subset of {0, 1, …, 2017} of size (2k+1). There are \binom{2018}{2k+1} subsets of size (2k+1), so the question reduces to finding a closed form for \sum_{k=0}^{1008} (-1)^k {{2018}\choose{2k+1}}.

This is reminiscent of a classic problem in combinatorics: using the binomial theorem to evaluate sums of binomial coefficients weighted by powers. The best example is

\sum_{k=0}^n (-1)^k \binom{n}{k} =(1-1)^n=0,

but here rather than (-1) we want something whose square is (-1), so we consider the complex number i. Using the same ideas, we get that

\sum_{r=0}^{2018} i^r \binom{2018}{r}=(1+i)^{2018},

which contains what we want, but also binomial coefficients with even r. But if r is even, i^r is real, and if r is odd, i^r is imaginary. So the sum we want appears as the imaginary part, that is

\mathrm{Im}\left((1+i)^{2018}\right)=\mathrm{Im}\left((\sqrt{2} \cdot e^{\frac{i\pi}{4}})^{2018}\right)=2^{1009}.
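As a quick sanity check of the closed form (certainly not needed in a solution), both sides can be evaluated exactly in Python, using integer pairs for Gaussian integers so that no floating point is involved:

```python
from math import comb

# left-hand side: sum_{k=0}^{1008} (-1)^k C(2018, 2k+1)
lhs = sum((-1) ** k * comb(2018, 2 * k + 1) for k in range(1009))

# right-hand side: Im((1+i)^2018), computed exactly via (a + bi)(1 + i) = (a - b) + (a + b)i
a, b = 1, 0
for _ in range(2018):
    a, b = a - b, a + b

assert lhs == b == 2 ** 1009
```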

Dominic: note that in both parts, the respective authors find slightly more than what they were required to. That is, respectively, the interpretation of k, and a bound on m_{2k+1}. The latter is an excellent example of the general notion that sometimes it is better to use a stronger statement than what you actually require in an induction argument (here for existence). The stronger statement (which you guess from playing with examples) makes the inductive step easier, as it’s then clear that the new term you get is distinct from the terms you already have.

Problem 4

In the Cartesian plane, let \mathcal G_1, \mathcal G_2 be the graphs of the quadratic functions f_1(x) = p_1x^2 + q_1x + r_1, f_2(x) = p_2x^2 + q_2x + r_2, where p_1 > 0 > p_2. The graphs \mathcal G_1, \mathcal G_2 cross at distinct points A and B. The four tangents to \mathcal G_1, \mathcal G_2 at A and B form a convex quadrilateral which has an inscribed circle. Prove that the graphs \mathcal{G}_1 and \mathcal{G}_2 have the same axis of symmetry.

This question is quite unusual for an olympiad of this kind, and I was initially skeptical, but then it grew on me. Ultimately, I was unsurprised that many contestants attacked entirely with coordinate calculations. If you use this strategy, you will definitely get there in the end, but you have to accept that you aren’t allowed to make any mistakes. And because of the amount of symmetry in the configuration, even if you make a mistake, you might still get the required answer, and so not notice that you’ve made a mistake. But I decided I liked it because various levels of geometric insight either reduced or removed the nastier calculations.

Typically, one could gain geometric insight by carefully observing an accurate diagram, but an accurate parabola is hard to draw. However, even from a vague diagram, we might guess the key intermediate property of the configuration, which is that the line joining the other two points in the quadrilateral is parallel to the y-axis. This means that they have the same x-coordinate, and indeed this x-coordinate must in fact be the same for any parabola through A and B, so it is reasonable to guess that it is \frac{x_A+x_B}{2}, the mean of the x-coordinates of A and B.
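Here is a quick check of that guess (an illustration only; the particular numbers below are arbitrary): for a parabola y = px^2+qx+r, the tangents at x=a and x=b meet at x-coordinate \frac{a+b}{2}, independently of p, q and r, which is exactly why the two ‘new’ vertices of the quadrilateral share an x-coordinate.

```python
from fractions import Fraction as F

def tangent(p, q, r, a):
    """(slope, intercept) of the tangent to y = p x^2 + q x + r at x = a."""
    slope = 2 * p * a + q
    return slope, p * a * a + q * a + r - slope * a

# an arbitrary parabola and two x-coordinates playing the roles of x_A and x_B
p, q, r = F(3), F(-2), F(5)
a, b = F(-1), F(4)

s1, c1 = tangent(p, q, r, a)
s2, c2 = tangent(p, q, r, b)
x_meet = (c2 - c1) / (s1 - s2)   # intersection of the two tangent lines

assert x_meet == (a + b) / 2     # the claimed midpoint property
```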

Since you know this is the goal, it’s not too bad to calculate the equations of the tangent lines directly, and demonstrate this algebraically. But I was determined to use the focus-directrix definition of a parabola. Either recall, or digest the interesting new fact that a parabola may be defined as the locus of points which are the same distance from a fixed point P (the focus), and a fixed line \ell (the directrix). Naturally, the distance to the line is perpendicular distance.

To ensure the form given in the statement where y is a quadratic function of x, in this setting the directrix should be parallel to the x-axis. To define the tangent to the parabola at A, let A’ be the foot of the perpendicular from A onto \ell, so AA’=PA. I claim that the tangent at A is given by the perpendicular bisector of A’P. Certainly this passes through A, and it is easy to convince yourself that it can’t pass through any other point B on the parabola, since BA’> PB, as A’ is on \ell but is not the foot of the perpendicular from B to \ell. This final observation is truly a lot more obvious if you’re looking at a diagram.

We now want to finish geometrically too. In our quadrilateral, one diagonal is parallel to the y-axis, and it will suffice to show that the existence of an incircle implies that A and B must have the same y-coordinate. We have just shown A and B are the same (horizontal) distance from the other diagonal. So certainly if they have the same y-coordinate, then the quadrilateral is a kite, and the sums of opposite sides are equal, which is equivalent to the existence of an incircle. One could then finish by arguing that this ceases to be true if you move one of A and B in either direction, or by some short explicit calculation if such a perturbation argument leaves you ill at ease.

Problem 5

Fix an integer n \geq 2. An n\times n sieve is an n\times n array with n cells removed so that exactly one cell is removed from every row and every column. A stick is a 1\times k or k\times 1 array for any positive integer k. For any sieve A, let m(A) be the minimal number of sticks required to partition A. Find all possible values of m(A), as A varies over all possible n\times n sieves.

This is a fairly classic competition problem, and while in my opinion the statement isn’t particularly fascinating, it’s interesting that it admits such a wide range of approaches.

As ever, you need to start by playing around with the setup, and guessing that the answer is 2n-2, and not thinking ‘it can’t possibly be the same answer as Q3??’ Then think about reasons why you couldn’t do better than 2n-2. My very vague reason was that if you only use horizontal sticks, the answer is clearly 2n-2, and the same if you only use vertical sticks. But it feels like you can only make life harder for yourself if you try to use both directions of sticks in lots of places. Note that some sort of argument involving averaging over stick lengths is definitely doomed to fail unless it takes into account the Latin square nature of the location of holes! For example, if you were allowed to put all the holes in the first row, m(A) would be n-1.

Induction is tempting. That is, you remove some number of sticks, probably those corresponding to a given hole, to reduce the board to an (n-1)\times(n-1) configuration. If you do this, you need to be clear about why you can remove what you want to remove (in particular, the number of sticks you want to remove), and whether it’s qualitatively different if the hole in question lies on the border of the board. In all of these settings, you want to be careful about 1×1 sticks, which it’s easy inadvertently to count as both horizontal and vertical. This is unlikely to affect the validity of any argument (just picking either an arbitrary or a canonical direction if it’s 1×1 should be fine) but does make it much harder to check the validity.

Joe exhibited directly a construction of 2n-2 cells which must be covered by different sticks. This approach lives or dies by the quality of the written argument. It must look general, even though any diagram you draw must, almost by definition, correspond to some particular case. Alternatively, since the problem is set on a grid, the cells correspond naturally to edges of a bipartite graph, where classes correspond to rows and columns. The holes form a perfect matching on this bipartite graph. But, as Harvey observed, if you split the rows and columns in two, on either side of the relevant hole (or not in the 2+2 cases where the hole is at the border), you have a (2n-2)+(2n-2) bipartite graph, and a perfect matching here corresponds to a set of cells which must be covered by different sticks. This is an ingenious idea, and if you’ve recently met Hall’s Marriage Theorem, which gives a verifiable criterion for the existence of such a perfect matching, there are few better uses of your next ten minutes than to check whether Hall’s condition a) should hold; b) can be proven to hold in this setting.
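To see Harvey’s reduction in action computationally, here is a rough Python sketch (my own illustration, not his write-up): for a sieve described by the permutation of hole positions, build the bipartite graph whose classes are row-segments and column-segments, and look for a large matching by augmenting paths. Two matched cells lie in different row-segments and different column-segments, so no single stick can cover both, and a matching of size 2n-2 gives the lower bound; the final loop just confirms this for small n by brute force.

```python
from itertools import permutations

def matching_lower_bound(holes):
    """holes[i] = column of the removed cell in row i (a permutation of range(n)).
    Returns the size of a maximum matching between row-segments and column-segments:
    a set of cells no two of which can be covered by the same stick."""
    n = len(holes)
    col_hole = [0] * n
    for i, j in enumerate(holes):
        col_hole[j] = i

    def row_seg(i, j):    # row-segment containing cell (i, j), or None for the hole
        return None if j == holes[i] else 2 * i + (j > holes[i])

    def col_seg(i, j):
        return None if i == col_hole[j] else 2 * j + (i > col_hole[j])

    adj = {}              # row-segment -> column-segments sharing a cell with it
    for i in range(n):
        for j in range(n):
            rs, cs = row_seg(i, j), col_seg(i, j)
            if rs is not None and cs is not None:
                adj.setdefault(rs, []).append(cs)

    match = {}            # column-segment -> matched row-segment

    def augment(rs, seen):
        for cs in adj[rs]:
            if cs not in seen:
                seen.add(cs)
                if cs not in match or augment(match[cs], seen):
                    match[cs] = rs
                    return True
        return False

    return sum(augment(rs, set()) for rs in adj)

# brute-force confirmation of the lower bound 2n-2 for small n
for n in range(2, 6):
    assert all(matching_lower_bound(list(p)) >= 2 * n - 2
               for p in permutations(range(n)))
```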


DGFF 3 – Gibbs-Markov property for entropic repulsion

In the previous post, we saw that it isn’t much extra effort to define the DGFF with non-zero boundary conditions, by adding onto the zero-BC DGFF the unique (deterministic) harmonic function which extends the boundary values into the domain. We also saw how a Gibbs-Markov property applies, whereby the values taken by the field on some sub-region A\subset D depend on the values taken on D\backslash A only through values taken on \partial A.

In this post, we look at how this property and some other methods are applied by Deuschel [1] to study the probability that the DGFF on a large box in \mathbb{Z}^d is positive ‘everywhere’. This event can be interpreted in a couple of ways, all of which are referred to there as entropic repulsion. Everything which follows is either taken directly or paraphrased directly from [1]. I have tried to phrase this in a way which avoids repeating most of the calculations, instead focusing on the methods and the motivation for using them.

Fix dimension d\ge 2 throughout. We let P^0_N be the law of the DGFF on V_N:=[-N,N]^d\subset \mathbb{Z}^d with zero boundary conditions. Then for any subset A\subset \mathbb{Z}^d, in an intuitively-clear abuse of notation, we let

\Omega^+(A):= \{ h_x\ge 0, x\in A\},

be the event that some random field h takes only non-negative values on A. The goal is to determine P^0_N ( \Omega^+(V_N)). But for the purposes of this post, we will focus on showing bounds on the probability that the field is non-negative on a thin annulus near the boundary of V_N, since this is a self-contained step in the argument which contains a blog-friendly number of ideas.
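For anyone who wants to simulate, here is a minimal sketch of sampling the zero-boundary DGFF on a small box, under the convention that the covariance is the Green’s function of simple random walk killed on exiting the box; other normalisations differ by dimension-dependent constants, so treat the scaling as an assumption rather than gospel.

```python
import numpy as np

def sample_dgff(side, d=2, rng=None):
    """Sample a zero-boundary DGFF on the interior of a box of the given side length,
    i.e. a centred Gaussian vector whose covariance is (taken here to be) the Green's
    function of SRW killed on exiting the box. Illustrative only; conventions vary."""
    rng = np.random.default_rng() if rng is None else rng
    interior = [v for v in np.ndindex(*([side - 1] * d))]    # interior vertices
    index = {v: i for i, v in enumerate(interior)}
    n = len(interior)
    # Dirichlet Laplacian: degree 2d on the diagonal, -1 per interior neighbour
    L = np.zeros((n, n))
    for v, i in index.items():
        L[i, i] = 2 * d
        for axis in range(d):
            for step in (-1, 1):
                w = list(v)
                w[axis] += step
                jdx = index.get(tuple(w))
                if jdx is not None:
                    L[i, jdx] = -1.0
    cov = 2 * d * np.linalg.inv(L)     # Green's function of the killed walk
    return interior, rng.multivariate_normal(np.zeros(n), cov)

vertices, field = sample_dgff(10)      # e.g. the field on a 9x9 interior
```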

We set (L_N) to be a sequence of integers greater than one (to avoid dividing by zero in the statement), for which \frac{L_N}{N}\rightarrow 0. We now define for each N, the annulus

W_N = \{v\in V_N: L_N\le d_{\mathbb{Z}^d}(v, V_N^c)\le 2L_N \}

of width L_N, set a distance L_N inside the box V_N. We aim to control P^0_N(\Omega^+(W_N)). This forms a middle step in Deuschel’s Propositions 2.5 and 2.9, which discuss P^0_N(\Omega^+(V_{N-L_N})). Clearly there is the upper bound

P^0_N(\Omega^+(V_{N-L_N})) \le P^0_N(\Omega^+(W_N)) (1)

and a lower bound on P^0_N(\Omega^+(V_{N-L_N})) is obtained in the second proposition by considering the box as a union of annuli and then combining the bounds on each annulus using the FKG inequality.

Upper bound via odds and evens

After removing step (1), this is Proposition 2.5:

\limsup_{N\rightarrow \infty} \frac{L_N}{N^{d-1} \log L_N} \log P^0_N(\Omega^+(W_N)) < 0. (2)

This gives a limiting upper bound on the probability of the form L_N^{-CN^{d-1}/L_N}, though as with all LDP estimates, the form given at (2) is more instructive.

Morally, the reason why it is unlikely that the field should be non-negative everywhere within the annulus is that the distribution at each location is centred, and even though any pair of values are positively correlated, this correlation is not strong enough to avoid this event being unlikely. But this is hard to corral into an upper bound argument directly. In many circumstances, we want to prove upper bounds for complicated multivariate systems by projecting to get an unlikely event for a one-dimensional random variable, or a family of independent variables, even if we have to throw away some probability. We have plenty of tools for tail probabilities in both of these settings. Since the DGFF is normal, a one-dimensional RV that is a linear combination (eg the sum) of all the field heights is a natural candidate. But in this case we would have thrown away too much probability, since the only way we could dominate is to demand that the sum \sum_{x\in W_N}h^N_x\ge 0, which obviously has probability 1/2 by symmetry. (3)

So Deuschel splits W_N into W_N^o,W_N^e, where the former includes all vertices with odd total parity in W_N and the latter includes all the vertices with even total parity in the interior of W_N. (Recall that \mathbb{Z}^d is bipartite in exactly this fashion). The idea is to condition on h^N\big|_{W^o_N}. But obviously each even vertex is surrounded entirely by odd vertices. So by the Gibbs-Markov property, conditional on the odd vertices, the values of the field at the even vertices are independent. Indeed, if for each v\in W_N^e we define \bar h_v to be the average of its neighbours (which is measurable w.r.t. the sigma-algebra generated by the odd vertices), then

\{h_v: v\in W_N^e \,\big|\, \sigma(h_w: w\in W_N^o)\},

is a collection of independent normals with variance one, where the mean of h_v is \bar h_v.

To start finding bounds, we fix some threshold m=m_N\gg 1 to be determined later, and consider the odd-measurable event A_N that at most half of the even vertices v have \bar h_v\ge m. So A_N^c\cap \Omega^+(W_N) says that all the odd vertices are non-negative and many are quite large. This certainly feels like a low-probability event, and unlike at (3), we might be able to obtain good tail bounds by projection into one dimension.

In the other case, conditional on A_N, there are a large number of even vertices with conditional mean at most m, and so we can bound the probability that they are all non-negative by the product

(1-\varphi(m))^{\frac12 |W_N^e|}. (4)

Note that for this upper bound, we can completely ignore the other even vertices (those with conditional mean greater than m).

So we’ll go back to A_N^c \cap \Omega^+(W_N). For computations, the easiest one-dimensional variable to work with is probably the mean of the \bar h_vs across v\in W_N^e, since on A_N^c\cap \Omega^+(W_N) this is at least \frac{m}{2}. Rather than focus on the calculations themselves involving

\bar S^e_N:= \frac{1}{|W_N^e|} \sum\limits_{v\in W_N^e} \bar h_v,

let us remark that it is certainly normal and centered, and so there are many methods to bound its tail, for example

P^0_N \left( \bar S^e_N \ge \frac{m}{2} \right) \le \exp\left( \frac{-m^2}{8\mathrm{Var}(\bar S^e_N)} \right), (5)

as used by Deuschel just follows from an easy comparison argument within the integral of the pdf. We can tackle the variance using the Green’s function for the random walk (recall the first post in this set). But before that, it’s worth making an observation which is general and useful, namely that \bar S^e_N is the expectation of

S^e_N:= \frac{1}{|W_N^e|} \sum\limits_{v\in W_N^e} h_v

conditional on the odds. Directly from the law of total variance, the variance of any random variable X is always at least as large as the variance of \mathbb{E}[X|Y].
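Both of these ingredients are standard, and can be spelled out in a line or two (my paraphrase, nothing specific to [1]). For the Gaussian tail bound behind (5): if S is a centred normal with variance \sigma^2 and a>0, then substituting t=a+s and using (a+s)^2\ge a^2+s^2 gives

\mathbb{P}(S\ge a) = \int_a^\infty \frac{e^{-t^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}\,dt \le e^{-a^2/2\sigma^2} \int_0^\infty \frac{e^{-s^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}\,ds = \tfrac12 e^{-a^2/2\sigma^2},

and taking a=\frac{m}{2} and \sigma^2=\mathrm{Var}(\bar S^e_N) recovers (5). For the law of total variance,

\mathrm{Var}(X) = \mathbb{E}\left[\mathrm{Var}(X|Y)\right] + \mathrm{Var}\left(\mathbb{E}[X|Y]\right) \ge \mathrm{Var}\left(\mathbb{E}[X|Y]\right),

since the first term on the right is non-negative.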

So in this case, we can replace \mathrm{Var}(\bar S^e_N) in (5) with \mathrm{Var}(S^e_N), which can be controlled via the Green’s function calculation.

Finally, we choose m_N so that the probability at (4) matches the probability at (5) in scale, and this choice leads directly to (2).

In summary, we decomposed the event that everything is non-negative into two parts: either there are lots of unlikely local events in the field between an even vertex and its odd neighbours, or the field has to be atypically large at the odd sites. Tuning the parameter m_N allows us to control both of these probabilities in the sense required.

Lower bound via a sparse sub-lattice

To get a lower bound on the probability that the field is non-negative on the annulus, we need to exploit the positive correlations in the field. We use a similar idea to the upper bound. If we know the field is positive and fairly large in many places, then it is increasingly likely that it is positive everywhere. The question is how many places to choose.

We are going to consider a sub-lattice that lives in a slightly larger region than W_N itself, and condition the field to be larger than m=m_N everywhere on this lattice. We want the lattice to be sparse enough that even if we ignore positive correlations, the chance of this happening is not too small. But we also want the lattice to be dense enough that, conditional on this event, the chance that the field is actually non-negative everywhere in W_N is not too small either.

To achieve this, Deuschel chooses a sub-lattice of width \lfloor\epsilon L_N^{2/d}\rfloor, and sets \Lambda_N(\epsilon) to be the intersection of this with the annulus with radii [N-\frac{5}{2}L_N, N-\frac{1}{2}L_N], to ensure it lives in a slightly larger region than W_N itself. The scaling of this sub-lattice density is such that when a random walk is started at any v\in W_N, the probability that the RW hits \Lambda_N(\epsilon) before \partial V_N is asymptotically in (0,1). (Ie, not asymptotically zero or one – this requires some definitely non-trivial calculations.) In particular, for appropriate (ie large enough) choice of \epsilon, this probability is at least 1/2 for all v\in W_N. This means that after conditioning on event B_N:=\{h_v\ge m : v\in \Lambda_N(\epsilon)\}, the conditional expectation of h_w is at least \frac{m}{2} for all w\in W_N\backslash \Lambda_N(\epsilon). Again this uses the Gibbs-Markov property and the Gaussian nature of the field. In particular, this conditioning means we are left with the DGFF on V_N\backslash \Lambda_N(\epsilon), ie with boundary \partial V_N\cup \Lambda_N(\epsilon), and then by linearity, the mean at non-boundary points is given by the harmonic extension, which is linear (and so increasing) in the boundary values.

At this point, the route through the calculations is fairly clear. Since we are aiming for a lower bound on the probability of the event \Omega^+(W_N), it’s enough to find a lower bound on P^0_N(\Omega^+(W_N)\cap B).

Now, by positive correlation (or, formally, the FKG inequality) we can bound P^0_N(B) below by the product of the probabilities that the field exceeds the threshold at each individual site in \Lambda_N(\epsilon). Since the value of the field at each site is normal with variance at least 1 (by definition), this is straightforward.

Finally, we treat P^0_N(\Omega^+(W_N) \,\big|\, B). We’ve established that, conditional on B, the mean at each point of W_N\backslash \Lambda_N(\epsilon) is at least \frac{m}{2}, and we can bound the variance above too. Again, this is a conditional variance, and so is at most the corresponding original variance, which is bounded above by \sigma_N^2:=\mathrm{Var}(h^N_0). (This fact that the variance is maximised at the centre is intuitively clear when phrased in terms of occupation times, but the proof is non-obvious, or at least non-obvious to me.)

Since each of the events h_v^N\ge 0 for v\in W_N\backslash \Lambda_N(\epsilon) is positively correlated with B, we can bound the probability that they all hold below by the product of the probabilities that each holds individually. But having established that the conditional mean is at least \frac{m_N}{2} for each v, and the variance is uniformly bounded above (including in N), this gives an easy tail bound of the form we require.

Again it just remains to choose the sequence of thresholds m_N to maximise the lower bound on the probability that we’ve found in this way. In both cases, it turns out that taking m_N= \sqrt{C\log N} is sensible, and this is linked to the scaling of the maximum of the DGFF, which we will explore in the future.

References

[1] J.-D. Deuschel, Entropic Repulsion of the Lattice Free Field, II. The 0-Boundary Case. Available at Project Euclid.

EGMO 2016 Paper I

We’ve just finished our annual selection and training camp for the UK IMO team in Cambridge, and I hope it was enjoyed by all. I allotted myself the ‘graveyard slot’ at 5pm on the final afternoon (incidentally, right in the middle of this, but what England fan could have seen that coming in advance?) and talked about random walks on graphs and the (discrete) heat equation. More on that soon perhaps.

The UK has a team competing in the 5th European Girls Mathematical Olympiad (hereafter EGMO 2016) right now in Busteni, Romania. The first paper was sat yesterday, and the second paper is being sat as I write this. Although we’ve already sent a team to Romania this year (where they did rather well indeed! I blame the fact that I wasn’t there.), this feels like the start of the olympiad ‘season’. It also coincides well with Oxford holidays, when, though thesis deadlines loom, I have a bit more free time for thinking about these problems. Anyway, last year I wrote a summary of my thoughts and motivations when trying the EGMO problems, and this seemed to go down well, so I’m doing the same this year. My aim is not to offer official solutions, or even outlines of good solutions, but rather to talk about ideas, and how and why I decided whether they did or didn’t work. I hope some of it is interesting.

You can find the paper in many languages on the EGMO 2016 website. I have several things to say about the geometry Q2, but I have neither enough time nor geometric diagram software this morning, so will only talk about questions 1 and 3. If you are reading this with the intention of trying the problems yourself at some point, you probably shouldn’t keep reading, in the nicest possible way.

Question 1

[Slightly paraphrased] Let n be an odd positive integer and x_1,\ldots,x_n\ge 0. Show that

\min_{i\in[n]} \left( x_i^2+x_{i+1}^2\right) \le \max_{j\in[n]} 2x_jx_{j+1},

where we define x_{n+1}=x_1 cyclically in the natural way.

Thought 1: this is a very nice statement. Obviously when i and j are equal, the inequality holds the other way round, and so it’s interesting and surprising that constructing a set of pairs of inequalities in the way suggested gives a situation where the ‘maximum minimum’ is at least the ‘minimum maximum’.

Thought 2: what happens if n is actually even? Well, you can kill the right-hand side by taking at least every other term to be zero. And if n is even, you really can take every other term to be zero, while leaving the remaining terms positive. So then the RHS is zero and the LHS is positive.

The extension to this thought is that the statement is in danger of not holding if there’s a lot of alternating behaviour. Maybe we’ll use that later.
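A few seconds of computation confirm both observations (this is exploration, not proof): the inequality holds in random trials whenever n is odd, and the alternating sequence 1,0,1,0,… breaks it whenever n is even.

```python
import random

def lhs_rhs(x):
    """Return (min_i x_i^2 + x_{i+1}^2, max_j 2 x_j x_{j+1}), indices cyclic."""
    n = len(x)
    pairs = [(x[i], x[(i + 1) % n]) for i in range(n)]
    return min(a * a + b * b for a, b in pairs), max(2 * a * b for a, b in pairs)

# odd n: the stated inequality holds in every random trial
for _ in range(10000):
    x = [random.random() for _ in range(random.choice([3, 5, 7, 9]))]
    lo, hi = lhs_rhs(x)
    assert lo <= hi

# even n: alternating 1, 0, 1, 0, ... kills the RHS but not the LHS
print(lhs_rhs([1, 0] * 4))   # (1, 0), so the inequality fails
```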

Idea 1: We can write

2(x_i^2+x_{i+1}^2)=(x_i+x_{i+1})^2 + |x_i-x_{i+1}|^2, \quad 4x_ix_{i+1}=(x_i+x_{i+1})^2 - |x_i-x_{i+1}|^2,

which gives insight into ‘the problem multiplied by 2’. This was an ‘olympiad experience’ idea. These transformations between various expressions involving sums of squares turn out to be useful all the time. Cf BMO2 2016 question 4, and probably about a million other examples. As soon as you see these expressions, your antennae start twitching. Like when you notice a non-trivial parallelogram in a geometry problem, but I digress. I’m not sure why I stuck in the absolute value signs.

This was definitely a good idea, but I couldn’t find a way to make useful deductions from it especially easily. I tried converting the RHS expression for i (where LHS attains minimum) into the RHS expression for any j by adding on terms, but I couldn’t think of a good way to get any control over these terms, so I moved on.

Idea 2: An equality case is when they are all equal. I didn’t investigate very carefully at this point whether this might be the only equality case. I started thinking about what happens if you start with an ‘equal-ish’ sequence where the inequality holds, then fiddle with one of the values. If you adjust exactly one value, then both sides might stay constant. It seemed quite unlikely that both would vary, but I didn’t really follow this up. In any case, I didn’t feel like I had very good control over the behaviour of the two sides if I started from equality and built up to the general case by adjusting individual values. Or at least, I didn’t have a good idea for a natural ordering to do this adjustment so that I would have good control.

Idea 3: Now I thought about focusing on where the LHS attains this minimum. Somewhere, there are values (x,y) next to each other such that x^2+y^2 is minimal. Let’s say x\le y. Therefore we know that the element before x is at least y, and vice versa, ie we have

\ldots, \ge y, x, y, \ge x,\ldots.

and this wasn’t helpful, because I couldn’t take this deduction one step further on the right. However, once you have declared the minimum of the LHS, you are free to make all the other values of x_i smaller, so long as they don’t break this minimum. Why? Because the LHS stays the same, and the RHS gets smaller. So if you can prove the statement after doing this, then the statement was also true before doing this. So after thinking briefly, this means that you can say that for every i, either x_{i-1}^2+x_i^2 or x_i^2+x_{i+1}^2 attains this minimum.

Suddenly this feels great, because once we know at least one of the pairs corresponding to i attains the minimum, this is related to parity of n, which is in the statement. At this point, I was pretty confident I was done. Because you can’t partition odd [n] into pairs, there must be some i which achieves a minimum on both sides. So focus on that.

Let’s say the values are (x,y,x) with x\le y. Now when we try to extend in both directions, we actually can do this, because the values alternate with bounds in the right way. The key is to use the fact that the minimum x^2+y^2 must be attained by at least every other pair. (*) So we get

\ldots, \le x,\ge y,x,y,x,\ge y,\le x,\ldots.

But it’s cyclic, so the ‘ends’ of this sequence join up. If n\equiv 1 modulo 4, we get \ge y,\ge y next to each other, which means the RHS of the statement is indeed at least the LHS. If n\equiv 3 modulo 4, then we get \le x,\le x next to each other, which contradicts minimality of x^2+y^2 unless x=y. Then we chase equality cases through the argument (*) and find that they must all be equal. So (after checking that the case x\ge y really is the same), we are done.

Thought 3: This really is the alternating thought 2 in action. I should probably have stayed with that idea a bit longer, but this plan of reducing values so that equality is achieved in many places came naturally out of the other ideas.

Thought 4: If I had to do this as an official solution, I imagine one can convert this into a proof by contradiction and it might be slightly easier, or at least easier to follow. If you go for contradiction, you are forcing local alternating behaviour, and should be able to derive a contradiction when your terms match up without having to start by adjusting them to achieve equality everywhere.

Question 3

Let m be a positive integer. Consider a 4m\times 4m grid, where two cells are related to each other if they are different but share a row or a column. Some cells are coloured blue, such that every cell is related to at least two blue cells. Determine the minimum number of blue cells.

Thought 1: I spent the majority of my time on this problem working with the idea that the answer was 8m. Achieved by taking two in each row or column in pretty much any fashion, eg both diagonals. This made me uneasy because the construction didn’t take advantage of the fact that the grid size was divisible by 4. I also couldn’t prove it.

Thought 2: bipartite graphs are sometimes useful to describe grid problems. Edges correspond to cells and each vertex set to row labels or column labels.

Idea 1: As part of an attempt to find a proof, I was thinking about convexity, and why having exactly two in every row was best, so I wrote down the following:

Claim A: No point having three in a row.

Claim B: Suppose a row has only one in it + previous claim => contradiction.

In Cambridge, as usual I organised a fairly comprehensive discussion of how to write up solutions to olympiad problems. The leading-order piece of advice is to separate your argument into small pieces, which you might choose to describe as lemmas or claims, or just separate implicitly by spacing. This is useful if you have to do an uninteresting calculation in the middle of a proof and don’t want anyone to get distracted, but mostly it’s useful for the reader because it gives an outline of your argument.

My attempt at this problem illustrates an example of the benefit of doing this even in rough. If your claim is a precise statement, then that’s a prompt to go back and separately decide whether it is actually true or not. I couldn’t prove it, so started thinking about whether it was true.

Idea 2: Claim A is probably false. This was based on my previous intuition, and the fact that I couldn’t prove it or get any handle on why it might be true. I’d already tried the case m=1, but I decided I must have done it wrong so tried it again. I had got it wrong, because 6 is possible, and it wasn’t hard from here (now being quite familiar with the problem) to turn this into a construction for 6m in the general case.
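For what it’s worth, the m=1 case is small enough to check exhaustively by computer, which would have exposed my earlier slip immediately; a rough sketch:

```python
from itertools import combinations, product

def valid(blue, size=4):
    """Every cell of the size x size grid must be related to (share a row or
    column with, excluding itself) at least two blue cells."""
    blue = set(blue)
    for cell in product(range(size), repeat=2):
        related = sum(1 for b in blue
                      if b != cell and (b[0] == cell[0] or b[1] == cell[1]))
        if related < 2:
            return False
    return True

cells = list(product(range(4), repeat=2))
smallest = next(k for k in range(1, 17)
                if any(valid(c) for c in combinations(cells, k)))
print(smallest)   # prints 6, in line with the 6m answer
```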

Idea 3: This will be proved by some sort of double-counting argument. Sometimes these arguments turn on a convexity approach, but when the idea is that a few rows have three blue cells, and the rest have one, this now seemed unlikely.

Subthought: Does it make sense for a row to have more than three blue cells? No. Why not? Note that as soon as we have three in a row, all the cells in that row are fine, irrespective of the rest of the grid. If we do the problem the other way round, and have some blues, and want to fill out legally the largest possible board, why would we put six in one row, when we could add an extra row, have three in each (maintaining column structure) and be better off than we were before? A meta-subthought is that this will be impossible to turn into an argument, but we should try to use it to inform our setup.

Ages and ages ago, I’d noticed that you could permute the rows and columns without really affecting anything, so now seemed a good time to put all the rows with exactly one blue cell at the top (having previously established that rows with no blue cell were a disaster for achieving 6m), and all the columns with one blue cell at the left. I said there were r_1,c_1 such rows and columns. Then, I put all the columns which had a blue cell in common with the r_1 rows next to the c_1 columns I already had. Any such column has at least three blues in it, so I said there were c_3 of these, and similarly r_3 rows. The remaining columns and rows might as well be r_0,c_0 and hopefully won’t matter too much.

From here, I felt I had all the ingredients, and in fact I did, though some of the book-keeping got a bit fiddly. Knowing what you are aiming for and what you have means there’s only one way to proceed: first find expressions in terms of these which are upper bounds for the number of columns (or twice the number of columns = rows if you want to keep symmetry), and lower bounds in terms of these for the number of blue cells. I found a noticeable case-distinction depending on whether r_1\le 3c_3 and c_1\le 3r_3. If both held or neither held, it was quite straightforward, and if exactly one held, it got messy, probably because I hadn’t set things up optimally. Overall, fiddling about with these expressions occupied slightly more time than actually working out that the answer was 6m, so I don’t necessarily have a huge number of lessons to learn, except be more organised.

Afterthought 2: Thought 2 said to consider bipartite graphs. I thought about this later while cycling home, because one can’t (or at least, I can’t) manipulate linear inequalities in one’s head while negotiating Oxford traffic and potholes. I should have thought about it earlier. The equality case is key. If you add in the edges corresponding to blue cells, you get a series of copies of K_{1,3}, that is, one vertex with three neighbours. Thus you have three edges for every four vertices, and everything’s a tree. This is a massively useful observation for coming up with a very short proof. You just need to show that there can’t be components of size smaller than 4. Also, I bet this is how the problem-setter came up with it…

Hall’s Marriage Theorem

Hall’s Marriage Theorem gives conditions on when the vertices of a bipartite graph can be split into pairs of vertices corresponding to disjoint edges such that every vertex in the smaller class is accounted for. Such a set of edges is called a matching. If the sizes of the vertex classes are equal, then the matching naturally induces a bijection between the classes, and such a matching is called a perfect matching.

The number of possible perfect matchings of K_{n,n} is n!, which is a lot to check. Since it’s useful to have bijections, it’s useful to have matchings, so we would like a simple way to check whether a bipartite graph has a matching. Hall’s Marriage Theorem gives a way to reduce the number of things to check to 2^n, which is still large. However, much more importantly, the condition for the existence of a matching has a form which is much easier to check in many applications. The statement is as follows:

Given a bipartite graph G with vertex classes X and Y, there is a matching of X into Y precisely when for every subset A\subset X, |\Gamma(A)|\ge |A|, where \Gamma(A) is the set of vertices joined to some vertex in A, called the neighbourhood of A.

Taking A=X, it is clear that |Y|\ge |X| is a necessary condition for the result to hold, unsurprisingly. Perhaps the most elementary standard proof proceeds by induction on the size of X, taking the smallest A to give a contradiction, then using the induction hypothesis to lift smaller matchings up to the original graph. This lifting is based on the idea that a subset relation between sets induces a subset relation between their neighbourhoods.
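As a toy illustration of the statement (and nothing more), one can brute-force Hall’s condition over all subsets and compare it with an explicit augmenting-path search for a matching saturating X; the two agree on random small bipartite graphs, as the theorem promises.

```python
from itertools import combinations
import random

def hall_condition(adj, X):
    """adj[x] = set of neighbours of x in Y; check |Γ(A)| >= |A| for every A ⊆ X."""
    return all(len(set().union(*(adj[x] for x in A))) >= len(A)
               for r in range(1, len(X) + 1) for A in combinations(X, r))

def has_saturating_matching(adj, X):
    """Augmenting-path (Kuhn-style) search for a matching of X into Y."""
    match = {}                     # y -> matched x

    def augment(x, seen):
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                if y not in match or augment(match[y], seen):
                    match[y] = x
                    return True
        return False

    return all(augment(x, set()) for x in X)

for _ in range(500):
    X = range(5)
    adj = {x: {y for y in range(5) if random.random() < 0.4} for x in X}
    assert hall_condition(adj, X) == has_saturating_matching(adj, X)
```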

In this post, I want to consider this theorem as a special case of the Max-Flow-Min-Cut Theorem, as this will support useful generalisations much more easily. The latter theorem is a bit complicated notationally to set up, and I don’t want it to turn into the main point of this post, so I will summarise. The Wikipedia article, and lots of sets of lecture notes are excellent sources of more detailed definitions.

The setting is a weakly-connected directed graph, with two identified vertices, the source, with zero indegree, and the sink, with zero outdegree. Every other vertex lies on a path (not necessarily unique) between the source and the sink. Each edge has a positive capacity, which should be thought of as the maximum volume allowed to flow down the edge. A flow is a way of assigning values to each edge so that they do not exceed the capacity, and there is volume conservation at each interior vertex. That is, the flow into the vertex is equal to the flow out of the vertex. The value of the flow is the sum of the flows out of the source, which is necessarily equal to the sum of flows into the sink.

A cut is a partition of the vertices into two classes, with the source in one and the sink in the other. The value of a cut is then the sum of the capacities of any edges going from the class containing the source to the class containing the sink. In most examples, the classes will be increasing, in the sense that any path from source to sink changes class exactly once.

The Max-Flow-Min-Cut Theorem asserts that the maximum value of a flow through the system is equal to the minimum value of a cut. The proof is elementary, though it relies on defining a sensible algorithm to construct a minimal cut from a maximal flow that is not going to be interesting to explain without more precise notation available.

First we explain why Hall’s Marriage Theorem is a special case of this result. Suppose we are given the setup of HMT, with edges directed from X to Y with infinite capacity. We add edges of capacity 1 from some new vertex x_0 to each vertex of X, and from each vertex of Y to a new vertex y_0. The aim is to give necessary and sufficient conditions for the existence of a flow of value |X|. Note that one direction of HMT is genuinely trivial: if there is a matching, then the neighbourhood size condition must hold. We focus on the other direction. If the maximum flow is less than |X|, then there should be a cut of this size as well. We can parameterise a cut by the class of vertices containing the source, say S. Let A=S\cap X. If \Gamma(A)\not\subset S, then there will be an infinite capacity edge in the cut. So if we are looking for a minimal cut, this should not happen, hence \Gamma(A)\subset S if S is minimal. Similarly, there cannot be any edges from X\backslash A to \Gamma(A). The value of the cut can then be given by

|\Gamma(A)|+|X|-|A|, which is at least |X| if the neighbourhood size assumption is given. Note we can use the same method with the original edges having capacity one, but we have to track slightly more quantities.

This topic came up because I’ve been thinking about fragmentation chains over this holiday. I have a specific example concerning forests of unrooted trees in mind, but won’t go into details right now. The idea is that we often have distributions governing random partitions of some kind, let’s say of [n]. Conditioning on having a given number of classes might give a family of distributions P_{n,k} for the partitions of [n] into k parts. We would be interested to know how easy it is to couple these distributions in a nice way. One way would be via a coalescence or fragmentation process. In the latter, we start with [n] itself, then at each step, split one of the parts into two according to some (random, Markovian) rule. We are interested in finding out whether such a fragmentation process exists for a given distribution.

It suffices to split the problem into single steps. Can we get from P_{n,k} to P_{n,k+1}?

The point I want to make is that this is just a version of Hall’s Marriage Theorem again, at least in terms of proof method. We can take X to be the set of partitions of [n] into k parts, and Y the set of partitions into (k+1) parts. Then we add a directed edge with infinite capacity between x\in X and y\in Y if y can be constructed from x by breaking a part into two. Finally, we connect a fresh vertex x_0 to each vertex in X, only now we insist that the capacity is equal to P_{n,k}(x), and similarly an edge from y to y_0 with capacity equal to P_{n,k+1}(y). The existence of a fragmentation chain over this step is then equivalent to the existence of a flow of value 1 in the directed graph network.
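To make this concrete, here is a rough sketch of testing a single step in a toy example (n=4, k=2, with uniform distributions chosen purely for illustration; the helper functions and the use of networkx are my own choices, not anything from a particular reference):

```python
import networkx as nx

def partitions_into(elements, k):
    """All set partitions of `elements` (a list) into exactly k non-empty parts,
    each returned as a frozenset of frozensets."""
    if k == 1:
        return [frozenset([frozenset(elements)])]
    if len(elements) == k:
        return [frozenset(frozenset([e]) for e in elements)]
    if len(elements) < k:
        return []
    first, rest = elements[0], elements[1:]
    out = [p | {frozenset([first])} for p in partitions_into(rest, k - 1)]
    for p in partitions_into(rest, k):
        for part in p:
            out.append((p - {part}) | {part | {first}})
    return out

def splits_to(x, y):
    """True if the (k+1)-part partition y comes from x by breaking one part in two."""
    return len(y - x) == 2 and len(x - y) == 1

n, k = 4, 2
X = partitions_into(list(range(1, n + 1)), k)
Y = partitions_into(list(range(1, n + 1)), k + 1)
P_k = {x: 1 / len(X) for x in X}            # toy choice: uniform distributions
P_k1 = {y: 1 / len(Y) for y in Y}

G = nx.DiGraph()
for x in X:
    G.add_edge("source", ("X", x), capacity=P_k[x])
for y in Y:
    G.add_edge(("Y", y), "sink", capacity=P_k1[y])
for x in X:
    for y in Y:
        if splits_to(x, y):
            G.add_edge(("X", x), ("Y", y))  # no capacity attribute = unbounded

value, _ = nx.maximum_flow(G, "source", "sink")
print(value)   # approximately 1, so a single fragmentation step exists here
```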

Although in many cases this remains challenging to work with (something I may explore in a future post), it is nonetheless a useful idea to have in mind when deciding whether such a construction is possible for specific examples.