# EGMO 2019

Last week, we held our annual IMO training and selection camp in the lovely surroundings of Trinity College, Cambridge. Four of our students have subsequently spent this week in Kiev, for the ninth edition of the prestigious European Girls’ Mathematical Olympiad.

The UK team, none of whom had attended the competition before, and all of whom remain eligible to return at least once, did extremely well, placing fourth out of the official European countries, and earning three silver medals, and one gold medal, with complete or almost-complete solutions to all six problems between them. More details are available on the competition website.

In this post, I’m going to discuss some of the non-geometry problems. As a special treat, the discussion of Question 6 is led by Joe Benton, which is only fitting, since he wrote it. You can find the first day’s three problems here, and the second day’s problems here. The geometry problems are treated in a separate post.

Problem One

Triple equalities are hard to handle directly, so generally one starts with a single equality, for example the left hand one $a^2b+c = b^2c+a$, after noting that the setup is cyclic (but not symmetric) in the variables, under cycling $a\to b \to c\to a$.

Some quick notes:

• The given equations are inhomogeneous, meaning that not every term has the same degree in terms of {a,b,c}. In complicated equations, normally we want to cancel terms, and we certainly can’t cancel terms with different degrees! So a good first idea is to find a way to make the equations homogeneous, for example using the condition, which is itself inhomogeneous. For example, one could replace $c$ with $c(ab+bc+ca)$, and it’s clear this might lead to some cancellation. In this case, the algebra is more straightforward if one rearranges and factorises first to obtain $a(1-ab)=c(1-bc)$, from which the condition gives $a(bc+ca)=c(ab+ac)$.
• Shortly after this, we find $a^2c=ac^2$, which means $a=c$ unless at least one of these terms is equal to zero.
• This case distinction should not be brushed over lightly as a tiny detail. This is the sort of thing that can lose you significant marks, especially if the omitted case turns out to be more involved than the standard case.
• This is a good example of planning your final write-up. The case distinction $\{a,c\ne 0\}, \{a\text{ or }c=0\}$ is really clumsy. What about b? We have already said that the setup is cyclic in (a,b,c), and it’s annoying to throw this aspect away. It really would be much much to have the overall case distinction at the start of your argument, for example into $\{a,b,c \ne 0\}$ and $\{a\text{ or }b\text{ or }c=0\}$, because then in the latter case you could assume that $a=0$ without loss of generality, then see what that implied for the other variables.
• In this setting, one really should check that the claimed solutions satisfy both the condition and the triple-equality. This is a complicated family of simultaneous equations, and even if you’re lucky enough to have settled on an argument that is reversible, it’s much easier to demonstrate that everything is satisfied as desired than actually to argue that the argument is reversible.

Problem Two

Maybe I speak for a number of people when I say I’m a little tired of domino tiling problems, so will leave this one out.

Problem Five

I liked this problem. A lot. It did feel like there were potential time-wasting bear traps one might slip into, and perhaps these comments are relevant:

• This feels like quite a natural problem to ask. In particular, the task specified by (A)+(B) is much more elegant than the RHS of (C). So unless you have immediate insight for the RHS of (C), it makes sense to start with the task, and aim for any bound similar to (C).
• There are a lot of transformations one could make, for example, repeatedly removing n from one of the $a_n$s, or re-ordering them conveniently somehow, which shouldn’t affect any solution. You can also attack the RHS of (C) in a number of ways (perhaps with a case split for n odd or even?). As with involved but easy angle chases discussed in the companion post, it’s probably better to hold this in mind than instantly to do all of these, since it will just obscure the argument if you end up not using the result of the simplification!
• One should always try small examples. Here, it’s worth trying examples large enough to get a sense of what’s happening, because I think the crucial observation is that (unless you were very very unlucky) there are lots of suitable sequences $B=(b_i)$. (*)
• Certainly the case where the $(a_i)$s themselves form a complete n-residue system is ‘easiest’ and certainly straightforward. One might say that the case where the $(a_i)$ are equal (or at least congruent) is ‘hardest’ (in that $(b_i-a_i)$ might have to take the largest values in some sense), but is also straightforward. There’s certainly the option of trying to rigorise this, but I don’t know whether this can be made to work.

# BMO2 2019

The second round of the British Mathematical Olympiad was taken on Thursday by the 100 or so top scoring eligible participants from the first round, as well as some open entries. Qualifying for BMO2 is worth celebrating in its own right. The goal of the setters is to find the sweet spot of difficult but stimulating for the eligible participants, which ultimately means it’s likely to be the most challenging exam many of the candidates sit while in high school, at least in mathematics.

I know that lots of students view BMO2 as something actively worth preparing for. As with everything, this is a good attitude in moderation. Part of the goal in writing about the questions at such length is because I think at this level it’s particularly easy to devote more time than needed to preparation, and use it poorly. This year time is tight at the end of semester, and so what follows is closer to a set of complete solutions than usual, for which apologies, although I hope it is still possible to get a sense of how one might have come across the solutions yourself. Of course, this means that what follows will certainly spoil the problems for anyone who hasn’t tried them by themselves already.

The copyright for the problems is held by BMOS, and reproduced here with permission.

Question 1

As if often the case in geometry questions, what you’ve been asked to prove here isn’t the most natural property of the configuration. A good first step would be to see if there are stronger statements which are true.You are asked to show that triangle BPE is isosceles, but you aren’t told which of the three vertices is the apex. In fact, the task is to show that $BP=EP$ or, alternatively, $\angle BEP=\angle PBE$. It’s not in general true that BE is equal to BP=EP. Unless you’re very unlucky, you can establish this from one diagram.

Now, you don’t immediately know whether it’s going to be easier to show that two lengths are equal, or that two angles are equal. However, you know that P lies on the perpendicular bisector of BC, hence BP=CP, which is a big clue. In particular, this means that P would be the centre of the circle through BCE. This clearly implies the given result, so deciding to prove this instead is a good strategy.

There are now a number of ways to prove this. Note that D lies on the altitude from A, and the feet of the perpendiculars from D to sides AB and AC are both present in the configuration so (just as for the orthocentre diagram) we can calculate most of the angles involving {A,B,C,D,E}.

For example, ABDE is cyclic, so $\angle BED=\angle BAD = 90-\hat{B}$, hence $\angle AEB=\hat{B},\,\angle EBA=\hat{C}$. This shows that AB is tangent to the circumcircle of BCE. But then the line L is a radius of this circle, and so its centre must be P, the unique point on L which is equidistant from B and C.

Alternatively, we could directly calculate $\angle BEC=180-\hat{B}$ and $\angle CBP=90-\hat{B}$. But BPC is isosceles so $\angle BPC=2\hat{B}$. In general, the converse of ‘angle at centre is twice angle at circumference’ does not hold, but when we know P is equidistant from B and C this does hold, and so the angle relations precisely confirm that P is the centre of the circle through BPE.

My intention had been that the triangle would be acute-angled, to reduce the number of diagram options based on the magnitude of $\hat{B}$. If pursuing this second approach, one would need to be careful to account for whether P is on the same side or the opposite side of BC to E. That said, unless you do something very exotic, it should be exactly the same argument or calculation, and such a case distinction probably isn’t very important.

Question 2

First, a short remark. As stated, if n=5, a piece could move 3 squares to the left then 4 squares up, by Pythagoras. Handling all such options is likely to be quite annoying, since some values of n can be written in this Pythagorean form, and others cannot. This brings us to some good general principles for olympiad problems which look like this one:

• A construction, when one exists, will probably be possible using simple versions of the allowed moves / structures.
• An argument why a construction is impossible should probably be based on ideas which treat the simple moves similarly to the more complicated moves.

The setup of the problem encourages you to think about dividing the board into $n^2$ sub-boards, each with dimensions $n\times n$. Continue reading

# Lecture 10 – the configuration model

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

As we enter the final stages of the semester, I want to discuss some extensions to the standard Erdos-Renyi random graph which has been the focus of most of the course so far. Although we will not get far into the details during this course, the overall goal is to develop models which are close to Erdos-Renyi in terms of ease of analysis, while also allowing more of the features characteristic of networks observed in the real world.

One of the more obvious deficiencies of the sparse regime of Erdos-Renyi random graphs for modelling ‘real-world phenomena’ concerns the degree sequence. Indeed, the empirical degree distribution of G(n,c/n) converges to Poisson(c). By contrast, in real-world networks, a much wider range of degrees is typically observed, and in many cases it is felt that these should follow a power law, with a small number of a very highly connected agents.

One way around this problem to construct random graphs where we insist that the graph has a given sequence of degrees. The configuration model, which is the subject of this lecture and this post (and about which I’ve written before), offers one way to achieve this.

Definition and notes

Let $n\ge 1$ and let $d=(d_1,d_2,\ldots,d_n)$ be a sequence of non-negative integers such that $\sum_{i=1}^n d_i$ is even. Then the configuration model with degree sequence d is a random multigraph with vertex set [n], constructed as follows:

• To each vertex $i\in[n]$, assign $d_i$ half-edges;
• Then, take a uniform matching of these half-edges;
• Finally, for each pair of half-edges in the matching, replace the two half-edges with a genuine edge, to obtain the multigraph $CM_n(d)$, in which, by construction, vertex i has degree $d_i$.

One should note immediately that although the matching is uniform, the multigraph is not uniform amongst multigraphs with that degree sequence. Note also that the condition on the sums of the degrees is necessary for any graph, and in this context means that the number of half-edges is even, without which it would not be possible to construct a matching.

This effect is manifest in the simplest possible example, when n=2 and d=(3,3). There are two possible graphs, up to isomorphism, which are shown below:

For obvious reasons, we might refer to these as the handcuffs and the theta , respectively. It’s helpful if we, temporarily, assume the half-edges are distinguishable at the moment we join them up in the configuration model construction. Because then there are 3×3=9 ways to join them up to form the handcuffs (think of which half-edge ends up forming the edge between the two vertices) while there are 3!=6 ways to pair up the half-edges in the theta.

In general, for multigraphs H with the correct degree sequence, we have

$\mathbb{P}( CM_n(d)\simeq H) \propto \left( 2^{\# \text{loops}(H)} \prod_{e\in E(H)} \text{mult}(e)! \right),$

where $\text{mult}(e)$ is the multiplicity with which a given edge e appears in H.

Note: it might seem counterintuitive that this procedure is biased against multiple edges and self-loops, but it is really just saying that there are more ways to form two distinct edges than to form two equal edges (ie a multiedge pair) when we view the half-edges as distinguishable. (See this post for further discussion of this aspect in the 3-regular setting.)

However, a consequence of this result is that if we condition on the event that $CM_n(d)$ is simple, then the resulting random graph is uniform on the set of simple graphs satisfying the degree property. Note that the same example as above shows that there’s no guarantee that there exists a simple graph whose degrees are some given sequence.

d-regular configuration model

In general, from a modelling point of view, we are particularly interested in simple, connected graphs, and so it is valuable to study whether the large examples of the configuration model are likely to have these properties. In this lecture, I will mainly focus on the case where the multigraphs are d-regular, meaning that all the vertices have degree equal to d. For the purposes of this lecture, we denote by $G^d(n)$, the d-regular configuration model $CM_n(d,\ldots,d)$.

• d=1: to satisfy the parity condition on the sums of degrees, we must have n even. But then $G^1(n)$ will consist of n/2 disjoint edges.
• d=2: $G^2(n)$ will consist of some number of disjoint cycles, and it is a straightforward calculation to check that when n is large, with high probability the graph will be disconnected.

In particular, I will focus on the case when d=3, which is the first interesting case. Most of the results we prove here can be generalised (under various conditions) to more general examples of the configuration model. The main goal of the lecture is revision of some techniques of the course, plus one new one, in a fresh setting, and the strongest possible versions of many of these results can be found amongst the references listed at the end.

Connectedness

In the lecture, we showed that $G^3(2n)$ is connected with high probability. This is, in fact, a very weak result, since in fact $G^d(n)$ is d-connected with high probability for $d\ge 3$ [Bol81, Wor81]. Here, d-connected means that one must remove at least d vertices in order to disconnect the graph, or, equivalently, that there are d disjoint paths between any pair of vertices. Furthermore, Bollobas shows that for $d\ge 3$, $G^d(n)$ is a (random) expander family [Bol88].

Anyway, for the purposes of this course, the main tool is direct enumeration. The matching number $M_{2k}$ satisfies

$M_{2k}=(2k-1)\times (2k-3)\times\ldots\times 3\times 1 = \frac{(2k)!}{2^k \cdot k!},$

and so Stirling’s approximation gives the asymptotics

$M_{2k} = (\sqrt{2}+o(1)) \left(\frac{2}{e}\right)^k k^k,$

although it will be useful to use the true bounds

$c \left(\frac{2}{e}\right)^k k^k \le M_{2k}\le C\left(\frac{2}{e}\right)^k k^k,\quad \forall k,$

instead in some places. Anyway, in $G^3(2n)$, there are 6n half-edges in total, and so the probability that the graph may be split into two parts consisting of $2\ell,2m$ vertices, with $2\ell+2m=2n$, and with no edges between the classes is $\frac{\binom{2n}{2\ell} M_{6\ell}M_{6m}}{M_{6n}}.$ Continue reading

# Lecture 2 – Connectivity threshold

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

The goal of the second lecture was to establish the sharp phase transition for the connectivity of the random graph G(n,p(n)) around the critical regime $p(n)\sim \frac{\log n}{n}$. In the end, we showed that when $\omega(n)$ is any diverging sequence, and $p(n)=\frac{\log n-\omega(n)}{n}$, then we have that G(n,p(n)) is with high probability not connected.

In the next lecture, we will finish the classification by studying $p(n)=\frac{\log n+\omega(n)}{n}$, and show that for this range of p, the graph G(n,p(n)) is with high probability connected.

The details of the lecture, especially the calculation, are not presented fully here. There, I followed van der Hofstad’s recent book fairly closely, sometimes taking different approximations and routes through the algebra, though all versions remain fairly close to the original enumerations by Renyi.

Immediate remarks

• One is allowed to be surprised that for almost all scalings of p(n), G(n,p) is either whp connected or whp not connected. The speed of the transition is definitely interesting.
• As defined in lectures, the property that a graph is connected is an increasing property, meaning that it is preserved when you add additional edges to the graph.
• Because of the natural coupling between G(n,p) and G(n,q), the fact that connectedness is an increasing property makes life easier. For example, we can insist temporarily that $\omega(n)\ll \log n$, or whatever scaling turns out to be convenient for the proof, but conclude the result for all diverging $\omega(n)$. This avoids the necessity for an annoying case distinction.

Heuristics – Isolated vertices

It turns out that the ‘easiest’ way for such a graph to be disconnected is for it to have an isolated vertex. In determining that the graph has a cut into classes of sizes a and b with no edges between them, there is a trade-off between the number of ways to choose the partition (which increases with min(a,b) ) and the probabilistic penalty from banning the ab edges between the classes (which decreases with min(a,b) ). It turns out that the latter effect is slightly stronger, and so (1,n-1) dominates.

Method 1: second-moment method

In the case $p(n)=\frac{\log n - \omega(n)}{n}$, we use a second-moment method argument to establish that G(n,p) contains an isolated vertex with high probability. Note that a given vertex v is isolated precisely if n-1 edges are not present. Furthermore, two given vertices v,w are both isolated, precisely if 2n-3 edges are not present. So in fact, both the first moment and the second moment of the number of isolated vertices are straightforward to evaluate.

It turns out that the number of isolated vertices, $Y_n$, satisfies

$\mathbb{E}[Y_n]= \exp(\omega(n)+o(1))\rightarrow\infty.$ (*)

As always, we have to eliminate the possibility that this divergent expectation is achieved by the graph typically having no isolated vertices, but occasionally having very many. So we turn to the second moment, and can show

$\mathrm{Var}(Y_n)= (1+o(1))\mathbb{E}[Y_n],$

and so by Chebyshev’s inequality, we have $\mathbb{P}(Y_n=0)\rightarrow 0$.

Method 2: first-moment method

Counter-intuitively, although the case $p(n)=\frac{\log n + \omega(n)}{n}$ requires only a first-moment method, it is more technical because it involves the non-clear direction of the informal equivalence:

$\text{Connected}\; \iff ''\; \text{no isolated vertices}.$

At the time we showed (*), we also showed that for this regime of p(n), G(n,p) whp has no isolated vertices. It remains to show that it has no splits into (unions of) connected components of sizes k and n-k. Continue reading

# EGMO 2018

Last week the UK held its annual training and selection camp in Cambridge. This week, four of the students have been attending the European Girls’ Mathematical Olympiad. 2018 is the seventh edition of this prestigious competition, and is being held in Florence.

You can find very many details about the competition, and observe the UK’s excellent performance (with particular congratulations to Emily, who obtained a perfect score) at the competition website. A short article about the team in the TES can be found here.

In this post, I’m going to muse on some of the problems. You can find the two papers here and here.

Problem Two

Part a) would have been more immediate if the set A had been defined as

$A:= \left\{\frac{k+1}{k} \,:\, k=1,2,3,\ldots\right\},$

as this is instantly suggestive of a telescoping product such as

$7 = \frac{7}{6}\cdot \frac{6}{5}\cdot\ldots \cdot \frac{2}{1}.$

I found part b) to be one of the most difficult sections of the paper. It builds on the idea that given an expression for x as a product of elements of A, and an expression for y as a product of elements of A, we can combine these (ie take the product of the products!) to get an expression for xy as a product of elements of A. This yields $f(xy)\le f(x)+f(y)$, and so the task is to show that sometimes this isn’t the most efficient way to proceed.

I tried a couple of things, like trying to bound f(p) below when p is a prime. This wasn’t ludicrous, as one would certainly need to use a term $\frac{kp}{kp-1}$ somewhere in the product so that it is divisible by p. However, this didn’t go anywhere, and nor did investigating f(n!). Perhaps others had more success down these avenues.

But as a general rule, if an abstractly-defined function is typically hard to calculate, then classes where you can calculate it are likely to be extra valuable. Here, powers of two make life particularly easy. We have $2\in A$, and so $2^n=2\times 2\times\ldots\times 2$ is a valid product. And we can’t possibly achieve $2^n$ as a product of fewer terms than this, because 2 is the largest element of A. So $f(2^n)=n$. Note that this is already way better than the bound we would have achieved from our argument in part a), which yields $f(m)\le m-1$.

My next observation was that a similar argument and a natural construction gives $f(2^n+1)=n+1$. But this can be extended so that when $2^n+1\le m\le 2^{n+1}$, we have $f(m)\ge n+1$ and in fact there is equality exactly for

$m= 2^n+1, 2^n+2, 2^n+4,\ldots, 2^n+2^{n-1},2^{n+1}.$ (*)

In particular, note that all of these are even except $2^n+1$. It may well be the case that we don’t need this extension, but before you’ve solved the problem you don’t know how much you’ll have to extend each idea!

I had a feeling that this meant $f(2^n+1)$ was a strong candidate to satisfy the requirements, and that almost any factorisation would work. I suspect many students at this point did some refinement of choice of n, but I want to stay abstract, and use the extended observation (*). Since $2^n+1$ is certainly not always prime, let’s focus on the infinitely many values n where it has a factorisation as $2^n+1 = ab$, and consider whether a or b can achieve equality at (*). We’d better introduce the notation

$2^\alpha

So $ab> 2^{ab}+2^a+2^b+1$, and so $\alpha+\beta>n$. But similarly, $ab< 2^{\alpha+1}2^{\beta+1}$, so $\alpha+\beta. We obtain

$\alpha+\beta+1=n,$

which is kind of what I’d been hoping for when I started this analysis. Now, we have

$f(a)\ge \alpha+1,\quad f(b)\ge \beta+1,$

$\Rightarrow\quad f(a)+f(b)\ge \alpha+\beta+2 = n+1,$ (**)

with equality precisely if $a,b$ both satisfy the equality conditions at (*). But $a,b$ are odd, and so we have equality at (**) precisely if $a=2^\alpha+1,b=2^\beta+1$. So we have a resolution to the problem whenever $2^n+1$ can be non-trivially factorised in any way other than $2^n+1=(2^\alpha+1)(2^\beta+1)$, so we have a rich (and certainly infinite) class of suitable (x,y).

Problem Three

An obvious remark. The jury will never choose contestant i if she has fewer than contestants in front of her unless they are forced to. They are only forced to if everyone has this property. So we ignore the second dashed bullet point, as it just tells us when the process ends. And with a little further thought, the process ends exactly when the contestants are in correct order.

I suspect part a) of this may end up featuring on future examples of our interactive write-up clinic, where students are challenged to produce technically-correct arguments for obvious but awkward mini-problems. The location of contestant $C_n$ is probably a good starting point.

For part b), you have to find an optimal construction, and prove that it’s optimal. At national and junior olympiad level, students often forget that they have to supply both of these components. At international level, the challenge is to work out which one gives the entry into the problem. I would say that about two-thirds of the time, either the optimal construction is very obvious, or is best attacked after you’ve had some insight into the bound. For this question (and again it’s just my opinion), I felt it was all about the construction. I made absolutely no progress by aiming for bounds. Whereas the construction offers plenty of insight into how to prove the bounds, and so once I had it, I found it quick.

The usual rules apply. An optimal construction is going to have to be neat to describe. It’s very unlikely to have millions of cases. Intuitively, it seems reasonable that starting the contestants in reverse order gives the jury the greatest possible ‘elbow room’ to squeeze moves into the procedure. Perhaps you tried to prove this directly, by coupling a procedure starting from arbitrary order with a corresponding procedure starting from reverse order? Well, I found that very hard, and perhaps you did too.

However, that doesn’t mean it’s the wrong construction! The key question is, what to do about contestant $C_n$? Well, essentially nothing. This contestant can never be moved. So when do we start allowing other contestants to pass her? It seemed natural to let the other contestants $C_1,\ldots,C_{n-1}$ do as much as possible among themselves first. That is

$\mathbf{C_n},C_{n-1},\ldots,C_2,C_1 \quad \Rightarrow\Rightarrow\Rightarrow \quad \mathbf{C_n}, C_1,C_2,\ldots,C_{n-1},$

where $\Rightarrow\Rightarrow\Rightarrow$ denotes lots of moves. At this point, what to do next stood out for me, namely that one could use $\mathbf{C_n}$ at the start to put all the others back into reverse order, while moving $\mathbf{C_n}$ to the back. That is

$\mathbf{C_n},C_1,C_2,\ldots,C_{n-1}\quad\Rightarrow \quad C_1,\mathbf{C_n},C_2,\ldots,C_{n-1} \quad\Rightarrow\quad C_2,C_1,\mathbf{C_n},C_3,\ldots,C_{n-1}$

$\Rightarrow\Rightarrow \quad C_{n-1},C_{n-2},\ldots,C_2,C_1,\mathbf{C_n}.$

You might have tried other things first, but once you notice this, you just know it has to be right. It’s just too elegant a construction, and it looks like the sort of thing one prove will be optimal, because the overall process

$\mathbf{C_n},C_{n-1},\ldots,C_n\quad \Rightarrow\Rightarrow\Rightarrow \quad \mathbf{C_n},C_1,C_2,\ldots,C_{n-1}$

$\Rightarrow\Rightarrow\quad C_{n-1},\ldots,C_2,C_1,\mathbf{C_n}\quad\Rightarrow\Rightarrow\Rightarrow\quad C_1,C_2,\ldots,C_{n-1},\mathbf{C_n},$

incorporates the corresponding process for n-1 (twice, in fact) and so induction is very accessible both for calculating the total number of moves. We conjecture that this is indeed the optimal process, and under this assumption, with $f(n)$ the number of moves, we would have $f(n) = f(n-1) + (n-1) + f(n-1)$, from the three stages of the process, from which, after looking at small values,

$f(n)=2^n - (n+1).$

I started by saying that the construction was the hard part of this problem. Well, that doesn’t mean the bound is easy. But at least with a construction in hand, you can make observations that might inform a bounding argument:

• observation 1: contestant $C_n$ never jumps;
• observation 2: in the optimal construction, by induction $C_{n-1}$ doesn’t jump in the outer phases, so in fact jumps only once, ie during the middle phase;
• observation 3: contestant $C_{n-2}$ doesn’t jump very often, and in fact can only jump after at least one of $C_{n-1}$ and $C_n$ have ended up in front of her. Since we’ve established that $C_{n-1},C_n$ don’t jump very often themselves, this gives a bound on the number of times $C_{n-2}$.

There is still work to do here, and errors with $\pm 1$ could easily creep in. But I still hold fast to my original claim that the construction was the hard part here. Or at least, the rough form of the construction. I guess it’s possible that one would have started making observations like the three above without a construction in mind, but I think it’s unlikely. Anyway, I leave for you the final details of the bounding argument, which involves formally transcribing observation 3, proving it, then generalising it to jumps of $C_{n-3}$ and so on.

Problem Four

One of the exercises I have been setting to UK students recently is to produce short solution digests, which manifest any key ideas of the solution abstractly and briefly enough to resonate in the future. I’m a little tired of domino tiling problems, so I’ll do one of these here. This will be slightly longer than if I were not writing for a (small) audience.

Double-counting the total value by rows/columns and by dominos shows there are $\frac{2kn}{3}$ dominos in a balanced configuration. When n=3, we can achieve k=1, and by tiling copies of this down the main diagonal, can extend to $3\,|\,n$. For $3\not|\,n$, we must have $3\,|\,k$ ie $k\ge 3$, but in fact k=3 is achievable, by tiling the main diagonal with copies of small boards for which k=3 can be constructed with a bit of trial-and-error.

The double-counting idea at the start is the nice part of the problem. The construction is a bit annoying, but saving ourselves work by building up large examples from copies of small examples is a useful motif to have in mind.

Problem Six

This question has lots of clues in the statement. It would, for example, be pretty surprising if the answer were ‘no’ to part b) given the setup in part a).

My confession is that I wasted lots of time on part a) not using the option m=0, which was foolish given that it’s clued from part b) that one needs to use the option m=0. My thought had been to consider some integer y, and ask which integers x were banned (if we were aiming for contradiction in part a)). For part a), it gets harder if t is smaller, so I found it helpful to think of t as $\epsilon\ll 1$. Anyway, if you struggled on part a), maybe worth reviewing whether you were definitely trying to solve part a), and not accidentally using the setup that really addressed part b)!

Some people have shown me solutions to part a) that carry an air of magic, by placing all the key steps (such as (*) to follow) in the language of the original setup. Let’s try to be cleaner. The key is to consider m=0. Since m=0 is included, we know that whenever x<y, we must have

$\epsilon y \le x \le (1-\epsilon)y.$ (*)

Maybe you just have a gut feeling that this can’t be possible if you have enough xs and ys? But either way, choosing to focus on (*) is the key step, because once you know you have to prove the result based on this, it’s not too hard. I prefer addition to multiplication, so one might as well take logs (since does it really look like we’re going to use heavily the integer property now?) to obtain

$\alpha\le |z_i - z_j|\le \beta,$

for all $z_i,z_j$ in some large finite collection, where $0<\alpha<\beta$. You should now have a strong gut feeling that this is impossible. You have an arbitrarily large collection of real numbers which have to be close to each other pairwise, but never too close pairwise. How to finish the argument is a matter of taste.

For part b), assuming we’re aiming for the answer ‘yes’, we probably want to construct it one step at a time, and you want to think of $t\approx \frac12$ to make life as hard as possible.

Now, suppose you have $x_1,x_2,\ldots,x_n$ so far. What next? Well if we could have

$x_{n+1} \equiv \frac{x_i}{2}\,\mod x_i,$

for all $i=1,\ldots,n$, that would be perfect. We can often solve sets of coupled residue equations like this using the Chinese Remainder Theorem. (Recall of course that the solutions themselves are rarely important – the fact that they exist is enough!) A couple of things go wrong with trying this crudely:

• If $x_i$ is odd, then $\frac{x_i}{2}$ is not an integer…
• If we correct this by asking for $x_{n+1}\equiv \lfloor\frac{x_i}{2}\rfloor\,\mod x_i$, then there’s a chance we might still be within the t-window around a multiple of $x_i$.
• Unless we are going to make complicated demands on the residues, to use CRT it would be easier if all the $x_i$s were coprime.

One option is to give up. But actually all these objections can be handled with fairly small alterations. Can you see how the second objection can be overcome by an appropriate choice of $x_1$? Remember that t is fixed right at the start, and cannot be equal to 1/2. Is the third objection actually an objection any more? If it is, can we fix it?

Anyway, I guess P6 was my favourite non-geometry question on the paper, though, that’s far from relevant. P5 was pretty neat too, but who knows whether a follow-up geometry post will materialise soon.

# BMO2 2018

The second round of the British Mathematical Olympiad was taken yesterday by the 100 or so top scoring eligible participants from the first round, as well as some open entries. Qualifying for BMO2 is worth celebrating in its own right. The goal of the setters is to find the sweet spot of difficult but stimulating for the eligible participants, which ultimately means it’s likely to be the most challenging exam many of the candidates sit while in high school, at least in mathematics.

I know that lots of students view BMO2 as something actively worth preparing for. As with everything, this is a good attitude in moderation. Part of the goal in writing about the questions at such length (and in particular not just presenting direct solutions) is because I think at this level it’s particularly easy to devote more time than needed to preparation, and use it poorly.

All these questions could be solved by able children. In fact, each could be solved by able children in less than an hour. You definitely count as an able child if you qualified or if your teacher allowed you to make an open entry! Others count too naturally. But most candidates won’t in fact solve all the questions, and many won’t solve any. And I think candidates often come up with the wrong reasons why they didn’t solve problems. “I didn’t know the right theorems” is very very rarely the reason. Olympiad problems have standard themes and recurring tropes, but the task is not to look at the problem and decide that it is an example of Olympiad technique #371. The task is actually to have as many ideas as possible, and eliminate the ones that don’t work as quickly as possible.

The best way to realise that an idea works is to solve the problem immediately. For the majority of occasions when we’re not lucky enough for that to happen, the second-best way to realise that an idea works is to see that it makes the problem look a bit more like something familiar. Conversely, the best way to realise that an idea doesn’t work is to observe that if it worked it would solve a stronger but false problem too. (Eg Fermat’s Last Theorem *does* have solutions over the reals…) The second-best way to realise that an idea doesn’t work is to have the confidence that you’ve tried it enough and you’ve only made the problem harder, or less familiar.

Both of these second-best ideas do require a bit of experience, but I will try to explain why none of the ideas I needed for various solutions this year required any knowledge beyond the school syllabus, some similarities to recent BMOs, and a small bit of creativity.

As usual, the caveat that these are not really solutions, and certainly not official solutions, but they are close enough to spoil the problems for anyone who hasn’t tried them by themselves already. Of course, the copyright for the problems is held by BMOS, and reproduced here with permission.

Question One

I wrote this question. Perhaps as a focal point of the renaissance of my interest in geometry, or at least my interest in teaching geometry, I have quite a lot to say about the problem, its solutions, its origin story, the use of directed angles, the non-use of coordinate methods and so on. In an ideal world I would write a book about this sort of thing, but for now, a long and separate post is the answer.

This will be available once I’ve successfully de-flooded my apartment.

Question Two

I also wrote this problem, though I feel it’s only fair to show the version I submitted to the BMO committee. All the credit for the magical statement that appears above lies with them. There is a less magical origin story as well, but hopefully with some interesting combinatorial probability, which is postponed until the end of this post.One quick observation is that in my version Joe / Hatter gets to keep going forever. As we shall see, all the business happens in the first N steps, but a priori one doesn’t know that, and in my version it forces you to strategise slightly differently for Neel / Alice. In the competition version, we know Alice is done as soon as she visits a place for a second time, but not in the original. So in the original we only have to consider ‘avoid one place’ rather than the multiple possibilities now of ‘avoid one place’ or ‘visit a place again’.

But I think the best idea is to get Alice to avoid one particular place $c\not\equiv 0$ whenever possible. At all times she has two possible options for where to go next, lets say $b_k+a_k, b_k-a_k$ in the language of the original statement. We lose nothing by assuming $-N/2 < a_k\le N/2$, and certainly it would be ridiculous for Joe / Hatter ever to choose $a_k=0$. The only time Alice’s strategy doesn’t work is when both of these are congruent to $c$, which implies $N\,|\, 2a_k$, and thus we must have $N= 2a_k$. In other words, Alice’s strategy will always work if N is odd.

I think it’s really worth noticing that the previous argument is weak. We certainly did not show that N must be odd for Alice to win. We showed that Alice can avoid a congruence class modulo an odd integer. We didn’t really need that odd integer to be N for this to work. In particular, if N has an odd factor p (say a prime), then the same argument works to show that we can avoid visiting any site with label congruent to 1 modulo p.

It’s actually very slightly more complicated. In the original argument, we didn’t need to use any property of $b_k$. But obviously here, if $b_k\equiv 1$ modulo p and $p\,|\,a_k$, then certainly $b_{k+1}\equiv 1$ modulo p. So we have to prove instead that Alice can ensure she never ‘visits 1 modulo p for the first time’. Which is fine, by the same argument.

So, we’ve shown that Neel / Alice wins if N is odd, or has an odd factor. The only values that remain are powers of 2. I should confess that I was genuinely a little surprised that Joe / Hatter wins in the power of 2 case. You can find a construction fairly easily for N=2 and N=4, but I suspected that might be a facet of small numbers. Why? Because it still felt we could avoid a particular site. In order for Alice’s strategy to fail, we have to end up exactly opposite the particular site at exactly the time when the next $a_k=N/2$, and so maybe we could try to avoid that second site as well, and so on backwards?

But that turned out to be a good example of something that got very complicated quite quickly with little insight. And, as discussed at the beginning, that’s often a sign in a competition problem that your idea isn’t so good. (Obviously, when composing a problem, that’s no guarantee at all. Sometimes things are true but no good ideas work.) So we want other ideas. Note that for N=4, the sequence (2,1,2) works for Joe / Hatter, because that forces Alice / Neel to visit either (0,2,1,3) or (0,2,3,1). In particular, this strategy gave Alice no control on the first step nor the last step, and the consequence is that we force her to visit the evens first, then transfer to an odd, and then force her to visit the other odd.

We might play around with N=8, or we might proceed directly to a general extension. If we have a Joe / Hatter strategy for N, then by doubling all the $a_k$s, we have a strategy for 2N which visits all the even sites in the first N steps. But then we can move to an odd site eg by taking $a_N=1$. Just as in the N=4 case, it doesn’t matter which odd site we start from, since if we again double all the $a_k$s, we will visit all the other odd sites. This gives us an inductive construction of a strategy for powers of two. To check it’s understood, the sequence for N=8 is (4,2,4,1,4,2,4).

Although we don’t use it, note that this strategy takes Alice on a tour of sites described by decreasing order of largest power of two dividing the label of the site.

# Kernels of critical graph components

This post is motivated by G(N,p), the classical Erdos-Renyi random graph, specifically its critical window, when $p=p(N)=\frac{1}{N}(1+\lambda N^{-1/3})$.

We start with the following observation, which makes no restriction on p. Suppose a component of G(N,p) is a tree. Then, the graph geometry of this component is that of a uniform random tree on the appropriate number of vertices. This is deliberately informal. To be formal, we’d have to say “condition on a particular subset of vertices forming a tree-component” and so on. But the formality is broadly irrelevant, because at the level of metric scaling limits, if we want to describe the structure of a tree component, it doesn’t matter whether it has $\log N$ or $\frac{1}{7}N$ vertices, because in both cases the tree structure is uniform. The only thing that changes is the scaling factor.

In general, when V vertices form a connected component of a graph with E edges, we define the excess to be E-V+1. So the excess is non-negative, and is zero precisely when the component is a tree. I’m reluctant to say that the excess counts the number of cycles in the component, but certainly it quantifies the amount of cyclic structure present. We will sometimes, in a mild abuse of notation, talk about excess edges. But note that for a connected component with positive excess, there is a priori no way to select which edges would be the excess edges. In a graph process, or when there is some underlying exploration of the component, there sometimes might be a canonical way to classify the excess edges, though it’s worth remarking that the risk of size-biasing errors is always extremely high in this sort of situation.

Returning to the random graph process, as so often there are big changes around criticality. In the subcritical regime, the components are small, and most of them, even the largest with high probability, are trees. In the supercritical regime, the giant component has excess $\Theta(N)$, which is qualitatively very different.

It feels like every talk I’ve ever given has begun with an exposition of Aldous’s seminal paper [Al97] giving a distributional scaling limit of the sizes of critical components in the critical window, and a relation between the process on this time-scale and the multiplicative coalescent. And it remains relevant here, because the breadth-first exploration process can also be used to track the number of excess edges.

In a breadth-first exploration, we have a stack of vertices we are waiting to explore. We pick one and look its neighbours restricted to the rest of the graph, that is without the vertices we have already fully explored, and also without the other vertices in the stack. That’s the easiest way to handle the total component size. But we can simultaneously track how many times we would have joined to a neighbour within the stack, which leads to an excess edge, and Aldous derives a joint distributional scaling limit for the sizes of the critical components and their excesses. (Note that in this case, there is a canonical notion of excess edge, but it depends not just on the graph structure, but also on the extra randomness of the ordering within the breadth-first search.)

Roughly speaking, we consider the reflected exploration process, and its scaling limit, which is a reflected parabolically-drifting Brownian motion (though the details of this are not important at this level of exposition, except that it’s a well-behaved non-negative process that hits zero often). The component sizes are given by the widths of the excursions above zero, scaled up in a factor $N^{1/3}$. Then conditional on the shape of the excursion, the excess is Poisson with parameter the area under the excursion, with no rescaling. That is, a critical component has $\Theta(1)$ excess.

So, with Aldous’s result in the background, when we ask about the metric structure of these critical components, we are really asking: “what does a uniformly-chosen connected component with fixed excess look like when the number of vertices grows?”

I’ll try to keep notation light, but let’s say T(n,k) is a uniform choice from connected graphs on n vertices with excess k.

[Note, the separation of N and n is deliberate, because in the critical window, the connected components have size $n = \Theta(N^{2/3})$, so I want to distinguish the two problems.]

In this post, we will mainly address the question: “what does the cycle structure of T(n,k) look like for large n?” When k=0, we have a uniform tree, and the convergence of this to the Brownian CRT is now well-known [CRT2, LeGall]. We hope for results with a similar flavour for positive excess k.

2-cores and kernels

First, we have to give a precise statement of what it means to study just the cycle structure of a connected component. From now on I will assume we are always working with a connected graph.

There are several equivalent definitions of the 2-core C(G) of a graph G:

• When the excess is positive, there are some cycles. The 2-core is the union of all edges which form part of some cycle, and any edges which lie on a path between two edges which both form part of some cycle.
• C(G) is the maximal induced subgraph where all degrees are at least two.
• If you remove all the leaves from the graph, then all the leaves from the remaining graph, and continue, the 2-core is the state you arrive at where there are no leaves.

It’s very helpful to think of the overall structure of the graph as consisting of the 2-core, with pendant trees ‘hanging off’ the 2-core. That is, we can view every vertex of the 2-core as the root of a (possibly size 1) tree. This is particular clear if we remove all the edges of the 2-core from the graph. What remains is a forest, with one tree for each vertex of the 2-core.

In general, the k-core is the maximal induced subgraph where all degrees are at least k. The core is generally taken to be something rather different. For this post (and any immediate sequels) I will never refer to the k-core for k>2, and certainly not to the traditional core. So I write ‘core’ for ‘2-core’.

As you can see in the diagram, the core consists of lots of paths, and topologically, the lengths of these paths are redundant. So we will often consider instead the kernel, K(G), which is constructed by taking the core and contracting all the paths between vertices of degree greater than 2. The resulting graph has minimal degree at least three. So far we’ve made no comment about the simplicity of the original graphs, but certainly the kernel need not be simple. It will regularly have loops and multiple edges. The kernel of the graph and core in the previous diagram is therefore this:

Kernels of critical components

To recap, we can deconstruct a connected graph as follows. It has a kernel, and each edge of the kernel is a path length of some length in the core. The rest of the graph consists of trees hanging off from the core vertices.

For now, we ask about the distribution of the kernel of a T(n,K). You might notice that the case k=1 is slightly awkward, as when the core consists of a single cycle, it’s somewhat ambiguous how to define the kernel. Everything we do is easily fixable for k=1, but rather than carry separate cases, we handle the case $k\ge 2$.

We first observe that fixing k doesn’t confirm the number of vertices or edges in the kernel. For example, both of the following pictures could correspond to k=3:

However, with high probability the kernel is 3-regular, which suddenly makes the previous post relevant. As I said earlier, it can introduce size-biasing errors to add the excess edges one-at-a-time, but these should be constant factor errors, not scaling errors. So imagine the core of a large graph with excess k=2. For the sake of argument, assume the kernel has the dumbbell / handcuffs shape. Now add an extra edge somewhere. It’s asymptotically very unlikely that this is incident to one of the two vertices with degree three in the core. Note it would need to be incident to both to generate the right-hand picture above. Instead, the core will gain two new vertices of degree three.

Roughly equivalently, once the size of the core is fixed (and large) we have to make a uniform choice from connected graphs of this size where almost every vertex has degree 2, and $\Theta(1)$ of the rest have degree 3 or higher. But the sum of the degrees is fixed, because the excess is fixed. If there are n vertices in the core, then there are $\Theta(n)$ more graphs where all the vertices have degree 2 or 3, than graphs where a vertex has degree at least 4. Let’s state this formally.

Proposition: The kernel of a uniform graph with n vertices and excess $k\ge 2$ is, with high probability as $n\rightarrow\infty$, 3-regular.

This proved rather more formally as part of Theorem 7 of [JKLP], essentially as a corollary after some very comprehensive generating function setup; and in [LPW] with a more direct computation.

In the previous post, we introduced the configuration model as a method for constructing regular graphs (or any graphs with fixed degree sequence). We observe that, conditional on the event that the resulting graph is simple, it is in fact uniformly-distributed among simple graphs. When the graph is allowed to be a multigraph, this is no longer true. However, in many circumstances, as remarked in (1.1) of [JKLP], for most applications the configuration model measure on multigraphs is the most natural.

Given a 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges, and K a uniform choice from the configuration model with these parameters, we have

$\mathbb{P}\left( K \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1},$

where t(H) is the number of loops in H, and mult(e) the multiplicity of an edge e. This might seem initially counter-intuitive, because it looks we are biasing against graphs with multiple edges, when perhaps our intuition is that because there are more ways to form a set of multiple edges we should bias in favour of it.

I think it’s most helpful to look at a diagram of a multigraph as shown, and ask how to assign stubs to edges. At a vertex with degree three, all stub assignments are different, that is 3!=6 possibilities. At the multiple edge, however, we care which stubs match with which stubs, but we don’t care about the order within the multi-edge. Alternatively, there are three choices of how to divide each vertex’s stubs into (2 for the multi-edge, 1 for the rest), and then two choices for how to match up the multi-edge stubs, ie 18 in total = 36/2, and a discount factor of 2.

We mention this because in fact K(T(n,k)) converges in distribution to this uniform configuration model. Once you know that K(T(n,k)) is with high probability 3-regular, then again it’s probably easiest to think about the core, indeed you might as well condition on its total size and number of degree 3 vertices. It’s then not hard to convince yourself that a uniform choice induces a uniform choice of kernel. Again, let’s state that as a proposition.

Proposition: For any H a 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges as before,

$\lim_{n\rightarrow\infty}\mathbb{P}\left( K(T(n,k)) \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1}.$

As we said before, the kernel describes the topology of the core. To reconstruct the graph, we need to know the lengths in the core, and then how to glue pendant trees onto the core. But this final stage depends on k only through the total length of paths in the core. Given that information, it’s a combinatorial problem, and while I’m not claiming it’s easy, it’s essentially the same as for the case with k=1, and is worth treating separately.

It is worth clarifying a couple of things first though. Even the outline of methods above relies on the fact that the size of the core diverges as n grows. Again, the heuristic is that up to size-biasing errors, T(n,k) looks like a uniform tree with some uniformly-chosen extra edges. But distances in T(n,k) scale like $n^{1/2}$ (and thus in critical components of G(N,p) scale like $N^{1/3}$). And the core will be roughly the set of edges on paths between the uniformly-chosen pairs of vertices, and so will also have length $\Theta(n^{1/2})$.

Once you have conditioned on the kernel structure, and the (large) number of internal vertices on paths in the core (ie the length of the core), it is natural that the assignment of the degree-2 vertices to core paths / kernel edges is uniform. A consequence of this is that if you record $(Y_1,\ldots,Y_m)$ the lengths of paths in the core, where m=3(k-1), then

$\frac{(Y_1,\ldots,Y_m)}{\sum Y_i} \stackrel{d}\rightarrow \mathrm{Dirichlet}(1,1,\ldots,1).$

This is stated formally as Corollary 7 b) of [ABG09]. It’s worth noting that this confirms that the lengths of core paths are bounded in probability away from zero after the appropriate rescaling. In seeking a metric scaling limit, this is convenient as it means there’s so danger that two of the degree-3 vertices end up in ‘the same place’ in the scaling limit object.

To recap, the only missing ingredients now to give a complete limiting metric description of T(n,k) are 1) a distributional limit of the total core length; 2) some appropriate description of set of pendant trees conditional on the size of the pendant forest. [ABG09] show the first of these. As remarked before, all the content of the second of these is encoded in the unicyclic k=1 case, which I have written about before, albeit slightly sketchily, here. (Note that in that post we get around size-biasing by counting a slightly different object, namely unicyclic graphs with an identified cyclic edge.)

However, [ABG09] also propose an alternative construction, which you can think of as glueing CRTs directly onto the stubs of the kernel (with the same distribution as before). The proof that this construction works isn’t as painful as one might fear, and allows a lot of the other metric distributional results to be read off as corollaries.

References

[ABG09] – Addario-Berry, Broutin, Goldschmidt – Critical random graphs: limiting constructions and distributional properties

[CRT2] – Aldous – The continuum random tree: II

[Al97] – Aldous – Brownian excursions, critical random graphs and the multiplicative coalescent

[JKLP] – Janson, Knuth, Luczak, Pittel – The birth of the giant component

[LeGall] – Le Gall – Random trees and applications

[LPW] – Luczak, Pittel, Wierman – The structure of a random graph at the point of the phase transition

# Random 3-regular graphs

A graph is d-regular if every vertex has degree d. Probably the easiest examples of d-regular graphs are the complete graph on (d+1) vertices, and the infinite d-ary tree. A less trivial example is the Petersen graph, which is 3-regular. 3-regular graphs will be the main focus for some of this post, but initially we lose nothing by considering general d.

Throughout, a necessary condition for the existence of a d-regular graph with N vertices is that at least one of d and N is even, as the sum of the degrees of a graph must be even. We will always assume that this holds, so that when d=3, we are always taking N to be even.

A natural pair of questions for a probabilist is ‘can we sample a d-regular graph with N vertices uniformly at random?’ and ‘what does a typical large d-regular graph look like?’

In a rather old post, I addressed some aspects of the first question, but revisit it briefly here. A good idea, due to Bollobas [B80] is to assign to all the vertices d stubs (or half-edges), and choose a matching of the Nd stubs uniformly at random. This works as a method to generate a random graph with any fixed degree sequence.

If you want your graphs to be simple, this can go wrong, because there’s a chance you get loops (that is, an edge from a vertex v to itself) and multiple edges between the same pair of vertices. It would be nice the graph formed in this fashion was simple with high probability when $N\rightarrow\infty$. Unfortunately that’s not the case, however the probability that the graph is simple remains asymptotically bounded away from 0 and 1. Indeed, because the presence of a loop / multiple edge is asymptotically independent of the presence of a loop / multiple edge elsewhere, it’s unsurprising we have a Poisson limit for the number of such occurences. So from a sampling point of view, it’s reasonable to sample a graph in this way until you find a simple one. This takes O(1) steps, and it’s O(N) steps to check whether a given multigraph is simple.

It’s clear that conditional on the graph generated in this fashion being simple, its distribution is uniform on the set of simple graphs with the correct degree distribution. If you are happy for your graphs to have loops, then it’s a little bit more complicated, because if an edge has multiplicity k, these can appear in k! ways in the configuration construction.

Other asymptotic properties

Loops and multiple edges can be thought of as cycles of length 1 and 2 respectively if you want. We might ask about other small cycles. A calculation in expectation is relatively straightforward. Given three vertices, the probability they form a triangle (in at least one way) is $\Theta(N^{-3})$, and there are $\Theta(N^3)$ ways to choose three vertices. Thus the expected number of triangles is $\Theta(1)$. Finally, the edge structure induced on disjoint triples is asymptotically independent, and hence a Poisson limit. (See [J06] for details, including more detail on the general configuration construction.) The same result holds for the same reasons for cycles of any fixed finite length.

We might also ask about connectivity. At a heuristic level, there are two ways for the graph to be disconnected: it could have some small components; or it could have two components of size $\Theta(N)$. The smallest possible component is $K_4$, and an argument like for the cycles above shows that the number of copies of $K_4$ vanishes in expectation. Now, consider having two components of size roughly N/2. There are $\binom{N}{N/2} \sim 2^{2N}$ ways to make this choice. However, given such a choice, we can handle the probability that all the stubs from one class match within that class by going through the class one stub at a time:

$\frac{\frac{3N}{2}-1}{3N-1} \times \frac{\frac{3N}{2}-3}{3N-3} \times \cdots \times \frac{1}{\frac{3N}{2}+1}.$

We approximate this as

$\frac{\sqrt{(3N/2)!}}{\sqrt{ (3N)!}} \sim e^{3N/2} 2^{-3N/2} \left(3N\right)^{-3N/2},$

and this dominates the number of choices powerfully enough that we might believe it remains valid for a broader range of class sizes. In fact we have a much stronger statement, namely that G(N,3) is 3-connected with high probability. This means that the graph cannot be disconnected by removing two vertices, or equivalently that there are three vertex-disjoint paths between any pair of vertices in the graph, essentially one emerging from each stub. See this note by David Ellis for a quick proof. We might return to this later.

You might ask about planarity. It’s clear from degree consideration that there are no induced copies of $K_5$ in any random 3-regular graph, and since $K_{3,3}$ contains a cycle of length 4, and with high probability G(N,3) doesn’t, that takes care of that possibility too. However, there might be minors of this form. This seemed a good example of the Kuratowski criterion not actually being that useful, since I certainly don’t find the minors of the 3-regular graph an obvious structure to handle.

However, we can use Euler’s formula V – E + F = 2 for planar graphs. Here V = N, E = 3N/2. Faces are described by (a subset of the) cycles, and we there are asymptotically O(1) small cycles, so most faces include a large number of edges. But each edge corresponds to at most two faces. So we have $F \ll E$, and so with high probability Euler’s formula can’t hold in G(N,3) for large N.

We can also ask about the local limit of G(N,3). Since the vertices are exchangeable, we don’t need to worry about whether we choose the root uniformly at random (often referred to as the Benjamini-Schramm sense) or by some other method.

The root has up to three neighbours, and with high probability it has exactly three neighbours. These neighbours have at most two other neighbours themselves. However, we’ve already seen that there are asymptotically O(1) cycles, and so with high probability there are no small cycles near a fixed root vertex. So the six neighbours-of-neighbours are with high probability different to the root and the root’s neighbours and to each other. We can make this argument at arbitrary finite radius from the root, to conclude that the local limit of G(N,3) is the infinite 3-ary tree.

Spectral expansion

[Caveat – this is something I read about and wanted to mention, but I really don’t know much at all about any of this theory, and it’s definitely not certain that what follows wouldn’t be better replaced by a set of links.]

This straightforward local limit offers good heuristics on some of the more global properties. Almost by definition, the d-ary tree expands as rapidly as is possible away from the root among infinite d-regular graphs. There are a number of ways to measure the expansion of a graph, and some methods transfer better to the infinite setting than others. The adjacency matrix of an infinite graph can be defined similarly to that of a finite graph, and it remains possible to talk about eigenfunctions and spectrum. As for the finite setting, d is an eigenvalue because the tree is d-regular, and -d is an eigenvalue because it is also bipartite.

The next largest eigenvalue $\lambda_2$ governs the spectral gap $d-\lambda_2$ which is a measure of the expansion of a graph. A graph is a good (spectral) expander if all the non-trivial eigenvalues are close to zero. A priori, all we know is that $|\lambda_2|\le d$. For the infinite d-ary tree, we have $\lambda_2 = 2\sqrt{d-1}$. This blog post by Luca Trevisan gives a very readable proof.

A key result is that finite graphs can have $\lambda_2 \le 2\sqrt{d-1}$, but not asymptotically. That is, taking N to be the number of vertices:

$\lambda_2 \ge 2\sqrt{d-1} - o_N(1).$

This is the content of the Alon-Boppana theorem [Al86]. In fact the error can be quantified as $O(\frac{1}{\log N})$ – the diamater of the graph is relevant here. A finite d-regular graph for which $\lambda_2\le 2\sqrt{d-1}$ is called a Ramanujan graph. The existence of Ramanujan graphs has been much studied, and various constructions often rely on number theoretic properties of N, and lie at the interface of disparate branches of mathematics where my understanding is zero rather than epsilon.

Now return to our view of the d-ary tree as the local limit of a d-regular graph on N vertices for large N. We might expect from everything above that the uniform d-regular graph is a good expander. Bollobas shows that in the sense of edge-expansion, asymptotically almost all d-regular graphs have edge-expansion bounded away from zero. (See Section 2 of [Ell], including history of the d=3 case.) Friedman [Fri08] proves the conjecture of Alon that for every $\epsilon>0$, a.a.s. $\lambda_2$ for G(N,d) is at most $2\sqrt{d-1}+\epsilon$. In this sense, G(N,d) is asymptotically ‘almost Ramanujan’. (See also [Bor17] for another proof and an introduction including history, context and references.)

Some other links: The Wikipedia page on expanders, which includes a discussion of the different descriptions of expansion, and the Cheeger inequalities and other relations between them; slides for a talk by Spielman on spectra and Ramanujan graphs; a survey by Murty on Ramanujan graphs;.

What next?

This post took a slightly different direction from what I had intended, and rather than make a halting U-turn back to my planned finale, I’ll postpone this. However, a short overture is that I’m interested in the structure of critical components of random graphs during the critical window. This is the window during which the largest components first have cycles with probability $\Theta(1)$. Indeed, the critical components have size $\Theta(N^{2/3})$ and $\Theta(1)$ surplus edges. Conditional on their size, and number of surplus edges, the choice of the graph structure on the component is uniform among such (connected) graphs.

Addario-Berry, Broutin and Goldschmidt [ABG09] study scaling limits of such components. Central to this analysis is the 2-core of such components, which can be described in terms of 3-regular (multi)graphs. Various processes we are now interested in running on the critical components of critical RGs can then be studied in terms of related processes on random 3-regular graphs.

References

[ABG09] – Addario-Berry, Broutin, Goldschmidt – Critical random graphs: limiting constructions and distributional properties

[Al86] – Alon – Eigenvalues and expanders

[B80] – Bollobas – A probabilistic proof of an asymptotic formula for the number of labelled regular graphs

[B88] – Bollobas – The isoperimetric number of random regular graphs

[Bor17] – Bordenave – A new proof of Friedman’s second eigenvalue theorem and its extension to random lifts. Arxiv.

[Ell] – Ellis – The expansion of random regular graphs

[Fri08] – Friedman – A proof of Alon’s second eigenvalue conjecture and related problems

[J06] – Janson – The probability that a random multigraph is simple by

# Generating uniform trees

A long time ago, I wrote quite a few a things about uniform trees. That is, a uniform choice from the $n^{n-2}$ unrooted trees with vertex set [n]. This enumeration, normally called Cayley’s formula, has several elegant arguments, including the classical Prufer bijection. But making a uniform choice from a large set is awkward, and so we seek more probabilistic methods to sample such a tree, which might also give insight into the structure of a ‘typical’ uniform tree.

In another historic post, I talked about the Aldous-Broder algorithm. Here’s a quick summary. We run a random walk on the complete graph $K_n$ started from a uniformly-chosen vertex. Every time we arrive at a vertex we haven’t visited before, we record the edge just traversed. Eventually we have visited all n vertices, so have recorded n-1 edges. It’s easy enough to convince yourself that these n-1 edges form a tree (how could there be a cycle?) and a bit more complicated to decide that the distribution of this tree is uniform.

It’s worth noting that this algorithm works to construct a uniform spanning tree on any connected base graph.

This post is about a few alternative constructions and interpretations of the uniform random tree. The first construction uses a Galton-Watson process. We take a Galton-Watson process where the offspring distribution is Poisson(1), and condition that the total population size is n. The resulting random tree has a root but no labels, however if we assign labels in [n] uniformly at random, the resulting rooted tree has the uniform distribution among rooted trees on [n].

Proof

This is all about moving from ordered trees to non-ordered trees. That is, when setting up a Galton-Watson tree, we distinguish between the following two trees, drawn extremely roughly in Paint:

That is, it matters which of the first-generation vertices have three children. Anyway, for such a (rooted) ordered tree T with n vertices, the probability that the Galton-Watson process ends up equal to T is

$\mathbb{P}(GW = T) = \prod_{v\in T} \frac{e^{-1}}{C(v)!} = e^{-n} \prod_{v\in T}\frac{1}{C(v)!},$

where $C(v)$ is the number of children of a vertex $v\in T$. Then, since $\mathbb{P}( |GW|=n )$ is a function of n, we find

$\mathbb{P}(GW=T \,\big|\, |GW|=n) = f(n)\prod_{v\in T} \frac{1}{C(v)!},$

where f(n) is a function of n alone (ie depends on T only through its size n).

But given an unordered rooted tree t, labelled by [n], there are $\prod_{v \in t} C(v)!$ ordered trees associated to t in the natural way. Furthermore, if we take the Poisson Galton-Watson tree conditioned to have total population size n, and label uniformly at random with [n], we obtain any one of these ordered trees with probability $\frac{f(n)}{n!} \prod_{v\in t} \frac{1}{C(v)!}$. So the probability that we have t after we forget about the ordering is $\frac{f(n)}{n!}$, which is a function of n alone, and so the distribution is uniform among the set of rooted unordered trees labelled by [n], exactly as required.

Heuristic for Poisson offspring distribution

In this proof, the fact that $\mathbb{P}(C(v)=k)\propto \frac{1}{k!}$ exactly balances the number of orderings of the k children explains why Poisson(1) works out. Indeed, you can see in the proof that Poisson(c) works equally well, though when $c\ne 1$, the event we are conditioning on (namely that the total population size is n) has probability decaying exponentially in n, whereas for c=1, the branching process is critical, and the probability decays polynomially.

We can provide independent motivation though, from the Aldous-Broder construction. Both the conditioned Galton-Watson construction and the A-B algorithm supply the tree with a root, so we’ll keep that, and look at the distribution of the degree of the root as constructed by A-B. Let $\rho=v_1,v_2,v_3,\ldots$ be the vertices [n], ordered by their discovery during the construction. Then $\rho$ is definitely connected by an edge to $v_2$, but thereafter it follows by an elementary check that the probability $\rho$ is connected to $v_m$ is $\frac{1}{n-1}$, independently across all m. In other words, the distribution of the degree of $\rho$ in the tree as constructed by A-B is

$1+ \mathrm{Bin}\left(n-2,\frac{1}{n-1}\right) \approx 1+\mathrm{Poisson}(1).$

Now, in the Galton-Watson process, conditioning the tree to have fixed, large size changes the offspring distribution of the root. Conveniently though, in a limiting sense it’s the same change as conditioning the tree to have size at least n. Since these events are monotone in n, it’s possible to take a limit of the conditioning events, and interpret the result as the Galton-Watson tree conditioned to survive. It’s a beautiful result that this interpretation can be formalised as a local limit. The limiting spine decomposition consists of an infinite spine, where the offspring distribution is a size-biased version of the original offspring distribution (and so in particular, always has at least one child) and where non-spine vertices have the original distribution.

In particular, the number of the offspring of the root is size-biased, and it is well-known and not hard to check that size-biasing Poisson(c) gives 1+Poisson(c) ! So in fact we have, in an appropriate limiting sense in both objects, a match between the degree distribution of the root in the uniform tree, and in the conditioned Galton-Watson tree.

This isn’t supposed to justify why a conditioned Galton-Watson tree is relevant a priori (especially the unconditional independence of degrees), but it does explain why Poisson offspring distributions are relevant.

Construction via G(N,p) and the random cluster model

The main reason uniform trees were important to my thesis was their appearance in the Erdos-Renyi random graph G(N,p). The probability that vertices {1, …, n} form a tree component in G(N,p) with some particular structure is

$p^{n-1} (1-p)^{\binom{n}{2}-(n-1)} \times (1-p)^{n(N-m)}.$

Here, the first two terms give the probability that the graph structure on {1, …, n} is correct, and the the final term gives the probability of the (independent) event that these vertices are not connected to anything else in the graph. In particular, this has no dependence on the tree structure chosen on [n] (for example, whether it should be a path or a star – both examples of trees). So the conditional distribution is uniform among all trees.

If we work in some limiting regime, where $pn\rightarrow 0$ (for example if n is fixed and $p=\frac{1}{N}\rightarrow 0$), then we can get away asymptotically with less strong conditioning. Suppose we condition instead just that [n] form a component. Now, there are more ways to form a connected graph with one cycle on [n] than there are trees on [n], but the former all require an extra edge, and so the probability that a given one such tree-with-extra-edge appears as the restriction to [n] in G(N,p) is asymptotically negligible compared to the probability that the restriction to [n] of G(N,p) is a tree. Naturally, the local limit of components in G(N,c/N) is a Poisson(c) Galton-Watson branching process, and so this is all consistent with the original construction.

One slightly unsatisfying aspect to this construction is that we have to embed the tree of size [n] within a much larger graph on [N] to see uniform trees. We can’t choose a scaling p=p(n) such that G(n,p) itself concentrates on trees. To guarantee connectivity with high probability, we need to take $p> \frac{\log n}{n}$, but by this threshold, the graph has (many) cycles with high probability.

At this PIMS summer school in Vancouver, one of the courses is focusing on lattice spin models, including the random cluster model, which we now briefly define. We start with some underlying graph G. From a physical motivation, we might take G to be $\mathbb{Z}^d$ or some finite subset of it, or a d-ary tree, or the complete graph $K_N$. As in classical bond percolation (note G(N,p) is bond percolation on $K_N$), a random subset of the edges of G are included, or declared open. The probability of a given configuration w, with e open edges is proportional to

$p^e (1-p)^{|E(G)| - e} q^{k(w)},$ (*)

where the edge-weight $p\in(0,1)$ as usual, and cluster weight $q\in (0,\infty)$, and $k(w)$ counts the number of connected components in configuration w. When q=1, we recover classical bond percolation (including G(N,p) ), while for q>1, this cluster-reweighting favours having more components, and q<1 favours fewer components. Note that in the case $q\ne 1$, the normalising constant (or partition function) of (*) is generally intractable to calculate explicitly.

As in the Erdos-Renyi graph, consider fixing the underlying graph G, and taking $p\rightarrow 0$, but also taking $\frac{q}{p}\rightarrow 0$. So the resulting graph asymptotically ‘wants to have as few edges as possible, but really wants to have as few components as possible’. In particular, 1) all spanning trees of G are equally likely; 2) any configuration with more than one component has asymptotically negligible probability relative to any tree; 3) any graph with a cycle has #components + #edges greater than that of a tree, and so is asymptotically negligible probability relative to any tree.

In other words, the limit of the distribution is the uniform spanning tree of G, and so this (like Aldous-Broder) is a substantial generalisation, which constructs the uniform random tree in the special case where $G=K_n$.

# Balkan MO 2017 – Qs 1, 3 and 4

The UK is normally invited to participate as a guest team at the Balkan Mathematical Olympiad, an annual competition between eleven countries from South-Eastern Europe. I got to take part in Rhodes almost exactly ten years ago, and this year the competition was held in Ohrid, in Macedonia. There’s one paper, comprising four questions, normally one from each of the agreed olympiad topic areas, with 4.5 hours for students to address them. The contest was sat this morning, and I’m going to say quite a bit about the geometric Q2, and a little bit about Qs 1 and 3 also. In all cases, this discussion will include most of a solution, with some commentary, so don’t read these if you are planning to try the problems yourself.

I’m not saying anything about Q4, because I haven’t solved it. (Edit: I have solved it now, so will postpone Q2 until later today.)

Question One

Find all ordered pairs of positive integers (x,y) such that

$x^3+y^3=x^2+42xy+y^2.$

The first thought is that if either of x or y is ‘large’, then the LHS is bigger than the RHS, and so equality can’t hold. That is, there are only finitely many solutions. The smallest possible value of y is, naturally, 1, and substituting y=1 is convenient as then $y^2=y^3$, and it’s straightforward to derive $x=7$ as a solution.

Regarding the non-existence of large solutions, you can make this precise by factorising the LHS as

$(x+y)(x^2-xy+y^2) = x^2+42xy+y^2.$

There are 44 terms of degree two on the RHS, and one term of degree in the second bracket on the LHS. With a bit of AM-GM, you can see then that if $x+y>44$, you get a contradiction, as the LHS will be greater than the RHS. But that’s still a lot of possibilities to check.

It struck me that I could find ways to reduce the burden by reducing modulo various primes. 2, 3 and 7 all divide 42, and furthermore cubes are nice modulo 7 and squares are nice modulo 3, so maybe that would bring the number of possibilities down. But my instinct was that this wasn’t the right way to use the fact that we were solving over positive integers.

The second bracket in the factorisation looks enough like the RHS, that it’s worth exploring. If we move $x^2-xy+y^2$ from the right to the left, we get

$(x+y-1)(x^2-xy+y^2) = 43xy.$ (1.1)

Now it suddenly does look useful that we are solving over positive integers, because 43 is a prime, so has to appear as a factor somewhere on the LHS. But it’s generally quite restrictive that $x^2-xy+y^2 | 43xy$. This definitely looks like something that won’t hold often. If x and y are coprime, then certainly $x^2-xy+y^2$ and $y$ are coprime also. But actually if x and y have a non-trivial common factor d, we can divide both sides by $d^2$, and it still holds. Let’s write

$x=dm,\quad y=dn,\quad\text{where }d=\mathrm{gcd}(x,y).$

Then $m^2 -mn+n^2$ really does divide 43, since it is coprime to both m and n. This is now very restrictive indeed, since it requires that $m^2-mn+n^2$ be equal to 1 or 43. A square-sandwiching argument gives $m^2-mn+n^2=1$ iff $m=n=1$. 43 requires a little bit more work, with (at least as I did it) a few cases to check by hand, but again only has one solution, namely $m=7, n=1$ and vice versa.

We now need to add the common divisor d back into the mix. In the first case, (1.1) reduces to $(2d-1)=43$, which gives $(x,y)=(22,22)$. In the second case, after cancelling a couple of factors, (1.1) reduces to $(8d-1)=7$, from which $(x,y)=(7,1),(1,7)$ emerges, and these must be all the solutions.

The moral here seemed to be that divisibility was a stronger tool than case-reduction. But that was just this question. There are other examples where case-reduction is probably more useful than chasing divisibility.

Question Three

Find all functions $f:\mathbb{N}\rightarrow\mathbb{N}$ such that

$n+f(m) \,\big|\, f(n)+nf(m)$

for all $m,n\in\mathbb{N}$.

What would be useful here? There are two variables, and a function. It would be useful if we could reduce the number of variables, or the number of occurences of f. We can reduce the number of variables by taking m=n, to get

$n+f(n) \,\big|\, f(n) [1+n].$ (3.1)

From this, we might observe that $f(n)\equiv 1$ is a solution. Of course we could analyse this much more, but this doesn’t look like a 10/10 insight, so I tried other things first.

In general, the statement that $a\,|\,b$ also tells us that $a\,|\, b-ka$. That is, we can subtract arbitrary multiples of the divisor, and the result is still true. A recurring trope is that the original b is elegant, but an adjusted b-ka is useful. I don’t think we can do the latter, but by subtracting $n^2 +nf(m)$ from the problem statement, we get

$n+f(m) \,\big|\, n^2-f(n).$ (3.2)

There’s now no m on the RHS, but this relation has to hold for all m. One option is that $f(n)=n^2$ everywhere, then what we’ve deduced always holds since the RHS is zero. But if there’s a value of n for which $f(n)\ne n^2$, then (3.2) is a very useful statement. From now on, we assume this. Because then as we fix n and vary m, we need $n+f(m)$ to remain a divisor of the RHS, which is fixed, and so has finitely many divisors. So $f(m)$ takes only finitely many values, and in particular is bounded.

This ties to the observation that $f\equiv 1$ is a solution, which we made around (3.1), so let’s revisit that: (Note, there might be more elegant ways to finish from here, but this is what I did. Also note, n is no longer fixed as in previous paragraph.)

$n+f(n) \,\big|\, f(n) [1+n].$ (3.1)

Just to avoid confusion between the function itself, and one of the finite collection of values it might take, let’s say b is a value taken by f. So there are values of n for which

$n+b \,\big|\, b(1+n).$

By thinking about linear equations, you might be able to convince yourself that there are only finitely many solutions (in n) to this relation. There are certainly only finitely many solutions where LHS=RHS (well, at most one solution), and only finitely many where 2xLHS=RHS etc etc. But why do something complicated, when we can actually repeat the trick from the beginning, and subtract $b(n+b)$, to obtain

$n+b \,\big|\, b^2-b.$

For similar reasons to before, this is a great deduction, because it means if $b\ne 1$, then the RHS is positive, which means only finitely many n can satisfy this relation. Remember we’re trying to show that no n can satisfy this relation if $b\ne 1$, so this is definitely massive progress!

If any of what’s already happened looked like magic, I hope we can buy into the idea that subtracting multiples of the divisor from the RHS is the only tool we used, and that making the RHS fixed gives a lot of information about the LHS as the free variable varies. The final step is not magic either. We know that f is eventually 1. If you prefer “for large enough n, $f(n)=1$,” since all other values appear only finitely often. I could write this with quantifiers, but I don’t want to, because that makes it seem more complicated than it is. We genuinely don’t care when the last non-1 value appears.

Anyway, since we’ve deduced this, we absolutely have to substitute this into something we already have. Why not the original problem statement? Fix m, then for all large enough n

$n+f(m) \,\big|\, 1+nf(m).$ (3.3)

To emphasise, (3.3) has to hold for all large enough n. Is it possible that f(m)=2? Again, it’s easy to convince yourself not. But, yet again, why not use the approach we’ve used so profitably before to clear the RHS? In fact, we already did this, and called it (3.2), and we can make that work [3.4], but in this setting, because f(m) is fixed and we’re working with variable large n, it’s better to eliminate n, to get

$n+f(m)\,\big|\, f(m)^2-1,$

again for all large enough n. By the same size argument as before, this is totally impossible unless f(m)=1. Which means that in fact $f(m)=1$ for all m. Remember ages ago we assumed that f(n) was not $n^2$ everywhere, so this gives our two solutions: $f(n)=1,\, f(n)=n^2$.

Moral: choosing carefully which expression to work with can make life much more interesting later. Eliminating as many variables or difficult things from one side is a good choice. Playing with small values can help you understand the problem, but here you need to think about soft properties of the expression, in particular what happens when you take one variable large while holding another fixed.

[3.4] – if you do use the original approach, you get $n^2-1$ on the RHS. There’s then the temptation to kill the divisibility by taking n to be the integer in the middle of a large twin prime pair. Unfortunately, the existence of such an n is still just a conjecture

Question Four

(Statement copied from Art of Problem Solving. I’m unsure whether this is the exact wording given to the students in the contest.)

On a circular table sit n>2 students. First, each student has just one candy. At each step, each student chooses one of the following actions:

(A) Gives a candy to the student sitting on his left or to the student sitting on his right.

(B) Separates all its candies in two, possibly empty, sets and gives one set to the student sitting on his left and the other to the student sitting on his right.

At each step, students perform the actions they have chosen at the same time. A distribution of candy is called legitimate if it can occur after a finite number of steps.
Find the number of legitimate distributions.

My moral for this question is this: I’m glad I thought about this on the bus first. What I found hardest here was getting the right answer. My initial thoughts:

• Do I know how to calculate the total number of possibilities, irrespective of the algorithm? Fortunately yes I do. Marbles-in-urns = barriers between marbles on a line (maybe add one extra marble per urn first). [4.1]
• What happens if you just use technique a)? Well first you can get into trouble because what happens if you have zero sweets? But fine, let’s temporarily say you can have a negative number of sweets. If n is even, then there’s a clear parity situation developing, as if you colour the children red and blue alternately, at every stage you have n/2 sweets moving from red children to blue and vice versa, so actually the total number of sweets among the red children is constant through the process.
• What happens if you just use technique b)? This felt much more promising.
• Can you get all the sweets to one child? I considered looking at the child directly opposite (or almost-directly opposite) and ‘sweeping’ all the sweets away from them. It felt like this would work, except if for some parity reason we couldn’t prevent the final child having one (or more, but probably exactly one) sweets at the crucial moment when all the other sweets got passed to him.

Then I got home, and with some paper, I felt I could do all possibilities with n=5, and all but a few when n=6. My conjecture was that all are possible with n odd, and all are possible with n even, except those when none of the red kids or none of the kids get a sweet. I tried n=8, and there were a few more that I couldn’t construct, but this felt like my failure to be a computer rather than a big problem. Again there’s a trade-off between confirming your answer, and trying to prove it.

Claim: If n is even, you can’t achieve the configurations where either the red children or the blue children have no sweets.

Proof: Suppose you can. That means there’s a first time that all the sweets were on one colour. Call this time T. Without loss of generality, all the sweets are on red at T. Where could the sweets have been at time T-1? I claim they must all have been on blue, which contradicts minimality. Why? Because if at least one red child had at least one sweet, they must have passed at least one sweet to a blue neighbour.

Now it remains to give a construction for all other cases. In the end, my proof has two stages:

Step One: Given a configuration, in two steps, you can move a candy two places to the right, leaving everything else unchanged.

This is enough to settle the n odd case. For the even case, we need an extra step, which really corresponds to an initial phase of the construction.

Step Two: We can make some version of the ‘sweeping’ move precise, to end up in some configuration where the red number of children have any number of sweets except 0 or n.

Step one is not so hard. Realising that step one would be a useful tool to have was probably the one moment where I shifted from feeling like I hadn’t got into the problem to feeling that I’d mostly finished it. As ever in constructions, working out how to do a small local adjustment, which you plan to do lots of times to get a global effect, is great. (Think of how you solve a Rubik’s cube for example.)

Step two is notationally fiddly, and I would think very carefully before writing it up. In the end I didn’t use the sweeping move. Instead, with the observation that you can take an adjacent pair and continually swap their sweets it’s possible to set up an induction.

Actual morals: Observing the possibility to make a small change in a couple of moves (Step one above) was crucial. My original moral does still hold slightly. Writing lots of things down didn’t make life easier, and in the end the ideas on the bus were pretty much everything I needed.

[4.1] – one session to a group of 15 year olds is enough to teach you that the canon is always ‘marbles in urns’ never ‘balls’ nor ‘bags’, let alone both.