EGMO 2018

Last week the UK held its annual training and selection camp in Cambridge. This week, four of the students have been attending the European Girls’ Mathematical Olympiad. 2018 is the seventh edition of this prestigious competition, and is being held in Florence.

You can find very many details about the competition, and observe the UK’s excellent performance (with particular congratulations to Emily, who obtained a perfect score) at the competition website. A short article about the team in the TES can be found here.

In this post, I’m going to muse on some of the problems. You can find the two papers here and here.

Problem Two

Part a) would have been more immediate if the set A had been defined as

A:= \left\{\frac{k+1}{k} \,:\, k=1,2,3,\ldots\right\},

as this is instantly suggestive of a telescoping product such as

7 = \frac{7}{6}\cdot \frac{6}{5}\cdot\ldots \cdot \frac{2}{1}.

I found part b) to be one of the most difficult sections of the paper. It builds on the idea that given an expression for x as a product of elements of A, and an expression for y as a product of elements of A, we can combine these (ie take the product of the products!) to get an expression for xy as a product of elements of A. This yields f(xy)\le f(x)+f(y), and so the task is to show that sometimes this isn’t the most efficient way to proceed.

I tried a couple of things, like trying to bound f(p) below when p is a prime. This wasn’t ludicrous, as one would certainly need to use a term \frac{kp}{kp-1} somewhere in the product so that it is divisible by p. However, this didn’t go anywhere, and nor did investigating f(n!). Perhaps others had more success down these avenues.

But as a general rule, if an abstractly-defined function is typically hard to calculate, then classes where you can calculate it are likely to be extra valuable. Here, powers of two make life particularly easy. We have 2\in A, and so 2^n=2\times 2\times\ldots\times 2 is a valid product. And we can’t possibly achieve 2^n as a product of fewer terms than this, because 2 is the largest element of A. So f(2^n)=n. Note that this is already way better than the bound we would have achieved from our argument in part a), which yields f(m)\le m-1.
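If you want to experiment numerically, here’s a small brute-force search. This is my own throwaway code, not part of any solution: f_search is a made-up name, and the caps max_k and max_steps mean the output is only guaranteed exact when the optimal product stays within them (which it does for the tiny targets below).

```python
from fractions import Fraction
from collections import deque

def f_search(target, max_k=16, max_steps=6):
    # shortest product of factors (k+1)/k reaching `target`, by BFS from 1
    target = Fraction(target)
    seen = {Fraction(1)}
    queue = deque([(Fraction(1), 0)])
    while queue:
        value, steps = queue.popleft()
        if value == target:
            return steps
        if steps < max_steps:
            for k in range(1, max_k + 1):
                nxt = value * Fraction(k + 1, k)
                # every factor exceeds 1, so overshooting the target is fatal
                if nxt <= target and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return None

for n in range(1, 5):
    print(n, f_search(2**n), f_search(2**n + 1))   # expect n and n+1
```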

My next observation was that a similar argument and a natural construction gives f(2^n+1)=n+1. But this can be extended so that when 2^n+1\le m\le 2^{n+1}, we have f(m)\ge n+1 and in fact there is equality exactly for

m= 2^n+1, 2^n+2, 2^n+4,\ldots, 2^n+2^{n-1},2^{n+1}. (*)

In particular, note that all of these are even except 2^n+1. It may well be the case that we don’t need this extension, but before you’ve solved the problem you don’t know how much you’ll have to extend each idea!

I had a feeling that this meant f(2^n+1) was a strong candidate to satisfy the requirements, and that almost any factorisation would work. I suspect many students at this point did some refinement of choice of n, but I want to stay abstract, and use the extended observation (*). Since 2^n+1 is certainly not always prime, let’s focus on the infinitely many values n where it has a factorisation as 2^n+1 = ab, and consider whether a or b can achieve equality at (*). We’d better introduce the notation

2^\alpha<a<2^{\alpha+1},\quad 2^\beta<b<2^{\beta+1}.

Since a,b are integers, in fact a\ge 2^\alpha+1,\, b\ge 2^\beta+1. So ab\ge (2^\alpha+1)(2^\beta+1)=2^{\alpha+\beta}+2^\alpha+2^\beta+1>2^{\alpha+\beta}+1, and so \alpha+\beta<n. But similarly, ab<2^{\alpha+1}2^{\beta+1}, so n<\alpha+\beta+2. We obtain

\alpha+\beta+1=n,

which is kind of what I’d been hoping for when I started this analysis. Now, we have

f(a)\ge \alpha+1,\quad f(b)\ge \beta+1,

\Rightarrow\quad f(a)+f(b)\ge \alpha+\beta+2 = n+1, (**)

with equality precisely if a,b both satisfy the equality conditions at (*). But a,b are odd, and so we have equality at (**) precisely if a=2^\alpha+1,b=2^\beta+1. So we have a resolution to the problem whenever 2^n+1 can be non-trivially factorised in any way other than 2^n+1=(2^\alpha+1)(2^\beta+1), so we have a rich (and certainly infinite) class of suitable (x,y).
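To see that the class really is non-empty without factorising by hand, a few lines of search will do. This sketch is mine, and it leans on the claim f(2^n+1)=n+1 together with the equality conditions at (*); special is just my name for ‘of the form 2^j+1’.

```python
def special(d):
    # is d of the form 2^j + 1, i.e. is d - 1 a power of two?
    return d >= 2 and (d - 1) & (d - 2) == 0

for n in range(2, 33):
    m = 2**n + 1
    for a in range(3, int(m**0.5) + 1):
        if m % a == 0 and not (special(a) and special(m // a)):
            print(f"n={n}: {m} = {a} x {m // a}")   # e.g. n=5: 33 = 3 x 11
            break
```

The first hit is n=5, where 33=3\times 11 and 11\ne 2^3+1, so f(3)+f(11)\ge 7>6=f(33).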

Problem Three

An obvious remark. The jury will never choose contestant C_i if she has fewer than i contestants in front of her, unless they are forced to. They are only forced to if every contestant has this property. So we can ignore the second dashed bullet point, as it just tells us when the process ends. And with a little further thought, the process ends exactly when the contestants are in the correct order.

I suspect part a) of this may end up featuring on future examples of our interactive write-up clinic, where students are challenged to produce technically-correct arguments for obvious but awkward mini-problems. The location of contestant C_n is probably a good starting point.

For part b), you have to find an optimal construction, and prove that it’s optimal. At national and junior olympiad level, students often forget that they have to supply both of these components. At international level, the challenge is to work out which one gives the entry into the problem. I would say that about two-thirds of the time, either the optimal construction is very obvious, or is best attacked after you’ve had some insight into the bound. For this question (and again it’s just my opinion), I felt it was all about the construction. I made absolutely no progress by aiming for bounds. Whereas the construction offers plenty of insight into how to prove the bounds, and so once I had it, I found it quick.

The usual rules apply. An optimal construction is going to have to be neat to describe. It’s very unlikely to have millions of cases. Intuitively, it seems reasonable that starting the contestants in reverse order gives the jury the greatest possible ‘elbow room’ to squeeze moves into the procedure. Perhaps you tried to prove this directly, by coupling a procedure starting from arbitrary order with a corresponding procedure starting from reverse order? Well, I found that very hard, and perhaps you did too.

However, that doesn’t mean it’s the wrong construction! The key question is, what to do about contestant C_n? Well, essentially nothing. This contestant can never be moved. So when do we start allowing other contestants to pass her? It seemed natural to let the other contestants C_1,\ldots,C_{n-1} do as much as possible among themselves first. That is

\mathbf{C_n},C_{n-1},\ldots,C_2,C_1 \quad \Rightarrow\Rightarrow\Rightarrow \quad \mathbf{C_n}, C_1,C_2,\ldots,C_{n-1},

where \Rightarrow\Rightarrow\Rightarrow denotes lots of moves. At this point, what to do next stood out for me, namely that one could use \mathbf{C_n} at the start to put all the others back into reverse order, while moving \mathbf{C_n} to the back. That is

\mathbf{C_n},C_1,C_2,\ldots,C_{n-1}\quad\Rightarrow \quad C_1,\mathbf{C_n},C_2,\ldots,C_{n-1} \quad\Rightarrow\quad C_2,C_1,\mathbf{C_n},C_3,\ldots,C_{n-1}

\Rightarrow\Rightarrow \quad C_{n-1},C_{n-2},\ldots,C_2,C_1,\mathbf{C_n}.

You might have tried other things first, but once you notice this, you just know it has to be right. It’s just too elegant a construction, and it looks like the sort of thing one can prove will be optimal, because the overall process

\mathbf{C_n},C_{n-1},\ldots,C_1\quad \Rightarrow\Rightarrow\Rightarrow \quad \mathbf{C_n},C_1,C_2,\ldots,C_{n-1}

\Rightarrow\Rightarrow\quad C_{n-1},\ldots,C_2,C_1,\mathbf{C_n}\quad\Rightarrow\Rightarrow\Rightarrow\quad C_1,C_2,\ldots,C_{n-1},\mathbf{C_n},

incorporates the corresponding process for n-1 (twice, in fact), and so induction is very accessible for calculating the total number of moves. We conjecture that this is indeed the optimal process, and under this assumption, with f(n) the number of moves, we would have f(n) = f(n-1) + (n-1) + f(n-1), from the three stages of the process, from which, after looking at small values,

f(n)=2^n - (n+1).
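As a quick sanity check on the recurrence and the closed form, here’s a throwaway snippet (mine, and certainly not something you’d include in a write-up):

```python
def moves(n):
    # f(n) = f(n-1) + (n-1) + f(n-1): the process for n-1, then the n-1 moves
    # past C_n, then the process for n-1 again
    return 0 if n == 1 else 2 * moves(n - 1) + (n - 1)

# agrees with the closed form 2^n - (n+1)
assert all(moves(n) == 2**n - (n + 1) for n in range(1, 16))
```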

I started by saying that the construction was the hard part of this problem. Well, that doesn’t mean the bound is easy. But at least with a construction in hand, you can make observations that might inform a bounding argument:

  • observation 1: contestant C_n never jumps;
  • observation 2: in the optimal construction, by induction C_{n-1} doesn’t jump in the outer phases, so in fact jumps only once, ie during the middle phase;
  • observation 3: contestant C_{n-2} doesn’t jump very often, and in fact can only jump after at least one of C_{n-1} and C_n has ended up in front of her. Since we’ve established that C_{n-1},C_n don’t jump very often themselves, this gives a bound on the number of times C_{n-2} can jump.

There is still work to do here, and errors with \pm 1 could easily creep in. But I still hold fast to my original claim that the construction was the hard part here. Or at least, the rough form of the construction. I guess it’s possible that one would have started making observations like the three above without a construction in mind, but I think it’s unlikely. Anyway, I leave for you the final details of the bounding argument, which involves formally transcribing observation 3, proving it, then generalising it to jumps of C_{n-3} and so on.

Problem Four

One of the exercises I have been setting to UK students recently is to produce short solution digests, which manifest any key ideas of the solution abstractly and briefly enough to resonate in the future. I’m a little tired of domino tiling problems, so I’ll do one of these here. This will be slightly longer than if I were not writing for a (small) audience.

Double-counting the total value by rows/columns and by dominos shows there are \frac{2kn}{3} dominos in a balanced configuration. When n=3, we can achieve k=1, and by tiling copies of this down the main diagonal, can extend to 3\,|\,n. For 3\not|\,n, we must have 3\,|\,k ie k\ge 3, but in fact k=3 is achievable, by tiling the main diagonal with copies of small boards for which k=3 can be constructed with a bit of trial-and-error.

The double-counting idea at the start is the nice part of the problem. The construction is a bit annoying, but saving ourselves work by building up large examples from copies of small examples is a useful motif to have in mind.

Problem Six

This question has lots of clues in the statement. It would, for example, be pretty surprising if the answer were ‘no’ to part b) given the setup in part a).

My confession is that I wasted lots of time on part a) not using the option m=0, which was foolish given that it’s clued from part b) that one needs to use the option m=0. My thought had been to consider some integer y, and ask which integers x were banned (if we were aiming for contradiction in part a)). For part a), it gets harder if t is smaller, so I found it helpful to think of t as \epsilon\ll 1. Anyway, if you struggled on part a), maybe worth reviewing whether you were definitely trying to solve part a), and not accidentally using the setup that really addressed part b)!

Some people have shown me solutions to part a) that carry an air of magic, by placing all the key steps (such as (*) to follow) in the language of the original setup. Let’s try to be cleaner. The key is to consider m=0. Since m=0 is included, we know that whenever x<y, we must have

\epsilon y \le x \le (1-\epsilon)y. (*)

Maybe you just have a gut feeling that this can’t be possible if you have enough xs and ys? But either way, choosing to focus on (*) is the key step, because once you know you have to prove the result based on this, it’s not too hard. I prefer addition to multiplication, so one might as well take logs (does it really look like we’re going to use the integer property heavily now?) to obtain

\alpha\le |z_i - z_j|\le \beta,

for all z_i,z_j in some large finite collection, where 0<\alpha<\beta. You should now have a strong gut feeling that this is impossible. You have an arbitrarily large collection of real numbers which have to be close to each other pairwise, but never too close pairwise. How to finish the argument is a matter of taste.

For part b), assuming we’re aiming for the answer ‘yes’, we probably want to construct it one step at a time, and you want to think of t\approx \frac12 to make life as hard as possible.

Now, suppose you have x_1,x_2,\ldots,x_n so far. What next? Well if we could have

x_{n+1} \equiv \frac{x_i}{2}\,\mod x_i,

for all i=1,\ldots,n, that would be perfect. We can often solve sets of coupled residue equations like this using the Chinese Remainder Theorem. (Recall of course that the solutions themselves are rarely important – the fact that they exist is enough!) A couple of things go wrong with trying this crudely:

  • If x_i is odd, then \frac{x_i}{2} is not an integer…
  • If we correct this by asking for x_{n+1}\equiv \lfloor\frac{x_i}{2}\rfloor\,\mod x_i, then there’s a chance we might still be within the t-window around a multiple of x_i.
  • Unless we are going to make complicated demands on the residues, to use CRT it would be easier if all the x_is were pairwise coprime.

One option is to give up. But actually all these objections can be handled with fairly small alterations. Can you see how the second objection can be overcome by an appropriate choice of x_1? Remember that t is fixed right at the start, and cannot be equal to 1/2. Is the third objection actually an objection any more? If it is, can we fix it?
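In case it helps to see the plan concretely, here’s a rough transcription into code. This is my sketch rather than anything official: it leans on sympy’s crt, keeps every element odd so that the floors behave, and chooses x_1 large enough that \lfloor y/2\rfloor escapes the t-window. Notice that the third objection resolves itself: \gcd(\lfloor y/2\rfloor, y)=1 when y is odd, so the elements stay pairwise coprime automatically.

```python
from sympy.ntheory.modular import crt

def build_set(t, size):
    # greedy sketch of the part b) construction, assuming 0 < t < 1/2
    x1 = 3
    while (x1 - 1) / 2 <= t * x1:      # need y(1 - 2t) > 1, so flooring can't hurt
        x1 += 2
    S = [x1]
    while len(S) < size:
        # new element = (y-1)/2 mod y for every old y, and odd; the moduli are
        # pairwise coprime, so the CRT system is always solvable
        r, M = crt([2] + S, [1] + [(y - 1) // 2 for y in S])
        S.append(int(r + M))           # take a large representative, so new >> old
    return S

def respects_window(S, t):
    # check |x - m y| > t y for all distinct x, y in S and all m >= 1
    for x in S:
        for y in S:
            if x != y:
                d = y - x if x < y else min(x % y, y - x % y)
                if d <= t * y:
                    return False
    return True

S = build_set(0.4, 6)
print(S, respects_window(S, 0.4))      # True, even with t fairly close to 1/2
```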

Anyway, I guess P6 was my favourite non-geometry question on the paper, though that’s hardly relevant. P5 was pretty neat too, but who knows whether a follow-up geometry post will materialise soon.


BMO2 2018

The second round of the British Mathematical Olympiad was taken yesterday by the 100 or so top scoring eligible participants from the first round, as well as some open entries. Qualifying for BMO2 is worth celebrating in its own right. The goal of the setters is to find the sweet spot of difficult but stimulating for the eligible participants, which ultimately means it’s likely to be the most challenging exam many of the candidates sit while in high school, at least in mathematics.

I know that lots of students view BMO2 as something actively worth preparing for. As with everything, this is a good attitude in moderation. Part of the goal in writing about the questions at such length (and in particular not just presenting direct solutions) is because I think at this level it’s particularly easy to devote more time than needed to preparation, and use it poorly.

All these questions could be solved by able children. In fact, each could be solved by able children in less than an hour. You definitely count as an able child if you qualified or if your teacher allowed you to make an open entry! Others count too, naturally. But most candidates won’t in fact solve all the questions, and many won’t solve any. And I think candidates often come up with the wrong reasons why they didn’t solve problems. “I didn’t know the right theorems” is very very rarely the reason. Olympiad problems have standard themes and recurring tropes, but the task is not to look at the problem and decide that it is an example of Olympiad technique #371. The task is actually to have as many ideas as possible, and eliminate the ones that don’t work as quickly as possible.

The best way to realise that an idea works is to solve the problem immediately. For the majority of occasions when we’re not lucky enough for that to happen, the second-best way to realise that an idea works is to see that it makes the problem look a bit more like something familiar. Conversely, the best way to realise that an idea doesn’t work is to observe that if it worked it would solve a stronger but false problem too. (Eg Fermat’s Last Theorem *does* have solutions over the reals…) The second-best way to realise that an idea doesn’t work is to have the confidence that you’ve tried it enough and you’ve only made the problem harder, or less familiar.

Both of these second-best ideas do require a bit of experience, but I will try to explain why none of the ideas I needed for various solutions this year required any knowledge beyond the school syllabus, some similarities to recent BMOs, and a small bit of creativity.

As usual, the caveat that these are not really solutions, and certainly not official solutions, but they are close enough to spoil the problems for anyone who hasn’t tried them by themselves already. Of course, the copyright for the problems is held by BMOS, and reproduced here with permission.

Question One

I wrote this question. Perhaps as a focal point of the renaissance of my interest in geometry, or at least my interest in teaching geometry, I have quite a lot to say about the problem, its solutions, its origin story, the use of directed angles, the non-use of coordinate methods and so on. In an ideal world I would write a book about this sort of thing, but for now, a long and separate post is the answer.

This will be available once I’ve successfully de-flooded my apartment.

Question Two

I also wrote this problem, though I feel it’s only fair to show the version I submitted to the BMO committee. All the credit for the magical statement that appears above lies with them. There is a less magical origin story as well, but hopefully with some interesting combinatorial probability, which is postponed until the end of this post.

One quick observation is that in my version Joe / Hatter gets to keep going forever. As we shall see, all the business happens in the first N steps, but a priori one doesn’t know that, and in my version it forces you to strategise slightly differently for Neel / Alice. In the competition version, we know Alice is done as soon as she visits a place for a second time, but not in the original. So in the original we only have to consider ‘avoid one place’ rather than the multiple possibilities now of ‘avoid one place’ or ‘visit a place again’.

But I think the best idea is to get Alice to avoid one particular place c\not\equiv 0 whenever possible. At all times she has two possible options for where to go next, let’s say b_k+a_k, b_k-a_k in the language of the original statement. We lose nothing by assuming -N/2 < a_k\le N/2, and certainly it would be ridiculous for Joe / Hatter ever to choose a_k=0. The only time Alice’s strategy doesn’t work is when both of these are congruent to c, which implies N\,|\, 2a_k, and thus we must have N= 2a_k. In other words, Alice’s strategy will always work if N is odd.

I think it’s really worth noticing that the previous argument is weak. We certainly did not show that N must be odd for Alice to win. We showed that Alice can avoid a congruence class modulo an odd integer. We didn’t really need that odd integer to be N for this to work. In particular, if N has an odd factor p (say a prime), then the same argument works to show that we can avoid visiting any site with label congruent to 1 modulo p.

It’s actually very slightly more complicated. In the original argument, we didn’t need to use any property of b_k. But obviously here, if b_k\equiv 1 modulo p and p\,|\,a_k, then certainly b_{k+1}\equiv 1 modulo p. So we have to prove instead that Alice can ensure she never ‘visits 1 modulo p for the first time’. Which is fine, by the same argument.

So, we’ve shown that Neel / Alice wins if N is odd, or has an odd factor. The only values that remain are powers of 2. I should confess that I was genuinely a little surprised that Joe / Hatter wins in the power of 2 case. You can find a construction fairly easily for N=2 and N=4, but I suspected that might be a facet of small numbers. Why? Because it still felt we could avoid a particular site. In order for Alice’s strategy to fail, we have to end up exactly opposite the particular site at exactly the time when the next a_k=N/2, and so maybe we could try to avoid that second site as well, and so on backwards?

But that turned out to be a good example of something that got very complicated quite quickly with little insight. And, as discussed at the beginning, that’s often a sign in a competition problem that your idea isn’t so good. (Obviously, when composing a problem, that’s no guarantee at all. Sometimes things are true but no good ideas work.) So we want other ideas. Note that for N=4, the sequence (2,1,2) works for Joe / Hatter, because that forces Alice / Neel to visit either (0,2,1,3) or (0,2,3,1). In particular, this strategy gave Alice no control on the first step nor the last step, and the consequence is that we force her to visit the evens first, then transfer to an odd, and then force her to visit the other odd.

We might play around with N=8, or we might proceed directly to a general extension. If we have a Joe / Hatter strategy for N, then by doubling all the a_ks, we have a strategy for 2N which visits all the even sites in the first N steps. But then we can move to an odd site eg by taking a_N=1. Just as in the N=4 case, it doesn’t matter which odd site we start from, since if we again double all the a_ks, we will visit all the other odd sites. This gives us an inductive construction of a strategy for powers of two. To check it’s understood, the sequence for N=8 is (4,2,4,1,4,2,4).
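For small powers of two, the induction is easy to check exhaustively by machine. The code below is mine (hatter_moves and traps_alice are made-up names), with Alice free to add or subtract each a_k:

```python
from itertools import product

def hatter_moves(N):
    # the inductive strategy for N a power of two: double the strategy for N/2,
    # insert a 1 to hop from the evens to an odd, then repeat the doubled strategy
    if N == 2:
        return (1,)
    half = tuple(2 * a for a in hatter_moves(N // 2))
    return half + (1,) + half

def traps_alice(N):
    # whatever signs Alice chooses, she must visit all N sites
    moves = hatter_moves(N)
    for signs in product((1, -1), repeat=len(moves)):
        pos, visited = 0, {0}
        for a, s in zip(moves, signs):
            pos = (pos + s * a) % N
            visited.add(pos)
        if len(visited) < N:
            return False
    return True

print(hatter_moves(8), traps_alice(8))   # (4, 2, 4, 1, 4, 2, 4) True
```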

Although we don’t use it, note that this strategy takes Alice on a tour of sites described by decreasing order of largest power of two dividing the label of the site.

Question Three

I have a theory that the average marks on Q1, Q2 and Q3 on this year’s paper will be in ascending order rather than, as one might expect, descending order. I think my theory will fail because it’s an unavoidable fact of life that in any exam, candidates normally start at the beginning, and don’t move to the middle until making earlier progress. But I think that’s the only reason my theory will fail.

Like kitchen cleanliness or children’s character flaws, it’s hard to compare one’s own problem proposals with others’ rationally. But I felt that, allowing for general levels of geometry non-preference, Q3 was more approachable than Q2, especially to any candidate who’d prepared by looking at some past papers.

I’m in no way a number theorist, but I know three or four common themes when one is asked to prove that a certain sequence contains no squares, or almost no squares. [3a]

  • Number theoretic properties of the sequence of squares. Squares cannot be 3 modulo 4 for example. They also cannot be 2 modulo 4, and thus they also cannot be 2^{k-1} modulo 2^k for any even k. This first observation was essentially the body of most solutions to Q4 of BMO1 2016, among many others.
  • Soft properties of the sequence of squares. The sequence of squares grows quadratically. Sometimes we can show a quadratic sequence will have no overlap with some other sequence for basic reasons. This is especially common if the second sequence is also quadratic or similar. For example, the expression n^2+3n-4 is typically not a square because

(n+1)^2 = n^2+2n+1 < n^2 + 3n - 4 < n^2+4n+4 = (n+2)^2,

  when n is large. In fact the right hand inequality is always true, and the left hand inequality is true for n\ge 6, which doesn’t leave too many cases to check (and n=5 does actually give a square). This type of argument has been quite common on BMO recently, directly on Q1 of BMO1 2011 and also Q3 of BMO1 2016. An example in a more abstract setting is Q3 of Balkan MO 2007, which I greatly enjoyed at the time…
  • Number theoretic properties of the definition of a square. A square is the product of an integer with itself, and so if we want the product of two or more integers to be a square, then this imposes conditions on the shared factors of the two integers. I’ll cite some examples shortly.
  • Huge theorems. Some old paper which I encountered as a child asked us to find all solutions to x^2-1=2^y. Or similar – I can’t find it now – but Q2 of BMO2 2006 is close enough to the sensible approach to the problem. I think it’s more helpful to think about this as proving that a particular sequence rarely includes powers of two than that a particular sequence rarely includes squares. But either way, one could in principle use the Catalan conjecture, which controls all non-trivial solutions to a^p - b^q=1. Fortunately, the Catalan conjecture was proved, by Mihailescu (readable blog about it), between the paper being set, and me attempting it a few years later. I’m being flippant. This is not a standard trope in solving these questions. For very obvious reasons. If it can be killed by direct reference to a known theorem, it won’t be set.

Anyway, those references (and more to follow) are to illuminate why I thought this question was not too hard. Indeed, I feel one can make substantial meta-progress in your head. The given information is interesting, but for the purpose of this question is just a black box. By subtracting the expression for m from the expression for 2m, we can derive an expression for the required sum. It’ll be a quartic in m, because the leading terms won’t cancel.

This leaves all three of the methods above very accessible. Unfortunately m=0 would be a square were it not excluded specifically, so a modular arithmetic approach is unlikely to work directly. Bounding between two quadratics is entirely plausible, as is factorising and comparing number theoretic properties of the factors. I thought the second one seemed more promising, but either way, having two potentially good ideas based only on recent BMO problems before even writing anything down is a good opening.

We do have to calculate the sum, and I make it \frac{1}{4}m^2(5m+3)(3m+1). Now I’m not so sure how to bound this between two quadratics, because the leading coefficient is 15/4, which is not the square of a rational. But the factor analysis approach is definitely on.
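It costs nothing to let a machine double-check both the algebra and the claim we’re about to prove; a throwaway sketch (S is just my name for the sum):

```python
from math import isqrt

def S(m):
    # closed form for (m+1)^3 + ... + (2m)^3, i.e. the sum to 2m minus the sum to m
    return m * m * (5 * m + 3) * (3 * m + 1) // 4

# the closed form agrees with the direct sum...
assert all(S(m) == sum(k**3 for k in range(m + 1, 2 * m + 1)) for m in range(1, 300))

# ...and S(m) is never a perfect square, at least for small m
assert all(isqrt(S(m)) ** 2 != S(m) for m in range(1, 10**5))
```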

Let’s review this generally. Throughout, suppose m,n are positive integers.

Claim 1: if mn is a square, then m and n are squares too.

Claim 2: if mn is a square, then m=n.

Both of these claims are false. However, a version of Claim 1 is true.

Claim 1′: if mn is a square, and m,n are coprime, then each is a square.

Even though this isn’t a named theorem, it is true, and well-known and can be used without proof. One way to prove it is to write m,n as products of primes, and show that since the primes are disjoint, the exponents must all be even. Most other methods will be equivalent to this, maybe with less notation.

What is good about Claim 1′ is that more complicated versions are true for essentially similar reasons. For example

Claim 3: if mn is 6k^2, and m,n are coprime, then either one is a square and the other is six times a square; or one is two times a square, and the other is three times a square.

Claim 4: if mn is a square, and the greatest common divisor (m,n) is either 5 or 1, then either each is a square, or each is five times a square.

I cited some examples of the other methods I proposed. Here are some examples of this sort of thing in recent BMOs:

  • Q4 of BMO2 2016. Even the statement is suggestive. There are more complicated routes, but showing that (2p-u-v)(2p+u+v) is a square is one way to proceed, and then Claim 4 directly applies after checking a gcd.
  • Q2 of BMO1 2014 is similar, but it is much more explicit that this is the correct approach. Expose p^2 then use a (correct) version of Claim 2.
  • Q1 of BMO2 2009. Show that a and b must each be a square times 41 for rationality reasons.
  • Q6 of BMO1 2006. After sensible focused substitutions, obtain 3n^2=q(q-1). Rather than try to ‘solve’ this, extract the key properties along the lines of Claim 3, eliminate one of the cases by modular arithmetic, and return to the required statement.
  • Q3 of BMO2 2010 requires the student to reproduce the essentials of the arguments above in the case of a particular degree six polynomial with a tractable factorisation, along with some mild square-sandwiching or bounding arguments as discussed earlier.

In conclusion, I’m trying to say that if I claim I am confident I can find all integers m such that \frac14 m^2(5m+3)(3m+1) is a square, this is not based on complicated adult experience, but rather on recent problems at a similar sensible level. And I still don’t think it counts as Olympiad technique #371 – thinking about divisibility of factors is a good thing to do when talking about integers, and so it’s just a natural entry point into problems about squares. Plenty of problems might have this sort of thing as a starting point or an ending point.

For this problem we need a different ending point. To be brief, the factors (5m+3) and (3m+1) cannot both be squares, because 5m+3 is never a square (squares are 0, 1 or 4 modulo 5). So since the gcd of these factors is 1, 2 or 4, the only other option is that they are both squares times 2. And since -1 is not a square modulo 3, 1 is not a (square times 2) modulo 3, and we are done. Note that this was a literal example of the first technique for proving something is not a square, proposed all the way back at the start of this section.

Footnotes

[3a] – some common themes for proving that sequences do include squares might be comparison with Pell’s Equations, or comparison with the explicit construction of solutions to Pythagoras’s equation.

Question Four

An example of an absorbing function is f(x)=\lfloor x\rfloor. One challenge is thinking of many other examples. This one is fine, but it’s true under replacing 2018 by 1 in the statement, and so it doesn’t really capture the richness of the situation.

Notation: the pre-image of a function is the language used to describe the inverse of a function which doesn’t have a uniquely-defined inverse. That is, if f is not injective, multiple arguments may share the same output. We write f^{-1}(y)=\{x: f(x)=y\}. In particular, this is a set of values, not necessarily a single value. We also use \mathbb{Z} to denote the integers. We can apply pre-images to sets as well. So for example f^{-1}(\mathbb{Z})=\{x : f(x)\in \mathbb{Z}\}.

This question is tricky, and I will be surprised to see many full solutions from the eligible candidates. It rewards the sort of organisation and clear-thinking that is easier said than done in a time-pressured contest environment. There are also many many possible things to consider, and so it is particularly challenging in the short timeframe of BMO2 as opposed to, for example, appearing as the middle question on a 4.5 hour international-level paper.

At a meta-level we are being asked to confirm or deny the existence of absorbing functions where f^{-1}(\mathbb{Z}) is small in some sense, firstly when actually having finite size, secondly when, although infinite, being a small sort of infinite, namely spread out in a sparse, well-ordered way (you might say countable if familiar with that language). The general idea is presumably that it’s hard to be absorbing if the pre-image of the integers is small, and so it’s reasonable to assume that it’s too hard if this is finite; but perhaps not quite too hard if it’s merely countable. So (no, yes) is a sensible guess at the answer to the question, though (no, no) might also fit, maybe with a harder argument for the second no.

Ok, instead of trying a) or b), just play with the configuration. Let A=f^{-1}(\mathbb{Z}). We will use this frequently. In the picture below, f maps the real line on top to the real line below. If two reals get mapped to the same image, then whether or not the image is an integer, the whole (closed) interval bounded by the two reals also gets mapped to the same image. This is because f is weakly increasing.

This means that A consists of various intervals (which include single points). But in both a) and b) we know that A is ‘small’, and so it cannot contain any intervals of positive length. So in fact A is a set of separated real values. In the case of a) it’s a finite set.

Do we want to try and iterate this, and look at f^{-1}(A)? Well maybe, but we don’t know much about pre-images of A, only about pre-images of \mathbb{Z}.

But note that the pre-image of the pre-image of the … of the pre-image [2017 times] of A must be the whole real line, so at some point, some value has a pre-image that is an interval. So if we’re guessing that the answer to b) is yes, then we need to give a construction.

\mathbb{R} \stackrel{f}\longrightarrow ?? \stackrel{f}\longrightarrow\quad\ldots\quad \stackrel{f}\longrightarrow ??\stackrel{f}\longrightarrow A \stackrel{f}\longrightarrow f(A)\subset \mathbb{Z}.

If you play around for a bit, it seems very unlikely to be absorbing if the integers don’t get mapped to the integers. You can try to prove this, but at the moment we’re just aiming for a construction, so let’s assume f(\mathbb{Z})\subset \mathbb{Z}. It would be convenient if f(n)=n for all n\in \mathbb{Z}, but we already know that this won’t work because then the pre-image of the pre-image of the… of \mathbb{Z} is always \mathbb{Z}, but we need it to be \mathbb{R}.

The ideal situation would be if A= \mathbb{Z}\cup \{\ldots, a'_{-1},a'_0,a'_1,\ldots\}, where the pre-image of \{\ldots, a'_{-1},a'_0,a'_1,\ldots\} is pretty much everything.

Informally, we are specifically banned from mapping intervals directly onto an integer. So have an intermediate set, and try to map almost everything (except the integers and the set itself) onto that set, and then map that set into the integers.

At this point, you really just have to have the right idea and finish it. Many things will work, but this seems the easiest to me. Let the set A consist of the integers and the (integers plus 1/2). And for x\in A, f(x)=2x. This is what f looks like so far.

Here the black crosses are integers, and the purple crosses are (integers plus 1/2). But now we need to make as many reals as possible in the top row map to a purple cross (which is allowed, because purple crosses aren’t integers), but we need also to preserve the weakly increasing property. Fortunately, we can exactly do that. Each cross of either colour in the top row maps to a black cross in the middle row (ie an integer), so we can map the open interval between crosses in the top row to a purple cross in the middle row. As shown in red:

Note that this is consistent. The fact that I haven’t drawn in the red cones into the bottom row is only because I didn’t use the bottom row to motivate doing this. I’ve shown a consistent definition of f that maps all the reals onto the integers in two steps. If it’s an integer to begin with, that was great; if it was an (integer plus 1/2) to begin with then it becomes an integer in one step and stays an integer; and otherwise it first maps to an (integer plus 1/2), and then to an integer in the second step.

To check you’ve understood, try to write down a standalone definition of this function.
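(One possible answer follows, so look away if you want to find it yourself first. The transcription into code is mine, following the conventions of the figures above: half-integers are the crosses, and everything else maps to a purple cross.)

```python
from fractions import Fraction
from math import floor

def f(x):
    x = Fraction(x)
    if (2 * x).denominator == 1:            # x is a cross: an integer or integer plus 1/2
        return 2 * x
    return floor(2 * x) + Fraction(1, 2)    # otherwise: the purple cross n + 1/2, n = floor(2x)

# f is weakly increasing, and every point reaches the integers within two steps
xs = [Fraction(k, 12) for k in range(-60, 61)]
assert all(f(a) <= f(b) for a, b in zip(xs, xs[1:]))
assert all(f(f(x)).denominator == 1 for x in xs)
```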

I’ve therefore solved part b) with the alternative condition \ldots a_{-1}<a_0<a_1<a_2<\ldots which isn’t exactly as required. It requires one small and simple idea to convert to a solution to the actual statement. See if you can find it yourself!

I think part a) is harder, not because the solution will look more complicated, but because there are so many potential partial results you could try to prove, because there are so many sets you could consider. To name a few: the image of f, the image of f intersected with \mathbb{Z}, the image of \mathbb{Z}, the 2018-composition image f^{2018}(\mathbb{R}), the 2018-composition image f^{2018}(\mathbb{Z}) and so on and so forth. You might have good insight into the wrong things.

For me, the crucial observation (which you can see from the figure in the b) construction) is that when composing an increasing function with itself, the ‘trajectories’ are either increasing or decreasing. That is, if x\le f(x) (respectively, x\ge f(x)), then x\le f(x)\le f^2(x)\le f^3(x)\le\ldots (respectively x\ge f(x)\ge f^2(x)\ge \ldots). Again, you can think of this as Olympiad technique #371 if you insist, but I don’t think that’s helpful. There are lots of things one could try to say here, and this turns out to be natural, true and useful, but you can’t know it’s useful until you play with it.

Anyway, we’re playing with part a), and we know that f^k(x) is an integer for all large enough k, and that f^{k+1}(x) is also an integer, so f^k(x) is one of a finite set of integers because of the condition on A. But we’ve seen the sequence x,f(x),f^2(x),\ldots is weakly increasing or weakly decreasing, and so if we also know it’s eventually bounded (because eventually it’s in this finite set) then it must eventually be constant. And this constant is one of the integers, say n. But unless we started from n, this means that f(n)=n, but also f(x)=n for some other real value x. And so exactly as at the very very beginning, that’s bad, because then the whole interval [x,n] gets mapped to n, which is a contradiction.

Question Two – Origin story

The origin story for Q2 started in a talk I heard by Renan Gross at Weizmann, who referenced some of the history of Scenery Reconstruction. Roughly speaking, we colour the integers (say with two colours), and then let loose a random walker, who tells us the sequence of colours she observes during her walk, but no other information about the walk itself.

How much information can we recover about the colouring? Obviously, the best we can hope for is to recover the colouring, up to translations and reflection, since for every possible random walk trajectory, the exact reflection is equally probable, and we are given no information about the starting point.

Since lots of the transitions between recoverable and unrecoverable depend on the periodicity of the colouring, a reasonable toy model is to do it on a cycle. Note that the Strong Law of Large Numbers tells us that we almost surely recover the number of black sites and white sites from the infinite trajectory of the random walk. Of course it’s possible that there are only two black vertices, and they are adjacent, and the walker oscillates between them, thus seeing BBBBBB… But this is extremely unlikely. You could think of this in Bayesian terms as strongly increasing the prior on the whole cycle being black, but I think initially it’s best to do this as an infinite-time SLLN problem, not as finite-time WLLN/CLT reweightings of anything.

But what more? It’s clear that the lengths of all black substrings should follow some mixed geometric-ish distribution, and this distribution will almost surely emerge as the empirical distribution in an SLLN sense. But it’s tricky to justify why such a mixed geometric-ish distribution should be uniquely determined by the lengths of black arcs in the cycle. But it does definitely feel like we should have enough information to reconstruct the colouring up to reflection/rotation with probability one. For example, analogously to the number of black vertices and the number of white vertices, we should be able to recover the number of adjacent black vertices, the number of adjacent white vertices, and the number of black-white adjacent vertices, and so on.

Anyway, this can be done, and it follows as a consequence of various authors’ work answering some more general conjectures of Benjamini and, separately, of den Hollander and Keane. Douglas Howard [DH] shows a handful of generalisations of this, as do Benjamini and Kesten [BK]. Most of this work is focused on sceneries on \mathbb{Z}, but periodic sceneries are often used as a basis, and of course, the only difference between periodic sceneries on \mathbb{Z} and sceneries on the N-cycle are whether you know the period in advance. [BK] show that ‘almost all’ sceneries are distinguishable in a particular sense, in response to which Lindenstrauss [L99] exhibits a large family of sceneries which are not distinguishable. A readable but technical review is [ML].

So Renan’s talk was about the similar problem (and generalisations) on the hypercube [GG]. Rather than paraphrase the main differences badly, you can read his own excellent blog post about the work.

On the train back to Haifa from Rehovot, I was thinking a bit about the cycle case, and what happens if you generalise the random walk to have varying jump lengths, or indeed introduce a demon walker, whose goal is to make it as hard as possible for the observer to deduce the colouring. One way this can certainly happen is if the walker can avoid visiting some particular site, as then how could one possibly deduce the colour of the never-visited site? And so we get to the statement posed.

References

[BK] – Benjamini, Kesten, 1996 – Distinguishing sceneries by observing the scenery along a random walk path

[dH] – den Hollander, 1988 – Mixing properties for random walk in random scenery

[DH] – Douglas Howard, 1996 – Detecting defects in periodic scenery by random walks on Z

[GG] – Grupel, Gross, 2017 – Indistinguishable sceneries on the Boolean hypercube

[L99] – Lindenstrauss, 1999 – Indistinguishable sceneries

[ML] – Matzinger, Lember, 2003 – Scenery reconstruction: an overview


BMO1 2017 – Questions 5 and 6

The first round of the British Mathematical Olympiad was sat yesterday. The questions can be found here and video solutions here. My comments on the first four questions are in the previous post.

Overall, I didn’t think any of the questions on this paper were unusually difficult by the standard of BMO1, but I found everything slightly more time-consuming than typical. I thought Question 5 was a great problem, and I tried lots of things unsuccessfully first, and so wanted to discuss it in slightly more technical language. For Question 6 I made a decisive mistake, which I’ll explain, and which cost a lot of time. But in general, my point is that the back end of the paper was a little fiddlier than normal, and required longer written solutions, and perhaps many students might have had less time than expected to attack them anyway after the details earlier in the paper.

Question Five

As I said before, I thought this question was quite challenging. Not because the solution is particularly exotic or complicated, but because there were so many possible things that might have worked. In my opinion it would not have been out of place at the start of an IMO paper, because it’s perfectly possible to have enough good ideas that eliminating the ones that don’t work takes an hour, or hours. Even though it slightly spoils the flow of the solution, I’m particularly trying to emphasise the tangents that didn’t work, mostly for reassurance to anyone who spent a long time struggling.

I was thinking about this question in terms of a 2Nx2N board, where N is even, and for the given question equal to 100. I spent a while thinking that the bound was 8N-4, corresponding to taking the middle two rows and the middle two columns, but not the 2×2 square which is their intersection. If you think of a comb as a ‘handle’ of 1xN cells, with an extra N/2 alternating cells (say, ‘teeth’) bolted on, then it’s clear this construction works because there’s never space to fit in a handle, let alone the teeth.

I couldn’t prove that this was optimal though. A standard way to prove a given bound K was optimal would be to produce a tiling on the board with K combs, where every cell is included in exactly one comb. But this is clearly not possible in this situation, since the number of cells in a comb (which is 150) does not divide the total number of cells on the board.

Indeed, there is a general observation: if you take a comb and a copy of the comb rotated by 180 degrees, the teeth of the second comb can mesh perfectly with the teeth of the first comb to generate a 3xN unit. I wasted a moderate amount of time pursuing this route.

[Note, it will be obvious in a minute why I’m writing ‘shaded’ instead of ‘coloured’.]

But in motivating the construction, I was merely trying to shade cells so that they intersected every possible 1xN handle, and maybe I could prove that it was optimal for this. In fact, I can’t prove it’s optimal because it isn’t optimal – indeed it’s clear that a handle through one of the middle rows intersects plenty of shaded cells, not just one. However, with this smaller problem in mind, it didn’t take long to come up with an alternative proposal, namely splitting the board into equal quarters, and shading the diagonals of each quarter, as shown.

It seems clear that you can’t fit in an unshaded 1xN handle, and there is a tiling of the board by 1xN handles in which each handle contains exactly one shaded cell, so this shading (with 4N shaded cells) is optimal for handles. But is it optimal for a comb itself?

Consider a shading which works, so that all combs include a shaded cell. It’s clear that a comb is contained within a 2xN block, and in such a 2xN block, there are four possible combs, as shown.

You need to cover all these combs with some shading somewhere. But if you put the shaded cell on a tooth of comb A, then you haven’t covered comb B. And if you put the shaded cell on the handle of comb A, then you haven’t covered one of comb C and comb D. You can phrase this via a colouring argument too. If you use four colours with period 2×2, as shown

then any comb involves exactly three colours, and so one of them misses out the colour of the shaded cell. (I hope it’s clear what I mean, even with the confusing distinction between ‘shaded’ and ‘coloured’ cells.)
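If you want to see the three-colour claim concretely, a short loop confirms it. The indexing conventions here are mine, with a comb taken as a full row of a 2xN block plus alternating cells of the other row:

```python
N = 10                                     # any even N

def colour(r, c):
    return 2 * (r % 2) + (c % 2)           # four colours with period 2x2

for handle_row in (0, 1):                  # the four combs A-D in a 2xN block
    for offset in (0, 1):                  # parity of the teeth
        cells = [(handle_row, c) for c in range(N)]
        cells += [(1 - handle_row, c) for c in range(offset, N, 2)]
        print(sorted({colour(r, c) for r, c in cells}))
        # each comb sees exactly three of the four colours
```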

Certainly we have shown that any 2xN block must include at least two shaded cells. And that’s pretty much it. We have a tiling with 2N copies of a 2xN block, and with at least two shaded cells in each, that adds to at least 4N shaded cells overall.

Looking back on the method, we can identify another way to waste time. Tiling a board, eg a chessboard with dominos is a classic motif, which often relies on clever colouring. So it’s perhaps lucky that I didn’t spot this colouring observation earlier. Because the argument described really does use the local properties of how the combs denoted A-D overlap. An attempt at a global argument might start as follows: we can identify 2N combs which don’t use colour 1, and tile this subset of the grid with them, so we need to shade at least 2N cells from colours {2,3,4}. Similarly for sets of colours {1,3,4}, {1,2,4}, and {1,2,3}. But if we reduce the problem to this, then using roughly 2N/3 of each colour fits this global requirement, leading to a bound of 8N/3, which isn’t strong enough. [1]

Question Six

A word of warning. Sometimes it’s useful to generalise in problems. In Q5, I was thinking in terms of N, and the only property of N I used was that it’s even. In Q4, we ignored 2017 and came back to it at the end, using only the fact that it’s odd. By contrast, in Q2, the values did turn out to be important for matching the proof bounds with a construction.

You have to guess whether 300 is important or not here. Let’s see.

I have a natural first question to ask myself about the setup, but some notation is useful. Let a_1,a_2,\ldots,a_{300} be the ordering of the cards. We require that \frac{a_1+\ldots+a_n}{n} is an integer for every 1\le n\le 300. Maybe the values of these integers will be important, so hold that thought, but for now, replace with the divisibility statement that n | a_1+\ldots+a_n.

I don’t think it’s worth playing with small examples until I have a better idea whether the answer is 5 or 295. So the natural first question is: “what does it mean to have (a_1,\ldots,a_{n-1}) such that you can’t pick a suitable a_n?”

It means that there is no integer k in \{1,\ldots,300\}\backslash\{a_1,\ldots,a_{n-1}\} such that n\,\big|\,(a_1+\ldots+a_{n-1})+k, which for now we write as

k\equiv -(a_1+\ldots+a_{n-1})\,\mod n.

Consider the congruence class of -(a_1+\ldots+a_{n-1}) modulo n. There are either \lfloor \frac{300}{n}\rfloor or \lceil \frac{300}{n}\rceil integers under consideration in this congruence class. If no such k exists, then all of the relevant integers in this congruence class must appear amongst \{a_1,\ldots,a_{n-1}\}. At this stage, we’re trying to get a feel for when this could happen, so lower bounds on n are most relevant. Therefore, if we get stuck when trying to find a_n, we have

\lfloor \frac{300}{n} \rfloor\text{ or }\lceil \frac{300}{n}\rceil \le n-1, (*)

which is summarised more succinctly as

\lfloor \frac{300}{n} \rfloor \le n-1. (**)

[Note, with this sort of bounding argument, I find it helpful to add intermediate steps like (*) in rough. The chance of getting the wrong direction, or the wrong choice of \pm 1 is quite high here. Of course, you don’t need to include the middle step in a final write-up.]

We can check that (**) is false when n\le 17 and true when n\ge 18. Indeed, both versions of (*) are true when n\ge 18.
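If you don’t trust your \pm 1s, the switch point is easy to confirm mechanically, for what it’s worth:

```python
# (**) first holds at n = 18
print([n for n in range(2, 25) if 300 // n <= n - 1])   # [18, 19, 20, 21, 22, 23, 24]
```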

So we know the minimum failure length is at least 17. But is there a failing sequence of length 17? At a meta-level, it feels like there should be. That was a very natural bounding argument for 17 (which recall corresponds to n=18), and it’s easy to believe that might be part of an official solution. If we achieve equality throughout the argument, that’s most of the way to a construction as well. It won’t be so easy to turn this argument into a construction for n\ge 19 because there won’t be equality anywhere.

We have to hope there is a construction for n=18. What follows is a description of a process to derive (or fail to derive) such a construction. In a solution, one would not need to give this backstory.

Anyway, in such a construction, let \alpha\in\{1,2,\ldots,18\} describe the congruence class modulo 18 which is exhausted by \{a_1,\ldots,a_{17}\}. I’m going to hope that \alpha=18 because then the calculations will be easier since everything’s a multiple of 18. We haven’t yet used the fact that, for a problem to arise, we need \alpha\equiv-(a_1+\ldots+a_{17}) modulo 18. We definitely have to use that. There are 16 multiples of 18 (ie relevant integers in the congruence class), so exactly one of the terms so far, say a_j, is not a multiple of 18. But then

0 \equiv 0+\ldots+0+a_j+0+\ldots+0 \,\mod 18,

which can’t happen. With a bit of experimentation, we find a similar problem making a construction using the other congruence classes with 16 elements, namely \alpha\in \{13,14,\ldots,18\}.

So we have to tackle a different class. If \alpha\le 12 then our sequence must be

\alpha,18+\alpha,2\times 18 +\alpha, \ldots, 16\times 18 + \alpha,

in some order. In fact, let’s add extra notation, so our sequence is

(a_1,\ldots,a_{17}) = (18\lambda_1+ \alpha,\ldots,18\lambda_{17}+\alpha),

where (\lambda_1,\ldots,\lambda_{17}) is a permutation of \{0,1,\ldots,16\}. And so we require

k \,\big|\, 18(\lambda_1+\ldots+\lambda_k) + k\alpha, (%)

for 1\le k\le 17. But clearly we can lop off that k\alpha, and could ignore the 18. Can we find a permutation \lambda such that

k \,\big|\, \lambda_1+\ldots+\lambda_k.

This was where I wasted a long time. I played around with lots of examples and kept getting stuck. Building it up one term at a time, I would typically get stuck around k=9,10. And I had some observations that in all the attempted constructions, the values of \frac{\lambda_1+\ldots+\lambda_k}{k} were around 8 and 9 too when I got stuck.

I became convinced this subproblem wasn’t possible, and decided that would be enough to show that n=18 wasn’t a possible failure length. I was trying to show the subproblem via a parity argument (how must the \lambda_is alternate odd/even to ensure all the even partial sums are even?), but this didn’t lead to an obstruction. Then I came up with a valid argument. We must have

\lambda_1+\ldots+\lambda_{17}=136= 16\times 8 + 8\quad\text{and}\quad 16\,\big|\,\lambda_1+\ldots+\lambda_{16},

which means \lambda_1+\ldots+\lambda_{16} must be 128 = 15×8 + 8, ie \lambda_{17}=8. But then we also have 15\,\big|\, \lambda_1+\ldots+\lambda_{15}, which forces \lambda_{16}=8 also. Which isn’t possible.

As if this hadn’t wasted enough time, I then tried to come up with a construction for n=19, for which there are lots more variables; this took a lot more time, and seemed to suffer from similar problems, just in a more complicated way. So I became convinced I must have made a mistake, because I was forced down routes that were way too complicated for a 3.5 hour exam. Then I found it…

What did I do wrong? I’ll just say directly. I threw away the 18 after (%). This made the statement stronger. (And in fact false.) Suppose instead I’d thrown away a factor of 9 (or no factors at all, but it’s the residual 2 that’s important). Then I would be trying to solve

k\,\big|\,2(\lambda_1+\ldots+\lambda_k).

And now if you experiment, you will notice that taking \lambda_1=0,\lambda_2=1,\lambda_3=2,\ldots seems to work fine. And of course, we can confirm this, using the triangle number formula for the second time in the paper!

This had wasted a lot of time, but once that thought is present, we’re done, because we can go straight back and exhibit the sequence

(a_1,\ldots,a_{17}) = (1, 18+1,2\times 18 +1,\ldots, 16\times 18 +1).

Then the sum so far is congruent to -1 modulo 18, but we have exhausted all the available integers which would allow the sum of the first 18 terms to be a multiple of 18. This confirms that the answer to the question as stated is 17.
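The whole construction is also easy to verify by machine (a throwaway check, mine, and not part of a write-up):

```python
a = [18 * i + 1 for i in range(17)]        # 1, 19, 37, ..., 289
used = set(a)

# every partial average is an integer, so the first 17 cards are legal...
assert all(sum(a[:n]) % n == 0 for n in range(1, 18))

# ...but no unused card from {1,...,300} gives an integer average at step 18
assert not any((sum(a) + k) % 18 == 0 for k in range(1, 301) if k not in used)
```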

At the start, I said that we should be cautious about generalising. In the end, this was wise advice. We definitely used the fact that 18 was even in the stage I over-reduced the first time. We also used the fact that there was at least one value of \alpha with an ‘extra’ member of the congruence class. So I’m pretty sure this proof wouldn’t have worked with 288 = 16×18 cards.

Footnotes

[1] – If shading were a weighted (or continuous or whatever you prefer) property, ie that each cell has a quantity of shading given by a non-negative real number, and we merely demand that the total shading per comb is at least one, then the bound 8N/3 is in fact correct for the total shading. We could look at a 2xN block, and give 1/3 shading to one cell of each colour in the block. Alternatively, we could be very straightforward and apply \frac{2}{3N} shading to every cell in the grid. The fact that shading has to be (in this language) zero or one, imposes meaningful extra constraints which involve the shape of the comb.

BMO1 2017 – Questions 1-4

The first round of the British Mathematical Olympiad was sat yesterday. The questions can be found here. I recorded some thoughts on the questions while I was in Cyprus, hence the nice Mediterranean sunset above. I hope this might be useful to current or future contestants, as a supplement to the concise official solutions available. It goes without saying that while these commentaries may be interesting at a general level, they will be much more educational to students who have at least digested and played around with the questions, so consider trying the paper first. Video solutions are available here. These have more in common with this blog post than the official solutions, though inevitably some of the methods are slightly different, and the written word has some merits and demerits over the spoken word for clarity and brevity.

The copyright for these questions lies with BMOS, and they are reproduced here with permission. Any errors or omissions are obviously my own.

I found the paper overall quite a bit harder than in recent years, or at least harder to finish quickly. I’ve therefore postponed discussion of the final two problems to a second post, to follow shortly.

Question One

A recurring theme of Q1 from BMO1 in recent years has been: “it’s possible to do this problem by a long, and extremely careful direct calculation, but additional insight into the setup makes life substantially easier.”

This is the best example yet. It really is possible to evaluate Helen’s sum and Phil’s sum, and compare them directly. But it’s easy to make a mistake in recording all the remainders when the divisor is small, and it’s easy to make a mistake in summation when the divisor is large, and so it really is better to have a think for alternative approaches. Making a mistake in a very calculation-heavy approach is generally penalised heavily. And this makes sense intellectually, since the only way for someone to fix an erroneous calculation is to repeat it themselves, whereas small conceptual or calculation errors in a less onerous solution are more easily isolated and fixed by a reader. Of course, it also makes sense to discourage such attempts, which aren’t really related to enriching mathematics, which is the whole point of the exercise!

Considering small divisors (or even smaller versions of 365 and 366) is sometimes helpful, but here I think a ‘typical’ divisor is more useful. But first, some notation will make any informal observation much easier to turn into a formal statement. Corresponding to Helen and Phil, let h(n) be the remainder when 365 is divided by n, and p(n) the remainder when 366 is divided by n. I would urge students to avoid the use of ‘mod’ in this question, partly because working modulo many different bases is annoying notationally, partly because the sum is not taken modulo anything, and partly because the temptation to use mod incorrectly as an operator is huge here [1].

Anyway, a typical value might be n=68, and we observe that 68 x 5 + 25 = 365, and so h(68)=25 and p(68)=26. Indeed, for most values of n, we will have p(n)=h(n)+1. This is useful because

p(1)+p(2)+\ldots+p(366) - \left(h(1)+h(2)+\ldots+h(365)\right)

= \left(p(1)-h(1)\right) + \ldots+\left(p(365)-h(365)\right) + p(366),

and now we know that most of the bracketed terms are equal to one. We just need to handle the rest. The only time it doesn’t hold that p(n)=h(n)+1 is when 366 is actually a multiple of n. In this case, p(n)=0 and h(n)=n-1. We know that 366 = 2 x 3 x 61, and so its divisors are 1, 2, 3, 6, 61, 122, 183.

Then, in the big expression above, seven of the 365 bracketed terms are not equal to 1. So 358 of them are equal to one. The remaining ones are equal to 0, -1, -2, -5, -60, -121, -182 respectively. There are shortcuts to calculate the sum of these, but it’s probably safer to do it by hand, obtaining -371. Overall, since p(366)=0, we have

p(1)+p(2)+\ldots+p(366) - \left(h(1)+h(2)+\ldots+h(365)\right)

= -371 + 358 + 0 = -13.

So, possibly counter-intuitively, Helen has the larger sum, with difference 13, and we didn’t have to do a giant calculation…
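
In fact the problem is small enough to verify by machine, which is a useful way to validate the arithmetic (though of course this is no substitute for the argument above). A quick sanity check:

```python
# Helen records the remainder when 365 is divided by each of 1, ..., 365;
# Phil does the same with 366. Compare the two sums directly.
helen = sum(365 % n for n in range(1, 366))
phil = sum(366 % n for n in range(1, 367))
print(helen - phil)  # 13: Helen's sum is larger by 13, as derived above
```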

Question Two

Suppose each person chooses which days to go swimming ‘at random’, without worrying about how to define this. Is this likely to generate a maximum or minimum value of n? I hope it’s intuitively clear that this probably won’t generate an extreme value. By picking at random we are throwing away lots of opportunity to force valuable overlaps or non-overlaps. In other words, we should start thinking about ways to set up the swimming itinerary with lots of symmetry and structure, and probably we’ll eventually get a maximum or a minimum. At a more general level, with a problem like this, one can start playing around with proof methods immediately, or one can start by constructing lots of symmetric and extreme-looking examples, and see what happens. I favour the latter approach, at least initially. You have to trust that at least one of the extreme examples will be guess-able.

The most obvious extreme example is that everyone swims on the first 75 days, and no-one swims on the final 25 days. This leads to n=75. But we’re clearly ‘wasting’ opportunities in both directions, because there are never exactly five people swimming. I tried a few more things, and found myself simultaneously attacking maximum and minimum, which is clearly bad, so focused on minimum. Just as a starting point, let’s aim for something small, say n=4. The obstacle is that if you demand at most four swimmers on 96 days, then even with six swimmers on the remaining four days, you don’t end up with enough swimming having taken place!

Maybe you move straight from this observation to a proof, or maybe you move straight to a construction. Either way, I think it’s worth saying that the proof and the construction come together. My construction is that everyone swims on the first 25 days, then on days 26-50 everyone except A and B swim, on days 51-75 everyone except C and D swim, and on days 76-100 everyone except E and F swim. This exactly adds up. And if you went for the proof first, you might have argued that the total number of swim days is 6×75 = 450, but is at most 6n + 4(100-n). This leads immediately to n\ge 25, and I just gave the construction. Note that if you came from this proof first, you can find the construction because your proof shows that to be exact you need 25 days with six swimmers, and 75 days with four swimmers, and it’s natural to try to make this split evenly. Anyway, this clears up the minimum.

[Less experienced contestants might wonder why I was worried about generating a construction despite having a proof. Remember we are trying to find the minimum. I could equally have a proof for n\ge 10 which would be totally valid. But this wouldn’t show that the minimum was n=10, because that isn’t in fact possible (as we’ve seen), hence it’s the construction that confirms that n=25 is the true minimum.]

It’s tempting to go back to the drawing board for the maximum, but it’s always worth checking whether you can directly adjust the proof you’ve already given. And here you can! We argued that

450\le 6n + 4(100-n)

to prove the minimum. But equally, we know that on the n days we have at least five swimmers, and on the remaining days, we have between zero and four swimmers, so

450 \ge 5n + 0\times (100-n), (*)

which gives n\le 90. If we have a construction that attains this bound then we are done. Why have I phrased (*) with the slightly childish multiple of zero? Because it’s a reminder that for a construction to attain this bound, we really do need the 90 days to have exactly five swimmers, and the remaining ten days to have no swimmers. So it’s clear what to do. Split the first 90 days into six groups of 15 days. One swimmer skips each group. No-one swims in the final ten days, perhaps because of a jellyfish infestation. So we’re done, and 25\le n\le 90.
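
Both constructions are easy to check mechanically too. A minimal sketch, encoding the two rosters described above (swimmers labelled 0-5 rather than A-F):

```python
everyone = set(range(6))

def busy_days(schedule):
    # sanity-check the roster (everyone swims exactly 75 days),
    # then count days with at least five swimmers
    for s in everyone:
        assert sum(s in day for day in schedule) == 75
    return sum(len(day) >= 5 for day in schedule)

# Minimum: all six swim on days 0-24; each of the pairs {0,1}, {2,3}, {4,5}
# then skips one of the three remaining 25-day blocks.
minimum = [set(everyone) for _ in range(25)]
for block in range(3):
    minimum += [everyone - {2 * block, 2 * block + 1} for _ in range(25)]

# Maximum: split days 0-89 into six blocks of 15; swimmer i skips block i;
# nobody swims in the final ten days.
maximum = [everyone - {d // 15} for d in range(90)] + [set() for _ in range(10)]

print(busy_days(minimum), busy_days(maximum))  # 25 90
```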

At a general level, it’s worth noting that in the story presented, we found an example for the minimum which we turned into a proof, and then a proof for the maximum, which we then analysed to produce a construction.

Note that similar bounding arguments would apply if we fiddled with the numbers 5, 75 and 100. But constructions matching the bounds might not then be possible because the splits wouldn’t work so nicely. This would make everything more complicated, but probably not more interesting.

Question Three

It’s understandable that lots of students attempting this paper might feel ill-at-ease with conventional Euclidean geometry problems. A good first rule of thumb here, as in many settings, is “don’t panic!”, and a more specific second rule of thumb is “even if you think you can calculate, try to find geometric insight first.”

Here, it really does look like you can calculate. A configuration based on a given isosceles triangle and a length condition and a perpendicular line is open to several coordinate approaches, and certainly some sensible trigonometry. It’s also very open to organised labelling of the diagram. You have three equal lengths, and a right-angle, as shown.

The key step is this. Drop the perpendicular from A to BC, and call its foot D. That alone really is the key step, as it reduces both parts of the question to an easy comparison. It’s clear that the line AD splits the triangle into two congruent parts, which thus have equal areas and perimeters. So it is enough to show that triangle BMN has the same area as triangle ABD, and that their outer-perimeters (ie the parts of their perimeters which lie along the perimeter of ABC) are the same.

But they’re congruent, so both of these statements are true, and the problem is solved.
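
Since the diagram isn’t reproduced here, I should flag that I’m reconstructing the configuration from memory: AB = CA with BC the longest side, N on BC with BN = AB, and M the foot of the perpendicular from N to AB. With that caveat, a quick numerical check of the key claim (treat the setup, rather than just the arithmetic, as the assumption being tested):

```python
import numpy as np

a, c = 3.0, 2.0                            # BC = a, AB = CA = c, with a > c
B, C = np.array([0.0, 0.0]), np.array([a, 0.0])
A = np.array([a / 2, np.sqrt(c**2 - a**2 / 4)])
N = np.array([c, 0.0])                     # N on BC with BN = AB
u = (A - B) / np.linalg.norm(A - B)        # unit vector along BA
M = B + np.dot(N - B, u) * u               # foot of perpendicular from N to AB

def area(P, Q, R):
    return 0.5 * abs((Q[0] - P[0]) * (R[1] - P[1]) - (Q[1] - P[1]) * (R[0] - P[0]))

print(area(B, M, N), 0.5 * area(A, B, C))  # equal: the line MN bisects the area
outer_B = np.linalg.norm(M - B) + np.linalg.norm(N - B)
outer_A = np.linalg.norm(A - M) + np.linalg.norm(C - A) + np.linalg.norm(C - N)
print(outer_B, outer_A)                    # equal: MN bisects the perimeter too
```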

My solution could be as short as two or three lines, so for the purposes of this post all that remains is to justify why you might think of the key step. Here are a few possible entry routes:

  • You might notice that line AD induces the required property for triangle ABD.
  • You might try to find a triangle congruent to BMN, and come up with D that way.
  • There’s already a perpendicular in the question so experimenting with another one is natural, especially since the perpendicular from A has straightforward properties.
  • AMN is a right angle, and so constructing D gives a cyclic quadrilateral. We didn’t use that directly in the proof above, but constructing cyclic quadrilaterals is usually a good idea.
  • If you were trying a calculation approach, you probably introduced the length AD, or at least the midpoint D as an intermediate step.

On the video, Mary Teresa proposes a number of elegant synthetic solutions with a few more steps. You might find it a useful exercise to try to come up with some motivating reasons like the bullet points above to justify her suggestion to reflect A in M as a first step.

Question Four

BMO1 2017Q4

I wasn’t paying enough attention initially, and I calculated a_2=0\text{ or }2. This made life much much more complicated. As with IMO 2017 Q1, if trying to deduce general behaviour from small examples, it’s essential to calculate the small examples correctly!

Once you engage your brain properly, you find that a_2=0 \text{ or }3, and of course a_2=0 is not allowed, since it must be positive. So a_2=3, and a similar calculation suggests a_3=1\text{ or }6. It’s clear that the set of values for a_{k+1} depends only on a_k, so if you take a_3=1, then you’re back to the situation you started with at the beginning. If you choose to continue the exploration with a_3=6, you will find a_4=3\text{ or }10, at which point you must be triggered by the possibility that triangle numbers play a role here.

As so often when playing around with small values, you need to turn a useful observation into a concrete statement, which can then be applied to the problem. It looks like in any legal sequence, every term will be a triangle number, so we only need to clarify which triangle number. An example of a suitable statement might be:

Claim: If a_n=T_k=\frac{k(k+1)}{2}, the k-th triangle number, then a_{n+1}=T_{k-1}\text{ or }T_{k+1}.

There are three stages: 1) checking the claim matches the small cases; 2) checking the claim is maximally relevant; 3) proving it. In this case, proving it is the easiest bit. It’s a quick exercise, and I’m omitting it. Of course, we can’t prove any statement which isn’t true, and here we need to make a quick adjustment to account for the case k=1, for which we are forced to take a_{n+1}=T_{k+1}, since T_0=0 is not positive.

The second stage really concerns the question “but what if a_n\ne T_k?” While there are deductions one could make, the key is that if a_1 is a triangle number, the claim we’ve just made shows that a_n is always a triangle number, so this question is irrelevant. Indeed the claim further shows that a_{2017}\le T_{2017}, and also that a_{2017}=T_k for some odd value of k. To be fully rigorous you should probably describe a sequence which attains each odd value of k, but this is really an exercise in notation [2], and it’s very obvious they are all attainable.

In any case, the set of possible values is \{T_1,T_3,\ldots,T_{2017}\}, which has size 1009.
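
The question itself is in the image above, so to make this self-contained: the calculations here come from the recurrence a_{n+1}+a_n=(a_{n+1}-a_n)^2 with a_1=1 and all terms positive (treat that restatement as my assumption). Given that, the whole analysis can be checked by exhaustive forward search over the reachable values, at least for a shorter sequence:

```python
# Viewing the recurrence as a quadratic in x = a_{n+1}:
# x^2 - (2a+1)x + (a^2 - a) = 0, with discriminant 8a + 1.
from math import isqrt

def successors(a):
    disc = 8 * a + 1
    r = isqrt(disc)
    assert r * r == disc          # a perfect square whenever a = T_k
    return {x for x in ((2 * a + 1 + r) // 2, (2 * a + 1 - r) // 2) if x > 0}

reachable = {1}                   # the possible values of a_1
for _ in range(6):                # step forward to a_7
    reachable = set().union(*(successors(a) for a in reachable))

print(sorted(reachable))          # [1, 6, 15, 28] = T_1, T_3, T_5, T_7
```

Replacing 6 by 2016 takes a few seconds and reproduces the set \{T_1,T_3,\ldots,T_{2017}\} of size 1009.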

Final two questions

These are discussed in a subsequent post.

Footnotes

[1] – mod n is not an operator, meaning you shouldn’t think of it as ‘sending integers to other integers’, or ‘taking any integer, to an integer in {0,1,…,n-1}’. Statements like 19 mod 5 = 4 are useful at the very start of an introduction to modular arithmetic, but why choose 4? Sometimes it’s more useful to consider -1 instead, and we want statements like a^p\equiv a modulo p to make sense even when a\ge p. 19 = 4 modulo 5 doesn’t place any greater emphasis on the 4 than the 19. This makes it more like a conventional equals sign, which is of course appropriate.

[2] – For odd k, taking a_n=T_n for 1\le n\le k, and thereafter a_n=T_k if n is odd and a_n=T_{k+1} if n is even, will certainly work, as will many other examples, some perhaps easier to describe than this one, though make sure you don’t accidentally try to use T_0!

Kernels of critical graph components

This post is motivated by G(N,p), the classical Erdos-Renyi random graph, specifically its critical window, when p=p(N)=\frac{1}{N}(1+\lambda N^{-1/3}).

We start with the following observation, which makes no restriction on p. Suppose a component of G(N,p) is a tree. Then, the graph geometry of this component is that of a uniform random tree on the appropriate number of vertices. This is deliberately informal. To be formal, we’d have to say “condition on a particular subset of vertices forming a tree-component” and so on. But the formality is broadly irrelevant, because at the level of metric scaling limits, if we want to describe the structure of a tree component, it doesn’t matter whether it has \log N or \frac{1}{7}N vertices, because in both cases the tree structure is uniform. The only thing that changes is the scaling factor.

In general, when V vertices form a connected component of a graph with E edges, we define the excess to be E-V+1. So the excess is non-negative, and is zero precisely when the component is a tree. I’m reluctant to say that the excess counts the number of cycles in the component, but certainly it quantifies the amount of cyclic structure present. We will sometimes, in a mild abuse of notation, talk about excess edges. But note that for a connected component with positive excess, there is a priori no way to select which edges would be the excess edges. In a graph process, or when there is some underlying exploration of the component, there sometimes might be a canonical way to classify the excess edges, though it’s worth remarking that the risk of size-biasing errors is always extremely high in this sort of situation.

Returning to the random graph process, as so often there are big changes around criticality. In the subcritical regime, the components are small, and most of them, even the largest with high probability, are trees. In the supercritical regime, the giant component has excess \Theta(N), which is qualitatively very different.

It feels like every talk I’ve ever given has begun with an exposition of Aldous’s seminal paper [Al97] giving a distributional scaling limit of the sizes of critical components in the critical window, and a relation between the process on this time-scale and the multiplicative coalescent. And it remains relevant here, because the breadth-first exploration process can also be used to track the number of excess edges.

In a breadth-first exploration, we have a queue of vertices waiting to be explored. We pick the next one and look at its neighbours restricted to the rest of the graph, that is without the vertices we have already fully explored, and also without the other vertices in the queue. That’s the easiest way to handle the total component size. But we can simultaneously track how many times we would have joined to a neighbour within the queue, which leads to an excess edge, and Aldous derives a joint distributional scaling limit for the sizes of the critical components and their excesses. (Note that in this case, there is a canonical notion of excess edge, but it depends not just on the graph structure, but also on the extra randomness of the ordering within the breadth-first search.)

Roughly speaking, we consider the reflected exploration process, and its scaling limit, which is a reflected parabolically-drifting Brownian motion (though the details of this are not important at this level of exposition, except that it’s a well-behaved non-negative process that hits zero often). The component sizes are given by the widths of the excursions above zero, scaled up by a factor N^{2/3}. Then conditional on the shape of the excursion, the excess is Poisson with parameter the area under the excursion, with no rescaling. That is, a critical component has \Theta(1) excess.
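
These predictions are easy to observe in simulation. Here is a minimal sketch (not Aldous’s construction, just a union-find pass over the edges of a G(N,m) graph with m=N/2, which is essentially critical): the largest components come out at order N^{2/3}, each with O(1) excess.

```python
import random

def largest_components(N, m, seed=0):
    rng = random.Random(seed)
    parent, size, edges = list(range(N)), [1] * N, [0] * N

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # we don't bother excluding loops or repeated pairs; this doesn't
    # affect the asymptotics at criticality
    for _ in range(m):
        u, v = rng.randrange(N), rng.randrange(N)
        ru, rv = find(u), find(v)
        if ru == rv:
            edges[ru] += 1                  # an excess edge for this component
        else:
            parent[rv] = ru
            size[ru] += size[rv]
            edges[ru] += edges[rv] + 1

    roots = (r for r in range(N) if find(r) == r)
    return sorted((size[r], edges[r] - size[r] + 1) for r in roots)[-5:]

N = 10**5
print(largest_components(N, N // 2))  # sizes of order N^{2/3} ~ 2000, excesses O(1)
```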

So, with Aldous’s result in the background, when we ask about the metric structure of these critical components, we are really asking: “what does a uniformly-chosen connected component with fixed excess look like when the number of vertices grows?”

I’ll try to keep notation light, but let’s say T(n,k) is a uniform choice from connected graphs on n vertices with excess k.

[Note, the separation of N and n is deliberate, because in the critical window, the connected components have size n = \Theta(N^{2/3}), so I want to distinguish the two problems.]

In this post, we will mainly address the question: “what does the cycle structure of T(n,k) look like for large n?” When k=0, we have a uniform tree, and the convergence of this to the Brownian CRT is now well-known [CRT2, LeGall]. We hope for results with a similar flavour for positive excess k.

2-cores and kernels

First, we have to give a precise statement of what it means to study just the cycle structure of a connected component. From now on I will assume we are always working with a connected graph.

There are several equivalent definitions of the 2-core C(G) of a graph G:

  • When the excess is positive, there are some cycles. The 2-core is the union of all edges which form part of some cycle, and any edges which lie on a path between two edges which both form part of some cycle.
  • C(G) is the maximal induced subgraph where all degrees are at least two.
  • If you remove all the leaves from the graph, then all the leaves from the remaining graph, and continue, the 2-core is the state you arrive at where there are no leaves. (This description translates directly into code, as in the sketch below.)
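
For concreteness, here is a minimal sketch of that leaf-stripping procedure, on a graph given as an adjacency list:

```python
from collections import deque

def two_core(adj):
    """adj: dict vertex -> set of neighbours. Returns the 2-core, same format."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}     # work on a copy
    queue = deque(v for v, nbrs in adj.items() if len(nbrs) <= 1)
    while queue:
        v = queue.popleft()
        if v not in adj:
            continue
        for u in adj.pop(v):                            # delete the leaf v
            adj[u].discard(v)
            if len(adj[u]) == 1:                        # u has become a leaf
                queue.append(u)
    return adj

# A triangle (0,1,2) with a pendant path 2-3-4: the 2-core is the triangle.
print(sorted(two_core({0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}})))
```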

It’s very helpful to think of the overall structure of the graph as consisting of the 2-core, with pendant trees ‘hanging off’ the 2-core. That is, we can view every vertex of the 2-core as the root of a (possibly size 1) tree. This is particularly clear if we remove all the edges of the 2-core from the graph. What remains is a forest, with one tree for each vertex of the 2-core.

In general, the k-core is the maximal induced subgraph where all degrees are at least k. The core is generally taken to be something rather different. For this post (and any immediate sequels) I will never refer to the k-core for k>2, and certainly not to the traditional core. So I write ‘core’ for ‘2-core’.

As you can see in the diagram, the core consists of lots of paths, and topologically, the lengths of these paths are redundant. So we will often consider instead the kernel, K(G), which is constructed by taking the core and contracting all the paths between vertices of degree greater than 2. The resulting graph has minimum degree at least three. So far we’ve made no comment about the simplicity of the original graphs, but certainly the kernel need not be simple. It will regularly have loops and multiple edges. The kernel of the graph and core in the previous diagram is therefore this:

Kernels of critical components

To recap, we can deconstruct a connected graph as follows. It has a kernel, and each edge of the kernel corresponds to a path of some length in the core. The rest of the graph consists of trees hanging off from the core vertices.

For now, we ask about the distribution of the kernel of a T(n,k). You might notice that the case k=1 is slightly awkward, as when the core consists of a single cycle, it’s somewhat ambiguous how to define the kernel. Everything we do is easily fixable for k=1, but rather than carry separate cases, we handle the case k\ge 2.

We first observe that fixing k doesn’t determine the number of vertices or edges in the kernel. For example, both of the following pictures could correspond to k=3:

However, with high probability the kernel is 3-regular, which suddenly makes the previous post relevant. As I said earlier, it can introduce size-biasing errors to add the excess edges one-at-a-time, but these should be constant factor errors, not scaling errors. So imagine the core of a large graph with excess k=2. For the sake of argument, assume the kernel has the dumbbell / handcuffs shape. Now add an extra edge somewhere. It’s asymptotically very unlikely that this is incident to one of the two vertices with degree three in the core. Note it would need to be incident to both to generate the right-hand picture above. Instead, the core will gain two new vertices of degree three.

Roughly equivalently, once the size of the core is fixed (and large) we have to make a uniform choice from connected graphs of this size where almost every vertex has degree 2, and \Theta(1) of the rest have degree 3 or higher. But the sum of the degrees is fixed, because the excess is fixed. If there are n vertices in the core, then there are \Theta(n) times as many graphs where all the vertices have degree 2 or 3 as graphs where some vertex has degree at least 4. Let’s state this formally.

Proposition: The kernel of a uniform graph with n vertices and excess k\ge 2 is, with high probability as n\rightarrow\infty, 3-regular.

This is proved rather more formally as part of Theorem 7 of [JKLP], essentially as a corollary after some very comprehensive generating function setup; and in [LPW] with a more direct computation.

In the previous post, we introduced the configuration model as a method for constructing regular graphs (or any graphs with fixed degree sequence). We observe that, conditional on the event that the resulting graph is simple, it is in fact uniformly-distributed among simple graphs. When the graph is allowed to be a multigraph, this is no longer true. However, in many circumstances, as remarked in (1.1) of [JKLP], for most applications the configuration model measure on multigraphs is the most natural.

Given a 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges, and K a uniform choice from the configuration model with these parameters, we have

\mathbb{P}\left( K \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1},

where t(H) is the number of loops in H, and mult(e) the multiplicity of an edge e. This might seem initially counter-intuitive, because it looks like we are biasing against graphs with multiple edges, when perhaps our intuition is that because there are more ways to form a set of multiple edges we should bias in favour of it.

I think it’s most helpful to look at a diagram of a multigraph as shown, and ask how to assign stubs to edges. At a vertex with degree three, all stub assignments are different, that is 3!=6 possibilities. At the multiple edge, however, we care which stubs match with which stubs, but we don’t care about the order within the multi-edge. Alternatively, there are three choices of how to divide each vertex’s stubs into (2 for the multi-edge, 1 for the rest), and then two choices for how to match up the multi-edge stubs, ie 18 in total = 36/2, and a discount factor of 2.
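
The smallest case, k=2, makes a nice sanity check of both the formula and the stub-counting argument: the kernel has two vertices of degree three, and the 15 matchings of the six stubs produce either a triple edge (weight 1/3! = 1/6) or a single edge plus a loop at each vertex (weight 1/2^2 = 1/4), so the counts should be in the ratio 2:3. A short enumeration sketch:

```python
from collections import Counter

def matchings(stubs):
    """Yield all perfect matchings of a list of labelled stubs."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for i, partner in enumerate(rest):
        for m in matchings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + m

stubs = [('a', i) for i in range(3)] + [('b', i) for i in range(3)]
counts = Counter()
for m in matchings(stubs):
    # identify the resulting multigraph by its multiset of vertex-pairs
    counts[tuple(sorted(tuple(sorted((x[0], y[0]))) for x, y in m))] += 1
print(counts)
# triple edge ab: 6 of the 15 matchings; edge ab plus two loops: 9 of 15,
# matching the predicted (1/6) : (1/4) = 2 : 3 weighting.
```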

We mention this because in fact K(T(n,k)) converges in distribution to this uniform configuration model. Once you know that K(T(n,k)) is with high probability 3-regular, then it’s probably easiest to think about the core; indeed, you might as well condition on its total size and its number of degree 3 vertices. It’s then not hard to convince yourself that a uniform choice induces a uniform choice of kernel. Again, let’s state that as a proposition.

Proposition: For any 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges as before,

\lim_{n\rightarrow\infty}\mathbb{P}\left( K(T(n,k)) \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1}.

As we said before, the kernel describes the topology of the core. To reconstruct the graph, we need to know the lengths in the core, and then how to glue pendant trees onto the core. But this final stage depends on k only through the total length of paths in the core. Given that information, it’s a combinatorial problem, and while I’m not claiming it’s easy, it’s essentially the same as for the case with k=1, and is worth treating separately.

It is worth clarifying a couple of things first though. Even the outline of methods above relies on the fact that the size of the core diverges as n grows. Again, the heuristic is that up to size-biasing errors, T(n,k) looks like a uniform tree with some uniformly-chosen extra edges. But distances in T(n,k) scale like n^{1/2} (and thus in critical components of G(N,p) scale like N^{1/3}). And the core will be roughly the set of edges on paths between the uniformly-chosen pairs of vertices, and so will also have length \Theta(n^{1/2}).

Once you have conditioned on the kernel structure, and the (large) number of internal vertices on paths in the core (ie the length of the core), it is natural that the assignment of the degree-2 vertices to core paths / kernel edges is uniform. A consequence of this is that if you record (Y_1,\ldots,Y_m) the lengths of paths in the core, where m=3(k-1), then

\frac{(Y_1,\ldots,Y_m)}{\sum Y_i} \stackrel{d}\rightarrow \mathrm{Dirichlet}(1,1,\ldots,1).

This is stated formally as Corollary 7 b) of [ABG09]. It’s worth noting that this confirms that the lengths of core paths are bounded in probability away from zero after the appropriate rescaling. In seeking a metric scaling limit, this is convenient as it means there’s no danger that two of the degree-3 vertices end up in ‘the same place’ in the scaling limit object.

To recap, the only missing ingredients now to give a complete limiting metric description of T(n,k) are 1) a distributional limit of the total core length; 2) some appropriate description of the set of pendant trees conditional on the size of the pendant forest. [ABG09] show the first of these. As remarked before, all the content of the second of these is encoded in the unicyclic k=1 case, which I have written about before, albeit slightly sketchily, here. (Note that in that post we get around size-biasing by counting a slightly different object, namely unicyclic graphs with an identified cyclic edge.)

However, [ABG09] also propose an alternative construction, which you can think of as glueing CRTs directly onto the stubs of the kernel (with the same distribution as before). The proof that this construction works isn’t as painful as one might fear, and allows a lot of the other metric distributional results to be read off as corollaries.

References

[ABG09] – Addario-Berry, Broutin, Goldschmidt – Critical random graphs: limiting constructions and distributional properties

[CRT2] – Aldous – The continuum random tree: II

[Al97] – Aldous – Brownian excursions, critical random graphs and the multiplicative coalescent

[JKLP] – Janson, Knuth, Luczak, Pittel – The birth of the giant component

[LeGall] – Le Gall – Random trees and applications

[LPW] – Luczak, Pittel, Wierman – The structure of a random graph at the point of the phase transition

 

Random 3-regular graphs

A graph is d-regular if every vertex has degree d. Probably the easiest examples of d-regular graphs are the complete graph on (d+1) vertices, and the infinite d-regular tree. A less trivial example is the Petersen graph, which is 3-regular. 3-regular graphs will be the main focus for some of this post, but initially we lose nothing by considering general d.

Throughout, a necessary condition for the existence of a d-regular graph with N vertices is that at least one of d and N is even, as the sum of the degrees of a graph must be even. We will always assume that this holds, so that when d=3, we are always taking N to be even.

A natural pair of questions for a probabilist is ‘can we sample a d-regular graph with N vertices uniformly at random?’ and ‘what does a typical large d-regular graph look like?’

In a rather old post, I addressed some aspects of the first question, but revisit it briefly here. A good idea, due to Bollobas [B80], is to assign to all the vertices d stubs (or half-edges), and choose a matching of the Nd stubs uniformly at random. This works as a method to generate a random graph with any fixed degree sequence.

If you want your graphs to be simple, this can go wrong, because there’s a chance you get loops (that is, an edge from a vertex v to itself) and multiple edges between the same pair of vertices. It would be nice if the graph formed in this fashion were simple with high probability when N\rightarrow\infty. Unfortunately that’s not the case; however, the probability that the graph is simple remains asymptotically bounded away from 0 and 1. Indeed, because the presence of a loop / multiple edge is asymptotically independent of the presence of a loop / multiple edge elsewhere, it’s unsurprising that we have a Poisson limit for the number of such occurrences. So from a sampling point of view, it’s reasonable to sample a graph in this way until you find a simple one. This takes O(1) attempts, and it’s O(N) steps to check whether a given multigraph is simple.

It’s clear that conditional on the graph generated in this fashion being simple, its distribution is uniform on the set of simple graphs with the correct degree distribution. If you are happy for your graphs to have loops, then it’s a little bit more complicated, because if an edge has multiplicity k, these can appear in k! ways in the configuration construction, and similarly each loop introduces a factor of 2.

Other asymptotic properties

Loops and multiple edges can be thought of as cycles of length 1 and 2 respectively if you want. We might ask about other small cycles. A calculation in expectation is relatively straightforward. Given three vertices, the probability they form a triangle (in at least one way) is \Theta(N^{-3}), and there are \Theta(N^3) ways to choose three vertices. Thus the expected number of triangles is \Theta(1). Finally, the edge structure induced on disjoint triples is asymptotically independent, and hence we obtain a Poisson limit. (See [J06] for details, including more detail on the general configuration construction.) The same result holds for the same reasons for cycles of any fixed finite length.
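
All of this is quick to see empirically with a throwaway configuration-model sampler. For d=3, the loop count and the multiple-edge count should each be roughly Poisson(1), which suggests P(simple) ≈ e^{-2} ≈ 0.135 (precise statements can be found in [J06]):

```python
import random

def config_edges(N, d, rng):
    stubs = [v for v in range(N) for _ in range(d)]
    rng.shuffle(stubs)                     # a uniform matching of the stubs
    return [tuple(sorted(stubs[i:i + 2])) for i in range(0, len(stubs), 2)]

def is_simple(edges):
    return all(u != v for u, v in edges) and len(set(edges)) == len(edges)

rng = random.Random(0)
trials = 2000
simple = sum(is_simple(config_edges(1000, 3, rng)) for _ in range(trials))
print(simple / trials)                     # ≈ e^{-2} ≈ 0.135
```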

We might also ask about connectivity. At a heuristic level, there are two ways for the graph to be disconnected: it could have some small components; or it could have two components of size \Theta(N). The smallest possible component is K_4, and an argument like for the cycles above shows that the number of copies of K_4 vanishes in expectation. Now, consider having two components of size roughly N/2. There are \binom{N}{N/2} = 2^{N(1+o(1))} ways to make this choice. However, given such a choice, we can handle the probability that all the stubs from one class match within that class by going through the class one stub at a time:

\frac{\frac{3N}{2}-1}{3N-1} \times \frac{\frac{3N}{2}-3}{3N-3} \times \cdots \times \frac{1}{\frac{3N}{2}+1}.

We can rewrite this product as

\frac{\left[(3N/2-1)!!\right]^2}{(3N-1)!!} \approx 2^{-3N/2}, \text{ up to polynomial factors (by Stirling's approximation)},

and this dominates the number of choices powerfully enough that we might believe it remains valid for a broader range of class sizes. In fact we have a much stronger statement, namely that G(N,3) is 3-connected with high probability. This means that the graph cannot be disconnected by removing two vertices, or equivalently that there are three vertex-disjoint paths between any pair of vertices in the graph, essentially one emerging from each stub. See this note by David Ellis for a quick proof. We might return to this later.

You might ask about planarity. It’s clear from degree considerations that no copy of K_5 can appear in a 3-regular graph, and a first-moment calculation shows that with high probability G(N,3) contains no copy of K_{3,3} either, since K_{3,3} has more edges (nine) than vertices (six). However, there might be minors of this form. This seemed a good example of the Kuratowski criterion not actually being that useful, since I certainly don’t find the minors of a 3-regular graph an obvious structure to handle.

However, we can use Euler’s formula V - E + F = 2 for planar graphs. Here V = N, E = 3N/2. Faces are described by (a subset of the) cycles, and there are asymptotically O(1) small cycles, so most faces include a large number of edges. But each edge corresponds to at most two faces. So we have F \ll E, and so with high probability Euler’s formula can’t hold in G(N,3) for large N.

We can also ask about the local limit of G(N,3). Since the vertices are exchangeable, we don’t need to worry about whether we choose the root uniformly at random (often referred to as the Benjamini-Schramm sense) or by some other method.

The root has up to three neighbours, and with high probability it has exactly three neighbours. These neighbours have at most two other neighbours themselves. However, we’ve already seen that there are asymptotically O(1) cycles, and so with high probability there are no small cycles near a fixed root vertex. So the six neighbours-of-neighbours are with high probability different to the root and the root’s neighbours and to each other. We can make this argument at arbitrary finite radius from the root, to conclude that the local limit of G(N,3) is the infinite 3-regular tree.

Spectral expansion

[Caveat – this is something I read about and wanted to mention, but I really don’t know much at all about any of this theory, and it’s definitely not certain that what follows wouldn’t be better replaced by a set of links.]

This straightforward local limit offers good heuristics on some of the more global properties. Almost by definition, the d-regular tree expands as rapidly as is possible away from the root among infinite d-regular graphs. There are a number of ways to measure the expansion of a graph, and some methods transfer better to the infinite setting than others. The adjacency matrix of an infinite graph can be defined similarly to that of a finite graph, and it remains possible to talk about eigenfunctions and spectrum. As for the finite setting, d is an eigenvalue because the tree is d-regular, and -d is an eigenvalue because it is also bipartite.

The next largest eigenvalue \lambda_2 governs the spectral gap d-\lambda_2 which is a measure of the expansion of a graph. A graph is a good (spectral) expander if all the non-trivial eigenvalues are close to zero. A priori, all we know is that |\lambda_2|\le d. For the infinite d-regular tree, we have \lambda_2 = 2\sqrt{d-1}. This blog post by Luca Trevisan gives a very readable proof.

A key result is that finite graphs can achieve \lambda_2\le 2\sqrt{d-1}, but cannot do asymptotically better. That is, taking N to be the number of vertices:

\lambda_2 \ge 2\sqrt{d-1} - o_N(1).

This is the content of the Alon-Boppana theorem [Al86]. In fact the error can be quantified as O(\frac{1}{\log N}) – the diameter of the graph is relevant here. A finite d-regular graph for which \lambda_2\le 2\sqrt{d-1} is called a Ramanujan graph. The existence of Ramanujan graphs has been much studied, and various constructions often rely on number theoretic properties of N, and lie at the interface of disparate branches of mathematics where my understanding is zero rather than epsilon.

Now return to our view of the d-regular tree as the local limit of a d-regular graph on N vertices for large N. We might expect from everything above that the uniform d-regular graph is a good expander. Bollobas [B88] shows that in the sense of edge-expansion, asymptotically almost all d-regular graphs have edge-expansion bounded away from zero. (See Section 2 of [Ell], including history of the d=3 case.) Friedman [Fri08] proves the conjecture of Alon that for every \epsilon>0, a.a.s. \lambda_2 for G(N,d) is at most 2\sqrt{d-1}+\epsilon. In this sense, G(N,d) is asymptotically ‘almost Ramanujan’. (See also [Bor17] for another proof and an introduction including history, context and references.)
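
This is also pleasingly visible at modest N. A sketch, reusing the configuration model (resampling until simple) and computing the spectrum directly:

```python
import random
import numpy as np

def random_regular_adjacency(N, d, rng):
    while True:                            # rejection sampling for simplicity
        stubs = [v for v in range(N) for _ in range(d)]
        rng.shuffle(stubs)
        edges = [tuple(sorted(stubs[i:i + 2])) for i in range(0, len(stubs), 2)]
        if all(u != v for u, v in edges) and len(set(edges)) == len(edges):
            A = np.zeros((N, N))
            for u, v in edges:
                A[u, v] = A[v, u] = 1.0
            return A

A = random_regular_adjacency(500, 3, random.Random(1))
spectrum = np.sort(np.linalg.eigvalsh(A))
print(spectrum[-1], spectrum[-2])   # 3.0, and lambda_2 near 2*sqrt(2) ≈ 2.83
```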

Some other links: the Wikipedia page on expanders, which includes a discussion of the different descriptions of expansion, and the Cheeger inequalities and other relations between them; slides for a talk by Spielman on spectra and Ramanujan graphs; and a survey by Murty on Ramanujan graphs.

What next?

This post took a slightly different direction from what I had intended, and rather than make a halting U-turn back to my planned finale, I’ll postpone this. However, a short overture is that I’m interested in the structure of critical components of random graphs during the critical window. This is the window during which the largest components first have cycles with probability \Theta(1). Indeed, the critical components have size \Theta(N^{2/3}) and \Theta(1) surplus edges. Conditional on their size, and number of surplus edges, the choice of the graph structure on the component is uniform among such (connected) graphs.

Addario-Berry, Broutin and Goldschmidt [ABG09] study scaling limits of such components. Central to this analysis is the 2-core of such components, which can be described in terms of 3-regular (multi)graphs. Various processes we are now interested in running on the critical components of critical RGs can then be studied in terms of related processes on random 3-regular graphs.

References

[ABG09] – Addario-Berry, Broutin, Goldschmidt – Critical random graphs: limiting constructions and distributional properties

[Al86] – Alon – Eigenvalues and expanders

[B80] – Bollobas – A probabilistic proof of an asymptotic formula for the number of labelled regular graphs

[B88] – Bollobas – The isoperimetric number of random regular graphs

[Bor17] – Bordenave – A new proof of Friedman’s second eigenvalue theorem and its extension to random lifts. Arxiv.

[Ell] – Ellis – The expansion of random regular graphs

[Fri08] – Friedman – A proof of Alon’s second eigenvalue conjecture and related problems

[J06] – Janson – The probability that a random multigraph is simple

Random transpositions

We study a procedure for generating a random sequence of permutations of [N]. We start with the identity permutation, and then in each step, we choose two elements uniformly at random, and swap them. We obtain a sequence of permutations, where each term is obtained from the previous one by multiplying by a uniformly-chosen transposition.

Some more formality and some technical remarks:

  • This is a Markov chain, and as often with Markov chains, it would be better if it were aperiodic. As described, the chain will alternate between odd and even permutations. So we allow the two elements chosen to be the same. This laziness slows down the chain by a factor \frac{N-1}{N}, but removes periodicity. We will work over timescales where this adjustment makes no practical difference.
  • Let \tau_1,\tau_2,\ldots be the sequence of transpositions. We could define the sequence of permutations by \pi_m= \tau_m\cdot\tau_{m-1}\cdot \ldots\cdot \tau_1. I find it slightly more helpful to think of swapping the elements in places i and j, rather than the elements i and j themselves, and so I’ll use this language, for which \pi_m = \tau_1\cdot \tau_2\cdot\ldots \cdot \tau_m is the appropriate description. Of course, transpositions and the identity are self-inverse permutations, so it makes no difference to anything we might discuss.
  • You can view this as lazy random walk on the Cayley graph of S_N generated by the set of transpositions. That is, the vertices of the graph are elements of S_N, and two are connected by an edge if one can be obtained from the other by multiplying by a transposition. Note this relation is symmetric. Hence random transposition random walk.
  • Almost everything under discussion would work in continuous time too.

At a very general level, this sort of model is interesting because sometimes the only practical way to introduce ‘global randomness’ is repeatedly to apply ‘local randomness’. This is not the case for permutations – it is not hard to sample uniformly from S_N. But it is a tractable model in which to study relevant questions about generating randomness on a complicated set through iterated local operations.

Since it is a Markov chain with a straightforward invariant distribution, we can ask about the mixing time. That is, the correct scaling for the number of moves before the random permutation is close in distribution (say in the sense of total variation distance) to the equilibrium distribution. See this series of posts for an odd collection of background material on the topic. Diaconis and Shahshahani [DS81] give an analytic argument for mixing around \frac{N\log N}{2} transpositions. Indeed the constant matters here, because there is a sharp cutoff: the total variation distance drops from approximately 1 to approximately 0 in O(N) steps.

Comparison with Erdos-Renyi random graph process

In the previous result, one might observe that m=\frac{N\log N}{2} is also the threshold number of edges to guarantee connectivity of the Erdos-Renyi random graph G(N,m) with high probability. [ER59] Indeed, there is also a sharp transition around this threshold in this setting too.

We explore this link further. We can construct a sequence of random graphs simultaneously with the random transposition random walk. When we multiply by transposition (i j), we add edge ij in the graph. Laziness of RTRW and the possibility of multiple edges mean this definition isn’t literally the same as the conventional definition of a discrete-time Erdos-Renyi random graph process, but again this is not a problem for any of the effects we seek to study.

The similarity between the constructions is clear. But what about the differences? For the RTRW, we need to track more information than the random graph. That is, we need to know what order the transpositions were added, rather than merely which edges were added. However, the trade-off is that a permutation is a simpler object than a graph in the following sense. A permutation can be described as a union of disjoint cycles. In an exchangeable setting, all the information about a random permutation is encoded in the lengths of these cycles. Whereas in a graph, geometry is important. It’s an elegant property of the Erdos-Renyi process that we can forget about the geometry and treat it as a process on component sizes (indeed, a multiplicative coalescent process), but there are other questions we might need to ask for which we do have to study the graph structure itself.

Within this analogy, unfortunately the word cycle means different things in the two different settings. In a permutation, a cycle is a directed orbit, while in a graph it has the usual definition. I’m going to write graph-cycle whenever relevant to avoid confusion.

A first observation is that, under this equivalence, the cycles of the permutation form a finer partition than the components of the graph. This is obvious. If we split the vertices into sets A and B, and there are no edges between them, then nothing in set A will ever get moved out of set A by a transposition. (Note that the slickness of this analogy is the advantage of viewing a transposition as swapping the elements in places i and j.)

However, we might then ask under what circumstances is a cycle of the permutation the same as a component of the graph (rather than a strict subset of it). A first answer is the following:

Lemma: [Den59] The permutation formed by a product of transpositions corresponding in any order to a tree in the graph has a single cycle.

We can treat this as a standalone problem and argue in the following predictable fashion. (Indeed, I was tempted to set this as a problem during selection for the UK team for IMO 2017 – it’s perfectly suitable in this context I think.) The first transposition corresponds to some edge say ab, and removing this edge divides the vertices into components A \ni a, B\ni b. Since no further transposition swaps between places in A and places in B, the final permutation maps a into B and b into A, and otherwise preserves A and B.

This argument extends to later transpositions too. Now, suppose there are multiple cycles. Colour one of them. So during the process, the coloured labels move around. At some point, we must swap a coloured label with an uncoloured label. Consider this edge, between places a and b as before, and indeed the same conclusion holds. WLOG we move the coloured label from a to b. But then at the end of the process (ie in the permutation) there are more coloured labels in B than initially. But the number of coloured labels should be the same, because they just cycle around in the final permutation.

We can learn a bit more by thinking about the action on cycles (in the permutation) of adding a transposition. In the following pair of diagrams, the black arrows represent the original permutation (note it’s not helpful to think of the directed edges as having anything to do with transpositions now), the dashed line represents a new transposition, and the new arrows describe the new permutation which results from this product.

It’s clear from this that adding a transposition between places corresponding to different cycles causes the cycles to merge, while adding a transposition between places already in the same cycle causes the cycle to split into two cycles. Furthermore the sizes of the two cycles formed is related to the distance in the cycle between the places defining the transposition.

This allows us to prove the lemma by adding the edges of the tree one-at-a-time and using induction. The inductive claim is that cycles of the permutation exactly correspond to components of the partially-built tree. Assuming this claim guarantees that the next step is definitely a merge, not a split (otherwise the edge corresponding to the next step would have to form a cycle). If all N-1 steps are merges, then the number of cycles is reduced by one on each step, and so the final permutation must be a single cycle.
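
The lemma is also a pleasant thing to check by simulation, and the check doubles as a test of the place-based convention above. A sketch, with a random recursive tree standing in for an arbitrary tree:

```python
import random

def cycle_count_after_swaps(N, edges):
    labels = list(range(N))                 # labels[place] = label at that place
    for i, j in edges:
        labels[i], labels[j] = labels[j], labels[i]
    seen, cycles = [False] * N, 0
    for start in range(N):                  # count cycles of place -> labels[place]
        if not seen[start]:
            cycles += 1
            x = start
            while not seen[x]:
                seen[x] = True
                x = labels[x]
    return cycles

rng = random.Random(0)
N = 20
tree = [(v, rng.randrange(v)) for v in range(1, N)]   # a random tree on [N]
rng.shuffle(tree)                           # the order of the edges is irrelevant
print(cycle_count_after_swaps(N, tree))     # 1: always a single N-cycle
```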

Uniform split-merge

This gives another framework for thinking about the RTRW itself, entirely in terms of cycle lengths as a partition of [N]. That is, given a partition, we choose a pair of parts in a size-biased way. If they are different, we merge them; and if it is the same part, with size k, we split them into two parts, with sizes chosen uniformly from { (1,k-1), (2,k-2), …  (k-1,1) }.

What’s nice about this is that it’s easy to generalise to real-valued partitions, eg of [0,1]. Given a partition of [0,1], we sample two IID U[0,1] random variables U_1,U_2. If these correspond to different parts, we replace these parts by a single part with size given by the sum. If these correspond to the same part, with size \alpha, we split this part into two parts with sizes |U_1-U_2| and \alpha - |U_1-U_2|. This is equivalent in a distributional sense to sampling another U[0,1] variable U and replacing \alpha with (\alpha U, \alpha(1-U)). We probably want our partition to live in \ell^1_\searrow, so we might have to reorder the parts afterwards too.

These uniform split-merge dynamics have a (unique) stationary distribution, the canonical Poisson-Dirichlet random partition, hereafter PD(0,1). This was first shown in [DMZZ04], and then in a framework more relevant to this post by Schramm [Sch08].
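
A crude simulation of these dynamics is enough to see the stationary distribution emerging. One numerical fingerprint: under PD(0,1) the mean of the largest part is the Golomb-Dickman constant ≈ 0.624. A minimal (and fairly slow) sketch:

```python
import random

def split_merge_step(parts, rng):
    def locate(u):                       # the part containing u: size-biased
        acc = 0.0
        for idx, p in enumerate(parts):
            acc += p
            if u < acc:
                return idx
        return len(parts) - 1
    i, j = locate(rng.random()), locate(rng.random())
    if i == j:                           # same part: split it uniformly
        p = parts.pop(i)
        u = rng.random()
        parts.extend([p * u, p * (1 - u)])
    else:                                # different parts: merge them
        p = parts[i] + parts[j]
        parts[:] = [x for k, x in enumerate(parts) if k not in (i, j)] + [p]

rng = random.Random(0)
parts, largest = [1.0], []
for step in range(50000):
    split_merge_step(parts, rng)
    if step >= 5000:                     # discard a burn-in period
        largest.append(max(parts))
print(sum(largest) / len(largest))       # ≈ 0.624, the Golomb-Dickman constant
```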

Conveniently, PD(0,1) is also the scaling limit of the cycle lengths in a uniform random permutation (scaled by N). The best way to see this is to start with the observation that the length of the cycle containing 1 in a permutation chosen uniformly from S_N has the uniform distribution on {1,…,N}. This matches up well with the uniform stick-breaking construction of PD(0,1), though other arguments are available too. Excellent background on Poisson-Dirichlet distributions and this construction and equivalence can be found in Chapter 3 of Pitman’s comprehensive St. Flour notes [CSP]. Also see this post, and the links within, with the caveat that my understanding of the topic was somewhat shaky then (as presently, for now).

However, Schramm says slightly more than this. As the Erdos-Renyi graph passes criticality, there is a well-defined (and whp unique) giant component including \Theta(N) vertices. It’s not clear that the corresponding permutation should have giant cycles. Indeed, whp the giant component has \Theta(N) surplus edges, so the process of cycle lengths will have undergone O(N) splits. Schramm shows that most of the labels within the giant component are contained in giant cycles in the permutation. Furthermore, the distribution of cycle lengths within the giant component, rescaled by the size of the giant component, converges in distribution to PD(0,1) at any supercritical time \frac{(1+\epsilon)N}{2}.

This is definitely surprising, since we already know that the whole permutation doesn’t look close to uniform until time \frac{N\log N}{2}. Essentially, even though the size of the giant component is non-constant (ie it’s gaining vertices), the uniform split-merge process is happening to the cycles within it at rate N. So heuristically, at the level of the largest cycles, at any supercritical time we have a non-trivial partition, so at any slightly later time (eg between \frac{(1+\epsilon/2)N}{2} and \frac{(1+\epsilon)N}{2}), mixing will have comfortably occurred, and so the distribution is close to PD(0,1).

This is explained very clearly in the introduction of [Ber10], in which the approach is extended to a random walk on S_N driven by a uniform choice from any conjugacy class.

So this really does tell us how the global uniform randomness emerges. As the random graph process passes criticality, we have a positive mass of labels in a collection of giant cycles which are effectively a continuous-space uniform split-merge model near equilibrium (and thus with PD(0,1) marginals). The remaining cycles are small, corresponding to small trees which make up the remaining (subcritical by duality) components of the ER graph. These cycles slowly get absorbed into the giant cycles, but on a sufficiently slow timescale relevant to the split-merge dynamics that we do not need to think of a separate split-merge-with-immigration model. Total variation distance on permutations does feel the final few fixed points (corresponding to isolated vertices in the graph), hence the sharp cutoff corresponding to sharp transition in the number of isolated vertices.

References

[Ber10] – N. Berestycki – Emergence of giant cycles and slowdown transition in random transpositions and k-cycles. [arXiv version]

[CSP] – Pitman – Combinatorial stochastic processes. [pdf available]

[Den59] – Denes – The representation of a permutation as a product of a minimal number of transpositions, and its connection with the theory of graphs

[DS81] – Diaconis, Shahshahani – Generating a random permutation with random transpositions

[DMZZ04] – Diaconis, Mayer-Wolf, Zeitouni, Zerner – The Poisson-Dirichlet distribution is the unique invariant distribution for uniform split-merge transformations [link]

[ER59] – Erdos, Renyi – On random graphs I.

[Sch08] – Schramm – Compositions of random transpositions [book link]

DGFF 4 – Properties of the Green’s function

I’m at UBC this month for the PIMS probability summer school. One of the long courses is being given by Marek Biskup about the Discrete Gaussian Free Field (notes and outline here) so this seems like a good moment to revive the sequence of posts about the DGFF. Here’s DGFF1, DGFF2, DGFF3 from November.

The first draft of this post was about the maximum of the DGFF in a large box V_N, and also about the Green’s function G^{V_N}(x,y), which specifies the covariance structure of the DGFF. This first draft also became too long, so I’m splitting it into two somewhat shorter ones. As we’ll see, some understanding and standard estimates of the Green’s function are enough to say quite a bit about the maximum. In this first post, we’ll explore some ‘low-hanging fruit’ concerning the Green’s function, as defined through a simple random walk, which are useful, but rarely explained in the DGFF literature.

Symmetry of Green’s function

We start with one of these low-hanging fruit. If G^{V_N} is to be a covariance matrix, it has to be symmetric. In the first post, showing that the definition of the DGFF as a random field with given Hamiltonian is equivalent to \mathcal{N}(0,G^{V_N}) certainly can be viewed as a proof of symmetry. However, it would be satisfying if there was a direct argument in the language of the definition of the Green’s function.

To make this self-contained, recall the random walk definition of G^{V_N}(x,y). Let (S_m)_{m\ge 0} be simple random walk on V_N, and \mathbb{P}_x,\,\mathbb{E}_x denote starting the random walk at x\in V_N. As usual, let \tau_y,\,\tau_A denote the hitting time of a vertex y or a set A respectively. Then

G^{V_N}(x,y):= \mathbb{E}_x \left[ \sum_{m=0}^{\tau_{\partial V_N}}1_{(S_m=y) }\right].

That is, G^{V_N}(x,y) is the expected number of visits to y by a random walk from x, before it exits V_N.

Let’s drop the superscript for now, as everything should hold for a more general subset of the lattice. I don’t think it’s immediately obvious at the level of Markov chains why G(x,y)=G(y,x). In particular, it’s not the case that

\mathbb{P}_x(\tau_y < \tau_{D^c}) = \mathbb{P}_y(\tau_x <\tau_{D^c}),

and it feels that we can’t map between paths x \to \partial D and y\to \partial D in a way that preserves the number of visits to y and x, respectively. However, we can argue that for any m

\mathbb{P}_x(S_m=y, \tau_{D^c}>m) = \mathbb{P}_y(S_m=x, \tau_{D^c}>m),

by looking at the suitable paths of (S_m). That is, if we have a path x=S_0,S_1,\ldots,S_m=y that stays within D, then the probability of seeing this path starting from x and its reverse direction starting from y are equal. Why? Because, writing v_0=x and v_m=y,

\mathbb{P}_x(S_0=x,S_1=v_1,\ldots,S_{m-1}=v_{m-1},S_m=y) = \prod_{\ell=0}^{m-1} \frac{1}{\mathrm{deg}(v_\ell)},

and

\mathbb{P}_y(S_0=y,S_1=v_{m-1},\ldots,S_{m-1}=v_1, S_m=x) = \prod_{\ell=0}^{m-1} \frac{1}{\mathrm{deg}(v_{m-\ell})} = \prod_{\ell=1}^m \frac{1}{\mathrm{deg}(v_\ell)}.

Since D\subset \mathbb{Z}^d and x,y are in the interior of D, we must have \mathrm{deg}(x)=\mathrm{deg}(y), and so these two expressions are equal. Summing over all such two-way paths, and then all m gives the result.

Fixing one argument

We now focus on G^D(\cdot,y), where the second argument is fixed. This is the solution to the Poisson equation

\Delta G^D(\cdot,y) = -\delta_y(\cdot),\quad G^D(x,y)=0,\; \forall x\in \partial D.

To see this, one can use a standard hitting probability argument (as here) with the Markov property. This is harmonic in D\backslash \{y\}, and since we know

G^D(y,y)= \frac{1}{\mathbb{P}_y(\text{RW hits }\partial D\text{ before returning to }y)},

this uniquely specifies G^D(\cdot,y). Anyway, since harmonic functions achieve their maxima at the boundary, we have G(y,y)\ge G(x,y) for all x\in D. We can also see this from the SRW definition as

G(x,y)=G(y,x) = \mathbb{P}_y (\tau_x < \tau_{\partial D} ) G(x,x) \le G(x,x).

Changing the domain

Now we want to consider nested domains D\subset E, and compare G^D(\cdot,\cdot) and G^E(\cdot,\cdot) on D\times D. The idea is that for SRW started from x\in D, we have \tau_{\partial D}\le \tau_{\partial E}, since the walk must leave D no later than it leaves E. From this, we get

G^D(x,y)\le G^E(x,y),\quad \forall x,y\in D,

and we will use the particular case y=x.

For example, if x\in V_N, the box with width N, then the box with width 2N centred on x contains the whole of V_N. So, if we set \bar {V}_{2N}:= [-N,N]^d, then with reference to the diagram, we have

G^{V_N}(x,x)\le G^{\bar{V}_{2N}}(0,0),\quad x\in V_N.

As we’ll see when we study the maximum of the DGFF on V_N, uniform control over the pointwise variance will be a useful tool.

Maximising the Green’s function

The idea of bounding G^{V_N}(x,x) by G^{\bar V_{2N}}(0,0) for any x\in V_N is clever and useful. But a more direct approach would be to find the value of x that maximises G^{V_N}(x,x). We would conjecture that when V_N has a central vertex, then this is the maximiser.

We can prove this directly from the definition of the Green’s function in terms of random walk occupation times. Let’s assume we are working with \bar{V}_N for even N, so that 0 is the central vertex. Again, since

G^D(x,x)=\frac{1}{\mathbb{P}_x(\text{RW hits }\partial D\text{ before returning to }x)}, (*)

it would suffice to show that this probability is minimised when x=0. This feels right, since 0 is furthest from the boundary. But some care is needed: other points are closer to the boundary in some directions but further in others, so we can’t simply condition on the maximum distance from its start point achieved by an excursion of the SRW (by vertex-transitivity, such excursions look the same from all starting points), because even allowing for the four possible rotations, for an excursion of diameter slightly larger than N, starting at the centre is maximally bad.
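
Before setting up the coupling, a quick numerical check of this conjecture (and of the symmetry and domination properties from earlier) is reassuring. Since G = \sum_{m\ge 0}Q^m = (I-Q)^{-1}, where Q is the SRW transition matrix restricted to the interior, this is a few lines of linear algebra; a sketch for a 2D box:

```python
import numpy as np

n = 9                                        # an n x n interior, d = 2
sites = [(x, y) for x in range(n) for y in range(n)]
index = {s: i for i, s in enumerate(sites)}
Q = np.zeros((n * n, n * n))
for (x, y), i in index.items():
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if (x + dx, y + dy) in index:        # steps off the grid are killed
            Q[i, index[(x + dx, y + dy)]] = 0.25
G = np.linalg.inv(np.eye(n * n) - Q)         # G = sum over m of Q^m

print(np.allclose(G, G.T))                   # True: symmetry
print((G.max(axis=0) == G.diagonal()).all()) # True: G(y,y) >= G(x,y)
diag = G.diagonal().reshape(n, n)
print(np.unravel_index(diag.argmax(), diag.shape))   # (4, 4): the central vertex
```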

However, intuitively it does feel as if being closer to the boundary makes you more likely to escape earlier. In fact, with a bit more care, and working now in d=2 for concreteness, we can couple the SRW started from 0 and the SRW started from r=(r^x,r^y)\ne 0 such that the latter always exits first. For convenience we’ll assume also that r^x,r^y are both even.

I couldn’t find any reference to this, so I don’t know whether it’s well-known or not. The following argument involves projecting onto each axis, and doing separate couplings for transitions in the x-direction and transitions in the y-direction. We assume WLOG that r is in the upper-right quadrant as shown. Then, let 0=S_0,S_1,S_2,\ldots be SRW started from 0, and we will construct r=R_0,R_1,R_2,\ldots on the same probability space as (S_m)_{m\ge 0} as follows. For every m, we set the increment R_{m+1}-R_m to be \pm(S_{m+1}-S_m). It remains to specify the sign, which will be determined by the direction of the S-increment, and a pair of stopping times. The marginal is therefore again an SRW, started from r. Temporarily, we use the unusual notation S_m= (S^x_m,S^y_m) for the coordinates of S_m.

So, if S_{m+1}-S_m=(1,0), (-1,0), ie S moves left or right, then we set

R_{m+1}-R_m = \begin{cases} -(S_{m+1}-S_m) &\quad \text{if }m<T^x,\\ S_{m+1}-S_m&\quad \text{if }m\ge T^x.\end{cases} (*)

where T^x:= \min\{m\,:\, R^x_m=S^x_m\}. That is, R^x moves in the opposing direction to S^x until the first time they are equal (hence the parity requirement), and thereafter they move together. WLOG assume that r^x>0. Now suppose S^x_m=\pm N with m minimal. If m\ge T^x, then by construction R^x_m=\pm N also. If m<T^x, then we must have S^x_m=-N, and since R^x's trajectory is a mirror image of S^x's, in fact R^x_m = N+r^x>N, so R^x hit +N strictly earlier. In both cases, we see that R^x hits \pm N at the same time as or before S^x.

In other words, for m<T^x, the (lazy) trajectory of R^x is the mirror image of that of S^x in the line x=r^x/2, and so the two projections first meet at r^x/2. This is where the parity condition on r is used: the difference R^x_m-S^x_m always has the parity of r^x, so it can only vanish if r^x is even. From time T^x onwards, the two projections coincide.

Similarly, if S_{m+1}-S_m =(0,1), (0,-1), ie S moves up or down, then we set

R_{m+1}-R_m = \begin{cases} -(S_{m+1}-S_m)&\quad \text{if }m<T^y,\\  S_{m+1}-S_m&\quad \text{if }m\ge T^y,\end{cases}

with corresponding definition of the stopping time T^y.

This completes the coupling. Writing \tau^x,\tau^y for the times at which a walk’s coordinate projections first hit \pm N, each walk exits the box at time \tau^x\wedge\tau^y, and so we have shown that the exit time for the walk started from zero dominates the exit time for the walk started from r. Recall that so far we are in the case where the box has even width and r=(r^x,r^y) has even coordinates.
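
Since I couldn’t find a reference, here is a quick Monte Carlo sanity check of this coupling, a minimal sketch under my own conventions (the box is [-N,N]^2 with N=10, r=(4,2) has positive even coordinates, and all names are mine). On every sample, the walk started from r exits no later than the walk started from the centre.

```python
import random

def coupled_exit_times(N, r, max_steps=10**6):
    """One sample of the mirror coupling: S from (0,0), R from r = (rx,ry)
    with rx, ry positive and even. Returns the first times at which the
    sup-norms of S and R respectively reach N."""
    S, R = [0, 0], list(r)
    met = [False, False]                    # whether T^x, T^y have happened
    tau_S = tau_R = None
    for m in range(1, max_steps):
        axis, step = random.choice([(0, 1), (0, -1), (1, 1), (1, -1)])
        S[axis] += step
        R[axis] += step if met[axis] else -step  # mirror until projections meet
        if R[axis] == S[axis]:
            met[axis] = True
        if tau_S is None and max(map(abs, S)) == N:
            tau_S = m
        if tau_R is None and max(map(abs, R)) == N:
            tau_R = m
        if tau_S is not None and tau_R is not None:
            return tau_S, tau_R
    raise RuntimeError("increase max_steps")

random.seed(1)
for _ in range(1000):
    tau_S, tau_R = coupled_exit_times(10, (4, 2))
    assert tau_R <= tau_S   # the walk started off-centre exits no later
print("exit-time domination verified on 1000 samples")
```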

This exit time comparison isn’t by itself exactly what we need to compare G^{\bar V_N}(0,0) and G^{\bar V_N}(r,r). It’s worth remarking at this stage that if all we cared about was the Green’s function on the integer line [-N,N], we would have an easier argument, as by the harmonic property of G(\cdot,y)

G^{[-N,N]}(0,r)=\frac{N-r}{N}G^{[-N,N]}(0,0),

G^{[-N,N]}(r,0) = \frac{N}{N+r}G^{[-N,N]}(r,r),

and since G^{[-N,N]}(0,r)=G^{[-N,N]}(r,0) by symmetry of the Green’s function, combining these gives G^{[-N,N]}(0,0)=\frac{N^2}{N^2-r^2}G^{[-N,N]}(r,r)>G^{[-N,N]}(r,r). To lift from 1D to 2D directly, we need a bit more than this. It’s possible that S returns to its start in each coordinate separately more often than R does, but never in both coordinates at the same time. Fortunately, the coupling we defined slightly earlier does give us a bit more control.

Let \tau^x(S), \tau^x(R) be the first times that S^x, R^x hit \pm N. Under this coupling, for any m\ge 0

\mathbb{P}(S^x_m=0, m<T^x) = \mathbb{P}(R^x_m=r^x, m<T^x)

since these events are literally equal. Since we showed that \tau^x(R)\le \tau^x(S) almost surely, we can further deduce

\mathbb{P}(S^x_m=0,m<T^x\wedge \tau^x(S)) \ge \mathbb{P}(S^x_m=0,m<T^x\wedge \tau^x(R))

=\mathbb{P}(R^x_m=r^x, m <T^x \wedge \tau^x(R)).

To address the corresponding events for which m\ge T^x, we apply the strong Markov property at T^x, to obtain SRW Z_m started from r^x/2 (for lightness, we write r instead of r^x for the rest of this argument), and let \tau_{-N},\tau_{+N} be the hitting times of -N,+N respectively and \tau_{\pm N}=\tau_{-N}\wedge \tau_{+N}. It will now suffice to prove that

\mathbb{P}(Z_m=0, m< \tau_{\pm N}) \ge \mathbb{P}(Z_m=r,m<\tau_{\pm N}), (**)

as then we can apply the law of total probability and sum over values of T^x and m\ge 0.
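
Before proving (**), here is a quick dynamic-programming check for illustrative parameters (N=8, r=4, both my choice): we propagate the sub-probability mass of the walk killed at \pm N, and compare the mass at 0 with the mass at r at every time.

```python
# P(Z_m = x, m < tau_{+-N}) for SRW Z started from r/2, killed at +-N.
N, r = 8, 4
mass = {r // 2: 1.0}
for m in range(1, 60):
    step = {}
    for x, pr in mass.items():
        for y in (x - 1, x + 1):
            if abs(y) < N:                    # the walk is killed at +-N
                step[y] = step.get(y, 0.0) + 0.5 * pr
    mass = step
    assert mass.get(0, 0.0) >= mass.get(r, 0.0)   # this is (**)
print("(**) verified for all m < 60")
```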

To prove this result, we consider the following bijection between trajectories of length m from r/2 to \{0,r\}. We decompose each trajectory into excursions away from r/2, followed by a final meander from r/2 to \{0,r\} that stays on the same side of r/2. We construct the new trajectory by preserving all the initial excursions, but reversing all the steps of the final meander. So if the original trajectory ended up at 0, the image ends up at r. Trivially, the initial excursions in the image hit \pm N only if the excursions in the original trajectory did so too. But it’s also easy to see, by a similar argument to the coupling at the start of this section, that if the original trajectory ends at r and does not hit \pm N, then neither does the image (which ends at 0). However, the converse is not true. So we conclude (**), and thus

\mathbb{P}(S^x_m=0,\, m<\tau^x(S)) \ge \mathbb{P}(R^x_m=r^x,\, m<\tau^x(R))

for all m, by combining everything we have seen so far. The same argument applies to the y-coordinates, and so we can lift this to a statement about S_m itself, treating both coordinates at once; summing over m\ge 0 then gives G^{\bar V_N}(0,0)\ge G^{\bar V_N}(r,r), as required.

 

The remaining cases for r require a little more care over the definition of T^x, though the same projection argument works, for fundamentally the same reason. (Note that in the above argument, if S^x_m=-N and m<T^x, then in fact R^x_m\ge N+2, and so it’s not hard to convince yourself that a sensible adjustment to the stopping time will allow a corresponding result with R^x_m\ge N+1 in the odd r^x case.) The case for N odd is harder, since in one dimension there are two median sites, and it’s clear by symmetry that we can’t couple them such that RW from one always exits at least as early as RW from the other. However, the distributions of exit times started from these two sites are the same (by symmetry), and so although we can’t find a coupling, we can use similar stopping times to obtain a result in probability.

In the next post, we’ll see how to apply this uniform bound on G^{V_N}(x,x) to control the maximum of the DGFF on V_N. In particular, we address how the positive correlations of DGFF influence the behaviour of the maximum by comparison with independent Gaussians at each site.

Generating uniform trees

A long time ago, I wrote quite a few things about uniform trees. That is, a uniform choice from the n^{n-2} unrooted trees with vertex set [n]. This enumeration, normally called Cayley’s formula, has several elegant arguments, including the classical Prufer bijection. But making a uniform choice from a large set is awkward, and so we seek more probabilistic methods to sample such a tree, which might also give insight into the structure of a ‘typical’ uniform tree.

In another historic post, I talked about the Aldous-Broder algorithm. Here’s a quick summary. We run a random walk on the complete graph K_n started from a uniformly-chosen vertex. Every time we arrive at a vertex we haven’t visited before, we record the edge just traversed. Eventually we have visited all n vertices, so have recorded n-1 edges. It’s easy enough to convince yourself that these n-1 edges form a tree (how could there be a cycle?) and a bit more complicated to decide that the distribution of this tree is uniform.

It’s worth noting that this algorithm works to construct a uniform spanning tree on any connected base graph.
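
The algorithm is short enough to write out. Here is a minimal sketch (the function name aldous_broder and the dict-of-neighbour-lists graph format are my choices):

```python
import random

def aldous_broder(adj, root=None):
    """Sample a uniform spanning tree of a connected graph by random walk.

    adj: dict mapping each vertex to a list of its neighbours.
    Returns (root, edges): the start vertex and the n-1 recorded edges."""
    vertices = list(adj)
    current = root if root is not None else random.choice(vertices)
    root = current
    visited = {current}
    edges = set()
    while len(visited) < len(vertices):
        nxt = random.choice(adj[current])
        if nxt not in visited:            # first visit: record the entry edge
            visited.add(nxt)
            edges.add(frozenset((current, nxt)))
        current = nxt
    return root, edges

# Uniform labelled tree on [n]: run the walk on the complete graph K_n.
n = 8
K_n = {v: [w for w in range(1, n + 1) if w != v] for v in range(1, n + 1)}
root, edges = aldous_broder(K_n)
print(root, sorted(tuple(sorted(e)) for e in edges))
```

Running it on K_n gives a uniform labelled tree on [n]; on any other connected graph it samples a uniform spanning tree.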

This post is about a few alternative constructions and interpretations of the uniform random tree. The first construction uses a Galton-Watson process. We take a Galton-Watson process where the offspring distribution is Poisson(1), and condition that the total population size is n. The resulting random tree has a root but no labels; however, if we assign labels from [n] uniformly at random, the resulting rooted tree has the uniform distribution among rooted trees on [n].

Proof

This is all about moving from ordered trees to non-ordered trees. That is, when setting up a Galton-Watson tree, we distinguish between two trees with the same underlying shape but different orderings: for example, a root with two children, where in one tree the first child has three children of its own, and in the other it is the second child that has the three children.

That is, it matters which of the first-generation vertices have three children. Anyway, for such a (rooted) ordered tree T with n vertices, the probability that the Galton-Watson process ends up equal to T is

\mathbb{P}(GW = T) = \prod_{v\in T} \frac{e^{-1}}{C(v)!} = e^{-n} \prod_{v\in T}\frac{1}{C(v)!},

where C(v) is the number of children of a vertex v\in T. Then, since \mathbb{P}( |GW|=n ) is a function of n, we find

\mathbb{P}(GW=T \,\big|\, |GW|=n) = f(n)\prod_{v\in T} \frac{1}{C(v)!},

where f(n) is a function of n alone (ie depends on T only through its size n).

But given an unordered rooted tree t, labelled by [n], there are \prod_{v \in t} C(v)! ordered trees associated to t in the natural way. Furthermore, if we take the Poisson Galton-Watson tree conditioned to have total population size n, and label uniformly at random with [n], we obtain any one of these ordered trees with probability \frac{f(n)}{n!} \prod_{v\in t} \frac{1}{C(v)!}. So the probability that we have t after we forget about the ordering is \frac{f(n)}{n!}, which is a function of n alone, and so the distribution is uniform among the set of rooted unordered trees labelled by [n], exactly as required.
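
Here is a rejection-sampling sketch of this construction (all names are mine; the Poisson(1) sampler is the standard multiply-uniforms method). For n=3 we can check empirically that all 9=3^{3-1} rooted labelled trees appear with roughly equal frequency:

```python
import math
import random
from collections import Counter

def poisson1(rng):
    """Poisson(1): count uniforms until their product drops below 1/e."""
    k, p = 0, rng.random()
    while p >= math.exp(-1.0):
        k += 1
        p *= rng.random()
    return k

def conditioned_gw(n, rng):
    """Poisson(1) GW tree conditioned to have exactly n vertices (rejection).
    Returns children lists; vertices are 0..n-1 in breadth-first order."""
    while True:
        children, queue, nxt = {}, [0], 1
        while queue and nxt <= n:
            v = queue.pop(0)
            k = poisson1(rng)
            children[v] = list(range(nxt, nxt + k))
            queue.extend(children[v])
            nxt += k
        if not queue and nxt == n:
            return children

rng = random.Random(4)
counts = Counter()
for _ in range(20000):
    children = conditioned_gw(3, rng)
    labels = [1, 2, 3]
    rng.shuffle(labels)                 # uniform labels in [3]
    edges = frozenset(frozenset((labels[v], labels[c]))
                      for v in children for c in children[v])
    counts[(labels[0], edges)] += 1     # (root label, edge set)
print(len(counts), counts.most_common(3))  # 9 rooted trees, ~equal counts
```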

Heuristic for Poisson offspring distribution

In this proof, the fact that \mathbb{P}(C(v)=k)\propto \frac{1}{k!} exactly balances the number of orderings of the k children explains why Poisson(1) works out. Indeed, you can see in the proof that Poisson(c) works equally well, though when c\ne 1, the event we are conditioning on (namely that the total population size is n) has probability decaying exponentially in n, whereas for c=1, the branching process is critical, and the probability decays polynomially.

We can provide independent motivation though, from the Aldous-Broder construction. Both the conditioned Galton-Watson construction and the A-B algorithm supply the tree with a root, so we’ll keep that, and look at the distribution of the degree of the root as constructed by A-B. Let \rho=v_1,v_2,v_3,\ldots be the vertices of [n], ordered by their discovery during the construction. Then \rho is definitely connected by an edge to v_2, but thereafter it follows by an elementary check that the probability that \rho is connected to v_m is \frac{1}{n-1}, independently across all m. In other words, the distribution of the degree of \rho in the tree as constructed by A-B is

1+ \mathrm{Bin}\left(n-2,\frac{1}{n-1}\right) \approx 1+\mathrm{Poisson}(1).
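
Reusing aldous_broder from the sketch earlier, we can observe this degree distribution empirically (the parameters below are mine):

```python
from collections import Counter
import random

random.seed(0)
n, trials = 30, 4000
K_n = {v: [w for w in range(1, n + 1) if w != v] for v in range(1, n + 1)}
degrees = Counter()
for _ in range(trials):
    root, edges = aldous_broder(K_n)
    degrees[sum(root in e for e in edges)] += 1  # degree of the start vertex
print(sorted(degrees.items()))   # close to 1 + Poisson(1): heavy on 1 and 2
```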

Now, in the Galton-Watson process, conditioning the tree to have fixed, large size changes the offspring distribution of the root. Conveniently though, in a limiting sense it’s the same change as conditioning the tree to have size at least n. Since these events are monotone in n, it’s possible to take a limit of the conditioning events, and interpret the result as the Galton-Watson tree conditioned to survive. It’s a beautiful result that this interpretation can be formalised as a local limit. The limiting spine decomposition consists of an infinite spine, where the offspring distribution is a size-biased version of the original offspring distribution (and so in particular, always has at least one child) and where non-spine vertices have the original distribution.

In particular, the number of the offspring of the root is size-biased, and it is well-known and not hard to check that size-biasing Poisson(c) gives 1+Poisson(c) ! So in fact we have, in an appropriate limiting sense in both objects, a match between the degree distribution of the root in the uniform tree, and in the conditioned Galton-Watson tree.
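For completeness, here is the one-line check: size-biasing p_k=e^{-c}\frac{c^k}{k!} (which has mean c) gives

\hat{p}_k = \frac{k\, p_k}{c} = e^{-c}\frac{c^{k-1}}{(k-1)!},\quad k\ge 1,

which is exactly the mass function of 1+\mathrm{Poisson}(c).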

This isn’t supposed to justify why a conditioned Galton-Watson tree is relevant a priori (especially the unconditional independence of degrees), but it does explain why Poisson offspring distributions are relevant.

Construction via G(N,p) and the random cluster model

The main reason uniform trees were important to my thesis was their appearance in the Erdos-Renyi random graph G(N,p). The probability that vertices {1, …, n} form a tree component in G(N,p) with some particular structure is

p^{n-1} (1-p)^{\binom{n}{2}-(n-1)} \times (1-p)^{n(N-n)}.

Here, the first two terms give the probability that the graph structure on {1, …, n} is correct, and the final term gives the probability of the (independent) event that these vertices are not connected to anything else in the graph. In particular, this has no dependence on the tree structure chosen on [n] (for example, whether it should be a path or a star – both examples of trees). So the conditional distribution is uniform among all trees.
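
As a sanity check, here is a small Monte Carlo sketch (the parameters N=7, p=0.3, n=3 are mine): we condition on \{1,2,3\} forming a tree component of G(N,p) and check that the three possible paths on these vertices occur with roughly equal frequency.

```python
import itertools
import random
from collections import Counter

random.seed(3)
N, p, target = 7, 0.3, {1, 2, 3}
counts = Counter()
for _ in range(200000):
    edges = [e for e in itertools.combinations(range(1, N + 1), 2)
             if random.random() < p]
    inside = [e for e in edges if set(e) <= target]
    crossing = any(len(set(e) & target) == 1 for e in edges)
    # {1,2,3} is a tree component iff it has 2 internal edges and no edge
    # leaves it; any 2 distinct edges on 3 vertices form a path.
    if len(inside) == 2 and not crossing:
        counts[frozenset(inside)] += 1
print(counts.most_common())   # the three paths on {1,2,3}, ~equal counts
```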

If we work in some limiting regime, where pn\rightarrow 0 (for example if n is fixed and p=\frac{1}{N}\rightarrow 0), then we can get away asymptotically with less strong conditioning. Suppose we condition instead just that [n] form a component. Now, there are more ways to form a connected graph with one cycle on [n] than there are trees on [n], but the former all require an extra edge, and so the probability that a given one such tree-with-extra-edge appears as the restriction to [n] in G(N,p) is asymptotically negligible compared to the probability that the restriction to [n] of G(N,p) is a tree. Naturally, the local limit of components in G(N,c/N) is a Poisson(c) Galton-Watson branching process, and so this is all consistent with the original construction.

One slightly unsatisfying aspect of this construction is that we have to embed the tree on [n] within a much larger graph on [N] to see uniform trees. We can’t choose a scaling p=p(n) such that G(n,p) itself concentrates on trees. To guarantee connectivity with high probability, we need to take p> \frac{\log n}{n}, but at this threshold, the graph already has (many) cycles with high probability.

At this PIMS summer school in Vancouver, one of the courses is focusing on lattice spin models, including the random cluster model, which we now briefly define. We start with some underlying graph G. From a physical motivation, we might take G to be \mathbb{Z}^d or some finite subset of it, or a d-ary tree, or the complete graph K_N. As in classical bond percolation (note G(N,p) is bond percolation on K_N), a random subset of the edges of G is included, or declared open. The probability of a given configuration w with e open edges is proportional to

p^e (1-p)^{|E(G)| - e} q^{k(w)}, (*)

where the edge-weight p\in(0,1) as usual, and cluster weight q\in (0,\infty), and k(w) counts the number of connected components in configuration w. When q=1, we recover classical bond percolation (including G(N,p) ), while for q>1, this cluster-reweighting favours having more components, and q<1 favours fewer components. Note that in the case q\ne 1, the normalising constant (or partition function) of (*) is generally intractable to calculate explicitly.

As in the Erdos-Renyi graph, consider fixing the underlying graph G, and taking p\rightarrow 0, but also taking \frac{q}{p}\rightarrow 0. So the resulting graph asymptotically ‘wants to have as few edges as possible, but really wants to have as few components as possible’. In particular: 1) all spanning trees of G are equally likely; 2) any configuration with more than one component has asymptotically negligible probability relative to any tree; 3) any configuration with a cycle has #components + #edges strictly greater than that of a spanning tree, and so also has asymptotically negligible probability relative to any tree.

In other words, the limit of the distribution is the uniform spanning tree of G, and so this (like Aldous-Broder) is a substantial generalisation, which constructs the uniform random tree in the special case where G=K_n.
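
We can see this limit concretely on a small example. The following sketch (names and parameters mine) computes the exact random cluster weights on K_4 with p=10^{-4} and q=10^{-8}, so that both p and q/p are small, and checks that nearly all of the mass sits, uniformly, on the 16 spanning trees.

```python
import itertools

V = range(4)
E = list(itertools.combinations(V, 2))   # the six edges of K_4

def num_components(cfg):
    """Number of connected components of (V, cfg), via union-find."""
    parent = list(V)
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for a, b in cfg:
        parent[find(a)] = find(b)
    return len({find(v) for v in V})

p, q = 1e-4, 1e-8                        # p -> 0 and q/p -> 0
weight = {cfg: p**len(cfg) * (1 - p)**(len(E) - len(cfg))
               * q**num_components(cfg)
          for r in range(len(E) + 1) for cfg in itertools.combinations(E, r)}
Z = sum(weight.values())
trees = [cfg for cfg in weight
         if len(cfg) == 3 and num_components(cfg) == 1]
print(len(trees), sum(weight[t] for t in trees) / Z)  # 16 trees, mass ~ 1
```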