EGMO 2019

Last week, we held our annual IMO training and selection camp in the lovely surroundings of Trinity College, Cambridge. Four of our students have subsequently spent this week in Kiev, for the ninth edition of the prestigious European Girls’ Mathematical Olympiad.

The UK team, none of whom had attended the competition before, and all of whom remain eligible to return at least once, did extremely well, placing fourth out of the official European countries, and earning three silver medals, and one gold medal, with complete or almost-complete solutions to all six problems between them. More details are available on the competition website.

In this post, I’m going to discuss some of the non-geometry problems. As a special treat, the discussion of Question 6 is led by Joe Benton, which is only fitting, since he wrote it. You can find the first day’s three problems here, and the second day’s problems here. The geometry problems are treated in a separate post.

Problem One

Triple equalities are hard to handle directly, so generally one starts with a single equality, for example the left hand one $a^2b+c = b^2c+a$, after noting that the setup is cyclic (but not symmetric) in the variables, under cycling $a\to b \to c\to a$.

Some quick notes:

• The given equations are inhomogeneous, meaning that not every term has the same degree in terms of {a,b,c}. In complicated equations, normally we want to cancel terms, and we certainly can’t cancel terms with different degrees! So a good first idea is to find a way to make the equations homogeneous, for example using the condition, which is itself inhomogeneous. For example, one could replace $c$ with $c(ab+bc+ca)$, and it’s clear this might lead to some cancellation. In this case, the algebra is more straightforward if one rearranges and factorises first to obtain $a(1-ab)=c(1-bc)$, from which the condition gives $a(bc+ca)=c(ab+ac)$.
• Shortly after this, we find $a^2c=ac^2$, which means $a=c$ unless at least one of these terms is equal to zero.
• This case distinction should not be brushed over lightly as a tiny detail. This is the sort of thing that can lose you significant marks, especially if the omitted case turns out to be more involved than the standard case.
• This is a good example of planning your final write-up. The case distinction $\{a,c\ne 0\}, \{a\text{ or }c=0\}$ is really clumsy. What about b? We have already said that the setup is cyclic in (a,b,c), and it’s annoying to throw this aspect away. It really would be much much to have the overall case distinction at the start of your argument, for example into $\{a,b,c \ne 0\}$ and $\{a\text{ or }b\text{ or }c=0\}$, because then in the latter case you could assume that $a=0$ without loss of generality, then see what that implied for the other variables.
• In this setting, one really should check that the claimed solutions satisfy both the condition and the triple-equality. This is a complicated family of simultaneous equations, and even if you’re lucky enough to have settled on an argument that is reversible, it’s much easier to demonstrate that everything is satisfied as desired than actually to argue that the argument is reversible.

Problem Two

Maybe I speak for a number of people when I say I’m a little tired of domino tiling problems, so will leave this one out.

Problem Five

I liked this problem. A lot. It did feel like there were potential time-wasting bear traps one might slip into, and perhaps these comments are relevant:

• This feels like quite a natural problem to ask. In particular, the task specified by (A)+(B) is much more elegant than the RHS of (C). So unless you have immediate insight for the RHS of (C), it makes sense to start with the task, and aim for any bound similar to (C).
• There are a lot of transformations one could make, for example, repeatedly removing n from one of the $a_n$s, or re-ordering them conveniently somehow, which shouldn’t affect any solution. You can also attack the RHS of (C) in a number of ways (perhaps with a case split for n odd or even?). As with involved but easy angle chases discussed in the companion post, it’s probably better to hold this in mind than instantly to do all of these, since it will just obscure the argument if you end up not using the result of the simplification!
• One should always try small examples. Here, it’s worth trying examples large enough to get a sense of what’s happening, because I think the crucial observation is that (unless you were very very unlucky) there are lots of suitable sequences $B=(b_i)$. (*)
• Certainly the case where the $(a_i)$s themselves form a complete n-residue system is ‘easiest’ and certainly straightforward. One might say that the case where the $(a_i)$ are equal (or at least congruent) is ‘hardest’ (in that $(b_i-a_i)$ might have to take the largest values in some sense), but is also straightforward. There’s certainly the option of trying to rigorise this, but I don’t know whether this can be made to work.

Lecture 8 – Bounds in the critical window

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

Preliminary – positive correlation, Harris inequality

I wrote about independence, association, and the FKG property a long time ago, while I was still an undergraduate taking a first course on Percolation in Cambridge. That post is here. In the lecture I discussed the special case of the FKG inequality applied in the setting of product measure setting, of which the Erdos-Renyi random graph is an example, and which is sometimes referred to as the Harris inequality.

Given two increasing events A and B, say for graphs on [n], then if $\mathbb{P}$ is product measure on the edge set, we have

$\mathbb{P}(A\cap B)\ge \mathbb{P}(A)\mathbb{P}(B).$

Intuitively, since both A and B are ‘positively-correlated’ with the not-rigorous notion of ‘having more edges’, then are genuinely positively-correlated with each other. We will use this later in the post, in the form $\mathbb{E}[X|A]\ge \mathbb{E}[X]$, whenever X is an increasing RV and A is an increasing event.

The critical window

During the course, we’ve discussed separately the key qualitative features of the random graph $G(n,\frac{\lambda}{n})$ in the

• subcritical regime when $\lambda<1$, for which we showed that all the components are small, in the sense that $\frac{1}{n}|L_1| \stackrel{\mathbb{P}}\rightarrow 0$, although the same argument would also give $|L_1|\le K\log n$ with high probability if we used stronger Chernoff bounds;
• supercritical regime when $\lambda>1$, for which there is a unique giant component , ie that $\frac{1}{n}|L_1|\stackrel{\mathbb{P}}\rightarrow \zeta_\lambda>0$, the survival probability of a Galton-Watson branching process with Poisson($\lambda$) offspring distribution. Arguing for example by a duality argument shows that with high probability all other components are small in the same sense as in the subcritical regime.

In between, of course we should study $G(n,\frac{1}{n})$, for which it was known that $L_1\stackrel{d}\sim n^{2/3},\, L_2\stackrel{d}\sim n^{2/3},\ldots$. (*) That is, the largest components are on the scale $n^{2/3}$, and there are lots of such critical components.

In the early work on random graphs, the story ended roughly there. But in the 80s, these questions were revived, and considerable work by Bollobas and Luczak, among many others, started investigating the critical setting in more detail. In particular, between the subcritical and the supercritical regimes, the ratio $\frac{|L_2|}{|L_1|}$ between the sizes of the largest and second-largest components goes from ‘concentrated on 1’ to ‘concentrated on 0’. So it is reasonable to ask what finer scaling of the edge probability $p(n)$ around $\frac{1}{n}$ should be chosen to see this transition happen.

Critical window

In this lecture, we studied the critical window, describing sequences of probabilities of the form

$p(n)=\frac{1+\lambda n^{-1/3}}{n},$

where $\lambda\in(-\infty,+\infty)$. (Obviously, this is a different use of $\lambda$ to previous lectures.)

It turns out that as we move $\lambda$ from $-\infty$ to $+\infty$, this window gives exactly the right scaling to see the transition of $\frac{|L_2|}{|L_1|}$ described above. Work by Bollobas and Luczak and many co-authors and others in the 80s establish a large number of results in this window, but for the purposes of this course, this can be summarised as saying that the critical window has the same scaling behaviour as $p(n)=1/n$, with a large number of components on the scale $\sim n^{2/3}$ (see (*) earlier), but different scaling limits.

Note: Earlier in the course, we have discussed local limits, in particular for $G(n,\lambda/n)$, where the local limit is a Galton-Watson branching process tree with offspring distribution $\mathrm{Poisson}(\lambda)$. Such local properties are not sufficient to distinguish between different probabilities within the critical window. Although there are lots of critical components, it remains the case that asymptotically almost all vertices are in ‘small components’.

The precise form of the scaling limit for

$\frac{1}{n^{2/3}} \left( |L_1|, |L_2|, |L_3|,\ldots \right)$

as $n\rightarrow\infty$ was shown by Aldous in 1997, by lifting a scaling limit result for the exploration process, which was discussed in this previous lecture and this one too. Since Brownian motion lies outside the assumed background for this course, we can’t discuss that, so this lecture establishes upper bounds on the correct scale of $|L_1|$ in the critical window. Continue reading

BMO2 2019

The second round of the British Mathematical Olympiad was taken on Thursday by the 100 or so top scoring eligible participants from the first round, as well as some open entries. Qualifying for BMO2 is worth celebrating in its own right. The goal of the setters is to find the sweet spot of difficult but stimulating for the eligible participants, which ultimately means it’s likely to be the most challenging exam many of the candidates sit while in high school, at least in mathematics.

I know that lots of students view BMO2 as something actively worth preparing for. As with everything, this is a good attitude in moderation. Part of the goal in writing about the questions at such length is because I think at this level it’s particularly easy to devote more time than needed to preparation, and use it poorly. This year time is tight at the end of semester, and so what follows is closer to a set of complete solutions than usual, for which apologies, although I hope it is still possible to get a sense of how one might have come across the solutions yourself. Of course, this means that what follows will certainly spoil the problems for anyone who hasn’t tried them by themselves already.

The copyright for the problems is held by BMOS, and reproduced here with permission.

Question 1

As if often the case in geometry questions, what you’ve been asked to prove here isn’t the most natural property of the configuration. A good first step would be to see if there are stronger statements which are true.You are asked to show that triangle BPE is isosceles, but you aren’t told which of the three vertices is the apex. In fact, the task is to show that $BP=EP$ or, alternatively, $\angle BEP=\angle PBE$. It’s not in general true that BE is equal to BP=EP. Unless you’re very unlucky, you can establish this from one diagram.

Now, you don’t immediately know whether it’s going to be easier to show that two lengths are equal, or that two angles are equal. However, you know that P lies on the perpendicular bisector of BC, hence BP=CP, which is a big clue. In particular, this means that P would be the centre of the circle through BCE. This clearly implies the given result, so deciding to prove this instead is a good strategy.

There are now a number of ways to prove this. Note that D lies on the altitude from A, and the feet of the perpendiculars from D to sides AB and AC are both present in the configuration so (just as for the orthocentre diagram) we can calculate most of the angles involving {A,B,C,D,E}.

For example, ABDE is cyclic, so $\angle BED=\angle BAD = 90-\hat{B}$, hence $\angle AEB=\hat{B},\,\angle EBA=\hat{C}$. This shows that AB is tangent to the circumcircle of BCE. But then the line L is a radius of this circle, and so its centre must be P, the unique point on L which is equidistant from B and C.

Alternatively, we could directly calculate $\angle BEC=180-\hat{B}$ and $\angle CBP=90-\hat{B}$. But BPC is isosceles so $\angle BPC=2\hat{B}$. In general, the converse of ‘angle at centre is twice angle at circumference’ does not hold, but when we know P is equidistant from B and C this does hold, and so the angle relations precisely confirm that P is the centre of the circle through BPE.

My intention had been that the triangle would be acute-angled, to reduce the number of diagram options based on the magnitude of $\hat{B}$. If pursuing this second approach, one would need to be careful to account for whether P is on the same side or the opposite side of BC to E. That said, unless you do something very exotic, it should be exactly the same argument or calculation, and such a case distinction probably isn’t very important.

Question 2

First, a short remark. As stated, if n=5, a piece could move 3 squares to the left then 4 squares up, by Pythagoras. Handling all such options is likely to be quite annoying, since some values of n can be written in this Pythagorean form, and others cannot. This brings us to some good general principles for olympiad problems which look like this one:

• A construction, when one exists, will probably be possible using simple versions of the allowed moves / structures.
• An argument why a construction is impossible should probably be based on ideas which treat the simple moves similarly to the more complicated moves.

The setup of the problem encourages you to think about dividing the board into $n^2$ sub-boards, each with dimensions $n\times n$. Continue reading

Lecture 10 – the configuration model

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

As we enter the final stages of the semester, I want to discuss some extensions to the standard Erdos-Renyi random graph which has been the focus of most of the course so far. Although we will not get far into the details during this course, the overall goal is to develop models which are close to Erdos-Renyi in terms of ease of analysis, while also allowing more of the features characteristic of networks observed in the real world.

One of the more obvious deficiencies of the sparse regime of Erdos-Renyi random graphs for modelling ‘real-world phenomena’ concerns the degree sequence. Indeed, the empirical degree distribution of G(n,c/n) converges to Poisson(c). By contrast, in real-world networks, a much wider range of degrees is typically observed, and in many cases it is felt that these should follow a power law, with a small number of a very highly connected agents.

One way around this problem to construct random graphs where we insist that the graph has a given sequence of degrees. The configuration model, which is the subject of this lecture and this post (and about which I’ve written before), offers one way to achieve this.

Definition and notes

Let $n\ge 1$ and let $d=(d_1,d_2,\ldots,d_n)$ be a sequence of non-negative integers such that $\sum_{i=1}^n d_i$ is even. Then the configuration model with degree sequence d is a random multigraph with vertex set [n], constructed as follows:

• To each vertex $i\in[n]$, assign $d_i$ half-edges;
• Then, take a uniform matching of these half-edges;
• Finally, for each pair of half-edges in the matching, replace the two half-edges with a genuine edge, to obtain the multigraph $CM_n(d)$, in which, by construction, vertex i has degree $d_i$.

One should note immediately that although the matching is uniform, the multigraph is not uniform amongst multigraphs with that degree sequence. Note also that the condition on the sums of the degrees is necessary for any graph, and in this context means that the number of half-edges is even, without which it would not be possible to construct a matching.

This effect is manifest in the simplest possible example, when n=2 and d=(3,3). There are two possible graphs, up to isomorphism, which are shown below:

For obvious reasons, we might refer to these as the handcuffs and the theta , respectively. It’s helpful if we, temporarily, assume the half-edges are distinguishable at the moment we join them up in the configuration model construction. Because then there are 3×3=9 ways to join them up to form the handcuffs (think of which half-edge ends up forming the edge between the two vertices) while there are 3!=6 ways to pair up the half-edges in the theta.

In general, for multigraphs H with the correct degree sequence, we have

$\mathbb{P}( CM_n(d)\simeq H) \propto \left( 2^{\# \text{loops}(H)} \prod_{e\in E(H)} \text{mult}(e)! \right),$

where $\text{mult}(e)$ is the multiplicity with which a given edge e appears in H.

Note: it might seem counterintuitive that this procedure is biased against multiple edges and self-loops, but it is really just saying that there are more ways to form two distinct edges than to form two equal edges (ie a multiedge pair) when we view the half-edges as distinguishable. (See this post for further discussion of this aspect in the 3-regular setting.)

However, a consequence of this result is that if we condition on the event that $CM_n(d)$ is simple, then the resulting random graph is uniform on the set of simple graphs satisfying the degree property. Note that the same example as above shows that there’s no guarantee that there exists a simple graph whose degrees are some given sequence.

d-regular configuration model

In general, from a modelling point of view, we are particularly interested in simple, connected graphs, and so it is valuable to study whether the large examples of the configuration model are likely to have these properties. In this lecture, I will mainly focus on the case where the multigraphs are d-regular, meaning that all the vertices have degree equal to d. For the purposes of this lecture, we denote by $G^d(n)$, the d-regular configuration model $CM_n(d,\ldots,d)$.

• d=1: to satisfy the parity condition on the sums of degrees, we must have n even. But then $G^1(n)$ will consist of n/2 disjoint edges.
• d=2: $G^2(n)$ will consist of some number of disjoint cycles, and it is a straightforward calculation to check that when n is large, with high probability the graph will be disconnected.

In particular, I will focus on the case when d=3, which is the first interesting case. Most of the results we prove here can be generalised (under various conditions) to more general examples of the configuration model. The main goal of the lecture is revision of some techniques of the course, plus one new one, in a fresh setting, and the strongest possible versions of many of these results can be found amongst the references listed at the end.

Connectedness

In the lecture, we showed that $G^3(2n)$ is connected with high probability. This is, in fact, a very weak result, since in fact $G^d(n)$ is d-connected with high probability for $d\ge 3$ [Bol81, Wor81]. Here, d-connected means that one must remove at least d vertices in order to disconnect the graph, or, equivalently, that there are d disjoint paths between any pair of vertices. Furthermore, Bollobas shows that for $d\ge 3$, $G^d(n)$ is a (random) expander family [Bol88].

Anyway, for the purposes of this course, the main tool is direct enumeration. The matching number $M_{2k}$ satisfies

$M_{2k}=(2k-1)\times (2k-3)\times\ldots\times 3\times 1 = \frac{(2k)!}{2^k \cdot k!},$

and so Stirling’s approximation gives the asymptotics

$M_{2k} = (\sqrt{2}+o(1)) \left(\frac{2}{e}\right)^k k^k,$

although it will be useful to use the true bounds

$c \left(\frac{2}{e}\right)^k k^k \le M_{2k}\le C\left(\frac{2}{e}\right)^k k^k,\quad \forall k,$

instead in some places. Anyway, in $G^3(2n)$, there are 6n half-edges in total, and so the probability that the graph may be split into two parts consisting of $2\ell,2m$ vertices, with $2\ell+2m=2n$, and with no edges between the classes is $\frac{\binom{2n}{2\ell} M_{6\ell}M_{6m}}{M_{6n}}.$ Continue reading

Lecture 9 – Inhomogeneous random graphs

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

As we enter the final stages of the semester, I want to discuss some extensions to the standard Erdos-Renyi random graph which has been the focus of most of the course so far. In doing so, we can revisit material that we have already covered, and discover how easily one can extend this directly to more exotic settings.

The focus of this lecture was the model of inhomogeneous random graphs (IRGs) introduced by Soderberg [Sod02] and first studied rigorously by Bollobas, Janson and Riordan [BJR07]. Soderberg and this blog post address the case where vertices have a type drawn from a finite set. BJR address the setting with more general typespaces, in particular a continuum of types. This generalisation is essential if one wants to use IRGs to model effects more sophisticated than those of the classical Erdos-Renyi model G(n,c/n), but most of the methodology is present in the finite-type setting, and avoids the operator theory language which is perhaps intimidating for a first-time reader.

Inhomogeneous random graphs

Throughout, $k\ge 2$ is fixed. A graph with k types is a graph G=(V,E) together with a type function $V\to \{1,\ldots,k\}$. We will refer to a $k\times k$ symmetric matrix with non-negative entries as a kernel.

Given $n\in\mathbb{N}$ and a vector $p=(p_1,\ldots,p_k)\in\mathbb{N}_0^k$ satisfying $\sum p_i=n$, and $\kappa$ a kernel, we define the inhomogeneous random graph $G^n(p,\kappa)$ with k types as:

• the vertex set is [n],
• types are assigned uniformly at random to the vertices such that exactly $p_i$ vertices have type i.
• Conditional on these types, each edge $v\leftrightarrow w$ (for $v\ne w\in [n]$) is present, independently, with probability

$1 - \exp\left(-\frac{\kappa_{\mathrm{type}(v),\mathrm{type}(w)} }{n} \right).$

Notes on the definition:

• Alternatively, we could assign the types so that vertices $\{1,\ldots,p_1\}$ have type 1, $\{p_1+1,\ldots,p_1+p_2\}$ have type 2, etc etc. This makes no difference except in terms of the notation we have to use if we want to use exchangeability arguments later.
• An alternative model considers some distribution $\pi$ on [k], and assigns the types of the vertices of [n] in an IID fashion according to $\pi$. Essentially all the same results hold for these two models. (For example, this model with ‘random types’ can be studied by quenching the number of each type!) Often one works with whichever model seems easier for a given proof.
• Note that the edge probability given is $\approx \frac{\kappa_{\mathrm{type}(v),\mathrm{type}(w)}}{n}$. The exponential form has a more natural interpretation if we ever need to turn the IRGs into a process. Additionally, it avoids the requirement to treat small values of n (for which, a priori, $k/n$ might be greater than 1) separately.

In the above example, one can see that, roughly speaking, red vertices are more likely to be connected to each other than blue vertices. However, for both colours, they are more likely to be connected to a given vertex of the same colour than a vertex of the opposite colour. This might, for example, correspond to the kernel $\begin{pmatrix}3&1\\1&2\end{pmatrix}$.

The definition given above corresponds to a sparse setting, where the typical vertex degrees are $\Theta(1)$. Obviously, one can set up an inhomogeneous random graph in a dense regime by an identical argument.

From an applications point of view, it’s not hard to imagine that an IRG of some flavour might be a good model for many phenomena observed in reality, especially when a mean-field assumption is somewhat appropriate. The friendships of boys and girls in primary school seems a particularly resonant example, though doubtless there are many others.

One particular application is to recover the types of the vertices from the topology of the graph. That is, if you see the above picture without the colours, can you work out which vertices are red, and which are blue? (Assuming you know the kernel.) This is clearly impossible to do with anything like certainty in the sparse setting – how does one decide about isolated vertices, for example? The probabilities that a red vertex is isolated and that a blue vertex is isolated differ by a constant factor in the $n\rightarrow\infty$ limit. But in the dense setting, one can achieve this with high confidence. When studying such statistical questions, these IRGs are often referred to as stochastic block models, and the recent survey of Abbe [Abbe] gives a very rich history of this type of problem in this setting.

Poisson multitype branching processes

As in the case of the classical random graph G(n,c/n), we learn a lot about the IRG by studying its local structure. Let’s assume from now on that we are given a sequence of IRGs $G^n(p^n,\kappa)$ for which $\frac{p^n}{n}\rightarrow \pi$, where $\pi=(\pi_1,\ldots,\pi_k)\in[0,1]^k$ satisfies $||\pi||_1=1$.

Now, let $v^n$ be a uniformly-chosen vertex in [n]. Clearly $\mathrm{type}(v^n)\stackrel{d}\rightarrow \pi$, with the immediate mild notation abuse of viewing $\pi$ as a probability distribution on [k].

Then, conditional on $\mathrm{type}(v^n)=i$:

• when $j\ne i$, the number of type j neighbours of $v^n$ is distributed as $\mathrm{Bin}\left(p_j,1-\exp\left(-\frac{\kappa_{i,j}}{n}\right)\right)$.
• the number of type i neighbours of $v^n$ is distributed as $\mathrm{Bin}\left( p_i-1,1-\exp\left(-\frac{\kappa_{i,i}}{n}\right)\right)$.

Note that $p_j\left[1-\exp\left(-\frac{\kappa_{i,j}}{n}\right)\right]\approx \frac{p_j\cdot \kappa_{i,j}}{n} \approx \kappa_{i,j}\pi_j$, and similarly in the case j=i, so in both cases, the number of neighbours of type j is distributed approximately as $\mathrm{Poisson}(\kappa_{i,j}\pi_j)$.

This motivates the following definition of a branching process tree, whose vertices have k types. Continue reading

Lecture 7 – The giant component

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

As we edge into the second half of the course, we are now in a position to return to the question of the phase transition between the subcritical regime $\lambda<1$ and the supercritical regime $\lambda>1$ concerning the size of the largest component $L_1(G(n,\lambda/n))$.

In Lecture 3, we used the exploration process to give upper bounds on the size of this largest component in the subcritical regime. In particular, we showed that

$\frac{1}{n}\big| L_1(G(n,\lambda/n)) \big| \stackrel{\mathbb{P}}\rightarrow 0.$

If we used slightly stronger random walk concentration estimates (Chernoff bounds rather than 2nd-moment bounds from Chebyshev’s inequality), we could in fact have shown that with high probability the size of this largest component was at most some logarithmic function of n.

In this lecture, we turn to the supercritical regime. In the previous lecture, we defined various forms of weak local limit, and asserted (without attempting the notationally-involved combinatorial calculation) that the random graph $G(n,\lambda/n)$ converges locally weakly in probability to the Galton-Watson tree with $\text{Poisson}(\lambda)$ offspring distribution, as we’ve used informally earlier in the course.

Of course, when $\lambda>1$, this branching process has strictly positive survival probability $\zeta_\lambda>0$. At a heuristic level, we imagine that all vertices whose local neighbourhood is ‘infinite’ are in fact part of the same giant component, which should occupy $(\zeta_\lambda+o_{\mathbb{P}}(1))n$ vertices. In its most basic form, the result is

$\frac{1}{n}\big|L_1(G(n,\lambda/n))\big|\;\stackrel{\mathbb{P}}\longrightarrow\; \zeta_\lambda,\quad \frac{1}{n}\big|L_2(G(n,\lambda/n))\big| \;\stackrel{\mathbb{P}}\longrightarrow\; 0,$ (*)

where the second part is a uniqueness result for the giant component.

The usual heuristic for proving this result is that all ‘large’ components must in fact be joined. For example, if there are two giant components, with sizes $\approx \alpha n,\approx \beta n$, then each time we add a new edge (such an argument is often called ‘sprinkling‘), the probability that these two components are joined is $\approx 2ab$, and so if we add lots of edges (which happens as we move from edge probability $\lambda-\epsilon\mapsto \lambda$ ) then with high probability these two components get joined.

It is hard to make this argument rigorous, and the normal approach is to show that with high probability there are no components with sizes within a certain intermediate range (say between $\Theta(\log n)$ and $n^\alpha$) and then show that all larger components are the same by a joint exploration process or a technical sprinkling argument. Cf the books of Bollobas and of Janson, Luczak, Rucinski. See also this blog post (and the next page) for a readable online version of this argument.

I can’t find any version of the following argument, which takes the weak local convergence as an assumption, in the literature, but seems appropriate to this course. It is worth noting that, as we shall see, the method is not hugely robust to adjustments in case one is, for example, seeking stronger estimates on the giant component (eg a CLT).

Anyway, we proceed in three steps:

Step 1: First we show, using the local limit, that for any $\epsilon>0$,

$\frac{1}{n}\big|L_1(G(n,\lambda/n))\big| \le \zeta_\lambda+\epsilon,$ with high probability as $n\rightarrow\infty$.

Step 2: Using a lower bound on the exploration process, for $\epsilon>0$ small enough

$\frac{1}{n}\big|L_1(G(n,\lambda/n))\big| \ge \epsilon,$ with high probability.

Step 3: Motivated by duality, we count isolated vertices to show

$\mathbb{P}(\epsilon n\le |L_1| \le (\zeta_\lambda-\epsilon)n) \rightarrow 0.$

Step 1

This step is unsurprising. The local limit gives control on how many vertices are in small components of various sizes, and so gives control on how many vertices are in small components of all finite sizes (taking limits in the right order). This gives a bound on how many vertices can be in the giant component. Continue reading

Lecture 6 – Local limits

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

By this point of the course, we’ve studied several aspects of the Erdos-Renyi random graph, especially in the sparse setting $G(n,\frac{\lambda}{n})$. We’ve also taken a lengthy detour to revise Galton-Watson trees, with a particular focus on the case of Poisson offspring distribution.

This is deliberate. Note that a given vertex v of $G(n,\frac{\lambda}{n})$ has some number of neighbours distributed as $\mathrm{Bin}(n-1,\frac{\lambda}{n})\stackrel{d}\approx\mathrm{Po}(\lambda)$, and the same approximation remains valid as we explore the graph (for example in a breadth-first fashion) either until we have seen a large number of vertices, or unless some ultra-pathological event happens, such as a vertex having degree n/3.

In any case, we are motivated by the notion that the local structure of $G(n,\frac{\lambda}{n})$ is well-approximated by the Galton-Watson tree with $\mathrm{Po}(\lambda)$ offspring, and in this lecture and the next we try to make this notion precise, and discuss some consequences when we can show that this form of convergence occurs.

Deterministic graphs

Throughout, we will be interested in rooted graphs, since by definition we have to choose a root vertex whose local neighbourhood is to be studied. Usually, we will study a sequence of rooted graphs $(G_n,\rho_n)$, where the vertex set of $G_n$ is [n], or certainly increasing in n (as in the first example).

For some rooted graph $(G,\rho)$, we say such a sequence $(G_n,\rho_n)$ converges to $(G,\rho)$ locally if for all radii $r\ge 1$, we have $B_r^{G_n}(\rho_n)\simeq B_r^G(\rho)$. In words, the neighbourhood around $\rho_n$ in $G_n$ is the same up to radius r as the neighbourhood around $\rho$ in $G$, so long as n is large enough (for given r).

This is best illustrated by an example, such as $T_n$, the binary tree to depth n.

If we take $\rho_n$ to be the usual root, then the trees are nested, and converge locally to the infinite binary tree $T_\infty$. Slightly less obviously, if we take $\rho_n$ to be one of the leaves, then the trees are still nested (up to labelling – ie in the sense of isomorphisms of rooted trees), and converge locally to the canopy tree, defined by a copy of $\mathbb{Z}_{\ge 0}$ with nearest-neighbour edges, and where each vertex $n\ge 1$ is connected to the root of a disjoint copy of $T_{n-1}$, as shown below:

Things get more interesting when the root is chosen randomly, for example, uniformly at random, as this encodes more global information about the graphs $G_n$. In the case where the $G_n$ are vertex-transitive, then if we only care about rooted graphs up to isomorphism, then it doesn’t matter how we choose the root.

Otherwise, we say that $G_n$ converges in the local weak sense to $(G,\rho)$ if, for all $r\ge 1$ and for all rooted graphs $(H,\rho_H)$,

$\mathbb{P}\left( B^{G_n}_r(\rho_n)\simeq (H,\rho_H) \right) \longrightarrow \mathbb{P}\left( B_r^G(\rho)\simeq H\right),$

as $n\rightarrow\infty$.

Alternatively, one can phrase this as a result about convergence of rooted-graph-valued distributions.

A simple non-transitive example is $G_n\simeq P_n$, the path of length n. Then, the r-neighbourhood of a vertex is isomorphic to $P_{2r}$unless that vertex is within graph-distance (r-1) of one of the leaves of $G_n$. As $n\rightarrow\infty$, the proportion of such vertices vanishes, and so, $\mathbb{P}\left( B^{P_n}_r(\rho_n)\simeq P_{2r}\right)\rightarrow 1$, from which we conclude the unsurprising result that $P_{n}$ converges in the local weak sense to $\mathbb{Z}$. (Which is vertex-transitive, so it doesn’t matter where we select the root.)

The binary trees offer a slightly richer perspective. Let $\mathcal{L}_n$ be the set of leaves of $T_n$, and we claim that when $\rho_n$ is chosen uniformly from the vertices of $T_n$, then $d_{T_n}(\rho_n,\mathcal{L}_n)$ converges in distribution. Indeed, $\mathbb{P}\left( d_{T_n}(\rho_n,\mathcal{L}_n)=k\right) = \frac{2^{n-k}}{2^{n+1}-1}$, whenever $n\ge k$, and so the given distance converges in distribution to the Geometric distribution with parameter 1/2 supported on {0,1,2,…}.

This induces a random local weak limit, namely the canopy tree, rooted at one of the vertices we denoted by $\mathbb{Z}_{\ge 0}$, with the choice of this vertex given by Geometric(1/2). Continue reading

Lecture 4 – Hitting time theorem

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

This lecture consisted of revision of the most relevant theory of Galton-Watson trees, with a focus on the case where the offspring distribution is Poisson, since, as we have seen in previous lectures, this is a strong candidate to approximate the structure of $G(n,\lambda/n)$. It makes sense to cover the theory of the trees before attempting to make rigorous the sense of approximation.

Given a Galton-Watson tree T, it is natural to label the vertices in a breadth-first order as $\varnothing=v_1,v_2,\ldots,v_{|T|}$. This is easiest if we have constructed the Galton-Watson tree as a subset of the infinite Ulam-Harris tree, where vertices have labels like (3,5,17,4), whose parent is (3,5,17). If this child vertex is part of the tree, then so are (3,5,17,1), (3,5,17,2), and (3,5,17,3). This means our breadth-first order is canonically well-defined, as we have a natural ordering of the children of each parent vertex.

Note: one advantage of using breadth-first order rather than depth-first order (which corresponds to the usual dictionary, or lexicographic ordering of the labels) is that if the tree is infinite, we don’t explore all of it during a depth-first search. (In the sense that there exist vertices which are never given a finite label.) For breadth-first search, a similar problem arises precisely when some vertex has infinitely many children. For a conventional Galton-Watson tree, the latter situation is much less of a problem than the infinite total population problem, which happens with positive probability whenever $\mu=\mathbb{E}[X]>1$.

Anyway, given the depth-first order, one can consider an exploration process $S_0,S_1,S_2,\ldots,S_{|T|}$ given by

$S_0=1,\quad S_i=S_{i-1}+(X_i-1),$

where $X_i$ is the number of children of $v_i$. In this way, we see that

$S_i=\big| \Gamma(v_1)\cup\ldots\cup\Gamma(v_i)\backslash \{v_1,\ldots,v_i\}\big|,\quad i\ge 1,$

records the number of vertices in some stack containing those which we have ‘seen but not explored’. Some authors prefer to start from 0, in which case one ends up with a similar but slightly different interpretation of the ‘stack’, but that’s fine since we aren’t going to define formally what ‘seen’ and ‘explored’ means in this post.

Essentially, we exhaust the vertices of the tree whenever $S_t=0$, and so the condition that $|T|=n$ requires

$S_n=0,\quad S_m\ge 1,\; m=0,1,\ldots,n-1.$

Conveniently, so long as we have avoiding ordering ambiguity, for example by insisting that trees live within the Ulam-Harris tree, we can reconstruct T uniquely from $(S_0,S_1,\ldots,S_{|T|})$.

Furthermore, if T is a Galton-Watson process, then the numbers of children $X_i$ are IID, and so in fact this exploration process is a random walk, and the size of the tree can be recovered as the hitting time of zero.

Note: making fully rigorous the argument that children in the GW tree are independent of the breadth-first walk fully rigorous is somewhat technical, and not to be dismissed lightly, though not of principle interest at the level of this topics course. See Proposition 1.5 in Section 1.2 of Le Gall’s notes or Section 1.2.2 of my doctoral thesis for further discussion and argument.

The hitting time theorem allows us to study the distribution of the hitting time of a random walk whose increments are bounded below by -1, in terms of the distribution of the value of the random walk.

Theorem: Let $(S_n,\, n\ge 0)$ be a random walk with $S_0=0$ and IID increments $(X_n,n\ge 1)$ satisfying $\mathbb{P}(X_n\ge -1)=1$. Let $H_{-k}=\inf \left\{n\,:\, S_n=-k\right\}$ be the hitting time of -k.

Then $\mathbb{P}\big( H_{-k}=n\big) = \frac{k}{n}\mathbb{P}\big(S_n=-k)$.

Commentary: there are local central limit theorem estimates and large deviation estimates that allow good control of the probability on the RHS for a rich class of contexts. So at a meta-level, the hitting time theorem allows us to reduce a complicated (though still classical) problem, to a real classical problem, which is particularly helpful when the LHS is a device for capturing relevant information about our random tree model.

BMO1 2018

The first round of the British Mathematical Olympiad was sat yesterday. The paper can be found here, and video solutions here. Copyright for the questions is held by BMOS. They are reproduced here with permission.

I hope any students who sat the paper enjoyed at least some of the questions, and found it challenging! The following commentaries on the problems are not official solutions, and are not always full solutions at all, but contain significant steps of solutions, so would be best saved until after you have attempted the problems, if you are planning to do so. I’ve written quite a lot about Q5 because I found it hard (or at least time-consuming) and somewhat atypical, and I’ve written a lot about Q6 because there was a lot to say. I hope at least some of this is interesting to some readers of all levels of olympiad experience.

Question 1

A list of five two-digit positive integers is written in increasing order on a blackboard. Each of the five integers is a multiple of 3, and each digit {0,1,…,9} appears exactly once on the blackboard. In how many ways can this be done? (Note that a two-digit number cannot begin with zero.)

It’s a trope of BMO1 that the first question must be doable by some sort of exhaustive calculation or listing exercise. Of course, that is rarely the most efficient solution.

However, there is normally a trade-off between eliminating all listing, and reducing to a manageable task.

The key observation here is that writing the integers in increasing order is really just a way to indicate that order of the choices doesn’t matter. Even if that seems counter-intuitive. The question wants to know how many ways to choose these five numbers. The order of choice doesn’t matter since we’re going to put them in ascending order on the blackboard anyway.

You want to make your choices with as much independence as possible. So it would, for example, be a bad idea to choose the smallest number first. How many possibilities are there where the smallest number is 24? What about 42? What about 69? These are all different, and some are zero, so will make the computation very taxing.

However, you might notice that the digits {0,3,6,9} have to go together to form two numbers, and the rest have to pair up with one digit from {1,4,7} and one from {2,5,8}. You might know that an integer is divisible by 3 precisely if its digit sum is divisible by 3, but in this context you wouldn’t lose too much time by simply listing everything! These tasks are now completely separate, so you can take the number of ways to pair up {0,3,6,9} and multiply by the number of ways to pair up {1,4,7} and {2,5,8}. You need to take care over the ordering. It does (obviously) matter which is the first digit and which is the second digit in a number!

Lecture 3 – Couplings, comparing distributions

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

In this third lecture, we made our first foray into the scaling regime for G(n,p) which will be the main focus of the course, namely the sparse regime when $p=\frac{\lambda}{n}$. The goal for today was to give a self-contained proof of the result that in the subcritical setting $\lambda<1$, there is no giant component, that is, a component supported on a positive proportion of the vertices, with high probability as $n\rightarrow\infty$.

More formally, we showed that the proportion of vertices contained within the largest component of $G(n,\frac{\lambda}{n})$ vanishes in probability:

$\frac{1}{n} \left| L_1\left(G\left(n,\frac{\lambda}{n}\right)\right) \right| \stackrel{\mathbb{P}}\longrightarrow 0.$

The argument for this result involves an exploration process of a component of the graph. This notion will be developed more formally in future lectures, aiming for good approximation rather than bounding arguments.

But for now, the key observation is that when we ‘explore’ the component of a uniformly chosen vertex $v\in[n]$ outwards from v, at all times the number of ‘children’ of v which haven’t already been considered is ‘at most’ $\mathrm{Bin}(n-1,\frac{\lambda}{n})$. Since, for example, if we already know that eleven vertices, including the current one w are in C(v), then the distribution of the number of new vertices to be added to consideration because they are directly connected to w has conditional distribution $\mathrm{Bin}(n-11,\frac{\lambda}{n})$.

Firstly, we want to formalise the notion that this is ‘less than’ $\mathrm{Bin}(n,\frac{\lambda}{n})$, and also that, so long as we don’t replace 11 by a linear function of n, that $\mathrm{Bin}(n-11,\frac{\lambda}{n})\stackrel{d}\approx \mathrm{Poisson}(\lambda)$.

Couplings to compare distributions

coupling of two random variables (or distributions) X and Y is a realisation $(\hat X,\hat Y)$ on the same probability space with correct marginals, that is

$\hat X\stackrel{d}=X,\quad \hat Y\stackrel{d}=Y.$

We saw earlier in the course that we could couple G(n,p) and G(n,q) by simulating both from the same family of uniform random variables, and it’s helpful to think of this in general: ‘constructing the distributions from the same source of randomness’.

Couplings are a useful notion to digest at this point, as they embody a general trend in discrete probability theory. Wherever possible, we try to do as we can with the random objects, before starting any calculations. Think about the connectivity property of G(n,p) as discussed in the previous lecture. This can be expressed directly as a function of p in terms of a large sum, but showing it is an increasing function of p is essentially impossible by computation, whereas this is very straightforward using the coupling.

We will now review how to use couplings to compare distributions.

For a real-valued random variable X, with distribution function $F_X$, we always have the option to couple with a uniform U(0,1) random variable. That is, when $U\sim U[0,1]$, we have $(F_X^{-1}(U)\stackrel{d}= X$, where the inverse of the distribution function is defined (in the non-obvious case of atoms) as

$F_X^{-1}(u)=\inf\left\{ x\in\mathbb{R}\,:\, F(x)\ge u\right\}.$

Note that when the value taken by U increases, so does the value taken by $F_X^{-1}(U)$. This coupling can be used simultaneously on two random variables X and Y, as $(F_X^{-1}(U),F_Y^{-1}(U))$, to generate a coupling of X and Y.

The total variation distance between two probability measures is

$d_{\mathrm{TV}}(\mu,\nu):= \sup_{A}|\mu(A)-\nu(A)|$,

with supremum taken over all events in the joint support S of $\mu,\nu$. This is particularly clear in the case of discrete measures, as then

$d_{\mathrm{TV}}(\mu,\nu)=\frac12 \sum_{x\in S} \left| \mu\left(\{x\}\right) - \nu\left(\{x\}\right) \right|.$

(Think of the difference in heights between the bars, when you plot $\mu,\nu$ simultaneously as a bar graph…)

The total variation distances records how well we can couple two distributions, if we want them to be equal as often as possible. It is therefore a bad measure of distributions with different support. For example, the distributions $\delta_0$ and $\delta_{1/n}$ are distance 1 apart (the maximum) for all values of n. Similarly, the uniform distribution on [0,1] and the uniform distribution on $\{0,1/n,2/n,\ldots, n-1/n, 1\}$ are also distance 1 apart.

When there is more overlap, the following result is useful.

Proposition: Any coupling $(\hat X,\hat Y)$ of $X\sim \mu,\,Y\sim \nu$ satisfies $\mathbb{P}(X=Y)\le 1-d_{\mathrm{TV}}(\mu,\nu)$, and there exists a coupling such that equality is achieved. Continue reading