Lecture 10 – The configuration model

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

As we enter the final stages of the semester, I want to discuss some extensions to the standard Erdos-Renyi random graph which has been the focus of most of the course so far. Although we will not get far into the details during this course, the overall goal is to develop models which are close to Erdos-Renyi in terms of ease of analysis, while also allowing more of the features characteristic of networks observed in the real world.

One of the more obvious deficiencies of the sparse regime of Erdos-Renyi random graphs for modelling ‘real-world phenomena’ concerns the degree sequence. Indeed, the empirical degree distribution of G(n,c/n) converges to Poisson(c). By contrast, in real-world networks, a much wider range of degrees is typically observed, and in many cases it is felt that these should follow a power law, with a small number of very highly connected agents.

One way around this problem is to construct random graphs where we insist that the graph has a given sequence of degrees. The configuration model, which is the subject of this lecture and this post (and about which I’ve written before), offers one way to achieve this.

Definition and notes

Let n\ge 1 and let d=(d_1,d_2,\ldots,d_n) be a sequence of non-negative integers such that \sum_{i=1}^n d_i is even. Then the configuration model with degree sequence d is a random multigraph with vertex set [n], constructed as follows:

  • To each vertex i\in[n], assign d_i half-edges;
  • Then, take a uniform matching of these half-edges;
  • Finally, for each pair of half-edges in the matching, replace the two half-edges with a genuine edge, to obtain the multigraph CM_n(d), in which, by construction, vertex i has degree d_i.

One should note immediately that although the matching is uniform, the multigraph is not uniform amongst multigraphs with that degree sequence. Note also that the condition on the sums of the degrees is necessary for any graph, and in this context means that the number of half-edges is even, without which it would not be possible to construct a matching.

This effect is manifest in the simplest possible example, when n=2 and d=(3,3). There are two possible graphs, up to isomorphism, which are shown below:

For obvious reasons, we might refer to these as the handcuffs and the theta, respectively. It’s helpful if we, temporarily, assume the half-edges are distinguishable at the moment we join them up in the configuration model construction. Then there are 3×3=9 ways to join them up to form the handcuffs (think of which half-edge ends up forming the edge between the two vertices), while there are 3!=6 ways to pair up the half-edges to form the theta.

In general, for multigraphs H with the correct degree sequence, we have

\mathbb{P}( CM_n(d)\simeq H) \propto \left( 2^{\# \text{loops}(H)} \prod_{e\in E(H)} \text{mult}(e)! \right)^{-1},

where \text{mult}(e) is the multiplicity with which a given edge e appears in H.

Note: it might seem counterintuitive that this procedure is biased against multiple edges and self-loops, but it is really just saying that there are more ways to form two distinct edges than to form two equal edges (ie a multiedge pair) when we view the half-edges as distinguishable. (See this post for further discussion of this aspect in the 3-regular setting.)

However, a consequence of this result is that if we condition on the event that CM_n(d) is simple, then the resulting random graph is uniform on the set of simple graphs with degree sequence d. Note that the same example as above shows that there is no guarantee that a simple graph with a given degree sequence exists.

d-regular configuration model

In general, from a modelling point of view, we are particularly interested in simple, connected graphs, and so it is valuable to study whether large examples of the configuration model are likely to have these properties. In this lecture, I will mainly focus on the case where the multigraphs are d-regular, meaning that all the vertices have degree equal to d. For the purposes of this lecture, we denote by G^d(n) the d-regular configuration model CM_n(d,\ldots,d).

  • d=1: to satisfy the parity condition on the sums of degrees, we must have n even. But then G^1(n) will consist of n/2 disjoint edges.
  • d=2: G^2(n) will consist of some number of disjoint cycles, and it is a straightforward calculation to check that when n is large, with high probability the graph will be disconnected.

In particular, I will focus on the case when d=3, which is the first interesting case. Most of the results we prove here can be generalised (under various conditions) to more general examples of the configuration model. The main goal of the lecture is revision of some techniques of the course, plus one new one, in a fresh setting, and the strongest possible versions of many of these results can be found amongst the references listed at the end.

Connectedness

In the lecture, we showed that G^3(2n) is connected with high probability. This is, in fact, a very weak result, since G^d(n) is d-connected with high probability for d\ge 3 [Bol81, Wor81]. Here, d-connected means that one must remove at least d vertices in order to disconnect the graph, or, equivalently, that there are d disjoint paths between any pair of vertices. Furthermore, Bollobas shows that for d\ge 3, G^d(n) is a (random) expander family [Bol88].

Anyway, for the purposes of this course, the main tool is direct enumeration. The number of matchings of 2k distinguishable objects, M_{2k}, satisfies

M_{2k}=(2k-1)\times (2k-3)\times\ldots\times 3\times 1 = \frac{(2k)!}{2^k \cdot k!},

and so Stirling’s approximation gives the asymptotics

M_{2k} = (\sqrt{2}+o(1)) \left(\frac{2}{e}\right)^k k^k,

although it will be useful to use the true bounds

c \left(\frac{2}{e}\right)^k k^k \le M_{2k}\le C\left(\frac{2}{e}\right)^k k^k,\quad \forall k,

instead in some places. Anyway, in G^3(2n), there are 6n half-edges in total, and so the probability that the vertex set may be split into two classes consisting of 2\ell and 2m vertices, with 2\ell+2m=2n, and with no edges between the classes, is, by a union bound over the choices of split, at most \frac{\binom{2n}{2\ell} M_{6\ell}M_{6m}}{M_{6n}}.


Lecture 9 – Inhomogeneous random graphs

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

As we enter the final stages of the semester, I want to discuss some extensions to the standard Erdos-Renyi random graph which has been the focus of most of the course so far. In doing so, we can revisit material that we have already covered, and discover how easily one can extend this directly to more exotic settings.

The focus of this lecture was the model of inhomogeneous random graphs (IRGs) introduced by Soderberg [Sod02] and first studied rigorously by Bollobas, Janson and Riordan [BJR07]. Soderberg and this blog post address the case where vertices have a type drawn from a finite set. BJR address the setting of more general type spaces, in particular a continuum of types. This generalisation is essential if one wants to use IRGs to model effects more sophisticated than those of the classical Erdos-Renyi model G(n,c/n), but most of the methodology is already present in the finite-type setting, which avoids the operator-theoretic language that is perhaps intimidating for a first-time reader.

Inhomogeneous random graphs

Throughout, k\ge 2 is fixed. A graph with k types is a graph G=(V,E) together with a type function V\to \{1,\ldots,k\}. We will refer to a k\times k symmetric matrix with non-negative entries as a kernel.

Given n\in\mathbb{N} and a vector p=(p_1,\ldots,p_k)\in\mathbb{N}_0^k satisfying \sum p_i=n, and \kappa a kernel, we define the inhomogeneous random graph G^n(p,\kappa) with k types as:

  • the vertex set is [n],
  • types are assigned uniformly at random to the vertices such that exactly p_i vertices have type i.
  • Conditional on these types, each edge v\leftrightarrow w (for v\ne w\in [n]) is present, independently, with probability

1 - \exp\left(-\frac{\kappa_{\mathrm{type}(v),\mathrm{type}(w)} }{n} \right).

Notes on the definition:

  • Alternatively, we could assign the types deterministically, so that vertices \{1,\ldots,p_1\} have type 1, \{p_1+1,\ldots,p_1+p_2\} have type 2, and so on. This makes no difference except in terms of the notation we have to use if we want to use exchangeability arguments later.
  • An alternative model considers some distribution \pi on [k], and assigns the types of the vertices of [n] in an IID fashion according to \pi. Essentially all the same results hold for these two models. (For example, this model with ‘random types’ can be studied by quenching the number of each type!) Often one works with whichever model seems easier for a given proof.
  • Note that the edge probability given is \approx \frac{\kappa_{\mathrm{type}(v),\mathrm{type}(w)}}{n}. The exponential form has a more natural interpretation if we ever need to turn the IRGs into a process. Additionally, it avoids the requirement to treat small values of n (for which, a priori, \kappa_{i,j}/n might be greater than 1) separately.

In the above example, one can see that, roughly speaking, red vertices are more likely to be connected to each other than blue vertices are. Moreover, vertices of both colours are more likely to be connected to a given vertex of the same colour than to one of the opposite colour. This might, for example, correspond to the kernel \begin{pmatrix}3&1\\1&2\end{pmatrix}.

The definition given above corresponds to a sparse setting, where the typical vertex degrees are \Theta(1). Obviously, one can set up an inhomogeneous random graph in a dense regime by an identical argument.

From an applications point of view, it’s not hard to imagine that an IRG of some flavour might be a good model for many phenomena observed in reality, especially when a mean-field assumption is somewhat appropriate. The friendships of boys and girls in primary school seems a particularly resonant example, though doubtless there are many others.

One particular application is to recover the types of the vertices from the topology of the graph. That is, if you see the above picture without the colours, can you work out which vertices are red, and which are blue? (Assuming you know the kernel.) This is clearly impossible to do with anything like certainty in the sparse setting – how does one decide about isolated vertices, for example? The probabilities that a red vertex is isolated and that a blue vertex is isolated differ by a constant factor in the n\rightarrow\infty limit. But in the dense setting, one can achieve this with high confidence. When studying such statistical questions, these IRGs are often referred to as stochastic block models, and the recent survey of Abbe [Abbe] gives a very rich history of this type of problem in this setting.

Poisson multitype branching processes

As in the case of the classical random graph G(n,c/n), we learn a lot about the IRG by studying its local structure. Let’s assume from now on that we are given a sequence of IRGs G^n(p^n,\kappa) for which \frac{p^n}{n}\rightarrow \pi, where \pi=(\pi_1,\ldots,\pi_k)\in[0,1]^k satisfies ||\pi||_1=1.

Now, let v^n be a uniformly-chosen vertex in [n]. Clearly \mathrm{type}(v^n)\stackrel{d}\rightarrow \pi, with the immediate mild notation abuse of viewing \pi as a probability distribution on [k].

Then, conditional on \mathrm{type}(v^n)=i:

  • when j\ne i, the number of type j neighbours of v^n is distributed as \mathrm{Bin}\left(p_j,1-\exp\left(-\frac{\kappa_{i,j}}{n}\right)\right).
  • the number of type i neighbours of v^n is distributed as \mathrm{Bin}\left( p_i-1,1-\exp\left(-\frac{\kappa_{i,i}}{n}\right)\right).

Note that p_j\left[1-\exp\left(-\frac{\kappa_{i,j}}{n}\right)\right]\approx \frac{p_j\cdot \kappa_{i,j}}{n} \approx \kappa_{i,j}\pi_j, and similarly in the case j=i, so in both cases, the number of neighbours of type j is distributed approximately as \mathrm{Poisson}(\kappa_{i,j}\pi_j).

This motivates the following definition of a branching process tree, whose vertices have k types.

Lecture 7 – The giant component

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

As we edge into the second half of the course, we are now in a position to return to the question of the phase transition between the subcritical regime \lambda<1 and the supercritical regime \lambda>1 concerning the size of the largest component L_1(G(n,\lambda/n)).

In Lecture 3, we used the exploration process to give upper bounds on the size of this largest component in the subcritical regime. In particular, we showed that

\frac{1}{n}\big| L_1(G(n,\lambda/n)) \big| \stackrel{\mathbb{P}}\rightarrow 0.

If we used slightly stronger random walk concentration estimates (Chernoff bounds rather than 2nd-moment bounds from Chebyshev’s inequality), we could in fact have shown that with high probability the size of this largest component was at most some logarithmic function of n.

In this lecture, we turn to the supercritical regime. In the previous lecture, we defined various forms of weak local limit, and asserted (without attempting the notationally-involved combinatorial calculation) that the random graph G(n,\lambda/n) converges locally weakly in probability to the Galton-Watson tree with \text{Poisson}(\lambda) offspring distribution, as we’ve used informally earlier in the course.

Of course, when \lambda>1, this branching process has strictly positive survival probability \zeta_\lambda>0. At a heuristic level, we imagine that all vertices whose local neighbourhood is ‘infinite’ are in fact part of the same giant component, which should occupy (\zeta_\lambda+o_{\mathbb{P}}(1))n vertices. In its most basic form, the result is

\frac{1}{n}\big|L_1(G(n,\lambda/n))\big|\;\stackrel{\mathbb{P}}\longrightarrow\; \zeta_\lambda,\quad \frac{1}{n}\big|L_2(G(n,\lambda/n))\big| \;\stackrel{\mathbb{P}}\longrightarrow\; 0, (*)

where the second part is a uniqueness result for the giant component.

The usual heuristic for proving this result is that all ‘large’ components must in fact be joined. For example, if there are two giant components, with sizes \approx \alpha n,\approx \beta n, then each time we add a new edge (such an argument is often called ‘sprinkling’), the probability that this edge joins the two components is \approx 2\alpha\beta, and so if we add lots of edges (which happens as we move from edge probability \lambda-\epsilon\mapsto \lambda) then with high probability these two components get joined.

It is hard to make this argument rigorous, and the normal approach is to show that with high probability there are no components with sizes within a certain intermediate range (say between \Theta(\log n) and n^\alpha) and then show that all larger components are the same by a joint exploration process or a technical sprinkling argument. Cf the books of Bollobas and of Janson, Luczak, Rucinski. See also this blog post (and the next page) for a readable online version of this argument.

I can’t find any version of the following argument, which takes the weak local convergence as an assumption, in the literature, but it seems appropriate to this course. It is worth noting that, as we shall see, the method is not hugely robust to adjustments in case one is, for example, seeking stronger estimates on the giant component (eg a CLT).

Anyway, we proceed in three steps:

Step 1: First we show, using the local limit, that for any \epsilon>0,

\frac{1}{n}\big|L_1(G(n,\lambda/n))\big| \le \zeta_\lambda+\epsilon, with high probability as n\rightarrow\infty.

Step 2: Using a lower bound on the exploration process, for \epsilon>0 small enough

\frac{1}{n}\big|L_1(G(n,\lambda/n))\big| \ge \epsilon, with high probability.

Step 3: Motivated by duality, we count isolated vertices to show

\mathbb{P}(\epsilon n\le |L_1| \le (\zeta_\lambda-\epsilon)n) \rightarrow 0.

We will return to uniqueness at the end.

Step 1

This step is unsurprising. The local limit gives control on how many vertices are in small components of various sizes, and so gives control on how many vertices are in small components of all finite sizes (taking limits in the right order). This gives a bound on how many vertices can be in the giant component.

Lecture 6 – Local limits

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

By this point of the course, we’ve studied several aspects of the Erdos-Renyi random graph, especially in the sparse setting G(n,\frac{\lambda}{n}). We’ve also taken a lengthy detour to revise Galton-Watson trees, with a particular focus on the case of Poisson offspring distribution.

This is deliberate. Note that a given vertex v of G(n,\frac{\lambda}{n}) has some number of neighbours distributed as \mathrm{Bin}(n-1,\frac{\lambda}{n})\stackrel{d}\approx\mathrm{Po}(\lambda), and the same approximation remains valid as we explore the graph (for example in a breadth-first fashion) either until we have seen a large number of vertices, or unless some ultra-pathological event happens, such as a vertex having degree n/3.

In any case, we are motivated by the notion that the local structure of G(n,\frac{\lambda}{n}) is well-approximated by the Galton-Watson tree with \mathrm{Po}(\lambda) offspring, and in this lecture and the next we try to make this notion precise, and discuss some consequences when we can show that this form of convergence occurs.

Deterministic graphs

Throughout, we will be interested in rooted graphs, since by definition we have to choose a root vertex whose local neighbourhood is to be studied. Usually, we will study a sequence of rooted graphs (G_n,\rho_n), where the vertex set of G_n is [n], or certainly increasing in n (as in the first example).

For some rooted graph (G,\rho), we say such a sequence (G_n,\rho_n) converges to (G,\rho) locally if for all radii r\ge 1, we have B_r^{G_n}(\rho_n)\simeq B_r^G(\rho). In words, the neighbourhood around \rho_n in G_n is the same up to radius r as the neighbourhood around \rho in G, so long as n is large enough (for given r).

This is best illustrated by an example, such as T_n, the binary tree to depth n.

If we take \rho_n to be the usual root, then the trees are nested, and converge locally to the infinite binary tree T_\infty. Slightly less obviously, if we take \rho_n to be one of the leaves, then the trees are still nested (up to labelling – ie in the sense of isomorphisms of rooted trees), and converge locally to the canopy tree, defined by a copy of \mathbb{Z}_{\ge 0} with nearest-neighbour edges, and where each vertex n\ge 1 is connected to the root of a disjoint copy of T_{n-1}, as shown below:

Things get more interesting when the root is chosen randomly, for example uniformly at random, as this encodes more global information about the graphs G_n. In the case where the G_n are vertex-transitive, if we only care about rooted graphs up to isomorphism, it doesn’t matter how we choose the root.

Otherwise, we say that G_n converges in the local weak sense to (G,\rho) if, for all r\ge 1 and for all rooted graphs (H,\rho_H),

\mathbb{P}\left( B^{G_n}_r(\rho_n)\simeq (H,\rho_H) \right) \longrightarrow \mathbb{P}\left( B_r^G(\rho)\simeq (H,\rho_H)\right),

as n\rightarrow\infty.

Alternatively, one can phrase this as a result about convergence of rooted-graph-valued distributions.

A simple non-transitive example is G_n\simeq P_n, the path of length n. Then, the r-neighbourhood of a vertex is isomorphic to P_{2r} unless that vertex is within graph-distance (r-1) of one of the leaves of G_n. As n\rightarrow\infty, the proportion of such vertices vanishes, and so \mathbb{P}\left( B^{P_n}_r(\rho_n)\simeq P_{2r}\right)\rightarrow 1, from which we conclude the unsurprising result that P_{n} converges in the local weak sense to \mathbb{Z}. (Which is vertex-transitive, so it doesn’t matter where we select the root.)

The binary trees offer a slightly richer perspective. Let \mathcal{L}_n be the set of leaves of T_n, and we claim that when \rho_n is chosen uniformly from the vertices of T_n, then d_{T_n}(\rho_n,\mathcal{L}_n) converges in distribution. Indeed, \mathbb{P}\left( d_{T_n}(\rho_n,\mathcal{L}_n)=k\right) = \frac{2^{n-k}}{2^{n+1}-1}, whenever n\ge k, and so the given distance converges in distribution to the Geometric distribution with parameter 1/2 supported on \{0,1,2,\ldots\}.

This induces a random local weak limit, namely the canopy tree, rooted at one of the vertices we denoted by \mathbb{Z}_{\ge 0}, with the choice of this vertex given by Geometric(1/2).

Lecture 4 – Hitting time theorem

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

This lecture consisted of revision of the most relevant theory of Galton-Watson trees, with a focus on the case where the offspring distribution is Poisson, since, as we have seen in previous lectures, this is a strong candidate to approximate the structure of G(n,\lambda/n). It makes sense to cover the theory of the trees before attempting to make rigorous the sense of approximation.

Given a Galton-Watson tree T, it is natural to label the vertices in a breadth-first order as \varnothing=v_1,v_2,\ldots,v_{|T|}. This is easiest if we have constructed the Galton-Watson tree as a subset of the infinite Ulam-Harris tree, where vertices have labels like (3,5,17,4), whose parent is (3,5,17). If this child vertex is part of the tree, then so are (3,5,17,1), (3,5,17,2), and (3,5,17,3). This means our breadth-first order is canonically well-defined, as we have a natural ordering of the children of each parent vertex.

Note: one advantage of using breadth-first order rather than depth-first order (which corresponds to the usual dictionary, or lexicographic ordering of the labels) is that if the tree is infinite, we don’t explore all of it during a depth-first search. (In the sense that there exist vertices which are never given a finite label.) For breadth-first search, a similar problem arises precisely when some vertex has infinitely many children. For a conventional Galton-Watson tree, the latter situation is much less of a problem than the infinite total population problem, which happens with positive probability whenever \mu=\mathbb{E}[X]>1.

Anyway, given the breadth-first order, one can consider an exploration process S_0,S_1,S_2,\ldots,S_{|T|} given by

S_0=1,\quad S_i=S_{i-1}+(X_i-1),

where X_i is the number of children of v_i. In this way, we see that

S_i=\big| \left(\Gamma(v_1)\cup\ldots\cup\Gamma(v_i)\right)\backslash \{v_1,\ldots,v_i\}\big|,\quad i\ge 1,

records the number of vertices in some stack containing those which we have ‘seen but not explored’. Some authors prefer to start from 0, in which case one ends up with a similar but slightly different interpretation of the ‘stack’, but that’s fine since we aren’t going to define formally what ‘seen’ and ‘explored’ means in this post.

Essentially, we exhaust the vertices of the tree at the first time S_t=0, and so the condition that |T|=n requires

S_n=0,\quad S_m\ge 1,\; m=0,1,\ldots,n-1.

Conveniently, so long as we have avoided ordering ambiguity, for example by insisting that trees live within the Ulam-Harris tree, we can reconstruct T uniquely from (S_0,S_1,\ldots,S_{|T|}).

Furthermore, if T is a Galton-Watson process, then the numbers of children X_i are IID, and so in fact this exploration process is a random walk, and the size of the tree can be recovered as the hitting time of zero.

Note: making the argument that the numbers of children are IID along the breadth-first exploration fully rigorous is somewhat technical, and not to be dismissed lightly, though not of principal interest at the level of this topics course. See Proposition 1.5 in Section 1.2 of Le Gall’s notes or Section 1.2.2 of my doctoral thesis for further discussion and argument.

The hitting time theorem allows us to study the distribution of the hitting time of a random walk whose increments are bounded below by -1, in terms of the distribution of the value of the random walk.

Theorem: Let (S_n,\, n\ge 0) be a random walk with S_0=0 and IID increments (X_n,n\ge 1) satisfying \mathbb{P}(X_n\ge -1)=1. Let H_{-k}=\inf \left\{n\,:\, S_n=-k\right\} be the hitting time of -k.

Then \mathbb{P}\big( H_{-k}=n\big) = \frac{k}{n}\mathbb{P}\big(S_n=-k\big).

Commentary: there are local central limit theorem estimates and large deviation estimates that allow good control of the probability on the RHS for a rich class of contexts. So at a meta-level, the hitting time theorem allows us to reduce a complicated (though still classical) problem, to a real classical problem, which is particularly helpful when the LHS is a device for capturing relevant information about our random tree model.


Lecture 3 – Couplings, comparing distributions

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

In this third lecture, we made our first foray into the scaling regime for G(n,p) which will be the main focus of the course, namely the sparse regime when p=\frac{\lambda}{n}. The goal for today was to give a self-contained proof of the result that in the subcritical setting \lambda<1, there is no giant component, that is, a component supported on a positive proportion of the vertices, with high probability as n\rightarrow\infty.

More formally, we showed that the proportion of vertices contained within the largest component of G(n,\frac{\lambda}{n}) vanishes in probability:

\frac{1}{n} \left| L_1\left(G\left(n,\frac{\lambda}{n}\right)\right) \right| \stackrel{\mathbb{P}}\longrightarrow 0.

The argument for this result involves an exploration process of a component of the graph. This notion will be developed more formally in future lectures, aiming for good approximation rather than bounding arguments.

But for now, the key observation is that when we ‘explore’ the component of a uniformly chosen vertex v\in[n] outwards from v, at all times the number of new ‘children’ of the current vertex which haven’t already been considered is ‘at most’ \mathrm{Bin}(n-1,\frac{\lambda}{n}). For example, if we already know that eleven vertices, including the current one w, are in C(v), then the number of new vertices added to consideration because they are directly connected to w has conditional distribution \mathrm{Bin}(n-11,\frac{\lambda}{n}).

Firstly, we want to formalise the notion that this is ‘less than’ \mathrm{Bin}(n,\frac{\lambda}{n}), and also that, so long as we don’t replace 11 by a linear function of n, we have \mathrm{Bin}(n-11,\frac{\lambda}{n})\stackrel{d}\approx \mathrm{Poisson}(\lambda).

Couplings to compare distributions

A coupling of two random variables (or distributions) X and Y is a realisation (\hat X,\hat Y) on the same probability space with correct marginals, that is

\hat X\stackrel{d}=X,\quad \hat Y\stackrel{d}=Y.

We saw earlier in the course that we could couple G(n,p) and G(n,q) by simulating both from the same family of uniform random variables, and it’s helpful to think of this in general: ‘constructing the distributions from the same source of randomness’.

Couplings are a useful notion to digest at this point, as they embody a general trend in discrete probability theory. Wherever possible, we try to do as much as we can with the random objects themselves, before starting any calculations. Think about the connectivity property of G(n,p) as discussed in the previous lecture. This can be expressed directly as a function of p in terms of a large sum, but showing it is an increasing function of p is essentially impossible by computation, whereas this is very straightforward using the coupling.

We will now review how to use couplings to compare distributions.

For a real-valued random variable X, with distribution function F_X, we always have the option to couple with a uniform U(0,1) random variable. That is, when U\sim U[0,1], we have F_X^{-1}(U)\stackrel{d}= X, where the inverse of the distribution function is defined (in the non-obvious case of atoms) as

F_X^{-1}(u)=\inf\left\{ x\in\mathbb{R}\,:\, F_X(x)\ge u\right\}.

Note that when the value taken by U increases, so does the value taken by F_X^{-1}(U). This coupling can be used simultaneously on two random variables X and Y, as (F_X^{-1}(U),F_Y^{-1}(U)), to generate a coupling of X and Y.

The total variation distance between two probability measures is

d_{\mathrm{TV}}(\mu,\nu):= \sup_{A}|\mu(A)-\nu(A)|,

with supremum taken over all events in the joint support S of \mu,\nu. This is particularly clear in the case of discrete measures, as then

d_{\mathrm{TV}}(\mu,\nu)=\frac12 \sum_{x\in S} \left| \mu\left(\{x\}\right) - \nu\left(\{x\}\right) \right|.

(Think of the difference in heights between the bars, when you plot \mu,\nu simultaneously as a bar graph…)

The total variation distance records how well we can couple two distributions, if we want them to be equal as often as possible. It is therefore a bad measure of the similarity of distributions with different supports. For example, the distributions \delta_0 and \delta_{1/n} are distance 1 apart (the maximum) for all values of n. Similarly, the uniform distribution on [0,1] and the uniform distribution on \{0,1/n,2/n,\ldots, (n-1)/n, 1\} are also distance 1 apart.

When there is more overlap, the following result is useful.

Proposition: Any coupling (\hat X,\hat Y) of X\sim \mu,\,Y\sim \nu satisfies \mathbb{P}(\hat X=\hat Y)\le 1-d_{\mathrm{TV}}(\mu,\nu), and there exists a coupling for which equality is achieved.

Lecture 2 – Connectivity threshold

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

The goal of the second lecture was to establish the sharp phase transition for the connectivity of the random graph G(n,p(n)) around the critical regime p(n)\sim \frac{\log n}{n}. In the end, we showed that when \omega(n) is any diverging sequence, and p(n)=\frac{\log n-\omega(n)}{n}, then G(n,p(n)) is with high probability not connected.

In the next lecture, we will finish the classification by studying p(n)=\frac{\log n+\omega(n)}{n}, and show that for this range of p, the graph G(n,p(n)) is with high probability connected.

The details of the lecture, especially the calculation, are not presented fully here. There, I followed van der Hofstad’s recent book fairly closely, sometimes taking different approximations and routes through the algebra, though all versions remain fairly close to the original enumerations by Renyi.

Immediate remarks

  • One is allowed to be surprised that for almost all scalings of p(n), G(n,p) is either whp connected or whp not connected. The speed of the transition is definitely interesting.
  • As defined in lectures, the property that a graph is connected is an increasing property, meaning that it is preserved when you add additional edges to the graph.
  • Because of the natural coupling between G(n,p) and G(n,q), the fact that connectedness is an increasing property makes life easier. For example, we can insist temporarily that \omega(n)\ll \log n, or whatever scaling turns out to be convenient for the proof, but conclude the result for all diverging \omega(n). This avoids the necessity for an annoying case distinction.

Heuristics – Isolated vertices

It turns out that the ‘easiest’ way for such a graph to be disconnected is for it to have an isolated vertex. In determining that the graph has a cut into classes of sizes a and b with no edges between them, there is a trade-off between the number of ways to choose the partition (which increases with \min(a,b)) and the probabilistic penalty from banning the ab edges between the classes (which decreases with \min(a,b)). It turns out that the latter effect is slightly stronger, and so the (1,n-1) cut dominates.

Method 1: second-moment method

In the case p(n)=\frac{\log n - \omega(n)}{n}, we use a second-moment method argument to establish that G(n,p) contains an isolated vertex with high probability. Note that a given vertex v is isolated precisely if n-1 edges are not present. Furthermore, two given vertices v,w are both isolated, precisely if 2n-3 edges are not present. So in fact, both the first moment and the second moment of the number of isolated vertices are straightforward to evaluate.

It turns out that the number of isolated vertices, Y_n, satisfies

\mathbb{E}[Y_n]= \exp(\omega(n)+o(1))\rightarrow\infty. (*)

As always, we have to eliminate the possibility that this divergent expectation is achieved by the graph typically having no isolated vertices, but occasionally having very many. So we turn to the second moment, and can show

\mathrm{Var}(Y_n)= (1+o(1))\mathbb{E}[Y_n],

and so by Chebyshev’s inequality, we have \mathbb{P}(Y_n=0)\rightarrow 0.

Method 2: first-moment method

Counter-intuitively, although the case p(n)=\frac{\log n + \omega(n)}{n} requires only a first-moment method, it is more technical because it involves the less clear direction of the informal equivalence:

\text{Connected}\; ``\iff ''\; \text{no isolated vertices}.

At the time we showed (*), we also showed that for this regime of p(n), G(n,p) whp has no isolated vertices. It remains to show that it has no splits into (unions of) connected components of sizes k and n-k.

Random Graphs – Lecture 1

My plan is to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics about the course can be found here.

In the first lecture, we revised some basic definitions about graphs, focusing on those which are most relevant to a first study of the Erdos-Renyi random graph G(n,p) which will be the focus of the lecture course. We discussed in abstract why the independence of the (potential) edges makes the model easier to analyse, but reduces its suitability as a direct model for lots of networks one might see in the real world, where knowledge that A is directly connected to both B and C affects the probability that B is directly connected to C, in either direction. Thinking about the Facebook friendship graph is one of the best examples, where in this case, we expect this extra information to increase the probability that B and C are connected. Even as the world moves away from heteronormativity, it realistically remains the case that in a graph of the dating history amongst a well-defined community we would likely observe the opposite effect.

All of these more complicated phenomena can be captured by various random graphs, but G(n,p) remains the cornerstone, evinced by the >10^5 citations of one of Erdos and Renyi’s original papers on the topic.

Somewhat paraphrasing, one of their (well, mostly Renyi’s) original questions was: when n is large, what should p be so that there’s a good chance that G(n,p) is connected?

The answer to this question lies in Lecture 2, but to cement understanding of the model, and explore some key methods for proofs in discrete probability (as well as play around with the big-O and little-o notation), we investigated the following two situations, which are very far from interesting as far as connectivity of G(n,p) is concerned.

Dense regime

When p is fixed, there are many interesting questions one could ask about the asymptotic properties of G(n,p), but connectivity is not one of them. In particular, for p\in(0,1) we claim:

Proposition: \mathrm{diam}(G(n,p)) \stackrel{\mathbb{P}}\rightarrow 2 as n\rightarrow\infty.

Note that if \mathrm{diam}(G(n,p))=1, then G(n,p)\simeq K_n, the complete graph on n vertices. In other words, every possible edge is actually present. But the probability of this event is p^{\binom{n}{2}}\rightarrow 0, so long as p<1.

It then suffices to prove that \mathbb{P}(\mathrm{diam}(G(n,p))>2) \rightarrow 0. We use a union bound, where we first study the probability that the graph distance d_{G(n,p)}(v,w)>2 for two fixed vertices v\ne w, and then sum over all such pairs. Of course, there is a probability p that the two vertices are directly connected by an edge. Then, there are (n-2) other vertices with the potential to be a common neighbour of v and w, which would ensure that the graph distance between them is at most two. So

\mathbb{P}(d_{G(n,p)}(v,w)>2)=(1-p)[1-p^2]^{n-2} .

Note that we are using independence throughout this calculation. Then comes the union bound:

\mathbb{P}(\mathrm{diam}(G(n,p)) >2) \le \sum_{v\ne w \in[n]} \mathbb{P}(d_{G(n,p)}(v,w)>2)

\le \binom{n}{2} (1-p)[1-p^2]^{n-2} \rightarrow 0,

since exponential decay ‘kills’ polynomial growth.

Ultra-sparse regime

In general, we work in the setting where p=p(n) depends on n. If p(n) decays fast enough (see Exercise 2), then with high probability G(n,p) has no edges at all. However, when p(n)=o(n^{-3/2}) we have

Proposition: \mathbb{P}(\text{edges of }G(n,p)\text{ form a matching}) \rightarrow 1 as n\rightarrow\infty.

A matching is a collection of edges with no vertices in common. So if the edge set of the graph is a matching, we have essentially no interesting connectivity structure at all. The longest path has length one, for example.

To prove this, note that the edge set of the graph fails to be a matching precisely if one of the vertices has degree at least two. But since a vertex v is connected to each of the (n-1) other vertices in the graph independently with probability p, we have

\mathrm{deg}_{G(n,p)}(v) \sim \mathrm{Bin}(n-1,p),

and so we can directly make the crude approximation

\mathbb{P}(\mathrm{deg}_{G(n,p)}(v) =k) = \binom{n-1}{k}p^k(1-p)^{n-1-k}\le n^k p^k.

We’ve made this very weak bound to make life easier when we sum:

\mathbb{P}(\mathrm{deg}_{G(n,p)}(v) \ge 2) \le \sum_{k\ge 2}(np)^k = \frac{(np)^2}{1-np}.

Since p=o(n^{-3/2}), we have \frac{1}{1-np}=\frac{1}{1-o(1)}=1+o(1), and overall we obtain

\mathbb{P}(\mathrm{deg}_{G(n,p)}(v) \ge 2) = o(\frac{1}{n}).

Again, we finish with a union bound, considering this event across all vertices v\in[n].

\mathbb{P}(E(G(n,p))\text{ not a matching}) \le \sum_{v\in[n]} \mathbb{P}(\mathrm{deg}_{G(n,p)}(v)\ge 2)

=n\mathbb{P}(\mathrm{deg}_{G(n,p)}(1)\ge 2) = o(1),

as required.

Next time

In the next lecture, we’ll study the regime p(n)\sim \frac{\log n}{n}, where G(n,p) experiences a phase transition from probably not connected to probably connected. Part of this involves making the notion ‘probably connected’ precise, which will be useful throughout the rest of the course, as well as establishing the language for comparing G(n,p) and G(n,q).

The proof itself requires some more sophisticated versions of calculations from Lecture 1, and more sophisticated probabilistic tools (first- and second-moment methods) to convert them into statements about convergence in probability. This will be an advertisement for the more classical enumerative methods that underpinned much of the early work on random graphs.

The rest of the course will exploit much more some comparisons and embeddings involving branching processes and exploration processes, so don’t worry – it won’t be 26 hours of counting trees!

The reflection principle and conditioned RWs

I haven’t published a post about probability for far too long. Several are queued, so perhaps this will be the start of a deluge.

Anyway, with my advisor at Technion, I’m still working on some problems concerning Gaussian random walk subject to some conditioning which is complicated, but in practice (we hope) only mildly different to conditioning the walk to stay positive. Our conditioning at step n depends on some external randomness, but also on the future trajectory of the walk (related to the embedding of the walk in a 2D DGFF), thus ruining the possibility of applying the Markov property in any proof without significant preliminary work.

It seemed worth writing briefly about some of these results in a slightly simpler setting. The goal is to assemble many of the ingredients required to prove a local limit for Gaussian random walk conditioned to stay positive, in a sense which will be clarified towards the end. This is not the best way to go about getting scaling limits (as discussed briefly here, and for which see references [Ig74] and [Bo76]), and it’s probably not the best way to get local limits in the simplest setting, but it’s the method we are currently working to generalise, and follows the outline of [B-JD05], but in much less technical detail.

Probabilities via the reflection principle

We start with Brownian motion. The reflection principle, as described briefly in this post from the depths of history, is a classical technique for studying the maximum of Brownian motion. Roughly speaking, we exploit the fact that (W_t,t\ge 0)\stackrel{d}=(-W_t,t\ge 0), but we then apply this at the hitting time of a particular positive value, using the Strong Markov Property.

Let S_t=\max_{0\le s\le t}W_s be the running maximum of the Brownian motion W_t, and \tau_b the hitting time of b. Then

\mathbb{P}(S_t\ge b, W_t\le a)=\mathbb{P}(\tau_b<t\text{ and }W_t-W_{\tau_b}\le a-b),

which, by SMP at \tau_b and the reflection invariance of a standard BM, is equal to

\mathbb{P}(\tau_b<t\text{ and }W_t-W_{\tau_b}\ge b-a) = \mathbb{P}(W_t\ge 2b-a).

This obviously assumed b\ge a, but if we set b=a, we find

\mathbb{P}(S_t\ge b)=\mathbb{P}(W_t>b)+\mathbb{P}(S_t\ge b,W_t\le b)=2\mathbb{P}(W_t\ge b).

Or, in other words, S_t\stackrel{d}=|W_t|.

While we can’t derive such nice equalities in distribution, the reflection principle is robust enough to treat more complicated settings, such as Brownian bridge.

We might want to ask about the maximum of a standard Brownian bridge, but we can be more general, and ask about the maximum of a Brownian bridge with drift (let’s say general bridge here). It’s important to remember that a general Brownian bridge has the same distribution as a linear transformation of a standard Brownian bridge. Everything is Gaussian, after all. So asking whether the maximum of a general Brownian bridge is less than a particular value is equivalent to asking whether a standard Brownian bridge lies below a fixed line. Wherever possible, we make such a transformation at the start and perform the simplest version of the required calculation.

So, suppose we have a bridge B from (0,0) to (t,a), and we want to study \max_{s\in[0,t]} B_s. Fix some b>a, and work with a standard Brownian motion W_s. By a similar argument to before,

\mathbb{P}(\tau_b\le t, W_t\in[a,a+\mathrm{d}x]) = \mathbb{P}(W_t\in [2b-a-\mathrm{d}x,2b-a]) = \frac{\mathrm{d}x}{\sqrt{2\pi t}}e^{-(2b-a)^2/2t},

and

\mathbb{P}(W_t\in[a,a+\mathrm{d}x])=\frac{\mathrm{d}x}{\sqrt{2\pi t}}e^{-a^2/2t}.

So

\mathbb{P}(\max_{s\in[0,t]}B_s\ge b) = \exp\left(\frac{-(2b-a)^2 + a^2}{2t}\right).

Random walk conditioned to stay positive

Our main concern is conditioning to stay above zero. Let \mathbb{P}_{0,x}^{t,y} be some complete if cumbersome notation for a Brownian bridge B from (0,x) to (t,y). Then another simple transformation of the previous result gives

\mathbb{P}_{0,x}^{t,y}(B_s\ge 0,\,s\in[0,t])=1-\exp\left( \frac{-(x+y)^2 + (x-y)^2}{2t} \right)= 1-\exp\left(-\frac{2xy}{t}\right).

Then, if xy\ll t, we can approximate this by \frac{2xy}{t}. (*)

Extend the notation so that \mathbb{P}_{0,x} describes Brownian motion started from (0,x). Then integrating over y gives

\mathbb{P}_{0,x}(B_s\ge 0,\, s\in[0,t] ) \approx \frac{2x}{t}\mathbb{E}[B_t\vee 0] = \sqrt{\frac{2}{\pi}} \frac{x}{\sqrt{t}}.

(It might appear that we have integrated the approximation (*) over parts of the range where it is not a valid approximation, but the density of B_t at values of order \Theta(t) vanishes exponentially fast, and so actually it’s not a problem.)

We now want to extend this to random walks. Some remarks:

  • We used the Gaussian property of Brownian motion fairly heavily throughout this derivation. In general random walks are not Gaussian, though we can make life easier by focusing on this case.
  • We also used the continuity property of Brownian motion when we applied the reflection principle. For a general random walk, it’s hopeless to consider the hitting times of individual values. We have to consider instead the hitting times of regions \tau_{(-\infty,b]}, and so on. One can still apply SMP and a reflection principle, but this gives bounds rather than equalities. (The exception is simple random walk, for which other more combinatorial methods may be available anyway.)
  • On the flip side, if we are interested in Brownian motion/bridge staying positive, we can’t start it from zero, as then the probability of this event is zero, by Blumenthal’s 0-1 law. By contrast, we can certainly ask about random walk staying positive when started from zero without taking a limit.

A useful technique will be the viewpoint of random walk as the values taken by Brownian motion at a sequence of stopping times. This Skorohod embedding is slightly less obvious when considering a general random walk bridge inside a general Brownian bridge, but is achievable. We want to study quantities like

\mathbb{P}(S_k\ge 0,\, k=1,\ldots,n \big| S_0=x,S_n=y),

where for simplicity let’s just take (S_k,k\ge 0) to be a random walk with standard Gaussian increments. It’s possible we might want to take a scaling limit in x and y as functions of n. But first if we take x,y fixed, and embed the random walk bridge with these endpoints into the corresponding Brownian bridge with t\approx n, we are then faced with the question:

What’s the probability that the Brownian bridge goes below zero, but the embedded RW with n steps does not?

If the Brownian bridge conditioned to go below zero spends time \Theta_p(n) below zero, then for large n it’s asymptotically very unlikely that the n times at which we embed the random walk all avoid this set of intervals.

Several technical estimates are required to make this analysis rigorous. The conclusion is that there exists a function f(x) for which f(x)=x(1+o(1)) as x\rightarrow\infty, such that

q_n(x,y):=\mathbb{P}(S_k\ge 0,\, k=0,1,\ldots,n \,\big|\, S_0=x,S_n=y) \sim \frac{2f(x)f(y)}{n},

\text{and}\quad q_n(x):=\mathbb{P}(S_k\ge 0,\,k=0,1,\ldots,n\,\big|\, S_0=x)\sim \sqrt{\frac{2}{\pi}}\frac{f(x)}{\sqrt{n}}.

As earlier, the second is obtained from the first by integrating over suitable y. This function f has to account for the extra complications when either end-point is near zero, for which the event where the Brownian motion goes negative without the random walk going negative requires additional care.

Limits for the conditioned random walk

In the previous post on this topic, we addressed scaling limits in space and time for conditioned random walks. But we don’t have to look at the classical Donsker scaling to see the effects of conditioning to stay positive. In our setting, we are interested in studying the distribution of S_m conditional on the event (S_1\ge 0,S_2\ge 0,\ldots, S_n\ge 0), with limits taken in the order n\rightarrow\infty and then m\rightarrow\infty.

(At a more general level, it is meaningful to describe the random walk conditioned on staying positive forever. Although this would a priori require conditioning on an event of probability zero, it can be handled formally as an example of an h-transform.)

As explained in that previous post, the scaling invariance of the Bessel process W^+ (which it’s not unreasonable to think of as ‘Brownian motion conditioned to stay non-negative’) suggests that this limit should exist, and be given by the entrance law of W^+. But this is hard to extract from a scaling limit.

However, we can use the previous estimate to start a direct calculation.

\mathbb{P}(S_m\in \mathrm{d}y \,\big|\, S_k\ge 0,\, k=1,\ldots,n) = \frac{q_m(0,y) q_{n-m}(y) \mathbb{P}(S_m\in\mathrm{d}y)}{q_n(0)}.

Here, we used the Markov property at time m to split the event that S_m=y and the walk stays positive into two time-intervals. We will later take m large, so we may approximate as

\frac{2f(0)f(y)/m \times \sqrt{\frac{2}{\pi}}f(y)/\sqrt{n-m}\times \mathbb{P}(S_m\in\mathrm{d}y) } { \sqrt{\frac{2}{\pi}}f(0)/\sqrt{n}}\;\stackrel{n\rightarrow\infty}\longrightarrow\;\frac{2f(y)^2}{m}\mathbb{P}(S_m\in\mathrm{d}y).

This final probability emphasises that as m\rightarrow\infty we only really have to consider y=\Theta(\sqrt{m}), so set y=z\sqrt{m}, and we obtain

\lim_{n\rightarrow\infty}\mathbb{P}(\frac{S_m}{\sqrt{m}}\in\mathrm{d}z\,\big|\, S_k\ge 0,\,k=1,\ldots,n)

\sim \sqrt{m}\cdot\frac{2z^2m}{m}\cdot \frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{m}}e^{-z^2/2} = \sqrt{\frac{2}{\pi}}z^2 e^{-z^2/2}.

This is precisely the entrance law of the 3-dimensional Bessel process, usually denoted R. This process is invariant under time-rescaling in the same fashion as Brownian motion. Indeed, one representation of R is as the radial part of a three-dimensional Brownian motion, given by independent BMs in each coordinate. (See [Pi75] for explanation of the relation to ‘BM conditioned to stay non-negative’.) We could complete the analogy by showing that q_n(x,y) converges to the transition density of R as well. (Cf the prelude to Theorem 2.2 of [B-JD05].)

Final remarks

The order of taking limits is truly crucial. We can also obtain a distributional scaling limit at time n under conditioning to stay non-negative up to time n. But then this is the size-biased normal distribution \sim ze^{-z^2/2} (the Rayleigh distribution), rather than the square-size-biased normal distribution we saw in this setting. And we can exactly see why. Relative to the normal distribution which applies in the absence of conditioning, we require size-biasing to account for the walk staying positive up to time m, and then also size-biasing to account for the walk staying positive for the rest of time (or up to n in the n\rightarrow\infty limit if you prefer).

The asymptotics for q_n(x,y) were the crucial step, for which only heuristics are present in this post. It remains the case that estimates of this kind form the crucial step in other more exotic conditioning scenarios. This is immediately visible (even if the random walk notation is rather exotic) in, for example, Proposition 2.2 of [CHL17], of which we currently require a further level of generalisation.

References

[Bo76] – Bolthausen – On a functional central limit theorem for random walks conditioned to stay positive

[B-JD05] – Bryn-Jones, Doney – A functional limit theorem for random walk conditioned to stay non-negative

[CHL17] – Cortines, Hartung, Louidor – The structure of extreme level sets in branching Brownian motion

[Ig74] – Iglehart – Functional central limit theorems for random walks conditioned to stay positive

[Pi75] – Pitman – One-dimensional Brownian motion and the three-dimensional Bessel process

BMO2 2018

The second round of the British Mathematical Olympiad was taken yesterday by the 100 or so top scoring eligible participants from the first round, as well as some open entries. Qualifying for BMO2 is worth celebrating in its own right. The goal of the setters is to find the sweet spot of difficult but stimulating for the eligible participants, which ultimately means it’s likely to be the most challenging exam many of the candidates sit while in high school, at least in mathematics.

I know that lots of students view BMO2 as something actively worth preparing for. As with everything, this is a good attitude in moderation. Part of the goal in writing about the questions at such length (and in particular not just presenting direct solutions) is because I think at this level it’s particularly easy to devote more time than needed to preparation, and use it poorly.

All these questions could be solved by able children. In fact, each could be solved by able children in less than an hour. You definitely count as an able child if you qualified or if your teacher allowed you to make an open entry! Others count too naturally. But most candidates won’t in fact solve all the questions, and many won’t solve any. And I think candidates often come up with the wrong reasons why they didn’t solve problems. “I didn’t know the right theorems” is very very rarely the reason. Olympiad problems have standard themes and recurring tropes, but the task is not to look at the problem and decide that it is an example of Olympiad technique #371. The task is actually to have as many ideas as possible, and eliminate the ones that don’t work as quickly as possible.

The best way to realise that an idea works is to solve the problem immediately. For the majority of occasions when we’re not lucky enough for that to happen, the second-best way to realise that an idea works is to see that it makes the problem look a bit more like something familiar. Conversely, the best way to realise that an idea doesn’t work is to observe that if it worked it would solve a stronger but false problem too. (Eg Fermat’s Last Theorem *does* have solutions over the reals…) The second-best way to realise that an idea doesn’t work is to have the confidence that you’ve tried it enough and you’ve only made the problem harder, or less familiar.

Both of these second-best ideas do require a bit of experience, but I will try to explain why none of the ideas I needed for various solutions this year required any knowledge beyond the school syllabus, some similarities to recent BMOs, and a small bit of creativity.

As usual, the caveat that these are not really solutions, and certainly not official solutions, but they are close enough to spoil the problems for anyone who hasn’t tried them by themselves already. Of course, the copyright for the problems is held by BMOS, and reproduced here with permission.

Question One

I wrote this question. Perhaps as a focal point of the renaissance of my interest in geometry, or at least my interest in teaching geometry, I have quite a lot to say about the problem, its solutions, its origin story, the use of directed angles, the non-use of coordinate methods and so on. In an ideal world I would write a book about this sort of thing, but for now, a long and separate post is the answer.

This will be available once I’ve successfully de-flooded my apartment.

Question Two

I also wrote this problem, though I feel it’s only fair to show the version I submitted to the BMO committee. All the credit for the magical statement that appears above lies with them. There is a less magical origin story as well, but hopefully with some interesting combinatorial probability, which is postponed until the end of this post.

One quick observation is that in my version Joe / Hatter gets to keep going forever. As we shall see, all the business happens in the first N steps, but a priori one doesn’t know that, and in my version it forces you to strategise slightly differently for Neel / Alice. In the competition version, we know Alice is done as soon as she visits a place for a second time, but not in the original. So in the original we only have to consider ‘avoid one place’ rather than the multiple possibilities now of ‘avoid one place’ or ‘visit a place again’.

But I think the best idea is to get Alice to avoid one particular place c\not\equiv 0 whenever possible. At all times she has two possible options for where to go next, let’s say b_k+a_k, b_k-a_k in the language of the original statement. We lose nothing by assuming -N/2 < a_k\le N/2, and certainly it would be ridiculous for Joe / Hatter ever to choose a_k=0. The only time Alice’s strategy doesn’t work is when both of these are congruent to c, which implies N\,|\, 2a_k, and thus we must have N= 2a_k. In other words, Alice’s strategy will always work if N is odd.

I think it’s really worth noticing that the previous argument is weak. We certainly did not show that N must be odd for Alice to win. We showed that Alice can avoid a congruence class modulo an odd integer. We didn’t really need that odd integer to be N for this to work. In particular, if N has an odd factor p (say a prime), then the same argument works to show that we can avoid visiting any site with label congruent to 1 modulo p.

It’s actually very slightly more complicated. In the original argument, we didn’t need to use any property of b_k. But obviously here, if b_k\equiv 1 modulo p and p\,|\,a_k, then certainly b_{k+1}\equiv 1 modulo p. So we have to prove instead that Alice can ensure she never ‘visits 1 modulo p for the first time’. Which is fine, by the same argument.

So, we’ve shown that Neel / Alice wins if N is odd, or has an odd factor. The only values that remain are powers of 2. I should confess that I was genuinely a little surprised that Joe / Hatter wins in the power of 2 case. You can find a construction fairly easily for N=2 and N=4, but I suspected that might be a facet of small numbers. Why? Because it still felt we could avoid a particular site. In order for Alice’s strategy to fail, we have to end up exactly opposite the particular site at exactly the time when the next a_k=N/2, and so maybe we could try to avoid that second site as well, and so on backwards?

But that turned out to be a good example of something that got very complicated quite quickly with little insight. And, as discussed at the beginning, that’s often a sign in a competition problem that your idea isn’t so good. (Obviously, when composing a problem, that’s no guarantee at all. Sometimes things are true but no good ideas work.) So we want other ideas. Note that for N=4, the sequence (2,1,2) works for Joe / Hatter, because that forces Alice / Neel to visit either (0,2,1,3) or (0,2,3,1). In particular, this strategy gave Alice no control on the first step nor the last step, and the consequence is that we force her to visit the evens first, then transfer to an odd, and then force her to visit the other odd.

We might play around with N=8, or we might proceed directly to a general extension. If we have a Joe / Hatter strategy for N, then by doubling all the a_ks, we have a strategy for 2N which visits all the even sites in the first N steps. But then we can move to an odd site eg by taking a_N=1. Just as in the N=4 case, it doesn’t matter which odd site we start from, since if we again double all the a_ks, we will visit all the other odd sites. This gives us an inductive construction of a strategy for powers of two. To check it’s understood, the sequence for N=8 is (4,2,4,1,4,2,4).

Although we don’t use it, note that this strategy takes Alice on a tour of sites described by decreasing order of largest power of two dividing the label of the site.
