Lecture 6 – Local limits

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

By this point of the course, we’ve studied several aspects of the Erdos-Renyi random graph, especially in the sparse setting $G(n,\frac{\lambda}{n})$. We’ve also taken a lengthy detour to revise Galton-Watson trees, with a particular focus on the case of Poisson offspring distribution.

This is deliberate. Note that a given vertex v of $G(n,\frac{\lambda}{n})$ has some number of neighbours distributed as $\mathrm{Bin}(n-1,\frac{\lambda}{n})\stackrel{d}\approx\mathrm{Po}(\lambda)$, and the same approximation remains valid as we explore the graph (for example in a breadth-first fashion) either until we have seen a large number of vertices, or unless some ultra-pathological event happens, such as a vertex having degree n/3.

In any case, we are motivated by the notion that the local structure of $G(n,\frac{\lambda}{n})$ is well-approximated by the Galton-Watson tree with $\mathrm{Po}(\lambda)$ offspring, and in this lecture and the next we try to make this notion precise, and discuss some consequences when we can show that this form of convergence occurs.

Deterministic graphs

Throughout, we will be interested in rooted graphs, since by definition we have to choose a root vertex whose local neighbourhood is to be studied. Usually, we will study a sequence of rooted graphs $(G_n,\rho_n)$, where the vertex set of $G_n$ is [n], or certainly increasing in n (as in the first example).

For some rooted graph $(G,\rho)$, we say such a sequence $(G_n,\rho_n)$ converges to $(G,\rho)$ locally if for all radii $r\ge 1$, we have $B_r^{G_n}(\rho_n)\simeq B_r^G(\rho)$. In words, the neighbourhood around $\rho_n$ in $G_n$ is the same up to radius r as the neighbourhood around $\rho$ in $G$, so long as n is large enough (for given r).

This is best illustrated by an example, such as $T_n$, the binary tree to depth n. If we take $\rho_n$ to be the usual root, then the trees are nested, and converge locally to the infinite binary tree $T_\infty$. Slightly less obviously, if we take $\rho_n$ to be one of the leaves, then the trees are still nested (up to labelling – ie in the sense of isomorphisms of rooted trees), and converge locally to the canopy tree, defined by a copy of $\mathbb{Z}_{\ge 0}$ with nearest-neighbour edges, and where each vertex $n\ge 1$ is connected to the root of a disjoint copy of $T_{n-1}$, as shown below: Things get more interesting when the root is chosen randomly, for example, uniformly at random, as this encodes more global information about the graphs $G_n$. In the case where the $G_n$ are vertex-transitive, then if we only care about rooted graphs up to isomorphism, then it doesn’t matter how we choose the root.

Otherwise, we say that $G_n$ converges in the local weak sense to $(G,\rho)$ if, for all $r\ge 1$ and for all rooted graphs $(H,\rho_H)$, $\mathbb{P}\left( B^{G_n}_r(\rho_n)\simeq (H,\rho_H) \right) \longrightarrow \mathbb{P}\left( B_r^G(\rho)\simeq H\right),$

as $n\rightarrow\infty$.

Alternatively, one can phrase this as a result about convergence of rooted-graph-valued distributions.

A simple non-transitive example is $G_n\simeq P_n$, the path of length n. Then, the r-neighbourhood of a vertex is isomorphic to $P_{2r}$unless that vertex is within graph-distance (r-1) of one of the leaves of $G_n$. As $n\rightarrow\infty$, the proportion of such vertices vanishes, and so, $\mathbb{P}\left( B^{P_n}_r(\rho_n)\simeq P_{2r}\right)\rightarrow 1$, from which we conclude the unsurprising result that $P_{n}$ converges in the local weak sense to $\mathbb{Z}$. (Which is vertex-transitive, so it doesn’t matter where we select the root.)

The binary trees offer a slightly richer perspective. Let $\mathcal{L}_n$ be the set of leaves of $T_n$, and we claim that when $\rho_n$ is chosen uniformly from the vertices of $T_n$, then $d_{T_n}(\rho_n,\mathcal{L}_n)$ converges in distribution. Indeed, $\mathbb{P}\left( d_{T_n}(\rho_n,\mathcal{L}_n)=k\right) = \frac{2^{n-k}}{2^{n+1}-1}$, whenever $n\ge k$, and so the given distance converges in distribution to the Geometric distribution with parameter 1/2 supported on {0,1,2,…}.

This induces a random local weak limit, namely the canopy tree, rooted at one of the vertices we denoted by $\mathbb{Z}_{\ge 0}$, with the choice of this vertex given by Geometric(1/2).

Notation

Such limits, and the random graph versions to follow, are sometimes referred to as Benjamini-Schramm convergence, after the authors of the paper [LINK] in which this was first heavily introduced. Various subsequent authors use local weak convergence and Benjamini-Schramm convergence to denote a general distribution on roots versus a uniform measure on roots (or some similar distinction). For the purpose of this course, I followed the notation in Volume II of van der Hofstad’s notes exactly, and did not use the latter phrase.

A note on finiteness and tightness

Note at this point that we have not said anything about the class of random rooted graphs $(G,\rho)$ which may appear as a limit. However, it can be seen that they must be locally-finite if there’s a chance that they appear as a weak limit of a sequence of finite graphs.

In this course, we won’t think of metrizing the space of rooted graphs, but it’s worth mentioning that a tightness criterion for local weak convergence is that the degree of the root is uniformly integrable, meaning that very large degrees do not contribute significantly to the expected degree in the limit. This induces probabilistic bounds on the largest degree to be seen within the radius r neighbourhood (bearing in mind that size-biasing appears everywhere when studying degrees), and tightness follows.

Random graphs – quenched versus annealed settings

In the previous example, all the randomness came from the choice of root. We want to return to a setting where the graph itself is random as well, though will restrict attention to the case where the set of vertices of $G_n$ is always [n]. In this course, we also restrict attention to the two cases where either the root is chosen uniformly from the vertex set (as relevant to G(n,p)) or when the graph is a branching process tree with the usual (deterministic root).

We say that $(G_n,\rho_n)$ converges in distribution locally weakly to $(G,\rho)$ if, as before, $\mathbb{P}\left( B^{G_n}_r(\rho_n)\simeq (H,\rho_H) \right) \longrightarrow \mathbb{P}\left( B_r^G(\rho)\simeq H\right),$

as $n\rightarrow\infty$. Here, we are taking the probability over both sources of randomness on the LHS. This is more transparent if we rewrite as $\mathbb{P}\left(B^{G_n}_r(\rho_n) \simeq H\right) = \mathbb{E}_{G_n}\left[ \frac{1}{n}\sum_{v\in[n]} \mathbf{1}\left\{B^{G_n}_r(v) \simeq H\right\}\right]\longrightarrow \mathbb{P}\left(B_r^G(\rho)\simeq H\right).$

In particular, if the graph $G_n$ is vertex-transitive then $\mathbb{P}(B_r^{G_n}(\rho_n)\simeq H) = \mathbb{P}(B_r^{G_n}(1)\simeq H)$.

We also say that $(G_n,\rho_n)$ converges in probability locally weakly to $(G,\rho)$ if $\mathbb{P}\left( B^{G_n}_r(\rho_n)\simeq H \,\big|\, G_n\right)\stackrel{\mathbb{P}}\longrightarrow \mathbb{P}\left(B_r^G(\rho)\simeq H\right),$

as $n\rightarrow\infty$. Note now that the LHS is a random variable, depending on $G_n$ (but not on $\rho_n$). Again, the indicator function form may be more transparent: $\frac{1}{n}\sum_{v\in[n]} \mathbf{1}\{B^{G_n}_r(v)\simeq H\} \stackrel{\mathbb{P}}\longrightarrow \mathbb{P}\left( B^G_r(\rho)\simeq H\right).$

The choice of names for these convergence settings might be counter-intuitive on a first reading. In both settings, we are studying the number of vertices for which $(H,\rho_H)$ exactly describes its local neighbourhood.

• In the first setting, we require this quantity to converge in expectation after rescaling.
• In the second setting, we require this quantity to concentrate in distribution to a constant (which is the same as converging in probability to a constant).
• It’s probably not helpful actually to think of these as convergence in distribution vs convergence in probability of any quantity.
• This distinction is referred to in general as quenched (conditional on the graph) versus annealed (averaged over the graph).

It’s clear that this represents a division into first- and second-moment arguments, such as we’ve seen earlier in the course. Unsurprisingly, this induces a result about asymptotic independence, when we have local weak convergence in probability. In the lecture we phrased this as $\mathbb{P}\left(B^{G_n}_r(\rho_n^{(1)})\simeq H^{(1)},\, B^{G_n}_s(\rho_n^{(2)}) \simeq H^{(2)} \right)$ $\longrightarrow \mathbb{P}\left(B^G_r(\rho)\simeq H^{(1)}\right) \mathbb{P}\left( B^G_s(\rho)\simeq H^{(2)}\right).$

We prove this by conditioning on $G_n$, and then the conditional probability of the event on the LHS becomes $\left(\frac{1}{n} \sum_{v\in[n]} \mathbf{1}\{B_r^{G_n}(v)\simeq H^{(1)}\} \right) \left(\frac{1}{n}\sum_{w\in[n]} \mathbf{1}\{B_s^{G_n}(w)\simeq H^{(2)}\right).$

Both of these bracketed terms are random, but they converge in probability to constants by the convergence assumption. And it is true and straightforward to show that if $X_n\stackrel{\mathbb{P}}\rightarrow x$ and $Y_n\stackrel{\mathbb{P}}\rightarrow y$, then $X_nY_n\stackrel{\mathbb{P}}\rightarrow xy$, which shows that the given result in fact holds in the quenched setting, so certainly holds in the annealed setting (ie by removing the conditioning on $G_n$) too. Note throughout, that the fact the quantities of interest are bounded (by 1, since they are probabilities…) makes it easy to move from convergence in distribution to convergence in expectation.

Notes

• In the quenched setting, if we start breaking the setting that the pre-limit graphs are growing, we get some pathologies, such as a random graph not converging in probability locally weakly to itself. This phenomenon is a feature of the quenched setting, rather than the space of rooted graphs.
• Proving that $G(n,\frac{\lambda}{n})$ converges locally weakly in probability to the Galton-Watson tree with offspring distribution $\mathrm{Po}(\lambda)$ is not going to be part of the course, since the details are a more technical version of previous second-moment calculations. What is most relevant is that a lot useful results can be read off as a consequence of this property. At a meta level, it is preferable to do one combinatorially technical second-moment argument and derive lots of results from this, than to do slightly technical arguments from scratch every time.
• In the next lecture, we’ll return to the question of component sizes in $G(n,\frac{\lambda}{n})$, starting to consider the supercritical case $\lambda>1$. We will see that the weak local limits provide upper bounds on the size of the giant component (in generality), but will need more involved arguments to provide matching lower bounds.