# Lecture 7 – The giant component

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

As we edge into the second half of the course, we are now in a position to return to the question of the phase transition between the subcritical regime $\lambda<1$ and the supercritical regime $\lambda>1$ concerning the size of the largest component $L_1(G(n,\lambda/n))$.

In Lecture 3, we used the exploration process to give upper bounds on the size of this largest component in the subcritical regime. In particular, we showed that $\frac{1}{n}\big| L_1(G(n,\lambda/n)) \big| \stackrel{\mathbb{P}}\rightarrow 0.$

If we used slightly stronger random walk concentration estimates (Chernoff bounds rather than 2nd-moment bounds from Chebyshev’s inequality), we could in fact have shown that with high probability the size of this largest component was at most some logarithmic function of n.

In this lecture, we turn to the supercritical regime. In the previous lecture, we defined various forms of weak local limit, and asserted (without attempting the notationally-involved combinatorial calculation) that the random graph $G(n,\lambda/n)$ converges locally weakly in probability to the Galton-Watson tree with $\text{Poisson}(\lambda)$ offspring distribution, as we’ve used informally earlier in the course.

Of course, when $\lambda>1$, this branching process has strictly positive survival probability $\zeta_\lambda>0$. At a heuristic level, we imagine that all vertices whose local neighbourhood is ‘infinite’ are in fact part of the same giant component, which should occupy $(\zeta_\lambda+o_{\mathbb{P}}(1))n$ vertices. In its most basic form, the result is

$$\frac{1}{n}\big|L_1(G(n,\lambda/n))\big|\;\stackrel{\mathbb{P}}\longrightarrow\; \zeta_\lambda,\qquad \frac{1}{n}\big|L_2(G(n,\lambda/n))\big| \;\stackrel{\mathbb{P}}\longrightarrow\; 0,\qquad (*)$$

where the second part is a uniqueness result for the giant component.
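As a numerical companion to (*), here is a minimal Python sketch that computes $\zeta_\lambda$ as the largest root in $[0,1]$ of $1-\zeta=e^{-\lambda\zeta}$, the usual fixed-point equation for the survival probability of a Poisson($\lambda$) GW tree. The function name, tolerance and iteration scheme are illustrative choices of ours, not anything from the course.

```python
import math


def survival_probability(lam, tol=1e-13, max_iter=1_000_000):
    """Survival probability zeta of a Poisson(lam) Galton-Watson tree.

    zeta is the largest root in [0, 1] of 1 - z = exp(-lam * z); for lam <= 1
    the tree dies out almost surely, so we return 0.0.
    """
    if lam <= 1:
        return 0.0
    # Iterating z -> 1 - exp(-lam z) from z = 1 decreases monotonically to zeta,
    # because the map is increasing and sends 1 strictly below 1.
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-lam * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


if __name__ == "__main__":
    for lam in (0.5, 1.5, 2.0, 3.0):
        print(lam, round(survival_probability(lam), 4))
```

For example, $\lambda=2$ gives $\zeta_2\approx 0.797$, so the giant component should contain roughly 80% of the vertices.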

The usual heuristic for proving this result is that all ‘large’ components must in fact be joined. For example, if there are two giant components, with sizes $\approx \alpha n$ and $\approx \beta n$, then each time we add a new edge between a uniformly chosen pair of vertices (such an argument is often called ‘sprinkling’), the probability that these two components are joined is $\approx 2\alpha\beta$. So if we add lots of edges (which happens as we move from edge probability $(\lambda-\epsilon)/n$ to $\lambda/n$), then with high probability these two components get joined.
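The arithmetic behind the sprinkling heuristic can be sketched as follows, under simplifying assumptions of ours: each sprinkled edge joins a uniformly random pair of distinct vertices independently, the two component sizes stay fixed, and moving from $(\lambda-\epsilon)/n$ to $\lambda/n$ adds roughly $\epsilon n/2$ edges. The helper names are hypothetical; this illustrates the heuristic only and is not a proof.

```python
def prob_sprinkled_edge_joins(a, b, n):
    """Probability that a uniformly random pair of distinct vertices of [n] has
    one endpoint in a fixed set of size a and the other in a disjoint set of size b."""
    return (a * b) / (n * (n - 1) / 2)


def prob_never_joined(alpha, beta, eps, n):
    """Heuristic chance that none of about eps * n / 2 independent sprinkled edges
    joins two components of sizes alpha * n and beta * n (sizes held fixed)."""
    p = prob_sprinkled_edge_joins(alpha * n, beta * n, n)  # close to 2 alpha beta
    m = int(eps * n / 2)  # roughly eps (n - 1) / 2 extra edges
    return (1 - p) ** m


if __name__ == "__main__":
    print(prob_sprinkled_edge_joins(300, 200, 1000))  # close to 2 * 0.3 * 0.2
    print(prob_never_joined(0.3, 0.2, 0.1, 10_000))   # vanishingly small
```

Since $(1-2\alpha\beta)^{\epsilon n/2}$ decays exponentially in $n$, even a small $\epsilon$ should suffice at this heuristic level; the difficulty in making it rigorous lies elsewhere.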

It is hard to make this argument rigorous, and the normal approach is to show that with high probability there are no components with sizes within a certain intermediate range (say between $\Theta(\log n)$ and $n^\alpha$), and then show that all larger components are the same by a joint exploration process or a technical sprinkling argument. Cf. the books of Bollobás and of Janson, Łuczak and Ruciński. See also this blog post (and the next page) for a readable online version of this argument.

I can’t find any version of the following argument, which takes the weak local convergence as an assumption, in the literature, but it seems appropriate to this course. It is worth noting that, as we shall see, the method is not hugely robust to adjustments, for example if one is seeking stronger estimates on the giant component (e.g. a CLT).

Anyway, we proceed in three steps:

Step 1: First we show, using the local limit, that for any $\epsilon>0$, $\frac{1}{n}\big|L_1(G(n,\lambda/n))\big| \le \zeta_\lambda+\epsilon,$ with high probability as $n\rightarrow\infty$.

Step 2: Using a lower bound on the exploration process, for $\epsilon>0$ small enough $\frac{1}{n}\big|L_1(G(n,\lambda/n))\big| \ge \epsilon,$ with high probability.

Step 3: Motivated by duality, we count isolated vertices to show $\mathbb{P}(\epsilon n\le |L_1| \le (\zeta_\lambda-\epsilon)n) \rightarrow 0.$

## Step 1

This step is unsurprising. The local limit gives control on how many vertices are in small components of various sizes, and so gives control on how many vertices are in small components of all finite sizes (taking limits in the right order). This gives a bound on how many vertices can be in the giant component.

(Note: parts of this argument appear in the text and exercises of Section 1.4 in the draft of Volume II of van der Hofstad’s notes, which can be found here.)

We can proceed in greater generality, by considering a sequence of random graphs $G_n$ which converge locally weakly in probability to $T$, a random tree, with survival probability $\zeta=\mathbb{P}(|T|=\infty)>0$. We will show that:

Proposition: $\mathbb{P}(|L_1(G_n)|\ge (\zeta+\epsilon)n) \rightarrow 0,$ for each $\epsilon>0$.

As a preliminary, note that for every $k\in\mathbb{N}$, there are finitely many rooted graphs $(H,\rho_H)$ of size $k$. We can also identify whether the component of a vertex has size $k$ by looking at the ball of radius $r>k$ around that vertex. In particular, by summing over all connected rooted graphs of size $k$, the weak local limit implies

$$\frac{1}{n}\sum_{v\in[n]} \mathbf{1}_{\{|C^{G_n}(v)|=k\}} = \sum_{|V(H)|=k}\frac{1}{n} \sum_{v\in[n]} \mathbf{1}_{\{B_r^{G_n}(v)\simeq (H,\rho_H)\}} \;\stackrel{\mathbb{P}}\longrightarrow\; \sum_{|V(H)|=k} \mathbb{P}\big(B_r^T(\rho)\simeq (H,\rho_H)\big) = \mathbb{P}(|T|=k).$$

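Furthermore, since $\mathbf{1}_{\{|C^{G_n}(v)|\ge k\}}=1-\sum_{j=1}^{k-1}\mathbf{1}_{\{|C^{G_n}(v)|=j\}}$ and $\mathbb{P}(|T|\ge k)=1-\sum_{j=1}^{k-1}\mathbb{P}(|T|=j)$, we can then control the tail as

$$\frac{1}{n}\sum_{v\in[n]} \mathbf{1}_{\{|C^{G_n}(v)|\ge k\}}\;\stackrel{\mathbb{P}}\longrightarrow\; \mathbb{P}(|T|\ge k).$$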

(Recall that the LHS of this statement is the proportion of vertices in components of size at least k.)
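To see this convergence in action for $G(n,\lambda/n)$, the sketch below compares the empirical proportion of vertices in components of size $k$ with $\mathbb{P}(|T|=k)$. For the Poisson($\lambda$) GW tree this is the Borel distribution, $\mathbb{P}(|T|=k)=e^{-\lambda k}(\lambda k)^{k-1}/k!$, a standard fact about Poisson total progeny that we assume here without proof. The naive $O(n^2)$ graph sampler and all function names are illustrative choices of ours.

```python
import math
import random
from collections import deque


def borel_pmf(k, lam):
    """P(|T| = k) for a Poisson(lam) Galton-Watson tree T (Borel distribution)."""
    # Work on the log scale to avoid overflow in (lam k)^(k-1) and k!.
    log_p = -lam * k + (k - 1) * math.log(lam * k) - math.lgamma(k + 1)
    return math.exp(log_p)


def sample_gnp(n, p, rng):
    """Adjacency lists of G(n, p), sampled naively by testing every pair."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def component_sizes(adj):
    """Sizes of all connected components, found by breadth-first search."""
    seen = [False] * len(adj)
    sizes = []
    for s in range(len(adj)):
        if seen[s]:
            continue
        seen[s] = True
        queue, size = deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        sizes.append(size)
    return sizes


def fraction_in_size(adj, k):
    """(1/n) * #{v : |C(v)| = k}, the left-hand side of the limit above."""
    return sum(s for s in component_sizes(adj) if s == k) / len(adj)


if __name__ == "__main__":
    lam, n = 2.0, 2000
    adj = sample_gnp(n, lam / n, random.Random(7))
    for k in range(1, 5):
        print(k, round(fraction_in_size(adj, k), 4), round(borel_pmf(k, lam), 4))
```

When $\lambda>1$ the Borel probabilities sum to $1-\zeta_\lambda$ rather than 1; the missing mass is exactly $\mathbb{P}(|T|=\infty)$.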

We will make the trivial but useful observation that in any graph the largest component has size at least $k$ precisely if at least $k$ vertices are in components of size at least $k$ (!). That is,

$$|L_1(G)|\ge k\quad\iff\quad \sum_{v\in[n]} \mathbf{1}_{\{|C^G(v)|\ge k\}} \ge k.$$
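Since this observation only depends on the multiset of component sizes, it can be checked mechanically. A small sketch, where (for brevity, as an assumption of ours) a graph is represented just by its list of component sizes and the helper names are hypothetical:

```python
def mass_at_least(sizes, k):
    """Number of vertices lying in components of size at least k."""
    return sum(s for s in sizes if s >= k)


def observation_holds(sizes):
    """Check |L_1| >= k  <=>  mass_at_least(sizes, k) >= k for every k = 1..n+1."""
    n = sum(sizes)
    return all(
        (max(sizes) >= k) == (mass_at_least(sizes, k) >= k)
        for k in range(1, n + 2)
    )


if __name__ == "__main__":
    print(mass_at_least([5, 3, 3, 1], 3), observation_holds([5, 3, 3, 1]))
```

The forward direction is the content: if $|L_1|\ge k$ then $L_1$ alone contributes at least $k$ vertices. The reverse direction only needs one vertex in a component of size at least $k$.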

Returning now to the problem at hand, we have $\mathbb{P}(|T|\ge k)\downarrow\zeta$ as $k\rightarrow\infty$, so we may pick k such that $\mathbb{P}(|T|\ge k)<\zeta+\epsilon$.

But then, using our ‘trivial but useful’ observation,

$$\mathbb{P}\big(|L_1(G_n)|\ge (\zeta+\epsilon)n\big) = \mathbb{P}\Big(\sum_{v\in[n]} \mathbf{1}_{\{|C^{G_n}(v)|\ge (\zeta+\epsilon)n \}} \ge (\zeta+\epsilon)n\Big) \le \mathbb{P}\Big(\frac{1}{n}\sum_{v\in[n]} \mathbf{1}_{\{|C^{G_n}(v)|\ge k\}} \ge \zeta+\epsilon\Big).\qquad (**)$$

Note that in the final step we have replaced $(\zeta+\epsilon)n$ by $k$ inside the indicator, which gives an upper bound as soon as $n$ is large enough that $(\zeta+\epsilon)n\ge k$. However, the random quantity inside the final probability is known to converge in probability to $\mathbb{P}(|T|\ge k)<\zeta+\epsilon$. So in fact the probability (**) vanishes as $n\rightarrow\infty$.
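The argument only needs some finite $k$ with $\mathbb{P}(|T|\ge k)<\zeta+\epsilon$. For the Poisson($\lambda$) tree this $k$ can be found numerically. The sketch below assumes the Borel formula $\mathbb{P}(|T|=k)=e^{-\lambda k}(\lambda k)^{k-1}/k!$ (a standard fact, not derived in this post) and computes $\zeta$ by fixed-point iteration; the function names are hypothetical.

```python
import math


def survival_probability(lam, tol=1e-13, max_iter=1_000_000):
    """Largest root in [0, 1] of 1 - z = exp(-lam z); equals 0 when lam <= 1."""
    if lam <= 1:
        return 0.0
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-lam * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def borel_pmf(k, lam):
    """P(|T| = k) for a Poisson(lam) Galton-Watson tree (log scale for stability)."""
    return math.exp(-lam * k + (k - 1) * math.log(lam * k) - math.lgamma(k + 1))


def choose_k(lam, eps, k_max=100_000):
    """Smallest k with P(|T| >= k) < zeta + eps, returned with that tail value."""
    zeta = survival_probability(lam)
    tail = 1.0  # P(|T| >= 1)
    for k in range(1, k_max + 1):
        if tail < zeta + eps:
            return k, tail
        tail -= borel_pmf(k, lam)  # tail is now P(|T| >= k + 1)
    raise ValueError("no suitable k found below k_max")


if __name__ == "__main__":
    print(choose_k(2.0, 0.05))
```

For $\lambda=2$ and $\epsilon=0.05$ this should return $k=3$: $\mathbb{P}(|T|\ge 2)=1-e^{-2}\approx 0.865$ is still above $\zeta_2+0.05\approx 0.847$, but $\mathbb{P}(|T|\ge 3)=1-e^{-2}-2e^{-4}\approx 0.828$ is below it.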

## Step 2

Remember the exploration process, where $v=v_1,v_2,\ldots,v_n$ is a labelling of the vertices of $G(n,\lambda/n)$ in breadth-first order. Defining $X_i:= \#\{w\in[n]\setminus\{v_1,\ldots,v_i\}\,:\, w\in\Gamma(v_i),\,w\not\in \Gamma(v_j),\,j\in[i-1]\},$

the number of children of vertex $v_i$, we set $S_0:=0,\quad S_i:=S_{i-1}+(X_i-1),\; i\ge 1,$

to be (a version of) the exploration process. It will be useful to study $H_0:=0,\quad H_k:=\min\{i\,:\, S_i=-k\},$

the hitting times of $-k$, as then $\{v_{H_{k-1}+1},\ldots,v_{H_k}\}$ is the $k$th component to be explored.
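For concreteness, here is a sketch of this breadth-first exploration on a graph stored as adjacency lists. Whenever the current component is exhausted, it starts a new one at the smallest undiscovered label (this root-selection rule is our assumption; any rule works). It records $X_i$ and $S_i$, and recovers the component sizes $H_k-H_{k-1}$ from the hitting times. All names are hypothetical.

```python
import random
from collections import deque


def sample_gnp(n, p, rng):
    """Adjacency lists of G(n, p), sampled naively by testing every pair."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def explore(adj):
    """Breadth-first exploration; returns (X, S) with S[0] = 0.

    X[i-1] is the number of children of v_i (newly discovered neighbours),
    and S[i] = S[i-1] + X[i-1] - 1.
    """
    n = len(adj)
    discovered = [False] * n
    queue = deque()
    X, S = [], [0]
    root = 0
    for _ in range(n):
        if not queue:
            # Component exhausted: start a new one at the smallest undiscovered
            # label. The root is not counted as anyone's child.
            while discovered[root]:
                root += 1
            discovered[root] = True
            queue.append(root)
        v = queue.popleft()  # the next vertex v_i in breadth-first order
        children = 0
        for w in adj[v]:
            if not discovered[w]:
                discovered[w] = True
                queue.append(w)
                children += 1
        X.append(children)
        S.append(S[-1] + children - 1)
    return X, S


def components_from_hitting_times(S):
    """Component sizes H_k - H_{k-1}, where H_k is the first time S hits -k."""
    sizes, last, running_min = [], 0, 0
    for i in range(1, len(S)):
        if S[i] < running_min:  # S drops by at most 1, so S[i] = running_min - 1
            running_min = S[i]
            sizes.append(i - last)
            last = i
    return sizes


if __name__ == "__main__":
    X, S = explore(sample_gnp(1000, 2.0 / 1000, random.Random(1)))
    print(max(components_from_hitting_times(S)))
```

On the six-vertex graph with edges $\{0,1\},\{1,2\},\{4,5\}$ (and vertex 3 isolated), this gives $S=(0,0,0,-1,-2,-2,-3)$, so $H_1=3$, $H_2=4$, $H_3=6$ and the component sizes are $3,1,2$.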

Unlike for a tree, we now have multiple components: essentially, the process reaches a new running minimum, one lower than the last, each time a component is finished and a new one is started, which means that the current value no longer describes the number of vertices on the stack. In general, this is given by $S_i - \min_{0\le j\le i}S_j$, and so, conditionally on $X_1,\ldots,X_{i-1}$,

$$X_i\stackrel{d}= \text{Bin}\Big(n-i-\big(S_{i-1}-\min_{0\le j\le i-1}S_j\big),\, \frac{\lambda}{n}\Big),$$

which stochastically dominates $\text{Bin}(n-2i-S_{i-1},\,\frac{\lambda}{n})$, since $S_j\ge S_{j-1}-1$ gives $S_j\ge -j$ for every $j$, and hence $\min_{0\le j\le i-1}S_j\ge -i$. We note that this bound is extremely crude.

We want to study whether $S_i$ ever exceeds $\epsilon n$, for some $\epsilon>0$ to be determined later.

For reasons that will become clear in the following deduction, it’s convenient to fix $\alpha>0$ small such that $\lambda(1-2\alpha)>1$, and then choose $\epsilon>0$ such that

$$\alpha\left[\lambda(1-2\alpha-\epsilon)-1\right]>\epsilon.$$

This is possible by continuity, since when $\epsilon=0$ the left-hand side is strictly positive.
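The choice of $\alpha$ and $\epsilon$ can be made explicit. One option (ours, certainly not the only one) is $\alpha=(1-1/\lambda)/4$, which gives $\lambda(1-2\alpha)=(\lambda+1)/2>1$. Writing $g:=\alpha[\lambda(1-2\alpha)-1]>0$, the second condition reads $g-\alpha\lambda\epsilon>\epsilon$, so any $\epsilon<g/(1+\alpha\lambda)$ works, and the sketch takes half of this bound.

```python
def choose_alpha_eps(lam):
    """One explicit (alpha, eps) with lam(1 - 2 alpha) > 1 and
    alpha[lam(1 - 2 alpha - eps) - 1] > eps, valid for any lam > 1.

    This particular choice is only an illustration; many others work.
    """
    if lam <= 1:
        raise ValueError("need lam > 1")
    alpha = (1 - 1 / lam) / 4                  # then lam(1 - 2 alpha) = (lam + 1) / 2
    gap = alpha * (lam * (1 - 2 * alpha) - 1)  # left-hand side at eps = 0, positive
    eps = gap / (2 * (1 + alpha * lam))        # half of the largest admissible eps
    return alpha, eps


if __name__ == "__main__":
    print(choose_alpha_eps(2.0))
```

For $\lambda=2$ this gives $\alpha=0.125$ and $\epsilon=0.025$. The values shrink quickly as $\lambda\downarrow 1$, which is one sign of how crude the bounds in this step are.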

Now, when $i\le \alpha n$ and $S_{i-1}\le \epsilon n$, we have $X_i\ge_{st} \text{Bin}(n(1-2\alpha-\epsilon),\frac{\lambda}{n}).$

The following argument requires some kind of submartingale approach (involving coupling with a simpler process at the stopping time) to make rigorous, which is beyond the scope of this course’s prerequisites.

However, informally, if we assume that $\max_{i\le \alpha n} S_i\le \epsilon n$, ‘then’ $\frac{1}{n}S_{\alpha n}\ge_{st} \frac{1}{n}\text{Bin}(\alpha n\cdot n(1-2\alpha-\epsilon),\,\frac{\lambda}{n}) - \alpha.$

But this distribution is concentrated around its mean $\alpha\left[\lambda(1-2\alpha-\epsilon)-1\right]$, which by our (somewhat obscure) choice of $\epsilon$ is $>\epsilon$ (!), contradicting the assumption on the maximum. Thus we conclude that $\mathbb{P}(\max_{i\le \alpha n} S_i\le \epsilon n)\rightarrow 0$ as $n\rightarrow\infty.$

Hence $\max_{i\le \alpha n} S_i\ge\epsilon n$ holds with high probability. But remember that $S_{i+1}\ge S_i -1$, so if $S_i\ge \epsilon n$, then all of $S_i,S_{i+1},\ldots,S_{i+\lfloor \epsilon n\rfloor}$ are non-negative. A component is only completed when the process reaches a new running minimum, which is always strictly negative, so $v_i,v_{i+1},\ldots,v_{i+\lfloor \epsilon n\rfloor}$ all lie in the same component of the graph, and $|L_1(G(n,\lambda/n))|\ge \epsilon n$ with high probability.
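The conclusion of Step 2 is easy to observe in simulation. The sketch below (a naive $O(n^2)$ sampler; names hypothetical) runs the breadth-first exploration on $G(n,\lambda/n)$ with $\lambda=2$ and checks that the walk exceeds $\epsilon n$ within the first $\alpha n$ steps, for $\alpha=0.125$ and $\epsilon=0.025$. These values satisfy both conditions: $2(1-0.25)=1.5>1$ and $0.125\,[2(0.725)-1]=0.05625>0.025$. A single simulation is of course only an illustration, not evidence for the rigorous statement.

```python
import random
from collections import deque


def sample_gnp(n, p, rng):
    """Adjacency lists of G(n, p), sampled naively by testing every pair."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def exploration_walk(adj):
    """S_0, ..., S_n for breadth-first exploration, restarting at the smallest
    undiscovered label whenever a component is exhausted."""
    n = len(adj)
    discovered = [False] * n
    queue, S, root = deque(), [0], 0
    for _ in range(n):
        if not queue:
            while discovered[root]:
                root += 1
            discovered[root] = True
            queue.append(root)
        v = queue.popleft()
        children = 0
        for w in adj[v]:
            if not discovered[w]:
                discovered[w] = True
                queue.append(w)
                children += 1
        S.append(S[-1] + children - 1)
    return S


def early_max(S, alpha):
    """max over 0 <= i <= alpha * n of S_i, where n = len(S) - 1."""
    n = len(S) - 1
    return max(S[: int(alpha * n) + 1])


if __name__ == "__main__":
    n, lam, alpha, eps = 1500, 2.0, 0.125, 0.025
    S = exploration_walk(sample_gnp(n, lam / n, random.Random(3)))
    print(early_max(S, alpha), eps * n)
```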

## Step 3

The motivation for this section is duality. Recall (from Lecture 5) that if we condition a supercritical Poisson GW tree on extinction, we obtain the distribution of a dual subcritical Poisson GW tree. This relation carries over to the world of the sparse Erdős–Rényi random graph: if you exclude the giant component, you are left with a subcritical random graph (on a smaller vertex set), and this applies equally well to the local limits. Essentially, if we exclude a component and take the local limit of what remains, we get the wrong answer unless the component we excluded was either a giant component with size $\approx \zeta_\lambda n$, or small.

As we shall see, this effect is captured sufficiently by counting isolated vertices.

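First, we state a Fact: when $1-\zeta_\lambda<y<1$, then $ye^{-\lambda y}>e^{-\lambda}$. This is easily checked by comparing derivatives: $f(y):=ye^{-\lambda y}$ has $f'(y)=(1-\lambda y)e^{-\lambda y}$, so $f$ is increasing on $(0,1/\lambda)$ and decreasing on $(1/\lambda,\infty)$, while $f(1)=e^{-\lambda}$ and, since $1-\zeta_\lambda=e^{-\lambda\zeta_\lambda}$, also $f(1-\zeta_\lambda)=e^{-\lambda}$. This will be useful shortly.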
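The Fact can also be sanity-checked numerically. A minimal sketch, assuming $\zeta_\lambda$ is computed by iterating $z\mapsto 1-e^{-\lambda z}$ from $z=1$ (our choice of method); it evaluates $ye^{-\lambda y}$ on a grid strictly inside $(1-\zeta_\lambda,1)$. This is a check at finitely many points, not a proof.

```python
import math


def survival_probability(lam, tol=1e-13, max_iter=1_000_000):
    """Largest root in [0, 1] of 1 - z = exp(-lam z); equals 0 when lam <= 1."""
    if lam <= 1:
        return 0.0
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-lam * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def fact_holds(lam, grid=10_000):
    """Check y exp(-lam y) > exp(-lam) on a grid strictly inside (1 - zeta, 1)."""
    zeta = survival_probability(lam)
    lo, target = 1.0 - zeta, math.exp(-lam)
    ys = (lo + zeta * j / grid for j in range(1, grid))  # interval has length zeta
    return all(y * math.exp(-lam * y) > target for y in ys)


if __name__ == "__main__":
    print([fact_holds(lam) for lam in (1.2, 2.0, 4.0)])
```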

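Now, we study $I_n$, the number of isolated vertices in $G_n=G(n,\lambda/n)$, conditional on $\{1,2,\ldots,k\}$ being a component, for various values of $k$. Conditional on this event, the graph induced on $\{k+1,\ldots,n\}$ is distributed as $G(n-k,\lambda/n)$, so as long as $k\ge 2$ (so that none of $1,\ldots,k$ is isolated), we have

$$\mathbb{E}\left[I_n\,\big|\, \{1,2,\ldots,k\}\text{ a cpt}\right] = (n-k)\left(1-\frac{\lambda}{n}\right)^{n-k-1},$$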

for exactly the same reason as when we did this calculation for the original graph several lectures back. We will consider k in the range $\epsilon n \le k\le (\zeta_\lambda - \epsilon) n$.

Since $(1-\frac{\lambda}{n})^{n-k-1}= e^{-\lambda(n-k)/n}(1+O(1/n))$ uniformly in $k$, we can take a limit of this expectation appropriately uniformly, to obtain

$$\liminf_{n\rightarrow\infty} \frac{1}{n}\min_{\epsilon n\le k\le (\zeta_\lambda-\epsilon)n} (n-k)\left(1-\frac{\lambda}{n}\right)^{n-k-1} \ge \min_{\epsilon \le x \le \zeta_\lambda-\epsilon} (1-x)e^{-\lambda(1-x)}\ge e^{-\lambda}+\epsilon',$$

for some $\epsilon'>0$. The final inequality uses the Fact above with $y=1-x\in[1-\zeta_\lambda+\epsilon,\,1-\epsilon]$: the continuous function being minimised is strictly greater than $e^{-\lambda}$ on this compact interval. So

$$\liminf_{n\rightarrow\infty} \min_{\epsilon n\le k\le (\zeta_\lambda-\epsilon)n} \frac{1}{n}\mathbb{E}\left[ I_n\,\big|\, \{1,\ldots,k\}\text{ a cpt}\right] \ge e^{-\lambda}+\epsilon'.$$
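To get a feel for the size of $\epsilon'$, one can evaluate both sides of this bound numerically. The sketch below (function names hypothetical) uses the unimodality of $y\mapsto ye^{-\lambda y}$ to evaluate the limiting minimum at the two endpoints $x=\epsilon$ and $x=\zeta_\lambda-\epsilon$ only. It then compares this with the exact finite-$n$ minimum of $(n-k)(1-\frac{\lambda}{n})^{n-k-1}/n$. It assumes $0<\epsilon<\zeta_\lambda/2$ so that the range of $k$ is non-empty.

```python
import math


def survival_probability(lam, tol=1e-13, max_iter=1_000_000):
    """Largest root in [0, 1] of 1 - z = exp(-lam z); equals 0 when lam <= 1."""
    if lam <= 1:
        return 0.0
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-lam * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def isolated_expectation(n, k, lam):
    """E[I_n | {1,...,k} is a component] for k >= 2: (n-k)(1-lam/n)^(n-k-1)."""
    return (n - k) * (1 - lam / n) ** (n - k - 1)


def eps_prime(lam, eps):
    """min over eps <= x <= zeta - eps of (1-x) exp(-lam(1-x)), minus exp(-lam).

    y -> y exp(-lam y) increases then decreases, so its minimum over an
    interval is attained at an endpoint.
    """
    zeta = survival_probability(lam)

    def g(x):
        return (1 - x) * math.exp(-lam * (1 - x))

    return min(g(eps), g(zeta - eps)) - math.exp(-lam)


def finite_min(n, lam, eps):
    """(1/n) * min over eps n <= k <= (zeta - eps) n of isolated_expectation."""
    zeta = survival_probability(lam)
    ks = range(math.ceil(eps * n), math.floor((zeta - eps) * n) + 1)
    return min(isolated_expectation(n, k, lam) for k in ks) / n


if __name__ == "__main__":
    print(eps_prime(2.0, 0.05), finite_min(100_000, 2.0, 0.05) - math.exp(-2.0))
```

For $\lambda=2$ and $\epsilon=0.05$ the minimum is attained at $x=\epsilon$ and $\epsilon'\approx 0.0068$: small, but strictly positive, which is all the argument needs.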

But $\frac{I_n}{n}\in[0,1]$, and for any random variable $Y\in[0,1]$ and any $a\ge 0$ we have $\mathbb{E}[Y]\le a+\mathbb{P}(Y\ge a)$, so lower bounds on the expectation give lower bounds on the upper tail. Taking $a=e^{-\lambda}+\frac{\epsilon'}{2}$, this leads to

$$\liminf_{n\rightarrow\infty}\min_{\epsilon n\le k\le (\zeta_\lambda-\epsilon)n} \mathbb{P}\left( \frac{I_n}{n}\ge e^{-\lambda}+\frac{\epsilon'}{2}\,\Big|\, \{1,\ldots,k\}\text{ a cpt} \right) \ge \frac{\epsilon'}{2}>0.$$

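However, we know $\frac{I_n}{n}\stackrel{\mathbb{P}}\rightarrow e^{-\lambda}$ (for example by local convergence…). By symmetry, conditioning on $C^{G_n}(1)=A$ for any fixed set $A\ni 1$ with $|A|=k\ge 2$ has the same effect on the law of $I_n$ as conditioning on $\{1,\ldots,k\}$ being a component, so summing over such sets $A$,

$$\mathbb{P}\left(\frac{I_n}{n}\ge e^{-\lambda}+\frac{\epsilon'}{2}\right)\ge \min_{\epsilon n\le k\le(\zeta_\lambda-\epsilon)n}\mathbb{P}\left(\frac{I_n}{n}\ge e^{-\lambda}+\frac{\epsilon'}{2}\,\Big|\,\{1,\ldots,k\}\text{ a cpt}\right)\cdot\mathbb{P}\left(\epsilon n\le |C^{G_n}(1)|\le (\zeta_\lambda-\epsilon)n\right).$$

The left-hand side vanishes while the minimum is bounded away from zero, so the probability of the conditioning event in question must also vanish, i.e. $\mathbb{P}\left(\epsilon n\le |C^{G_n}(1)|\le (\zeta_\lambda-\epsilon)n\right)\rightarrow 0.$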

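Finally, by exchangeability, vertex $1$ lies in $L_1(G_n)$ with conditional probability $|L_1(G_n)|/n$, which is at least $\epsilon$ on the event $\{\epsilon n\le |L_1(G_n)|\}$. Since

$$\mathbb{P}\left(\epsilon n \le |C^{G_n}(1)|\le (\zeta_\lambda-\epsilon)n\right) \ge \epsilon\,\mathbb{P}\left(\epsilon n\le |L_1(G_n)|\le (\zeta_\lambda-\epsilon)n\right),$$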

the corresponding result holds for the largest component, not just the observed component.

Uniqueness can be obtained by a slight adjustment of Step 1. Morally, Step 1 is saying that a proportion asymptotically at most $\zeta_\lambda$ of the vertices are in large components, so it is possible (and an exercise in the course) to adjust the argument to show $\frac{1}{n}|L_1|+\frac{1}{n}|L_2| \le \zeta_\lambda+\epsilon,$ with high probability,

from which, together with $\frac{1}{n}|L_1|\stackrel{\mathbb{P}}\rightarrow\zeta_\lambda$ (Steps 1–3), the uniqueness result follows immediately.
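Finally, here is a quick simulation of (*), including the uniqueness statement. It uses a naive $O(n^2)$ sampler and hypothetical function names, and a single sample at moderate $n$ is only an illustration: one should expect $|L_1|/n$ near $\zeta_2\approx 0.797$ and $|L_2|/n$ close to $0$ for $\lambda=2$, and both small for $\lambda=0.5$.

```python
import random
from collections import deque


def sample_gnp(n, p, rng):
    """Adjacency lists of G(n, p), sampled naively by testing every pair."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def component_sizes(adj):
    """Sizes of all connected components, found by breadth-first search."""
    seen = [False] * len(adj)
    sizes = []
    for s in range(len(adj)):
        if seen[s]:
            continue
        seen[s] = True
        queue, size = deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        sizes.append(size)
    return sizes


def two_largest_fractions(n, lam, rng):
    """(|L_1|/n, |L_2|/n) for one sample of G(n, lam/n)."""
    sizes = sorted(component_sizes(sample_gnp(n, lam / n, rng)), reverse=True)
    second = sizes[1] if len(sizes) > 1 else 0
    return sizes[0] / n, second / n


if __name__ == "__main__":
    for lam in (0.5, 2.0):
        print(lam, two_largest_fractions(2000, lam, random.Random(11)))
```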

It’s also worth noting that this is an example of a bootstrapping argument, where we show a weak version of our goal result (in Step 2), and then use this to show the full result.

Note also that we can use the duality principle to show logarithmic bounds on the size of the second-largest component in exactly the same way that we showed logarithmic bounds on the size of the largest component in the subcritical regime. The whole point of duality is that these are the same problem!