# Critical Components in Erdos-Renyi

In various previous posts, I’ve talked about the phase transition in the Erdos-Renyi random graph process. Recall the definition of the process. Here we will use the Gilbert model G(n,p), where we have n vertices, and between any pair of vertices we add an edge, independently of other pairs with probability p. We are interested in the sparse scaling, where the typical vertex has degree O(1) in n, and so p=c/n for constant c>0, and we assume throughout that n is large. We could alternatively have considered the alternative Erdos-Renyi model where we choose uniformly at random from the set of graphs with n vertices and some fixed number of edges. Almost all the results present work equally well in this setting.

As proved by Erdos and Renyi, the typical component structure of such a graph changes noticeably around the threshold c=1. Below this, in the subcritical regime, all the components are small, meaning of size at most order O(log n). Above this, in the supercritical regime, there is a single giant component on some non-zero proportion of the vertices. The rest of the graph looks subcritical. The case c=1 exhibits a phase transition between these qualitatively different behaviours. They proved that here, the largest component is with high probability O(n^2/3). It seems that they thought this result held whenever c=1-o(1), but it turns out that this is not the case. In this post, I will discuss some aspects of behaviour around criticality, and the tools needed to treat them.

The first question to address is this: how many components of size n^{2/3} are there? It might be plausible that there is a single such component, like for the subsequent giant component. It might also be plausible that there are n^1/3 such components, so O(n) vertices are on such critical components. As then it is clear how we transition out of criticality into supercriticality – all the vertices on critical components coalesce to form the new giant component.

In fact neither of these are correct. The answer is that for all integers k>0, with high probability the k-th largest component is on a size scale of n^2/3. This is potentially a confusing statement. It looks like there are infinitely many such components, but of course for any particular value of n, this cannot be the case. We should think of there being w(1) components, but o(n^b) for any b>0.

The easiest way to see this is by a duality argument, as we have discussed previously for the supercritical phase. If we remove a component of size O(n^2/3), then what remains is a random graph with n-O(n^2/3) vertices, and edge probability the same as originally. It might make sense to rewrite this probability 1/n as

$\frac{1}{n-O(n^{2/3})}\cdot \frac{n-O(n^{2/3})}{n}=\frac{1-O(n^{-1/3})}{n-O(n^{2/3})}.$

The approximation in the final numerator is basically the same as

$1-o\left(n-O(n^{2/3})\right).$

Although we have no concrete reasoning, it seems at least plausible that this should look similar in structure to G(n,1/n). In particular, there should be another component of size

$O\left([n-O(n^{2/3})]^{2/3}\right)=O(n^{2/3})$.

In fact, the formal proof of this proceeds by an identical argument, only using the exploration process. Because I’ve described this several times before, I’ll be brief. We track how far we have gone through each component in a depth-first walk. In both the supercritical and subcritical cases, when we scale correctly we get a random path which is basically deterministic in the limit (in n). For exactly the same reasons as visible CLT fluctuations for partial sums of RVs with expectation zero, we start seeing interesting effects at criticality.

The important question is the order of rescaling to choose. At each stage of the exploration process, the number of vertices added to the stack is binomial. We want to distinguish between components of size $O(n^{2/3})$ so we should look at the exploration process at time $sn^{2/3}$. The drift of the exploration process is given by the expectation of a binomial random variable minus one (since we remove the current vertex from the stack as we finish exploring it). This is given by

$\mathbb{E}=\left[n-sn^{2/3}\right]\cdot \frac{1}{n}-1=-sn^{-1/3}.$

Note that this is the drift in one time-step. The drift in $n^{2/3}$ time-steps will accordingly by $sn^{1/3}$. So, if we rescale time by $n^{2/3}$ and space by $n^{1/3}$, we should get a nice stochastic process. Specifically, if Z is the exploration process, then we obtain:

$\frac{1}{n^{1/3}}Z^{(n)}_{sn^{2/3}} \rightarrow_d W_s,$

where W is a Brownian motion with inhomogeneous drift -s at time s. The net effect of such a drift at a fixed positive time is given by integrating up to that time, and hence we might say the process has quadratic drift, or is parabolic.

We should remark that our binomial expectation is not entirely correct. We have discounted those $sn^{2/3}$ vertices that have already been explored, but we have not accounted for the vertices currently in the stack. We should also be avoiding considering these. However, we now have a heuristic for the approximate number of these. The number of vertices in the stack should be $O(n^{1/3})$ at all times, and so in particular will always be an order of magnitude smaller than the number of vertices already considered. Therefore, they won’t affect this drift term, though this must be accounted for in any formal proof of convergence. On the subject of which, the mode of convergence is, unsurprisingly, weak convergence uniformly on compact sets. That is, for any fixed S, the convergence holds weakly on the random functions up to time $sn^{2/3}$.

Note that this process will tend to minus infinity almost surely. Component sizes are given by excursions above the running minimum. The process given by the height of the original process above the running minimum is called reflected. Essentially, we construct the reflected process by having the same generator when the current value is positive, and forcing the process up when it is at zero. There are various ways to construct this more formally, including as the scaling limit of some simple random walks conditioned never to stay non-negative.

The cute part of the result is that it holds equally well in a so-called critical window either side of the critical probability 1/n. When the probability is $\frac{1+tn^{-1/3}}{n}$, for any $t\in \mathbb{R}$, the same argument holds. Now the drift at time s is t-s, though everything else still holds.

This result was established by Aldous in [1], and gives a mechanism for calculating distributions of component sizes and so on through this critical window.

In particular, we are now in a position to answer the original question regarding how many such components there were. The key idea is that because whenever we exhaust a component in the exploration process, we choose a new vertex uniformly at random, we are effectively choosing a component according to the size-biased distribution. Roughly speaking, the largest components will show up near the beginning. Note that a critical $O(n^{2/3})$ component will not necessarily be exactly the first component in the exploration process, but the components that are explored before this will take up sufficiently few vertices that they won’t show up in the scaling of the limit.

In any case, the reflected Brownian motion ‘goes on forever’, and the drift is eventually very negative, so there cannot be infinitely wide excursions, hence there are infinitely many such critical components.

If we care about the number of cycles, we can treat this also via the exploration process. Note that in any depth-first search we are necessarily only interested in a spanning tree of the host graph. Anyway, when we are exploring a vertex, there could be extra edges to other vertices in the stack, but not to vertices we’ve already finished exploring (otherwise the edge would have been exposed then). So the expected number of excess edges into a vertex is proportional to the height of the exploration process at that vertex. So the overall expected number of excess edges, conditional on the exploration process is the area under the curve. This carries over perfectly well into the stochastic process limit. It is then a calculation to verify that the area under the curve is almost surely infinite, and thus that we expect there to be infinitely many cycles in a critical random graph.

REFERENCES

[1] Aldous D. – Brownian excursions, critical random graphs and the multiplicative coalescent