I’ve spent a bit of time this week reading and doing all the exercises from some excellent notes by van der Hofstad about random graphs. I think they are absolutely excellent and would not be surprised if they become the standard text for an introduction to probabilistic combinatorics. You can find them hosted on the author’s website. I’ve been reading chapters 4 and 5, which approaches the properties of phase transitions in G(n,p) by formalising the analogy between component sizes and population sizes in a binomial branching process. When I met this sort of material for the first time during Part III, the proofs generally relied on careful first and second moment bounds, which is fine in many ways, but I enjoyed vdH’s (perhaps more modern?) approach, as it seems to give a more accurate picture of what is actually going on. In this post, I am going to talk about using the branching process picture to explain why the giant component emerges when it does, and how to get a grip on how large it is at any time after it has emerged.

**Background**

A quick tour through the background, and in particular the notation will be required. At some point I will write a post about this topic in a more digestible format, but for now I want to move on as quickly as possible.

We are looking at the sparse random graph , in the super-critical phase . With high probability (that is, with probability tending to 1 as n grows), we have a so-called giant component, with O(n) vertices.

Because all the edges in the configuration are independent, we can view the component containing a fixed vertex as a branching process. Given vertex v(1), the number of neighbours is distributed like . The number of neighbours of each of these which we haven’t already considered is then , conditional on k, the number of vertices we have already discounted. After any finite number of steps, k=o(n), and so it is fairly reasonable to approximate this just by . Furthermore, as n grows, this distribution converges to , and so it is natural to expect that the probability that the fixed vertex lies in a giant component is equal to the survival probability (that is, the probability that it is infinite) of a branching process with offspring distribution. Note that given a graph, the probability of a fixed vertex lying in a giant component is equal to the fraction of the vertex in the giant component. At this point it is clear why the emergence of the giant component must happen at , because we require for the survival probability to be non-zero. Obviously, all of this needs to be made precise and rigorous, and this is treated in sections 4.3 and 4.4 of the notes.

**Exploration Process**

A common functional of a rooted branching process to consider is the following. This is called in various places an exploration process, a depth-first process or a Lukasiewicz path. We take a depth-first labelling of the tree v(0), v(1), v(2),… , and define c(k) to be the number of children of vertex v(k). We then define the exploration process by:

By far the best way to think of this is to imagine we are making the depth-first walk on the tree. S(k) records how many vertices we have seen (because they are connected by an edge to a vertex we have visited) but have not yet visited. To clarify understanding of the definition, note that when you arrive at a vertex with no children, this should decrease by one, as you can see no new vertices, but have visited an extra one.

This exploration process is useful to consider for a couple of reasons. Firstly, you can reconstruct the branching process directly from it. Secondly, while other functionals (eg the height, or contour process) look like random walks, the exploration process genuinely is a random walk. The distribution of the number of children of the next vertex we arrive at is independent of everything we have previously seen in the tree, and is the same for every vertex. If we were looking at branching processes in a different context, we might observe that this gives some information in a suitably-rescaled limit, as rescaled random walks converge to Brownian motion if the variance of the (offspring) distribution is finite. (This is Donsker’s result, which I should write something about soon…)

The most important property is that the exploration process returns to 0 precisely when we have exhausted all the vertices in a component. At that point, we have seen exactly the vertices which we have explored. There is no reason not to extend the definition to forests, that is a union of trees. The depth-first exploration is the same – but when we have exhausted one component, we move onto another component, chosen according to some labelling property. Then, running minima of the exploration process (ie times when it is smaller than it has been before) correspond to jumping between components, and thus excursions above the minimum to components themselves. The running minimum will be non-positive, with absolute value equal to the number of components already exhausted.

Although the exploration process was defined with reference to and in the language of trees, the result of a branching process, this is not necessary. With some vertex denoted as the root, we can construct a depth-first labelling of a general graph, and the exploration process follows exactly as before. Note that we end up ignoring all edges except a set that forms a forest. This is what we will apply to G(n,p).

**Exploring G(n,p)**

When we jump between components in the exploration process on a supercritical (that is ) random graph, we move to a component chosen randomly with size-biased distribution. If there is a giant component, as we know there is in the supercritical case, then this will dominate the size-biased distribution. Precisely, if the giant component takes up a fraction H of the vertices, then the number of components to be explored before we get to the giant component is geometrically distributed with parameter H. All other components have size O(log n), so the expected number of vertices explored before we get to the giant component is O(log n)/H = o(n), and so in the limit, we explore the giant component immediately.

The exploration process therefore gives good control on the giant component in the limit, as roughly speaking the first time it returns to 0 is the size of the giant component. Fortunately, we can also control the distribution of S_t, the exploration process at time t. We have that:

This is not too hard to see. is number of vertices we have explored or seen, ie are connected to a vertex we have explored. Suppose the remaining vertices are called unseen, and we began the exploration at vertex 1. Then any vertex with label in {2,…,n} is unseen if it successively avoids being in the neighbourhood of v(1), v(2), … v(t). This happens with probability , and so the probability of being an explored or seen vertex is the complement of this.

In the supercritical case, we are taking with , and we also want to speed up S, so that all the exploration processes are defined on [0,1], and rescale the sizes by n, so that it records the fraction of the graph rather than the number of vertices. So we set consider the rescaling .

It is straightforward to use the distribution of S_t we deduce that the asymptotic mean .

Now we are in a position to provide more concrete motivation for the claim that the proportion of vertices in the giant component is , the survival probability of a branching process with offspring distribution. It helps to consider instead the extinction probability . We have:

where the second equality is a consequence of the simple form for the moment generating function of the Poisson distribution.

As a result, we have that . In fact we also have a central limit theorem for S_t, which enables us to deduce that with high probability, as well as in expectation, which is precisely what is required to prove that the giant component of has size .

###### Related articles

- Branching Processes and Dwass’s Theorem (eventuallyalmosteverywhere.wordpress.com)
- Coloring a Unicyclic Graph (cs.stackexchange.com)
- Flaw in this Vertex Cover Algorithm (cs.stackexchange.com)
- How many edges must a graph with N vertices have in order to guarantee that it is connected? (cs.stackexchange.com)

Pingback: Analytic vs Probabilistic Arguments for Supercritical BP | Eventually Almost Everywhere

Pingback: From G(n,p) to G(n,m) | Eventually Almost Everywhere

Pingback: Poisson Tails | Eventually Almost Everywhere

Pingback: Random Graphs Week – Monday | Eventually Almost Everywhere