# Analytic vs Probabilistic Arguments for a Supercritical BP

This follows on directly from the previous post. I was originally going to talk only about what follows, but I got rather carried away with the branching process account. I was stuck on a particular exercise, and we ended up coming up with two arguments: one analytic and one probabilistic. Since the typical flavour of this blog is to present problems which show the advantage of the probabilistic approach, it seems only fair to remark on this case, where the analytic method was less interesting, but much simpler.

Recall that we have a supercritical random graph $G(n,\frac{\lambda}{n}), \lambda>1$, and we are considering the rescaled exploration process $S_{nt}$, which has asymptotic mean $\mu_t=1-t-e^{-\lambda t}$. We can calculate similarly an expression for the asymptotic variance

$\frac{\text{Var}(S_{nt})}{n}\rightarrow v_t=e^{-\lambda t}(1-e^{-\lambda t}).$

To use this to verify the result about the size of the giant component, we verify that $\mu_t$ is negative at $t=\zeta_\lambda+x/\sqrt{n}$, and that $S_{nt}$ has small variance there, which would confirm that the giant component has rescaled size bounded above by $\zeta_\lambda$ almost surely. A similar argument is required for the lower bound. The variance is a separate matter, but it is therefore necessary that $\mu_t$ should be decreasing at $t=\zeta_\lambda$, that is $\mu_{\zeta_\lambda}'=\lambda e^{-\lambda \zeta_\lambda}-1<0$. This is what we try to prove in the remainder of this post. Recall that in the previous post we checked that $\mu_t$ itself is equal to zero at $t=\zeta_\lambda$.
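Before proving anything, it is reassuring to see the claim numerically. The following is a minimal sketch, not part of the original argument: the function names `mu`, `giant_fraction` and `slope_at_root` are made up for illustration, and the root $\zeta_\lambda$ is found by plain bisection.

```python
import math

def mu(lam, t):
    """Asymptotic mean of the rescaled exploration process: 1 - t - exp(-lam * t)."""
    return 1.0 - t - math.exp(-lam * t)

def giant_fraction(lam, tol=1e-13):
    """Positive root zeta of mu(lam, .) for lam > 1, located by bisection."""
    # mu is positive at its maximiser t = log(lam) / lam and negative at t = 1,
    # so the positive root is bracketed between the two.
    lo, hi = math.log(lam) / lam, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mu(lam, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def slope_at_root(lam):
    """mu'(zeta) = lam * exp(-lam * zeta) - 1."""
    return lam * math.exp(-lam * giant_fraction(lam)) - 1.0

for lam in (1.1, 1.5, 2.0, 4.0):
    print(f"lambda={lam}: zeta={giant_fraction(lam):.6f}, slope={slope_at_root(lam):.6f}")
```

For each of these (arbitrarily chosen) values of $\lambda$ the printed slope should come out negative, which is the statement we are about to prove.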

Heuristic Explanation

$\mu_t$ has been rescaled from the original definition of the exploration process in both size and time-scale, so some care is needed to see why this should hold in the limit. Remember that all components apart from the giant component are of size $O(\log n)$. So immediately after exhausting the giant component, you are likely to be visiting components of size roughly $\log n$. A time interval of $dt$ for $\mu$ corresponds to $n\,dt$ for $S$, during which $S$ will visit some components of size $\log n$ and some of $O(1)$ and some in between. In particular, some fixed proportion of vertices are isolated, that is, in a component of size 1.

There is then a complicated size-biasing train of thought. A component of size $\log n$ is more likely to come up than an isolated vertex, but there are not as many of them. The $\log n$ components push the derivative $\mu_t'$ towards zero, because $S_t$ decreases by 1 over a time interval of length $\log n$, which gives a gradient of zero in the limit. However, the isolated vertices give a gradient of $-1$, because $S_t$ decreases by 1 over a time interval of length 1. Despite the fact that components of size $\log n$ are likely to appear earlier, it still remains the case that after exhausting a component (in particular, at time $t=\zeta_\lambda$, after exhausting the giant component), you will choose an isolated vertex next with probability bounded away from zero. The component size only affects that time-scale if it is $O(n)$, which none of the remaining components are, so the derivative $\mu_{\zeta_\lambda}'$ consists of some complicated weighted mean of $0$ and $-1$. In particular, it is negative.

Analytic solution

Obviously, that won’t do in practice. Suppressing the subscript on $\zeta_\lambda$ for ease of notation, the key fact is: $e^{-\lambda \zeta}=1-\zeta$. We want to show that $\lambda e^{-\lambda \zeta}<1$. Substituting

$\lambda=-\frac{\log(1-\zeta)}{\zeta},$

means that it is required to show:

$-\frac{1-\zeta}{\zeta}\log(1-\zeta)<1.$

Differentiating the left hand side gives:

$\frac{\log(1-\zeta)+\zeta}{\zeta^2}<0,$

since of course $-\log(1-\zeta)=\zeta+\frac{\zeta^2}{2}+\frac{\zeta^3}{3}+\dots>\zeta$ for $\zeta\in(0,1)$. The left hand side is therefore decreasing in $\zeta$, so it suffices to check the result for small $\zeta$. But, again using a Taylor series:

$-\frac{1-\zeta}{\zeta}\log(1-\zeta)=1-\frac12\zeta+O(\zeta^2)<1,$

for small $\zeta$. This gives the required result.
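The two ingredients of the analytic argument (the left hand side is decreasing, and it is below 1 near zero) are easy to sanity-check on a grid. This is only a sketch: the helper names `lhs` and `lhs_derivative` and the grid spacing are my own choices.

```python
import math

def lhs(zeta):
    """-(1 - zeta) / zeta * log(1 - zeta), i.e. lambda * exp(-lambda * zeta) written in terms of zeta."""
    return -(1.0 - zeta) / zeta * math.log1p(-zeta)

def lhs_derivative(zeta):
    """(log(1 - zeta) + zeta) / zeta^2, the derivative quoted in the text."""
    return (math.log1p(-zeta) + zeta) / zeta**2

grid = [i / 100 for i in range(1, 100)]  # zeta = 0.01, ..., 0.99
values = [lhs(z) for z in grid]

# Both of these should print True if the argument above is right.
print(max(values) < 1.0)
print(all(lhs_derivative(z) < 0.0 for z in grid))
```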

Probabilistic Interpretation and Solution

First, we observe that $\lambda e^{-\lambda\zeta}=\lambda(1-\zeta)$ is the expected number of vertices in the first generation of a $\text{Po}(\lambda)$ branching process whose progeny become extinct. This motivates considering the canonical decomposition of a supercritical branching process $Z$ into the skeleton process and the dual process. The skeleton $Z^+$ consists of all vertices which have infinitely many successors. It is relatively easy to show that this is a branching process with offspring distribution $\text{Po}(\lambda\zeta)$ conditioned on being positive. The dual process $Z^*$ is a G-W branching process with offspring distribution $\text{Po}(\lambda)$ conditioned on dying. This is the same as a branching process with offspring distribution $\text{Po}(\lambda(1-\zeta))$, by a sprinkling (thinning) argument, which says that if we begin with a Poisson number of things, then remove each one independently with some fixed probability, the remaining number of things is also Poisson.
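The thinning property used here can be checked exactly rather than by simulation. The sketch below sums over the original Poisson count and compares with the claimed Poisson law; the names `poisson_pmf` and `thinned_pmf`, the retention probability `keep`, and the truncation level `n_max` are all illustrative assumptions.

```python
import math

def poisson_pmf(mean, k):
    """P(Po(mean) = k)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def thinned_pmf(lam, keep, k, n_max=80):
    """P(k items remain) when Po(lam) items are each kept independently with probability keep."""
    # Condition on the original count n, then choose which k of the n survive.
    # The sum is truncated at n_max, beyond which the Poisson weights are negligible.
    return sum(
        poisson_pmf(lam, n) * math.comb(n, k) * keep**k * (1.0 - keep) ** (n - k)
        for n in range(k, n_max + 1)
    )

lam, keep = 2.0, 0.3
for k in range(5):
    print(k, thinned_pmf(lam, keep, k), poisson_pmf(lam * keep, k))
```

The two columns should agree up to floating-point error, matching the claim that the thinned count is $\text{Po}(\lambda\cdot\text{keep})$.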

We can construct the original branching process by

• With probability $\zeta$, take the skeleton, and affix independent copies of $Z^*$ at every vertex in the skeleton.
• With probability $1-\zeta$, just take a copy of $Z^*$.

It is immediately clear that $\lambda(1-\zeta)\leq 1$. After all, the dual process is almost surely finite, so the offspring distribution cannot have expectation greater than 1. Checking that the inequality is strict is more fiddly. The best way I have come up with is to examine the tail of the distribution of total population size of the original branching process.

The total population size $T$ of a branching process has an exponential tail if the offspring distribution is subcritical. It isn’t hugely surprising that this behaves like a large deviation for iid RVs, since in the limit such an event requires a lot of the offspring counts to deviate substantially from the mean. The same holds in the supercritical case, with the additional complication that though the finite tail decays exponentially, there is positive probability that the total size will be infinite. In the critical case, however, there is a power-law decay. This is not hugely surprising as it marks the threshold for the appearance of the infinite population, just as in a multiplicative coalescent at time 1, we have a load of very large components just about to form a giant component. The tool for all of these results is Dwass’s Theorem, which says:

$\mathbb{P}(T=n)=\frac{1}{n}\mathbb{P}(X_1+\ldots+X_n=n-1),$

where $X_1,\ldots,X_n$ are iid with the offspring distribution. When $\mathbb{E}X_1\neq 1$, this is a large deviation event, for which Cramér’s theorem applies (assuming, as is the case for the Poisson distribution, that the offspring distribution has finite exponential moments). When $\mathbb{E}X_1=1$, the Central Limit Theorem says that with high probability,

$X_1+\ldots+X_n\in [n-n^{3/4},n+n^{3/4}],$

so, skating over the details of whether everything is exactly uniform within this CLT scaling window,

$\mathbb{P}(T=n)\geq \frac{1}{n}\cdot\frac{1}{2n^{3/4}}.$

The true power-law decay is slower than this (the probability is in fact of order $n^{-3/2}$), but the above argument works as a back-of-the-envelope bound.

In particular, if the dual process has mean 1, then the population size of the original branching process is given by taking a distribution with exponential tail with some probability and a distribution with power-law tail with some probability. Obviously the power-law will dominate, which contradicts the assumption that the original branching process was supercritical, and so has an exponential tail.
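For Poisson offspring the formula in Dwass’s Theorem is explicit, since a sum of $n$ iid $\text{Po}(\lambda)$ variables is $\text{Po}(\lambda n)$, so the three tail regimes can be seen directly. The following is a rough numerical sketch under that assumption; the function name `total_progeny_pmf` and the particular values of $\lambda$ and $n$ are my own choices.

```python
import math

def total_progeny_pmf(lam, n):
    """P(T = n) for Po(lam) offspring via Dwass: (1/n) * P(Po(lam * n) = n - 1)."""
    # Work with logarithms so that large n does not overflow.
    log_p = -lam * n + (n - 1) * math.log(lam * n) - math.lgamma(n + 1)
    return math.exp(log_p)

# Supercritical case: the finite part of the law has total mass equal to the
# extinction probability q = 1 - zeta, which solves q = exp(lam * (q - 1)).
extinction = sum(total_progeny_pmf(2.0, n) for n in range(1, 500))
print(extinction, math.exp(2.0 * (extinction - 1.0)))

# Ratio of successive probabilities far out in the tail: bounded away from 1
# off criticality (exponential decay), close to 1 at criticality.
for lam in (0.5, 1.0, 2.0):
    print(lam, total_progeny_pmf(lam, 401) / total_progeny_pmf(lam, 400))

# Critical case: n^{3/2} * P(T = n) settles down to a constant.
for n in (100, 1000, 10000):
    print(n, n**1.5 * total_progeny_pmf(1.0, n))
```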

# Branching Processes and Dwass’s Theorem

This is something I had to think about when writing my Part III essay, and it turns out to be relevant to some of the literature I’ve been reading this week. The main result is hugely helpful for reducing a potentially complicated combinatorial object to a finite sum of i.i.d. random variables, which in general we do know quite a lot about. I was very pleased with the proof I came up with while writing the essay, even if in the end it turned out to have appeared elsewhere before. (Citation at end)

Galton-Watson processes

A Galton-Watson process is a stochastic process describing a simple model for evolution of a population. At each stage of the evolution, a new generation is created as every member of the current generation produces some number of ‘offspring’ with identical and independent (both across all generations and within generations) distributions. Such processes were introduced by Galton and Watson to examine the evolution of surnames through history.

More precisely, we specify an offspring distribution, a probability distribution supported on $\mathbb{N}_0$. Then define a sequence of random variables $(Z_n,n\geq 0)$ by:

$Z_{n+1}=Y_1^n+\ldots+Y_{Z_n}^n,$

where $(Y_k^n,k\geq 1,n\geq 0)$ is a family of i.i.d. random variables with the offspring distribution $Y$. We say $Z_n$ is the size of the $n$th generation. From now on, assume $Z_0=1$ and then we call $(Z_n,n\geq 0)$ a Galton-Watson process. We also define the total population size to be

$X:=Z_0+Z_1+Z_2+\ldots,$

noting that this might be infinite. We refer to the situation where $X<\infty$ as extinction, and can show that extinction occurs almost surely when $\mathbb{E}Y\leq 1$, excepting the trivial case $Y=\delta_1$. The strict inequality parts are as you would expect. We say the process is critical if $\mathbb{E}Y=1$, and this is less obvious to visualise, but works equally well in the proof, which is usually driven using generating functions.
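The definition translates almost line by line into a simulation. Here is a minimal sketch with Poisson offspring (chosen only because it matches the random graph setting; any offspring law would do). The names `sample_poisson` and `total_population`, the population cap used to stand in for "infinite", and the run counts are all assumptions of this example.

```python
import math
import random

def sample_poisson(rng, mean):
    """Draw from Po(mean) by multiplying uniforms until the product drops to exp(-mean)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def total_population(rng, mean, cap=1000):
    """Run Z_{n+1} = Y_1^n + ... + Y_{Z_n}^n from Z_0 = 1; return X, or None once X exceeds cap."""
    generation, total = 1, 1
    while generation > 0:
        generation = sum(sample_poisson(rng, mean) for _ in range(generation))
        total += generation
        if total > cap:
            return None  # treated as survival: the population has not died out by the cap
    return total

rng = random.Random(2024)
for mean in (0.5, 2.0):
    sizes = [total_population(rng, mean, cap=300) for _ in range(200)]
    extinct = sum(size is not None for size in sizes)
    print(f"mean={mean}: {extinct} of {len(sizes)} runs went extinct")
```

With offspring mean below 1 essentially every run should die out, while with mean above 1 a positive fraction of runs hit the cap.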

Total Population Size and Dwass’s Theorem

Of particular interest is $X$, the total population size, and its distribution. The following theorem gives a precise and useful link between the probability of the population having size $n$ and the distribution of the sum of $n$ RVs with the relevant offspring distribution. Among the consequences are that we can conclude immediately, by CLT and Cramér’s Large Deviations Theorem, that the total population size distribution has power-law decay in the critical case, and exponential decay otherwise.

Theorem (Dwass (1)): For a general branching process with a single time-0 ancestor and offspring distribution $Y$ and total population size $X$:

$\mathbb{P}(X=k)=\frac{1}{k}\mathbb{P}(Y^1+\ldots+ Y^k=k-1),\quad k\geq 1$

where $Y^1,\ldots,Y^k$ are independent copies of $Y$.
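Before the proof, the identity can be checked exactly for small $k$. The sketch below computes $\mathbb{P}(X=k)$ in two ways: from the branching decomposition $X=1+X_1+\ldots+X_Y$ (with $X_i$ independent copies of $X$), and from the right hand side of the theorem. The offspring law `[0.3, 0.4, 0.3]` on $\{0,1,2\}$ and all function names are hypothetical choices for illustration.

```python
def convolve(a, b, size):
    """First `size` entries of the convolution of two pmfs given as lists."""
    out = [0.0] * size
    for i, a_i in enumerate(a[:size]):
        for j, b_j in enumerate(b[: size - i]):
            out[i + j] += a_i * b_j
    return out

def total_size_pmf_recursive(offspring, k_max):
    """P(X = k) for k <= k_max from X = 1 + X_1 + ... + X_Y, filled in order of increasing k."""
    x = [0.0] * (k_max + 1)  # x[k] = P(X = k); x[0] stays 0 because X >= 1
    for k in range(1, k_max + 1):
        power = [1.0] + [0.0] * k_max  # pmf of the empty sum (j = 0 children)
        total = 0.0
        for p_j in offspring:
            # The root has j children, whose subtrees must contain k - 1 vertices in total.
            total += p_j * power[k - 1]
            power = convolve(power, x, k_max + 1)
        x[k] = total
    return x

def total_size_pmf_dwass(offspring, k_max):
    """P(X = k) = (1/k) * P(Y^1 + ... + Y^k = k - 1) for k <= k_max."""
    out = [0.0] * (k_max + 1)
    power = [1.0] + [0.0] * k_max
    for k in range(1, k_max + 1):
        power = convolve(power, offspring, k_max + 1)  # pmf of Y^1 + ... + Y^k
        out[k] = power[k - 1] / k
    return out

offspring = [0.3, 0.4, 0.3]  # hypothetical law: P(Y=0), P(Y=1), P(Y=2)
for k, (p, q) in enumerate(zip(total_size_pmf_recursive(offspring, 8),
                               total_size_pmf_dwass(offspring, 8))):
    print(k, p, q)
```

The two columns should coincide for every $k$, which is exactly what the theorem asserts.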

We now give a proof via a combinatorial argument. The approach is similar to that given in (2). Much of the literature gives a proof using generating functions.

Proof: For motivation, consider the following. It is natural to consider a branching process as a tree, with the time-0 ancestor as the root. Suppose the event $\{X=k\}$ holds, which means that the tree has $k$ vertices. Now consider the numbers of offspring of each vertex in the tree. Since every vertex except the root has exactly one parent, and there are no vertices outside the tree, we must have $Y^1+\ldots+Y^k=k-1$ where $Y^1,\ldots,Y^k$ are the offspring numbers in some order. However, observe that this is not sufficient. For example, if $Y^1$ is the number of offspring of the root, and $k\geq 2$, then we must have $Y^1\geq 1$.