Branching Processes and Dwass’s Theorem

This is something I had to think about when writing my Part III essay, and it turns out to be relevant to some of the literature I've been reading this week. The main result is hugely helpful for reducing a potentially complicated combinatorial object to a finite sum of i.i.d. random variables, about which we know quite a lot in general. I was very pleased with the proof I came up with while writing the essay, even if in the end it turned out to have appeared elsewhere before (citation at the end).

Galton-Watson processes

A Galton-Watson process is a stochastic process describing a simple model for the evolution of a population. At each stage of the evolution, a new generation is created as every member of the current generation produces some number of `offspring', with these offspring numbers independent and identically distributed, both within and across generations. Such processes were introduced by Galton and Watson to examine the evolution of surnames through history.

More precisely, we specify an offspring distribution, a probability distribution supported on \mathbb{N}_0. Then define a sequence of random variables (Z_n,n\in\mathbb{N}) by:

Z_{n+1}=\sum_{k=1}^{Z_n}Y_k^n,

where (Y_k^n,k\geq 1,n\geq 0) is a family of i.i.d. random variables with the offspring distribution Y. We say Z_n is the size of the nth generation. From now on, assume Z_0=1, and then we call (Z_n,n\geq 0) a Galton-Watson process. We also define the total population size to be

X=\sum_{n\geq 0}Z_n,

noting that this might be infinite. We refer to the situation where X<\infty as extinction, and can show that extinction occurs almost surely when \mathbb{E}Y\leq 1, except in the trivial case Y=\delta_1. The strict-inequality cases behave as you would expect: a subcritical process (\mathbb{E}Y<1) dies out almost surely, while a supercritical one (\mathbb{E}Y>1) survives forever with positive probability. We say the process is critical if \mathbb{E}Y=1; extinction here is less obvious to visualise, but it works equally well in the proof, which is usually driven using generating functions.
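As an illustrative aside (not part of the original argument), here is a minimal Python sketch of this definition: we simulate the generation sizes for a critical offspring law and observe that almost every run goes extinct. The offspring law, function names and generation cap are my own choices for illustration.

```python
import random

def gw_generations(offspring, max_gens=60):
    """Generation sizes (Z_0, Z_1, ...) of a Galton-Watson process
    started from a single ancestor, Z_0 = 1. Stops at extinction
    or after max_gens generations."""
    sizes = [1]
    while sizes[-1] > 0 and len(sizes) < max_gens:
        # Each of the Z_n current individuals reproduces independently.
        sizes.append(sum(offspring() for _ in range(sizes[-1])))
    return sizes

# Critical example: P(Y=0)=0.3, P(Y=1)=0.4, P(Y=2)=0.3 gives E[Y] = 1,
# so extinction is almost sure (though survival decays only like 1/n).
random.seed(1)
Y = lambda: random.choices([0, 1, 2], weights=[0.3, 0.4, 0.3])[0]

runs = [gw_generations(Y) for _ in range(10_000)]
extinct_frac = sum(traj[-1] == 0 for traj in runs) / len(runs)
print(f"fraction extinct within 60 generations: {extinct_frac:.3f}")
```

For a critical law the survival probability at generation n decays like a constant over n, so a small fraction of runs are still alive at the cutoff, consistent with what the simulation reports.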

Total Population Size and Dwass’s Theorem

Of particular interest is X, the total population size, and its distribution. The following theorem gives a precise and useful link between the probability that the population has size k and the distribution of the sum of k i.i.d. random variables with the offspring distribution. Among the consequences, we can conclude immediately, by the CLT and Cramér's Large Deviations Theorem, that the total population size distribution has power-law decay in the critical case, and exponential decay otherwise.

Theorem (Dwass (1)): For a general branching process with a single time-0 ancestor, offspring distribution Y, and total population size X:

\mathbb{P}(X=k)=\frac{1}{k}\mathbb{P}(Y^1+\ldots+ Y^k=k-1),\quad k\geq 1

where Y^1,\ldots,Y^k are independent copies of Y.
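As a quick numerical sanity check (my own illustration, with a made-up subcritical offspring law), we can compare a Monte Carlo estimate of \mathbb{P}(X=k) against the right-hand side, computing \mathbb{P}(Y^1+\ldots+Y^k=k-1) exactly by convolving the offspring law k times:

```python
import random

def total_progeny(Y, cap=100_000):
    """Total size of a Galton-Watson tree, grown by processing one
    pending individual at a time; returns None if it exceeds cap."""
    total, pending = 1, 1
    while pending > 0 and total <= cap:
        children = Y()
        total += children
        pending += children - 1
    return total if total <= cap else None

def convolve(a, b):
    """Pmf of the sum of two independent N_0-valued variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

pmf = [0.5, 0.3, 0.2]           # P(Y=0), P(Y=1), P(Y=2); E[Y] = 0.7
dwass = {1: pmf[0]}             # k = 1: (1/1) * P(Y^1 = 0)
conv = pmf[:]
for k in range(2, 8):
    conv = convolve(conv, pmf)  # pmf of Y^1 + ... + Y^k
    dwass[k] = conv[k - 1] / k  # (1/k) * P(Y^1 + ... + Y^k = k - 1)

random.seed(2)
Y = lambda: random.choices([0, 1, 2], weights=pmf)[0]
N = 100_000
counts = {}
for _ in range(N):
    x = total_progeny(Y)
    counts[x] = counts.get(x, 0) + 1

for k in range(1, 8):
    print(f"k={k}: simulated {counts.get(k, 0) / N:.4f}, Dwass {dwass[k]:.4f}")
```

For instance, \mathbb{P}(X=2)=\mathbb{P}(Y=1)\mathbb{P}(Y=0)=0.15 here, and both columns agree with that.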

We now give a proof via a combinatorial argument. The approach is similar to that given in (2). Much of the literature gives a proof using generating functions.

Proof: For motivation, consider the following. It is natural to view a branching process as a tree, with the time-0 ancestor as the root. Suppose the event \{X=k\} holds, which means that the tree has k vertices. Now consider the numbers of offspring of each vertex in the tree. Since every vertex except the root has exactly one parent, and there are no vertices outside the tree, we must have Y^1+\ldots+Y^k=k-1, where Y^1,\ldots,Y^k are the offspring numbers in some order. However, observe that this condition alone is not sufficient. For example, if Y^1 is the number of offspring of the root, and k\geq 2, then we must have Y^1\geq 1.

We can define the branching process in a more subtle way, via a coupling that allows us to exploit this observation directly. Formally, take (Y^1,Y^2,\ldots) a sequence of i.i.d. random variables with the same distribution as Y. Now construct the branching process as follows. Using graph-theoretic notation, define a root v_1. Then demand that v_1 has Y^1 children, and so if Y^1>0, define v_2,\ldots,v_{1+Y^1} to be v_1's children. Then take the smallest v_n (ordered by index) which hasn't been treated yet, and demand that it has Y^n children, labelled with the smallest currently available indices. Continue. It is clear that this is equivalent to the definition given previously. And when \mathbb{E}Y\leq 1, this procedure terminates after some almost surely finite number of iterations, giving a tree with a breadth-first labelling.

The key observation is that, under this coupling,

X=\inf\{k\geq 1: Y^1+\ldots+Y^k=k-1\}.

Or in other words, as events

\{X=k\}=\{Y^1+\ldots+Y^k=k-1\}\cap\{Y^1+\ldots+Y^j\geq j,\;\forall 1\leq j\leq k-1\}.
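This equivalence is easy to check computationally. The sketch below (my own, purely illustrative) builds the tree via the breadth-first coupling from a fixed sequence (Y^1, Y^2, \ldots) and confirms its size agrees with the first k at which Y^1+\ldots+Y^k=k-1:

```python
import random

def bfs_tree_size(ys):
    """Size of the tree grown by the breadth-first coupling:
    vertex v_{n+1} receives ys[n] children, labelled with the
    smallest unused indices; stop once every vertex is treated."""
    n_vertices, treated = 1, 0
    while treated < n_vertices:
        n_vertices += ys[treated]
        treated += 1
    return n_vertices

def hitting_time(ys):
    """First k >= 1 with Y^1 + ... + Y^k = k - 1, or None."""
    s = 0
    for k, y in enumerate(ys, start=1):
        s += y
        if s == k - 1:
            return k
    return None

random.seed(3)
for _ in range(1000):
    # A subcritical offspring law, so the tree is a.s. finite and
    # comfortably smaller than the buffer of 10,000 samples.
    ys = [random.choices([0, 1, 2], weights=[0.5, 0.3, 0.2])[0]
          for _ in range(10_000)]
    assert bfs_tree_size(ys) == hitting_time(ys)
```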

So it will suffice to prove:

\mathbb{P}\left[Y^1+\ldots+Y^j\geq j,\;\forall 1\leq j\leq k-1\Big|Y^1+\ldots+Y^k=k-1\right]=\frac{1}{k}.

We observe that the joint distribution of (Y^1,\ldots,Y^k) is unchanged under cyclic permutations of the indices. We will show that given Y^1+\ldots+Y^k=k-1, there is precisely one cyclic permutation of the indices for which initial segments of the sum are bounded suitably. For ease of notation, we define z_i=Y^i-1, hence z_i\in\{-1,0,1,2,\ldots\} for each i. The following lemma will therefore provide precisely what is required.

Lemma: With z_i as above, if z_1+\ldots+z_n=-1, then there is precisely one \sigma\in C_n, the cyclic group acting on [n], such that, defining w_i=z_{\sigma(i)}\quad \forall i\in[1,n], we have:

w_1\geq 0,\quad w_1+w_2\geq 0,\quad\ldots,\quad w_1+\ldots+w_{n-1}\geq 0.

Proof: We first check uniqueness. Let \tau:=(12\ldots n)\in C_n. It suffices to check that the conclusion cannot hold simultaneously for \sigma=\text{id} and for \sigma=\tau^k, where 1\leq k\leq n-1. This is simple: if it held for both, then taking w_i=z_{\tau^k(i)} we would have

w_1+\ldots+w_{n-k}=z_{k+1}+\ldots+z_n\geq 0,\quad\text{and}\quad z_1+\ldots+z_k\geq 0,

\Rightarrow \quad -1=z_1+\ldots+z_n\geq 0,

a contradiction.

For existence, define

k:=\arg\min_{j\in[1,n]} \left(z_1+\ldots+z_j\right).

If there are multiple j at which the function is minimised, then take k to be the least such j. Then claim that \sigma:=\tau^k satisfies the statement of the lemma. For, taking w_i=z_{\sigma(i)},

w_1+\ldots+w_j=\left\{\begin{array}{l l}\left(z_1+\ldots+z_{k+j}\right)-\left(z_1+\ldots+z_k\right)\geq 0 & \quad k+j\leq n\\ \left(-1+z_1+\ldots+z_{k+j-n}\right)-\left(z_1+\ldots+z_k\right)>-1 & \quad k+j>n \end{array} \right.,

by the definition of k. In the first case this is just minimality; in the second, since j\leq n-1 we have k+j-n<k, and because k was chosen as the least minimiser, the inequality is strict. In particular, notice that by assumption all z_i\in\mathbb{Z}, so these partial sums are integers, and the second case also gives w_1+\ldots+w_j\geq 0.

This completes the proof of the lemma, and hence of the theorem.
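The lemma is also pleasant to verify by brute force. The sketch below (my own illustration) checks, for random integer sequences with entries in \{-1,0,1,2\} summing to -1, that exactly one cyclic rotation has all proper prefix sums non-negative:

```python
import random

def good_rotations(z):
    """Rotation offsets r for which w = z[r:] + z[:r] satisfies
    w_1 >= 0, w_1 + w_2 >= 0, ..., w_1 + ... + w_{n-1} >= 0."""
    n = len(z)
    good = []
    for r in range(n):
        w = z[r:] + z[:r]
        s, ok = 0, True
        for j in range(n - 1):   # proper prefixes only
            s += w[j]
            if s < 0:
                ok = False
                break
        if ok:
            good.append(r)
    return good

random.seed(4)
for _ in range(2000):
    n = random.randint(1, 12)
    # Rejection-sample z_i in {-1, 0, 1, 2} conditioned on sum -1;
    # the weights bias towards -1 so the conditioning isn't too rare.
    while True:
        z = random.choices([-1, 0, 1, 2], weights=[4, 3, 2, 1], k=n)
        if sum(z) == -1:
            break
    assert len(good_rotations(z)) == 1   # the lemma's unique rotation
```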


(1) Dwass, M. The total progeny in a branching process and a related random walk. J. Appl. Prob. 6 (1969), 682-686.

(2) van der Hofstad, R. and Keane, M. An elementary proof of the hitting time theorem. Amer. Math. Monthly 115 (8) (2008), 753-756.

