This is something I had to think about when writing my Part III essay, and it turns out to be relevant to some of the literature I’ve been reading this week. The main result is hugely helpful for reducing a potentially complicated combinatorial object to a finite sum of i.i.d. random variables, which in general we do know quite a lot about. I was very pleased with the proof I came up with while writing the essay, even if in the end it turned out to have appeared elsewhere before. (Citation at end)
A Galton-Watson process is a stochastic process describing a simple model for evolution of a population. At each stage of the evolution, a new generation is created as every member of the current generation produces some number of `offspring’ with identical and independent (both across all generations and within generations) distributions. Such processes were introduced by Galton and Watson to examine the evolution of surnames through history.
More precisely, we specify an offspring distribution, a probability distribution supported on . Then define a sequence of random variables by:
where is a family of i.i.d. random variables with the offspring distribution . We say is the size of the th generation. From now on, assume and then we call a Galton-Watson process. We also define the total population size to be
noting that this might be infinite. We refer to the situation where finite as extinction, and can show that extinction occurs almost surely when , excepting the trivial case . The strict inequality parts are as you would expect. We say the process is critical if , and this is less obvious to visualise, but works equally well in the proof, which is usually driven using generating functions.
Total Population Size and Dwass’s Theorem
Of particular interest is , the total population size, and its distribution. The following result gives us a precise and useful result linking the probability of the population having size and the distribution of the sum of RVs with the relevant offspring distribution. Among the consequences are that we can conclude immediately, by CLT and Cramer’s Large Deviations Theorem, that the total population size distribution has power-law decay in the critical case, and exponential decay otherwise.
Theorem (Dwass (1)): For a general branching process with a single time-0 ancestor and offspring distribution $Y$ and total population size $X$:
where are independent copies of .
We now give a proof via a combinatorial argument. The approach is similar to that given in (2). Much of the literature gives a proof using generating functions.
Proof: For motivation, consider the following. It is natural to consider a branching process as a tree, with the time-0 ancestor as the root. Suppose the event in holds, which means that the tree has vertices. Now consider the numbers of offspring of each vertex in the tree. Since every vertex except the root has exactly one parent, and there are no vertices outside the tree, we must have where are the offspring numbers in some order. However, observe that this is not sufficient. For example, if is the number of offspring of the root, and , then we must have .
We can define the branching process in a more subtle way via a coupling that allows us to exploit this more directly. Formally, take a set of i.i.d. random variables with the same distribution as . Now construct the branching process as follows. Using graph theoretic notation, define a root . Then demand that has children, and so if , define to be ‘s children. Then take the smallest (ordered by index) which hasn’t been treated yet, and demand that it has children, labelled with the smallest currently available indices. Continue. It is clear that this is equivalent to the definition previously given. And when , this process terminates after some almost surely finite number of iterations, giving a tree which has a breadth-first labelling.
The key observation is that the following are equal:
Or in other words, as events
So it will suffice to prove:
We observe that the joint distribution of is unchanged under cyclic permutations of the indices. We will show that given , there is precisely one cyclic permutation of the indices for which initial segments of the sum are bounded suitably. For ease of notation, we define , hence for each . The following lemma will therefore provide precisely what is required.
Lemma: With as above, if , then there is precisely one , the cyclic group acting on , such that, defining we have:
Proof: We first check uniqueness. Let . It suffices to check that the result cannot hold simultaneously for and for where . This is simple. If this holds for such a , then
For existence, define
If there are multiple at which the function is minimised, then take to be the least such . Then claim that satisfies the statement of the lemma. For, taking ,
by definition of . In particular, notice that by assumption all , and so the second case also gives .
This completes the proof of the proposition.
(1) Dwass, M. The total progeny in a branching process and a related random walk. J. APp. Prob. 6 (1969), 682-686.
(2) van der Hofstad, R. and Keane, M. An elementary proof of the hitting time theorem. Amer. Math. Monthly 115 (8) (2008), 753-756.