Convex ordering on Galton-Watson trees

This blog was recently revived, via a post about convex ordering and its relevance to the problem of sampling with and without replacement that forms part of the potpourri of related results all sometimes referred to as Hoeffding’s inequality.

The previous post had been lying almost-complete but dormant for over two years. I revisited it because of a short but open-ended question about trees posed in our research group meeting by Serte Donderwinkel, one of our high-flying doctoral students.

Simplified Question:

For a Galton-Watson tree, can one obtain upper bounds in probability on the height of the tree, uniformly across all offspring distributions with mean \mu?

Note that in this setting, it is helpful to have in mind the classical notation (Z_0,Z_1,Z_2,\ldots) for a Galton-Watson process, where typically Z_0=1, and Z_{n+1} is the sum of Z_n IID copies of the offspring distribution. Then we have

\mathbb{P}(\mathrm{height}(\mathcal{T}) < k) = \mathbb{P}(Z_k=0).

1) Subcritical case. When \mu<1, we certainly have \mathbb{P}(Z_k>0)\le \mathbb{E}[Z_k]=\mu^k.

Furthermore, if we’re studying all such offspring distributions, this is the best possible upper bound, by considering the offspring distribution given by Z_1=1 with probability \mu and zero otherwise.

2) In the critical or supercritical case, \mu\ge 1 it is possible that the height is infinite with probability one.

So neither case is especially interesting for now.

Refined question:

What if instead we aren’t trying to obtain a bound uniformly across all offspring distributions with given mean \mu, but instead across a subset \mathcal{X} of these distributions? How do we determine which distribution in \mathcal{X} maximises the probability of reaching height k?

This is the question Serte was asking in our group meeting, in the setting where \mu=1+o(1) and the height k has a particular scaling. Also, as far as I understand, the approach outlined in this post didn’t provide strong enough bounds in this particular context. Happily, Serte has recently tied up all the corners of this project concerning the supercritical Galton-Watson forest, and interested readers can find her preprint here on the Arxiv.

Nonetheless the interpretation via convex ordering feels perfect for a blog post, rather than being lost forever.

Convex ordering for offspring distributions

The main observation is that given two offspring distributions X and Y, such that X\le_{cx} Y (which recall means that the means are the same but X is more concentrated) then a number of distributions associated to the Galton-Watson trees for X and Y also satisfy convex ordering relations.

As a warm-up, and because it was the original genesis, we first study heights. We will use the notation

(Z_0^X,Z_1^X,Z_2^X,\ldots), (Z_0^Y,Z_1^Y,Z_2^Y,\ldots),

to denote the two Galton-Watson processes. We shall compare \mathbb{P}(Z^X_k=0) and \mathbb{P}(Z^Y_k=0). If we write \delta_0(\cdot) for the function defined on the non-negative integers such that

\delta_0(0)=1,\quad \delta_0(n)=0,\,n\ge 1,

it holds that \delta_0(\cdot) is convex. In particular, if X\le_{cx}Y, then \mathbb{E}[\delta_0(X)]\le \mathbb{E}[\delta_0(Y)], which exactly says that

\mathbb{P}(Z^X_1 = 0)\le \mathbb{P}(Z^Y_1 = 0).

We can then prove that \mathbb{P}(Z^X_k=0)\le \mathbb{P}(Z^Y_k=0) by induction on k\ge 1. Note that \mathbb{P}(Z^X_k=0)^n is a convex function of n, regardless of the value of this probability, and so we have

\mathbb{P}(Z^X_{k+1}=0) = \mathbb{E}\left[ (\mathbb{P}(Z^X_k=0))^X\right] \le \mathbb{E}\left[(\mathbb{P}(Z^X_k=0))^Y\right].

By the induction hypothesis, this final quantity is at most

\mathbb{E}\left[(\mathbb{P}(Z^Y_k=0))^Y\right] = \mathbb{P}(Z^Y_{k+1}=0).

In conclusion, we have shown that \mathbb{P}(Z^X_k=0)\le \mathbb{P}(Z^Y_k=0) holds for all k, and thus

\mathrm{height}(\mathcal{T}^X) \ge_{st} \mathrm{height}(\mathcal{T}^Y).

To return to the original context, suppose we have a large class of offspring distributions \mathcal{Y} and a subclass \mathcal{X}\subseteq \mathcal{Y} such that for all Y\in\mathcal{Y}, there exists X\in \mathcal{X} such that X\le_{cx} Y. Then one can obtain uniform bounds on the heights of Galton-Watson trees with offspring distributions drawn from \mathcal{Y} by checking those generated from distributions in \mathcal{X} (which is particularly convenient if, for example, \mathcal{X} is finite).

Convex ordering of generation sizes

The above argument solves the original problem, but brushes over the natural question: is it true that Z^X_k \le_{cx} Z^Y_k?

The answer is yes. Here’s a proof.

This follows from the following general statement:

Lemma 1: Suppose X\le_{cx} Y are non-negative valued RVs and the non-negative integer valued RVs M,N also satisfy M \le_{cx} N. Then

X_1+\ldots+X_M \le_{cx} Y_1+\ldots Y_N,

where X_1,X_2,\ldots are IID copies of X and, independently, Y_1,Y_2,\ldots are IID copies of Y.

Lemma 2: Suppose W_1\le_{cx}Z_1 and W_2\le_{cx} Z_2, and the four random variables are independent. Then W_1+W_2\le_{cx}Z_1+Z_2.

Proof of Lemma 2: First, note that for any random variable X, and convex function f

\mathbb{E}\left[f(Z+x)\right] is a convex function of x.

(Indeed, this holds since “f(z+x) is convex” holds for every z, and any definition of convex will pass to the expectation.)

Now we can attack the lemma directly, we may write

\mathbb{E}\left[ f(W_1+W_2)\right]=\mathbb{E}\left[\, \mathbb{E}[f(W_1+W_2) \mid W_2 ] \,\right] \le \mathbb{E}\left[\, \mathbb{E}[f(W_1+Z_2)\mid Z_2 ] \, \right].

But then for any z_2, we know f(\cdot+z_2) is convex, so \mathbb{E}[f(W_1+z_2)]\le \mathbb{E}[f(Z_1+z_2)], and it follows that

\mathbb{E}\left[ f(W_1+W_2)\right]\le \mathbb{E} \left[ f(Z_1+Z_2)\right],

which proves the lemma.

Corollary 3: When W_1,\ldots,W_m, Z_1,\ldots,Z_m are independent, and satisfy W_i \le_{cx} Z_i, then we have W_1+\ldots+W_m\le_{cx} Z_1+\ldots+Z_m.

Proof of Lemma 1: Note that

\mathbb{E}\left[ f(X_1+\ldots+X_M)\mid M=n\right] \le \mathbb{E}\left[ f(Y_1+\ldots+Y_N)\mid N=n\right],

follows from Corollary 3. So a useful question to consider is whether \mathbb{E}\left[f(Y_1+\ldots+Y_n)\right] (*) is a convex function of n?

Denote this quantity by F(n). To check convexity of a function defined on the integers, it suffices to verify that F(n+1)-F(n)\ge F(n)-F(n-1).

There is a canonical coupling between the RVs used to define all of F(n-1),F(n),F(n+1), but it will be convenient to adjust the coupling, and write:

F(n+1)-F(n)= \mathbb{E}\left[ f(Y_1+\ldots+Y_n + Y^*) - f(Y_1+\ldots+Y_n)\right],

F(n)-F(n-1)=\mathbb{E}\left[f(Y_1+\ldots+Y_{n-1}+Y^*) - f(Y_1+\ldots+Y_{n-1})\right],

where Y^* is a further independent copy of Y. But note that for any choice C\ge c and y\in \mathbb{R},

f(C+y) - f(C) - f(c+y) + f(c)\ge 0. (*)

(Essentially, this says that the ‘chord’ of f on the interval [c,C+y] lies above the chord on interval [C,c+y] or [c+y,C], which some people choose to call Karamata’s inequality, but I think is more helpful to think of as part of the visual definition of convexity.)

In any case, setting y=Y^*, c=Y_1+\ldots+Y_{n-1}, C=Y_1+\ldots+Y_n and taking expectations, we obtain

\mathbb{E}\left[ f(Y_1+\ldots+Y_n+Y^*) - f(Y_1+\ldots+Y_n)\right.

\left.- f(Y_1+\ldots+Y_{n-1}+Y^*) + f(Y_1+\ldots+Y_{n-1})\right]\ge 0,

as required. So F(n) is convex. We may now finish off as

\mathbb{E}\left[ X_1+\ldots+X_M\right] = \mathbb{E}\left[ \,\mathbb{E}[X_1+\ldots+X_M\mid M]\,\right] \le \mathbb{E}\left[\, \mathbb{E}[Y_1+\ldots+Y_M\mid M]\,\right] = \mathbb{E}[f(M)]\le \mathbb{E}[f(N)] = \mathbb{E}[Y_1+\ldots+Y_N],

completing the proof of Lemma 1.

Final comments

  • The analysis in this post is not sufficient to study the total population sizes of two Galton-Watson trees generated by X and Y. Note that in Lemma 2, it is important that the random variables are independent. Otherwise, we could, for example, consider \mathbb{E}[X]=\mathbb{E}[Y]=0 with X\le_{cx}Y but clearly it should not hold that X_1+X_2 \le_{cx} Y + (-Y) = 0. So for total population size, since (Z^X_k,\,k\ge 1) are not independent, an alternative approach would be required.
  • A further characterisation of convex ordering is given by Strassen’s theorem [Str65], which is touched on in the previous post, and to which I may return to in a future post on this topic. This may be a more promising avenue for established a convex ordering result on total population size.
  • Lemma 1 requires that X,Y are non-negative. Note that during the argument we set y=Y^*, c=Y_1+\ldots+Y_{n-1}, C=Y_1+\ldots+Y_n, and when we relax the non-negative support condition, it is no longer guaranteed that C\ge c, which is crucial for the step which follows.
  • In a recent article in ECP addressing Lemma 1 by a different method, Berard and Juillet [BJ20] provide a simple example showing that the non-negative assumption is genuinely necessary. Consider the random variable \tau\in \{0,2\} with equal probability so 1\le_{cx} \tau. But then, taking both X and Y to be simple random walk on \mathbb{Z}, we do not have S_1\le_{cx}S_{\tau}.


[BJ20] – Berard, Juillet – A coupling proof of convex ordering for compound distributions, 2020

[Str65] – Strassen – The existence of probability measures with given marginals, 1965


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s