This blog was recently revived, via a post about convex ordering and its relevance to the problem of sampling with and without replacement that forms part of the potpourri of related results all sometimes referred to as *Hoeffding’s inequality*.

The previous post had been lying almost-complete but dormant for over two years. I revisited it because of a short but open-ended question about trees posed in our research group meeting by Serte Donderwinkel, one of our high-flying doctoral students.

**Simplified Question:**

For a Galton-Watson tree, can one obtain upper bounds in probability on the *height* of the tree, uniformly across all offspring distributions with mean ?

Note that in this setting, it is helpful to have in mind the classical notation for a Galton-Watson process, where typically , and is the sum of IID copies of the offspring distribution. Then we have

1) Subcritical case. When , we certainly have .

Furthermore, if we’re studying *all* such offspring distributions, this is the best possible upper bound, by considering the offspring distribution given by with probability and zero otherwise.

2) In the critical or supercritical case, it is possible that the height is infinite with probability one.

So neither case is especially interesting for now.

**Refined question:**

What if instead we aren’t trying to obtain a bound uniformly across **all** offspring distributions with given mean , but instead across a subset of these distributions? How do we determine which distribution in maximises the probability of reaching height k?

This is the question Serte was asking in our group meeting, in the setting where and the height k has a particular scaling. Also, as far as I understand, the approach outlined in this post didn’t provide strong enough bounds in this particular context. Happily, Serte has recently tied up all the corners of this project concerning the *supercritical Galton-Watson forest*, and interested readers can find her preprint here on the Arxiv.

Nonetheless the interpretation via convex ordering feels perfect for a blog post, rather than being lost forever.

**Convex ordering for offspring distributions**

The main observation is that given two offspring distributions X and Y, such that (which recall means that the means are the same but X is more concentrated) then a number of distributions associated to the Galton-Watson trees for X and Y also satisfy convex ordering relations.

As a warm-up, and because it was the original genesis, we first study heights. We will use the notation

to denote the two Galton-Watson processes. We shall compare and . If we write for the function defined on the non-negative integers such that

,

it holds that is convex. In particular, if , then , which exactly says that

We can then prove that by induction on . Note that is a convex function of n, regardless of the value of this probability, and so we have

By the induction hypothesis, this final quantity is at most

.

In conclusion, we have shown that holds for all k, and thus

To return to the original context, suppose we have a large class of offspring distributions and a subclass such that for all , there exists such that . Then one can obtain uniform bounds on the heights of Galton-Watson trees with offspring distributions drawn from by checking those generated from distributions in (which is particularly convenient if, for example, is finite).

**Convex ordering of generation sizes**

The above argument solves the original problem, but brushes over the natural question: is it true that ?

The answer is yes. Here’s a proof.

This follows from the following general statement:

**Lemma 1: **Suppose are non-negative valued RVs and the non-negative *integer valued* RVs also satisfy . Then

where are IID copies of X and, independently, are IID copies of Y.

**Lemma 2: **Suppose and , and the four random variables are independent. Then .

*Proof of Lemma 2:* First, note that for any random variable X, and convex function f

is a convex function of x.

(Indeed, this holds since “f(z+x) is convex” holds for every z, and any definition of convex will pass to the expectation.)

Now we can attack the lemma directly, we may write

But then for any , we know is convex, so , and it follows that

which proves the lemma.

**Corollary 3: **When are independent, and satisfy , then we have .

*Proof of Lemma 1:* Note that

follows from Corollary 3. So a useful question to consider is whether (*) is a convex function of n?

Denote this quantity by . To check convexity of a function defined on the integers, it suffices to verify that .

There is a canonical coupling between the RVs used to define all of , but it will be convenient to adjust the coupling, and write:

where is a further independent copy of . But note that for any choice and ,

(*)

(Essentially, this says that the ‘chord’ of f on the interval lies above the chord on interval or , which some people choose to call *Karamata’s inequality*, but I think is more helpful to think of as part of the visual definition of convexity.)

In any case, setting and taking expectations, we obtain

as required. So is convex. We may now finish off as

completing the proof of Lemma 1.

**Final comments**

- The analysis in this post is not sufficient to study the
*total population sizes*of two Galton-Watson trees generated by X and Y. Note that in Lemma 2, it is important that the random variables are independent. Otherwise, we could, for example, consider with but clearly it should not hold that . So for total population size, since are not independent, an alternative approach would be required. - A further characterisation of convex ordering is given by
*Strassen’s theorem*[Str65], which is touched on in the previous post, and to which I may return to in a future post on this topic. This may be a more promising avenue for established a convex ordering result on total population size. - Lemma 1 requires that X,Y are non-negative. Note that during the argument we set , and when we relax the non-negative support condition, it is no longer guaranteed that , which is crucial for the step which follows.
- In a recent article in ECP addressing Lemma 1 by a different method, Berard and Juillet [BJ20] provide a simple example showing that the non-negative assumption is genuinely necessary. Consider the random variable with equal probability so . But then, taking both X and Y to be simple random walk on , we do not have .

**References**

[BJ20] – Berard, Juillet – A coupling proof of convex ordering for compound distributions, 2020

[Str65] – Strassen – The existence of probability measures with given marginals, 1965