Generating uniform trees

A long time ago, I wrote quite a few a things about uniform trees. That is, a uniform choice from the n^{n-2} unrooted trees with vertex set [n]. This enumeration, normally called Cayley’s formula, has several elegant arguments, including the classical Prufer bijection. But making a uniform choice from a large set is awkward, and so we seek more probabilistic methods to sample such a tree, which might also give insight into the structure of a ‘typical’ uniform tree.

In another historic post, I talked about the Aldous-Broder algorithm. Here’s a quick summary. We run a random walk on the complete graph K_n started from a uniformly-chosen vertex. Every time we arrive at a vertex we haven’t visited before, we record the edge just traversed. Eventually we have visited all n vertices, so have recorded n-1 edges. It’s easy enough to convince yourself that these n-1 edges form a tree (how could there be a cycle?) and a bit more complicated to decide that the distribution of this tree is uniform.

It’s worth noting that this algorithm works to construct a uniform spanning tree on any connected base graph.

This post is about a few alternative constructions and interpretations of the uniform random tree. The first construction uses a Galton-Watson process. We take a Galton-Watson process where the offspring distribution is Poisson(1), and condition that the total population size is n. The resulting random tree has a root but no labels, however if we assign labels in [n] uniformly at random, the resulting rooted tree has the uniform distribution among rooted trees on [n].

Proof

This is all about moving from ordered trees to non-ordered trees. That is, when setting up a Galton-Watson tree, we distinguish between the following two trees, drawn extremely roughly in Paint:

That is, it matters which of the first-generation vertices have three children. Anyway, for such a (rooted) ordered tree T with n vertices, the probability that the Galton-Watson process ends up equal to T is

\mathbb{P}(GW = T) = \prod_{v\in T} \frac{e^{-1}}{C(v)!} = e^{-n} \prod_{v\in T}\frac{1}{C(v)!},

where C(v) is the number of children of a vertex v\in T. Then, since \mathbb{P}( |GW|=n ) is a function of n, we find

\mathbb{P}(GW=T \,\big|\, |GW|=n) = f(n)\prod_{v\in T} \frac{1}{C(v)!},

where f(n) is a function of n alone (ie depends on T only through its size n).

But given an unordered rooted tree t, labelled by [n], there are \prod_{v \in t} C(v)! ordered trees associated to t in the natural way. Furthermore, if we take the Poisson Galton-Watson tree conditioned to have total population size n, and label uniformly at random with [n], we obtain any one of these ordered trees with probability \frac{f(n)}{n!} \prod_{v\in t} \frac{1}{C(v)!}. So the probability that we have t after we forget about the ordering is \frac{f(n)}{n!}, which is a function of n alone, and so the distribution is uniform among the set of rooted unordered trees labelled by [n], exactly as required.

Heuristic for Poisson offspring distribution

In this proof, the fact that \mathbb{P}(C(v)=k)\propto \frac{1}{k!} exactly balances the number of orderings of the k children explains why Poisson(1) works out. Indeed, you can see in the proof that Poisson(c) works equally well, though when c\ne 1, the event we are conditioning on (namely that the total population size is n) has probability decaying exponentially in n, whereas for c=1, the branching process is critical, and the probability decays polynomially.

We can provide independent motivation though, from the Aldous-Broder construction. Both the conditioned Galton-Watson construction and the A-B algorithm supply the tree with a root, so we’ll keep that, and look at the distribution of the degree of the root as constructed by A-B. Let \rho=v_1,v_2,v_3,\ldots be the vertices [n], ordered by their discovery during the construction. Then \rho is definitely connected by an edge to v_2, but thereafter it follows by an elementary check that the probability \rho is connected to v_m is \frac{1}{n-1}, independently across all m. In other words, the distribution of the degree of \rho in the tree as constructed by A-B is

1+ \mathrm{Bin}\left(n-2,\frac{1}{n-1}\right) \approx 1+\mathrm{Poisson}(1).

Now, in the Galton-Watson process, conditioning the tree to have fixed, large size changes the offspring distribution of the root. Conveniently though, in a limiting sense it’s the same change as conditioning the tree to have size at least n. Since these events are monotone in n, it’s possible to take a limit of the conditioning events, and interpret the result as the Galton-Watson tree conditioned to survive. It’s a beautiful result that this interpretation can be formalised as a local limit. The limiting spine decomposition consists of an infinite spine, where the offspring distribution is a size-biased version of the original offspring distribution (and so in particular, always has at least one child) and where non-spine vertices have the original distribution.

In particular, the number of the offspring of the root is size-biased, and it is well-known and not hard to check that size-biasing Poisson(c) gives 1+Poisson(c) ! So in fact we have, in an appropriate limiting sense in both objects, a match between the degree distribution of the root in the uniform tree, and in the conditioned Galton-Watson tree.

This isn’t supposed to justify why a conditioned Galton-Watson tree is relevant a priori (especially the unconditional independence of degrees), but it does explain why Poisson offspring distributions are relevant.

Construction via G(N,p) and the random cluster model

The main reason uniform trees were important to my thesis was their appearance in the Erdos-Renyi random graph G(N,p). The probability that vertices {1, …, n} form a tree component in G(N,p) with some particular structure is

p^{n-1} (1-p)^{\binom{n}{2}-(n-1)} \times (1-p)^{n(N-m)}.

Here, the first two terms give the probability that the graph structure on {1, …, n} is correct, and the the final term gives the probability of the (independent) event that these vertices are not connected to anything else in the graph. In particular, this has no dependence on the tree structure chosen on [n] (for example, whether it should be a path or a star – both examples of trees). So the conditional distribution is uniform among all trees.

If we work in some limiting regime, where pn\rightarrow 0 (for example if n is fixed and p=\frac{1}{N}\rightarrow 0), then we can get away asymptotically with less strong conditioning. Suppose we condition instead just that [n] form a component. Now, there are more ways to form a connected graph with one cycle on [n] than there are trees on [n], but the former all require an extra edge, and so the probability that a given one such tree-with-extra-edge appears as the restriction to [n] in G(N,p) is asymptotically negligible compared to the probability that the restriction to [n] of G(N,p) is a tree. Naturally, the local limit of components in G(N,c/N) is a Poisson(c) Galton-Watson branching process, and so this is all consistent with the original construction.

One slightly unsatisfying aspect to this construction is that we have to embed the tree of size [n] within a much larger graph on [N] to see uniform trees. We can’t choose a scaling p=p(n) such that G(n,p) itself concentrates on trees. To guarantee connectivity with high probability, we need to take p> \frac{\log n}{n}, but by this threshold, the graph has (many) cycles with high probability.

At this PIMS summer school in Vancouver, one of the courses is focusing on lattice spin models, including the random cluster model, which we now briefly define. We start with some underlying graph G. From a physical motivation, we might take G to be \mathbb{Z}^d or some finite subset of it, or a d-ary tree, or the complete graph K_N. As in classical bond percolation (note G(N,p) is bond percolation on K_N), a random subset of the edges of G are included, or declared open. The probability of a given configuration w, with e open edges is proportional to

p^e (1-p)^{|E(G)| - e} q^{k(w)}, (*)

where the edge-weight p\in(0,1) as usual, and cluster weight q\in (0,\infty), and k(w) counts the number of connected components in configuration w. When q=1, we recover classical bond percolation (including G(N,p) ), while for q>1, this cluster-reweighting favours having more components, and q<1 favours fewer components. Note that in the case q\ne 1, the normalising constant (or partition function) of (*) is generally intractable to calculate explicitly.

As in the Erdos-Renyi graph, consider fixing the underlying graph G, and taking p\rightarrow 0, but also taking \frac{q}{p}\rightarrow 0. So the resulting graph asymptotically ‘wants to have as few edges as possible, but really wants to have as few components as possible’. In particular, 1) all spanning trees of G are equally likely; 2) any configuration with more than one component has asymptotically negligible probability relative to any tree; 3) any graph with a cycle has #components + #edges greater than that of a tree, and so is asymptotically negligible probability relative to any tree.

In other words, the limit of the distribution is the uniform spanning tree of G, and so this (like Aldous-Broder) is a substantial generalisation, which constructs the uniform random tree in the special case where G=K_n.

 

Advertisements