# Generating uniform trees

A long time ago, I wrote quite a few a things about uniform trees. That is, a uniform choice from the $n^{n-2}$ unrooted trees with vertex set [n]. This enumeration, normally called Cayley’s formula, has several elegant arguments, including the classical Prufer bijection. But making a uniform choice from a large set is awkward, and so we seek more probabilistic methods to sample such a tree, which might also give insight into the structure of a ‘typical’ uniform tree.

In another historic post, I talked about the Aldous-Broder algorithm. Here’s a quick summary. We run a random walk on the complete graph $K_n$ started from a uniformly-chosen vertex. Every time we arrive at a vertex we haven’t visited before, we record the edge just traversed. Eventually we have visited all n vertices, so have recorded n-1 edges. It’s easy enough to convince yourself that these n-1 edges form a tree (how could there be a cycle?) and a bit more complicated to decide that the distribution of this tree is uniform.

It’s worth noting that this algorithm works to construct a uniform spanning tree on any connected base graph.

This post is about a few alternative constructions and interpretations of the uniform random tree. The first construction uses a Galton-Watson process. We take a Galton-Watson process where the offspring distribution is Poisson(1), and condition that the total population size is n. The resulting random tree has a root but no labels, however if we assign labels in [n] uniformly at random, the resulting rooted tree has the uniform distribution among rooted trees on [n].

Proof

This is all about moving from ordered trees to non-ordered trees. That is, when setting up a Galton-Watson tree, we distinguish between the following two trees, drawn extremely roughly in Paint:

That is, it matters which of the first-generation vertices have three children. Anyway, for such a (rooted) ordered tree T with n vertices, the probability that the Galton-Watson process ends up equal to T is

$\mathbb{P}(GW = T) = \prod_{v\in T} \frac{e^{-1}}{C(v)!} = e^{-n} \prod_{v\in T}\frac{1}{C(v)!},$

where $C(v)$ is the number of children of a vertex $v\in T$. Then, since $\mathbb{P}( |GW|=n )$ is a function of n, we find

$\mathbb{P}(GW=T \,\big|\, |GW|=n) = f(n)\prod_{v\in T} \frac{1}{C(v)!},$

where f(n) is a function of n alone (ie depends on T only through its size n).

But given an unordered rooted tree t, labelled by [n], there are $\prod_{v \in t} C(v)!$ ordered trees associated to t in the natural way. Furthermore, if we take the Poisson Galton-Watson tree conditioned to have total population size n, and label uniformly at random with [n], we obtain any one of these ordered trees with probability $\frac{f(n)}{n!} \prod_{v\in t} \frac{1}{C(v)!}$. So the probability that we have t after we forget about the ordering is $\frac{f(n)}{n!}$, which is a function of n alone, and so the distribution is uniform among the set of rooted unordered trees labelled by [n], exactly as required.

Heuristic for Poisson offspring distribution

In this proof, the fact that $\mathbb{P}(C(v)=k)\propto \frac{1}{k!}$ exactly balances the number of orderings of the k children explains why Poisson(1) works out. Indeed, you can see in the proof that Poisson(c) works equally well, though when $c\ne 1$, the event we are conditioning on (namely that the total population size is n) has probability decaying exponentially in n, whereas for c=1, the branching process is critical, and the probability decays polynomially.

We can provide independent motivation though, from the Aldous-Broder construction. Both the conditioned Galton-Watson construction and the A-B algorithm supply the tree with a root, so we’ll keep that, and look at the distribution of the degree of the root as constructed by A-B. Let $\rho=v_1,v_2,v_3,\ldots$ be the vertices [n], ordered by their discovery during the construction. Then $\rho$ is definitely connected by an edge to $v_2$, but thereafter it follows by an elementary check that the probability $\rho$ is connected to $v_m$ is $\frac{1}{n-1}$, independently across all m. In other words, the distribution of the degree of $\rho$ in the tree as constructed by A-B is

$1+ \mathrm{Bin}\left(n-2,\frac{1}{n-1}\right) \approx 1+\mathrm{Poisson}(1).$

Now, in the Galton-Watson process, conditioning the tree to have fixed, large size changes the offspring distribution of the root. Conveniently though, in a limiting sense it’s the same change as conditioning the tree to have size at least n. Since these events are monotone in n, it’s possible to take a limit of the conditioning events, and interpret the result as the Galton-Watson tree conditioned to survive. It’s a beautiful result that this interpretation can be formalised as a local limit. The limiting spine decomposition consists of an infinite spine, where the offspring distribution is a size-biased version of the original offspring distribution (and so in particular, always has at least one child) and where non-spine vertices have the original distribution.

In particular, the number of the offspring of the root is size-biased, and it is well-known and not hard to check that size-biasing Poisson(c) gives 1+Poisson(c) ! So in fact we have, in an appropriate limiting sense in both objects, a match between the degree distribution of the root in the uniform tree, and in the conditioned Galton-Watson tree.

This isn’t supposed to justify why a conditioned Galton-Watson tree is relevant a priori (especially the unconditional independence of degrees), but it does explain why Poisson offspring distributions are relevant.

Construction via G(N,p) and the random cluster model

The main reason uniform trees were important to my thesis was their appearance in the Erdos-Renyi random graph G(N,p). The probability that vertices {1, …, n} form a tree component in G(N,p) with some particular structure is

$p^{n-1} (1-p)^{\binom{n}{2}-(n-1)} \times (1-p)^{n(N-m)}.$

Here, the first two terms give the probability that the graph structure on {1, …, n} is correct, and the the final term gives the probability of the (independent) event that these vertices are not connected to anything else in the graph. In particular, this has no dependence on the tree structure chosen on [n] (for example, whether it should be a path or a star – both examples of trees). So the conditional distribution is uniform among all trees.

If we work in some limiting regime, where $pn\rightarrow 0$ (for example if n is fixed and $p=\frac{1}{N}\rightarrow 0$), then we can get away asymptotically with less strong conditioning. Suppose we condition instead just that [n] form a component. Now, there are more ways to form a connected graph with one cycle on [n] than there are trees on [n], but the former all require an extra edge, and so the probability that a given one such tree-with-extra-edge appears as the restriction to [n] in G(N,p) is asymptotically negligible compared to the probability that the restriction to [n] of G(N,p) is a tree. Naturally, the local limit of components in G(N,c/N) is a Poisson(c) Galton-Watson branching process, and so this is all consistent with the original construction.

One slightly unsatisfying aspect to this construction is that we have to embed the tree of size [n] within a much larger graph on [N] to see uniform trees. We can’t choose a scaling p=p(n) such that G(n,p) itself concentrates on trees. To guarantee connectivity with high probability, we need to take $p> \frac{\log n}{n}$, but by this threshold, the graph has (many) cycles with high probability.

At this PIMS summer school in Vancouver, one of the courses is focusing on lattice spin models, including the random cluster model, which we now briefly define. We start with some underlying graph G. From a physical motivation, we might take G to be $\mathbb{Z}^d$ or some finite subset of it, or a d-ary tree, or the complete graph $K_N$. As in classical bond percolation (note G(N,p) is bond percolation on $K_N$), a random subset of the edges of G are included, or declared open. The probability of a given configuration w, with e open edges is proportional to

$p^e (1-p)^{|E(G)| - e} q^{k(w)},$ (*)

where the edge-weight $p\in(0,1)$ as usual, and cluster weight $q\in (0,\infty)$, and $k(w)$ counts the number of connected components in configuration w. When q=1, we recover classical bond percolation (including G(N,p) ), while for q>1, this cluster-reweighting favours having more components, and q<1 favours fewer components. Note that in the case $q\ne 1$, the normalising constant (or partition function) of (*) is generally intractable to calculate explicitly.

As in the Erdos-Renyi graph, consider fixing the underlying graph G, and taking $p\rightarrow 0$, but also taking $\frac{q}{p}\rightarrow 0$. So the resulting graph asymptotically ‘wants to have as few edges as possible, but really wants to have as few components as possible’. In particular, 1) all spanning trees of G are equally likely; 2) any configuration with more than one component has asymptotically negligible probability relative to any tree; 3) any graph with a cycle has #components + #edges greater than that of a tree, and so is asymptotically negligible probability relative to any tree.

In other words, the limit of the distribution is the uniform spanning tree of G, and so this (like Aldous-Broder) is a substantial generalisation, which constructs the uniform random tree in the special case where $G=K_n$.

# Balkan MO 2017 – Qs 1, 3 and 4

The UK is normally invited to participate as a guest team at the Balkan Mathematical Olympiad, an annual competition between eleven countries from South-Eastern Europe. I got to take part in Rhodes almost exactly ten years ago, and this year the competition was held in Ohrid, in Macedonia. There’s one paper, comprising four questions, normally one from each of the agreed olympiad topic areas, with 4.5 hours for students to address them. The contest was sat this morning, and I’m going to say quite a bit about the geometric Q2, and a little bit about Qs 1 and 3 also. In all cases, this discussion will include most of a solution, with some commentary, so don’t read these if you are planning to try the problems yourself.

I’m not saying anything about Q4, because I haven’t solved it. (Edit: I have solved it now, so will postpone Q2 until later today.)

Question One

Find all ordered pairs of positive integers (x,y) such that

$x^3+y^3=x^2+42xy+y^2.$

The first thought is that if either of x or y is ‘large’, then the LHS is bigger than the RHS, and so equality can’t hold. That is, there are only finitely many solutions. The smallest possible value of y is, naturally, 1, and substituting y=1 is convenient as then $y^2=y^3$, and it’s straightforward to derive $x=7$ as a solution.

Regarding the non-existence of large solutions, you can make this precise by factorising the LHS as

$(x+y)(x^2-xy+y^2) = x^2+42xy+y^2.$

There are 44 terms of degree two on the RHS, and one term of degree in the second bracket on the LHS. With a bit of AM-GM, you can see then that if $x+y>44$, you get a contradiction, as the LHS will be greater than the RHS. But that’s still a lot of possibilities to check.

It struck me that I could find ways to reduce the burden by reducing modulo various primes. 2, 3 and 7 all divide 42, and furthermore cubes are nice modulo 7 and squares are nice modulo 3, so maybe that would bring the number of possibilities down. But my instinct was that this wasn’t the right way to use the fact that we were solving over positive integers.

The second bracket in the factorisation looks enough like the RHS, that it’s worth exploring. If we move $x^2-xy+y^2$ from the right to the left, we get

$(x+y-1)(x^2-xy+y^2) = 43xy.$ (1.1)

Now it suddenly does look useful that we are solving over positive integers, because 43 is a prime, so has to appear as a factor somewhere on the LHS. But it’s generally quite restrictive that $x^2-xy+y^2 | 43xy$. This definitely looks like something that won’t hold often. If x and y are coprime, then certainly $x^2-xy+y^2$ and $y$ are coprime also. But actually if x and y have a non-trivial common factor d, we can divide both sides by $d^2$, and it still holds. Let’s write

$x=dm,\quad y=dn,\quad\text{where }d=\mathrm{gcd}(x,y).$

Then $m^2 -mn+n^2$ really does divide 43, since it is coprime to both m and n. This is now very restrictive indeed, since it requires that $m^2-mn+n^2$ be equal to 1 or 43. A square-sandwiching argument gives $m^2-mn+n^2=1$ iff $m=n=1$. 43 requires a little bit more work, with (at least as I did it) a few cases to check by hand, but again only has one solution, namely $m=7, n=1$ and vice versa.

We now need to add the common divisor d back into the mix. In the first case, (1.1) reduces to $(2d-1)=43$, which gives $(x,y)=(22,22)$. In the second case, after cancelling a couple of factors, (1.1) reduces to $(8d-1)=7$, from which $(x,y)=(7,1),(1,7)$ emerges, and these must be all the solutions.

The moral here seemed to be that divisibility was a stronger tool than case-reduction. But that was just this question. There are other examples where case-reduction is probably more useful than chasing divisibility.

Question Three

Find all functions $f:\mathbb{N}\rightarrow\mathbb{N}$ such that

$n+f(m) \,\big|\, f(n)+nf(m)$

for all $m,n\in\mathbb{N}$.

What would be useful here? There are two variables, and a function. It would be useful if we could reduce the number of variables, or the number of occurences of f. We can reduce the number of variables by taking m=n, to get

$n+f(n) \,\big|\, f(n) [1+n].$ (3.1)

From this, we might observe that $f(n)\equiv 1$ is a solution. Of course we could analyse this much more, but this doesn’t look like a 10/10 insight, so I tried other things first.

In general, the statement that $a\,|\,b$ also tells us that $a\,|\, b-ka$. That is, we can subtract arbitrary multiples of the divisor, and the result is still true. A recurring trope is that the original b is elegant, but an adjusted b-ka is useful. I don’t think we can do the latter, but by subtracting $n^2 +nf(m)$ from the problem statement, we get

$n+f(m) \,\big|\, n^2-f(n).$ (3.2)

There’s now no m on the RHS, but this relation has to hold for all m. One option is that $f(n)=n^2$ everywhere, then what we’ve deduced always holds since the RHS is zero. But if there’s a value of n for which $f(n)\ne n^2$, then (3.2) is a very useful statement. From now on, we assume this. Because then as we fix n and vary m, we need $n+f(m)$ to remain a divisor of the RHS, which is fixed, and so has finitely many divisors. So $f(m)$ takes only finitely many values, and in particular is bounded.

This ties to the observation that $f\equiv 1$ is a solution, which we made around (3.1), so let’s revisit that: (Note, there might be more elegant ways to finish from here, but this is what I did. Also note, n is no longer fixed as in previous paragraph.)

$n+f(n) \,\big|\, f(n) [1+n].$ (3.1)

Just to avoid confusion between the function itself, and one of the finite collection of values it might take, let’s say b is a value taken by f. So there are values of n for which

$n+b \,\big|\, b(1+n).$

By thinking about linear equations, you might be able to convince yourself that there are only finitely many solutions (in n) to this relation. There are certainly only finitely many solutions where LHS=RHS (well, at most one solution), and only finitely many where 2xLHS=RHS etc etc. But why do something complicated, when we can actually repeat the trick from the beginning, and subtract $b(n+b)$, to obtain

$n+b \,\big|\, b^2-b.$

For similar reasons to before, this is a great deduction, because it means if $b\ne 1$, then the RHS is positive, which means only finitely many n can satisfy this relation. Remember we’re trying to show that no n can satisfy this relation if $b\ne 1$, so this is definitely massive progress!

If any of what’s already happened looked like magic, I hope we can buy into the idea that subtracting multiples of the divisor from the RHS is the only tool we used, and that making the RHS fixed gives a lot of information about the LHS as the free variable varies. The final step is not magic either. We know that f is eventually 1. If you prefer “for large enough n, $f(n)=1$,” since all other values appear only finitely often. I could write this with quantifiers, but I don’t want to, because that makes it seem more complicated than it is. We genuinely don’t care when the last non-1 value appears.

Anyway, since we’ve deduced this, we absolutely have to substitute this into something we already have. Why not the original problem statement? Fix m, then for all large enough n

$n+f(m) \,\big|\, 1+nf(m).$ (3.3)

To emphasise, (3.3) has to hold for all large enough n. Is it possible that f(m)=2? Again, it’s easy to convince yourself not. But, yet again, why not use the approach we’ve used so profitably before to clear the RHS? In fact, we already did this, and called it (3.2), and we can make that work [3.4], but in this setting, because f(m) is fixed and we’re working with variable large n, it’s better to eliminate n, to get

$n+f(m)\,\big|\, f(m)^2-1,$

again for all large enough n. By the same size argument as before, this is totally impossible unless f(m)=1. Which means that in fact $f(m)=1$ for all m. Remember ages ago we assumed that f(n) was not $n^2$ everywhere, so this gives our two solutions: $f(n)=1,\, f(n)=n^2$.

Moral: choosing carefully which expression to work with can make life much more interesting later. Eliminating as many variables or difficult things from one side is a good choice. Playing with small values can help you understand the problem, but here you need to think about soft properties of the expression, in particular what happens when you take one variable large while holding another fixed.

[3.4] – if you do use the original approach, you get $n^2-1$ on the RHS. There’s then the temptation to kill the divisibility by taking n to be the integer in the middle of a large twin prime pair. Unfortunately, the existence of such an n is still just a conjecture

Question Four

(Statement copied from Art of Problem Solving. I’m unsure whether this is the exact wording given to the students in the contest.)

On a circular table sit n>2 students. First, each student has just one candy. At each step, each student chooses one of the following actions:

(A) Gives a candy to the student sitting on his left or to the student sitting on his right.

(B) Separates all its candies in two, possibly empty, sets and gives one set to the student sitting on his left and the other to the student sitting on his right.

At each step, students perform the actions they have chosen at the same time. A distribution of candy is called legitimate if it can occur after a finite number of steps.
Find the number of legitimate distributions.

My moral for this question is this: I’m glad I thought about this on the bus first. What I found hardest here was getting the right answer. My initial thoughts:

• Do I know how to calculate the total number of possibilities, irrespective of the algorithm? Fortunately yes I do. Marbles-in-urns = barriers between marbles on a line (maybe add one extra marble per urn first). [4.1]
• What happens if you just use technique a)? Well first you can get into trouble because what happens if you have zero sweets? But fine, let’s temporarily say you can have a negative number of sweets. If n is even, then there’s a clear parity situation developing, as if you colour the children red and blue alternately, at every stage you have n/2 sweets moving from red children to blue and vice versa, so actually the total number of sweets among the red children is constant through the process.
• What happens if you just use technique b)? This felt much more promising.
• Can you get all the sweets to one child? I considered looking at the child directly opposite (or almost-directly opposite) and ‘sweeping’ all the sweets away from them. It felt like this would work, except if for some parity reason we couldn’t prevent the final child having one (or more, but probably exactly one) sweets at the crucial moment when all the other sweets got passed to him.

Then I got home, and with some paper, I felt I could do all possibilities with n=5, and all but a few when n=6. My conjecture was that all are possible with n odd, and all are possible with n even, except those when none of the red kids or none of the kids get a sweet. I tried n=8, and there were a few more that I couldn’t construct, but this felt like my failure to be a computer rather than a big problem. Again there’s a trade-off between confirming your answer, and trying to prove it.

Claim: If n is even, you can’t achieve the configurations where either the red children or the blue children have no sweets.

Proof: Suppose you can. That means there’s a first time that all the sweets were on one colour. Call this time T. Without loss of generality, all the sweets are on red at T. Where could the sweets have been at time T-1? I claim they must all have been on blue, which contradicts minimality. Why? Because if at least one red child had at least one sweet, they must have passed at least one sweet to a blue neighbour.

Now it remains to give a construction for all other cases. In the end, my proof has two stages:

Step One: Given a configuration, in two steps, you can move a candy two places to the right, leaving everything else unchanged.

This is enough to settle the n odd case. For the even case, we need an extra step, which really corresponds to an initial phase of the construction.

Step Two: We can make some version of the ‘sweeping’ move precise, to end up in some configuration where the red number of children have any number of sweets except 0 or n.

Step one is not so hard. Realising that step one would be a useful tool to have was probably the one moment where I shifted from feeling like I hadn’t got into the problem to feeling that I’d mostly finished it. As ever in constructions, working out how to do a small local adjustment, which you plan to do lots of times to get a global effect, is great. (Think of how you solve a Rubik’s cube for example.)

Step two is notationally fiddly, and I would think very carefully before writing it up. In the end I didn’t use the sweeping move. Instead, with the observation that you can take an adjacent pair and continually swap their sweets it’s possible to set up an induction.

Actual morals: Observing the possibility to make a small change in a couple of moves (Step one above) was crucial. My original moral does still hold slightly. Writing lots of things down didn’t make life easier, and in the end the ideas on the bus were pretty much everything I needed.

[4.1] – one session to a group of 15 year olds is enough to teach you that the canon is always ‘marbles in urns’ never ‘balls’ nor ‘bags’, let alone both.