# Enumerating Forests

I’ve just got back from a visit to Budapest University of Technology, where it was very pleasant to be invited to give a talk, as well as continuing the discussion of our research programme with Balazs. My talk concerned a limit for the exploration process of an Erdos-Renyi random graph conditioned to have no cycles. Watch this space (hopefully very soon) for a fully rigorous account of this. In any case, my timings were not as slick as I would like, and I had to miss out a chunk I’d planned to say about a result of Britikov concerning enumerating unrooted forests. It therefore feels like an excellent time to write something again, and explain this paper, which you might be able to find here, if you have appropriate journal rights.

We are interested in calculating $a_{n,m}$, the number of forests with vertex set [n] consisting of m unrooted trees. Recall that if we were interested in rooted trees, we could appeal to Prufer codes to show that there are $m n^{n-m-1}$ such forests with a prescribed set of m roots (and so $\binom{n}{m} m n^{n-m-1}$ in total), and indeed results of Pitman give a coalescent/fragmentation scheme as m varies between 1 and n-1. It seems that there is no neat combinatorial re-interpretation of the unrooted case though, so Britikov uses an analytic method.

We know that

$a_{n,m}= \frac{n!}{m!} \sum_{\substack{k_1+\ldots+k_m=n\\ k_i\ge 1}} \prod_{j=1}^m \frac{k_j^{k_j-2}}{k_j!}.$

To see this, observe that the $k_j$s correspond to the sizes of the m trees in the forest; $\frac{n!}{\prod k_j!}$ gives the multinomial number of ways to assign vertices to the trees; given the labels for a tree of size $k_j$, there are $k_j^{k_j-2}$ ways to make up the tree itself; and $\frac{1}{m!}$ accounts for the fact that the trees have no order.
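As a sanity check on this formula, here is a minimal sketch that evaluates the composition sum exactly for small n and m. The function name `forest_count` is my own, and the approach (repeated polynomial multiplication with exact fractions) is just one convenient way to organise the sum, not anything taken from Britikov's paper.

```python
from fractions import Fraction
from math import comb, factorial


def forest_count(n: int, m: int) -> int:
    """Number of forests on [n] made of m unrooted trees, via the composition sum."""
    # t[k] = k^(k-2) / k! for k >= 1; index 0 is unused and set to 0.
    t = [Fraction(0)] + [Fraction(k) ** (k - 2) / factorial(k) for k in range(1, n + 1)]
    # After j rounds, poly[s] is the sum over ordered (k_1, ..., k_j) with total s
    # of the product of the t[k_i].
    poly = [Fraction(1)] + [Fraction(0)] * n
    for _ in range(m):
        new = [Fraction(0)] * (n + 1)
        for s, c in enumerate(poly):
            if c:
                for k in range(1, n - s + 1):
                    new[s + k] += c * t[k]
        poly = new
    value = Fraction(factorial(n), factorial(m)) * poly[n]
    assert value.denominator == 1  # the count must be an integer
    return int(value)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, [forest_count(n, m) for m in range(1, n + 1)])
```

The boundary cases are reassuring: m=1 recovers Cayley's $n^{n-2}$, m=n gives the single empty forest, and m=n-1 gives $\binom{n}{2}$, since such a forest is just a choice of one edge.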

What we would really like to do is to take the uniform distribution on the set of all labelled trees, then simulate m IID copies of this distribution, and condition the union to contain precisely n vertices. But obviously this is an infinite set, so we cannot choose uniformly from it. Instead, we can tilt so that large trees are unlikely. In particular, for each x we define

$\mathbb{P}(\xi=k) \propto \frac{k^{k-2} x^k}{k!}$,

and define the normalising constant

$B(x):= \sum_{k\ge 1} \frac{k^{k-2}x^k}{k!},$

whenever it exists. It turns out that $x\le e^{-1}$ is precisely the condition for $B(x)<\infty$. Note now that if $\xi_1,\xi_2,\ldots$ are IID copies of $\xi$, then

$\mathbb{P}(\xi_1+\ldots+\xi_m=n) = \frac{x^n}{B(x)^m} \sum_{k_1+\ldots + k_m=n} \prod_{j=1}^m \frac{k_j^{k_j-2}}{k_j!},$

and so we obtain

$a_{n,m}= \frac{n!}{m!} \frac{B(x)^m}{x^n} \mathbb{P}(\xi_1+\ldots + \xi_m=n).$
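The point of this identity is that the right-hand side does not depend on the tilt x. A rough numerical sketch of that, assuming a truncated series for B(x) and a direct convolution for the law of the sum (the names `weight` and `forests_via_tilting` are mine):

```python
from math import exp, factorial, lgamma, log


def weight(k: int, x: float) -> float:
    """k^(k-2) x^k / k!, computed through logarithms to avoid overflow."""
    return exp((k - 2) * log(k) + k * log(x) - lgamma(k + 1))


def forests_via_tilting(n: int, m: int, x: float, terms: int = 2000) -> float:
    """Recover a_{n,m} from P(xi_1 + ... + xi_m = n) for a tilt 0 < x < 1/e."""
    B = sum(weight(k, x) for k in range(1, terms + 1))  # truncated series for B(x)
    pmf = [0.0] + [weight(k, x) / B for k in range(1, n + 1)]
    law = [1.0] + [0.0] * n  # law of the empty sum
    for _ in range(m):
        new = [0.0] * (n + 1)
        for s, p in enumerate(law):
            for k in range(1, n - s + 1):
                new[s + k] += p * pmf[k]
        law = new
    return factorial(n) / factorial(m) * B ** m / x ** n * law[n]


if __name__ == "__main__":
    for x in (0.05, 0.2, 0.35):
        print(x, forests_via_tilting(5, 2, x))
```

Up to floating-point error every tilt returns the same number. The truncation of B(x) is harmless here because the factor $B(x)^m$ cancels against the normalisation of the m copies of $\xi$.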

So asymptotics for $a_{n,m}$ might follow from limit theorems for sums of IID copies of $\xi$.

So far, we haven’t said anything about how to choose this value x. But observe that if you want to have lots of trees in the forest, then the individual trees should generally be small, so we take x small to tilt away from a preference for large trees. It turns out that there is a similar interpretation of criticality for forests as for general graphs, and taking x equal to 1/e, the radius of convergence of B, works well for this setting. If you want even fewer trees, there is no option to take x larger than 1/e, but instead one can use large deviations machinery rather than law of large numbers asymptotics.

We will be interested in asymptotics of the characteristic function of $\xi$ for x=1/e. In particular $\mathbb{E}[e^{it\xi}]=\frac{B(xe^{it})}{B(x)}$, and it will be enough to clarify the behaviour of this as $t\rightarrow 0$. It’s easier to work with a related analytic function

$\theta(x)=\sum_{k\ge 1} \frac{k^{k-1}x^k}{k!},$

i.e. $\theta(x)=xB'(x)$. What now feels like a long time ago I wrote a master’s thesis on the subject of multiplicative coalescence, and this shows up as the generating function of the solutions to Smoluchowski’s equations with monodisperse initial conditions, which are themselves closely related to the Borel distributions. In any case, several of the early papers on this topic made progress by establishing that the radius of convergence is 1/e, and that $\theta(x)e^{-\theta(x)}=x$ everywhere where $|x|\le 1/e$. We want to consider x=1/e, for which $\theta=1$.

Note that $\mathbb{E}\xi = \frac{\theta(x)}{B(x)}$, so we will make progress by relating $B(x),\theta(x)$ in two ways. One way involves playing around with contour integrals in a fashion that is clear in print, but involves quite a lot of notation. The second way is the Renyi relation which asserts that $\theta(x)=B(x)+\frac{\theta(x)^2}{2}$. We will briefly give a combinatorial proof. Observe that after multiplying through by factorials and interpreting the square of a generating function, this is equivalent to

$k^{k-1} = k^{k-2} + \frac12 \sum_{\substack{l+m=k\\l,m\ge 1}} l^{l-1}m^{m-1}\binom{k}{l},$

for all k. As we might expect from the appearance of this equality, we can prove it using a bijection on trees. Obviously on the LHS we have the size of the set of rooted trees on [k]. Now consider the set of unordered pairs of disjoint rooted trees whose vertex sets together make up [k]. The second term on the RHS is the size of this set, with the factor of 1/2 accounting for the fact that the pair is unordered. Given an element of this set, join up the two roots, and choose whichever root was not initially in the same tree as 1 to be the new root. We claim this gives a bijection between this set, and the set of rooted trees on [k] for which 1 is not the root. Given the latter, the only pair of trees that leads to the right rooted tree on [k] under this mapping is given by cutting the unique edge incident to the root that separates the root and vertex 1. Finally, since there is a canonical bijection between rooted trees for which 1 is the root, and unrooted trees (!), the first term on the RHS counts the remaining rooted trees, and we can conclude the Renyi relation.
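The coefficient identity is easy to confirm in exact integer arithmetic for small k. This is only a check of the counting identity, not of the bijection itself, and the helper names are mine:

```python
from math import comb


def rooted_trees(k: int) -> int:
    """Number of rooted labelled trees on k vertices, k^(k-1)."""
    return k ** (k - 1)


def unrooted_trees(k: int) -> int:
    """Number of unrooted labelled trees on k vertices, k^(k-2) (equal to 1 for k = 1)."""
    return 1 if k == 1 else k ** (k - 2)


def renyi_rhs(k: int) -> int:
    """Unrooted trees on [k] plus unordered pairs of rooted trees covering [k]."""
    ordered_pairs = sum(
        comb(k, l) * rooted_trees(l) * rooted_trees(k - l) for l in range(1, k)
    )
    assert ordered_pairs % 2 == 0  # each unordered pair is counted twice
    return unrooted_trees(k) + ordered_pairs // 2


if __name__ == "__main__":
    for k in range(1, 8):
        print(k, rooted_trees(k), renyi_rhs(k))
```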

The Renyi relation now gives $B(x)=\theta(x)-\frac{\theta(x)^2}{2}=\frac12$ and hence $\mathbb{E}\xi = \frac{\theta(x)}{B(x)}=2$ when x=1/e. If we wanted, we could show that the variance is infinite, which is not completely surprising, as the parameter x lies on the boundary of the disc of convergence of the generating function.
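For anyone who prefers to see numbers, the functional equation and the Renyi relation can be checked numerically for x strictly inside the disc of convergence. This is a sketch under my own assumptions: I truncate both series at a fixed number of terms, and I stay away from x=1/e itself, where the partial sums converge only polynomially fast.

```python
from math import exp, lgamma, log


def series(x: float, shift: int, terms: int = 3000) -> float:
    """Sum of k^(k-shift) x^k / k! over 1 <= k <= terms.

    shift=2 gives (a truncation of) B(x), shift=1 gives theta(x).
    """
    return sum(
        exp((k - shift) * log(k) + k * log(x) - lgamma(k + 1))
        for k in range(1, terms + 1)
    )


def check(x: float):
    """Return the two residuals that should vanish, and the mean of xi."""
    B, theta = series(x, 2), series(x, 1)
    functional_eq = theta * exp(-theta) - x   # theta e^{-theta} = x
    renyi = theta - B - theta ** 2 / 2        # theta = B + theta^2 / 2
    return functional_eq, renyi, theta / B


if __name__ == "__main__":
    for x in (0.1, 0.3, 0.36):
        print(x, check(x))
```

The printed mean $\theta/B$ increases towards 2 as x increases towards 1/e, consistent with the value at criticality.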

Now, playing around with contour integrals, and being careful about which branches to take, leads to the following asymptotic as $t\rightarrow 0$:

$\mathbb{E}[ e^{it\xi}] = 1+2it + \frac{2}{3}i |2t|^{3/2} (i\mathrm{sign}(t))^{3/2} + o(|t|^{3/2}).$

So from this, we can show that the characteristic function of the rescaled centred partial sum $\frac{\xi_1+\ldots+\xi_N-2N}{bN^{2/3}}$ converges to $\exp(-|t|^{3/2}\exp(\frac{i\pi}{4}\mathrm{sign} t))$, where $b= (32/9)^{1/3}$ is a constant arising out of the previous step.

We recognise this as the characteristic function of the stable distribution with parameters 3/2 and -1. In particular, we know now that $\xi$ is in the domain of attraction for a stable-3/2 distribution. If we wanted a version of the central limit theorem for such partial sums, we could have that, but since we care about the partial sums of the $\xi_i$s taking a specific value, rather than a range of values on the scale of the fluctuations, we actually need a local limit theorem.

To make this clear, let’s return to the simplest example of the CLT, with some random variables with mean $\mu$ and variance $\sigma^2<\infty$. Then the partial sums satisfy

$\mathbb{P}(\mu N + a\sigma\sqrt{N} \le S_N \le \mu N+b\sigma\sqrt{N}) \rightarrow \int_a^b f_{\mathcal N}(x)dx,$

as $N\rightarrow\infty$. But what about the probability of $S_N$ taking a particular value m that lies between $\mu N+a\sigma \sqrt{N}$ and $\mu N + b\sigma \sqrt{N}$? If the underlying distribution were continuous, this would be uncontroversial: the probability of lying in a range that is smaller than the scale of the CLT can be handled in a similar way to the CLT itself. A local limit theorem asserts that when the underlying distribution is supported on some lattice, most naturally the integers, then these probabilities are in the limit roughly the same for all m close to $\mu N+a\sigma\sqrt{N}$.

In this setting, a result of Ibragimov and Linnik that I have struggled to find anywhere in print (especially in English) gives us local limit theory for integer-supported distributions in the domain of attraction of a stable distribution. Taking p( ) to be the density of this distribution, we obtain

$bm^{2/3}\mathbb{P}(\xi_1+\ldots+\xi_m=n) - p(\frac{n-2m}{b m^{2/3}}) \rightarrow 0$

as $n\rightarrow\infty$, uniformly on any set of m for which $z= \frac{n-2m}{bm^{2/3}}$ is bounded. Conveniently, the two occurrences of b cancel, and Britikov obtains

$a_{n,m} = (1+o(1)) \frac{\sqrt{2\pi} n^{n-1/6}}{2^{n-m}(n-m)!} p\left(\frac{n-2m}{n^{2/3}}\right),$

uniformly in the same sense as before.

# Bell Polynomials

## Trees with a single cycle

When counting combinatorial objects, it is often the case that we have two types of structure present at different levels. The aim of this post is to introduce the Bell polynomials, which provide the most natural notation for describing this sort of situation, and to mention some of the results that become easier to derive in this framework. This post is based on material and exercises from Chapter 1 of Jim Pitman’s book Combinatorial Stochastic Processes, which is great, and also available online here.

The structures that Bell polynomials enumerate are called composite structures in this account. Rather than give a definition right away, I shall give an example. The objects I have been thinking about in the past few weeks are graphs on n vertices containing precisely one cycle. Some of the background for this has been explained in recent posts.

In a recent post on Prufer codes, I gave the classical argument showing that the number of trees on n vertices is $n^{n-2}$. We might consider a unicyclic graph to be a tree with an extra edge. But if we consider the number of ways to add a further edge to a tree, we get

$n^{n-2}\left[\binom{n}{2}-(n-1)\right]=n^{n-2}\binom{n-1}{2}.$

Obviously, we have overcounted. If the single cycle in a graph has length k, then the graph has been counted exactly k times in this enumeration. But it is not obvious how many graphs have a single cycle of length k.

Instead, we stop worrying about exactly how many of these there are, as there might not be a simple expression anyway. As soon as we start using them in any actual argument, it will be useful to know various properties about the graphs, but probably not exactly how many there are.

Let’s focus on this single cycle of length k say. If we remove the edges of the cycle, we are left with a collection of trees. Why? Well if there was a cycle in the remaining graph, then the original graph would have had at least two cycles. So we have a collection of trees, unsurprisingly called a forest. Remembering that some of the trees may in fact be a single vertex (on the cycle), it is clear that there is a bijection between these trees and the vertices of the cycle in the obvious way. We can think of the graph as a k-cycle, dressed with trees.

Alternatively, once we have specified its size, we can forget about the k-cycle altogether. The graph is precisely defined by a forest of k trees on n vertices, with a specified root in each tree indicating which vertex lies on the cycle, and a permutation specifying the cyclic ordering of the trees. We can write this as

$N_{n,k}=(k-1)!\sum_{(A_1,\ldots,A_k)\in\mathcal{P}^k(n)}a_1^{a_1-1}\cdot\ldots\cdot a_k^{a_k-1},\quad \text{for }a_i=|A_i|,$

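where $\mathcal{P}^k(n)$ is the set of partitions of [n] with k blocks. Remember that the blocks in a partition are necessarily unordered. This makes sense in this setting as the cyclic permutation chosen from the (k-1)! possibilities specifies the order on the cycle. Note that for $k\ge 3$ a cyclic ordering and its reversal give the same undirected graph, so $N_{n,k}$ as written counts each such graph twice; equivalently, it counts unicyclic graphs together with an orientation of the cycle.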
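Both the overcounting claim for tree-plus-edge and the formula for $N_{n,k}$ can be tested by brute force for small n. The sketch below enumerates every graph on n labelled vertices with n edges, keeps the connected ones (which are exactly the connected graphs with one cycle), and finds the cycle length by stripping leaves. All function names are mine, and the recurrence used for the sum over partitions is a standard one (condition on the size of the block containing the largest element) rather than anything specific to this post.

```python
from functools import lru_cache
from itertools import combinations
from math import comb, factorial


def cycle_length(n, edges):
    """Cycle length if the graph on range(n) is connected with n edges, else None."""
    if len(edges) != n:
        return None
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]  # depth-first search for connectivity
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != n:
        return None
    alive = set(range(n))  # strip leaves until only the cycle survives
    leaves = [v for v in alive if len(adj[v]) == 1]
    while leaves:
        v = leaves.pop()
        alive.discard(v)
        for w in adj[v]:
            adj[w].discard(v)
            if len(adj[w]) == 1:
                leaves.append(w)
        adj[v] = set()
    return len(alive)


def unicyclic_by_cycle_length(n):
    """Map k -> number of connected unicyclic graphs on n vertices with a k-cycle."""
    counts = {}
    for edges in combinations(list(combinations(range(n), 2)), n):
        k = cycle_length(n, edges)
        if k is not None:
            counts[k] = counts.get(k, 0) + 1
    return counts


@lru_cache(maxsize=None)
def rooted_blocks(n, k):
    """Sum over partitions of [n] into k blocks of the product of a_i^(a_i - 1)."""
    if k == 0:
        return 1 if n == 0 else 0
    # j is the size of the block containing the largest element
    return sum(
        comb(n - 1, j - 1) * j ** (j - 1) * rooted_blocks(n - j, k - 1)
        for j in range(1, n - k + 2)
    )


if __name__ == "__main__":
    n = 5
    counts = unicyclic_by_cycle_length(n)
    print(counts)
    print(sum(k * c for k, c in counts.items()), n ** (n - 2) * comb(n - 1, 2))
    print({k: factorial(k - 1) * rooted_blocks(n, k) for k in counts})
```

For the values of n I tried, the weighted sum $\sum_k k\cdot\#\{\text{graphs with a }k\text{-cycle}\}$ matches $n^{n-2}\binom{n-1}{2}$, and $(k-1)!$ times the partition sum comes out as exactly twice the brute-force count, which is the two orientations of the cycle.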

## Bell Polynomials

The key point about this description is that there are two types of combinatorial structure present. We have the rooted trees, and also a cyclic ordering of the rooted trees. Bell polynomials generalise this idea. It is helpful to be less specific and think of partitions of [n] into blocks. There are $w_j$ arrangements of any block of size j, and there are $v_k$ ways to arrange the blocks, if there are k of them. Note that we assume $v_k$ is independent of the arrangements within the collection of blocks. So in the previous example, $w_j=j^{j-1}$, the number of rooted trees on j vertices, and $v_k=(k-1)!$. Pitman denotes these sequences by $v_\bullet,w_\bullet$. Then the (n,k)th partial Bell polynomial, $B_{n,k}(w_\bullet)$ gives the number of divisions into k blocks:

$B_{n,k}(w_\bullet):=\sum_{(A_1,\ldots,A_k)\in\mathcal{P}^k(n)}\prod_{i=1}^k w_{a_i}.$

The total number of arrangements is given by the Bell polynomial

$B_n(v_\bullet,w_\bullet):=\sum_{k=1}^n v_k B_{n,k}(w_\bullet).$

Here are some other examples of Bell polynomials. The (unsigned) Stirling numbers of the first kind $c_{n,k}$ give the number of permutations of [n] with k cycles. Since we don’t want to impose any combinatorial structure on the set of cycles, we don’t need to consider $v_\bullet$, and the number of ways to make a j-cycle from a j-block is $w_j=(j-1)!$, so $c_{n,k}:=B_{n,k}((\bullet-1)!)$. Similarly, the Stirling numbers of the second kind $S_{n,k}$ give the number of partitions of [n] into k blocks. Almost by definition, $S_{n,k}:=B_{n,k}(1^\bullet)$, where $1^\bullet$ is defined to be the sequence containing all 1s.
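These definitions translate fairly directly into code. A minimal sketch, assuming the sequences are passed as Python functions `w(j)` and `v(k)`, and using the standard recurrence obtained by conditioning on the size of the block containing the element n (the function names are mine, not Pitman's):

```python
from functools import lru_cache
from math import comb, factorial


def partial_bell(n, k, w):
    """B_{n,k}(w): partitions of [n] into k blocks, with w(j) structures on a j-block."""

    @lru_cache(maxsize=None)
    def go(size, blocks):
        if blocks == 0:
            return 1 if size == 0 else 0
        # j is the size of the block containing the largest remaining element
        return sum(
            comb(size - 1, j - 1) * w(j) * go(size - j, blocks - 1)
            for j in range(1, size - blocks + 2)
        )

    return go(n, k)


def bell(n, v, w):
    """B_n(v, w): sum over k of v(k) * B_{n,k}(w)."""
    return sum(v(k) * partial_bell(n, k, w) for k in range(1, n + 1))


if __name__ == "__main__":
    n = 5
    print([partial_bell(n, k, lambda j: 1) for k in range(1, n + 1)])                 # S_{n,k}
    print([partial_bell(n, k, lambda j: factorial(j - 1)) for k in range(1, n + 1)])  # c_{n,k}
```

Taking $v_k=1$ for all k then recovers the total number of set partitions in the first case and n! in the second.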

## Applications

So far, this is just a definition that gives an abbreviated description for the sizes of several interesting sets of discrete objects. Having clean notation is always important, but there are further advantages of using Bell polynomials. I don’t want to reproduce the entirety of the chapter I’ve read, so my aim for this final section is to give a very vague outline of why this is a useful formulation.

Bell polynomials can be treated rather nicely via generating functions. The key to this is to take a sum not over partitions, but rather over ordered partitions, which are exactly the same, except now we also care about the order of the blocks. This has the advantage that there is a correspondence between ordered partitions with k blocks and compositions with k terms. If the composition is $n_1+\ldots+n_k=n$, it is clear why there are $\binom{n}{n_1,\ldots,n_k}$ ordered partitions encoding this structure. This multinomial coefficient can be written as n! divided by the product of the factorials $n_i!$, and each partition with k blocks corresponds to k! ordered partitions, so we can write:

$B_{n,k}(w_\bullet)=\frac{n!}{k!}\sum_{(n_1,\ldots,n_k)}\prod_{i=1}^k \frac{w_{n_i}}{n_i!}.$

This motivates considering the exponential generating function given by

$w(\xi)=\sum_{j=1}^\infty w_j\frac{\xi^j}{j!},$

and similarly $v(\theta)=\sum_{k=1}^\infty v_k\frac{\theta^k}{k!}$, as this leads to the neat expressions:

$B_{n,k}(w_\bullet)=n![\xi^n]\frac{w(\xi)^k}{k!},\quad B_n(v_\bullet,w_\bullet)=n![\xi^n]v(w(\xi)).$
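To see the generating function expressions in action, here is a small sketch that extracts the coefficients with truncated power series over exact fractions. It assumes the composite EGF is $v(\theta)=\sum_k v_k\theta^k/k!$, so that $n![\xi^n]v(w(\xi))$ is just the sum over k of $v_k$ times the k-block expression; the helper names `egf`, `mul`, `partial_bell_egf` and `bell_egf` are mine.

```python
from fractions import Fraction
from math import factorial


def egf(seq, n):
    """Coefficients of sum_{j=1}^n seq(j) xi^j / j!, as a list indexed by degree."""
    return [Fraction(0)] + [Fraction(seq(j), factorial(j)) for j in range(1, n + 1)]


def mul(a, b, n):
    """Product of two power series, truncated at degree n."""
    out = [Fraction(0)] * (n + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b[: n + 1 - i]):
                out[i + j] += ai * bj
    return out


def partial_bell_egf(n, k, w):
    """n! [xi^n] w(xi)^k / k!"""
    series = egf(w, n)
    power = [Fraction(1)] + [Fraction(0)] * n
    for _ in range(k):
        power = mul(power, series, n)
    return power[n] * factorial(n) / factorial(k)


def bell_egf(n, v, w):
    """n! [xi^n] v(w(xi)), expanded as a sum over the number of blocks k."""
    return sum(v(k) * partial_bell_egf(n, k, w) for k in range(1, n + 1))


if __name__ == "__main__":
    print([partial_bell_egf(5, k, lambda j: 1) for k in range(1, 6)])
    print(bell_egf(5, lambda k: 1, lambda j: 1))
```

With $w_j=v_k=1$ the composite function is $\exp(e^\xi-1)-1$, so the last line is the number of set partitions of [5].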

The Bell polynomial $B_n(v_\bullet,w_\bullet)$ counts the number of partitions of [n] subject to some extra structure. If we choose uniformly from this set, we get a distribution on this combinatorial object, for which the Bell polynomial provides the normalising constant. If we then ignore the extra structure, the sequences $v_\bullet,w_\bullet$ induce a probability distribution on the set of partitions of n. This distribution is known as a Gibbs partition. It is interesting to consider when and whether it is possible to define a splitting mechanism such that the Gibbs partitions can be coupled to form a fragmentation process. This is the opposite of a coalescence process. Here, we have a sequence of masses, and at each integer time we have rules to determine which mass to pick, and a rule for how to break it into two pieces. It is certainly not the case that for an arbitrary splitting rule and sequences $v_\bullet,w_\bullet$, the one-step fragmentation of the Gibbs partition on n gives the corresponding Gibbs partition on (n-1).

## CLT for random permutations

For the final demonstration of the use of Bell polynomials, I am going to sketch the outline of a solution to exercise 1.5.4. which shows that the number of cycles in a uniformly chosen permutation has a CLT. This is not at all obvious, since the number of permutations of [n] with k cycles is given by $B_{n,k}((\bullet-1)!)$ and there is certainly no simple form for this, so the possibility of doing a technical limiting argument seems slim.

For ease of notation, we copy Pitman and write $c_{n,k}:=B_{n,k}((\bullet-1)!)$ as before. First we show exercise 1.2.3. which asserts that

$x(x+1)\ldots(x+(n-1))=\sum_{k=1}^n c_{n,k}x^k.$

We argue combinatorially. For a positive integer x, the RHS is the number of ways to choose $\sigma\in S_n$ and a colouring of [n] with x colours such that the orbits of $\sigma$ are monochromatic. We prove that the LHS also has this property by induction on the number of vertices. We claim there is a 1-to-(x+n) map from configurations on n vertices to configurations on (n+1) vertices. Given $\sigma\in S_n$ and colouring, for any $a\in[n]$, we construct $\sigma_a\in S_{n+1}$ by $\sigma_a(a)=n+1$, $\sigma_a(n+1)=\sigma(a)$ and for all other b, $\sigma_a(b)=\sigma(b)$. We give n+1 the same colour as a. This gives us n possibilities. Alternatively, we can map (n+1) to itself and give it any colour we want. This gives us x possibilities. A slightly more careful argument shows that this is indeed a 1-to-(x+n) map, which is exactly what we require.
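The identity can also be checked directly for small n, by expanding the rising factorial and comparing with a brute-force count of permutations by number of cycles. A quick sketch, with function names of my own choosing:

```python
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of a permutation of range(len(perm)) in one-line notation."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def cycle_counts(n):
    """List whose k-th entry is the number of permutations of [n] with k cycles."""
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        counts[cycle_count(perm)] += 1
    return counts


def rising_factorial_coeffs(n):
    """Coefficients (by degree) of x (x + 1) ... (x + n - 1)."""
    coeffs = [1]
    for i in range(n):
        new = [0] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):
            new[d] += i * c      # contribution of the constant i
            new[d + 1] += c      # contribution of x
        coeffs = new
    return coeffs


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, cycle_counts(n), rising_factorial_coeffs(n))
```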

So the polynomial

$A_n(z)=\sum_{k=0}^nc_{n,k}z^k,$

has n real zeros, which allows us to write

$\frac{c_{n,k}}{A_n(1)}=\mathbb{P}(X_1+\ldots+X_n=k),$

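where the Xs are independent but not identically distributed Bernoulli trials. (The zeros are $0,-1,\ldots,-(n-1)$, so $\frac{A_n(z)}{A_n(1)}=\prod_{j=1}^n\left(\frac{z}{j}+1-\frac{1}{j}\right)$, and we may take $X_j$ to have success probability 1/j.) The number of cycles $K_n$ in a uniformly chosen permutation then has the same distribution as this sum, and so it becomes a simple matter to verify the CLT by checking that the variance grows appropriately. As both mean and variance are asymptotically log n, we can conclude that: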

$\frac{K_n - \log n}{\sqrt{\log n}}\stackrel{d}{\rightarrow} N(0,1).$
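To make the Bernoulli representation concrete, here is a sketch that builds the law of the sum exactly for small n, and computes the mean and variance numerically for large n. It assumes the success probabilities are 1/j for j=1,...,n, which is what the zeros $0,-1,\ldots,-(n-1)$ of $A_n$ give; the function names are mine.

```python
from fractions import Fraction
from math import log


def cycle_count_law(n):
    """Exact law of X_1 + ... + X_n with independent X_j ~ Bernoulli(1/j)."""
    law = [Fraction(1)]
    for j in range(1, n + 1):
        p = Fraction(1, j)
        new = [Fraction(0)] * (len(law) + 1)
        for k, q in enumerate(law):
            new[k] += q * (1 - p)
            new[k + 1] += q * p
        law = new
    return law


def mean_and_variance(n):
    """Mean and variance of the sum of Bernoulli(1/j), j = 1..n, in floating point."""
    mean = sum(1 / j for j in range(1, n + 1))
    var = sum((1 / j) * (1 - 1 / j) for j in range(1, n + 1))
    return mean, var


if __name__ == "__main__":
    print(cycle_count_law(4))  # should be c_{4,k} / 4! for k = 0, ..., 4
    for n in (10 ** 2, 10 ** 4, 10 ** 6):
        mean, var = mean_and_variance(n)
        print(n, mean / log(n), var / log(n))
```

The ratios to log n approach 1 only slowly, since the mean and variance each differ from log n by a bounded amount, but that is all the CLT needs.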

In a future post, I want to give a quick outline of section 1.3. which details how the Bell polynomials can be surprisingly useful to find the moments of infinitely divisible distributions.