Enumerating Forests

I’ve just got back from a visit to Budapest University of Technology, where it was very pleasant to be invited to give a talk, as well as continuing the discussion of our research programme with Balazs. My talk concerned a limit for the exploration process of an Erdos-Renyi random graph conditioned to have no cycles. Watch this space (hopefully very soon) for a fully rigorous account of this. In any case, my timings were not as slick as I would have liked, and I had to miss out a chunk I’d planned to say about a result of Britikov concerning enumerating unrooted forests. It therefore feels like an excellent time to write something again, and explain this paper, which you might be able to find here, if you have appropriate journal rights.

We are interested in calculating a_{n,m}, the number of forests with vertex set [n] consisting of m unrooted trees. Recall that if we were interested in rooted trees, we could appeal to Prufer codes to show that there are m n^{n-m-1} such forests with a given set of m roots, and indeed results of Pitman give a coalescent/fragmentation scheme as m varies between 1 and n-1. It seems that there is no neat combinatorial re-interpretation of the unrooted case though, so Britikov uses an analytic method.

We know that

a_{n,m}= \frac{n!}{m!} \sum_{\substack{k_1+\ldots+k_m=n\\ k_i\ge 1}} \prod_{j=1}^m \frac{k_j^{k_j-2}}{k_j!}.

To see this, observe that the k_js correspond to the sizes of the m trees in the forest; \frac{n!}{\prod k_j!} gives the multinomial number of ways to assign vertices to the trees; given the labels for a tree of size k_j, there are k_j^{k_j-2} ways to make up the tree itself; and \frac{1}{m!} accounts for the fact that the trees have no order.
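As a sanity check on this formula, here is a minimal sketch that evaluates the sum exactly for small n. The function name forest_count is my own choice for illustration, and the convolution loop is just one convenient way to organise the sum over (k_1,\ldots,k_m); it is not taken from Britikov's paper.

```python
from fractions import Fraction
from math import comb, factorial


def forest_count(n: int, m: int) -> int:
    """a_{n,m}: forests on [n] with exactly m unrooted trees, via the size sum."""
    # weight[k] = k^(k-2) / k!  (Cayley's count of trees on k labels, over k!)
    weight = [Fraction(0)] + [
        Fraction(1 if k == 1 else k ** (k - 2), factorial(k))
        for k in range(1, n + 1)
    ]
    # conv[s] = sum over ordered (k_1, ..., k_j) with total s of prod weight[k_i]
    conv = [Fraction(1)] + [Fraction(0)] * n
    for _ in range(m):
        new = [Fraction(0)] * (n + 1)
        for s in range(n + 1):
            if conv[s]:
                for k in range(1, n - s + 1):
                    new[s + k] += conv[s] * weight[k]
        conv = new
    total = Fraction(factorial(n), factorial(m)) * conv[n]
    assert total.denominator == 1  # the count must be an integer
    return int(total)


print([forest_count(4, m) for m in range(1, 5)])
```

For m=1 this should recover Cayley's formula n^{n-2}, and for m=n-1 the forest is a single edge plus isolated vertices, so the count should be \binom{n}{2}.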

What we would really like to do is to take the uniform distribution on the set of all labelled trees, then simulate m IID copies of this distribution, and condition the union to contain precisely n vertices. But obviously this is an infinite set, so we cannot choose uniformly from it. Instead, we can tilt so that large trees are unlikely. In particular, for each x we define

\mathbb{P}(\xi=k) \propto \frac{k^{k-2} x^k}{k!},

and define the normalising constant

B(x):= \sum_{k\ge 1} \frac{k^{k-2}x^k}{k!},

whenever it exists. It turns out that x\le e^{-1} is precisely the condition for B(x)<\infty. Note now that if \xi_1,\xi_2,\ldots are IID copies of \xi, then

\mathbb{P}(\xi_1+\ldots+\xi_m=n) = \frac{x^n}{B(x)^m} \sum_{k_1+\ldots + k_m=n} \prod_{j=1}^m \frac{k_j^{k_j-2}}{k_j!},

and so we obtain

a_{n,m}= \frac{n!}{m!} \frac{B(x)^m}{x^n} \mathbb{P}(\xi_1+\ldots + \xi_m=n).
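The point of this identity is that the right-hand side does not depend on x. A rough numerical sketch of that, assuming we truncate the series for B(x) after a few thousand terms (harmless for x well below 1/e) and work in floating point; the function names and the truncation level are illustrative choices of mine.

```python
from math import exp, factorial, lgamma, log


def tilted_pmf(x: float, kmax: int, terms: int = 5000):
    """Return (B(x), [P(xi = k) for k = 0..kmax]) for the tilted tree-size law."""
    def w(k: int) -> float:
        # k^(k-2) x^k / k!, computed through logarithms to avoid overflow
        return exp((k - 2) * log(k) + k * log(x) - lgamma(k + 1))

    B = sum(w(k) for k in range(1, terms + 1))  # truncated normalising constant
    return B, [0.0] + [w(k) / B for k in range(1, kmax + 1)]


def forests_via_tilting(n: int, m: int, x: float) -> float:
    """Evaluate n!/m! * B(x)^m / x^n * P(xi_1 + ... + xi_m = n)."""
    B, pmf = tilted_pmf(x, n)
    dist = [1.0] + [0.0] * n  # law of the empty sum
    for _ in range(m):
        new = [0.0] * (n + 1)
        for s, ps in enumerate(dist):
            if ps:
                for k in range(1, n - s + 1):
                    new[s + k] += ps * pmf[k]
        dist = new
    return factorial(n) / factorial(m) * B ** m / x ** n * dist[n]


for x in (0.2, 0.3):
    print(x, forests_via_tilting(6, 2, x))
```

Both values of x should give (up to rounding) the same answer, namely a_{6,2}=1080.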

So asymptotics for a_{n,m} might follow from laws of large numbers for this distribution \xi.

So far, we haven’t said anything about how to choose this value x. But observe that if you want to have lots of trees in the forest, then the individual trees should generally be small, so we take x small to tilt away from a preference for large trees. It turns out that there is a similar interpretation of criticality for forests as for general graphs, and taking x equal to 1/e, the radius of convergence of B, works well for this setting. If you want even fewer trees, there is no option to take x larger than 1/e, but instead one can use large deviations machinery rather than law of large numbers asymptotics.

We will be interested in asymptotics of the characteristic function of \xi for x=1/e. In particular \mathbb{E}[e^{it\xi}]=\frac{B(xe^{it})}{B(x)}, and it will be enough to clarify the behaviour of this as t\rightarrow 0. It’s easier to work with a related analytic function

\theta(x)=\sum_{k\ge 1} \frac{k^{k-1}x^k}{k!},

that is, \theta(x)=xB'(x). What now feels like a long time ago I wrote a master’s thesis on the subject of multiplicative coalescence, and this shows up as the generating function of the solutions to Smoluchowski’s equations with monodisperse initial conditions, which are themselves closely related to the Borel distributions. In any case, several of the early papers on this topic made progress by establishing that the radius of convergence is 1/e, and that \theta(x)e^{-\theta(x)}=x wherever |x|\le 1/e. We want to consider x=1/e, for which \theta=1.

Note that \mathbb{E}\xi = \frac{\theta(x)}{B(x)}, so we will make progress by relating B(x),\theta(x) in two ways. One way involves playing around with contour integrals in a fashion that is clear in print, but involves quite a lot of notation. The second way is the Renyi relation which asserts that \theta(x)=B(x)+\frac{\theta(x)^2}{2}. We will briefly give a combinatorial proof. Observe that after multiplying through by factorials and interpreting the square of a generating function, this is equivalent to

k^{k-1} = k^{k-2} + \frac12 \sum_{\substack{l+m=k\\l,m\ge 1}} l^{l-1}m^{m-1}\binom{k}{l},

for all k. As we might expect from the appearance of this equality, we can prove it using a bijection on trees. Obviously on the LHS we have the size of the set of rooted trees on [k]. Now consider the set of unordered pairs of disjoint rooted trees whose vertex sets together make up [k]. The second term on the RHS is clearly the size of this set, with the factor \frac12 accounting for the lack of order. Given an element of this set, join up the two roots, and choose whichever root was not initially in the same tree as 1 to be the new root. We claim this gives a bijection between this set, and the set of rooted trees on [k] for which 1 is not the root. Given the latter, the only pair of trees that leads to the right rooted tree on [k] under this mapping is given by cutting off the unique edge incident to the root that separates the root and vertex 1. In particular, since there is a canonical bijection between rooted trees for which 1 is the root, and unrooted trees (!), we can conclude the Renyi relation.
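The displayed identity is easy to confirm by machine for small k. A minimal sketch in exact integer arithmetic (the helper names are mine):

```python
from math import comb


def rooted(k: int) -> int:
    """Number of rooted labelled trees on [k], namely k^(k-1)."""
    return k ** (k - 1)


def unrooted(k: int) -> int:
    """Number of unrooted labelled trees on [k], namely k^(k-2)."""
    return 1 if k == 1 else k ** (k - 2)


def renyi_rhs(k: int) -> int:
    """k^(k-2) + (1/2) * sum_{l+m=k} C(k,l) l^(l-1) m^(m-1)."""
    ordered_pairs = sum(
        comb(k, l) * rooted(l) * rooted(k - l) for l in range(1, k)
    )
    assert ordered_pairs % 2 == 0  # every unordered pair is counted twice
    return unrooted(k) + ordered_pairs // 2


print([(rooted(k), renyi_rhs(k)) for k in range(1, 7)])
```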

The Renyi relation now gives \mathbb{E}\xi = \frac{\theta(x)}{B(x)}=2 when x=1/e. If we wanted, we could show that the variance is infinite, which is not completely surprising, as the parameter x lies on the radius of convergence of the generating function.
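Both relations between B and \theta can be checked numerically. The sketch below does this at x=0.3, an arbitrary value I have chosen strictly inside the radius of convergence so that the truncated series converge quickly; at x=1/e itself the series for \theta converges too slowly for a naive sum. The relations then give \mathbb{E}\xi=\frac{1}{1-\theta/2}, which equals 2 when \theta=1.

```python
from math import exp, lgamma, log


def series(x: float, shift: int, terms: int = 4000) -> float:
    """sum_{k>=1} k^(k-shift) x^k / k!  (shift=2 gives B, shift=1 gives theta)."""
    return sum(
        exp((k - shift) * log(k) + k * log(x) - lgamma(k + 1))
        for k in range(1, terms + 1)
    )


x = 0.3                 # illustrative value below 1/e
B = series(x, 2)
theta = series(x, 1)
mean = theta / B        # E[xi] = sum_k k P(xi = k) = theta(x) / B(x)

print(theta * exp(-theta), x)           # functional equation theta e^{-theta} = x
print(theta, B + theta ** 2 / 2)        # Renyi relation
print(mean, 1 / (1 - theta / 2))        # mean in closed form
```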

Now, playing around with contour integrals, and being careful about which strands to take leads to the asymptotic as t\rightarrow 0

\mathbb{E}[ e^{it\xi}] = 1+2it + \frac{2}{3}i |2t|^{3/2} (i\mathrm{sign}(t))^{3/2} + o(|t|^{3/2}).

So from this, we can show that the characteristic function of the rescaled centred partial sum \frac{\xi_1+\ldots+\xi_N-2N}{bN^{2/3}} converges to \exp(-|t|^{3/2}\exp(\frac{i\pi}{4}\mathrm{sign} t)), where b= (32/9)^{1/3} is a constant arising out of the previous step.

We recognise this as the characteristic function of the stable distribution with parameters 3/2 and -1. In particular, we know now that \xi is in the domain of attraction for a stable-3/2 distribution. If we wanted a version of the central limit theorem for such partial sums, we could have that, but since we care about the partial sums of the \xi_is taking a specific value, rather than a range of values on the scale of the fluctuations, we actually need a local limit theorem.

To make this clear, let’s return to the simplest example of the CLT, with some random variables with mean \mu and variance \sigma^2<\infty. Then the partial sums satisfy

\mathbb{P}(\mu N + a\sigma\sqrt{N} \le S_N \le \mu N+b\sigma\sqrt{N}) \rightarrow \int_a^b f_{\mathcal N}(x)dx,

as N\rightarrow\infty. But what about the probability of S_N taking a particular value m that lies between \mu N+a\sigma \sqrt{N} and \mu N + b\sigma \sqrt{N}? If the underlying distribution were continuous, this would be uncontroversial: considering the probability of lying in a range that is smaller than the scale of the CLT can be shown in a similar way to the CLT itself. A local limit theorem asserts that when the underlying distribution is supported on some lattice, most naturally the integers, then these probabilities are in the limit roughly the same whenever m is close to \mu N+a\sigma\sqrt{N}.
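To see what a local limit statement looks like in the finite-variance case, here is a small sketch for sums of Bernoulli(p) variables, where S_N is exactly Binomial(N,p). It compares \sigma\sqrt{N}\,\mathbb{P}(S_N=m) with the standard normal density at z=(m-\mu N)/(\sigma\sqrt{N}), over all m with |z|\le 2. The choice of Bernoulli steps and the window |z|\le 2 are mine, purely for illustration.

```python
from math import exp, lgamma, log, pi, sqrt


def worst_local_error(N: int, p: float = 0.5, zmax: float = 2.0) -> float:
    """max over |z| <= zmax of |sigma sqrt(N) P(S_N = m) - phi(z)|."""
    sigma = sqrt(p * (1 - p))
    worst = 0.0
    for m in range(N + 1):
        z = (m - p * N) / (sigma * sqrt(N))
        if abs(z) > zmax:
            continue
        # log of the binomial probability, to avoid underflow for large N
        log_prob = (lgamma(N + 1) - lgamma(m + 1) - lgamma(N - m + 1)
                    + m * log(p) + (N - m) * log(1 - p))
        density = exp(-z * z / 2) / sqrt(2 * pi)
        worst = max(worst, abs(sigma * sqrt(N) * exp(log_prob) - density))
    return worst


for N in (100, 1000, 4000):
    print(N, worst_local_error(N))
```

The printed errors should be small and shrink as N grows, uniformly over the window, which is the same shape of statement as the stable local limit theorem used below.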

In this setting, a result of Ibragimov and Linnik that I have struggled to find anywhere in print (especially in English) gives us local limit theory for integer-supported distributions in the domain of attraction of a stable distribution. Taking p( ) to be the density of this distribution, we obtain

bm^{2/3}\mathbb{P}(\xi_1+\ldots+\xi_m=n) - p(\frac{n-2m}{b m^{2/3}}) \rightarrow 0

as n\rightarrow\infty, uniformly on any set of m for which z= \frac{n-2m}{bm^{2/3}} is bounded. Conveniently, the two occurrences of b clear, and Britikov obtains

a_{n,m} = (1+o(1)) \frac{\sqrt{2\pi} n^{n-1/6}}{2^{n-m}(n-m)!} p\left(\frac{n-2m}{n^{2/3}}\right),

uniformly in the same sense as before.

Random Maps 2 – The Schaeffer Bijection

As indicated at the end of the previous post, our aim is to find a natural bijection between the set of pointed, rooted quadrangulations with n faces, and some set of objects based on decorating rooted plane trees with n edges in some fashion. Unlike our previous example, the construction of this bijection is definitely not trivial. It seems like a foolish ambition to explain this without several pictures, so I’m going to focus on some aspects of the analysis which I found challenging, rather than the construction itself.

Anyway, we don’t yet know what the extended set of trees should be. We need an extra factor of 3^n, so it is natural to consider adding some sort of labelling of the tree, where for each non-root vertex in turn there are three options. So, given a rooted tree T, we label the vertices such that the root has label 0, and if a parent vertex has label k, any offspring has label k-1, k or k+1. Such a labelling is called admissible, and \mathbb{T}_n is the set of rooted plane trees with n edges and an admissible labelling.
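The factor of 3^n is easy to see concretely: an admissible labelling is the same thing as a choice of increment in {-1,0,1} along each edge. A minimal sketch, assuming the tree is stored as a parent array with vertex 0 as the root and parent[v] < v (a representation I have picked for convenience):

```python
from itertools import product


def admissible_labellings(parent):
    """Yield every admissible labelling of the tree encoded by `parent`.

    parent[0] is ignored (vertex 0 is the root); parent[v] < v for v >= 1.
    """
    n = len(parent) - 1  # number of edges
    for steps in product((-1, 0, 1), repeat=n):
        labels = [0] * (n + 1)  # the root gets label 0
        for v in range(1, n + 1):
            labels[v] = labels[parent[v]] + steps[v - 1]
        yield labels


# Hypothetical example: root 0 with children 1 and 2, and 3 a child of 1.
parent = [-1, 0, 0, 1]
all_labellings = list(admissible_labellings(parent))
print(len(all_labellings))
```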

We now demonstrate how to construct an element of \mathcal{Q}_n from an element of \mathbb{T}_n. Various authors had considered this problem to various extents, and so what follows is known as the Cori-Vauquelin-Schaeffer bijection, at least in this course.

Consider a contour exploration of the tree. That is, start out at the root and at all times take the first edge you encounter going clockwise from your current direction. When you arrive at a leaf, you will therefore immediately retrace your most recent step. The key property is that you traverse each edge exactly twice, and so we may think of the tree as having 2n oriented edges. It is more useful to think about corners. A corner is the directed arc (WLOG clockwise) between adjacent edges at a vertex. There is a natural bijection between corners and directed edges, by looking anti-clockwise from the tail of the edge. So the contour process explores the directed edges in some order, and hence explores the corners of the tree. One thing I found confusing initially was switching between considering vertices and corners. I feel in retrospect that the only reason we need the vertices themselves is to induce the labelling onto the corners. The labelled corners are the only thing we will use in the construction.

As we trace out the contour process, naturally we see different labels. We define the successor of a corner with label k to be the next corner seen in the contour process (taken modulo 2n if necessary) with label k-1. Note that any corner on a vertex with minimal label will not have a successor. To counter this, we add a new vertex, suggestively called v_*, with a single corner (ie no edges yet) and declare this corner to be the successor of the corners in the original tree with minimal label.

To construct our quadrangulation, we simply join up every corner with its successor corner. Note that if you are thinking of the successor of a corner as a vertex (rather than as a corner) you will get in trouble here, as there might be several ways to draw this arc.
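The combinatorial part of this construction (which corner is joined to which) can be sketched in a few lines; the planar drawing of the arcs is not captured. I assume the tree is given as ordered child lists with the root at index 0, and that the order of each child list matches the order in which the contour exploration meets the children. The names contour_corners, cvs_arcs and the string "v*" standing for v_* are all mine.

```python
V_STAR = "v*"  # stands for the extra vertex v_*


def contour_corners(children, root=0):
    """Vertices at the 2n corners, in the order the contour exploration sees them."""
    seq = []

    def walk(v):
        seq.append(v)
        for c in children[v]:
            walk(c)
            seq.append(v)  # back at v, in the corner after the edge to c

    walk(root)
    return seq[:-1]  # the final return to the root is the first corner again


def cvs_arcs(children, labels, root=0):
    """One arc (as a pair of vertices) from each corner to its successor."""
    corners = contour_corners(children, root)
    m = len(corners)
    lowest = min(labels)
    arcs = []
    for i, v in enumerate(corners):
        if labels[v] == lowest:
            arcs.append((v, V_STAR))  # minimal corners are joined to v_*
            continue
        j = (i + 1) % m
        while labels[corners[j]] != labels[v] - 1:  # next corner with label k-1
            j = (j + 1) % m
        arcs.append((v, corners[j]))
    return arcs


# Hypothetical example with n = 3 edges and an admissible labelling.
children = [[1, 2], [3], [], []]
labels = [0, -1, 1, -1]
print(cvs_arcs(children, labels))
```

There are 2n corners and hence 2n arcs on n+2 vertices, which by Euler's formula is the right count for a quadrangulation with n faces.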


The red arcs and vertex v* are added to form the quadrangulation. Note the blue angles indicate the three corners around the vertex labelled -1.

It is not obvious that it is possible to do this so that the arcs do not overlap. However, by considering the label process as you explore via the contour process, it becomes clear that you can discount the possibility of any overlaps one by one. This applies equally to pairs of new arcs overlapping, as well as new arcs overlapping with edges of the original tree. In any case, we remove the edges of the original tree to obtain the quadrangulation.

Note that when you move from any corner of a vertex with label k to its successor, then to the successor of its successor and so on, the labels are decreasing, so eventually you must end up at a corner with minimal label, and hence at v_*. We conclude that the graph of arcs is connected. It remains to show that it is a quadrangulation.

This is rather fiddly to do without a diagram. Note first that whenever we have a directed edge in the tree going from label k to label k-1, then this edge essentially becomes an arc of the quadrangulation. We show that the edge oriented in the other direction, called say e, induces three further arcs of a quadrangle. So e goes from label k-1 to k. Consider the corners immediately before and after e in the contour exploration; the latter is a corner around the vertex with label k. The successor of the corner after e is a corner with label k-1, and this has a successor with label k-2. By construction, this must also be the successor of the corner before e. Why? Well as we traverse the contour beyond e, the first appearance of label k-1 must happen before the first appearance of label k-2, as the increments can only be in {-1,0,1}. This gives us the three further arcs. Note also that the 2-colouring of the quadrangulation is given by the parity of the tree-labelling.

I was bothered about what happens if two vertices with label k-1 are in fact the same. This would happen if, for example, the vertex labelled k is a leaf. Then, at least two of the corners around the single vertex with label k-1 have the same corner as successor. A naïve attempt at drawing the resulting arcs did not give a quadrangle. The key observation is that you have to draw the arcs in the direction of the contour process. So in this case, the arc from the corner before edge e will loop all the way around the vertex with label k, so it contains the other two relevant arcs on its way to the vertex with label k-2, giving us the ‘pacman’ quadrangle discussed earlier.

The other case we have to check is when our base edge joins two vertices with label k. Then the other two vertices of the face will have label k-1. This is similar to the above, and slightly easier.

As a preliminary to checking that we can invert this construction, we observe that the vertices of the quadrangulation are the vertices of the original tree plus v_*, and furthermore, the labels in the tree are given by the graph distance from v_* in the quadrangulation, with a constant added uniformly so that the root vertex has label 0.
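This observation can be checked on a small example by breadth-first search. In the sketch below the arcs and labels are written out by hand for a hypothetical labelled tree with three edges (root 0 with children 1 and 2, vertex 3 a child of 1, labels 0, -1, 1, -1), with "v*" standing for v_*.

```python
from collections import deque


def distances_from(source, arcs):
    """Graph distances from `source` in the (multi)graph given by `arcs`."""
    adj = {}
    for a, b in arcs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Arcs of the quadrangulation for the example tree (one per corner).
arcs = [(0, 1), (1, "v*"), (3, "v*"), (1, "v*"), (0, 1), (2, 0)]
labels = {0: 0, 1: -1, 2: 1, 3: -1}

dist = distances_from("v*", arcs)
# Shift the distances so that the root (vertex 0) gets label 0.
recovered = {v: dist[v] - dist[0] for v in labels}
print(recovered)
```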

At this point, we observe that in the construction, we didn’t specify how to choose the rooted edge of the quadrangulation. Canonically, we take it to be the arc between the first corner of the root in the contour process, and its successor. However, we can orient it in either direction, giving us the extra factor of 2 we were looking for.

Returning to the inverse, it is clear what to do when we see a quadrangle corresponding to the second case above – namely put an edge between the two vertices with label k. In the case where the face has labels {k,k-1,k-1,k-2} it is less obvious. Note though that by starting at the first corner of the root, which is identified by the rooted edge in the quadrangulation, we can recover the contour process from the arcs of the quadrangulation, and the labels. So when we see such a face, we can use this information to choose which of the (k-1)-labelled vertices to join to the vertex with label k.

Anyway, now we are convinced that this bijection works, the next stage is to apply it to gain extra information about a uniformly-chosen large quadrangulation. We can view the vertices as being those of a large uniform plane tree, and the labels as given by a random walk along this large tree. We might expect to see this labelling structure converge to something that looks like Brownian motion indexed by a Brownian continuum random tree, in a sense to be made more precise. And the labelling is not merely a decoration in the quadrangulation, since it specifies the distance to the identified point v_*. In particular, this gives a bound on the distance between any two vertices in the quadrangulation, eg two vertices chosen uniformly at random. In fact, by looking more carefully at the scaling limit of the uniform tree’s contour process, we can say rather more than that.

Rhombus Tilings and a Nice Bijection

I want to write a short post giving an example of what seems to me to be a rather nice proof without words. Like all the best proofs without words, it requires some words to set everything up, and then even the proof itself is enhanced with a few words.

The goal is a bijection between two combinatorial objects. The first is the family of rhombus tilings. Perhaps the easiest way to define these is to give an example.

[Figure: an example of a rhombus tiling of a hexagon]

As you can see, we have tiled a hexagon with rhombi. The tiles are allowed to be in any of the three possible orientations. It matters that the angles of the hexagon are 120, as we want it to be possible to squeeze rhombi into a corner in two different ways (ie either a single tile or two tiles together), and thus the rhombi should also have angles 120 and 60. The hexagon does not have to be equilateral, as in this example, but obviously all the side lengths should be an integer multiple of the side length of the rhombus, which without loss of generality we may take to be 1.

The other combinatorial object is the class of plane partitions. We again give an example:

4 3 3 2 1
4 2 2 2
3 2 2 1
2 1 1

Notice that all the rows and columns are weakly decreasing. One observation worth making is that the diagonals give a family of so-called interlaced partitions. In any case, we want to establish this bijection. First I show the idea, that is the proof without words bit. Then I’ll clarify exactly how to make the bijection work.

The first step is to colour a rhombus tiling with a different colour for each orientation, as shown.

[Figure: the same rhombus tiling, with a different colour for each orientation]

The next step is the proof without words bit. We now look at the diagram as if we were looking into a stack of cubes arranged in the positive orthant of R^3. The colouring makes this much more visually arresting. Black rhombi correspond to the (visible) top sides of cubes, while blue and red faces point out in the x and y directions respectively. The key observation is that a rhombus tiling means we can see at least one face of every cube. Otherwise we would need some smaller rhombi to account for the way that some cubes will be partially hidden behind taller but closer piles. So if we make a note of the heights of all the piles, we should get a plane partition.

After reordering our definition of plane partition, so it is weakly increasing left-right and down-up, corresponding to the x and y axes drawn on the above diagram, the given rhombus tiling should give the following plane partition:

1 2 3
0 1 3
0 0 2

The only thing we need to sort out is precisely how the dimensions of the hexagon restrict the choice of plane partition. Note that we could keep the heights exactly the same but get a different tiling by adding an extra row of red oriented rhombi above the top-left part, and an extra row of blue oriented rhombi above the top-right part. The point is that this would give us a bigger hexagon.

The first observation is that the dimensions of the plane partition correspond to two of the side lengths of the hexagon, indeed the bottom two sides. The third length of the hexagon corresponds to the maximum possible height (ie z component) of the region we are looking at. This is therefore an upper bound on the heights of the stacks.

So we can conclude our bijective argument. There is a bijection between rhombus tilings of the hexagon with side lengths X, Y, Z and plane partitions with dimension X \times Y (where entries are allowed to be zero), where the largest element (which is by definition also the top-left element, or top-right in our re-definition) is at most Z.
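One practical use of the bijection is that it lets us count tilings by counting arrays of heights. The sketch below enumerates plane partitions in an X \times Y box with entries at most Z by brute force, and compares with MacMahon's product formula \prod_{i,j,k}\frac{i+j+k-1}{i+j+k-2}. That formula is a classical result which I am importing here as a cross-check; it is not derived in this post.

```python
from fractions import Fraction
from itertools import product
from math import comb


def count_boxed_plane_partitions(X: int, Y: int, Z: int) -> int:
    """Brute-force count of X-by-Y arrays with entries in 0..Z,
    weakly decreasing along every row and down every column."""
    count = 0
    for flat in product(range(Z + 1), repeat=X * Y):
        h = [flat[r * Y:(r + 1) * Y] for r in range(X)]
        rows_ok = all(h[r][c] >= h[r][c + 1]
                      for r in range(X) for c in range(Y - 1))
        cols_ok = all(h[r][c] >= h[r + 1][c]
                      for r in range(X - 1) for c in range(Y))
        count += rows_ok and cols_ok
    return count


def macmahon(X: int, Y: int, Z: int) -> int:
    """MacMahon's box formula (classical result, used only as a cross-check)."""
    total = Fraction(1)
    for i, j, k in product(range(1, X + 1), range(1, Y + 1), range(1, Z + 1)):
        total *= Fraction(i + j + k - 1, i + j + k - 2)
    return int(total)


print(count_boxed_plane_partitions(2, 2, 2), macmahon(2, 2, 2))
```

When X=1 the array is a single weakly decreasing row, so the count should reduce to the binomial coefficient \binom{Y+Z}{Y}.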

It seems there are plenty of interesting questions to be asked about both deterministic and random tilings and plane partitions, based on talks in Marseille. For now though, I feel ill-qualified even to read about such things, so will leave it at that for today.

The Contour Process

As I explained in my previous post, I haven’t been reading around as much as I would generally like to recently. A few days in London staying with my parents and catching up with some friends has therefore been a good chance to get back into the habit of leafing through papers and Pitman’s book among other things.

This morning’s post should be a relatively short one. I’m going to define the contour process, a function of a (random or deterministic) tree, related to the exploration process which I have mentioned a few times previously. I will then use this to prove a simple but cute result equating in distribution the sizes of two different branching processes via a direct bijection.

The Contour Process

To start with, we have to have a root, and from that root we label the tree with a depth-first labelling. An example of this is given below. It is helpful at this stage to conceive this process as an explorer walking on the tree, and turning back on themselves only when there is no option to visit a vertex they haven’t already seen. So in the example tree shown, the depth-first exploration visits vertex V_2 exactly four times. Note that with this description, it is clear that the exploration traverses every edge exactly twice, and so the length of the sequence is 2n-1, where n is the number of vertices in the tree since obviously, we start and end at the root.

Another common interpretation of this depth-first exploration is to take some planar realisation of the tree. (Note trees are always planar – proof via induction after removing a leaf.) Then if you treat the tree as a hedge and starting at the root walk along, following the outer boundary with your right hand, this exactly recreates the process.

The height of a tree at a particular vertex is simply the graph distance between that vertex and the root. So when we move from one vertex to an adjacent vertex, the height must increase or decrease by 1.

The contour process is the sequence of heights seen along the depth-first exploration. It is therefore a sequence:

0=h_0,h_1,\ldots,h_{2n-2}=0,\quad h_i\geq 0,

and such that |h_{i+1}-h_i|=1.
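For concreteness, here is a short sketch that computes the contour process from a tree stored as ordered child lists, with the root at index 0. The representation and the example tree are my own choices; the order within each child list plays the role of the depth-first labelling.

```python
def contour_process(children, root=0):
    """Heights seen along the depth-first walk that starts and ends at the root."""
    heights = []

    def walk(v, h):
        heights.append(h)
        for c in children[v]:
            walk(c, h + 1)
            heights.append(h)  # step back down to v after exploring c's subtree

    walk(root, 0)
    return heights


# Hypothetical example: 4 vertices, root 0 with children 1 and 2, and 3 below 1.
children = [[1, 2], [3], [], []]
heights = contour_process(children)
print(heights)
```

With n=4 vertices the walk makes 2(n-1)=6 steps, so the sequence has 2n-1=7 entries.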

Note that though the contour process uniquely determines the tree structure, the choice of depth-first labelling is a priori non-canonical. For example, in the display above, V_3 might have been explored before V_2. Normally this is resolved by taking the suitable vertex with the smallest label in the original tree to be next. It makes little difference to any analysis to choose the ordering of descendants of some vertex in a depth-first labelling randomly. Note that this explains why it is rather hard to recover Cayley’s theorem about the number of rooted trees on n vertices from this characterisation. Although the number of suitable contour functions is possible to calculate, we would require a complicated multiplicative correction for labelling if we wanted to recover the number of trees.

The only real observation about the uses of the contour process at this stage is that it is not in general a random walk with IID increments for a Galton-Watson branching process. The corresponding equivalence is what made the exploration process so useful. In particular, it made it straightforward, at least heuristically, to see why large trees might have a limit interpretation through Brownian excursions. If for example, the offspring distribution is bounded above, say by M, then the contour process certainly cannot be a random walk, as if we have visited a particular vertex exactly M+1 times, then it cannot have another descendant, and so we must return closer to the root at the next step.

I want to mention that in fact Aldous showed his results on scaling limits towards the Continuum Random Tree through the contour process rather than the exploration process. However, I don’t want to say any more about that right now.

A Neat Equivalence

What I do want to talk about is the following distribution on the positive integers. This comes up in Balazs Rath and Balint Toth’s work on forest-fires on the complete graph that I have been reading about recently. The role of this distribution is a conjectured equilibrium distribution for component size in a version of the Erdos-Renyi process where components are deleted (or ‘struck by lightning’) at a rate tuned so that giant components ‘just’ never emerge.

This distribution has the possibly useful property that it is the distribution of the total population size in a Galton-Watson process with Geom(1/2) offspring distribution. It is also the distribution of the total number of leaves in a critical binary branching process, where every vertex has either two descendants or zero descendants, each with probability 1/2. Note that both of these tree processes are critical, as the expected number of offspring is 1 in each case. This is a good start, as it suggests that the relevant equilibrium distribution should also have the power-law tail that is found in these critical branching processes. This would confirm that the forest-fire model exhibits self-organised criticality.

Anyway, as a sanity check, I tried to find a reason why, ignoring the forest-fires for now, these two distributions should be the same. One can argue using generating functions, but there is also the following nice bijective argument.

We focus first on the critical Geometric branching process. We examine its contour function. As explained above, the contour process is not in general a random walk with IID increments. However, for this particular case, it is. The geometric distribution should be viewed as the family of discrete memoryless distributions.

This is useful for the contour process. Note that if we are at vertex V for the (m+1)th time, that is we have already explored m of the edges out of V, then the probability that there is at least one further edge is 1/2, independently of the history of the exploration, as the offspring distribution is Geometric(1/2), which we can easily think of as adding edges one at a time based on independent fair coin tosses until we see a tail for example. The contour process for this random tree is therefore a simple symmetric random walk on Z. Note that this will hit -1 at some point, and the associated contour process is the RW up to the final time it hits 0 before hitting -1. We can check that this obeys the clear rule that with probability 1/2 the tree is a single vertex.

Now we consider the other model, the Galton-Watson process with critical binary branching mechanism. We should consider the exploration process. Recall that the increments in this process are given by the offspring distribution minus one. So this random sequence also behaves as a simple symmetric random walk on Z, again stopped when we hit -1.

To complete the bijective argument, we have to relate leaves in the binary process to vertices in the geometric one. A vertex is a leaf if it has no offspring, so the number of leaves is the number of times before the hitting time of -1 that the exploration process decreases by 1. (*)

Similarly for the contour process. Note that there is a bijection between the set of vertices that aren’t the root and the set of edges. The contour process explores every edge exactly twice, once giving an increase of 1 and once giving a decrease of 1. So there is a bijection between the times that the contour process decreases by 1 and the non-root vertices. But the contour process was defined only up to the time we return to the root. This is fine if we know in advance how large the tree is, but we don’t know which return to the root is the final return to the root. So if we extend the random walk to the first time it hits -1, the portion up until the last increment is the contour process, and the final increment must be a decrease by 1, hence there is a bijection between the number of vertices in the Geom(1/2) G-W tree and the number of times that the contour process decreases by 1 before the hitting time of -1. Comparing with (*) gives the result.
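As an independent check of the conclusion, here is a sketch of the generating function route rather than the bijection. It computes the first few probabilities of each distribution in exact arithmetic by iterating the usual fixed-point equations on truncated power series: T(x)=x\sum_j 2^{-(j+1)}T(x)^j for the total size of the Geom(1/2) tree, and L(x)=\frac{x}{2}+\frac{L(x)^2}{2} for the number of leaves of the binary tree. The truncation level K and all function names are assumptions of mine.

```python
from fractions import Fraction
from math import comb

K = 8  # all power series are truncated after x^K


def mul(a, b):
    """Product of two truncated power series."""
    out = [Fraction(0)] * (K + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b[: K + 1 - i]):
                out[i + j] += ai * bj
    return out


def geometric_total_size():
    """Coefficients P(total population = k) for Geom(1/2) offspring."""
    T = [Fraction(0)] * (K + 1)
    for _ in range(K):  # each pass fixes one more coefficient
        power = [Fraction(1)] + [Fraction(0)] * K  # T^0
        G = [Fraction(0)] * (K + 1)
        for j in range(K + 1):  # offspring number j has probability 2^-(j+1)
            G = [g + p / 2 ** (j + 1) for g, p in zip(G, power)]
            power = mul(power, T)
        T = [Fraction(0)] + G[:K]  # multiply by x for the root itself
    return T


def binary_leaf_count():
    """Coefficients P(number of leaves = k) for critical binary branching."""
    half_x = [Fraction(0), Fraction(1, 2)] + [Fraction(0)] * (K - 1)
    L = [Fraction(0)] * (K + 1)
    for _ in range(K):
        L = [a + s / 2 for a, s in zip(half_x, mul(L, L))]
    return L


T, L = geometric_total_size(), binary_leaf_count()
print(T[1:5], L[1:5])
```

Both lists should agree, and should match \frac{1}{k}\binom{2k-2}{k-1}2^{-(2k-1)}, which comes from counting plane trees with k vertices.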

Generating Functions for the IMO

The background to this post is that these days I find myself using generating functions all the time, especially for describing the stationary states of various coalescence-like processes. I remember meeting them vaguely while preparing for the IMO as a student. However, a full working understanding must have eluded me at the time, as for Q5 on IMO 2008 in Madrid I had written down in big boxes the two statements involving generating functions that immediately implied the answer, but failed to finish it off. The aim of this post is to help this year’s team avoid that particular pitfall.

What are they?

I’m going to define some things in a way which will be most relevant to the type of problems you are meeting now. Start with a sequence (a_0,a_1,a_2,\ldots). Typically these will be the sizes of various combinatorial sets. Eg a_n = number of partitions of [n] with some property. Define the generating function of the sequence to be:

f(x)=\sum_{k\geq 0}a_k x^k=a_0+a_1x+a_2x^2+\ldots.

If the sequence is finite, then this generating function is a polynomial. In general it is a power series. As you may know, some power series can be rather complicated, in terms of where they are defined. Eg

1+x+x^2+\ldots=\frac{1}{1-x}

only when |x|<1. For other values of x, the LHS diverges. Defining f over C is fine too. This sort of thing is generally NOT important for applications of generating functions to combinatorics. To borrow a phrase from Wilf, a generating function is a convenient `clothesline’ on which to hang a sequence of numbers.

We need a notation to get back from the generating function to the coefficients. Write [x^k]g(x) to denote the coefficient of x^k in the power series g(x). So, if g(x)=3x^3-5x^2+7, then [x^2]g(x)=-5. It hopefully should never be relevant unless you read some other notes on the topic, but you may also see the notation [\alpha x^2]g(x):=\frac{[x^2]g(x)}{\alpha}, which does make sense after a while.

How might they be useful?

Example: binomial coefficients a_k=\binom{n}{k} appear, as the name suggests, as coefficients of

f_n(x)=(1+x)^n=\sum_{k=0}^n \binom{n}{k}x^k.

Immediate consequence: it’s trivial to work out \sum_{k=0}^n \binom{n}{k} and \sum_{k=0}^n(-1)^k \binom{n}{k} by substituting x=\pm 1 into f_n.

Less obvious consequence. By considering choosing n from a red balls and b blue balls, one can verify

\binom{a+b}{n}=\sum_{k=0}^n \binom{a}{k}\binom{b}{n-k}.

We can rewrite the RHS as

\sum_{k+l=n}\binom{a}{k}\binom{b}{l}.

Think how we calculate the coefficient of x^n in the product f(x)g(x), and it is now clear that \binom{a+b}{n}=[x^n](1+x)^{a+b}, while

\sum_{k+l=n}\binom{a}{k}\binom{b}{l}=[x^n](1+x)^a(1+x)^b,

so the result again follows. This provides a good slogan for generating functions: they often replicate arguments via bijections, even if you can’t find the bijection.
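If you like to experiment, the argument can be replayed by storing a polynomial as its list of coefficients, so that multiplying polynomials is exactly the convolution of the coefficient lists. A minimal sketch (the helper names are mine):

```python
from math import comb


def poly_mul(f, g):
    """Coefficient list of f(x) g(x), where f[k] is the coefficient of x^k."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj  # x^i times x^j contributes to x^(i+j)
    return out


def binomial_poly(n):
    """Coefficient list of (1+x)^n."""
    return [comb(n, k) for k in range(n + 1)]


a, b = 3, 4
print(poly_mul(binomial_poly(a), binomial_poly(b)))
print(binomial_poly(a + b))
```

The two printed lists should be identical, which is the identity above for every n at once.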

Useful for? – Multinomial sums

The reason why the previous argument for binomial coefficients worked nicely is because we were interested in the coefficients, but had a neat expression for the generating function as a polynomial. In particular, we had an expression

\sum_{k+l=n}a_k b_l.

This is always a clue that generating functions might be useful. This is sometimes called a convolution.

Exercise: prove that in general, if f(x) is the generating function of (a_k) and g(x) the generating function of (b_l), then f(x)g(x) is the generating function of \sum_{k+l=n}a_kb_l.

Even more usefully, this works in the multinomial case:

\sum_{k_1+\ldots+k_m=n}a^{(1)}_{k_1}\ldots a^{(m)}_{k_m}.

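In many applications, these a^{(i)}s will all be the same. We don’t even have to specify how many k_i’s there are to be considered, so long as each k_i is at least 1. After all, if we want the sum to be n, then there can be at most n of them. So:

\sum_{m\geq 1}\sum_{\substack{k_1+\ldots+k_m=n\\ k_i\geq 1}}a_{k_1}\ldots a_{k_m}=[x^n]\sum_{m=1}^n f(x)^m=[x^n]\frac{f(x)}{1-f(x)},

provided f(0)=0, that is a_0=0. (If zero parts were allowed, any solution could be padded with arbitrarily many zeros, and the sum over m would no longer be finite.)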

Useful when? – You recognise the generating function!

In some cases, you can identify the generating function as a `standard’ function, e.g. the geometric series. In that case, manipulating the generating function directly is likely to be promising. Here is a list of some useful power series you might spot.

1+x+x^2+\ldots=\frac{1}{1-x},\quad |x|<1

1+2x+3x^2+\ldots=\frac{1}{(1-x)^2},\quad |x|<1


\cos x=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\ldots

Exercise: if you know what differentiation means, show that if f(x) is the generating function of (a_k), then xf'(x) is the generating function of (ka_k).

Technicalities: some of these identities are valid only for certain values of x. This may be a problem if they are valid at, say, only a single point, but in general this shouldn’t be the case. In addition, you don’t need to worry about differentiability. You can define differentiation of power series term by term via x^n\mapsto nx^{n-1}, and sort out convergence later if necessary.

Useful for? – Recurrent definitions

The Fibonacci numbers are defined by:

F_0=F_1=1,\quad F_{n+1}=F_n+F_{n-1},\quad n\geq 1.

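Let F(x) be the generating function of the sequence F_n. So, for n\geq 1,

[x^{n+1}]F(x)=[x^n]F(x)+[x^{n-1}]F(x)=[x^{n+1}]\left((x+x^2)F(x)\right),

while the coefficients of x agree since F_1=F_0, and F(0)=1, so we can conclude that: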

F(x)=1+(x+x^2)F(x)\quad\Rightarrow\quad F(x)=\frac{1}{1-x-x^2}.
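If you want reassurance that this closed form really does encode the Fibonacci numbers, the following Python sketch expands 1/(1-x-x^2) as a power series and compares it with the recurrence. The function names are hypothetical, and series_reciprocal assumes the denominator has constant term 1.

```python
def series_reciprocal(d, terms):
    """First `terms` coefficients of 1/d(x), for a coefficient list d with d[0] == 1."""
    c = [1]
    for n in range(1, terms):
        # For n >= 1 the coefficient of x^n in d(x) * c(x) must vanish,
        # which determines c[n] from the earlier coefficients.
        top = min(n, len(d) - 1)
        c.append(-sum(d[j] * c[n - j] for j in range(1, top + 1)))
    return c

def fibonacci(terms):
    """F_0 = F_1 = 1 and F_{n+1} = F_n + F_{n-1}, with the indexing used in the text."""
    f = [1, 1]
    while len(f) < terms:
        f.append(f[-1] + f[-2])
    return f[:terms]

# Denominator 1 - x - x^2 as a coefficient list.
assert series_reciprocal([1, -1, -1], 10) == fibonacci(10)
```

The same helper recovers the geometric series from the list above: series_reciprocal([1, -1], 4) gives four coefficients equal to 1.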

Exercise: Find a closed form for the generating function of the Catalan numbers, defined recursively by:

C_0=1,\quad C_{n+1}=\sum_{k=0}^{n}C_kC_{n-k},\quad n\geq 0.

Can you now find the coefficients explicitly for this generating function?

Useful for? – Partitions

Partitions can be an absolute nightmare to work with because of the lack of explicit formulae. Often any attempt at a calculation turns into a massive inclusion-exclusion (IEP) bash. This prompts a search for bijective or bare-hands arguments, but generating functions can be useful too.

For now (*), let’s assume a partition of [n] means a sequence of positive integers a_1\geq a_2\geq\ldots\geq a_k such that a_1+\ldots+a_k=n. Let p(n) be the number of partitions of [n].

(* there are other definitions, in terms of a partition of the set [n] into k disjoint but unlabelled sets. Be careful about definitions, but the methods often extend to whatever framework is required. *)

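Exercise: Show that the generating function of p(n) is:

\sum_{n\geq 0}p(n)x^n=\prod_{k\geq 1}\frac{1}{1-x^k}=(1+x+x^2+\ldots)(1+x^2+x^4+\ldots)(1+x^3+x^6+\ldots)\ldots
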
Note that if we are interested only in partitions of [n], then we don’t need to consider any terms with exponent greater than n, so if we wanted we could take a finite product instead.
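The finite-product remark is what makes this computable in practice. Here is a minimal Python sketch, assuming the product formula \prod_{k\geq 1}\frac{1}{1-x^k} for the generating function of p(n); the name partition_numbers is my own.

```python
def partition_numbers(n_max):
    """Coefficients p(0), ..., p(n_max) of the product over k of 1/(1 - x^k)."""
    p = [1] + [0] * n_max  # start from the constant series 1
    # Factors with k > n_max cannot change any coefficient up to x^n_max.
    for k in range(1, n_max + 1):
        # Multiply in place by 1/(1 - x^k) = 1 + x^k + x^(2k) + ...
        # Going upwards in n lets the part k be used repeatedly.
        for n in range(k, n_max + 1):
            p[n] += p[n - k]
    return p

print(partition_numbers(10))  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```

Each pass of the outer loop multiplies the truncated series by one more factor, so after the pass for k the list counts partitions with all parts at most k.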

Example: the mint group will remember this problem from the first session in Cambridge:

Show that the number of partitions of [n] with distinct parts is equal to the number of partitions of [n] with odd parts.

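Rather than the fiddly bijection argument found in the session, we can now treat this as a simple calculation. The generating function for distinct parts is given by:

(1+x)(1+x^2)(1+x^3)\ldots

while the generating function for odd parts is given by:

\frac{1}{(1-x)(1-x^3)(1-x^5)\ldots}

Writing the former as

\frac{1-x^2}{1-x}\cdot\frac{1-x^4}{1-x^2}\cdot\frac{1-x^6}{1-x^3}\cdot\ldots
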
shows that these are equal and the result follows.
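A quick numerical check of this identity, as a Python sketch. It assumes the two generating functions are (1+x)(1+x^2)(1+x^3)\ldots for distinct parts and the reciprocal of (1-x)(1-x^3)(1-x^5)\ldots for odd parts; the function names and the cut-off of 30 are illustrative choices of mine.

```python
def distinct_parts(n_max):
    """Coefficients of (1+x)(1+x^2)(1+x^3)... up to x^n_max."""
    c = [1] + [0] * n_max
    for k in range(1, n_max + 1):
        # Multiply by (1 + x^k); going downwards in n uses each part k at most once.
        for n in range(n_max, k - 1, -1):
            c[n] += c[n - k]
    return c

def odd_parts(n_max):
    """Coefficients of 1/((1-x)(1-x^3)(1-x^5)...) up to x^n_max."""
    c = [1] + [0] * n_max
    for k in range(1, n_max + 1, 2):
        # Multiply by 1/(1 - x^k); going upwards in n lets the part k repeat.
        for n in range(k, n_max + 1):
            c[n] += c[n - k]
    return c

# The two coefficient lists agree term by term.
assert distinct_parts(30) == odd_parts(30)
```

The only difference between the two inner loops is the direction of travel, which is the whole distinction between "each part at most once" and "each part any number of times".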

Other things – Multivariate Generating Functions

If you want to track a sequence in two variables, say a_{m,n}, then you can encode this with the bivariate generating function

f(x,y):=\sum_{m,n\geq 0}a_{m,n}x^my^n.

The coefficients are then extracted by [x^ay^b] and so on. There’s some interesting stuff on counting lattice paths with this method.

Sums over arithmetic progressions via roots of unity

Note that we can extract both \sum a_n and \sum (-1)^na_n by judicious choice of x in f(x). By taking half the sum or half the difference, we can obtain

a_0+a_2+a_4+\ldots=\frac12(f(1)+f(-1)),\quad a_1+a_3+a_5+\ldots=\frac12(f(1)-f(-1)).

Can we do this in general? Yes actually. If you want a_0+a_k+a_{2k}+\ldots, this is given by:

\frac{1}{k}\left(f(1)+f(w)+f(w^2)+\ldots+f(w^{k-1})\right),

where w=e^{2\pi i/k} is a primitive kth root of unity. Exercise: Prove this.

For greater clarity, first try the case k=4, and consider the real and imaginary parts of the power series evaluated at \pm 1 and \pm i.
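Here is a small numerical illustration in Python, assuming the averaging formula \frac{1}{k}\sum_{j=0}^{k-1}f(w^j) with w=e^{2\pi i/k}. The test function f(x)=(1+x)^n, the values n=12 and k=4, and the name every_kth_sum are all my own choices for the example.

```python
import cmath
from math import comb

def every_kth_sum(f, k):
    """Average of f over the k-th roots of unity: (1/k) * sum of f(w^j), j = 0..k-1."""
    w = cmath.exp(2j * cmath.pi / k)
    return sum(f(w ** j) for j in range(k)) / k

n, k = 12, 4  # arbitrary illustrative values
f = lambda x: (1 + x) ** n

# Direct sum of every k-th binomial coefficient: C(12,0) + C(12,4) + C(12,8) + C(12,12).
direct = sum(comb(n, j) for j in range(0, n + 1, k))
filtered = every_kth_sum(f, k)

# The result is a complex float, so compare up to rounding error.
assert abs(filtered - direct) < 1e-6
```

The imaginary part of the average comes out as zero up to floating-point noise, which is a useful hint for the exercise.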

Bijections, Prufer Codes and Cayley’s Formula

I’m currently at the training camp in Cambridge for this year’s UK IMO squad. This afternoon I gave a talk to some of the less experienced students about combinatorics. My aim was to cover as many useful tricks for calculating the sizes of combinatorial sets as I could in an hour and a half. We started by discussing binomial coefficients, which pleasingly turned out to be revision for the majority. But my next goal was to demonstrate that we are much more interested in the fact that we can calculate these if we want than in the actual expression for their values.

Put another way, my argument was that the interpretation of \binom{n}{m} as the number of ways to choose m objects from a collection of n, or the number of up-and-right paths from (0,0) to (m,n-m) is more useful than the fact that \binom{n}{m}=\frac{n!}{m!(n-m)!}. The opening gambit was to prove the fundamental result underlying the famous construction of Pascal’s triangle that

\binom{n+1}{m+1}=\binom{n}{m}+\binom{n}{m+1}.

This is not a hard result to prove by manipulating factorials, but it is a very easy result to prove in the path-counting setting, for example.

So it turned out that the goal of my session, as further supported by some unsubtly motivated problems from the collection, was to convince the students to use bijections as much as possible. That is, if you have to count something awkward, show that counting the awkward thing is equivalent to counting something more manageable, then count that instead. For many simpler questions, this equivalence is often drawn implicitly using words (“each of the n objects can be in any subset of the collection of bags so we multiply…” etc), but it is always worth having in mind the formal bijective approach. Apart from anything else, asking the question “is this bijection so obvious I don’t need to prove it” is often a good starting-point for assessing whether the argument is in fact correct!

Anyway, I really wanted to show my favourite bijection argument, but there wasn’t time, and I didn’t want to spoil other lecturers’ thunder by defining a graph and a tree and so forth. The exploration process encoding of trees is a strong contender, but today I want to define quickly the Prufer coding for trees, and use it to prove a famous result I’ve been using a lot recently, Cayley’s formula for the number of spanning trees on the complete graph with n vertices, n^{n-2}.

We are going to count rooted trees instead. Since we can choose any vertex of a tree to be the root, it suffices to show that there are n^{n-1} rooted trees on n vertices. The description of the Prufer code is relatively simple. Take a rooted tree with vertices labelled by [n]. A leaf is a vertex with degree 1, other than the root. Find the leaf with the largest label. Write down the label of the single vertex to which this leaf is connected, then delete the leaf. Now repeat the procedure, writing down the label of the vertex connected to the leaf now with the largest label, until there are only two vertices remaining, when you delete the non-root vertex, and write down the label of the root. We get a string of (n-1) labels. We want to show that this mapping is a bijection from the set of rooted trees with vertices labelled by [n] to [n]^{n-1}.

Let’s record informally how we would recover a tree from the Prufer code. First, observe that the label of any vertex which is not a leaf must appear in the code. Why? Well, the root label appears right at the end, if not earlier, and every other vertex must be deleted. But a vertex cannot be deleted until it has degree one, so its neighbours further from the root (its children) must be removed first, and so by construction its label appears. So we know what the root is (the final label) and what the leaves are (the labels that never appear) straight away.

In fact we can say slightly more than this. The number of times the root label appears is the degree of the root, while the number of times any other label appears is the degree of the corresponding vertex minus one. So the code determines the degree of every vertex in the tree. Call these the Prufer degrees.

So we construct the tree backwards from the leaves towards the root. We add edges one at a time, with the k-th edge joining the vertex with the k-th label to some other vertex. For k=1, this other vertex is the leaf with maximum label. In general, let G_k be the graph formed after the addition of k-1 edges, so G_1 is empty, and G_n is the full tree. Define T_k to be the set of non-root vertices such that their degree in G_k is exactly one less than their Prufer degree, that is, vertices which have been joined to all of their children but not yet to their parent. Note that T_1 is therefore the set of leaves suggested by the Prufer code. So we form G_{k+1} by adding an edge between the vertex with label appearing at position k in the Prufer sequence and the vertex of T_k with maximum label.
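To make the encoding and the reconstruction concrete, here is a minimal Python sketch of both directions. It is only an illustration under my own conventions: a rooted tree on labels 1..n is stored as a dict sending each non-root vertex to its parent, the function names are hypothetical, and the decoder tracks "children still to be attached" rather than degrees, which amounts to the same bookkeeping for non-root vertices.

```python
from itertools import product

def prufer_encode(parent, root):
    """Code of a rooted tree; `parent` maps each non-root vertex to its parent."""
    n = len(parent) + 1
    children = {v: 0 for v in range(1, n + 1)}
    for p in parent.values():
        children[p] += 1
    leaves = {v for v in parent if children[v] == 0}  # non-root, no children left
    code = []
    for _ in range(n - 1):
        leaf = max(leaves)           # leaf with the largest label
        leaves.remove(leaf)
        p = parent[leaf]
        code.append(p)               # record the neighbour, then delete the leaf
        children[p] -= 1
        if children[p] == 0 and p != root:
            leaves.add(p)            # p has just become a leaf
    return code

def prufer_decode(code):
    """Rebuild (parent, root) from a list of n-1 labels in 1..n, for n >= 2."""
    n = len(code) + 1
    root = code[-1]                  # the root is always written down last
    remaining = {v: 0 for v in range(1, n + 1)}
    for label in code:
        remaining[label] += 1        # children still to be attached to each vertex
    available = {v for v in remaining if remaining[v] == 0 and v != root}
    parent = {}
    for label in code:
        leaf = max(available)        # same choice the encoder made at this step
        available.remove(leaf)
        parent[leaf] = label
        remaining[label] -= 1
        if remaining[label] == 0 and label != root:
            available.add(label)
    return parent, root

def count_rooted_trees(n):
    """Decode every string in [n]^(n-1) (n >= 2) and count the distinct trees found."""
    trees = set()
    for code in product(range(1, n + 1), repeat=n - 1):
        parent, root = prufer_decode(list(code))
        assert prufer_encode(parent, root) == list(code)  # round trip
        trees.add((root, tuple(sorted(parent.items()))))
    return len(trees)
```

For example, the tree with root 1, where 2 and 3 are children of 1, and 4 and 5 are children of 2, has code [2, 2, 1, 1]. Running count_rooted_trees(4) checks the round trip on all 64 strings and finds 64 distinct rooted trees, in line with n^{n-1}. This is evidence rather than a proof, of course.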

Proving that this is indeed the inverse is a bit fiddly, more because of notation than any actual mathematics. You probably want to show injectivity by an extremal argument, taking the closest vertex to the root that is different in two trees with the same Prufer code. I hope it isn’t a complete cop out to swerve around presenting this in full technical detail, as I feel I’ve achieved my main goal of explaining why bijection arguments can reduce a counting problem that was genuinely challenging to an exercise in choosing sensible notation for proving a fairly natural bijection.