Generating Functions for Dice

So last week I was writing an article for Betting Expert about laws of large numbers, and I was trying to produce some representations of distributions to illustrate the Weak LLN and the Central Limit Theorem. Because tossing a coin feels too simplistic, and also because the natural state space for this random variable, at least verbally, is not a subset of the reals, I decided to go for dice instead. So it’s clear what the distribution of the outcome of a single dice roll is, and with a bit of thought or a 6×6 grid, we can work out the distribution of the average of two dice rolls. But what about 100 rolls? Obviously, we need large samples to illustrate the laws of large numbers! In this post, we discuss how to calculate the distribution of the sample mean of n dice rolls.

First we observe that the total set of outcomes of n dice rolls is $6^n$. The sum of the outcomes must lie between n and 6n inclusive. The distribution of the sum and the distribution of the sample mean are equivalent up to dividing by n. The final observation is that because the total number of outcomes has a nice form, we shouldn’t expect it to make any difference to the method if we calculate the probability of a given sum, or the number of configurations giving rise to that sum.

Indeed, tying in nicely with the first year probability course, we are going to use generating functions, and there is no difference in practice between the probability generating function, and the combinatorial generating function, if the underlying mechanism is a uniform choice. Well, in practice, there is a small difference, namely a factor of 6 here. The motivation for using generating functions is clear: we are considering the distribution of a sum of independent random variables. This is pretty much exactly why we bother to set up the machinery for PGFs.

Anyway, since each of {1,2,…,6} is equally likely, the GF of a single dice roll is

$x+x^2+\ldots+x^6=x\cdot \frac{1-x^6}{1-x}.$

So, if we want the generating function of the sum of n independent dice rolls, we can obtain this by raising the above function to the power n. We obtain

$x^n(1-x^6)^n(1-x)^{-n}.$

Note the factor of $x^n$ at the beginning arises because the minimum value of the sum is n. So to work out the number of configurations giving rise to sum k, we need to evaluate the coefficient of $x^k$. We can deal with $(1-x^6)^n$ fairly straightforwardly, but some thought it required regarding whether it’s possible to do similar job on $(1-x)^{-n}$.

We have to engage briefly with what is meant by a binomial coefficient. Note that

$\binom{x}{k}=\frac{x(x-1)\ldots(x-k+1)}{1\cdot\ldots\cdot k}$

is a valid definition even when x is not a positive integer, as it is simply a degree k polynomial in x. This works if x is a general positive real, and indeed if x is a general negative real. At this stage, we do need to keep k a positive integer, but that’s not a problem for our applications.

So we need to engage with how the binomial theorem works for exponents that are not positive integers. The tricky part with the standard expression as

$(a+b)^n=\binom{n}{0}a^n+\ldots + \binom{n}{n}b^n,$

is that the attraction of this symmetry in a and b prompts us to work in more generality than is entirely necessary to state the result. Note if we instead write

$(1+x)^n=1+\binom{n}{1}x+\binom{n}{2}x^2+\ldots,$

we have unwittingly described this finite sum as an infinite series. It just happens that all the binomial coefficients apart from the first (n+1) are zero. The nice thing about this definition is that it might plausibly generalise to non-integer or negative values of n. And indeed it does. I don’t want to go into the details here, but it’s just a Taylor series really, and the binomial coefficients are set up with factorials in the right places to look like a Taylor series, so it all works out.

It is also worth remarking that it follows straight from the definition of a negative binomial coefficient, that

$\binom{-n}{j}=(-1)^j \binom{n+j-1}{j}.$

In any case, we can rewrite our expression for the generating function of the IID sum as

$x^n\left[\sum_{k=0}^n \binom{n}{k}(-1)^k x^{6k}\right]\left[\sum_{j\ge 0} \binom{-n}{j}(-1)^j x^j\right]$

By accounting for where we can gather exponents from each bracket, we can evaluate the coefficient of $x^m$ as

$\sum_{6k+j=m+n}\binom{n}{k}\binom{n+j-1}{j}(-1)^k.$

Ie, k in the sum takes values in $\{0,1,\ldots, \lfloor \frac{m+n}{6}\rfloor\}$. At least in theory, this now gives us an explicit way to calculate the distribution of the average of multiple dice rolls. We have to be wary, however, that many compilers will not be happy dealing with large binomial coefficients, as the large factorials grow extremely rapidly. An approximation using logs is likely to be more tractable for larger settings.

Anyway, I leave you with the fruits of my labours.

Related articles

Bijections, Prufer Codes and Cayley’s Formula

I’m currently at the training camp in Cambridge for this year’s UK IMO squad. This afternoon I gave a talk to some of the less experienced students about combinatorics. My aim was to cover as many useful tricks for calculating the sizes of combinatorial sets as I could in an hour and a half. We started by discussing binomial coefficients, which pleasingly turned out to be revision for the majority. But my next goal was to demonstrate that we are much more interested in the fact that we can calculate these if we want than in the actual expression for their values.

Put another way, my argument was that the interpretation of $\binom{n}{m}$ as the number of ways to choose m objects from a collection of n, or the number of up-and-right paths from (0,0) to (m,n) is more useful than the fact that $\binom{n}{m}=\frac{n!}{m!(n-m)!}$. The opening gambit was to prove the fundamental result underlying the famous construction of Pascal’s triangle that

$\binom{n+1}{m+1}=\binom{n}{m}+\binom{n}{m+1}.$

This is not a hard result to prove by manipulating factorials, but it is a very easy result to prove in the path-counting setting, for example.

So it turned out that the goal of my session, as further supported by some unsubtly motivated problems from the collection, was to convince the students to use bijections as much as possible. That is, if you have to count something awkward, show that counting the awkward thing is equivalent to counting something more manageable, then count that instead. For many simpler questions, this equivalence is often drawn implicitly using words (“each of the n objects can be in any subset of the collection of bags so we multiply…” etc), but it is always worth having in mind the formal bijective approach. Apart from anything else, asking the question “is this bijection so obvious I don’t need to prove it” is often a good starting-point for assessing whether the argument is in fact correct!

Anyway, I really wanted to show my favouriite bijection argument, but there wasn’t time, and I didn’t want to spoil other lecturers’ thunder by defining a graph and a tree and so forth. The exploration process encoding of trees is a strong contender, but today I want to define quickly the Prufer coding for trees, and use it to prove a famous result I’ve been using a lot recently, Cayley’s formula for the number of spanning trees on the complete graph with n vertices, $n^{n-2}$.

We are going to count rooted trees instead. Since we can choose any vertex to be the root, there are $n^{n-1}$ rooted trees on n vertices. The description of the Prufer code is relatively simple. Take a rooted tree with vertices labelled by [n]. A leaf is a vertex with degree 1, other than the root. Find the leaf with the largest label. Write down the label of the single vertex to which this leaf is connected, then delete the leaf. Now repeat the procedure, writing down the label of the vertex connected to the leaf now with the largest label, until there are only two vertices remaining, when you delete the non-root vertex, and write down the label of the root. We get a string of (n-1) labels. We want to show that this mapping is a bijection from the set of rooted trees with vertices labelled by [n] to $[n]^{n-1}$.

Let’s record informally how we would recover a tree from the Prufer code. First, observe that the label of any vertex which is not a leaf must appear in the code. Why? Well, the root label appears right at the end, if not earlier, and every vertex must be deleted. But a vertex cannot be deleted until it has degree one, so the neighbours further from the root (or ancestors) of the vertex must be removed first, and so by construction the label appears. So know what the root is, and what the leaves are straight away.

In fact we can say slightly more than this. The number of times the root label appears is the degree of the root, while the number of times any other label appears is the degree of the corresponding vertex minus one. Call this sequence the Prufer degrees.

So we construct the tree backwards from the leaves towards the root. We add edges one at a time, with the k-th edge joining the vertex with the k-th label to some other vertex. For k=1, this other vertex is the leaf with maximum label. In general, let $G_k$ be the graph formed after the addition of k-1 edges, so $G_1$ is empty, and $G_n$ is the full tree. Define $T_k$ to be the set of vertices such that their degree in $G_k$ is exactly one less than their Prufer degree. Note that $T_1$ is therefore the set of leaves suggested by the Prufer code. So we form $G_{k+1}$ by adding an edge between the vertex with label appearing at position k+1 in the Prufer sequence and the vertex of $T_k$ with maximum label.

Proving that this is indeed the inverse is a bit fiddly, more because of notation than any actual mathematics. You probably want to show injectivity by an extremal argument, taking the closest vertex to the root that is different in two trees with the same Prufer code. I hope it isn’t a complete cop out to swerve around presenting this in full technical detail, as I feel I’ve achieved by main goal of explaining why bijection arguments can reduce a counting problem that was genuinely challenging to an exercise in choosing sensible notation for proving a fairly natural bijection.