The Chinese Restaurant Process

A couple of months ago I wrote a post about Polya’s Urn, the simplest example of a self-reinforcing process. Recall that we have a bag containing black and white balls, and at each step we draw a ball, then replace it together with an additional ball of the same colour. The process is self-reinforcing in the sense that if there is a surplus of black balls, the dynamics will reinforce this by adding more black balls than white balls. Alternatively, you can think of a natural limit process when the number of balls is large, for which any distribution is an invariant distribution. We have seen models such as the Preferential Attachment dynamics for network creation, where the degrees of vertices clearly have this self-reinforcing property: new vertices are more likely to join to existing vertices with large degrees.

One difference between the Polya Urn and some of the models we might be interested in for applications is that for the urn model, the number of classes (in this context colours of balls) is fixed. In many applications, we will want to allow new classes to appear. In the process which follows, we will allow this, and the new classes will have initial size equal to 1, so will be at a disadvantage for the self-reinforcing dynamics. Nonetheless, some will show up in a meaningful way in the limit. It is worth emphasising that Polya’s Urn gave us the Dirichlet distribution in the limit, and this can be thought of as a partition of [0,1]. These more general processes will give us a more interesting family of partitions, called the Poisson-Dirichlet distributions. These will turn up in a wide variety of contexts, and this is perhaps the friendliest way to introduce them.

The model is this. We start with a single diner who sits at the first table. Then whenever the (n+1)th diner arrives, they join a given table which already has k diners with probability k/(n+1), and they start a new table with probability 1/(n+1).
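The seating rule can be simulated in a few lines. This is only a minimal sketch: the function name `chinese_restaurant` and the `seed` argument are made up for illustration, and only the table sizes are tracked, not who sits where.

```python
import random


def chinese_restaurant(n_diners, seed=None):
    """Simulate the seating rule above and return the list of table sizes."""
    rng = random.Random(seed)
    tables = [1]  # the first diner sits alone at the first table
    for n in range(1, n_diners):
        # n diners are seated and diner n+1 arrives. Pick one of n+1 equally
        # likely slots: slots 0..n-1 belong to seated diners, and slot n
        # means "start a new table".
        slot = rng.randrange(n + 1)
        if slot == n:
            tables.append(1)
            continue
        # Find the table that owns this slot, so that a table with k diners
        # is chosen with probability k/(n+1).
        for i, size in enumerate(tables):
            if slot < size:
                tables[i] += 1
                break
            slot -= size
    return tables
```

Calling `chinese_restaurant(1000)` a few times typically shows one or two very large tables and a handful of small ones, which is the self-reinforcing effect at work.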

(Aside: I’m not exactly sure how this relates to a Chinese restaurant? It seems more reminiscent of a university dining hall during freshers’ week, but I guess that would be a less catchy name for a model.)

Anyway, the interest in this description lies not in organising seating arrangements. Consider choosing uniformly at random from the set of permutations of [n+1]. Suppose x maps to n+1 and n+1 maps to y. Consider the permutation of [n] formed by instead mapping x to y and ignoring n+1 (if n+1 is a fixed point, simply delete it). This has the uniform distribution on the set of permutations of [n]. By reversing this procedure, we can construct a uniform permutation of [n+1] from a uniform permutation of [n]. When you do this as a process for n growing, observe that the cycles correspond exactly to tables in the Chinese Restaurant Process. If we wanted the CRP to give all the information about the permutation, we could specify the ordering round each table, by saying that with probability 1/(n+1) the new diner sits to the left of any given existing diner.
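Here is a sketch of the reversed procedure, growing a permutation one element at a time. The names `grow_uniform_permutation` and `cycle_sizes` are hypothetical helpers, and the convention that the new element is inserted immediately after a uniformly chosen x in its cycle is one choice among several equivalent ones.

```python
import random


def grow_uniform_permutation(n, seed=None):
    """Build a permutation of {1, ..., n} as a dict, one element at a time."""
    rng = random.Random(seed)
    perm = {1: 1}
    for new in range(2, n + 1):
        x = rng.randint(1, new)  # uniform on {1, ..., new}, endpoints included
        if x == new:
            perm[new] = new      # new element is a fixed point: a new table
        else:
            perm[new] = perm[x]  # new -> y, where x used to map to y
            perm[x] = new        # x -> new
    return perm


def cycle_sizes(perm):
    """Return the cycle sizes of perm, ordered by least element."""
    seen = set()
    sizes = []
    for start in sorted(perm):
        if start in seen:
            continue
        size = 0
        j = start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            size += 1
        sizes.append(size)
    return sizes
```

The output of `cycle_sizes(grow_uniform_permutation(n))` plays the role of the table sizes in the restaurant description.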

As a starting point for why this is a useful description of the uniform permutation distribution, observe that the size of the cycle containing the element 1 evolves as a Polya Urn with initial vector (1,1). The second 1 in the initial vector corresponds to the possibility of starting a new table, which is maintained at every stage. This tells us immediately that as n grows to infinity, the proportion of elements in the same cycle as 1 in the uniform permutation converges in distribution to U[0,1]. The construction also allows for an easy proof that the expected number of cycles is roughly log n for large n, since when the kth element is added, the probability that it forms a new cycle is 1/k.
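To spell out the log n claim (the symbols $K_n$, $I_k$ and $H_n$ are introduced here just for this calculation): write $K_n$ for the number of cycles of a uniform permutation of [n], and $I_k$ for the indicator that the kth element starts a new cycle when it is added. Then by linearity of expectation,

$\mathbb{E}[K_n]=\sum_{k=1}^n \mathbb{E}[I_k]=\sum_{k=1}^n \frac{1}{k}=H_n=\log n+\gamma+o(1),$

where $H_n$ is the nth harmonic number and $\gamma\approx 0.577$ is the Euler-Mascheroni constant.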

In this case, the partition induced on [n] by the process is clearly exchangeable given our permutation description. However, this will turn out to hold in greater generality. Note also that, conditional on the size of the cycle containing 1, the sizes of the remaining cycles are given by a uniform permutation on a smaller number of elements. So the limiting result holds jointly in the first k cycle sizes for all k. More precisely, if $(N_1,N_2,\ldots)$ are the cycle sizes ordered by least element, then the frequencies converge to:

$(U_1,(1-U_1)U_2,(1-U_1)(1-U_2)U_3,\ldots),$

where the Us are independent U[0,1] RVs. This is known as a stick-breaking procedure, where at each step we break off some proportion of the stick according to a fixed distribution, and assemble the pieces into a partition.

We generalise this process to get a two-parameter version. The standard notation for the parameters is $(\alpha,\theta)$. Then we amend the dynamics. We now have to take into account how many tables are occupied when the (n+1)th diner arrives. If k tables are occupied, and the ith table has $n_i$ diners, then the new one will join this table with probability $\frac{n_i-\alpha}{n+\theta}$, and will start a new table otherwise, so with probability $\frac{\theta+k\alpha}{n+\theta}$. The original process therefore corresponds to parameters (0,1).
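The two-parameter dynamics can be sketched in the same way. This is an illustrative implementation under my own naming (`two_parameter_crp` is not a library function). It raises an error if a seating weight ever becomes negative, which is a choice made here, not part of the model.

```python
import random


def two_parameter_crp(n_diners, alpha, theta, seed=None):
    """Simulate the (alpha, theta) seating rule; return the table sizes."""
    rng = random.Random(seed)
    tables = [1]  # the first diner always starts the first table
    for n in range(1, n_diners):
        k = len(tables)
        # Unnormalised weights: n_i - alpha for table i, theta + k*alpha for
        # a new table. Together they sum to n + theta.
        weights = [size - alpha for size in tables]
        new_weight = theta + k * alpha
        if new_weight < 0 or min(weights) < 0:
            raise ValueError("negative seating weight: invalid (alpha, theta)")
        if new_weight > 0:
            weights.append(new_weight)
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == k:
            tables.append(1)  # only reachable when new_weight > 0
        else:
            tables[choice] += 1
    return tables
```

With `alpha=0` and `theta=1` the weights reduce to those of the original process. Larger `alpha` makes new tables more likely as the number of occupied tables grows.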

First we examine which parameters are possible. If $\alpha<0$, and $m|\alpha|<\theta<(m+1)|\alpha|$, then with high probability the (m+1)th table will eventually be occupied, whereafter the probability of forming a further table will be negative. So we have to demand instead that $\theta$ is an integer multiple of $-\alpha$, say $\theta=k|\alpha|$. Then the number of tables is bounded by k, so for large n, the probability of joining the ith of these k tables is roughly $\frac{n_i}{n}$, and this should behave roughly like the standard Polya Urn. And indeed, the induced frequencies do converge to the Dirichlet distribution with k equal parameters.

Obviously $\alpha$ cannot be greater than 1, otherwise the probability of the second diner joining the first table is negative. If it is equal to 1, then every diner starts a new table, which isn’t very interesting. So we care about $\alpha\in[0,1)$, and for the probability of the second diner starting a new table to be non-negative we require $\theta>-\alpha$.

It turns out that the partitions induced by this process are exchangeable also. We also have a stick-breaking construction, although now the broken proportions are not IID, but distributed as

$U_i\sim \mathrm{Beta}(1-\alpha,\theta+i\alpha),$
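A sketch of sampling the first few pieces of the stick under this law, assuming $\alpha\in[0,1)$ and $\theta>-\alpha$ so that both Beta parameters are positive. The function name `stick_breaking` is hypothetical; `random.betavariate` is the standard-library Beta sampler.

```python
import random


def stick_breaking(alpha, theta, n_pieces, seed=None):
    """Return the first n_pieces stick-breaking frequencies for (alpha, theta)."""
    rng = random.Random(seed)
    remaining = 1.0  # length of stick not yet broken off
    pieces = []
    for i in range(1, n_pieces + 1):
        # U_i ~ Beta(1 - alpha, theta + i * alpha), independent across i
        u = rng.betavariate(1 - alpha, theta + i * alpha)
        pieces.append(remaining * u)
        remaining *= 1 - u
    return pieces
```

With `alpha=0` and `theta=1` each $U_i$ is Beta(1,1), that is U[0,1], which recovers the uniform stick-breaking from the permutation case.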

with the same notation otherwise. It turns out that under mild assumptions, these are all the infinite exchangeable random partitions with this stick-breaking property.

My initial struggle with this process was to understand what roles $(\alpha,\theta)$ played in a more precise way. It turns out this is best explained through the limit of the partition, but Pitman’s Exercise 3.2.2 does at least give an idea of how such a process with parameters (1/2,0) might naturally arise as a version of an urn model.

3.2.2. Let an urn initially contain two balls of different colours. Draw 1 is a simple draw from the urn with replacement. Thereafter, balls are drawn from the urn, with replacement of the ball drawn, and addition of two more balls as follows. If the ball drawn is of a colour never drawn before, it is replaced together with two additional balls of two distinct new colours, different to the colours of balls already in the urn. Whereas if the ball drawn is of a colour that has been drawn before, it is replaced together with two balls of its own colour.

Let $n_1$ be the number of times a ball of the colour of the first ball drawn (and replaced) is drawn. Let $n_2,n_3,\ldots$ be the number of times balls of each other colour are drawn. Suppose after n draws, we have drawn k colours. (There will be other colours in the bag not yet drawn.) Then, for each drawn colour i, there are $2n_i-1$ balls of that colour in the bag, giving 2n-k in total. But there should be 2n balls in total, so there are k other balls. Then the probability that we see a new colour is $\frac{k}{2n}$, and the probability that we see colour i again is $\frac{2n_i-1}{2n}=\frac{n_i-1/2}{n}$, which exactly corresponds to the dynamics for PD(1/2,0).
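The bookkeeping in this argument can be checked by simulating the urn literally. This is a sketch only: `pitman_urn` is a made-up name, colours are represented by integer labels, and the urn is stored as a plain list of labels.

```python
import random
from collections import Counter


def pitman_urn(n_draws, seed=None):
    """Simulate the urn of Exercise 3.2.2; return (urn contents, draw counts)."""
    rng = random.Random(seed)
    urn = [0, 1]       # two balls of different colours
    next_colour = 2    # smallest colour label not yet used
    draws = Counter()  # colour -> number of times it has been drawn
    for step in range(1, n_draws + 1):
        colour = rng.choice(urn)  # the drawn ball stays in the urn
        first_time = colour not in draws
        draws[colour] += 1
        if step == 1:
            continue  # draw 1 is a simple draw with replacement
        if first_time:
            # never drawn before: add two balls of two distinct new colours
            urn += [next_colour, next_colour + 1]
            next_colour += 2
        else:
            # drawn before: add two more balls of the same colour
            urn += [colour, colour]
    return urn, draws
```

After n draws the urn holds 2n balls, each drawn colour i has $2n_i-1$ of them, and the remaining k balls are undrawn colours, which is the count used above.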

The other question I was puzzled by initially is where does the dust come from in the limit? Recall that in an infinite exchangeable partition, the sum of the frequencies does not need to be 1. The difference between this sum and 1 gives the probability that an element is in a block by itself. Obviously, when the number of tables is bounded (as when $\alpha<0$) this is not an issue, but for positive $\alpha$, this won’t hold. So we need to account for these singletons. The temptation is to imagine that these correspond to tables which are started but never joined. But this use of ‘never’ is not ideal. For each k, the k-th table will eventually include arbitrarily large numbers of diners. But for any finite n, there will likely be some proportion of people dining alone, some in pairs, and so on. So the sum of all of these proportions in the limit gives this dust.

Generalising Polya’s Urn in another direction, if I have time, I might write something about a model which I recently read about on arXiv where the classes are vertices of a graph, and there is dependence between them based on the presence of edges. This might also be a good moment to explain some other generalisations and stochastic approximation methods used to treat them.

REFERENCES

This post is almost entirely a paraphrase of Sections 3.1 and 3.2 from Pitman’s Combinatorial Stochastic Processes, available online here.