Increments of Random Partitions

The following is problem 2.1.4. from Combinatorial Stochastic Processes:

Let X_i be the indicator of the event that i is the least element of some block of an exchangeable random partition \Pi_n of [n]. Show that the joint law of the (X_i,1\leq i\leq n) determines the law of \Pi_n.

As Pitman says, this is a result by Serban Nacu, the paper for which can be found here. In this post I’m going to explain what an exchangeable random partition is, how to prove the result, and a couple of consequences.

The starting point is the question ‘what is an exchangeable random partition?’ The most confusing aspect is that there are multiple definitions depending on whether the blocks of the partition are sets or just integers recording the block sizes. E.g., {1,2,4} u {3} is a partition of the set [4], corresponding to the partition 3+1 of the integer 4. Obviously the set partition induces the integer partition, and in an exchangeable setting the law of one may determine the law of the other.

In the second case, we assume 3+1 is the same partition as 1+3. If order does matter then we call it a composition instead. This gets a bit annoying for set partitions, as we don’t want these to be ordered either. But if we want actually to talk about the sets in question we have to give them labels, which becomes an ordering, so we need some canonical way to assign these labels. Typically we will say \Pi_n=\{A_1,\ldots,A_k\}, where the curly brackets indicate that we don’t care about order, and we choose the labels by order of appearance, so by increasing order of least elements.

We say that a random partition \Pi_n of [n] is exchangeable if its distribution is invariant under the action on partitions induced by the symmetric group. That is, relabelling doesn’t change probabilities. We can express this functionally by saying

P(\Pi_n=\{A_1,\ldots,A_k\})=p(|A_1|,\ldots,|A_k|),

for p a symmetric function. This function is then called the exchangeable partition probability function (EPPF) by Pitman.

Consider a partition of 4 into sets of sizes 3 and 1. There is a danger that this definition looks like it might be saying that the probability that A_1 is the set of size 3 is the same as the probability that A_1 is the set of size 1. This would be a problem because we expect to see some size-biasing to the labelling. Larger sets are more likely to contain small elements, merely because they contain more elements. Fortunately the definition is not broken after all. The statement above makes no reference to the probabilities of seeing various sizes for A_1 etc. For that, we would have to sum over all partitions with that property. It merely says that the partitions:

\{1,2,3\}\cup\{4\},\quad \{1,2,4\}\cup\{3\},\quad\{1,3,4\}\cup\{2\},\quad \{2,3,4\}\cup\{1\}

have respective probabilities:

p(3,1),\quad p(3,1),\quad p(3,1),\quad p(1,3),

and furthermore these are equal.
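
To see the size-biasing explicitly, we can sum over the four partitions above. This is only a quick check, using nothing beyond the symmetry of p:

P(\text{block sizes }3\text{ and }1,\;|A_1|=3)=3p(3,1),\qquad P(\text{block sizes }3\text{ and }1,\;|A_1|=1)=p(1,3)=p(3,1).

So, given that the block sizes are 3 and 1, the first block in order of appearance is the larger one with conditional probability 3/4, even though each individual partition is equally likely.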

Anyway, now let’s turn to the problem. The key idea is that we want to be looking at strings of 0s and 1s that can only arise in one way. For example, the string 10…01 can only arise from the partition {1,2,…,n-1} u {n}, since a 1 in the final position forces {n} to be a block on its own. So now we know p(n-1,1) and so, by symmetry, also p(1,n-1). Furthermore, note that 10…0 and 11…1 give at once the probabilities of one block of size n and of n blocks of size 1 respectively.

So then the string 10…010 can only arise from partitions {1,2,…,n-2,n} u {n-1} or {1,2,…,n-2} u {n-1,n}. We can calculate the probability that it came from the former using the previously found value of p(n-1,1) and a combinatorial weighting, so the remaining probability is given by p(2,n-2). Keep going. It is clear what ‘keep going’ means in the case of p(a,b) but for partitions with more than two blocks it seems a bit more complicated.
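
The counting in the last two paragraphs can be checked by brute force for small n. The following is a minimal sketch rather than anything from Nacu's paper; the function names `set_partitions`, `increments` and `partitions_by_string` are made up for illustration. It lists every partition of [5] and groups them by their string of 0s and 1s.

```python
from collections import defaultdict


def set_partitions(n):
    """Yield every partition of {1,...,n} as a list of blocks,
    with the blocks listed in order of appearance."""
    if n == 0:
        yield []
        return
    for smaller in set_partitions(n - 1):
        # Either add n to one of the blocks already present ...
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] + [n]] + smaller[i + 1:]
        # ... or start a new block containing only n.
        yield smaller + [[n]]


def increments(partition, n):
    """Return the string X_1...X_n, where X_i = 1 iff i is the
    least element of some block."""
    least = {min(block) for block in partition}
    return "".join("1" if i in least else "0" for i in range(1, n + 1))


def partitions_by_string(n):
    """Group all partitions of [n] by their increment string."""
    table = defaultdict(list)
    for partition in set_partitions(n):
        table[increments(partition, n)].append(partition)
    return table


table = partitions_by_string(5)
print(table["10001"])  # partitions whose string is 10001
print(table["10010"])  # partitions whose string is 10010
```

For n=5 the first list contains a single partition and the second contains the two partitions named above, which is exactly what lets us peel off p(n-1,1) first and then p(n-2,2).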

Let’s fix k the number of blocks in partitions under consideration, and start talking about compositions, that is a_1+\ldots+a_k=n. The problem we might face in trying to generalise the previous argument is that potentially lots of compositions might generate the same sequence of 0s and 1s, so the ‘first time’ we consider a composition might be the same for more than one composition. Trying it out in the case k=3 makes it clear that this is not going to happen, but we need some partial ordering structure to explain why this is the case.

Recall that a composition with k blocks is a sequence a=(a_1,\ldots,a_k) which sums to n. Let’s say a majorizes b if all its partial sums are at least as large. That is a_1+\ldots+a_l\geq b_1+\ldots+b_l for all 1\leq l \leq k. We say this is strict if at least one of the inequalities is strict. It is not hard to see that if a majorizes b then this is strict unless a = b.

Since we don’t care about ordering, we assume for now that all compositions are arranged in non-increasing order. So we find a partition corresponding to some such composition a_1,\ldots,a_k. The partition is:

\{1,\ldots,a_1\}\cup\{a_1+1,\ldots,a_1+a_2\}\cup\ldots\cup\{a_1+\ldots+a_{k-1}+1,\ldots,n\}.

This generates a sequence of 0s and 1s as described above, with a_i-1 0s between the i’th 1 and the (i+1)th 1. The claim is that given some composition which admits a partition with this same corresponding sequence, that composition must majorize a. Proof by induction on l. So in fact we can prove Nacu’s result inductively down the partial ordering described. We know the probability of the sequence of 0s and 1s corresponding to the partition of [n] described by assumption. We know the probability of any partition corresponding to a composition which majorizes a by induction, and we know how many partitions with this sequence each such composition generates. Combining all of this, we can find the probability corresponding to a.
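
The majorization claim is easy to test exhaustively for small n. The sketch below is only a sanity check under my own naming (`claim_holds`, `majorizes` and the other helpers are hypothetical, not from the paper): for every partition whose string comes from a non-increasing composition a, it checks that the block sizes, sorted into non-increasing order, majorize a.

```python
from itertools import accumulate


def set_partitions(n):
    """Yield every partition of {1,...,n}, blocks in order of appearance."""
    if n == 0:
        yield []
        return
    for smaller in set_partitions(n - 1):
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] + [n]] + smaller[i + 1:]
        yield smaller + [[n]]


def increments(partition, n):
    """String X_1...X_n with X_i = 1 iff i is the least element of a block."""
    least = {min(block) for block in partition}
    return "".join("1" if i in least else "0" for i in range(1, n + 1))


def majorizes(a, b):
    """True if every partial sum of a is at least that of b."""
    return all(x >= y for x, y in zip(accumulate(a), accumulate(b)))


def claim_holds(n):
    """Check the claim for every partition of [n]."""
    for partition in set_partitions(n):
        s = increments(partition, n)
        # Gaps between successive 1s recover the composition a.
        ones = [i for i, c in enumerate(s) if c == "1"] + [n]
        a = [q - p for p, q in zip(ones, ones[1:])]
        if a != sorted(a, reverse=True):
            continue  # only strings arising from a non-increasing a
        sizes = sorted((len(block) for block in partition), reverse=True)
        if not majorizes(sizes, a):
            return False
    return True


print(all(claim_holds(n) for n in range(1, 8)))
```

The reason it works is visible in the code: the first l blocks in order of appearance must between them contain every element before the (l+1)th 1, and sorting the sizes can only increase the partial sums.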

Actually I’m not going to say much about consequences of this except to paraphrase very briefly what Nacu says in the paper. One of the neat consequences of this result is that it allows us to prove in a fairly straightforward way that the only infinite family of exchangeable random partitions with independent increments is the so-called Chinese Restaurant process.

Instead of attempting to prove this, I will explain what all the bits mean. First, the Chinese Restaurant process is the main topic of the next chapter of the book, so I won’t say any more about it right now, except that its definition is almost exactly what is required to make this particular result true.

We can’t extend the definition of exchangeable to infinite partitions immediately, because considering invariance under the symmetric group on the integers is not very nice, in particular because there’s a danger all the probabilities will end up being zero. Instead, we consider restrictions of the partition to [n]\subset\mathbb{N}, and demand that these nest appropriately, and are exchangeable.

Independent increments is a meaningful thing to consider since one way to construct a partition, infinite or otherwise, is to consider elements one at a time in the standard ordering, either adding the new element to an already present block, or starting a new block. Since 0 or 1 in the increment sequence corresponds precisely to these events, it is meaningful to talk about independent increments.

Hewitt-Savage Theorem

This final instalment in my exploration of exchangeability gives a stronger version of Kolmogorov’s 0-1 law, and suggests some applications. It is easy to see that the tail sigma-field is a subset of the exchangeable sigma-field. For, if A is a tail event, then for every n it does not depend on the first n random variables in the underlying sequence, so in particular it is invariant under permutations of initial segments of the sequence.

Kolmogorov’s 0-1 Law: (X_n) a sequence of independent (not necessarily iid) random variables in some probability space. Define the tail sigma-field \tau=\cap_n \sigma(X_{n+1},X_{n+2},\ldots). Then \tau is trivial; that is, \forall A\in\tau\; P(A)\in\{0,1\}.

Proof: Set \tau_n=\sigma(X_{n+1},X_{n+2},\ldots), F_n=\sigma(X_1,\ldots,X_n). Then F_n is independent of \tau_m whenever m\geq n. So F_n is independent of \tau for all n, hence so is \cup_n F_n, which generates the entire sigma-field F_\infty, so this is independent of \tau also. Since A\in \tau\Rightarrow A\in F_\infty trivially, the independence criterion gives P(A)=P(A\cap A)=P(A)P(A), and hence P(A)\in\{0,1\}.

Hewitt-Savage 0-1 Law: (X_n) a sequence of iid random variables. Then the sigma field of exchangeable events \mathcal{E} is trivial.

Proof: Take A\in\mathcal{E}, and approximate by A_n\in F_n, P(A\triangle A_n)\rightarrow 0, which is possible since \cup F_n generates the whole sigma-field. Write A_n=\{(X_1,\ldots,X_n)\in B_n\} for later ease of notation. To exploit exchangeability, set \tilde{A}_n=\{(X_{n+1},\ldots,X_{2n})\in B_n\}, as the permutation of RVs that sends A_n\mapsto \tilde{A}_n leaves A invariant. So P(\tilde{A}_n\triangle A)=P(A_n\triangle A)\rightarrow 0\Rightarrow P(A_n\cap \tilde{A}_n)\rightarrow P(A). But because (X_n) is iid (note, this is where we use identical distributions), P(A_n\cap \tilde{A}_n)=P(A_n)P(\tilde{A}_n)=P(A_n)^2\rightarrow P(A)^2. Hence P(A)=P(A)^2, so P(A)\in\{0,1\}.
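
The implication P(A_n\cap \tilde{A}_n)\rightarrow P(A) in the proof is worth spelling out. One way to see it (a routine estimate, nothing special to this setting) is via the inclusion (A_n\cap\tilde{A}_n)\triangle A\subset (A_n\triangle A)\cup(\tilde{A}_n\triangle A), which gives

|P(A_n\cap\tilde{A}_n)-P(A)|\leq P((A_n\cap\tilde{A}_n)\triangle A)\leq P(A_n\triangle A)+P(\tilde{A}_n\triangle A)\rightarrow 0.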

Application: Given a stochastic process with iid increments, the event that a state is visited infinitely often is in the tail space of the process. However, it is not in the tail space of the increments, so Kolmogorov’s 0-1 law does not apply. It is, however, an exchangeable event for the increments, and so occurs with probability 0 or 1.

References (for this and the related two previous posts):

Kingman – Uses of Exchangeability (1978)

Breiman – Probability, Chapter 3

Zitkovic – Theory of Probability Lecture Notes


An Exchangeable Law of Large Numbers

In the proof of De Finetti’s Theorem in my last post, I got to a section where I needed to show a particular convergence property of a sequence of exchangeable random variables. For independent identically distributed RVs, we have Kolmogorov’s 0-1 law, and in particular a strong law of large numbers. Does a version of this result hold for exchangeable sequences? As these represent only a mild generalisation of iid sequences, we might hope so. The following argument demonstrates that this is true, as well as providing a natural general proof of De Finetti.

Define \mathcal{E}_n=\sigma(\{f(X_1,X_2,\ldots): f\text{ Borel, symmetric in its first }n\text{ arguments}\}), the sigma-field of events which are unchanged by permuting the first n RVs. Note that \mathcal{E}_1\supset\mathcal{E}_2\supset\ldots\supset \mathcal{E}=\cap_n\mathcal{E}_n, the exchangeable sigma-field.

So now take g(X) symmetric in the first n variables, and f bounded and Borel. By exchangeability E[\frac{1}{n}\sum_1^n f(X_j)g(X)]=E[f(X_1)g(X)]. Now set g=1_A, for A\in\mathcal{E}_n, and so because the LHS integrand is \mathcal{E}_n-measurable we have Z_n:=\frac{1}{n}\sum_1^n f(X_j)=E[f(X_1)|\mathcal{E}_n]. So (Z_n) is a backwards martingale.

We have a convergence theorem for backwards martingales, which tells us that \lim_n n^{-1}\sum^n f(X_j) exists, and in fact = E[f(X_1)|\mathcal{E}] almost surely. Setting f(X)=1(X\leq x) gives that \lim_n\frac{\#\{X_i\leq x: i\leq n\}}{n}=F(x):=P(X_1\leq x|\mathcal{E}). We now perform a similar procedure for functions defined on the first k RVs, in an attempt to demonstrate independence.

For f:\mathbb{R}^k\rightarrow\mathbb{R}, we seek a backwards martingale, so we average over the n^{(k)}=n(n-1)\ldots(n-k+1) ordered choices of k distinct indices from [n]. So \frac{1}{n(n-1)\ldots(n-k+1)}\sum_{i_1,\ldots,i_k\in[n]\text{ distinct}} f(X_{i_1},\ldots,X_{i_k}) is a backwards martingale, and hence E[f(X_1,\ldots,X_k)|\mathcal{E}]=\lim_n \frac{1}{n(n-1)\ldots(n-k+1)}\sum f(X_{i_1},\ldots,X_{i_k}). As before, set f(y_1,\ldots,y_k)=1(y_1\leq x_1)\ldots 1(y_k\leq x_k). Crucially, we can replace the falling factorial term with n^k and sum over all k-tuples of indices, as we are only considering the limit and the tuples with a repeated index make up a vanishing fraction. The sum then factorises to give: E[f(X_1,\ldots,X_k)|\mathcal{E}]=\lim_n(\frac{1}{n}\sum_{i\leq n} 1(X_i\leq x_1))\ldots(\frac{1}{n}\sum_{i\leq n} 1(X_i\leq x_k))=F(x_1)\ldots F(x_k), thus demonstrating independence of (X_n) conditional on \mathcal{E}.
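
The step of replacing the falling factorial deserves a line of justification. A rough bound, assuming only that |f|\leq 1 (as it is for products of indicators): writing S_n for the sum of f over all k-tuples from [n] and S_n' for the sum over tuples of distinct indices, there are n^k-n^{(k)}=O(n^{k-1}) tuples with a repeated index, so

\left|\frac{S_n}{n^k}-\frac{S_n'}{n^{(k)}}\right|\leq \frac{|S_n-S_n'|}{n^k}+|S_n'|\left(\frac{1}{n^{(k)}}-\frac{1}{n^k}\right)\leq \frac{2(n^k-n^{(k)})}{n^k}\rightarrow 0.

Hence the two normalised sums have the same limit.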

So what have we done? Well, we’ve certainly proven de Finetti in the most general case, and we have in addition demonstrated the existence of a Strong Law of Large Numbers for exchangeable sequences, where the limit variable is \mathcal{E}-measurable.

Exchangeability and De Finetti’s Theorem

Exchangeability generalises the notion of a sequence of random variables being iid. Essentially, the motivation is that in frequentist statistics data is assumed to be generated by a series of iid RVs with distribution parameterised by some unknown p. The theory for sequences of iid RVs is rich, with laws of large numbers, and limit theorems. However, from a Bayesian perspective, the parameter p has some prior distribution, so the random variables which give the data are no longer independent. That is, each random variable has non-trivial dependence on p, so in general will have non-trivial dependence on each other.

We say a sequence of random variables X=(X_1,X_2,\ldots) is exchangeable if the law of X is invariant under finite permutation of the indices of the sequence. Formally, if for any n and any \sigma\in S_n, (X_1,\ldots,X_n)\stackrel{d}{=}(X_{\sigma(1)},\ldots,X_{\sigma(n)}). Note that permutations with non-trivial action on an infinite subset of N are not considered in this definition, as the law of the entire sequence of RVs is generated by the laws of finite subsets of the sequence. For example, take Y,Y_1,Y_2,\ldots iid, and set X_n=Y+Y_n. Provided Y has some non-trivial distribution, the sequence X is not iid, but it is exchangeable. Note that, conditional on Y, the sequence X is iid. This is exactly the situation in the Bayesian inference framework, where the RVs are iid conditional on some underlying random parameter. De Finetti’s Theorem gives that this in fact holds for any exchangeable sequence.
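
The example X_n=Y+Y_n is easy to simulate. The sketch below assumes, purely for illustration, that Y and the Y_n are standard normal (the text does not fix a distribution), and the names `sample_path` and `sample_mean` are my own. Conditional on Y the sequence is iid with mean Y, so the sample mean should settle near Y rather than near a fixed constant.

```python
import random


def sample_path(n, seed):
    """Draw Y, then X_1..X_n with X_i = Y + Y_i.
    Assumption: Y and the Y_i are iid standard normal."""
    rng = random.Random(seed)
    y = rng.gauss(0.0, 1.0)
    xs = [y + rng.gauss(0.0, 1.0) for _ in range(n)]
    return y, xs


def sample_mean(xs):
    """Plain arithmetic mean of the sample."""
    return sum(xs) / len(xs)


# Each seed gives a different Y, and the sample mean tracks that Y.
for seed in range(3):
    y, xs = sample_path(20000, seed)
    print(f"Y = {y:+.3f}, sample mean = {sample_mean(xs):+.3f}")
```

Different runs give different limits, which is the sense in which the limit in a law of large numbers for exchangeable sequences is a random variable rather than a constant.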

Theorem (De Finetti): X=(X_1,X_2,\ldots) an exchangeable sequence of random variables. Then there exists a random probability measure \mu (that is, a RV taking values in the space of probability measures) such that conditional on \mu,\; X_1,X_2,\ldots \stackrel{iid}{\sim}\mu.