# Exchangeability and De Finetti’s Theorem

Exchangeability generalises the notion of a sequence of random variables being iid. Essentially, the motivation is that in frequentist statistics, data are assumed to be generated by a sequence of iid RVs whose distribution is parameterised by some unknown p. The theory for sequences of iid RVs is rich, with laws of large numbers and central limit theorems. However, from a Bayesian perspective, the parameter p has some prior distribution, so the random variables which give the data are no longer independent. That is, each random variable has non-trivial dependence on p, so in general the variables have non-trivial dependence on each other.

We say a sequence of random variables $X=(X_1,X_2,\ldots)$ is exchangeable if the law of X is invariant under finite permutations of the indices of the sequence. Formally, if for every $n$ and every $\sigma\in S_n$, $(X_1,\ldots,X_n)\stackrel{d}{=}(X_{\sigma(1)},\ldots,X_{\sigma(n)})$. Note that permutations with non-trivial action on an infinite subset of $\mathbb{N}$ are not considered in this definition, as the law of the entire sequence of RVs is determined by the laws of finite subsets of the sequence. For example, take $Y,Y_1,Y_2,\ldots$ iid, and set $X_n=Y+Y_n$. Provided Y has some non-trivial distribution, the sequence X is not iid, but it is exchangeable. Note that, conditional on Y, the sequence X is iid. This is exactly the situation in the Bayesian inference framework, where the RVs are iid conditional on some underlying random parameter. De Finetti’s Theorem gives that this in fact holds for any exchangeable sequence.

Theorem (De Finetti): Let $X=(X_1,X_2,\ldots)$ be an exchangeable sequence of random variables. Then there exists a random probability measure $\mu$ (that is, an RV taking values in the space of probability measures) such that, conditional on $\mu$, $X_1,X_2,\ldots \stackrel{iid}{\sim}\mu$.
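The easy direction of the theorem, that a mixture of iid sequences is exchangeable but not independent, can be seen numerically. A minimal sketch, assuming (hypothetically) a Uniform(0,1) law for the random Bernoulli parameter: then $P(X_1=1,X_2=1)=E[\theta^2]=1/3 \ne 1/4 = P(X_1=1)P(X_2=1)$, while $P(X_1=1,X_2=0)=P(X_1=0,X_2=1)$.

```python
import random

random.seed(1)
N = 200_000
counts = {"11": 0, "10": 0, "01": 0}
for _ in range(N):
    theta = random.random()              # draw the random parameter: theta ~ U(0,1)
    x1 = random.random() < theta         # then X_1, X_2 iid Bernoulli(theta)
    x2 = random.random() < theta
    if x1 and x2:
        counts["11"] += 1
    elif x1:
        counts["10"] += 1
    elif x2:
        counts["01"] += 1

p11 = counts["11"] / N   # E[theta^2] = 1/3, not (1/2)^2 = 1/4: dependence
p10 = counts["10"] / N   # E[theta(1 - theta)] = 1/6
p01 = counts["01"] / N   # equal to p10, by exchangeability
```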

Alternatively, define the exchangeable $\sigma$-algebra $\mathcal{E}$, consisting of all events unchanged under finite permutations of the variables. Formally, let $\mathcal{E}_n$ be generated by $X_{n+1},X_{n+2},\ldots$ together with all symmetric measurable functions of $(X_1,\ldots,X_n)$, and set $\mathcal{E}=\cap_{n}\mathcal{E}_n$. De Finetti’s Theorem then says that, conditional on $\mathcal{E}$, X is iid.

Proof: I will give a proof of this theorem in the case where X is a sequence of Bernoulli variables (that is, they take values in {0,1}), with joint probability mass function f. It will suffice to show the existence of a distribution function $F$ on $[0,1]$ (the law of a random Bernoulli parameter $\theta$) such that for each n, $f(x_1,\ldots,x_n)=\int_0^1\prod_{i=1}^n \theta^{x_i}(1-\theta)^{1-x_i}\,dF(\theta)$.

Begin with the observation that for $n\leq m$, exchangeability and a combinatorial argument give $P(X_1+\ldots+X_n=s\,|\,X_1+\ldots+X_m=t)=\binom{m}{n}^{-1}\binom{t}{s}\binom{m-t}{n-s}\,1(s\leq t\leq m-(n-s))$. So, using falling factorial notation for brevity: $P(X_1+\ldots+X_n=s)=\binom{n}{s}\sum_{t=s}^{m-(n-s)}\frac{t^{(s)}(m-t)^{(n-s)}}{m^{(n)}}P(X_1+\ldots+X_m=t)$.
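The combinatorial argument here is that, conditional on $X_1+\ldots+X_m=t$, exchangeability makes all $\binom{m}{t}$ placements of the t ones among the m positions equally likely, so the count in the first n positions is hypergeometric. This can be checked by brute-force enumeration (with hypothetical small values m = 8, n = 3):

```python
from itertools import combinations
from math import comb

def cond_prob(m, n, t, s):
    """P(X_1+...+X_n = s | X_1+...+X_m = t) by direct counting:
    all C(m, t) placements of the t ones are equally likely."""
    favourable = sum(1 for ones in combinations(range(m), t)
                     if sum(1 for i in ones if i < n) == s)
    return favourable / comb(m, t)

def closed_form(m, n, t, s):
    """The formula from the text, including the indicator s <= t <= m-(n-s)."""
    if not (s <= t <= m - (n - s)):
        return 0.0
    return comb(t, s) * comb(m - t, n - s) / comb(m, n)

m, n = 8, 3
checks = all(abs(cond_prob(m, n, t, s) - closed_form(m, n, t, s)) < 1e-12
             for t in range(m + 1) for s in range(n + 1))
```

(Note $\binom{m}{n}^{-1}\binom{t}{s}\binom{m-t}{n-s} = \binom{m}{t}^{-1}\binom{n}{s}\binom{m-n}{t-s}$ by the symmetry of the hypergeometric distribution.)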

So now let $F_m(\theta)$ be the distribution function of $\frac{1}{m}(X_1+\ldots+X_m)$. After a little algebraic rearrangement we can express the sum above as $\binom{n}{s}\int_0^1\frac{(m\theta)^{(s)}[m(1-\theta)]^{(n-s)}}{m^{(n)}}\,dF_m(\theta)$, since $F_m$ places mass only at points where $m\theta$ is an integer. For fixed $\theta$, it is clear that $\frac{(m\theta)^{(s)}[m(1-\theta)]^{(n-s)}}{m^{(n)}}\rightarrow \theta^s(1-\theta)^{n-s}$ as $m\to\infty$. But what about convergence of the distribution functions?
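The pointwise convergence of the falling-factorial ratio is easy to see numerically; a small sketch with hypothetical values $\theta = 0.3$, $s = 2$, $n = 5$:

```python
def falling(x, k):
    """Falling factorial x^(k) = x (x-1) ... (x-k+1)."""
    out = 1.0
    for i in range(k):
        out *= x - i
    return out

theta, s, n = 0.3, 2, 5
limit = theta**s * (1 - theta)**(n - s)   # the claimed limit theta^s (1-theta)^(n-s)

for m in (10, 100, 10_000):
    ratio = falling(m * theta, s) * falling(m * (1 - theta), n - s) / falling(m, n)
    print(m, ratio)
```

For large m each factor $m\theta - i$ is $m\theta(1 + O(1/m))$, so the ratio agrees with the limit up to a relative error of order $n^2/m$.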

We want to take $F(\theta)=\lim_m F_m(\theta)$. Since the $F_m$ are distribution functions on the compact interval [0,1], some subsequence converges weakly to a distribution function (a special case of Prohorov’s Theorem). To prove the convergence properly, we need a law of large numbers for exchangeable random variables. This exists, and is a very nice result, but is easiest to prove in generality, using martingales and the exchangeable $\sigma$-algebra introduced above. I think I’ll save this for another post, and take it as known for now.

So we have $P(X_1+\ldots+X_n=s)=\binom{n}{s}\int_0^1 \theta^s(1-\theta)^{n-s}\,dF(\theta)$. Finally, by exchangeability, $f(x_1,\ldots,x_n)=\binom{n}{\sum x_i}^{-1} P(X_1+\ldots+X_n=\sum x_i)$, and so the result follows.