Borel-Cantelli Lemmas

I’m currently lecturing my first course at King’s, which builds measure theory from the ground up to the construction of the Lebesgue integral, along with some more probabilistic topics. In this second week, we have been discussing various matters related to the construction of sigma-algebras. It took me slightly by surprise that this ends up being the natural moment during an undergraduate’s progression through probability to introduce the Borel-Cantelli lemmas. It’s not hard to imagine a pedagogical schedule in which they appear earlier, but equally, one benefits from seeing them with a bit more mathematical experience, even if they are only very loosely related to the construction of the Lebesgue integral!

Anyway, I noticed that I have not written about these results on the blog before. There are a couple of neat applications that lie slightly beyond the remit (and time allowance) of the lecture course, so they are presented at the end.

Context – events holding eventually and infinitely often

Recall that (in the notation of probability spaces) a sigma-algebra \mathcal F provides a set of events A\subseteq \Omega for which it is meaningful and consistent to define a probability measure. This is extremely uncontroversial in the setting where the set of outcomes \Omega is finite, but thornier in the setting where \Omega is uncountable, for example [0,1] or \mathbb{R}. The key axiom for sigma-algebras involves countable unions. If (A_n)_{n\ge 1} is a sequence of events in \mathcal F, we must also have \bigcup_{n\ge 1}A_n\in\mathcal F.

Since the axioms also demand that \mathcal F is closed under taking complements, it only takes a few steps to show that countable intersections are also valid, ie \bigcap_{n\ge 1}A_n\in\mathcal F also. It is uncontroversial to note that

\bigcap_{n\ge 1}A_n \subseteq \bigcup_{n\ge 1}A_n.

The main initial motivation for considering the events \{A_n\text{ infinitely often}\},\{A_n\text{ eventually}\} are that they lie between the intersection and the union in terms of containment.

  • ‘Infinitely often’ is relatively uncontroversial, just meaning ‘for infinitely many values of n’.
  • ‘Eventually’ is less clear on an initial reading. Perhaps ‘eventually always’ would be better? [And closer to the title of this blog!] The meaning is ‘always, except for finitely many n’, which implies that there is some threshold N such that it holds for all n>N.
  • In set notation, infinitely often (or i.o.) means that \bigcup_{k\ge n}A_k must hold for every n, and so the event \{A_n\text{ i.o.}\}=\bigcap_n\bigcup_{k\ge n}A_k.
  • Similarly, eventually means that there must be an n for which \bigcap_{k\ge n}A_k holds, so the event \{A_n\text{ eventually}\}=\bigcup_n\bigcap_{k\ge n}A_k.

We end up with the richer containment sequence

\bigcap_{n\ge 1}A_n\subseteq \{A_n\text{ eventually}\}\subseteq \{A_n\text{ i.o.}\}\subseteq \bigcup_{n\ge 1}A_n.

These individual containment relations may not be obvious a priori in this form. Probably the most clear is the central containment, that if an A_n holds for all except finitely many n, then it holds for infinitely many n. The right-most says that if it holds infinitely often, then it holds at least once. For the left-most containment, note that \bigcap_{n\ge 1}A_n is literally the same as the n=1 case of \bigcap_{k\ge n}A_k, so taking a union over other values of n only increases the set.

It’s also worth thinking about the complements. Note that \left(\bigcap_{n\ge 1}A_n\right)^c = \bigcup_{n\ge 1} A_n^c and \left(\bigcup_{n\ge 1}A_n\right)^c=\bigcap_{n\ge 1}A_n^c. Versions of these apply to eventually/i.o. also. In particular

\{A_n\text{ i.o.}\}^c = \{A_n\text{ for only finitely many }n\} = \{A_n^c\text{ eventually}\}.

and \{A_n\text{ eventually}\}^c = \{A_n^c\text{ for infinitely many }n\}.

So in any calculation or application, there is flexibility to study (A_n^c) is this is more convenient. As we will see, it makes sense to focus on whichever of (A_n),(A_n^c) has vanishing probabilities as n\to\infty.

Comment: It is standard to write \limsup_{n}A_n= \bigcap_{n\ge 1}\bigcup_{k\ge n}A_k, and \liminf_n A_n=\bigcup_{n\ge 1}\bigcap_{k\ge n}A_k. I am happy to admit I get these confused with non-vanishing probability, and prefer to stick to eventually/i.o. or the full union/intersection statements for safety!

Borel-Cantelli Lemmas

The Borel-Cantelli lemmas establish some conditions under which the event \{A_n\text{ i.o.}\} almost surely holds, or almost surely doesn’t hold. We’ll state them now:

BC1: If \sum_{n\ge 1}\mathbb{P}(A_n)<\infty, then \mathbb{P}(A_n\text{ i.o.})=0.

BC2: If \sum_{n\ge 1}\mathbb{P}(A_n)=\infty and the events (A_n) are independent, then \mathbb{P}(A_n\text{ i.o.})=1.

Proof of BC1: Since \{A_n\text{ i.o.}\}\subseteq \bigcup_{k\ge n}A_k for every $n$, we have

\mathbb{P}(A_n\text{ i.o.}) \le \sum_{k\ge n}\mathbb{P}(A_k)\;\stackrel{n\to\infty}\longrightarrow\; 0,

since \sum \mathbb{P}(A_n)<\infty.

Comment: we could also introduce the random variable N which counts how many events A_n occur, ie N=\sum_{n\ge 1}\mathbf{1}_{A_n}. Then \{A_n\text{ i.o.}\}=\{N=\infty\}, and the given condition is equivalent to \mathbb{E}[N]<\infty, which certainly implies \mathbb{P}(N=\infty)=0. Indeed, it’s reasonable to think of BC1 as a first-moment result, with BC2 as (sort of) a second-moment result, and this is reflected in the comparative complexity of the two proofs.

Proof of BC2: As discussed, it is equivalent to show that \mathbb{P}(A_n^c\text{ eventually})=0. To do this, we study the intersections of the complements, whose probabilities are tractable using independence:

\mathbb{P}\left( \bigcup_{k\ge n}A_k^c\right) = \prod_{k\ge n}\mathbb{P}(A_k^c)=\prod_{k\ge n}(1-\mathbb{P}(A_k))

\le \prod_{k\ge n}\exp(-\mathbb{P}(A_k))=\exp(-\sum_{k\ge n}\mathbb{P}(A_k)).

We have assumed that \sum_{k\ge n}\mathbb{P}(A_k)=\infty for each n, and so we obtain

\mathbb{P}\left(\bigcup_{k\ge n}A_k^c\right)=0,\quad\forall n\ge 1.

But since (\bigcap_{k\ge n}A_k^c)_{n\ge 1} is an increasing sequence of events, so taking a union over n is particularly natural. Precisely, by continuity of measure, we have

\mathbb{P}\left(\bigcup_{n\ge 1}\bigcap_{k\ge n}A_k^c\right) = \lim_{n\to\infty}\mathbb{P}\left( \bigcap_{k\ge n}A_k^c\right)=0.

Comment: the independence condition is certainly necessary, as otherwise one can have situations where A_1\supseteq A_2\supseteq\cdots. For example, with U\sim \mathrm{Unif}[0,1], consider A_n:=\{U\le \frac1n\}, for which \mathbb{P}(A_n)=\frac1n, whose sum diverges, while clearly \mathbb{P}(A_n\text{ i.o.})=\mathbb{P}(U=0)=0.

Classic examples: the famous ‘infinite monkey theorem’, relatively well-known outside the realm of theoretical probability, asserts that a monkey typing randomly at a keyboard for long enough will eventually type the complete works of William Shakespeare. Other standard examples involve independent (A_n) with probabilities decaying at various speeds. The latter do have some overlap with the slightly less standard examples presented below.

Example – generating random permutations

Consider a sequence of random permutations (\sigma_n)_{n\ge 1} where \sigma_n is a permutation of [n]. We construct these iteratively as follows: \sigma_1 is the identity (ie the only permutation of one element!); then for each n\ge 2, define \sigma_n from \sigma_{n-1} by inserting element n at a uniformly-chosen position in \sigma_{n-1}.

We ask the question whether this gives a well-defined limiting permutation \sigma_\infty on \mathbb{N}. A necessary condition for this is that the first element \sigma_n(1) is eventually constant. But in fact the events \{\sigma_n(1)=n\} are independent, and each occurs with probability 1/n, and so

\mathbb{P}(\text{first element changes i.o.})\ge \mathbb{P}(\sigma_n(1)=n\text{ for infinitely many }n)=1,

by BC2.

On reflection, each \sigma_n has the uniform distribution on \Sigma_n, permutations of [n]. So our result is consistent with the idea that one cannot construct a uniform random permutation on an infinite set.

Now, alternatively, suppose the new element is added at place \max(n-X_n,1) where (X_2,X_3,\ldots) are IID \mathrm{Geometric}(q) random variables. (The truncation is just to prevent pathologies like trying to add the new entry in ‘place -6’.) Then


and since \sum_n (1-q)^{n-2}<\infty, BC1 shows that the first element (\sigma_1(1),\sigma_2(1),\sigma_3(1),\ldots) changes only finitely often. A similar argument applies for the second element, and the third element, and so on. Consequently, almost surely we have a well-defined pointwise limit permutation \sigma_\infty. This is known as the Mallows Permutation on the integers, and is one of the more popular models for a random permutation on an infinite set, with lots of interesting properties. I supervised an undergraduate research project on this model over Summer 2022, and will write more when time allows.

Example – well-separated random integers

The final example is a toy version of a construction I’ve been working on recently regarding subsets of the vertices of a graph which ‘trap’ random walk on that graph.

Suppose we are trying to colour some positive integers randomly, so that the coloured integers are infinite, but also asymptotically sparse. We also do not want pairs of coloured integers to be close together. Let’s say that two integers m,n are close if |m-n|\le \sqrt{\max(m,n)}.

Now, suppose we colour each integer n green independently with probability \frac1n. Then, whenever we have two integers m,n which are close, and coloured green, we re-colour them both red.

We would like to prove that infinitely many integers are coloured green. Let C_n:={n\text{ coloured with some colour}}. Using BC2, we know that almost surely, infinitely many are coloured with some colour, and it seems implausible that many would be close. However, the events G_n={n\text{ coloured green}} are not independent, so one cannot apply BC2 directly to these. However, we can study instead R_n={n\text{ coloured red}}. We can bound these probabilities as

\mathbb{P}(R_n)\le\frac{1}{n}\frac{2\sqrt{n}}{n-\sqrt{n}} = \frac{2+o(1)}{n^{3/2}},

and apply BC1 to show \mathbb{P}(R_n\text{ i.o.})=0. Then, since \{G_n\text{ i.o.}\}\supseteq \{C_n\text{ i.o.}\}\setminus{R_n\text{ i.o.}}, we obtain \mathbb{P}(G_n\text{ i.o.})=1, as required.