Weak Convergence and the Portmanteau Lemma

Much of the theory of Large Deviations splits into separate treatment of open and closed sets in the rescaled domains. Typically we seek upper bounds for the rate function on closed sets, and lower bounds for the rate function on open sets. When things are going well, these turn out to be the same, and so we can get on with some applications and pretty much forget about the topology underlying the construction. Many sources make a comment along the lines of “this is natural, by analogy with weak convergence”.

Weak convergence is a topic I learned about in Part III Advanced Probability. I fear it may have been one of those things that leaked out of my brain shortly after the end of the exam season… Anyway, this feels like a good time to write down what it is all about a bit more clearly. (I’ve slightly cheated, and chosen definitions and bits of the portmanteau lemma which look maximally similar to the Large Deviation material, which I’m planning on writing a few posts about over the next week.)

The motivation is that we want to extend the notion of convergence in distribution of random variables to general measures. There are several ways to define convergence in distribution, so accordingly there are several ways to generalise it. Much of what follows will be showing that these are equivalent.

We work in a metric space (X,d), with a sequence $(\mu_n)$ of (Borel) probability measures and a further (Borel) probability measure $\mu$. We say that $(\mu_n)$ converges weakly to $\mu$, written $\mu_n\Rightarrow\mu$, if:

$\mu_n(f)\rightarrow\mu(f), \quad\forall f\in\mathcal{C}_b(X).$

So the test functions required for this result are the class of bounded, continuous functions on X. We shall see presently that it suffices to check a smaller class, e.g. bounded Lipschitz functions. Indeed the key result, which is often called the portmanteau lemma, gives a set of alternative conditions for weak convergence. We will prove the equivalences cyclically.
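As a quick numerical illustration (my own, not from the original post): the strong law of large numbers says that if $X_1,X_2,\dots$ are i.i.d. uniform on [0,1] and $\mu_n=\frac{1}{n}\sum_{i\leq n}\delta_{X_i}$ is the empirical measure, then $\mu_n(f)\to\mu(f)$ almost surely for each bounded continuous f, where $\mu$ is Lebesgue measure on [0,1]. A minimal Python sketch of checking the definition against one test function:

```python
import math
import random

def mu_n(f, samples):
    """Integral of f against the empirical measure (1/n) * sum of point masses."""
    return sum(f(x) for x in samples) / len(samples)

random.seed(0)
f = math.cos  # bounded and continuous on [0, 1]

# mu_n = empirical measure of n uniform samples; mu = Lebesgue measure on [0, 1].
# Weak convergence (here via the strong law of large numbers): mu_n(f) -> mu(f).
samples = [random.random() for _ in range(100_000)]
approx = mu_n(f, samples)
exact = math.sin(1.0)  # integral of cos over [0, 1]
print(abs(approx - exact))  # small for large n
```

Of course this checks only one f; the definition demands all of $\mathcal{C}_b(X)$, which is why the reductions in the portmanteau lemma are useful.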

Portmanteau Lemma

The following are equivalent.

a) $\mu_n\Rightarrow \mu$.

b) $\mu_n(f)\rightarrow\mu(f)$ for all bounded Lipschitz functions f.

c) $\limsup_n \mu_n(F)\leq \mu(F)$ for all closed sets F. Note that we demanded that all the measures be Borel, so there is no danger of $\mu(F)$ not being defined.

d) $\liminf_n \mu_n(G)\geq \mu(G)$ for all open sets G.

e) $\lim_n \mu_n(A)=\mu(A)$ whenever $\mu(\partial A)=0$. Such an A is called a continuity set.

Remarks

a) All of these statements are well-defined if X is a general topological space. I can’t think of any particular examples where we want to use measures on a non-metrizable space (e.g. C[0,1] with the topology of pointwise convergence), but there seem to be a few references (such as the one cited here) implying that the results continue to hold in this case provided X is locally compact Hausdorff. This seems like an interesting thing to think about, but perhaps not right now.

b1) This doesn’t strike me as hugely surprising. Indeed, any bounded continuous function is the pointwise limit of an increasing sequence of bounded Lipschitz functions, for example $f_k(x)=\inf_y\left[f(y)+k\,d(x,y)\right]$, so testing against bounded Lipschitz functions ought to capture everything that testing against all of $\mathcal{C}_b(X)$ does.

b2) In fact this condition could be replaced by several alternatives. In the proof that follows, we only use one type of function, so any subset of $\mathcal{C}_b(X)$ that contains the ones we use will be sufficient to determine weak convergence.

c) and d) Why should the inequalities be this way round? The canonical example to have in mind is a sequence of point masses $\delta_{x_n}$ where $x_n\rightarrow x$ with $x_n\neq x$. Then for the open set $G=X\backslash\{x\}$ we have $\mu_n(G)=1$ for all n, but $\mu(G)=0$. Informally, we might say that in the limit, some positive mass can ‘leak out’ of an open set into its boundary.

e) is then not surprising, as the condition of being a continuity set precisely prohibits the above situation from happening.
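The point-mass example is easy to play with concretely. A small Python sketch (my own illustration, representing a measure crudely as a function of indicator sets), taking $\mu_n=\delta_{1/n}$ and $\mu=\delta_0$, showing that the inequalities in c) and d) can be strict, and that convergence in e) genuinely fails for a non-continuity set:

```python
def delta(x):
    """Point mass at x, evaluated on sets given as indicator functions."""
    return lambda indicator: 1.0 if indicator(x) else 0.0

# mu_n = point mass at 1/n, mu = point mass at 0; mu_n converges weakly to mu.
mu = delta(0.0)
mu_n = [delta(1.0 / n) for n in range(1, 101)]

open_G = lambda x: 0 < x < 1      # open set G = (0, 1)
closed_F = lambda x: x == 0.0     # closed set F = {0}
bad_A = lambda x: 0 < x <= 1      # A = (0, 1]: boundary {0, 1} carries mu-mass 1

print(mu_n[-1](open_G), mu(open_G))    # 1.0 0.0 -> liminf >= mu(G) is strict
print(mu_n[-1](closed_F), mu(closed_F))  # 0.0 1.0 -> limsup <= mu(F) is strict
print(mu_n[-1](bad_A), mu(bad_A))      # 1.0 0.0 -> e) fails: A is not a continuity set
```

Here $\mu_n(A)=1$ for every n while $\mu(A)=0$, which is exactly the ‘leaking into the boundary’ picture above.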

Proof

a) to b) is genuinely trivial. For b) to c), given $\epsilon>0$, find an open set $F'\supseteq F$ such that $\mu(F')-\mu(F)\leq\epsilon$; for example $F'=\{x:d(x,F)<\delta\}$ for small enough $\delta$, since these sets decrease to F as $\delta\downarrow 0$ (F is closed), so continuity of measure applies. Then take a Lipschitz function f which is 1 on F and 0 outside F'. We obtain

$\limsup_n \mu_n(F)\leq \limsup_n \mu_n(f)=\mu(f)\leq \mu(F').$

But $\epsilon$ was arbitrary, so the result follows on letting it tend to zero. c) and d) are equivalent after taking complements $G=F^c$, since $\mu_n(G)=1-\mu_n(F)$. If we assume c) and d) and apply them to $\bar{A}$ and $A^\circ$ respectively, then e) follows, since $\mu(A^\circ)=\mu(\bar{A})=\mu(A)$ whenever $\mu(\partial A)=0$.
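The Lipschitz function in this step can be written down explicitly as $f(x)=\max(0,1-d(x,F)/\delta)$, which is 1 on F, vanishes outside $F'=\{x:d(x,F)<\delta\}$, and is Lipschitz with constant $1/\delta$. A small Python sketch for the concrete case $X=\mathbb{R}$, $F=[a,b]$ (my notation, not the post's):

```python
def dist_to_interval(x, a, b):
    """d(x, F) for the closed set F = [a, b] in the real line."""
    return max(a - x, 0.0, x - b)

def bump(x, a, b, delta):
    """Lipschitz function with constant 1/delta: equals 1 on F = [a, b],
    vanishes outside the open delta-neighbourhood F' = (a - delta, b + delta)."""
    return max(0.0, 1.0 - dist_to_interval(x, a, b) / delta)

# The indicator sandwich 1_F <= bump <= 1_{F'} drives the proof of b) => c):
print(bump(0.5, 0.0, 1.0, 0.1))   # 1.0 on F
print(bump(1.05, 0.0, 1.0, 0.1))  # strictly between 0 and 1 on F' \ F
print(bump(2.0, 0.0, 1.0, 0.1))   # 0.0 outside F'
```

The sandwich $\mathbf{1}_F\leq f\leq\mathbf{1}_{F'}$ is exactly what converts convergence of $\mu_n(f)$ into the one-sided bound on $\mu_n(F)$.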

e) to a) is a little trickier. Given a bounded continuous function f, assume WLOG that its range is contained in [0,1]. Only countably many of the events $\{f=a\}$ can have positive mass under any of $\mu, (\mu_n)$. So given $M>0$, we can choose a sequence

$-1=a_0<a_1<\dots<a_N=2$ such that $|a_{k+1}-a_k|<\frac{1}{M}$,

and $\mu(f=a_k)=\mu_n(f=a_k)=0$ for all k,n. Now it is clear what to do. Since f is continuous, the boundary of $\{f\in(a_k,a_{k+1}]\}$ lies inside $\{f=a_k\}\cup\{f=a_{k+1}\}$, so each such set is a continuity set; we can apply e), then patch everything together. There are slightly too many Ms and $\epsilon$s to do this sensibly in WordPress, so I will leave it at that.
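For completeness, here is one way the patching can be written out (my notation; a sketch of the standard argument rather than the post's own):

```latex
% Approximate f by a simple function built on the grid (a_k):
g \;:=\; \sum_{k} a_k \,\mathbf{1}\{f \in (a_k, a_{k+1}]\},
\qquad 0 \le f - g < \tfrac{1}{M}.
% Each level set is a continuity set, so e) gives \mu_n(g) \to \mu(g). Hence
|\mu_n(f) - \mu(f)|
\;\le\; |\mu_n(f) - \mu_n(g)| + |\mu_n(g) - \mu(g)| + |\mu(g) - \mu(f)|
\;\le\; \tfrac{2}{M} + |\mu_n(g) - \mu(g)|,
% so \limsup_n |\mu_n(f) - \mu(f)| \le 2/M, and M was arbitrary.
```

The point is that $\mu_n(g)$ is a finite sum of continuity-set masses, each of which converges by e), while g approximates f uniformly to within $1/M$.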

I will conclude by writing down a combination of c) and d) that will look very familiar soon.

$\mu(A^\circ)\leq \liminf_n \mu_n(A)\leq \limsup_n\mu_n(A)\leq \mu(\bar{A}).$

References

Apart from the Part III Advanced Probability course, this article was prompted by various books on Large Deviations, including those by Frank den Hollander and Ellis / Dupuis. I’ve developed the proof above from the hints given in the appendix of these very comprehensible notes by Rassoul-Agha and Seppalainen.