Levy’s Convergence Theorem

We consider some of the theory of Weak Convergence from the Part III course Advanced Probability. It has previously been seen, or at least discussed, that characteristic functions uniquely determine the laws of random variables. We will prove Levy’s theorem, which equates weak convergence of random variables and pointwise convergence of characteristic functions.

We have to start with the most important theorem about weak convergence, which is essentially a version of Bolzano-Weierstrass for measures on a metric space M. We say that a sequence of measures (\mu_n) is tight if for any \epsilon>0, there exists a compact set K_\epsilon\subset M such that \sup_n\mu_n(M\backslash K_\epsilon)\leq \epsilon. Informally, each measure is concentrated on a compact set, and this property is uniform across all the measures. We can now state and prove a result of Prohorov:
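To make the definition concrete, here is a small numerical sketch on M=\mathbb{R} with compact sets [-K,K]. The two Gaussian families, the truncation to n\leq 200 and the helper names are illustrative assumptions, not part of the course material. The family N(0,1+1/n) is tight, while N(n,1) drifts off to infinity and is not.

```python
import math

def normal_cdf(x, mean, std):
    """Distribution function of N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def mass_outside(mean, std, K):
    """Mass that N(mean, std^2) puts outside the compact set [-K, K]."""
    return 1.0 - (normal_cdf(K, mean, std) - normal_cdf(-K, mean, std))

def worst_tail(family, K):
    """Max over a finite sample of the family of the mass outside [-K, K].

    This only approximates the sup over all n in the definition of tightness.
    """
    return max(mass_outside(mean, std, K) for mean, std in family)

ns = range(1, 201)  # assumed truncation of the sequence
tight_family = [(0.0, math.sqrt(1.0 + 1.0 / n)) for n in ns]  # N(0, 1 + 1/n)
drifting_family = [(float(n), 1.0) for n in ns]               # N(n, 1)

for K in (2.0, 5.0, 10.0):
    print(K, worst_tail(tight_family, K), worst_tail(drifting_family, K))
```

For the first family the worst-case tail shrinks as K grows, whereas for the second it stays close to 1 for every K shown, because some measure in the family always sits far outside [-K,K].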

Theorem (Prohorov): Let (\mu_n) be a tight sequence of probability measures. Then there exists a subsequence (n_k) and a probability measure \mu such that \mu_{n_k}\Rightarrow \mu.

Summary of proof in the case M=\mathbb{R}: By countability of the rationals, we can use Bolzano-Weierstrass and a standard diagonal argument to find a subsequence such that the distribution functions

F_{n_k}(x)\rightarrow F(x)\quad\forall x\in\mathbb{Q}

Then extend F to the whole real line by taking a downward rational limit, which ensures that F is cadlag. Convergence of the distribution functions then holds at all points of continuity of F by monotonicity and approximating by rationals from above. It only remains to check that F(-\infty)=0,F(\infty)=1, which follows from tightness. Specifically, monotonicity guarantees that F has at most countably many points of discontinuity, so we can choose some large N such that both N and -N are points of continuity, and exploit that, by tightness,

\inf_n \mu_n([-N,N])>1-\epsilon

We can define the limit (Borel) measure from the distribution function by taking the obvious definition F(b)-F(a) on intervals, then lifting to the Borel sigma-algebra by Caratheodory’s extension theorem.

Theorem (Levy): Let X_n,X be random variables in \mathbb{R}^d. Then:

L(X_n)\rightarrow L(X)\quad\iff\quad \phi_{X_n}(z)\rightarrow \phi_X(z)\quad \forall z\in\mathbb{R}^d

The direction \Rightarrow is easy: x\mapsto e^{i\langle z,x\rangle} is continuous and bounded.

In the other direction, we can in fact show a stronger constructive result. Precisely, if \exists \psi:\mathbb{R}^d\rightarrow \mathbb{C} continuous at 0 with \psi(0)=1 (*) and such that \phi_{X_n}(z)\rightarrow \psi(z)\quad \forall z\in\mathbb{R}^d, then \psi=\phi_X is the characteristic function of some random variable X, and L(X_n)\rightarrow L(X). Note that the conditions (*) are the minimal such that \psi could be a characteristic function.
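As an illustration of the statement, the sketch below compares characteristic functions numerically in d=1. The choice of a standardised Binomial(n,p) with p=0.3, the test points z and the function names are assumptions made for the example. The first loop shows pointwise convergence to e^{-z^2/2}, which is continuous at 0, so the theorem gives convergence in law to N(0,1). The second loop shows what goes wrong without (*): for N(0,n) the pointwise limit is the indicator of \{0\}, which is not continuous at 0, and indeed there is no weak limit.

```python
import cmath
import math

def phi_standardised_binomial(z, n, p=0.3):
    """Characteristic function of (S_n - n p) / sqrt(n p (1 - p)), S_n ~ Binomial(n, p)."""
    s = math.sqrt(n * p * (1.0 - p))
    # A centred Bernoulli step equals (1 - p) with probability p, and -p otherwise.
    step = p * cmath.exp(1j * z * (1.0 - p) / s) + (1.0 - p) * cmath.exp(-1j * z * p / s)
    return step ** n  # independence turns the sum into a product

def phi_gaussian(z, variance=1.0):
    """Characteristic function of N(0, variance)."""
    return math.exp(-0.5 * variance * z * z)

def max_gap(n, zs):
    """Largest distance to the N(0, 1) characteristic function over the points zs."""
    return max(abs(phi_standardised_binomial(z, n) - phi_gaussian(z)) for z in zs)

zs = [-3.0, -1.0, 0.0, 0.5, 2.0]  # assumed test points
for n in (10, 100, 10_000):
    print(n, max_gap(n, zs))

# N(0, n): the limit is 1 at z = 0 and 0 elsewhere, so condition (*) fails.
for n in (1, 100, 10_000):
    print(n, [phi_gaussian(z, variance=n) for z in (0.0, 0.1, 1.0)])
```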

We now proceed with the proof. We apply a lemma that is basically a calculation that we don’t repeat here.

\mathbb{P}(||X_n||_\infty>K)\stackrel{\text{Lemma}}{<}C_dK^d\int_{[-\frac{1}{K},\frac{1}{K}]^d}(1-\Re \phi_{X_n}(u))du\stackrel{\text{DOM}}{\rightarrow}C_dK^d\int_{[-\frac{1}{K},\frac{1}{K}]^d} (1-\Re \psi(u))du

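where the convergence is as n\rightarrow\infty for fixed K, and we apply dominated convergence using that the integrand is bounded by 2. The cube [-\frac{1}{K},\frac{1}{K}]^d has volume (2/K)^d, so the limit is at most 2^dC_d\sup_{||u||_\infty\leq 1/K}(1-\Re\psi(u)). Since \psi is continuous at 0 with \psi(0)=1, this is <\epsilon for large enough K. Fixing such a K, we get \mathbb{P}(||X_n||_\infty>K)<\epsilon for all n\geq n_0, say. The finitely many remaining X_n are individually tight, so by increasing K if necessary the bound becomes uniform in n, and so the random variables are tight. Prohorov then gives a subsequence converging weakly to some random variable X, and by the direction already proved, \phi_X=\psi.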
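The tail bound can be checked numerically in a simple case. The sketch below takes d=1 and X\sim N(0,1), and uses the constant C_1=1/(2(1-\sin 1)), which is an assumption: it is what one gets from the pointwise inequality 1_{|x|\geq K}\leq (1-\sin 1)^{-1}(1-\frac{\sin(x/K)}{x/K}), and the lemma as used in the course may state a different constant. The integral is approximated by a midpoint rule, and the function names are illustrative.

```python
import math

# Assumed constant for d = 1; not taken from the post.
C1 = 1.0 / (2.0 * (1.0 - math.sin(1.0)))

def tail_bound(phi_real, K, steps=2000):
    """C1 * K * integral over [-1/K, 1/K] of (1 - Re phi(u)) du, by the midpoint rule."""
    h = (2.0 / K) / steps
    total = 0.0
    for j in range(steps):
        u = -1.0 / K + (j + 0.5) * h
        total += 1.0 - phi_real(u)
    return C1 * K * total * h

def gaussian_phi(u):
    """Characteristic function of N(0, 1), which is real-valued."""
    return math.exp(-0.5 * u * u)

def gaussian_tail(K):
    """P(|X| > K) for X ~ N(0, 1)."""
    return math.erfc(K / math.sqrt(2.0))

for K in (1.0, 2.0, 5.0, 20.0):
    print(K, gaussian_tail(K), tail_bound(gaussian_phi, K))
```

In this example the right-hand side decays like a constant times K^{-2}, much more slowly than the true Gaussian tail, but all that the tightness argument needs is that it tends to 0 as K grows.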

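Suppose the whole sequence doesn’t converge to X. Then there is a bounded continuous function f, some \delta>0 and a subsequence along which |\mathbb{E}f(X_n)-\mathbb{E}f(X)|>\delta. This subsequence is still tight, so by Prohorov it has a further subsequence which converges to Y say, where the law of Y differs from the law of X. By the direction of Levy already proved there is convergence of characteristic functions along this subsequence, so \phi_Y=\psi=\phi_X. But characteristic functions determine law, so X and Y have the same law, which is a contradiction.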


2 thoughts on “Levy’s Convergence Theorem”

  2. Hello,

    I’m reading your proof, thank you for posting it. It is quite pretty isn’t it?
    However, there is a point that still bothers me, perhaps you could help me.

    Using the fact that \psi is continuous, we deduce the existence of a \delta such that if \norm(x) < \delta, then (1-\Re\psi(x) ) < \epsilon / 2C_d. I guessed you then take K great enough to have 1/K K) still gets arbitrarily small when K gets bigger.

    Thank you for your help,

