Convergence of Random Variables

The relationship between the different modes of convergence of random variables is one of the more important topics in any introduction to probability theory. For some reason, many of the textbooks leave the proofs as exercises, so it seems worthwhile to present a sketched but comprehensive summary.

Almost sure convergence: X_n\rightarrow X\;\mathbb{P}-a.s. if \mathbb{P}(X_n\rightarrow X)=1.

Convergence in Probability: X_n\rightarrow X in \mathbb{P}-probability if \mathbb{P}(|X_n-X|>\epsilon)\rightarrow 0 for any \epsilon>0.

Convergence in Distribution: X_n\stackrel{d}{\rightarrow} X if \mathbb{E}f(X_n)\rightarrow \mathbb{E}f(X) for every bounded, continuous function f. Note that this definition is valid for RVs defined on any metric space. When they are real-valued, this is equivalent to the condition that F_{X_n}(x)\rightarrow F_X(x) at every point x\in \mathbb{R} where F_X is continuous. It is further equivalent (by Lévy's convergence theorem) to what is formally a special case, convergence of characteristic functions: \phi_{X_n}(u)\rightarrow \phi_X(u) for all u\in\mathbb{R}.
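As a quick illustration (a numerical sketch of my own, not part of the original argument), take X\sim N(0,1) and X_n=X+\frac{1}{n}, so that X_n\stackrel{d}{\rightarrow}X. The snippet below, using NumPy with the bounded continuous test function f=\tanh, estimates \mathbb{E}f(X_n) and \mathbb{E}f(X) by Monte Carlo and also checks the CDF condition at a continuity point; the example, test function, sample size and seed are all arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.tanh                            # a bounded, continuous test function
samples = rng.standard_normal(10**6)   # Monte Carlo sample of X ~ N(0,1)

Ef_X = f(samples).mean()
for n in [1, 10, 100, 1000]:
    Ef_Xn = f(samples + 1.0 / n).mean()   # X_n = X + 1/n
    print(n, abs(Ef_Xn - Ef_X))           # gap should shrink towards 0

# F_{X_n}(x) -> F_X(x) at a continuity point of F_X, say x = 0.5:
x = 0.5
print([(samples + 1.0 / n <= x).mean() for n in [1, 10, 100, 1000]], (samples <= x).mean())
```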

Note: In contrast to the other conditions for convergence, convergence in distribution (also known as weak convergence) doesn’t require the RVs to be defined on the same probability space. This thought can be useful when constructing counterexamples.

L^p-convergence: X_n\rightarrow X in L^p if ||X_n-X||_p\rightarrow 0; that is, \mathbb{E}|X_n-X|^p\rightarrow 0.

Uniform Integrability: Informally, a set of RVs is UI if the integrals over small sets tend to zero uniformly. Formally: (X_n) is UI if \sup\{\mathbb{E}[|X_n|1(A)] : n\in\mathbb{N},\, A\in\mathcal{F},\, \mathbb{P}(A)\leq \delta\}\rightarrow 0 as \delta\rightarrow 0.

Note: In particular, a single integrable RV is UI, as is any finite collection of integrable RVs, or any collection dominated by a fixed integrable RV. If X\sim U[0,1] and X_n=n1(X\leq \frac{1}{n}), then the collection (X_n) is not UI.
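To see the failure of UI concretely, here is a small Monte Carlo sketch of my own (sample size and seed arbitrary): with A_n=\{X\leq\frac{1}{n}\} we have \mathbb{P}(A_n)=\frac{1}{n}\rightarrow 0 while \mathbb{E}[X_n1(A_n)]=1 for every n, so the supremum in the definition does not tend to 0.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=10**7)      # X ~ U[0,1]

for n in [10, 100, 1000, 10000]:
    A = U <= 1.0 / n             # the event A_n = {X <= 1/n}, with P(A_n) = 1/n
    X_n = n * A                  # X_n = n * 1(X <= 1/n)
    print(n, A.mean(), (X_n * A).mean())
    # P(A_n) shrinks to 0, but E[X_n 1(A_n)] stays near 1: the family is not UI
```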

THEOREM 1: Almost-sure convergence implies convergence in probability.

THEOREM 2: Convergence in probability implies that there exists a subsequence along which the convergence is almost sure.

THEOREM 3:  Convergence in probability implies convergence in distribution.

THEOREM 4: Convergence in L^1 implies convergence in probability.

THEOREM 5: X_n\rightarrow X in L^p if and only if X_n\rightarrow X in probability and (|X_n|^p) is UI.

Remark 1: The converse of Theorem 1 is false. This isn't hugely surprising if you allow the |X_n-X| to be large with probabilities which decrease quite slowly. The classic counterexample is independent X_n=1 with probability \frac{1}{n}, and 0 otherwise. This clearly converges to the constant 0 in probability, but since the events are independent and \sum\frac{1}{n} diverges, the second Borel–Cantelli lemma shows that almost surely infinitely many of the RVs take the value 1. So there is no almost sure convergence.
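A single simulated path (my sketch; one path proves nothing, but it matches the Borel–Cantelli prediction) shows the effect: \mathbb{P}(X_n=1)\rightarrow 0, yet 1s keep appearing, with the count up to N growing roughly like \log N.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6
n = np.arange(1, N + 1)
X = rng.uniform(size=N) < 1.0 / n     # independent X_n = 1 with probability 1/n

for m in [10**2, 10**3, 10**4, 10**5, 10**6]:
    # number of indices n <= m with X_n = 1, compared with log m
    print(m, int(X[:m].sum()), round(np.log(m), 1))
```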

Proof 1: For any \epsilon>0,

\mathbb{P}(|X_n-X|\leq\epsilon)\geq \mathbb{P}\left(\bigcap_{m\geq n}\{|X_m-X|\leq\epsilon\}\right)\uparrow\mathbb{P}(|X_m-X|\leq\epsilon\text{ eventually})\geq \mathbb{P}(X_n\rightarrow X)=1,

so \mathbb{P}(|X_n-X|>\epsilon)\rightarrow 0.

Proof 2: There exists an increasing sequence (n_k) such that \mathbb{P}(|X_{n_k}-X|\geq\frac{1}{k})<\frac{1}{k^2}. Since \sum_k\frac{1}{k^2}<\infty, the first Borel–Cantelli lemma gives that almost surely |X_{n_k}-X|<\frac{1}{k} for all but finitely many k, so X_{n_k}\rightarrow X almost surely.
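For the Remark 1 sequence, one valid choice in this construction (my choice, for illustration) is n_k=k^2, since \mathbb{P}(X_{k^2}=1)=\frac{1}{k^2} is summable. The sketch below simulates that subsequence directly; typically only a few small k show a 1, consistent with the first Borel–Cantelli lemma.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 1000
k = np.arange(1, K + 1)
# Subsequence n_k = k^2 of the Remark 1 sequence: P(X_{n_k} = 1) = 1/k^2, which is summable
X_sub = rng.uniform(size=K) < 1.0 / k**2
print(np.flatnonzero(X_sub) + 1)   # the (typically few, small) values of k with X_{n_k} = 1
```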

Remark 3: The converse is false. E.g., take X, X_1,X_2,\ldots i.i.d. with a non-degenerate distribution. Then trivially X_n\stackrel{d}{\rightarrow} X, but X_n-X has the same non-degenerate distribution for every n, so \mathbb{P}(|X_n-X|>\epsilon) does not tend to 0.
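Concretely (my sketch, taking X and the X_n i.i.d. standard normal): X_n-X\sim N(0,2) for every n, so \mathbb{P}(|X_n-X|>1) is a fixed constant \approx 0.48 and never tends to 0, even though convergence in distribution holds trivially.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10**6
X = rng.standard_normal(m)           # the limit candidate X
eps = 1.0

for n in [1, 10, 100]:
    X_n = rng.standard_normal(m)     # an independent copy: same distribution for every n
    print(n, (np.abs(X_n - X) > eps).mean())
    # stays near 2*(1 - Phi(1/sqrt(2))) ~ 0.48: no convergence in probability
```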

Proof 3: By bounded convergence, almost sure convergence implies convergence in distribution: f is bounded and continuous, so f(X_n)\rightarrow f(X) almost surely and hence \mathbb{E}f(X_n)\rightarrow\mathbb{E}f(X). Now suppose X_n\rightarrow X in probability but not in distribution. Then for some bounded continuous f and some \delta>0 there is a subsequence along which |\mathbb{E}f(X_n)-\mathbb{E}f(X)|\geq\delta. By Theorem 2, this subsequence has a further subsequence along which convergence is almost sure, and along that the first observation forces \mathbb{E}f(X_n)\rightarrow\mathbb{E}f(X), a contradiction.

Remark 4: Convergence in L^p for p>1 implies convergence in L^1 (by Jensen's inequality, \|X_n-X\|_1\leq\|X_n-X\|_p), so proving the case p=1 extends the result automatically to all p\geq 1.

Proof 4: By Markov's inequality, \mathbb{P}(|X_n-X|>\epsilon)\leq \frac{\mathbb{E}|X_n-X|}{\epsilon}\rightarrow 0.

Remark 5: The example given in the definition of UI is a counterexample to the converse of Theorem 4: there X_n\rightarrow 0 in probability (indeed almost surely), but \mathbb{E}|X_n|=1 for every n, so there is no convergence in L^1. Essentially, uniform integrability is precisely the extra condition needed to make the converse work, which is the content of Theorem 5.

Proof 5: WLOG assume p=1. (\Rightarrow) Convergence in probability follows as in Proof 4. For UI, write \mathbb{E}[|X_n|1(A)]\leq \mathbb{E}|X_n-X|+\mathbb{E}[|X|1(A)], then exploit the fact that the single integrable RV X is UI (and that any finite collection of integrable RVs is UI, to deal with small n). (\Leftarrow) Write \mathbb{E}|X_n-X|\leq \mathbb{E}[|X_n|1(|X_n|\geq N)]+\mathbb{E}|X_n^N-X^N|+\mathbb{E}[|X|1(|X|\geq N)], where Y^N denotes the restriction of Y to [-N,N] in the obvious way. Truncation is 1-Lipschitz, so convergence in probability passes to the truncated sequence, and the middle term decays by bounded convergence; the two outer terms are small, uniformly in n, for N large, by the UI assumption and the integrability of X.
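As a final numerical sketch (the specific choices are entirely mine: X\sim N(0,1), X_n=X+\frac{Z_n}{n} with the Z_n independent standard normals, truncation level N=3), the snippet below computes the three terms of the (\Leftarrow) bound and checks that their sum dominates \mathbb{E}|X_n-X| while shrinking as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10**6
X = rng.standard_normal(m)
N = 3.0                                  # truncation level

def trunc(Y, N):
    return np.clip(Y, -N, N)             # restriction of Y to [-N, N]

for n in [1, 10, 100]:
    X_n = X + rng.standard_normal(m) / n                  # -> X in probability, UI family
    t1 = (np.abs(X_n) * (np.abs(X_n) >= N)).mean()        # E[|X_n| 1(|X_n| >= N)]
    t2 = np.abs(trunc(X_n, N) - trunc(X, N)).mean()       # E|X_n^N - X^N|, the bounded middle term
    t3 = (np.abs(X) * (np.abs(X) >= N)).mean()            # E[|X| 1(|X| >= N)]
    print(n, t1 + t2 + t3, np.abs(X_n - X).mean())        # the bound dominates E|X_n - X|
```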
