# Skorohod Representation Theorem

Continuing the theme of revising theory in the convergence of random processes that I shouldn’t have forgotten so rapidly, today we consider the Skorohod Representation Theorem. Recall from the standard discussion of the different modes of convergence of random variables that almost sure convergence is among the strongest, since it implies convergence in probability and thus convergence in distribution. (But not convergence in $L_1$. For example, take $U$ uniform on $[0,1]$, and $X_n=n\mathbf{1}_{\{U<\frac{1}{n}\}}$. Then $X_n\rightarrow 0$ almost surely, but $\mathbb{E}[X_n]=1$ for every $n$.)
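The counterexample in the parenthesis can be checked by hand, and a few lines of Python make the two behaviours visible side by side. This is only a sketch: the helper names `x_n` and `mean_x_n` and the particular outcome `u` are my own choices for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def x_n(n: int, u: Fraction) -> int:
    """X_n = n * 1{U < 1/n}, evaluated at a fixed outcome U = u."""
    return n if u < Fraction(1, n) else 0

def mean_x_n(n: int) -> Fraction:
    """E[X_n] = n * P(U < 1/n) = n * (1/n), computed exactly."""
    return n * Fraction(1, n)

u = Fraction(3, 100)  # one fixed outcome of U; any u > 0 behaves the same way
path = [x_n(n, u) for n in range(1, 41)]
# Along this outcome the path equals n while n < 1/u, and is 0 for every later n,
# whereas mean_x_n(n) stays equal to 1 for all n.
```

So each fixed path is eventually zero, while the expectation never moves, which is exactly the failure of $L_1$ convergence.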

Almost sure convergence is therefore in some sense the most useful form of convergence to have. However, it comes with a strong prerequisite, that the random variables be defined on the same probability space, which is not required for convergence in distribution. Indeed, one can set up weak versions of convergence in distribution which do not even require the convergents to be random variables. The Skorohod representation theorem gives a partial converse to the implication from almost sure convergence to convergence in distribution. It states some conditions under which random variables which converge in distribution can be coupled on some larger probability space to obtain almost sure convergence.

Skorohod’s original proof dealt with convergence of distributions defined on complete, separable metric spaces (Polish spaces). The version discussed here is from Chapter 5 of Billingsley, and assumes the limiting distribution has separable support. More recent authors have considered stronger convergence conditions (convergence in total variation or Wasserstein distance, for example) with weaker topological requirements, and convergence of random variables defined in non-metrizable spaces.

Theorem (Skorohod representation theorem): Suppose that distributions $P_n\Rightarrow P$, where P is a distribution with separable support. Then we can define a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and random variables $X,(X_n)_{n\ge 1}$ on this space such that the laws of $X,X_n$ are $P,P_n$ respectively and $X_n(\omega)\rightarrow X(\omega)$ for all $\omega\in\Omega$.

NB. We are proving ‘sure convergence’ rather than merely almost sure convergence! It is not surprising that this is possible, since changing the value of all the $X_n$s on a set with measure zero doesn’t affect the conditions for convergence in distribution.
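To get a feel for what such a coupling looks like, it may help to look at the real line, where a classical construction feeds one shared uniform variable through the quantile functions of the $P_n$. This is not the Billingsley argument discussed below, only an illustration in a special case. The laws $P_n=N(1/n,(1+1/n)^2)$ and $P=N(0,1)$, and the function names, are hypothetical choices of mine.

```python
from statistics import NormalDist

def coupled_sample(n: int, u: float) -> float:
    """Quantile of the hypothetical law P_n = N(1/n, (1 + 1/n)^2) at level u."""
    return NormalDist(mu=1 / n, sigma=1 + 1 / n).inv_cdf(u)

def limit_sample(u: float) -> float:
    """Quantile of the limit law P = N(0, 1) at the same level u."""
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(u)

u = 0.8  # one shared outcome of a uniform variable on (0, 1)
gaps = [abs(coupled_sample(n, u) - limit_sample(u)) for n in (1, 10, 100, 1000)]
# For this family the gap is (1 + z) / n with z the standard normal quantile at u,
# so it shrinks to 0 along every fixed outcome u.
```

All the variables are functions of the same `u`, so they live on one probability space, and convergence holds pointwise in `u`.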

Applications: Before going through the Billingsley proof, we consider one simple application of this result. Let S be a separable metric space containing the support of X, and g a continuous function $S\rightarrow S'$. Then $X_n\stackrel{a.s.}{\rightarrow}X\quad\Rightarrow\quad g(X_n)\stackrel{a.s.}{\rightarrow}g(X).$

So, by applying the Skorohod representation theorem once, and the result that almost sure convergence implies convergence in distribution, we have shown that $X_n\stackrel{d}{\rightarrow}X\quad\Rightarrow\quad g(X_n)\stackrel{d}{\rightarrow}g(X),$

subject to these conditions on the space supporting X. And we have avoided the need to be careful about exactly which class of functions determines convergence in distribution, as would be required for a direct argument.

Proof (from Billingsley): Unsurprisingly, the idea is to construct realisations of the $(X_n)$ from a realisation of X. We take X, and a partition of the support of X into small measurable sets, chosen so that the probability of lying in a particular set is almost the same for $X_n$ as for X, for large n. Then, the $X_n$ are constructed so that for large n, with limitingly high probability $X_n$ lies in the same small set as X.

Constructing the partition is the first step. For each $x\in S:=\mathrm{supp}(X)$, there must be some radius $\frac{\epsilon}{4}<r_x<\frac{\epsilon}{2}$ such that $P(\partial B(x,r_x))=0$, because the boundaries $\partial B(x,r)$ are disjoint for different values of r, so at most countably many of them can have positive measure. The next step is where we use separability. Since every point in the space is within $\frac{\epsilon}{4}$ of some element of a countable sequence of elements of the space, we can take a countable subset of these open balls $B(x,r_x)$ which cover the space. Furthermore, we can take a finite subset of the balls which cover all of the space apart from a set of measure less than $\epsilon$. We want the sets to be disjoint, and we can achieve this by removing the intersections inductively in the obvious way. We end up with a collection $B_0,B_1,\ldots,B_k$, where $B_0$ is the leftover space, such that

• $P(B_0)<\epsilon$
• $P(\partial B_i)=0,\quad i=0,1,\ldots,k$
• $\mathrm{diam}(B_i)<\epsilon,\quad i=1,\ldots,k$.
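Removing the intersections inductively amounts to assigning each point to the first ball that contains it, with $B_0$ catching whatever is left over. Here is a minimal sketch on the real line. The centres, radii and the name `cell_index` are hypothetical, and the code says nothing about the boundary condition $P(\partial B_i)=0$, which depends on the measure.

```python
def cell_index(x: float, centres: list[float], radii: list[float]) -> int:
    """Index i >= 1 of the first open ball containing x, or 0 for the leftover set B_0."""
    for i, (c, r) in enumerate(zip(centres, radii), start=1):
        if abs(x - c) < r:
            return i
    return 0

# Hypothetical balls with epsilon = 0.5, so each radius lies in (epsilon/4, epsilon/2).
centres = [0.0, 0.3, 0.6]
radii = [0.2, 0.2, 0.2]
cells = [cell_index(x, centres, radii) for x in (0.1, 0.25, 0.45, 0.7, 2.0)]
# cells == [1, 2, 2, 3, 0]: overlapping balls are resolved in favour of the earlier one.
```

Each cell is contained in a single ball of radius less than $\frac{\epsilon}{2}$, which is where the diameter bound comes from.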

Now suppose for each m, we take such a partition $B^m_0,B^m_1,\ldots,B^m_{k_m}$, for which $\epsilon_m=\frac{1}{2^m}$. Unsurprisingly, this scaling of $\epsilon$ is chosen so as to use Borel-Cantelli at the end. Since each $B^m_i$ has P-null boundary, convergence in distribution gives $P_n(B^m_i)\rightarrow P(B^m_i)$. So there exists an integer $N_m$, which we may take to be strictly increasing in m, such that for $n\ge N_m$, we have $P_n(B^m_i)\ge (1-\epsilon_m)P(B^m_i),\quad i=0,1,\ldots,k_m.$ (*)

Now, for $N_m\le n<N_{m+1}$, for each $B^m_i$ with non-zero probability under P, take $Y_{n,i}$ to be independent random variables with law $P_n(\cdot | B^m_i)$, that is, $P_n$ conditioned on the set. Now take $\xi\sim U[0,1]$ independent of everything so far. Now we make concrete the heuristic for constructing $X_n$ from X. We define: $X_n=\sum_{i=0}^{k_m}\mathbf{1}_{\{\xi\le 1-\epsilon_m, X\in B^m_i\}} Y_{n,i} + \mathbf{1}_{\{\xi>1-\epsilon_m\}}Z_n.$

We haven’t defined $Z_n$ yet. But, from (*), there is a unique distribution such that taking $Z_n$ to be independent of everything so far, with this distribution, we have $\mathcal{L}(X_n)=P_n$. Note that by iteratively defining random variables which are independent of everything previously defined, our resulting probability space $\Omega$ will be a large product space.
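A toy case makes the role of (*) visible. Suppose the space is finite and every cell is a single point j, so that $Y_{n,j}=j$ and $X_n=X$ whenever $\xi\le 1-\epsilon_m$. Matching laws then forces the law of $Z_n$ to put mass $\frac{1}{\epsilon_m}\left[P_n(j)-(1-\epsilon_m)P(j)\right]$ on j, and (*) is what keeps this non-negative. The sketch below checks this with exact fractions. The numbers and the names `z_law` and `x_n_law` are hypothetical.

```python
from fractions import Fraction as F

def z_law(p, p_n, eps):
    """Candidate law of Z_n when every cell is a single point j."""
    return [(p_n[j] - (1 - eps) * p[j]) / eps for j in range(len(p))]

def x_n_law(p, mu, eps):
    """Law of X_n: equal to X with probability 1 - eps, otherwise drawn as Z_n."""
    return [(1 - eps) * p[j] + eps * mu[j] for j in range(len(p))]

eps = F(1, 4)
p = [F(1, 10), F(1, 2), F(2, 5)]    # hypothetical limit law P, with P(B_0) < eps
p_n = [F(1, 5), F(2, 5), F(2, 5)]   # hypothetical P_n satisfying (*) for this eps
mu = z_law(p, p_n, eps)
# mu is a genuine probability vector, and mixing it back in recovers p_n exactly.
```

If (*) failed for some j, the corresponding entry of `mu` would be negative and no valid $Z_n$ would exist.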

Note that $\xi$ controls whether the $X_n$ follow the law we have good control over, and we also want to avoid the set $B^m_0$. So define $E_m:=\{X\not \in B^m_0, \xi\le 1-\epsilon_m\}$. Then, $\mathbb{P}(E_m^c)<2\epsilon_m=2^{-(m-1)}$, and so by Borel-Cantelli, with probability 1, $E_m$ holds for all m larger than some threshold. Let us call this event $\liminf_m E_m=: E$. On E, for all large m and $N_m\le n<N_{m+1}$, $X_n$ lies in the same set $B^m_i$ as X, and this set has diameter less than $\epsilon_m$, so $X_n \rightarrow X$. So we have almost sure convergence. But we can easily convert this to sure convergence by removing all $\omega\in\Omega$ for which $\xi(\omega)=1$ and setting $X_n\equiv X$ on $E^c$, as this does not affect the distributions.
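For reference, the estimates behind the Borel-Cantelli step can be written out as follows. Here d denotes the metric on S, a symbol I am introducing for this display, and $N_m\le n<N_{m+1}$ is the range of n on which the m-th partition is used.

```latex
\begin{align*}
\mathbb{P}(E_m^c) &\le \mathbb{P}(X\in B^m_0) + \mathbb{P}(\xi > 1-\epsilon_m)
  < \epsilon_m + \epsilon_m = 2^{-(m-1)}, \\
\sum_{m\ge 1}\mathbb{P}(E_m^c) &< \sum_{m\ge 1} 2^{-(m-1)} = 2 < \infty, \\
\text{on } E_m,\ N_m\le n<N_{m+1}: \quad d(X_n,X) &\le \mathrm{diam}(B^m_i) < \epsilon_m
  \quad\text{for the index } i\ge 1 \text{ with } X\in B^m_i.
\end{align*}
```

The first line is a union bound, the second is the summability that Borel-Cantelli needs, and the third is why lying in the same small set forces convergence.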

Omissions:

• Obviously, I have omitted the exact construction of the distribution of $Z_n$. This can be reverse reconstructed very easily, but requires more notation than is ideal for this medium.
• It is necessary to remove any sets $B^m_i$ with zero measure under P for the conditioning to make sense. These can be added to $B^m_0$ without changing any of the required conditions.
• We haven’t dealt with any $X_n$ for $n<N_1$.
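On the first omission, here is one way to write down the law of $Z_n$ for $N_m\le n<N_{m+1}$. I denote it $\mu_n$, a symbol of my own, and it is reconstructed by matching the law of $X_n$ to $P_n$ rather than quoted from Billingsley, so treat it as a sketch. The sums run over the cells with positive probability under P, which carry total P-mass one.

```latex
\begin{align*}
\mathbb{P}(X_n\in A) &= (1-\epsilon_m)\sum_{i:\,P(B^m_i)>0} P(B^m_i)\,P_n(A\mid B^m_i)
  + \epsilon_m\,\mu_n(A), \\
\mu_n(A) &:= \frac{1}{\epsilon_m}\Big[P_n(A) - (1-\epsilon_m)\sum_{i:\,P(B^m_i)>0}
  P(B^m_i)\,P_n(A\mid B^m_i)\Big].
\end{align*}
```

The first line uses the independence of X, $\xi$, the $Y_{n,i}$ and $Z_n$. With this choice of $\mu_n$ the right-hand side equals $P_n(A)$, the inequality (*) gives $\mu_n(A)\ge 0$ for every A, and taking A to be the whole space gives total mass one.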

The natural question to ask is what happens if we remove the restriction that the space be separable. There are indeed counterexamples to the existence of a Skorohod representation. The clearest example I’ve found so far is supported on (0,1) with a metric inducing the discrete topology. If time allows, I will explain this construction in a post shortly.

References

Billingsley – Convergence of Probability Measures, 2nd edition (1999)