Non-separable Skorohod Representations

In the previous post, I discussed the statement and proof of the Skorohod representation theorem. This concerns the conditions under which it is possible to couple distributions which converge in law, to obtain a family of random variables on a possibly very large probability space, which converge almost surely. The condition for the theorem to hold is that the base space, or at least the support of the limiting distribution, should be a separable metric space. Skorohod’s original proof concerned the case where all the distributions were supported on a complete, separable metric space (Polish space), but this extension is not particularly involved, and was proven not long after the original result.

It is natural to ask exactly what goes wrong in non-separable or non-metrizable spaces. Recall a space is separable if it contains a countable dense subset. Obviously, finite or countable sets are by definition separable with any metric. Considering the points with rational coordinates shows that \mathbb{R}^d is separable for each d, and the Stone-Weierstrass theorem shows that the space of continuous functions on a closed bounded interval is also separable with the uniform topology, as such functions can be approximated uniformly well by polynomials with rational coefficients. One heuristic is that a separable space does not have ‘too many’ open sets.

There are references (for example, see [2]) to examples of Skorohod non-representation in non-metrizable topological spaces, which are ‘big’ enough to allow convergence in distribution with respect to a particular class of test functions, but where the distributions are not uniformly tight, so cannot converge almost surely. However, I don’t really understand this well at all, and have struggled to chase the references, some of which are unavailable, and some in French.

Instead, I want to talk about an example given in [1] of a family of distributions on a non-separable space, which cannot be coupled to converge almost surely. The space is (0,1) equipped with the discrete metric, which says that d(x,y)=1 whenever x\ne y. Note that it is very hard to have even deterministic convergence in this space, since the only way to be close to an element of the space is indeed to be equal to that element. We will construct random variables and it will be unsurprising that they cannot possibly converge almost surely in any coupling, but the exact nature of the construction will lead to convergence in distribution.
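To make the remark about deterministic convergence concrete, here is a minimal sketch. The function names and the particular sequence 1/2 + 1/(n+2) are my own illustrative choices, not taken from [1]. The sequence converges to 1/2 in the usual metric, but stays at discrete distance 1 from it forever.

```python
from fractions import Fraction

def discrete_metric(x, y):
    """Discrete metric: distance 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

def euclidean_metric(x, y):
    """Usual distance on the real line."""
    return abs(x - y)

limit = Fraction(1, 2)
# Illustrative sequence in (0,1): 1/2 + 1/(n+2) for n = 1, ..., 1000.
sequence = [limit + Fraction(1, n + 2) for n in range(1, 1001)]

# The Euclidean distance shrinks to zero along the sequence...
euclidean_tail = euclidean_metric(sequence[-1], limit)
# ...but the discrete distance is 1 at every index, since no term equals the limit.
discrete_distances = {discrete_metric(x, limit) for x in sequence}
```

In the discrete metric, a sequence converges exactly when it is eventually equal to its limit.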

Based on what we proved last time, the support of the limiting distribution will be non-separable. It turns out that the existence of such a distribution is equiconsistent in the sense of formal logic with the existence of an extension of Lebesgue measure to the whole power set of (0,1). This is not allowed under the Axiom of Choice, but is consistent under the slightly weaker Axiom of Dependent Choice (DC). This weaker condition says, translated into language more familiar to me, that every directed graph with arbitrary (and in particular, potentially uncountable) vertex set, and with all out-degrees at least 1, contains an infinite directed path. This seems obvious when viewed through the typically countable context of graph theory. But the natural construction is to start somewhere and ‘just keep going’ wherever possible, which involves making a choice from the out-neighbourhood at lots of vertices. Thus it is clear why this is weaker than AC. Anyway, in the sequel, we assume that this extension of Lebesgue measure exists.

Example (from [1]): We take (X_n)_{n\ge 1} to be an IID sequence of non-negative RVs defined on the probability space ((0,1),\mathcal{B}(0,1),\mathrm{Leb}), with expectation under Lebesgue measure equal to 1. It is not obvious how to do this, with the restriction on the probability space. One example might be to write \omega\in(0,1) as \overline{\omega_1\omega_2\ldots}, the binary expansion, and then set X_n=2\omega_n. We will later require that X_n is not identically 1, which certainly holds in this example just given.
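As a sanity check on the binary-digit example, here is a minimal sketch. The helper names and the finite grid of dyadic midpoints are my own assumptions, used as an exact stand-in for Lebesgue measure on events determined by the first N digits. It confirms that each X_n has mean 1 and that X_1 and X_2 behave independently on the grid.

```python
from fractions import Fraction

def binary_digit(omega, n):
    """n-th binary digit (n >= 1) of omega in (0,1), using the terminating expansion."""
    return int(omega * 2**n) % 2

def X(n, omega):
    """The random variable X_n = 2 * omega_n from the example."""
    return 2 * binary_digit(omega, n)

# Hypothetical discretisation: midpoints of the 2^N dyadic intervals of length 2^-N.
# The first N digits are constant on each interval, so averaging over midpoints
# reproduces Lebesgue expectations of functions of the first N digits exactly.
N = 10
midpoints = [Fraction(2 * j + 1, 2 ** (N + 1)) for j in range(2 ** N)]

means = [Fraction(sum(X(n, w) for w in midpoints), len(midpoints))
         for n in range(1, N + 1)]

# Proportion of the grid on which X_1 = 2 and X_2 = 2 simultaneously.
joint = Fraction(sum(1 for w in midpoints if X(1, w) == 2 and X(2, w) == 2),
                 len(midpoints))
```

Each entry of `means` is 1, and `joint` is 1/4 = (1/2)(1/2), as independence requires.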

Let \mu be the extension of Lebesgue measure to the power set \mathcal{P}=\mathcal{P}(0,1). Now define the measures:

\mu_n(B)=\mathbb{E}_\mu(X_n \mathbf{1}_B),\quad \forall B\in\mathcal{P}.
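For the binary-digit example X_n=2\omega_n, this definition can be unpacked by hand. The following worked computation uses only that \mu extends Lebesgue measure, so that \mu(\{\omega_n=1\})=1/2. It shows that each \mu_n is a probability measure which puts all of its mass on the set where the n-th digit is 1:

```latex
\mu_n\big((0,1)\big)=\mathbb{E}_\mu(X_n)=1,\qquad
\mu_n(\{\omega_n=1\})=\mathbb{E}_\mu\big(2\cdot\mathbf{1}_{\{\omega_n=1\}}\big)=2\cdot\tfrac12=1,\qquad
\mu_n(\{\omega_n=0\})=\mathbb{E}_\mu\big(0\cdot\mathbf{1}_{\{\omega_n=0\}}\big)=0.
```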

To clarify, we are defining a family of measures which also are defined for all elements of the power set. We have defined them in a way that is by definition a coupling. This will make it possible to show convergence in distribution, but they will not converge almost surely in this coupling, or, in fact, under any coupling. Now consider a restricted class of sets, namely B\in \sigma(X_1,\ldots,X_k), the class of sets distinguishable by the outcomes of the first k RVs.

[Caution: the interpretation of this increasing filtration is a bit different to the standard setting with for example Markov processes, as the sets under consideration are actually subsets of the probability space on which everything is defined. In particular, there is no notion that a ‘fixed deterministic set’ lies in all the layers of the filtration.]

Anyway, by independence, when n>k, we have

\mu_n(B)=\mathbb{E}_\mu(X_n\mathbf{1}_B)=\mathbb{E}_\mu(X_n)\,\mathbb{E}_\mu(\mathbf{1}_B)=\mu(B).

So whenever B\in\mathcal{F}=\bigcup_k \sigma(X_1,\ldots,X_k), \lim_n \mu_n(B)=\mu(B). By MCT, we can extend this convergence to any bounded \mathcal F-measurable function.
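For the binary-digit example, the behaviour of \mu_n on sets determined by the first k variables can be checked exactly. The sketch below is my own illustration rather than anything from [1]. It encodes a set B\in\sigma(X_1,\ldots,X_k) as a collection of k-bit prefixes (a hypothetical encoding), and computes \mu_n(B) by enumerating digit strings.

```python
from fractions import Fraction
from itertools import product

def mu(B, k):
    """Lebesgue measure of the set of omega whose first k digits form a tuple in B."""
    return Fraction(len(B), 2 ** k)

def mu_n(n, B, k):
    """mu_n(B) = E[X_n 1_B] with X_n = 2 * (n-th digit), by enumerating digit strings."""
    m = max(n, k)  # enough digits to see both B and X_n
    total = Fraction(0)
    for bits in product((0, 1), repeat=m):
        if bits[:k] in B:
            total += Fraction(2 * bits[n - 1], 2 ** m)
    return total

k = 3
B = {(0, 0, 1), (1, 0, 1), (1, 1, 0)}  # an arbitrary illustrative set of prefixes

late = [mu_n(n, B, k) for n in (4, 5, 6)]   # n > k: agrees with mu(B)
early = [mu_n(n, B, k) for n in (1, 2, 3)]  # n <= k: generally differs from mu(B)
```

Here `mu(B, k)` is 3/8, every entry of `late` is 3/8, and `early` is [1/2, 1/4, 1/2]. So the sequence \mu_n(B) is only eventually equal to \mu(B), which is all that the limit requires.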

This is the clever bit. We want to show that \mu_n(B)\rightarrow\mu(B) for all B\in\mathcal P, but we only have it so far for B\in\mathcal F. But since \mathcal{F}\subset \mathcal P, which is the base field of the probability space under the (non-AC) assumption, we can take conditional expectations. In particular for any B\in\mathcal P, \mathbb{E}_\mu[\mathbf{1}_B | \mathcal{F}] is a bounded, \mathcal F-measurable function. Hence, by definition of \mu_n and the extended MCT result:

\mu_n(B)=\mathbb{E}_\mu[X_n\mathbb{E}_\mu[\mathbf{1}_B|\mathcal F]]=\mathbb{E}_{\mu_n}[\mathbb{E}_\mu[\mathbf{1}_B|\mathcal F]] \rightarrow \mathbb{E}_\mu [\mathbb{E}_\mu[\mathbf{1}_B |\mathcal{F}]].

Now, since by definition \mathbf{1}_B is \mathcal{P}-measurable, applying the tower law gives that this is equal to \mu(B). So we have

\mu_n(B)\rightarrow \mu(B),\quad \forall B\in\mathcal{P}. (*)

This gives weak convergence \mu_n\Rightarrow \mu. At first glance it might look like we have proved a much stronger condition than we need. But recall that in any set equipped with the discrete topology, any set is both open and closed, and so to use the portmanteau lemma, (*) really is required.

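Now we have to check that we can’t have almost sure convergence in any coupling of these measures. Suppose that we have a probability space with random variables Y,(Y_n) satisfying \mathcal L(Y)=\mu, \mathcal L(Y_n)=\mu_n. In the example I gave of X_n satisfying the conditions, the only values taken by X_n are 0 and 2, and \mu_n(X_n=0)=\mathbb{E}_\mu(X_n\mathbf{1}_{\{X_n=0\}})=0, so X_n(Y_n)=2 almost surely for each n. On the other hand, under \mu the variables X_n(Y) are IID and equal to 0 with probability 1/2, so by Borel-Cantelli, irrespective of the coupling,

\mathbb{P}(X_n(Y)=0\text{ infinitely often})=1.

On this event Y_n\ne Y, and so d(Y_n,Y)=1, for infinitely many n. So it is impossible that Y_n converges almost surely to Y.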


[1] Berti, Pratelli, Rigo – Skorohod Representation and Disintegrability.

[2] Jakubowski – The almost sure Skorokhod representation for subsequences in non-metric spaces.

