Skorohod Space

The following is a summary of a chapter from Billingsley’s Convergence of Probability Measures. The ideas are easy to explain heuristically, but this was the first text I could find which explained how to construct Skorohod space for functions on the whole of the non-negative reals in enough stages that it was easily digestible.

It is relatively straightforward to define a topology on C[0,1], as we can induce from the most sensible metric. In this topology, functions f and g are close together if

\sup_{t\in[0,1]} |f(t)-g(t)| is small.

For cadlag functions, things are a bit more complicated. Two functions might be very similar, but have a discontinuity of similar magnitude at slightly different places. The sup norm of the difference is therefore macroscopically large. So we want a metric that also allows uniformly small deformations of the time scale.

We define the Skorohod (or Skorokhod depending on your transliteration preferences) metric d on D[0,1] as follows. Let \Lambda be the family of continuous, strictly increasing functions from [0,1] to [0,1] which map 0 to 0 and 1 to 1. This will be our family of suitable reparameterisations of the time scale (or abscissa – a new word I learned today. The other axis in a co-ordinate pair is called the ordinate). Anyway, we now say that

d(x,y)<\epsilon\quad\text{if }\exists \lambda\in\Lambda\text{ s.t. }

||\lambda - id||_\infty<\epsilon\quad\text{and}\quad ||f-\lambda\circ g||_\infty<\epsilon.

In other words, after reparameterising the time scale for g, without moving any time by more than epsilon, the functions are within epsilon in the sup metric.

Weak Convergence

We have the condition: if \{P_n\} is a tight sequence of probability measures and we have

\text{If }P_n\pi_{t_1,\ldots,t_k}^{-1}\Rightarrow P\pi_{t_1,\ldots,t_k}^{-1}\quad\forall t_1,\ldots,t_k\in[0,1],\quad\text{then }P_n\Rightarrow P,

where \pi_{t_1,\ldots,t_k} is the projection onto a finite-dimensional set. This is a suitable condition for C[0,1]. For D[0,1], we have the additional complication that these projections might not be continuous everywhere.

We can get over this problem. For a measure P, set T_P to be the set of t\in[0,1] such that \pi_t is continuous P-almost everywhere (ie for all f\in D apart from a collection with P-measure = 0). Then, for all P, it is not hard to check that 0,1\in T_P and [0,1]\backslash T_P is countable.

The tightness condition requires two properties:

1) \lim_{K\rightarrow\infty} \limsup_{n}P_n[f:||f||\geq K]=0.

2) \forall \epsilon>0:\,\lim_\delta\limsup_n P_n[f:w_f'(\delta)\geq\epsilon]=0.

These say, respectively, that the measure of ||f|| doesn’t escape to \infty, and there is no mass given in the limit to functions which ‘wiggle with infinite frequency on an epsilon scale of amplitude’.

D_\infty=D[0,\infty)

Our earlier definition of the Skorohod metric could have been written:

d(f,g)=\inf_{\lambda\in\Lambda}\{||\lambda-\text{id}||\vee||f-\lambda\circ g||\}.

From a topological convergence point of view, there’s no need to use the sup norm on \lambda - \text{id}. We want to regulate smoothness of the reparameterisation, so we could use the norm:

||\lambda||^\circ=\sup_{s<t}|\log\frac{\lambda(t)-\lambda(s)}{t-s}|,

that is, the slope is uniformly close to 1 if ||\lambda||^\circ is small. The advantage of this choice of norm is that an extension to D[0,\infty) is immediate. Also, the induced product norm

d^\circ(f,g)=\inf_{\lambda\in\Lambda} \{||\lambda - \text{id}||^\circ \vee||x-\lambda\circ y||\}

is complete. This gives us a few problems, as for example

d_\circ(1_{[0,1)},1_{[0,1-\frac{1}{n})})=1,

as you can’t reparameterise over the jump in a way that ensures the log of the gradient is relatively small. (In particular, to keep the sup norm less than 1, we would need \lambda to send [1-\frac{1}{n}]\mapsto 1, and so ||\lambda||^\circ=\infty by definition.)

So we can’t immediately define Skorohod convergence on D_\infty by demanding convergence on any restriction to [0,t]. We overcome this in a similar way to convergence of distribution functions.

Lemma: If d_t^\circ (f_n,f)\rightarrow_n 0 then for any s<t with f cts at s, then d_s^\circ(f_n,f)\rightarrow_n 0.

So this says that the functions converge in Skorohod space if for arbitrarily large times T where the limit function is continuous, the restrictions to [0,T] converge. (Note that cadlag functions have at most countably many discontinuities, so this is fine.)

A metric for D_\infty

If we want to specify an actual metric d_\infty^\circ, the usual tools for specifying a countable product metric will do here:

d_\infty^\circ(f,g)=\sum_{m\geq 1}2^{-m}[1\wedge d_m^\circ(f^m,g^m)],

where f^m is the restriction of f to [0,m], with the potential discontinuity at m smoothed out:

f^m(t)=\begin{cases}t&t\leq m-1\\ (m-t)f(t)&t\in[m-1,m]\\ 0&t\geq m.\end{cases}

In particular, d_\infty^\circ(f,g)=0\Rightarrow f^m=g^m\,\forall m.

It can be checked that:

Theorem: d_\infty^\circ(f_n,f)\rightarrow 0 in D_\infty if and only iff

\exists \lambda_n\in\Lambda_\infty\text{ s.t. }||\lambda_n-\text{id}||\rightarrow 0

\text{and }\sup_{t\leq m}|\lambda_n\circ f_n-f|\rightarrow_n 0,\,\forall m,

and that d_\infty^\circ (f_n,f)\rightarrow 0 \Rightarrow d_t^\circ(f_n,f)\rightarrow 0 for every point of continuity t of f.

Similarly weak convergence and tightness properties are available, roughly as you might expect. It is probably better to reference Billingsley’s book or similar sources rather than further attempting to summarise them here.

Advertisement

Levy’s Convergence Theorem

We consider some of the theory of Weak Convergence from the Part III course Advanced Probability. It has previously been seen, or at least discussed, that characteristic functions uniquely determine the laws of random variables. We will show Levy‘s theorem, which equates weak convergence of random variables and pointwise convergence of characteristic functions.

We have to start with the most important theorem about weak convergence, which is essentially a version of Bolzano-Weierstrass for measures on a metric space M. We say that a sequence of measures is tight if for any \epsilon>0, there exists a compact K_\epsilon such that $\sup_n\mu(M\backslash K_\epsilon)\leq \epsilon$. Informally, each measure is concentrated compactly, and this property is uniform across all the measures. We can now state and prove a result of Prohorov:

Theorem (Prohorov): Let (\mu_n) be a tight sequence of probability measures. Then there exists a subsequence (n_k) and a probability measure \mu such that \mu_{n_k}\Rightarrow \mu.

Summary of proof in the case M=\mathbb{R}By countability, we can use Bolzano-Weierstrass and a standard diagonal argument to find a subsequence such that the distribution functions

F_{n_k}(x)\rightarrow F(x)\quad\forall x\in\mathbb{Q}

Then extend F to the whole real line by taking a downward rational limit, which ensures that F is cadlag. Convergence of the distribution functions then holds at all points of continuity of F by monotonicity and approximating by rationals from above. It only remains to check that F(-\infty)=0,F(\infty)=1, which follows from tightness. Specifically, monotonicity guarantees that F has countably many points of discontinuity, so can choose some large N such that both N and -N are points of continuity, and exploit that eventually

\sup_n \mu_n([-N,N])>1-\epsilon

We can define the limit (Borel) measure from the distribution function by taking the obvious definition F(b)-F(a) on intervals, then lifting to the Borel sigma-algebra by Caratheodory’s extension theorem.

Theorem (Levy): X_n,X random variables in \mathbb{R}^d. Then:

L(X_n)\rightarrow L(X)\quad\iff\quad \phi_{X_n}(z)\rightarrow \phi_X(z)\quad \forall z\in\mathbb{R}^d

The direction \Rightarrow is easy: x\mapsto e^{i\langle z,x\rangle} is continuous and bounded.

In the other direction, we can in fact show a stronger constructive result. Precisely, if \exists \psi:\mathbb{R}^d\rightarrow \mathbb{C} continuous at 0 with \psi(0)=1 (*) and such that \phi_{X_n}(z)\rightarrow \psi(z)\quad \forall z\in\mathbb{R}^d, then \psi=\phi_X the characteristic function of some random variable and L(X_n)\rightarrow L(X). Note that the conditions (*) are the minimal such that \phi could be a characteristic function.

We now proceed with the proof. We apply a lemma that is basically a calculation that we don’t repeat here.

\mathbb{P}(||X||_\infty>K)\stackrel{\text{Lemma}}{<}C_dK^d\int_{[-\frac{1}{K},\frac{1}{K}]^d}(1-\Re \phi_{X_n}(u))du\stackrel{\text{DOM}}{\rightarrow}C_dK^d\int (1-\Re \psi(u))du

where we apply that the integrand is dominated by 2. From the conditions on \psi, this is <\epsilon for large enough K. This bound is of course also uniform in n, and so the random variables are tight. Prohorov then gives a convergent subsequence, and so a limit random variable exists.

Suppose the whole sequence doesn’t converge to X. Then by Prohorov, there is a separate subsequence which converges to Y say, so by the direction of Levy already proved there is convergence of characteristic functions along this subsequence. But characteristic functions determine law, so X=Y, which is a contradiction.