Tightness in Skorohod Space

This post continues the theme of revising topics in the analytic toolkit relevant to proving convergence of stochastic processes. Of particular interest is the question of how to prove that a family of Markov chains has a scaling limit given by the solution of some stochastic differential equation, in a generalisation of Donsker’s theorem for Brownian motion. In this post, however, we address more general aspects of convergence of stochastic processes, with particular reference to Skorohod space.

Topological Background

I’ve discussed Skorohod space in a previous post. For now, we focus attention on cadlag functions on a compact interval, D[0,T]. Some of what follows can be extended to the infinite-time setting easily, and some requires more work. Although we can define a metric on the space of cadlag functions in lots of ways, it is more useful to think topologically, or at least with a looser sense of metric. We say two cadlag functions are close to one another if there is a reparameterisation of the time-axis (a continuous, increasing bijection from [0,T] to itself) that is uniformly close to the identity function and that, when applied to one of the cadlag functions, brings it uniformly close to the other. Heuristically, two cadlag functions are close if their large jumps occur at nearby times and are of similar size, and if they are uniformly close elsewhere. It is worth remembering that a cadlag function on even an unbounded interval can have only countably many jumps, and only finitely many with magnitude greater than some threshold on any compact interval.

For much of the theory one would like to use, it is useful for the spaces under investigation to be separable. Recall a topological space is separable if there exists a countable dense subset. Note in particular that D[0,T] is not separable under the uniform metric, since we can define f_x(\cdot)=\mathbf{1}_{(\cdot \ge x)} for each x\in[0,T], and then ||f_x-f_y||_\infty=1 whenever x\ne y. In particular, we have an uncountable collection of disjoint open sets given by the balls \mathcal{B}(f_x,\frac12). Any dense subset must contain a point of each ball, so no dense subset is countable and the space is not separable. Similarly, C[0,\infty) is not separable under the uniform metric. A counterexample is given by considering continuous functions which take values in \{0,1\} on the integers, interpolating linearly in between. Thus we have a map \{0,1\}^{\mathbb{N}}\rightarrow C[0,\infty), where the uniform distance between any two distinct image points is at least one, hence the open balls of radius 1/2 around each image point give the same contradiction as before. However, the Stone-Weierstrass theorem shows that C[0,T] is separable, as we can approximate any such function uniformly well by a polynomial, and thus uniformly well by a polynomial with rational coefficients.
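To make the non-separability argument concrete, here is a minimal Python sketch. The helper names `step_at` and `sup_distance` are my own, and the supremum is only approximated on a finite grid, so this is an illustration rather than a proof. It checks that indicator functions with distinct jump times are pairwise at uniform distance 1.

```python
from itertools import combinations

def step_at(x):
    """Cadlag indicator t -> 1 if t >= x else 0 (hypothetical helper)."""
    return lambda t: 1.0 if t >= x else 0.0

def sup_distance(f, g, T=1.0, n=1000):
    """Approximate ||f - g||_inf on [0, T] using the grid {kT/n : k = 0..n}."""
    return max(abs(f(k * T / n) - g(k * T / n)) for k in range(n + 1))

jump_times = [0.1, 0.25, 0.5, 0.75]
distances = [
    sup_distance(step_at(x), step_at(y))
    for x, y in combinations(jump_times, 2)
]
print(distances)
```

However close two jump times are, the two indicators disagree by 1 somewhere between them, which is why the balls of radius 1/2 are disjoint.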

In any case, it can be shown that D[0,T] is separable with respect to the natural choice of metric. It can also be shown that there is a metric which gives the same open sets (hence is a topologically equivalent metric) under which D[0,T] is complete, and hence a Polish space.

Compactness in C[0,T] and D[0,T]

We are interested in tightness of measures on D[0,T], so first we need to address compactness for sets of deterministic functions in D[0,T]. First, we consider C[0,T]. Here, the conditions for a set of functions to be compact are given by the celebrated Arzela-Ascoli theorem. We are really interested in compactness as a property of size, so we consider instead relative compactness. A set is relatively compact (sometimes pre-compact) if its closure is compact. For the existence of subsequential limits, this is identical to compactness, only now we allow the possibility of the limit point lying outside the set.

We note that the function C[0,T]\rightarrow \mathbb{R} given by f\mapsto ||f||_\infty is continuous, and hence uniform boundedness is certainly a necessary condition for relative compactness in C[0,T]. Arzela-Ascoli states that uniform boundedness plus equicontinuity is sufficient for a set of such functions to be relatively compact. Equicontinuity should be thought of as uniform continuity that is uniform among all the functions in the set, rather than just within the argument of an individual function.

For identical reasons, we need uniform boundedness for relative compactness in D[0,T], but obviously uniform continuity won’t work as a criterion for discontinuous functions! We seek some analogue of the modulus of continuity that ignores jumps. We define

\omega'_\delta(f):=\inf_{\{t_i\}} \max_i \sup_{s,t\in[t_{i-1},t_i)} |f(s)-f(t)|,

where the infimum is taken over all meshes 0=t_0<t_1<\ldots<t_r=T with t_i-t_{i-1}>\delta. Note that as \delta\downarrow 0, we can, if we want, place the t_i so that large jumps of the function f take place over the boundaries between adjacent parts of the mesh. In particular, for a given cadlag function, it can be shown fairly easily that \omega'_\delta(f)\downarrow 0 as \delta\rightarrow 0. Then, unsurprisingly, in a similar fashion to the Arzela-Ascoli theorem, it follows that a set of functions A\subset D[0,T] is relatively compact if and only if it is uniformly bounded, and

\lim_{\delta\rightarrow 0} \sup_{f\in A}\omega'_\delta(f)=0.

Note that this ‘modulus of continuity’ needs to decay uniformly across the set of functions, but that we do not need to choose the mesh at level \delta uniformly across all functions. This would obviously not work, as then the functions \mathbf{1}_{(\cdot\ge x_n)} for a sequence of distinct x_n\rightarrow x would not form a relatively compact set, but they clearly converge in Skorohod space!
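The role of the mesh can be seen in a small Python sketch. The function `mesh_oscillation` is a hypothetical helper that evaluates the quantity inside the infimum defining \omega'_\delta for one given mesh, sampling each cell at finitely many points, so it only gives an approximate upper bound on \omega'_\delta(f). It compares a mesh adapted to each function with a single mesh shared by all of them.

```python
def step_at(x):
    """Cadlag indicator t -> 1 if t >= x else 0 (hypothetical helper)."""
    return lambda t: 1.0 if t >= x else 0.0

def mesh_oscillation(f, mesh, n=200):
    """Max over cells [t_{i-1}, t_i) of the oscillation of f, sampled at n points per cell."""
    worst = 0.0
    for left, right in zip(mesh, mesh[1:]):
        # Sample the half-open cell: the right endpoint is deliberately excluded.
        values = [f(left + (right - left) * k / n) for k in range(n)]
        worst = max(worst, max(values) - min(values))
    return worst

# Indicators with jump times x_n = 1/2 + 1/n, which decrease towards 1/2.
jump_times = [0.5 + 1.0 / n for n in range(4, 9)]

# Mesh chosen separately for each function, with the jump on a mesh point.
per_function = [mesh_oscillation(step_at(x), [0.0, x, 1.0]) for x in jump_times]

# One mesh shared by every function: each jump falls strictly inside [0.5, 1).
common = [mesh_oscillation(step_at(x), [0.0, 0.5, 1.0]) for x in jump_times]

print(per_function, common)
```

With the adapted meshes every oscillation vanishes, while the shared mesh sees an oscillation of 1 for each function, matching the remark that the mesh must be allowed to depend on the function.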

Tightness in C[0,T] and D[0,T]

Naturally, we are mainly interested in (probability) measures on D[0,T], and in particular conditions for tightness on this space. Recall a family of measures is tight if for any \epsilon>0, there exists some compact set A such that

\pi(A)>1-\epsilon,\quad \forall \pi\in\Pi.

So, for measures (\mu_n) on D[0,T], the sequence is tight precisely if for any \epsilon>0, there exist M, \delta and N such that for any n>N, both

\mu_n(||f||_\infty >M)\le \epsilon,\quad \mu_n(\omega'_\delta(f)>\epsilon)\le \epsilon

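hold. In C[0,T], where the standard modulus of continuity \omega_\delta takes the place of \omega'_\delta, the second condition controls variation sufficiently strongly that we can replace the first condition with

\mu_n(|f(0)|>M)\le \epsilon.

In D[0,T] this is not quite enough, since \omega'_\delta places no constraint on the size of jumps at mesh points. There, the first condition can instead be replaced by the same bound on |f(t)| for each t in a dense subset of [0,T] that contains T.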

Often we might be taking some sort of scaling limit of these processes in D[0,T], where the jumps become so small in the limit that we expect the limit process to be continuous, perhaps the solution of an SDE or some other diffusion. If we can replace \omega'_\delta by \omega_\delta, the standard modulus of continuity, then we have the additional conclusion that any weak limit lies in C[0,T].
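As an illustration of jumps vanishing in a scaling limit, the following Python sketch simulates a rescaled simple random walk, viewed as the step function t \mapsto S_{\lfloor nt\rfloor}/\sqrt{n} on [0,1]. The function names, the seed and the parameters are my own choices, and `grid_modulus` only evaluates increments between grid times, so it is a discretised proxy for \omega_\delta rather than the modulus itself.

```python
import math
import random

def rescaled_walk(n, seed=0):
    """Values S_k / sqrt(n), k = 0..n, of a simple random walk S."""
    rng = random.Random(seed)
    path, position = [0.0], 0
    for _ in range(n):
        position += rng.choice((-1, 1))
        path.append(position / math.sqrt(n))
    return path

def grid_modulus(path, delta):
    """Max |path[i] - path[j]| over grid indices with |i - j| <= delta * n."""
    n = len(path) - 1
    window = int(delta * n)
    return max(
        abs(path[j] - path[i])
        for i in range(n + 1)
        for j in range(i, min(i + window, n) + 1)
    )

n, delta = 1000, 0.05
path = rescaled_walk(n)
largest_jump = max(abs(b - a) for a, b in zip(path, path[1:]))
modulus = grid_modulus(path, delta)
print(largest_jump, modulus)
```

Every jump has size exactly n^{-1/2}, so the jumps disappear as n grows, and it is then the ordinary modulus of continuity that one needs to control in probability.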

In general, to prove convergence of some stochastic processes, we will want to show that the processes are tight, by demonstrating the properties above, or something equivalent. Then Prohorov’s theorem (which I tend to think of as a probabilistic functional version of Bolzano-Weierstrass) asserts that the family of processes has a weak subsequential limit. Typically, one then shows that any weak subsequential limit must have the law of some particular random process. Normally this is achieved by showing some martingale property (eg for an SDE) in the limit, often by using the Skorohod representation theorem to use almost sure subsequential convergence rather than merely weak convergence. Then one argues that there is a unique process with this property and a given initial distribution. So since all weak subsequential limits are this given process, in fact the whole family has a weak limit.

Skorohod Space

The following is a summary of a chapter from Billingsley’s Convergence of Probability Measures. The ideas are easy to explain heuristically, but this was the first text I could find which explained how to construct Skorohod space for functions on the whole of the non-negative reals in enough stages that it was easily digestible.

It is relatively straightforward to define a topology on C[0,1], as we can induce it from the most sensible metric, the uniform one. In this topology, functions f and g are close together if

\sup_{t\in[0,1]} |f(t)-g(t)| is small.

For cadlag functions, things are a bit more complicated. Two functions might be very similar, but have a discontinuity of similar magnitude at slightly different places. The sup norm of the difference is therefore macroscopically large. So we want a metric that also allows uniformly small deformations of the time scale.

We define the Skorohod (or Skorokhod depending on your transliteration preferences) metric d on D[0,1] as follows. Let \Lambda be the family of continuous, strictly increasing functions from [0,1] to [0,1] which map 0 to 0 and 1 to 1. This will be our family of suitable reparameterisations of the time scale (or abscissa – a new word I learned today. The other axis in a co-ordinate pair is called the ordinate). Anyway, we now say that

d(f,g)<\epsilon\quad\text{if }\exists \lambda\in\Lambda\text{ s.t. }

||\lambda - \text{id}||_\infty<\epsilon\quad\text{and}\quad ||f-g\circ\lambda||_\infty<\epsilon.

In other words, after reparameterising the time scale for g, without moving any time by more than epsilon, the functions are within epsilon in the sup metric.
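Here is a minimal Python sketch of this definition for two indicator functions with nearby jump times. The names `indicator_from`, `time_change` and `sup_on_grid` are hypothetical helpers, the suprema are approximated on a grid, and the time change is applied to g by evaluating g at \lambda(t). Exhibiting one particular \lambda only gives an upper bound on the Skorohod distance.

```python
def indicator_from(x):
    """Cadlag step function t -> 1 if t >= x else 0."""
    return lambda t: 1.0 if t >= x else 0.0

def time_change(a, b):
    """Piecewise-linear increasing bijection of [0, 1] with lam(a) = b, for 0 < a, b < 1."""
    def lam(t):
        if t <= a:
            return t * b / a
        return b + (t - a) * (1.0 - b) / (1.0 - a)
    return lam

def sup_on_grid(h, n=10_000):
    """Approximate the sup norm of h on [0, 1] using the grid {k/n}."""
    return max(abs(h(k / n)) for k in range(n + 1))

a, b = 0.5, 0.53
f, g = indicator_from(a), indicator_from(b)
lam = time_change(a, b)

uniform_gap = sup_on_grid(lambda t: f(t) - g(t))      # plain sup norm of f - g
time_shift = sup_on_grid(lambda t: lam(t) - t)        # ||lam - id||
space_gap = sup_on_grid(lambda t: f(t) - g(lam(t)))   # ||f - g o lam||
print(uniform_gap, time_shift, space_gap)
```

The uniform distance between f and g is 1, but after a time change that moves no time by more than about 0.03 the two functions coincide, so their Skorohod distance is at most about 0.03.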

Weak Convergence

We have the following criterion: if \{P_n\} is a tight sequence of probability measures, and

P_n\pi_{t_1,\ldots,t_k}^{-1}\Rightarrow P\pi_{t_1,\ldots,t_k}^{-1}\quad\forall t_1,\ldots,t_k\in[0,1],\quad\text{then }P_n\Rightarrow P,

where \pi_{t_1,\ldots,t_k} is the finite-dimensional projection f\mapsto (f(t_1),\ldots,f(t_k)). This is a suitable condition for C[0,1]. For D[0,1], we have the additional complication that these projections might not be continuous everywhere.

We can get over this problem. For a measure P, set T_P to be the set of t\in[0,1] such that \pi_t is continuous P-almost everywhere (i.e. for all f\in D apart from a collection with P-measure 0). Then, for all P, it is not hard to check that 0,1\in T_P and [0,1]\backslash T_P is countable. It then suffices to check convergence of the finite-dimensional distributions for times t_1,\ldots,t_k\in T_P.

The tightness condition requires two properties:

1) \lim_{K\rightarrow\infty} \limsup_{n}P_n[f:||f||\geq K]=0.

2) \forall \epsilon>0:\,\lim_{\delta\rightarrow 0}\limsup_n P_n[f:\omega'_\delta(f)\geq\epsilon]=0.

These say, respectively, that the measure of ||f|| doesn’t escape to \infty, and there is no mass given in the limit to functions which ‘wiggle with infinite frequency on an epsilon scale of amplitude’.

D_\infty=D[0,\infty)

Our earlier definition of the Skorohod metric could have been written:

d(f,g)=\inf_{\lambda\in\Lambda}\{||\lambda-\text{id}||\vee||f-g\circ\lambda||\}.

From a topological convergence point of view, there’s no need to use the sup norm on \lambda - \text{id}. We want to regulate smoothness of the reparameterisation, so we could use the norm:

||\lambda||^\circ=\sup_{s<t}|\log\frac{\lambda(t)-\lambda(s)}{t-s}|,

that is, the slope is uniformly close to 1 if ||\lambda||^\circ is small. The advantage of this choice of norm is that an extension to D[0,\infty) is immediate. Also, the induced metric

d^\circ(f,g)=\inf_{\lambda\in\Lambda} \{||\lambda||^\circ \vee||f-g\circ\lambda||\}

is complete. This gives us a few problems, as for example

d^\circ(1_{[0,1)},1_{[0,1-\frac{1}{n})})=1,

as you can’t reparameterise over the jump in a way that ensures the log of the gradient is relatively small. (In particular, to keep the sup norm less than 1, we would need \lambda to send 1-\frac{1}{n}\mapsto 1. Since also \lambda(1)=1, the slope of \lambda over this final stretch is zero, and so ||\lambda||^\circ=\infty by definition.)
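The blow-up of ||\lambda||^\circ can be checked numerically. This is a sketch under the assumption that \lambda is piecewise linear, in which case every chord slope is a weighted average of segment slopes and the supremum is attained on a single segment. The function `log_slope_norm` and the choice n=10 are my own.

```python
import math

def log_slope_norm(times, values):
    """||lam||° for the piecewise-linear map with lam(times[i]) = values[i]."""
    worst = 0.0
    for i in range(len(times) - 1):
        slope = (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        # A flat (or decreasing) segment means lam is not strictly increasing.
        worst = max(worst, math.inf if slope <= 0 else abs(math.log(slope)))
    return worst

# A gentle time change moving 0.5 to 0.53 has small norm.
gentle = log_slope_norm([0.0, 0.5, 1.0], [0.0, 0.53, 1.0])

# Time changes sending 1 - 1/n (n = 10) closer and closer to 1.
norms = [
    log_slope_norm([0.0, 0.9, 1.0], [0.0, 1.0 - eta, 1.0])
    for eta in (1e-2, 1e-4, 1e-6)
]

# The limiting map, sending 0.9 exactly to 1, is flat on [0.9, 1].
degenerate = log_slope_norm([0.0, 0.9, 1.0], [0.0, 1.0, 1.0])
print(gentle, norms, degenerate)
```

The norms grow without bound as \lambda(1-\frac{1}{n}) approaches 1, which is the obstruction described above.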

So we can’t immediately define Skorohod convergence on D_\infty by demanding convergence on any restriction to [0,t]. We overcome this in a similar way to convergence of distribution functions.

Lemma: If d_t^\circ (f_n,f)\rightarrow_n 0, then for any s<t with f continuous at s, we have d_s^\circ(f_n,f)\rightarrow_n 0.

So this says that the functions converge in Skorohod space if for arbitrarily large times T where the limit function is continuous, the restrictions to [0,T] converge. (Note that cadlag functions have at most countably many discontinuities, so this is fine.)

A metric for D_\infty

If we want to specify an actual metric d_\infty^\circ, the usual tools for specifying a countable product metric will do here:

d_\infty^\circ(f,g)=\sum_{m\geq 1}2^{-m}[1\wedge d_m^\circ(f^m,g^m)],

where f^m is the restriction of f to [0,m], with the potential discontinuity at m smoothed out:

f^m(t)=\begin{cases}f(t)&t\leq m-1\\ (m-t)f(t)&t\in[m-1,m]\\ 0&t\geq m.\end{cases}

In particular, d_\infty^\circ(f,g)=0\Rightarrow f^m=g^m\,\forall m, and hence f=g.
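The two ingredients of this construction can be sketched in Python. Here f^m is taken to agree with f up to time m-1 and to be damped linearly to zero across [m-1,m]. Both function names are hypothetical, and the distances passed to `weighted_sum` are made-up inputs standing in for the values d_m^\circ(f^m,g^m), which this sketch does not attempt to compute.

```python
def smoothed_restriction(f, m):
    """f^m: equal to f on [0, m-1], damped linearly to zero on [m-1, m], zero afterwards."""
    def f_m(t):
        if t <= m - 1:
            return f(t)
        if t <= m:
            return (m - t) * f(t)
        return 0.0
    return f_m

def weighted_sum(distances):
    """Finite truncation of sum_m 2^{-m} min(1, d_m), for given d_1, d_2, ..."""
    return sum(min(1.0, d) / 2 ** m for m, d in enumerate(distances, start=1))

constant_one = lambda t: 1.0
f2 = smoothed_restriction(constant_one, 2)
samples = [f2(0.5), f2(1.5), f2(2.0), f2(3.0)]

# Hypothetical distances: agreement on [0,1] and [0,2], a large gap on [0,3].
total = weighted_sum([0.0, 0.0, 5.0])
print(samples, total)
```

Truncating each d_m^\circ at 1 and weighting by 2^{-m} keeps the sum below 1 however badly the functions disagree at large times, so closeness in d_\infty^\circ is governed by behaviour on bounded time intervals.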

It can be checked that:

Theorem: Let \Lambda_\infty be the family of continuous, strictly increasing bijections of [0,\infty). Then d_\infty^\circ(f_n,f)\rightarrow 0 in D_\infty if and only if

\exists \lambda_n\in\Lambda_\infty\text{ s.t. }||\lambda_n-\text{id}||\rightarrow 0

\text{and }\sup_{t\leq m}|f_n(\lambda_n(t))-f(t)|\rightarrow_n 0,\,\forall m,

and that d_\infty^\circ (f_n,f)\rightarrow 0 \Rightarrow d_t^\circ(f_n,f)\rightarrow 0 for every point of continuity t of f.

Similarly weak convergence and tightness properties are available, roughly as you might expect. It is probably better to reference Billingsley’s book or similar sources rather than further attempting to summarise them here.