Skorohod embedding


Suppose we are given a standard Brownian motion (B_t), and a stopping time T. Then, so long as T satisfies one of the regularity conditions under which the Optional Stopping Theorem applies, we know that \mathbb{E}[B_T]=0. (See here for a less formal introduction to OST.) Furthermore, since B_t^2-t is a martingale, \mathbb{E}[B_T^2]=\mathbb{E}[T], so if the latter is finite, so is the former.

Now, using the strong Markov property of Brownian motion, we can come up with a sequence of stopping times 0=T_0, T_1, T_2,\ldots such that the increments T_k-T_{k-1} are IID with the same distribution as T. Then 0,B_{T_1},B_{T_2},\ldots is a centered random walk. By taking T to be the hitting time of \{-1,+1\}, it is easy to see that we can embed simple random walk in a Brownian motion using this approach.
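This embedding is easy to see in a quick simulation (a sketch of my own, not part of any formal argument): run a discretised Brownian path until it first exits (-1,1), record which endpoint it hit, and restart from there; by the strong Markov property the recorded values form a simple random walk.

```python
import numpy as np

def embed_srw(n_steps, dt=1e-4, seed=0):
    """Embed a simple random walk in a discretised Brownian motion by
    stopping at successive hitting times of +/-1 from the previous stop."""
    rng = np.random.default_rng(seed)
    walk = [0]
    for _ in range(n_steps):
        x = 0.0
        while abs(x) < 1.0:  # run the BM increment until it exits (-1, 1)
            x += np.sqrt(dt) * rng.standard_normal()
        walk.append(walk[-1] + (1 if x > 0 else -1))
    return walk
```

The discretisation overshoots the boundary by O(\sqrt{dt}), so dt has to be small for the embedded walk to be close to exact.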

Embedding simple random walk in Brownian motion.

The Skorohod embedding question asks: can all centered random walks be constructed in this fashion, by stopping Brownian motion at a sequence of stopping times? By the strong Markov property, this immediately reduces to the question of whether every centered finite-variance distribution X can be expressed as B_T for some integrable stopping time T.

The answer to this question is yes, and much of what follows is drawn from, or at least prompted by, Obloj’s survey paper, which details the problem and the rich history of the many approaches to its solution over the past seventy years.

Applications and related things

The relationship between random walks and Brownian motion is a rich one. Donsker’s invariance principle asserts that Brownian motion appears as the scaling limit of a random walk. Indeed, one can construct Brownian motion itself as the limit of a sequence of consistent random walks with normal increments on an increasingly dense set of times. Furthermore, random walks are martingales, and we know that continuous, local martingales can be expressed as a (stochastically) time-changed Brownian motion, from the Dubins-Schwarz theorem.

The Skorohod embedding theorem can be used to prove results about random walks with general increment distributions by proving the corresponding result for Brownian motion, and checking that the construction of the sequence of stopping times has the right properties to carry the result back to the original setting. It also gives a coupling between an individual random walk and a Brownian motion which may be useful in some contexts, as well as a coupling between any pair of random walks. This is useful for proving results for random walks which are much easier for special cases of the distribution, for example when the increments are Gaussian, or when there are combinatorial approaches to a problem about simple random walk. At the moment no aspect of this blog schedule is guaranteed, but I plan to talk about the law of the iterated logarithm shortly; its proof is approachable in both of these settings, as well as for Brownian motion, and Skorohod embedding provides the route to the general proof.

At the end, we will briefly compare some other ways to couple a random walk and a Brownian motion.

Adding extra randomness

One thing we could do is sample a copy of X independently from the Brownian motion, then declare T= \tau_{X}:= \inf\{t\ge 0: B_t=X\}, the hitting time of (random value) X. But recall that unfortunately \tau_x has infinite expectation for all non-zero x, so this doesn’t fit the conditions required to use OST.

Skorohod’s original method is described in Section 3.1 of Obloj’s notes linked above. The method is roughly to pair up positive values taken by X with negative values taken by X in a clever way. If we have a positive value b and a negative value a, then \tau_{a,b}, the first hitting time of \mathbb{R}\backslash (a,b), is integrable. Then we choose one of these positive-negative pairs according to the projection of the distribution of X onto the pairings, and let T be the hitting time of this pair of values. The probability of hitting b conditional on hitting \{a,b\} is easy to compute (it’s \frac{-a}{b-a}), so we need to have chosen our pairs so that the ‘probability’ of hitting b (ie the density) comes out right. In particular, this method has to start from continuous distributions X, and treat atoms in the distribution of X separately.
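Both facts used here, the hitting probability \frac{-a}{b-a} and the integrability of \tau_{a,b} (in fact \mathbb{E}[\tau_{a,b}]=-ab, by optional stopping applied to B_t^2-t), can be checked with a crude Monte Carlo sketch of my own, with made-up parameters:

```python
import numpy as np

def exit_stats(a, b, n_paths=500, dt=2e-3, seed=1):
    """Estimate P(hit b before a) and E[exit time of (a,b)] for BM from 0."""
    rng = np.random.default_rng(seed)
    hit_b, total_time = 0, 0.0
    for _ in range(n_paths):
        x, t = 0.0, 0.0
        while a < x < b:
            x += np.sqrt(dt) * rng.standard_normal()
            t += dt
        hit_b += x >= b
        total_time += t
    return hit_b / n_paths, total_time / n_paths
```

For a=-1, b=2 the theory gives hitting probability 1/3 and expected exit time 2.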

The case where the distribution X is symmetric (that is X\stackrel{d}=-X) is particularly clear, as then the pairs should be (-x,x).

However, it feels like there is enough randomness in Brownian motion already, and subsequent authors showed that indeed it wasn’t necessary to introduce extra randomness to provide a solution.

One might ask whether it’s possible to generate the distribution on the set of pairs (as above) out of the Brownian motion itself, but independently from all the hitting times. It feels like it might be possible to make the distribution on the pairs measurable with respect to

\mathcal{F}_{0+} = \bigcap\limits_{t>0} \mathcal{F}_t,

the sigma-algebra of events determined by limiting behaviour as t\rightarrow 0 (which is independent of the hitting times). But unfortunately \mathcal{F}_{0+} satisfies a zero-one law, so it’s not possible to embed non-trivial distributions there.

Dubins’ solution

The exemplar for solutions without extra randomness is due to Dubins, shortly after Skorohod’s original argument. The idea is to express the distribution X as the almost sure limit of a martingale. We first use the hitting time of a pair of points to ‘decide’ whether we will end up positive or negative, and then given this information look at the hitting time (after this first time) of two subsequent points to ‘decide’ which of four regions of the real interval we end up in.

I’m going to use different notation to Obloj, corresponding more closely with how I ended up thinking about this method. We let

a_+:= \mathbb{E}[X \,|\, X>0], \quad a_- := \mathbb{E}[X\,|\, X<0], (*)

and take T_1 = \tau_{\{a_-,a_+\}}. We need to check that

\mathbb{P}\left( B_{T_1}=a_+\right) = \mathbb{P}\left(X>0\right),

for this to have a chance of working. But we know that

\mathbb{P}\left( B_{T_1}=a_+\right) = \frac{-a_-}{a_+-a_-},

and we can also attack the other side using (*) and the fact that \mathbb{E}[X]=0, via the law of total expectation (assuming for simplicity that \mathbb{P}(X=0)=0):

0=\mathbb{E}[X]=\mathbb{E}[X\,|\, X>0] \mathbb{P}(X>0) + \mathbb{E}[X\,|\,X<0]\mathbb{P}(X<0) = a_+ \mathbb{P}(X>0) + a_- \left(1-\mathbb{P}(X>0) \right),

\Rightarrow\quad \mathbb{P}(X>0)=\frac{-a_-}{a_+-a_-}.
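We can sanity-check this identity numerically for a concrete centered distribution (a toy example of my own):

```python
import numpy as np

# toy centered distribution: X = -2, 1, 3 with probabilities 1/2, 1/4, 1/4
vals = np.array([-2.0, 1.0, 3.0])
probs = np.array([0.5, 0.25, 0.25])
assert abs(vals @ probs) < 1e-12                       # E[X] = 0

pos, neg = vals > 0, vals < 0
a_plus = (vals[pos] @ probs[pos]) / probs[pos].sum()   # E[X | X > 0]
a_minus = (vals[neg] @ probs[neg]) / probs[neg].sum()  # E[X | X < 0]

# gambler's-ruin probability of hitting a_+ before a_- from 0
p_hit = -a_minus / (a_plus - a_minus)
```

Here a_+=2, a_-=-2, and the hitting probability 2/4 = 1/2 agrees with \mathbb{P}(X>0).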

Now we define

a_{++}=\mathbb{E}[X \,|\, X>a_+],\quad a_{+-}=\mathbb{E}[X\,|\, 0<X<a_+],

and similarly a_{-+},a_{--}. So then, conditional on B_{T_1}=a_+, we take

T_2:= \inf\left\{ t\ge T_1: B_t\not\in (a_{+-},a_{++}) \right\},

and similarly conditional on B_{T_1}=a_-. By an identical argument to the one we have just deployed, we have \mathbb{E}\left[B_{T_2} \,|\,\mathcal{F}_{T_1} \right] = B_{T_1} almost surely. So, although the a_{+-+} notation now starts to get very unwieldy, it’s clear we can keep going in this way to get a sequence of stopping times 0=T_0,T_1,T_2,\ldots where B_{T_n} determines which of the 2^n regions of the real line any limit \lim_{m\rightarrow\infty} B_{T_m} should lie in.

A bit of work is required to check that the increasing sequence T_n converges almost surely to a limit T which is almost surely finite, but once we have this, it is clear that B_{T_n}\rightarrow B_T almost surely, and B_T has the required distribution.

Komlos, Major, Tusnady coupling

We want to know how close we can make this coupling between a centered random walk with variance 1, and a standard Brownian motion. Here, ‘close’ means uniformly close in probability. For large times, the typical difference between one of the stopping times 0,T_1,T_2,\ldots in the Skorohod embedding and its expectation (recall \mathbb{E}[T_k]=k) is of order \sqrt{n}. So, constructing the random walk S_0,S_1,S_2,\ldots from the Brownian motion via Skorohod embedding leads to

\left|S_k - B_k\right| \approx n^{1/4},

for most values of k\le n. Strassen (1966) shows that the true scale of the maximum

\max_{k\le n} \left| S_k - B_k \right|

is slightly larger than this, with some extra powers of \log n and \log\log n as one would expect.

The Komlos-Major-Tusnady coupling is a way to do a lot better than this, in the setting where the distribution of the increments has a finite MGF near 0. Then, there exists a coupling of the random walk and the Brownian motion such that

\max_{k\le n}\left|S_k- B_k\right| = O(\log n).

That is, there exists C such that

\left[\max_{k\le n} \left |S_k-B_k\right| - C\log n\right] \vee 0

is a tight family of distributions, indeed with a uniform exponential tail. To avoid digressing infinitely far from my original plan to discuss the proof of the law of the iterated logarithm for general distributions, I’ll stop here. I found it hard to find much coverage of the KMT result apart from the challenging original paper, and many versions are expressed in the language of empirical processes, which are similar to random walks in many ways relevant to convergence and this coupling, but not for Skorohod embedding. So, here is a link to some slides from a talk by Chatterjee which I found helpful in getting a sense of the history, and of some of the modern approaches to this type of normal approximation problem.

Sticky Brownian Motion

This follows on pretty much directly from the previous post about reflected Brownian motion. Recall that this is a process defined on the non-negative reals which looks like Brownian motion away from 0. We consider whether RBM is the only such process, and how any alternative might be constructed as a limit of discrete-time Markov processes.

One of the alternatives is called Sticky Brownian motion. This process spends more time at 0 than reflected Brownian motion. In fact it spends some positive proportion of time at 0. My main aim here is to explain why some intuitive ideas I had about how this might arise are wrong.

The first thought was to ensure that each visit to 0 lasts some positive measure of time. This could be achieved by staying at 0 for an Exp(1) duration on every visit. It doesn’t seem unreasonable that this might appear as the limit of a standard SRW adjusted so that, on each visit to 0, the walker waits for an independent geometrically distributed time. These distributions are memoryless, so that is fine, but by Blumenthal’s 0-1 Law, a Brownian motion started from 0 hits zero infinitely many times before any small time t. So in fact the process described above would be identically zero: before it could get anywhere, it would have to spend an amount of time at 0 given by an infinite sum of Exp(1) random variables.

We will return later to the question of why the proposed discrete-time model will still converge to reflected BM rather than anything more exotic. First though, we should discount the possibility of any intermediate level of stickiness, where the set of times spent at 0 still has measure zero, but the local time at 0 grows faster than for standard reflected BM. We can define the local time at 0 through a limit

L_t=\lim_{\epsilon\downarrow 0}\frac{1}{2\epsilon}\text{Leb}(\{0\le s \le t: |B_s|<\epsilon\})

of the measure of time spent very near 0, rescaled appropriately. So if the measure of the times when the process is at 0 is zero, then the local time is determined by behaviour near zero rather than by behaviour at zero. More precisely, on the interval [-\epsilon,\epsilon], the process behaves like Brownian motion, except on a set of measure zero, so the local time process should look the same as that of BM itself. Note I don’t claim this as a formal proof, but I hope it is a helpful heuristic for why you can’t alter the local time process without altering the whole process.

At this stage, it seems sensible to define Sticky Brownian motion. For motivation, note that we are looking for a process which spends a positive measure of time at 0. So let’s track this time as a process, say C_t. The set of times when C is increasing coincides with the set of times when the process is at 0, and this set contains no intervals, since we cannot wait around at 0 for an interval of time without losing the Markov property. So C shares properties with the local time of a reflected BM. The only difference is that the measure of the set of times when C is increasing is positive here, but zero for the local time.

So it makes sense to construct the extra time spent at zero from the local time of a standard reflected BM. The heuristic is that we slow down the process whenever it is at 0, so that local time becomes real time. We can also control the factor by which this slowing-down happens, so define

\sigma(s)=\rho L(s)+s,

where L is the local time process of an underlying reflected BM, and \rho>0 is a constant. So \sigma is a map giving a random time-change. Unsurprisingly, we now define Sticky BM as the reflected BM with respect to this time-change. To do this formally, it is easiest to define a family of stopping times \{\tau_t\}, such that \sigma(\tau_t)=t, \tau_{\sigma(s)}=s, then if X is the reflected BM, define Y_t=X_{\tau_t} for the sticky BM.
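Here is a rough discretised sketch of this construction (entirely my own; the local-time approximation and its normalisation are crude, so this is illustrative only): approximate L by the rescaled occupation of [0,\epsilon), build \sigma, and sample the time-changed path on an even grid.

```python
import numpy as np

# My own crude discretisation: reflected BM, approximate local time,
# then the time change sigma(s) = rho * L(s) + s.
rng = np.random.default_rng(2)
dt, n, rho = 1e-4, 10_000, 5.0
b = np.cumsum(np.sqrt(dt) * rng.standard_normal(n))
x = np.abs(b)                                 # reflected BM on [0, 1]
eps = np.sqrt(dt)
L = np.cumsum((x < eps) * dt) / (2 * eps)     # approximate local time at 0
sigma = rho * L + dt * np.arange(1, n + 1)    # the random time change
# invert sigma on an even grid: Y_t = X_{tau_t}
tau = np.searchsorted(sigma, np.linspace(0, sigma[-1], n))
y = x[np.minimum(tau, n - 1)]                 # sticky BM sampled on the grid
```

The point is that the fraction of (real) time the time-changed path spends at 0 is now macroscopic, roughly \rho L_1/(\rho L_1 + 1), while the reflected path spends measure zero there in the limit.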

It is worth thinking about what the generator of this process should be. In particular, why should it be different to reflected BM? The key observation is that the drift of the underlying reflected BM is essentially infinite at 0. By slowing down the process at 0, this drift becomes finite. So the martingale associated with sticky BM is precisely a time-changed version of the martingale associated with the underlying reflected BM, and this time-change is precisely what is required to give a well-defined generator at 0. We get:

(\mathcal{L}f)(x)=\begin{cases}\frac12f''(x)&\quad x>0\\ \rho^{-1}f'(0) &\quad x=0.\end{cases}

Now that we have the generator, it starts to become apparent how sticky BM might appear as a limit of discrete-time walks. The process must look like mean-zero, unit-variance RW everywhere except near 0, where the limiting drift should be \rho^{-1}. Note that when considering the limiting drift near zero, we are taking a joint limit in x and h. The order of this matters. As explained at the end of the previous article, we only need to worry about the limiting drift along sequences of x,h such that a_h(x)\rightarrow 0. If no such sequences exist, or the limiting drift along any of these is infinite, then we actually have a reflected boundary condition.

This highlights one confusing matter about convergence of reflected processes. The boundary of the discrete-time process should converge to the boundary of the reflected process, but we also have to consider where reflective behaviour happens. Can we get sticky BM with reflection only at the boundary in the discrete-time processes? The answer turns out to be no. At the start of this article, I proposed a model of SRW with geometric waiting times whenever the origin was visited. What is the limit of this?

The trick is to consider how long the discrete process spends near 0, after rescaling. It will spend a factor 1/p more time at 0 itself, where p is the parameter of the geometric distribution, but no more time than expected at any point x\in(0,\epsilon). But time spent in (0,\epsilon) dominates time spent at 0 before this adjustment, so must also dominate it after the adjustment. So in the limit, the proportion of time spent at 0 is unchanged, and in particular it cannot be positive.

Because of all of this, in practice it seems that most random walks we might be interested in converge (if they converge to a process at all) to a reflected SDE/diffusion etc, rather than one with sticky boundary conditions. I feel I’ve been talking a lot about Markov processes converging, so perhaps next, or at least soon, I’ll write some more technical things about exactly what conditions and methods are required to prove this.


S. Varadhan – Chapter 16 from a Lecture Course at NYU can be found here.


Reflected Brownian Motion

A standard Brownian motion is space-homogeneous, meaning that the behaviour of B_{T+t}-B_T does not depend on the value of B_T. By Donsker’s Theorem, such a Brownian motion is also the limit in a process space of any homogeneous random walk with zero-drift and constant variance, after suitable rescaling.

In many applications, however, we are interested in real-valued continuous-time Markov processes that are defined not on the whole of the real line, but on the half-line \mathbb{R}_{\ge 0}. So as BM is the fundamental real-valued continuous-time Markov process, we should ask how we might adjust it so that it stays non-negative. In particular, we want to clarify uniqueness, or at least be sure we have found all the sensible ways to make this adjustment, and also to consider how Donsker’s Theorem might work in this setting.

We should consider what properties we want this non-negative BM to have. Obviously, it should be non-negative, but it is also reasonable to demand that it looks exactly like BM everywhere except near 0. But since BM has a scale-invariance property, it is essentially meaningless to say ‘near 0’, so we instead demand that it looks exactly like BM everywhere except at 0. Apart from this, the only properties we want are that it is Markov and has continuous sample paths.

A starting point is so-called reflected Brownian motion, defined by X_t:=|B_t|. This is very natural and very convenient for analysis, but there are some problems. Firstly, this has the property that it looks like Brownian motion everywhere except 0 only because BM is space-homogeneous but also symmetric, in the sense that B_t\stackrel{d}{=}-B_t. This will be untrue for essentially any other process, so as a general method for how to keep stochastic processes positive, this will be useless. My second objection is a bit more subtle. If we consider this as an SDE, we get
dX_t = \text{sgn}(B_t)\,dB_t + dL_t^0,

where L^0 is the local time of B at 0 (this is Tanaka’s formula).


This is a perfectly reasonable SDE but it is undesirable, because we have a function of B as a coefficient on the RHS. Ideally, increments of X would be a function of X and the increments of B, rather than of the values of B. That is, we would expect X_{t+\delta t}-X_t to depend on X_t and on (B_{t+s}-B_t,\, 0\le s\le \delta t), but not on B_t itself, as otherwise we have to keep track of extra information while constructing X.

So we need an alternative method. One idea might be to add some non-negative process to the BM so that the sum stays non-negative. If this added process is deterministic and finite, then there is some positive probability that the sum will eventually be negative, so this won’t do. We are looking therefore for a process which depends on the BM. Obviously we could take \max(-B_t,0), but the sum would then spend macroscopic intervals of time at 0, and these intervals would have the Rayleigh distribution (for Brownian excursions) rather than the exponential distribution, hence the process given by the sum would not be memoryless, hence not Markov.

The natural alternative is to look for an increasing process A_t, and then it makes sense to talk about the minimal increasing process with the desired property. A moment’s thought suggests that A_t=-\min_{s\le t}B_s satisfies this. So we have the decomposition
B_t = -A_t + S_t,


where S_t is the height of B above its running minimum. So S is an ideal alternative definition of reflected BM. In particular, when B is away from its minimum, dB_t=dS_t, so S has the property that it evolves exactly as the driving Brownian motion.

What we have done is to decompose a general continuous process into the sum of a decreasing continuous process and a non-negative process. This is known as the Skorohod problem, and was the subject of much interest, even in the deterministic case. Note that process A has the property that it is locally constant almost everywhere, and is continuous, yet non-constant. Unsurprisingly, since A only changes when the underlying BM is 0, A is continuous with respect to the local time process at 0. In fact, A is the local time process of the underlying Brownian motion, by comparison with the construction by direct reflection.
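The decomposition is easy to see on a simulated path (a small sketch of my own): the running minimum gives A, and S=B+A is non-negative, with A increasing only at times when S is at 0.

```python
import numpy as np

rng = np.random.default_rng(3)
# a discretised Brownian path started from 0
b = np.concatenate([[0.0], np.cumsum(np.sqrt(1e-3) * rng.standard_normal(5000))])

a = -np.minimum.accumulate(b)   # A_t = -min_{s<=t} B_s, increasing since B_0 = 0
s = b + a                       # S_t = B_t + A_t, the reflected process
```

Whenever A increases at a step, the path has just set a new minimum, so S is exactly zero there.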

One alternative approach is to look instead at the generator of the process. Recall that the generator of a process is an operator on some space of functions, with \mathcal{L}f giving the infinitesimal drift of f(X_t). In the case of Brownian motion, the generator is (\mathcal{L}f)(x)=\frac12 f''(x) for bounded smooth functions f. This is equivalent to saying that

f(X_t)-f(X_0)-\int_0^t \frac12 f''(X_s)ds (*)

is a martingale. This must also hold for reflected Brownian motion while the process is away from 0. Alternatively, if the function f is zero in a small neighbourhood of 0, it should have the same generator with respect to reflected BM. Indeed, for a general smooth bounded function f, we can still consider the expression (*) with respect to reflected BM. We know this expression behaves as a martingale except when X is zero. If f'(0)>0, and T is some hitting time of 0, then f(X_{T+\delta t})-f(X_T)\ge 0, hence the expression (*) is a submartingale. So if we restrict attention to functions with f'(0)=0, the generator remains the same. Indeed, by patching together all such intervals, it can be argued that even if f'(0) is not zero,

f(X_t)-f(X_0)-\int_0^t \frac12 f''(X_s)ds - f'(0)A_t

is a martingale, where A is the local time process at zero.

I was aware when I started reading about this that there was another family of processes called ‘Sticky Brownian Motion’ that shared properties with Reflected BM, in that it behaves like standard BM away from zero, but is also constrained to the non-negative reals. I think this will get too long if I also talk about that here, so that can be postponed, and for now we consider reflected BM as a limit of reflected (or other) random walks, bearing in mind that there is at least one other candidate to be the limit.

Unsurprisingly, if we have a family of random walks constrained to the non-negative reals, that are zero-drift unit-variance away from 0, then if they converge as processes, the limit is Brownian away from zero, and non-negative. Note that “away from 0” means after rescaling. So the key aspect is behaviour near zero.

What is the drift of reflected BM at 0? We might suspect it is infinite because of the form of the generator, but we can calculate it directly. Given X_0=0, we have:
\mathbb{E}\left[ X_t \,|\, X_0=0 \right] = \mathbb{E}|B_t| = \sqrt{\frac{2t}{\pi}}, \quad\text{giving average drift}\quad \frac{1}{t}\,\mathbb{E}[X_t] = \sqrt{\frac{2}{\pi t}},


so letting t\rightarrow 0, we see indeed that the drift is infinite at 0.

For convergence of discrete processes, we really need the generators to converge. Typically we index the discrete-time processes by the time unit h, which tends to 0, and b_h(x),a_h(x) are the rescaled drift and square-drift from x. We assume that we don’t see macroscopic jumps in the limit. For the case of simple random walk reflected at 0, it doesn’t matter exactly how we construct the joint limit in h and x, as the drift is uniform on x>0, but in general this does matter. I don’t want to discuss sticky BM right now, so it’s probably easiest to be vague and say that the discrete Markov processes converge to reflected BM so long as they don’t spend more time than expected near 0 in the limit, as the title ‘sticky’ might suggest.

The two ways in which this can happen are: the volatility term a_h(x) is too small, in which case the process looks almost deterministic near 0; or the drift doesn’t increase fast enough. And indeed, this leads to two conditions. The first is straightforward: if a_h(x) is bounded below, in the sense that \liminf_{h,x\rightarrow 0} a_h(x)\ge C>0, then we have convergence to reflected BM. Alternatively, the only danger can arise down those subsequences where a_h(x)\rightarrow 0, so if we have that b_h(x)\rightarrow +\infty whenever h,x,a_h(x)\rightarrow 0, then this convergence also holds.

Next time I’ll discuss what sticky BM means, what it doesn’t mean, why it isn’t easy to double the local time, and how to obtain sticky BM as a limit of discrete random walks in a similar way to the above.




Supremum of Brownian Motion

We define the supremum process of Brownian Motion by:

S_t:=\sup_{0\leq s\leq t}B_s.

Here are two facts about Brownian Motion. Firstly, the Reflection Principle:

\mathbb{P}(S_t\geq b,B_t\leq a)=\mathbb{P}(B_t\geq 2b-a),

which we motivate by reflecting the path after the first hitting time of b, which is a stopping time, and applying the SMP for Brownian Motion there. By setting a=b, we get:

\mathbb{P}(S_t\geq b)=\mathbb{P}(S_t\geq b,B_t\leq b)+\mathbb{P}(B_t\geq b)=2\mathbb{P}(B_t\geq b)=\mathbb{P}(|B_t|\geq b),

and conclude that

S_t\stackrel{d}{=}|B_t|\quad\text{for each }t\geq 0.
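A quick Monte Carlo check of this identity in distribution (a sketch of my own; note the discrete maximum slightly undershoots the true supremum): both S_1 and |B_1| should have mean \mathbb{E}|B_1|=\sqrt{2/\pi}\approx 0.798.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps, dt = 5000, 1000, 1e-3
paths = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n_steps)), axis=1)

sup = np.maximum(paths.max(axis=1), 0.0)   # S_1 (B_0 = 0 is included in the sup)
absB = np.abs(paths[:, -1])                # |B_1|
```

With this many paths, both sample means land within a few hundredths of \sqrt{2/\pi}.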

The second fact comes from the decomposition of BM into local times and excursions:

(S_t,S_t-B_t)_{t\geq 0}\stackrel{d}{=}(L_t,|B_t|)_{t\geq 0},

where L is the local time process at 0, and this equality in distribution holds for the processes. See the previous post on excursion theory for explanation of what local times mean.

In particular, combining these two facts gives:

S_t\stackrel{d}{=}S_t-B_t\quad\text{for every }t\geq 0.

I thought that was rather surprising, and wanted to think of a straightforward reason why this should be true. I think the following works:

Brownian motion is time-reversible. In particular, as processes, we have

(B_s)_{s\geq 0}\stackrel{d}{=}(B_{t-s}-B_t)_{s\geq 0}

\Rightarrow \sup_{0\leq r\leq t}B_r\stackrel{d}{=}\sup_{0\leq r\leq t}(B_{t-r}-B_t)

\Rightarrow S_t\stackrel{d}{=}S_t-B_t.

Subordinators and the Arcsine rule

After the general discussion of Levy processes in the previous post, we now discuss a particular class of such processes. The majority of content and notation below is taken from chapters 1-3 of Jean Bertoin’s Saint-Flour notes.

We say X_t is a subordinator if:

  • It is a right-continuous adapted stochastic process, started from 0.
  • It has stationary, independent increments.
  • It is increasing.

Note that the first two conditions are precisely those required for a Levy process. We could also allow the process to take the value \infty, where the hitting time of infinity represents ‘killing’ the subordinator in some sense. If this hitting time is almost surely infinite, we say it is a strict subordinator. There is little to be gained right now from considering anything other than strict subordinators.


Some examples:

  • A compound Poisson process, with finite jump measure supported on [0,\infty). Hereafter we exclude this case, as it is better dealt with in other languages.
  • A so-called stable Levy process, where \Phi(\lambda)=\lambda^\alpha, for some \alpha\in(0,1). (I’ll define \Phi very soon.) Note that checking that the sample paths are increasing requires only that X_1\geq 0 almost surely.
  • The hitting time process for Brownian Motion. Note that this does indeed have jumps as we would need. (This has \Phi(\lambda)=\sqrt{2\lambda}.)


  • In general, we describe Levy processes by their characteristic exponent. As a subordinator takes values in [0,\infty), we can use the Laplace exponent instead:

\mathbb{E}\exp(-\lambda X_t)=:\exp(-t\Phi(\lambda)).

  • We can refine the Levy-Khintchine formula:

\Phi(\lambda)=k+d\lambda+\int_{[0,\infty)}(1-e^{-\lambda x})\Pi(dx),

  • where k is the kill rate (in the non-strict case). Because the process is increasing, it must have bounded variation, and so the quadratic part vanishes, and we have a stronger condition on the Levy measure: \int(1\wedge x)\Pi(dx)<\infty.
  • The expression \bar{\Pi}(x):=k+\Pi((x,\infty)) for the tail of the Levy measure is often more useful in this setting.
  • We can think of this decomposition as the sum of a drift, and a PPP with characteristic measure \Pi+k\delta_\infty. As we said above, we do not want to consider the case that X is a step process, so either d>0 or \Pi((0,\infty))=\infty is enough to ensure this.
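For the Brownian hitting-time example, we can check \Phi(\lambda)=\sqrt{2\lambda} numerically, using the identity T_1\stackrel{d}{=}1/Z^2 for Z a standard normal (which follows from the reflection principle):

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.standard_normal(200_000)
T1 = 1.0 / z**2      # first hitting time of level 1 by a standard BM

def laplace(lam):
    """Monte Carlo estimate of E[exp(-lam * T1)]."""
    return np.exp(-lam * T1).mean()
```

The estimates match \exp(-\sqrt{2\lambda}) to two or three decimal places.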

Analytic Methods

We give a snapshot of a couple of observations which make these nice to work with. Define the renewal measure U(dx) by:

\int_{[0,\infty)}f(x)U(dx)=\mathbb{E}\left(\int_0^\infty f(X_t)dt\right).

If we want to know the distribution function of this U, it will suffice to consider the indicator functions f(y)=1_{\{y\leq x\}} in the above.

The reason to exclude step processes specifically is to ensure that X has a continuous inverse:

L_x=\sup\{t\geq 0:X_t\leq x\} so U(x)=\mathbb{E}L_x is continuous.

In fact, this renewal measure characterises the subordinator uniquely, as we see by taking the Laplace transform:

\mathcal{L}U(\lambda)=\int_{[0,\infty)}e^{-\lambda x}U(dx)=\mathbb{E}\int e^{-\lambda X_t}dt

=\int \mathbb{E}e^{-\lambda X_t}dt=\int\exp(-t\Phi(\lambda))dt=\frac{1}{\Phi(\lambda)}.

The Arcsine Law

X is Markov, which induces a so-called regenerative property on the range of X, \mathcal{R}. Formally, given s, we do not always have s\in\mathcal{R} (as the process might jump over s), but we can define D_s=\inf\{t>s:t\in\mathcal{R}\}. Then

\{v\geq 0:v+D_s\in\mathcal{R}\}\stackrel{d}{=}\mathcal{R}.

In fact, the converse holds as well. Any random set with this regenerative property is the range of some subordinator. Note that D_s is some kind of dual to X, since it is increasing, and the regenerative property induces some Markovian properties.

In particular, we consider the last passage time g_t=\sup\{s<t:s\in\mathcal{R}\}, in the case of a stable subordinator with \Phi(\lambda)=\lambda^\alpha. Here, \mathcal{R} is self-similar with scaling exponent \alpha. The distribution of \frac{g_t}{t} is thus independent of t. In this situation, we can derive the generalised arcsine rule for the distribution of g_1:

\mathbb{P}(g_1\in ds)=\frac{\sin \alpha\pi}{\pi}s^{\alpha-1}(1-s)^{-\alpha}ds.

The most natural application of this is to the hitting time process of Brownian Motion, which is stable with \alpha=\frac12. Here the range \mathcal{R} is the set of times at which the BM attains its running supremum, so g_1 is the last time t\le 1 with B_t=S_t, in the usual notation for the supremum process. Furthermore, we have equality in distribution of the processes (see previous posts on excursion theory and the short aside which follows):

(S_t-B_t)_{t\geq 0}\stackrel{d}{=}(|B_t|)_{t\geq 0}.

So g_1 gives the time of the last zero of BM before time 1, and the arcsine law shows that its distribution is given by:

\mathbb{P}(g_1\leq t)=\frac{2}{\pi}\text{arcsin}\sqrt{t}.
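The arcsine law is easy to see in simulation (a sketch of my own, using simple random walk as a stand-in for BM): the last visit to 0 before time n, rescaled by n, should have CDF approximately \frac{2}{\pi}\text{arcsin}\sqrt{t}.

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, n = 5000, 1000
walks = np.cumsum(rng.choice([-1, 1], size=(n_paths, n)), axis=1)

# last visit to 0, as a fraction of total time (0 if the walk never returns)
last_zero = np.array([(np.flatnonzero(w == 0).max() + 1) / n
                      if (w == 0).any() else 0.0 for w in walks])
```

For instance, the CDF at t=1/4 should be close to \frac{2}{\pi}\text{arcsin}(1/2)=\frac13, and at t=1/2 close to 1/2.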

Brownian Excursions and Local Time

I’ve been spending a fair bit of time this week reading and thinking about the limits of various combinatorial objects, in particular letting the number of vertices tend to \infty in models of random graphs with various constraints. Perhaps predictably, like so many continuous stochastic objects, yet again the limiting ‘things’ turn out to be closely linked to Brownian Motion. As a result, I’ve ended up reading a bit about the notion of local time, and thought it was sufficiently elegant even by itself to justify a quick post.

Local Time

In general, we might be interested in calculating a stochastic integral like

\int_0^t f(B_s)ds.

Note that, except in some highly non-interesting cases, this is a random variable. Our high-school understanding of Riemann integration encourages thinking of this as a ‘pathwise’ integral along the path evolving in time. But of course, that’s orthogonal to the approach we start thinking about when we are introduced to the Lebesgue integral. There we think about potential values of the integrand, and weight their contribution by the (Lebesgue) measure of the subset of the domain on which they appear.

Can we do the same for the stochastic integral? That is, can we find a measure which records how long the Brownian Motion spends at a point x? This measure will not be deterministic – effectively the stochastic behaviour of BM will be encoded through the measure rather than the argument of the function.

The answer is yes, and the measure in question is referred to as local time. More formally, we want

\int_0^t f(B_s)ds=\int_\mathbb{R}f(x)L(t,x)dx. (*)

where the local time L(t,x) is a random process, increasing for fixed x. Informally, one could take

\partial_t L(t,x) \propto 1(B_t=x)

but clearly in practice that won’t do at all for a definition, and so instead we use (*). In the usual way, if we want (*) to hold for all reasonably nice functions f, it suffices to check it for the indicator functions of Borel sets. The map x\mapsto L(t,x) is therefore often referred to as the occupation density, while t\mapsto L(t,x), for fixed x, is the local time at x.
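To make (*) concrete, here is a small simulation sketch (the histogram estimate of L is my own crude discretisation, not a standard construction): the time integral of f(B_s) is compared with the integral of f against an occupation-time histogram.

```python
import math, random

random.seed(1)
# Simulate a Brownian path on [0,1] and check the identity (*) for
# f(x) = x^2, estimating L(1,x) by a histogram of the time the path
# spends in each spatial bin of width h.
dt, n = 1e-4, 10_000
path = [0.0]
for _ in range(n):
    path.append(path[-1] + math.sqrt(dt) * random.gauss(0, 1))

f = lambda x: x * x

# Left side of (*): the time integral of f along the path.
lhs = sum(f(x) for x in path[:-1]) * dt

# Right side of (*): the density estimate (time in bin)/h, integrated
# against f over each bin of width h, gives f(midpoint) * time in bin.
h = 0.05
time_in_bin = {}
for x in path[:-1]:
    k = math.floor(x / h)
    time_in_bin[k] = time_in_bin.get(k, 0.0) + dt
rhs = sum(f((k + 0.5) * h) * t for k, t in time_in_bin.items())
print(lhs, rhs)
```

The two sides differ only because f is evaluated at bin midpoints on the right, so they agree up to discretisation error in h.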

Local Time as natural index for Excursions

An excursion, for example of Brownian Motion, is a segment of the path that has zero value only at its endpoints. Alternatively, it is a maximal open interval of time such that the path is away from 0. We want to specify the measure on these excursions. Here are some obvious difficulties.

By Blumenthal’s 0-1 law, BM started from zero hits zero infinitely often in any time interval [0,\epsilon], so in the same way that there is no first positive rational, there is no first excursion. We could instead pick the excursion in progress at a fixed time t, but this is little better. Firstly, the resulting measure is size-biased by the length of the excursion, and more importantly, the proximity of t to the origin may be significant unless we know of some memoryless-type property for excursions.

Local time allows us to solve these problems. We restrict attention to L_t:=L(t,0), the occupation density of 0. Let’s think about some advantages of indexing excursions by local time rather than by the start time:

  • The key observation is that local time remains constant on excursions. That is, if we are avoiding 0, the local time at 0 cannot grow because the BM spends no time there!
  • If we use start time, then we have a countably infinite number of small excursions accumulating close to 0, ie with very small start time. However, local time increases rapidly when there are lots of small excursions. Remember, lots of small excursions means that the BM hits 0 lots of times. So local time grows quickly through the annoying bits, and effectively provides a size-biasing for excursions that allows us to ignore the effects of the ‘Blumenthal excursions’ near time 0.
  • When indexed by local time, excursions might be Markovian, in the sense that subsequent excursions (and in particular their lengths) are independent of past excursions. This is certainly not the case if you index by start time! If an excursion starts at time t and has length u, then the ‘next’ excursion, in as much as that makes sense, must surely start at time t+u.

We know there are only countably many excursions, hence there are only countably many local times which pertain to an excursion. This motivates considering the set of excursions as a Poisson Point Process on local time. Once you’ve had this idea, everything follows quite nicely. Working out the distribution of the constant rate (which is a measure on the set of excursions) remains, but essentially we now have a sensible framework for tracking the process of excursions, and from this we can reconstruct the original Brownian Motion.
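The discrete analogue is easy to play with. Here is a sketch (my own bookkeeping, assuming the natural discrete definition of local time at 0 as the number of visits so far) which decomposes a simple random walk into excursions indexed by local time:

```python
import random

random.seed(2)
# Decompose a simple random walk path into excursions away from 0,
# indexing each completed excursion by the walk's local time at 0
# (counted here as the number of visits to 0 so far).
steps = [random.choice((-1, 1)) for _ in range(100_000)]
pos, local_time, start = 0, 0, 0
excursions = {}  # local time index -> excursion length
for i, step in enumerate(steps, 1):
    prev, pos = pos, pos + step
    if prev == 0:        # leaving 0: a new excursion starts
        local_time += 1
        start = i - 1
    if pos == 0:         # back at 0: the excursion is complete
        excursions[local_time] = i - start

print(len(excursions))
```

Note that local time is indeed constant during each excursion, since it only ticks up at visits to 0, and each local time index carries exactly one excursion.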

Dubins-Schwarz Theorem

In developing the stochastic integral, much of our motivation has come from considering integrals with respect to Brownian Motion. In this section, we develop some results which justify that Brownian Motion is the canonical stochastic process with non-zero quadratic variation (which is related, but not directly equivalent to the property of infinite total variation). In particular, we shall observe the Dubins-Schwarz theorem, which shows that continuous local martingales with unbounded (as time \rightarrow\infty) quadratic variation ARE Brownian Motion, up to a (stochastic) time change.

Recall Levy’s characterisation of a d-dimensional BM, which allows us to avoid considering independent normal increments. Given X^1,\ldots,X^d\in\mathcal{M}_{c,loc}:

X=(X^1,\ldots,X^d) a BM iff [X^i,X^j]_t=\delta_{ij}t

Obviously, one direction has been shown as part of the construction and properties of quadratic variation. For the other direction, because laws are precisely determined by characteristic functions, it suffices to show that

\mathbb{E}\left[\exp(i\langle \theta,X_t-X_s\rangle)|\mathcal{F}_s\right]=\exp(-\frac12||\theta||^2(t-s))

We set Y_t:=\langle \theta,X_t\rangle, and deduce [Y]_t=t||\theta||^2, so Z:=\mathcal{E}(iY)=\exp(iY_t+\frac12[Y]_t)\in\mathcal{M}_{c,loc}; furthermore Z is bounded on compact intervals [0,t], hence a true martingale. So \mathbb{E}\left(\frac{Z_t}{Z_s}|\mathcal{F}_s\right)=1, which is pretty much what was required.
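The target identity itself is easy to verify empirically. This is not part of the proof, just a sanity check (plain Python, sampling B_t directly as a Gaussian vector) of the characteristic function of a 2-dimensional BM:

```python
import cmath, math, random

random.seed(3)
# Empirical check that E[exp(i<theta, B_t>)] = exp(-||theta||^2 t / 2)
# for a 2-dimensional Brownian motion at a fixed time t.
theta = (0.6, -0.8)  # an arbitrary test vector, here with ||theta|| = 1
t, n = 1.0, 20_000
total = 0j
for _ in range(n):
    b1 = math.sqrt(t) * random.gauss(0, 1)
    b2 = math.sqrt(t) * random.gauss(0, 1)
    total += cmath.exp(1j * (theta[0] * b1 + theta[1] * b2))
est = total / n
exact = math.exp(-0.5 * (theta[0] ** 2 + theta[1] ** 2) * t)
print(est.real, exact)
```

The imaginary part of the estimate should vanish, reflecting the symmetry of the increments.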

Now, Dubins-Schwarz states

Theorem: Given M\in\mathcal{M}_{c,loc}, M_0=0, [M]_\infty=\infty almost surely, if we set \tau_s:=\inf\{t:[M]_t>s\}, then B_s:=M_{\tau_s} is a (\mathcal{F}_{\tau_s})-BM, with M_t=B_{[M]_t}.

This final result is clear if [M]_t is almost surely strictly increasing in t: just take s=[M]_t in the definition.

Since M is continuous and s\mapsto\tau_s is increasing and right-continuous, B is cadlag; we first show B as defined is almost surely continuous. It remains to show B_{s-}=B_s\,\forall s>0, i.e. M_{\tau_{s-}}=M_{\tau_s}, noting that \tau_{s-}=\inf\{t\geq 0:[M]_t=s\} (by continuity of [M]) is also a stopping time.

The only interesting case is \tau_{s-}<\tau_s, for which we need to show M is constant on [\tau_{s-},\tau_s]. This is intuitively obvious, but formally, we must appeal to (M^2-[M])^{\tau_s}, which is UI since \mathbb{E}[M^{\tau_s}]_\infty<\infty. We may now apply OST to obtain \mathbb{E}[M_{\tau_s}^2-M_{\tau_{s-}}^2|\mathcal{F}_{\tau_{s-}}]=\mathbb{E}[(M_{\tau_s}-M_{\tau_{s-}})^2|\mathcal{F}_{\tau_{s-}}]=0 (using that [M]_{\tau_{s-}}=[M]_{\tau_s}=s), which implies M is almost surely constant on [\tau_{s-},\tau_s]. We need to lift this to the statement that it holds for all s simultaneously almost surely. Note that cadlag almost surely plus almost sure continuity at each fixed point does not imply almost sure continuity everywhere (eg consider the process t\mapsto H(t-U), with H the Heaviside function and U uniform on [0,1]). Instead, we record intervals of constancy of both M_t and [M]_t. That is, we set

T_r=\inf\{t>r:M_t\neq M_r\},\quad S_r=\inf\{t>r:[M]_t\neq [M]_r\}

Then these are cadlag, and by the above T_r=S_r\,\forall r\in\mathbb{Q}^+ almost surely, therefore T_r=S_r\,\forall r almost surely. Thus M and [M] are constant on the same intervals.

We also check B is adapted to \mathcal{G}_t=\mathcal{F}_{\tau_t}. It suffices to prove that X_T1_{\{T<\infty\}} is \mathcal{F}_T-measurable for X cadlag adapted and T a stopping time. Approximating T discretely from above gives the result, exploiting that the result is clear if T has countable support. Now, M^{\tau_s}\in\mathcal{M}_c^2, so (M_{t\wedge \tau_s})_t is UI by Doob, so by OST, \mathbb{E}[M_{\tau_s}|\mathcal{F}_{\tau_r}]=M_{\tau_r} for r\leq s, giving that B is a (\mathcal{G}_t)-martingale. Finally, applying the same argument to (M^2-[M])^{\tau_s} gives

\mathbb{E}[B_s^2-s|\mathcal{G}_r]=\mathbb{E}[M_{\tau_s}^2-[M]_{\tau_s}|\mathcal{F}_{\tau_r}]=M_{\tau_r}^2-[M]_{\tau_r}=B_r^2-r,

and so we can apply Levy’s characterisation to finish the result.
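To see the theorem in action, here is a simulation sketch (my own discretisation; for simplicity the integrand is deterministic, so [M] and the time change are deterministic too, which is the easiest case of the theorem): we run M_t=\int_0^t(1+u)dB_u until its quadratic variation reaches s, and check that the stopped value looks like N(0,s).

```python
import math, random

random.seed(4)

def time_changed_value(s_target, dt=1e-3):
    """Simulate M_t = int_0^t (1+u) dB_u until its quadratic variation
    [M]_t = int_0^t (1+u)^2 du exceeds s_target, and return M there.
    By Dubins-Schwarz, B_s := M_{tau_s} should be N(0, s)."""
    t, m, qv = 0.0, 0.0, 0.0
    while qv < s_target:
        sigma = 1.0 + t
        m += sigma * math.sqrt(dt) * random.gauss(0, 1)
        qv += sigma * sigma * dt
        t += dt
    return m

samples = [time_changed_value(1.0) for _ in range(2_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)
```

Although M_1 itself has variance \int_0^1(1+u)^2du=\frac73, stopping when the quadratic variation hits 1 recovers unit variance, exactly as the time change predicts.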

Brownian Motion is not finite variation

There is a natural definition of ‘pathwise’ stochastic integrals of a certain type of ‘simple’ process with respect to cadlag non-decreasing processes. It can be shown that a function is of finite variation iff it can be expressed as the difference of two such functions. Hence, these finite variation processes can be used as integrators via an obvious linear extension. One direction of this result is obvious; the other is fiddly. To proceed, we show that the total variation process v is cadlag (and, obviously, increasing), and then check that a'=\frac12(v+a),\,a''=\frac12(v-a) are processes satisfying the conditions of the result.

Our overall aim is to define integrals with respect to Brownian Motion since that is (in a sense to be made precise through the Dubins-Schwarz theorem later) the canonical non-trivial stochastic process with non-zero quadratic variation. The result we demonstrate shows that it is not possible to define the integral with respect to BM through pathwise finite variation integrals.

Theorem: Suppose M\in\mathcal{M}_{c,loc} with M_0=0 a.s. is of finite variation. Then M is indistinguishable from 0.

We will show this for M a bounded martingale with bounded variation. Why does this suffice? In general, set S_n:=\inf\{t:V_t\geq n\}, where the total variation process V is continuous adapted non-decreasing. If M^{S_n}\equiv 0\,\forall n, then we are done, since S_n\uparrow\infty. And M^{S_n} is indeed a bounded martingale with bounded variation.

To prove this, we make use of the orthogonality relation which is a key trick for this sort of result: If M is a martingale, with M_s,M_t\in L^2, for s<t, then just by multiplying out:

\mathbb{E}[(M_t-M_s)^2|\mathcal{F}_s]=\mathbb{E}[M_t^2-M_s^2|\mathcal{F}_s] a.s.
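Written out, the ‘multiplying out’ is just one line, using \mathbb{E}[M_t|\mathcal{F}_s]=M_s:

\mathbb{E}[(M_t-M_s)^2|\mathcal{F}_s]=\mathbb{E}[M_t^2|\mathcal{F}_s]-2M_s\mathbb{E}[M_t|\mathcal{F}_s]+M_s^2=\mathbb{E}[M_t^2|\mathcal{F}_s]-M_s^2=\mathbb{E}[M_t^2-M_s^2|\mathcal{F}_s].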

Now, for this particular result, we decompose

\mathbb{E}[M_t^2]=\mathbb{E}\left[\sum_{k=0}^{2^n-1}(M_{(k+1)2^{-n}t}^2-M_{k2^{-n}t}^2)\right]=\mathbb{E}[\sum (M_{(k+1)2^{-n}t}-M_{k2^{-n}t})^2]

and then we bound this last term as

\leq \mathbb{E}\left[\sup_k |M_{(k+1)2^{-n}t}-M_{k2^{-n}t}|\cdot\sum_k |M_{(k+1)2^{-n}t}-M_{k2^{-n}t}|\right]

Now, as n\uparrow\infty, we have \sum_k |M_{(k+1)2^{-n}t}-M_{k2^{-n}t}|\uparrow V_t\leq N by the boundedness assumption. Furthermore, M is almost surely continuous on [0,t] and so it is in fact uniformly continuous, which allows us to conclude that

\sup_k |M_{(k+1)2^{-n}t}-M_{k2^{-n}t}|\downarrow 0

By bounded convergence, this limit applies equally under the expectation, and so we conclude that \mathbb{E}M_t^2=0 for each time t, and so for each time t the martingale is almost surely equal to 0. In the usual way, we can lift this to all rational times by countability, then to all times by continuity.
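The contrast between the two variations is easy to see numerically. A sketch (my own dyadic bookkeeping, not from any library): on one Brownian path, the sum of squared increments settles near t, while the sum of absolute increments grows without bound as the mesh shrinks.

```python
import math, random

random.seed(5)
# One Brownian path on [0,1] viewed at dyadic resolutions.
N = 2 ** 16
incs = [math.sqrt(1 / N) * random.gauss(0, 1) for _ in range(N)]

def variations(level):
    """Total and quadratic variation over 2**level dyadic blocks."""
    block = N // 2 ** level
    coarse = [sum(incs[i:i + block]) for i in range(0, N, block)]
    return sum(abs(x) for x in coarse), sum(x * x for x in coarse)

results = {level: variations(level) for level in (4, 8, 12, 16)}
for level, (tv, qv) in results.items():
    print(level, round(tv, 2), round(qv, 3))
```

By the triangle inequality the total variation column is necessarily non-decreasing in the resolution; in fact it grows like 2^{level/2}, while the quadratic variation column hovers around 1.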

Strong Markov Property for BM

The Strong Markov Property is the most important result to demonstrate for any Markov process, such as Brownian Motion. It is also probably the most widely requested item of bookwork on the Part III Advanced Probability exam. I feel it is therefore worth practising writing as quickly as possible.

Theorem (SMP): Take (B_t) a standard (\mathcal{F}_t)-BM, and T an a.s. finite stopping time. Then (B_{T+t}-B_T,t\geq 0) is a standard BM independent of \mathcal{F}_T.

Proof: We write B_t^{(T)}=B_{T+t}-B_T for ease of notation. We will show that for any A\in\mathcal{F}_T, times t_1<\ldots<t_n, and F:\mathbb{R}^n\rightarrow\mathbb{R} bounded and continuous:

\mathbb{E}[1_AF(B_{t_1}^{(T)},\ldots,B_{t_n}^{(T)})]=\mathbb{P}(A)\,\mathbb{E}[F(B_{t_1},\ldots,B_{t_n})]
This will suffice to establish independence, and taking A=\Omega\in\mathcal{F}_T shows that B_t^{(T)} is a standard BM since (Levy), BM is uniquely characterised by its finite joint distributions.

To prove the result, we approximate discretely, and apply the Markov property.

\mathbb{E}[1_AF(B_{t_1}^{(T)},\ldots)]=\lim_{m\rightarrow\infty}\sum_{k=1}^\infty \mathbb{E}[1_{A\cap\{T\in((k-1)2^{-m},k2^{-m}]\}}F(B_{t_1}^{(k2^{-m})},\ldots)]

by bounded convergence, using continuity of F, right-continuity of B, and that T<\infty a.s. (so that 1_A=\sum_k 1_{A\cap \{T\in((k-1)2^{-m},k2^{-m}]\}})

\stackrel{\text{WMP}}{=}\lim_{m\rightarrow\infty}\sum_{k=1}^\infty \mathbb{P}[A\cap\{T\in((k-1)2^{-m},k2^{-m}]\}]\,\mathbb{E}F(B_{t_1},\ldots,B_{t_n})=\mathbb{P}[A]\,\mathbb{E}F(B_{t_1},\ldots,B_{t_n}),

which is exactly what we required.

Remarks: 1) We only used right-continuity of the process, and characterisation by joint marginals, so the proof works equally well for Levy processes.

2) We can in fact show that it is independent of \mathcal{F}_T^+, by considering the stopping times T+\frac{1}{n} and taking a limit in n as well in the above proof. For details of a similar result, see my post on Blumenthal’s 0-1 Law.
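The SMP is also easy to test by simulation. A sketch (my own discretisation; the stopping time chosen is the first exit of (-1,1), which is a.s. finite): the post-T increment should be centred, have variance t, and be uncorrelated with \mathcal{F}_T-measurable data such as the sign of B_T.

```python
import math, random

random.seed(6)
dt = 1e-3

def post_stopping_increment(t_after=0.5):
    """Run BM to T = first exit of (-1,1), then return
    (sign of B_T, B_{T+t_after} - B_T)."""
    b = 0.0
    while abs(b) < 1:
        b += math.sqrt(dt) * random.gauss(0, 1)
    b_T = b
    for _ in range(int(t_after / dt)):
        b += math.sqrt(dt) * random.gauss(0, 1)
    return (1 if b_T > 0 else -1), b - b_T

pairs = [post_stopping_increment() for _ in range(2_000)]
mean_inc = sum(d for _, d in pairs) / len(pairs)
var_inc = sum(d * d for _, d in pairs) / len(pairs)
cross = sum(s * d for s, d in pairs) / len(pairs)
print(mean_inc, var_inc, cross)
```

The third statistic is the covariance between the exit side and the later increment; by the SMP it should vanish, even though both are functions of the same path.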

Remarkable fact about Brownian Motion #4: The Dirichlet Problem

So this property of Brownian Motion is so elegant, in my opinion, that when I was recently asked what my ‘favourite theorem’ was, I suggested this. With this result, we can use probabilistic structure to specify solutions to an important PDE, with boundary conditions, over a large class of domains.

Given a domain D, Laplace’s equation is: \Delta u=0 on D, and u=f on the boundary \partial D, where f is any continuous function defined there. This PDE arises wherever the notion of a potential is defined, for example in electromagnetism, fluids and thermodynamics.

Theorem: Given suitable regularity conditions on D to be discussed later, Laplace’s equation has a unique solution, given by:

u(x)=\mathbb{E}_x\left[f(B_{T_D})\right]
Notation: First, what does this mean? Define T_D:=\inf\{t:B_t\not\in D\}, to be the time at which a Brownian Motion leaves the domain D. This is a stopping time, and so will be suitable for application of the Strong Markov Property. \mathbb{E}_x means that we are taking expectation with respect to a BM started at x. So informally, we are defining u(x) as: start a BM at x; see where it hits the boundary of D; record the value of f at that point. Then set u(x) to be the expected value of this process.

Existence: First, we are going to check that the solution conjectured is a solution. We will need a lemma:

Lemma: A locally-bounded function u satisfies \Delta u=0 on a domain D if and only if it has the property that for every closed ball \bar{B(x,r)}\subset D we have:

u(x)=\frac{1}{\sigma_{x,r}(S(x,r))}\int_{S(x,r)}u(y)\,\sigma_{x,r}(dy)
where \sigma_{x,r} is the surface area measure on the boundary S(x,r) of the ball radius r centred on x. Essentially, this says that u(x) is equal to the average value of u on a ball around x.

Proof of Theorem: First, existence. Set u as specified in the statement of the theorem. Given a Brownian Motion started at x, we have stopping times T_r<T_D corresponding to the hitting times of the sphere S(x,r) and the boundary \partial D. The domination condition holds by continuity provided \bar{B(x,r)} is contained within D. So we may apply the Strong Markov Property:

\mathbb{E}_x\left[f(B_{T_D})\right]=\mathbb{E}_x\left[\mathbb{E}_x\left[f(B_{T_D})|\mathcal{F}_{T_r}\right]\right]=\mathbb{E}_x\left[u(B_{T_r})\right]
By definition, the left hand expression is u(x). But also, because the distribution of B_{T_r} is uniform on S(x,r), the right hand side is equal to:

\frac{1}{\sigma_{x,r}(S(x,r))}\int_{S(x,r)}u(y)\,\sigma_{x,r}(dy)
and so by the lemma, this guarantees that the function u is harmonic on the interior of D.
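The probabilistic formula also gives a very short Monte Carlo solver. A sketch (my own discretisation of the BM, with the exit point crudely projected back onto the circle; for the unit disc D and boundary data f(x,y)=x^2-y^2, which is itself harmonic, the exact solution is u=x^2-y^2):

```python
import math, random

random.seed(7)

def dirichlet_u(x, y, trials=2_000, dt=1e-3):
    """Estimate u(x,y) = E_{(x,y)}[f(B_{T_D})] on the unit disc D."""
    f = lambda px, py: px * px - py * py  # boundary data, harmonic
    total = 0.0
    for _ in range(trials):
        px, py = x, y
        while px * px + py * py < 1:  # run BM until it exits D
            px += math.sqrt(dt) * random.gauss(0, 1)
            py += math.sqrt(dt) * random.gauss(0, 1)
        r = math.hypot(px, py)        # project overshoot onto the circle
        total += f(px / r, py / r)
    return total / trials

est = dirichlet_u(0.5, 0.0)
print(est)  # exact solution: u(0.5, 0) = 0.25
```

Note that no mesh over D is needed: the value at a single point is obtained directly, which is exactly the charm of the probabilistic representation.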

The lemma can also be used to show uniqueness.