Skorohod embedding

Background

Suppose we are given a standard Brownian motion (B_t), and a stopping time T. Then, so long as T satisfies one of the regularity conditions under which the Optional Stopping Theorem applies, we know that \mathbb{E}[B_T]=0. (See here for a less formal introduction to OST.) Furthermore, since B_t^2-t is a martingale, \mathbb{E}[B_T^2]=\mathbb{E}[T], so if the latter is finite, so is the former.

Now, using the strong Markov property of Brownian motion, we can come up with a sequence of stopping times 0=T_0, T_1, T_2,\ldots such that the increments T_k-T_{k-1} are IID with the same distribution as T. Then 0,B_{T_1},B_{T_2},\ldots is a centered random walk. By taking T to be the hitting time of \{-1,+1\}, it is easy to see that we can embed simple random walk in a Brownian motion using this approach.

p1020956_compressedEmbedding simple random walk in Brownian motion.

The Skorohod embedding question asks: can all centered random walks be constructed in this fashion, by stopping Brownian motion at a sequence of stopping time? With the strong Markov property, it immediately reduces the question of whether all centered finite-variance distributions X can be expressed as B_T for some integrable stopping time T.

The answer to this question is yes, and much of what follows is drawn from, or at least prompted by Obloj’s survey paper which details the problem and rich history of the many approaches to its solution over the past seventy years.

Applications and related things

The relationship between random walks and Brownian motion is a rich one. Donsker’s invariance principle asserts that Brownian motion appears as the scaling limit of a random walk. Indeed, one can construct Brownian motion itself as the limit of a sequence of consistent random walks with normal increments on an increasingly dense set of times. Furthermore, random walks are martingales, and we know that continuous, local martingales can be expressed as a (stochastically) time-changed Brownian motion, from the Dubins-Schwarz theorem.

The Skorohod embedding theorem can be used to prove results about random walks with general distribution by proving the corresponding result for Brownian motion, and checking that the construction of the sequence of stopping times has the right properties to allow the result to be carried back to the original setting. It obviously also gives a coupling between a individual random walk and a Brownian motion which may be useful in some contexts, as well as a coupling between any pair of random walks. This is useful in proving results for random walks which are much easier for special cases of the distribution. For example, when the increments are Gaussian, or when there are combinatorial approaches to a problem about simple random walk. At the moment no aspect of this blog schedule is guaranteed, but I plan to talk about the law of the iterated logarithm shortly, whose proof is approachable in both of these settings, as well as for Brownian motion, and Skorohod embedding provides the route to the general proof.

At the end, we will briefly compare some other ways to couple a random walk and a Brownian motion.

Adding extra randomness

One thing we could do is sample a copy of X independently from the Brownian motion, then declare T= \tau_{X}:= \inf\{t\ge 0: B_t=X\}, the hitting time of (random value) X. But recall that unfortunately \tau_x has infinite expectation for all non-zero x, so this doesn’t fit the conditions required to use OST.

Skorohod’s original method is described in Section 3.1 of Obloj’s notes linked above. The method is roughly to pair up positive values taken by X appropriately with negative values taken by X in a clever way. If we have a positive value b and a negative value a, then \tau_{a,b}, the first hitting time of \mathbb{R}\backslash (a,b) is integrable. Then we choose one of these positive-negative pairs according to the projection of the distribution of X onto the pairings, and let T be the hitting time of this pair of values. The probability of hitting b conditional on hitting {a,b} is easy to compute (it’s \frac{-a}{b-a}) so we need to have chosen our pairs so that the ‘probability’ of hitting b (ie the density) comes out right. In particular, this method has to start from continuous distributions X, and treat atoms in the distribution of X separately.

The case where the distribution X is symmetric (that is X\stackrel{d}=-X) is particularly clear, as then the pairs should be (-x,x).

However, it feels like there is enough randomness in Brownian motion already, and subsequent authors showed that indeed it wasn’t necessary to introduce extra randomness to provide a solution.

One might ask whether it’s possible to generate the distribution on the set of pairs (as above) out of the Brownian motion itself, but independently from all the hitting times. It feels like it might be possible to make the distribution on the pairs measurable with respect to

\mathcal{F}_{0+} = \bigcap\limits_{t>0} \mathcal{F}_t,

the sigma-algebra of events determined by limiting behaviour as t\rightarrow 0 (which is independent of hitting times). But of course, unfortunately \mathcal{F}_{0+} has a zero-one law, so it’s not possible to embed non-trivial distributions there.

Dubins solution

The exemplar for solutions without extra randomness is due to Dubins, shortly after Skorohod’s original argument. The idea is to express the distribution X as the almost sure limit of a martingale. We first use the hitting time of a pair of points to ‘decide’ whether we will end up positive or negative, and then given this information look at the hitting time (after this first time) of two subsequent points to ‘decide’ which of four regions of the real interval we end up in.

I’m going to use different notation to Obloj, corresponding more closely with how I ended up thinking about this method. We let

a_+:= \mathbb{E}[X \,|\, X>0], \quad a_- := \mathbb{E}[X\,|\, X<0], (*)

and take T_1 = \tau_{\{a_-,a_+\}}. We need to check that

\mathbb{P}\left( B_{T_1}=a_+\right) = \mathbb{P}\left(X>0\right),

for this to have a chance of working. But we know that

\mathbb{P}\left( B_{T_1}=a_+\right) = \frac{a_+}{a_+-a_-},

and we can also attack the other side using (*) and the fact that \mathbb{E}[X]=0, using the law of total expectation:

0=\mathbb{E}[X]=\mathbb{E}[X\,|\, X>0] \mathbb{P}(X>0) + \mathbb{E}[X\,|\,X<0]\mathbb{P}(X<0) = a_+ \mathbb{P}(X>0) + a_- \left(1-\mathbb{P}(X>0) \right),

\Rightarrow\quad \mathbb{P}(X>0)=\frac{a_+}{a_+-a_-}.

Now we define

a_{++}=\mathbb{E}[X \,|\, X>a_+],\quad a_{+-}=\mathbb{E}[X\,|\, 0<X<a_+],

and similarly a_{-+},a_{--}. So then, conditional on B_{T_1}=a_+, we take

T_2:= \inf_{t\ge T_1}\left\{ B_t\not\in (a_{+-},a_{++})  \right\},

and similarly conditional on B_{T_1}=a_-. By an identical argument to the one we have just deployed, we have \mathbb{E}\left[B_{T_2} \,|\,\mathcal{F}_{T_1} \right] = B_{T_1} almost surely. So, although the a_{+-+} notation now starts to get very unwieldy, it’s clear we can keep going in this way to get a sequence of stopping times 0=T_0,T_1,T_2,\ldots where B_{T_n} determines which of the 2^n regions of the real line any limit \lim_{m\rightarrow\infty} B_{T_m} should lie in.

A bit of work is required to check that the almost sure limit T_n\rightarrow T is almost surely finite, but once we have this, it is clear that B_{T_n}\rightarrow B_T almost surely, and B_T has the distribution required.

Komlos, Major, Tusnady coupling

We want to know how close we can make this coupling between a centered random walk with variance 1, and a standard Brownian motion. Here, ‘close’ means uniformly close in probability. For large times, the typical difference between one of the stopping times 0,T_1,T_2,\ldots in the Skorohod embedding and its expectation (recall \mathbb{E}[T_k]=k) is \sqrt{n}. So, constructing the random walk S_0,S_1,S_2,\ldots from the Brownian motion via Skorohod embedding leads to

\left |S_k - B_k \right| = \omega(n^{1/4}),

for most values of k\le n. Strassen (1966) shows that the true scale of the maximum

\max_{k\le n} \left| S_k - B_k \right|

is slightly larger than this, with some extra powers of \log n and \log\log n as one would expect.

The Komlos-Major-Tusnady coupling is a way to do a lot better than this, in the setting where the distribution of the increments has a finite MGF near 0. Then, there exists a coupling of the random walk and the Brownian motion such that

\max_{k\le n}\left|S_k- B_k\right| = O(\log n).

That is, there exists C such that

\left[\max_{k\le n} \left |S_k-B_k\right| - C\log n\right] \vee 0

is a tight family of distributions, indeed with uniform exponential tail. To avoid digressing infinitely far from my original plan to discuss the proof of the law of iterated logarithm for general distributions, I’ll stop here. I found it hard to find much coverage of the KMT result apart from the challenging original paper, and many versions expressed in the language of empirical processes, which are similar to random walks in many ways relevant to convergence and this coupling, but not for Skorohod embedding. So, here is a link to some slides from a talk by Chatterjee which I found helpful in getting a sense of the history, and some of the modern approaches to this type of normal approximation problem.

Random Maps 3 – Leaves and Geodesics in BCRT

Recall in the previous two posts, we’ve introduced some of the background to maps on various surfaces. In particular, we’ve introduced the remarkable Cori-Vauquelin-Schaeffer bijection which maps between plane trees labelled with uniform increments and quadrangulations of the sphere, up to some careful fiddling around with rooting and pointing an edge.

We are interested in the case where we choose uniformly a large element from these classes. We want to derive a scaling limit for the uniform planar quadrangulation, and we hope that we will be able to carry some properties of the scaling limit of the labelled trees, which may well be simpler, across the CVS bijection. It is convenient that the vertices of the plane tree become the vertices of the quadrangulation. We are looking to find some sort of metric limit, in the Gromov-Hausdorff sense, and so it will remain to deduce exactly how to use the labelling obtained from the tree to gain information about distances in the (limiting) quadrangulation.

Of course, all of this relies on the fact that there is a nice limit for the ordered plane trees in the first place. Unsurprisingly, it turns out that this is Aldous’s Brownian continuum random tree. The easiest way to see this is to consider the contour process of the ordered plane tree. This is chosen uniformly at random from the set of paths from (0,0) to (2n,0) with increments of size {-1,1} and which stay non-negative. It is thus precisely a simple random walk started at (0,0) conditioned to hit (2n,0) and to be non-negative. Since SRW suitably rescaled converges to Brownian motion, it is unsurprising (but not totally trivial) that this conditioned object converges to a Brownian excursion.

The Brownian excursion can be viewed as a continuous analogue of the contour process for the BCRT, but it is more natural to consider this convergence in the Gromov-Hausdorff topology. In this setting, we say that for a large value of n, the tree is ‘roughly isometric’ to the BCRT in distribution. Here, roughly isometric means the two metric spaces can be embedded isometrically into a common metric space such that they are close together, now in the sense of Hausdorff distance.

At this point, it is worth thinking about this interpretation of the BCRT. We have previously considered this as the scaling limit of a uniformly chosen Cayley tree, that is any unrooted tree on n labelled vertices. Essentially, we are now specifying that the BCRT can carry extra information, namely a root, and geometric information about the order of branches. The root is uncontroversial. Canonically, the root of the BCRT will be at the point associated with time 0 in the driving Brownian excursion. However, we can easily check that the distribution of a uniform rooted plane tree is invariant under re-rooting, and so any argument we have for convergence of the rooted trees to the BCRT will work with the root in a different place. Applying something like a tower law, we conclude that the convergence works when the root is chosen uniformly in the limit.

One potential problem to be discussed is what it means to choose a point uniformly in the limit. We have two possible approaches. One is to consider Lebesgue measure on any path in the BCRT, and glue these together. However, we have a uniform stick-breaking construction of the BCRT, and one consequence of the construction is that the total length of sticks required is infinite, so this won’t work.

The other option is to project Lebesgue measure on [0,1] via the same map that sends points on the Brownian excursion to points in the tree. Note that the so-called real tree is constructed from the excursion by identifying points s and t where f(s)=f(t), and f(x)>f(s) for x in (s,t). But then we might wonder whether this can really be said to be ‘uniform’, since different points in the BCRT will have a different number of pre-images in [0,1]. In fact though, it turns out that in this sense, projected-Leb[0,1]-almost all the points in the BCRT are leaves.

To prove this, naturally we first need to define a leaf, in the setting of these continuum trees. The degree of a vertex is an idea we might keep in mind, but we can’t use this, as we don’t have vertices any more. However, we have a continuous analogue of degree, given by counting the number of connected components remaining after removing a vertex. In particular, we can define the set of leaves as

\mathcal{L}(\mathcal{T}):=\{x\in\mathcal{T}:\mathcal{T}\backslash \{x\}\text{ is connected}\}.

We will give a sketch proof of this result about leaves shortly. First, we clarify some notation, and consider properties of geodesics (shortest-length paths) in the tree.

Define \check{f}(u,v):=\min_{x\in[u\wedge v,u\vee v]} f(x) to be the minimum value attained by f between u and v. Consider the value x at which this minimum is attained. Then, projecting onto the tree, p(x) is the ‘most recent common ancestor’ of points u and v. We can make this a bit more precise by considering geodesics in the tree starting at the root. Analogous to the unique path property in a discrete tree, in this continuous setting there is a unique path from the root to any given point, along which the height is strictly increasing. This is not surprising. It follows from one of the definitions of a real tree that the length of the path from p(0) to p(s) should be f(s), and so there is a unique isometric embedding of [0,f(s)] into \mathcal{T}_f which starts at 0 and ends at p(s). Anyway, under this p(\check{f}(s,t)) gives the point at which the geodesics from p(s) to p(0) and from p(t) to p(0) meet.

DSC_4255

Furthermore, we can now describe the distance in the tree between p(s) and p(t). This is given by

d_f(s,t):= f(s)+f(t)-2\check{f}(s,t),

and with the geodesic picture, it is easy to see why. Consider the point x at which \check{f}(s,t) achieves the minimum. As we have said, this lies on the geodesics from p(s) to 0 and p(t) to 0, and paths between points are unique, so removing point x disconnects p(s) and p(t) in the tree. So we need to concatenate the geodesic from p(s) to p(x) and from p(x) to p(t). But these are subsets of the two geodesics discussed, and their respective lengths are f(s)-\check{f}(s,t) and f(t)-\check{f}(s,t).

We can now give a sketch proof the result that almost all the support of \lambda_f, the projection on Lebesgue measure from [0,1] onto \mathcal{T}_{f} is on \mathcal{T}(\mathcal{T}_f).

Given s,t\in[0,1], suppose we are removing p(s), and this separates p(t) from the root, which is canonically p(0). Without loss of generality, take t>s. Now suppose that \check{f}(s,t)<f(s), and that, as before, this infimum is attained at x\in[s,t]. Then the geodesic from p(0) to p(t) will pass through p(x), but not through p(s), so in particular, removing p(s) cannot disconnect p(t) from the root.

Thus, p(s) is not a leaf if and only if there exists some small window [s,t] such that f(s)\le f(x),\;\forall x\in[s,t]. By Blumenthal’s 0-1 law, for fixed x, this happens with probability 0 if f is Brownian motion. Here, f is not Brownian motion, but a Brownian excursion with length 1. However, Blumenthal’s 0-1 law depends on the instantaneous behaviour after time s, ie the sigma field \mathcal{F}_s^+. So, for s\in(0,1), the value of a Brownian at time 1 is independent of this sigma field, so if we imagine Brownian excursion as a ‘conditioned’ Brownian motion, this conditioning should have no effect on the conclusion of this corollary to Blumenthal’s 0-1 law.

This is not a formal argument, but it sketches why with probability 1, p(s) is a leaf for each s\in(0,1), from which the result follows.

Sticky Brownian Motion

This follows on pretty much directly from the previous post about reflected Brownian motion. Recall that this is a process defined on the non-negative reals which looks like Brownian motion away from 0. We consider whether RBM is the only such process, and how any alternative might be constructed as a limit of discrete-time Markov processes.

One of the alternatives is called Sticky Brownian motion. This process spends more time at 0 than reflected Brownian motion. In fact it spends some positive proportion of time at 0. My main aim here is to explain why some intuitive ideas I had about how this might arise are wrong.

The first thought was to ensure that each visit to 0 last some positive measure of time. This could be achieved by staying at 0 for an Exp(1) duration, every time the process visited it. It doesn’t seem unreasonable that this might appear as the limit of a standard SRW adjusted so that on each visit to 0 the walker waits for a time given by independent geometric distributions. These distributions are memoryless, so that is fine, but by Blumenthal’s 0-1 Law, starting from 0 a Brownian motion hits zero infinitely many times before any small time t. So in fact the process described above would be identically zero as before it gets anywhere it would have to spend some amount of time at 0 given by an infinite sum of Exp(1) RVs.

We will return later to the question of why the proposed discrete-time model will still converge to reflected BM rather than anything more exotic. First though, we should discount the possibility of any intermediate level of stickiness, where the set of times spent at 0 still has measure zero, but the local time at 0 grows faster than for standard reflected BM. We can define the local time at 0 through a limit

L_t=\lim_{\epsilon\downarrow 0}\frac{1}{2\epsilon}\text{Leb}(\{0\le s \le t: |B_t|<\epsilon\})

of the measure of time spent very near 0, rescaled appropriately. So if the measure of the times when the process is at 0 is zero, then the local time is determined by behaviour near zero rather than by behaviour at zero. More precisely, on the interval [-\epsilon,\epsilon], the process behaves like Brownian motion, except on a set of measure zero, so the local time process should look the same as that of BM itself. Note I don’t claim this as a formal proof, but I hope it is a helpful heuristic for why you can’t alter the local time process without altering the whole process.

At this stage, it seems sensible to define Sticky Brownian motion. For motivation, note that we are looking for a process which spends a positive measure of time at 0. So let’s track this as a process, say C_t. Then the set of times when C is increasing is sparse, as it coincides with the process being 0, but we know we cannot wait around at 0 for some interval of time without losing the Markov property. So C shares properties with the local time of a reflected BM. The only difference is that the measure of times when C is increasing is positive here, but zero for the local time.

So it makes sense to construct the extra time spent at zero from the local time of a standard reflected BM. The heuristic is that we slow down the process whenever it is at 0, so that local time becomes real time. We can also control the factor by which this slowing-down happens, so define

\sigma(s)=\rho L(s)+s,

where L is the local time process of an underlying reflected BM, and \rho>0 is a constant. So \sigma is a map giving a random time-change. Unsurprisingly, we now define Sticky BM as the reflected BM with respect to this time-change. To do this formally, it is easiest to define a family of stopping times \{\tau_t\}, such that \sigma(\tau_t)=t, \tau_{\sigma(s)}=s, then if X is the reflected BM, define Y_t=X_{\tau_t} for the sticky BM.

It is worth thinking about what the generator of this process should be. In particular, why should it be different to reflected BM? The key observation is that the drift of the underlying reflected BM is essentially infinite at 0. By slowing down the process at 0, this drift becomes finite. So the martingale associated with sticky BM is precisely a time-changed version of the martingale associated with the underlying reflected BM, but this time-change is precisely what is required to give a generator. We get:

(\mathcal{L}f)(x)=\begin{cases}\frac12f''(x)&\quad x>0\\ \rho^{-1}f'(0) &\quad x=0.\end{cases}

Now that we have the generator, it starts to become apparent how sticky BM might appear as a limit of discrete-time walks. The process must look like mean-zero, unit-variance RW everywhere except near 0, where the limiting drift should be \rho^{-1}. Note that when considering the limiting drift near zero, we are taking a joint limit in x and h. The order of this matters. As explained at the end of the previous article, we only need to worry about the limiting drift along sequences of x,h such that a_h(x)\rightarrow 0. If no such sequences exist, or the limiting drift along any of these is infinite, then we actually have a reflected boundary condition.

This highlights one confusing matter about convergence of reflected processes. The boundary of the discrete-time process should converge to the boundary of the reflected process, but we also have to consider where reflective behaviour happens. Can we get sticky BM with reflection only at the boundary in the discrete-time processes? The answer turns out to be no. At the start of this article, I proposed a model of SRW with geometric waiting times whenever the origin was visiting. What is the limit of this?

The trick is to consider how long the discrete process spends near 0, after rescaling. It will spend a multiple 1/p more time at 0 itself, where p is the parameter of the geometric distribution, but no more time than expected at any point x\in(0,\epsilon). But time spent in (0,\epsilon) dominates time spent at 0 before this adjustment, so must also dominate it after the adjustment, so in the limit, the proportion of time spent at 0 is unchanged, and so in particular it cannot be positive.

Because of all of this, in practice it seems that most random walks we might be interested in converge (if they converge to a process at all) to a reflected SDE/diffusion etc, rather than one with sticky boundary conditions. I feel I’ve been talking a lot about Markov processes converging, so perhaps next, or at least soon, I’ll write some more technical things about exactly what conditions and methods are required to prove this.

REFERENCES

S. Varadhan – Chapter 16 from a Lecture Course at NYU can be found here.

Enhanced by Zemanta

Markovian Excursions

In the previous post, I talked about the excursions of a Brownian motion. Today I’m thinking about how to extend these ideas to more general Markov chains. First we want to rule out some situations. In particular, we aren’t hugely interested in discrete time Markov chains. The machinery is fairly well established for excursions, whether or not the chain is transient. Furthermore, if the state space is discrete, as for a Poisson process for example, the discussion is not hugely interesting either. Remember that the technical challenges in the constructions of local time arise because of the Blumenthal 0-1 law property that Brownian motion visits 0 infinitely often in the small window after the start time. We therefore want the process to be regular at the point of the state space under discussion. This is precisely the condition described above for BM about 0.

Why is it harder in general?

The informal notion of a local time should transfer to a more general Markov chain, but there are some problems. Firstly, to define something in terms of an integral is not general enough. The state space E needs some topological structure, but any meaningful definition must be in terms of functions from E into the reals. There were also all sorts of special properties of Brownian motion, including the canonical time-space rescaling that came in handy in that particular case. It turns out to be easiest to consider the excursion measure on a general Markov chain through its Laplace transform.

Definition and Probabilistic interpretations

The resolvent is the Laplace transform of the transition probability P_t(x,A), viewed as an operator on functions f:E\rightarrow \mathbb{R}.

R_\lambda f(x):=\mathbb{E}_x\left[\int_0^\infty e^{-\lambda t}f(X_t)dt\right]=\int_0^\infty e^{-\lambda t}P_tf(x)dt.

We can interpret this in terms of the original process in a couple of ways which may be useful later. The motivation is that we would like to specify a Poisson process of excursions, for which we need to know the rate. We hope that the rate will in fact be constant, so it will in fact to suffice to work out the properties of the expected number of excursions (or whatever) up to various random times, in particular those given by exponential RVs.

So, we take \zeta\sim\text{Exp}(\lambda) independent of everything else, and assume that we ‘kill the chain’ at time \zeta. Then, by shuffling expectations in and out of the integral and separating independent bits, we get:

R_\lambda f(x)=\mathbb{E}_x\int_0^\zeta f(X_s)ds = \frac{1}{\lambda}\mathbb{E}_xf(X_\zeta).

As in the Brownian local time description

R_\lambda 1_A(x)=\mathbb{E}(\text{time spent in }A\text{ before death at time }\zeta_\lambda).

Markovian property

We want to show that excursions are Markov, once we’ve sorted out what an ‘excursion’ actually means in this context. We do know how to deal with the Markovian property once we are already on an excursion. It is relatively straightforward to define an extension of the standard transition probability operator, to include a condition that the chain should not hit point a during the transition. That is

_aP_t(x,A):= \mathbb{P}_x(X_t\in A\cap H_a>t).

This will suffice to define the behaviour once an excursion has started. The more complicated bit will be the entrance law n_t(A), being the probability of arriving at A after time t of an excursion. To summarise, as with BM, all the technical difficulties with an excursion happen at the beginning, ie bouncing around the start point, but once it is ‘up-and-running’, its path is Markovian and controlled by _aP_t.

Marking

The link between the resolvent and the excursions, is provided as in the Brownian case, by supplying a PPP of marks at uniform rate \lambda to real time. This induces a mark process on excursions, weighted by an (exponential) function of excursion length. We make no distinction between an excursion including one mark or many marks. Then the measure on marked excursion is, in a mild abuse of notation:

n_\lambda=(1-e^{-\lambda\zeta(f)})\cdot n.

We compare with the Laplace transform n_\lambda(dx)=\int_0^\infty e^{-\lambda t}dtn_t(dx) using a probabilistic argument.

We can apply the measure to a function in the usual way: \lambda n_\lambda(1_A) is the measure of those excursions for which the first mark occurs in A. So by taking A=E, we get

\lambda n_\lambda(1)=\text{ Excursion measure }=\int_U n(df)(1-e^{-\lambda\zeta(f)}).

We have therefore linked the exponential mark process on excursion measure with the Laplace transform of the entrance law. So in particular:

\frac{\lambda n_\lambda(A)}{\lambda n_\lambda(1)}=\mathbb{P}(\text{first mark when in }A)=\int_0^\infty \lambda e^{-\lambda t}P_t(0,A)dt=\lambda R_\lambda 1_A(0).

The resolvent is relatively easy to calculate explicitly, and so we can find the Laplace transform n_\lambda(A). From this it is generally possible to invert the transform to find the entrance law n_t.

References

A Guided Tour Through Excursions – L. C. G. Rogers.

This pair of posts is very much a paraphrase of chapters 3 and 4 of this excellent text. The original can be found here (possibly not open access?)

Brownian Excursions and Local Time

I’ve been spending a fair bit of time this week reading and thinking about the limits of various combinatorial objects, in particular letting the number of vertices tend to \infty in models of random graphs with various constraints. Perhaps predictably, like so many continuous stochastic objects, yet again the limiting ‘things’ turn out to be closely linked to Brownian Motion. As a result, I’ve ended up reading a bit about the notion of local time, and thought it was sufficiently elegant even by itself to justify a quick post.

Local Time

In general, we might be interested in calculating a stochastic integral like

\int_0^t f(B_s)ds.

Note that, except in some highly non-interesting cases, this is a random variable. Our high school understanding of Riemannian integration encourages thinking of this as a ‘pathwise’ integral along the path evolving in time. But of course, that’s orthogonal to the approach we start thinking about when we are introduced to the Lebesgue integral. There we think about potential values of the integrand, and weight their contribution by the (Lebesgue) measure of the subset of the domain in which they appear.

Can we do the same for the stochastic integral? That is, can we find a measure which records how long the Brownian Motion spends at a point x? This measure will not be deterministic – effectively the stochastic behaviour of BM will be encoded through the measure rather than the argument of the function.

The answer is yes, and the measure in question is referred to as local time. More formally, we want

\int_0^t f(B_s)ds=\int_\mathbb{R}f(x)L(t,x)dx. (*)

where the local time L(t,x) is a random process, increasing for fixed x. Informally, one could take

\partial_t L(t,x) \propto 1(B_t=x)

but clearly in practice that won’t do at all for a definition, and so instead we use (*). In the usual way, if we want (*) to hold for all reasonably nice functions f, it suffices to check it for the indicator functions of Borel sets. L(t,.) is therefore often referred to as occupation density, while L(.,A) is local time.

Local Time as natural index for Excursions

An excursion, for example of Brownian Motion, is a segment of the path that has zero value only at its endpoints. Alternatively, it is a maximal open interval of time such that the path is away from 0. We want to specify the measure on these excursions. Here are some obvious difficulties.

By Blumenthal’s 0-1 law, BM started from zero hits zero infinitely often in any time interval [0,e], so in the same way that there is no first positive rational, there is no first excursion. We could pick the excursion occurring in progress at a fixed time t, but this is little better. Firstly, the resultant measure is size-biased by the length of the excursion, and more importantly, the proximity of t to the origin may be significant unless we know of some memorylessness type of property to excursions.

Local time allows us to solve these problems. We restrict attention to L_t:=L(t,0), the occupation density of 0. Let’s think about some advantages of indexing excursions by local time rather than by the start time:

  • The key observation is that local time remains constant on excursions. That is, if we are avoiding 0, the local time at 0 cannot grow because the BM spends no time there!
  • If we use start time, then we have a countably infinite number of small excursions accumulating close to 0, ie with very small start time. However, local time increases rapidly when there are lots of small excursions. Remember, lots of small excursions means that the BM hits 0 lots of times. So local time grows quickly through the annoying bits, and effectively provides a size-biasing for excursions that allows us to ignore the effects of the ‘Blumenthal excursions’ near time 0.
  • When indexed by time, excursions might be Markovian, in the sense that subsequent excursions (and in particular their lengths) are independent of past excursions.This is certainly not the case if you index by start time! If an excursion starts at time t and has length u, then the ‘next’ excursions, in as much as that makes sense, must surely start at time t+u.

We know there are only countably many excursions, hence there are only countably many local times which pertain to an excursion. This motivates considering the set of excursions as a Poisson Point Process on local time. Once you’ve had this idea, everything follows quite nicely. Working out the distribution of the constant rate (which is a measure on the set of excursions) remains, but essentially we now have a sensible framework for tracking the process of excursions, and from this we can reconstruct the original Brownian Motion.

Remarkable fact about Brownian Motion #2: Blumenthal’s 0-1 Law and its Consequences

Brownian motion is a martingale, so it is known that it has the Markovian property. That is, (B_{t+s}-B_s,t\geq 0) is a Brownian motion independent of \mathcal{F}_s. But in fact we can show that this is independent of \mathcal{F}_s^+=\cap_{t>s}\mathcal{F}_t, the sigma algebra that (informally) deals with events which are determined by the process up to time s and in the infinitesimal period after time s.

Is this surprising? Perhaps. This larger sigma algebra contains, for example, the existence and value of the right-derivative of the process at time s. But the events determined by the local behaviour after time s are clearly also determined by the whole process after time s, so the independence property looks likely to give a 0-1 type law, like in the proof of Kolmogorov’s 0-1 law.

Theorem: (B_{t+s}-B_s,t\geq 0) is independent of \mathcal{F}_s^+.

Proof: Take a sequence of times s<t_1<\ldots<t_k and A\in\mathcal{F}_s^+. It will suffice to show that the joint law of the new process at these times is independent of event A. So take F a bounded continuous function. The plan is to approximate B_s from above, as then we will definitely have independence from \mathcal{F}_s, and hope that we have enough machinery to carry through the statements through the limit down to s. So take s_n\downarrow s, and then: \mathbb{E}[F(B_{t_1+s}-B_s,\ldots,B_{t_k+s}-B_s)1_A]=\lim_n \mathbb{E}[F(B_{t_1+s_n}-B_{s_n},\ldots,B_{t_k+s_n}-B_{s_n})1_A] as continuity gives a.s. pointwise convergence, and we can lift to expectations by Dominated Convergence. The function in the limit on the right separates by independence as \mathbb{E}[F(_{t_1+s_n}-B_{s_n},\ldots,B_{t_k+s_n}-B_{s_n})]\mathbb{P}(A). Applying the previous argument in reserve gives that the limit of this is \mathbb{E}[F(B_{t_1+s}-B_s,\ldots,B_{t_k+s}-B_s)]\mathbb{P}(A) as desired.

In particular, this gives Blumenthal’s 0-1 Law, which states that \mathcal{F}_0^+ is trivial. This is apparent by setting s=0 in the above result, because then the process under discussion is the original process, and so \mathcal{F}_0^+ is independent of \mathcal{F}\supset \mathcal{F}_0^+.

A consequence is the following. For a BM in one dimension, let \tau=\inf\{t>0:B_t>0\}, \sigma=\inf\{t>0:B_t<0\}. By the fact that BM is almost surely non-constant, for any sample path, at least one of these is 0, and by symmetry \mathbb{P}(\tau=0)=\mathbb{P}(\sigma=0) so these are greater than or equal to 1/2. But it is easy to see that the event \{\tau=0\}\in\mathcal{F}_0^+, and so by the triviality of the sigma-field, this probability must be 1. With continuity, this means that every interval (0,\epsilon) contains a zero of the Brownian motion almost surely. Patching together on rational intervals (so we can use countable additivity) gives that BM is almost surely monotonic on no interval. A similar argument can be used to show that BM is almost surely not differentiable at t=0. For example, the existence and (conditional on existence) value of the derivative at t=0 is a trivial event, so by symmetry, either the derivative is a.s. =0, or a.s. doesn’t exist. Ruling out the former option can be done in a few ways. Predictably, in fact it is almost surely differentiable nowhere, but that is probably something to save for another post.