# Skorohod embedding

Background

Suppose we are given a standard Brownian motion $(B_t)$, and a stopping time T. Then, so long as T satisfies one of the regularity conditions under which the Optional Stopping Theorem applies, we know that $\mathbb{E}[B_T]=0$. (See here for a less formal introduction to OST.) Furthermore, since $B_t^2-t$ is a martingale, $\mathbb{E}[B_T^2]=\mathbb{E}[T]$, so if the latter is finite, so is the former.

Now, using the strong Markov property of Brownian motion, we can come up with a sequence of stopping times $0=T_0, T_1, T_2,\ldots$ such that the increments $T_k-T_{k-1}$ are IID with the same distribution as T. Then $0,B_{T_1},B_{T_2},\ldots$ is a centered random walk. By taking T to be the hitting time of $\{-1,+1\}$, it is easy to see that we can embed simple random walk in a Brownian motion using this approach.

Embedding simple random walk in Brownian motion.

The Skorohod embedding question asks: can all centered random walks be constructed in this fashion, by stopping Brownian motion at a sequence of stopping time? With the strong Markov property, it immediately reduces the question of whether all centered finite-variance distributions X can be expressed as $B_T$ for some integrable stopping time T.

The answer to this question is yes, and much of what follows is drawn from, or at least prompted by Obloj’s survey paper which details the problem and rich history of the many approaches to its solution over the past seventy years.

Applications and related things

The relationship between random walks and Brownian motion is a rich one. Donsker’s invariance principle asserts that Brownian motion appears as the scaling limit of a random walk. Indeed, one can construct Brownian motion itself as the limit of a sequence of consistent random walks with normal increments on an increasingly dense set of times. Furthermore, random walks are martingales, and we know that continuous, local martingales can be expressed as a (stochastically) time-changed Brownian motion, from the Dubins-Schwarz theorem.

The Skorohod embedding theorem can be used to prove results about random walks with general distribution by proving the corresponding result for Brownian motion, and checking that the construction of the sequence of stopping times has the right properties to allow the result to be carried back to the original setting. It obviously also gives a coupling between a individual random walk and a Brownian motion which may be useful in some contexts, as well as a coupling between any pair of random walks. This is useful in proving results for random walks which are much easier for special cases of the distribution. For example, when the increments are Gaussian, or when there are combinatorial approaches to a problem about simple random walk. At the moment no aspect of this blog schedule is guaranteed, but I plan to talk about the law of the iterated logarithm shortly, whose proof is approachable in both of these settings, as well as for Brownian motion, and Skorohod embedding provides the route to the general proof.

At the end, we will briefly compare some other ways to couple a random walk and a Brownian motion.

One thing we could do is sample a copy of X independently from the Brownian motion, then declare $T= \tau_{X}:= \inf\{t\ge 0: B_t=X\}$, the hitting time of (random value) X. But recall that unfortunately $\tau_x$ has infinite expectation for all non-zero x, so this doesn’t fit the conditions required to use OST.

Skorohod’s original method is described in Section 3.1 of Obloj’s notes linked above. The method is roughly to pair up positive values taken by X appropriately with negative values taken by X in a clever way. If we have a positive value b and a negative value a, then $\tau_{a,b}$, the first hitting time of $\mathbb{R}\backslash (a,b)$ is integrable. Then we choose one of these positive-negative pairs according to the projection of the distribution of X onto the pairings, and let T be the hitting time of this pair of values. The probability of hitting b conditional on hitting {a,b} is easy to compute (it’s $\frac{-a}{b-a}$) so we need to have chosen our pairs so that the ‘probability’ of hitting b (ie the density) comes out right. In particular, this method has to start from continuous distributions X, and treat atoms in the distribution of X separately.

The case where the distribution X is symmetric (that is $X\stackrel{d}=-X$) is particularly clear, as then the pairs should be $(-x,x)$.

However, it feels like there is enough randomness in Brownian motion already, and subsequent authors showed that indeed it wasn’t necessary to introduce extra randomness to provide a solution.

One might ask whether it’s possible to generate the distribution on the set of pairs (as above) out of the Brownian motion itself, but independently from all the hitting times. It feels like it might be possible to make the distribution on the pairs measurable with respect to

$\mathcal{F}_{0+} = \bigcap\limits_{t>0} \mathcal{F}_t,$

the sigma-algebra of events determined by limiting behaviour as $t\rightarrow 0$ (which is independent of hitting times). But of course, unfortunately $\mathcal{F}_{0+}$ has a zero-one law, so it’s not possible to embed non-trivial distributions there.

Dubins solution

The exemplar for solutions without extra randomness is due to Dubins, shortly after Skorohod’s original argument. The idea is to express the distribution X as the almost sure limit of a martingale. We first use the hitting time of a pair of points to ‘decide’ whether we will end up positive or negative, and then given this information look at the hitting time (after this first time) of two subsequent points to ‘decide’ which of four regions of the real interval we end up in.

I’m going to use different notation to Obloj, corresponding more closely with how I ended up thinking about this method. We let

$a_+:= \mathbb{E}[X \,|\, X>0], \quad a_- := \mathbb{E}[X\,|\, X<0],$ (*)

and take $T_1 = \tau_{\{a_-,a_+\}}$. We need to check that

$\mathbb{P}\left( B_{T_1}=a_+\right) = \mathbb{P}\left(X>0\right),$

for this to have a chance of working. But we know that

$\mathbb{P}\left( B_{T_1}=a_+\right) = \frac{a_+}{a_+-a_-},$

and we can also attack the other side using (*) and the fact that $\mathbb{E}[X]=0$, using the law of total expectation:

$0=\mathbb{E}[X]=\mathbb{E}[X\,|\, X>0] \mathbb{P}(X>0) + \mathbb{E}[X\,|\,X<0]\mathbb{P}(X<0) = a_+ \mathbb{P}(X>0) + a_- \left(1-\mathbb{P}(X>0) \right),$

$\Rightarrow\quad \mathbb{P}(X>0)=\frac{a_+}{a_+-a_-}.$

Now we define

$a_{++}=\mathbb{E}[X \,|\, X>a_+],\quad a_{+-}=\mathbb{E}[X\,|\, 0

and similarly $a_{-+},a_{--}$. So then, conditional on $B_{T_1}=a_+$, we take

$T_2:= \inf_{t\ge T_1}\left\{ B_t\not\in (a_{+-},a_{++}) \right\},$

and similarly conditional on $B_{T_1}=a_-$. By an identical argument to the one we have just deployed, we have $\mathbb{E}\left[B_{T_2} \,|\,\mathcal{F}_{T_1} \right] = B_{T_1}$ almost surely. So, although the $a_{+-+}$ notation now starts to get very unwieldy, it’s clear we can keep going in this way to get a sequence of stopping times $0=T_0,T_1,T_2,\ldots$ where $B_{T_n}$ determines which of the $2^n$ regions of the real line any limit $\lim_{m\rightarrow\infty} B_{T_m}$ should lie in.

A bit of work is required to check that the almost sure limit $T_n\rightarrow T$ is almost surely finite, but once we have this, it is clear that $B_{T_n}\rightarrow B_T$ almost surely, and $B_T$ has the distribution required.

We want to know how close we can make this coupling between a centered random walk with variance 1, and a standard Brownian motion. Here, ‘close’ means uniformly close in probability. For large times, the typical difference between one of the stopping times $0,T_1,T_2,\ldots$ in the Skorohod embedding and its expectation (recall $\mathbb{E}[T_k]=k$) is $\sqrt{n}$. So, constructing the random walk $S_0,S_1,S_2,\ldots$ from the Brownian motion via Skorohod embedding leads to

$\left |S_k - B_k \right| = \omega(n^{1/4}),$

for most values of $k\le n$. Strassen (1966) shows that the true scale of the maximum

$\max_{k\le n} \left| S_k - B_k \right|$

is slightly larger than this, with some extra powers of $\log n$ and $\log\log n$ as one would expect.

The Komlos-Major-Tusnady coupling is a way to do a lot better than this, in the setting where the distribution of the increments has a finite MGF near 0. Then, there exists a coupling of the random walk and the Brownian motion such that

$\max_{k\le n}\left|S_k- B_k\right| = O(\log n).$

That is, there exists C such that

$\left[\max_{k\le n} \left |S_k-B_k\right| - C\log n\right] \vee 0$

is a tight family of distributions, indeed with uniform exponential tail. To avoid digressing infinitely far from my original plan to discuss the proof of the law of iterated logarithm for general distributions, I’ll stop here. I found it hard to find much coverage of the KMT result apart from the challenging original paper, and many versions expressed in the language of empirical processes, which are similar to random walks in many ways relevant to convergence and this coupling, but not for Skorohod embedding. So, here is a link to some slides from a talk by Chatterjee which I found helpful in getting a sense of the history, and some of the modern approaches to this type of normal approximation problem.

# DGFF 1 – The discrete Gaussian free field from scratch

I’ve moved to Haifa in northern Israel to start a post-doc in the probability group at the Technion, and now that my thesis is finished I want to start blogging again. The past couple of weeks have been occupied with finding an apartment and learning about the Discrete Gaussian Free Field. All questions about the apartment are solved, but fortunately lots remain open about the DGFF, so I thought I’d write some background about this object and methods which have been used to study it.

Background – Random walk bridge

When we think of a random walk, we usually think of the index as time, normally going forwards. So for a random walk bridge, we might assume $Z_0=0$, and then condition on $Z_N=0$, thinking of this as a demand that the process has returned to zero at the future time. In some applications, this is the ideal intuition, but in others, it is more useful to think of the random walk bridge

$(0=Z_0,Z_1,\ldots,Z_{N-1},Z_N=0),$

as a random height function indexed by [0,N], where the probability of a given path decomposes naturally into a product depending on the N increments, up to a normalising constant.

Naturally, we are interested in the asymptotic behaviour of such a random walk bridge when $N\rightarrow\infty$. So long as the step distribution has finite variance, a conditioned version of Donsker’s theorem shows that the rescaled random walk bridge converges in distribution to Brownian bridge. Note that Brownian bridge

$(B^{\mathrm{br}}_t, t\in[0,1])$

can be constructed either by conditioning a standard Brownian motion B to return to zero at time one (modulo some technicalities – this event has zero probability), or by applying an appropriate (random) linear shift

$B^{\mathrm{br}}(t):= B(t) - tB(1).$ (*)

It is not too hard to calculate the distribution of $B^{\mathrm{br}}(t)$ for each $t\in[0,1]$, and with a bit more work, one can calculate the joint distribution of $(B^{\mathrm{br}}(s),B^{\mathrm{br}}(t))$. In particular, the joint distribution is multivariate Gaussian, and so everything depends on the covariance ‘matrix’ (which here is indexed by [0,1]).

So if we return to a random walk bridge what should the step distribution be? Simple symmetric RW is a natural choice, as then lots of the quantities we might want to consider boil down to combinatorial calculations. Cleverness and Stirling’s formula can often get us useful asymptotics. But there are lots of inconveniences, not least the requirement to be careful about parity (N has to be even for a start unless you make the walk lazy, in which case the combinatorics becomes harder), and even if these can be overcome in a given calculation, it would be better not to have this.

The claim is that the random walk with Gaussian increments is by far the easiest to analyse asymptotically. As a further heuristic, think about the statement of the central limit theorem in the case where the underlying distribution is normal: it’s true but obvious. [Indeed, it’s my favourite piece of advice to anyone taking second year probability exams to check that your proposed statement of CLT does actually work for $N(\mu,\sigma^2)$…] More concretely, if a RW has Gaussian increments, then the path $(Z_1,\ldots,Z_N)$ is a multivariate normal, or a Gaussian process with finite index set. In particular, covariances define the distribution. It remains a Gaussian process after conditioning on $Z_N=0$, and the linear tilting argument at (*) remains true here, and can indeed be applied to turn any boundary conditions into any other boundary conditions.

The discrete Gaussian free field

We know how to generalise the domain of a random walk to higher dimensions. But what generalising the index to higher dimension? So now there is definitely no arrow of time, and the notion of a random height function above $\mathbb{Z}^2$ (or a subset of it) is helpful, for which a scaling limit might be a random surface rather than Brownian motion.

Because we can’t well-order $\mathbb{Z}^d$, it’s harder to define any such random object on the entire lattice immediately, so we start with compact connected subsets, with zero boundary conditions, as in the one-dimensional case of random walk bridge. Formally, let D be a finite subset of $\mathbb{Z}^d$, and the boundary $\partial D$ those elements of $D^c$ which are adjacent to an element of D, and let $\bar D:= D\cup \partial D$.

Then, the discrete Gaussian free field on D is a random real vector $h^D=(h^D_x: x\in \bar D)$, with probability density proportional to

$\mathbf{1}\{h^D_x=0, x\in\partial D\}\exp\left ( - \frac{1}{4d} \sum_{x\sim y}(h^D_x - h^D_y)^2 \right),$ (1)

where we write $x\sim y$ if that x,y are adjacent in $\bar D$. We won’t at any stage worry much about the partition function which normalises this pdf. Note also that $\frac{1}{4d}$ is just a convenient choice of constant, which corresponds to one of the canonical choices for the discrete Laplacian. Adjusting this constant is the same as uniformly rescaling the values taken by the field.

The immediate interpretation of (1) is that the values taken by the field at vertices which are close to each other are positively correlated. Furthermore, the form of the density is Gaussian. Concretely, if the values of $h^D$ are fixed everywhere except one vertex $x\in D$, then the conditional distribution of $h^D_x$ is Gaussian. Later, or in subsequent posts, we will heavily develop this idea. Alternatively, we could if we really wanted describe the model in terms of independent Gaussians describing the ‘increment’ along each edge in D (which we should direct), subject to a very large number of conditions, namely that the sum of increments along any directed cycle is zero. This latter description might be more useful if you wanted to define a DGFF on a more sparse graph, but won’t be useful in what follows.

Note that we can rearrange the Laplacian in (1) in terms of the transition kernel p( ) of the simple random walk of D to obtain

$\exp\left( -\frac12 (h^D)^T (\mathbf{P}-\mathbf{1})h^D \right),$

where $P_{x,y}=p(y-x)$ is the transition matrix of SRW on D. In particular, this means that the free field is Gaussian, and we can extract the covariances via

$\mathrm{Cov}(h^D_x,h^D_y) = \left[ (\mathbf{1}-\mathbf{P})^{-1}\right]_{x,y}$

$= \left[\sum_{n\ge 0} \mathbf{P}^n\right]_{x,y} = \sum_{n\ge 0} \mathbb{P}_x\left[X_n=y,\tau_{\partial D}>n\right],$

where, under $\mathbb{P}_x$, $(X_0,X_1,\ldots)$ is simple random walk started from x.

This final quantity records the expected number of visits to y before leaving the domain D, for a random walk started at x, and is called the Green’s function.

In summary, the DGFF on D is the centred Gaussian random vector indexed by $\bar D$ with covariance given by the Green’s function $G_D(x,y)$.

How many of these equivalences carries over to more general D-indexed random fields is discussed in the survey paper by Velenik. But it’s worth emphasising that having the covariance given by the Green’s function as in the definition we’ve just given is a very nice property, as there are lots of pre-existing tools for calculating these. By contrast, it’s hard to think of a natural model for an integer-valued surface of this kind, as an analogue to SRW.

[Though definitely not impossible. The nicest example I’ve heard of is for height functions of large uniform domino tilings within their ‘arctic circle’, which have GFF asymptotics. See this paper by Kenyon.]

A continuous limit?

We motivated the discussion of random walk bridge by the limit object, namely Brownian bridge. Part of the reason why the DGFF is more interesting than Gaussian random walk bridge, is that the limit object, the (continuum) Gaussian free field is hard to define classically in two dimensions.

We might suppose that the DGFF in $V_N$, the square box of width N has some scaling limit as $N\rightarrow\infty$. However, for fixed $x,y\in [0,1]^2$, (and taking integer parts component-wise), well-known asymptotics for SRW in a large square lattice (more on this soon hopefully) assert that

$\mathrm{Cov}(h^{V_N}_{\lfloor Nx \rfloor},h^{V_N}_{\lfloor Ny\rfloor}) \sim \log |x-y|,$ (2)

and so any scaling limit will rescale only the square domain, not the height (since there is no N on the RHS of (2)). However, then the variance of the proposed limit is infinite everywhere.

So the GFF does not exist as a random height function on $[0,1]^2$, with the consequence that a) more care is needed over its abstract definition; b) the DGFF in 2D on a large square is an interesting object, since it does exist in this sense.

What makes it ‘free’?

This seemed like a natural question to ask, but I’ve received various answers. Some sources seem to suggest that having zero boundary condition is free. Other sources refer to the Hamiltonian (that is the term inside the exponential function at (1) ) as free since it depends only on the increments between values. If the Hamiltonian also depends on the heights themselves, for example via the addition of a $\sum_{x} \Psi(h^D_x)$ term, then for suitable choice of function $\Psi$, this is interpreted as a model where the particles have mass. The physical interpretation of these more general Gibbs measures is discussed widely, and I’m not very comfortable with it all at the moment, but aim to come back to it later, when hopefully I will be more comfortable.

# Fair games and the martingale strategy III

Gambler’s Ruin

Continuing directly from the previous post, the nicest example of the optional stopping theorem we developed there is to example a simple random walk constrained between two values, say 0 and N. This represents an idealised gambling situation, where the gambler stops playing either when they reach some pre-agreed profit, or when they go bankrupt. We assume that we start at level k, for k = 1,2,…,N-1.

Naturally, we want to know the probabilities of winning (ie getting to N) and losing (ie going bankrupt). We could set this up by conditioning on the first step. Let $p_k$ be the probability of winning starting from level k. Then we must have

$p_k= \frac12 p_{k+1}+\frac12 p_{k-1},\quad k=1,\ldots,N-1,$ (*)

with the obvious boundary conditions $p_0=0, p_N=1$. In an ideal world, we just know how to solve second order difference equations like (*). Well, actually it isn’t too hard, because we can see from (*) directly that

$p_{k+1}-p_k = p_k-p_{k-1},$

and so $p_k$ is a linear function of k, and so $p_k = k/N$ follows pretty much immediately.

But, we can also use OST profitably. Let T be the time at which we first hit 0 or N. It’s intuitively clear that this should have finite expectation, since the problems you might encounter with just the hitting time of a single level shouldn’t apply. Or you can consider the expected number of steps before you see N ups or downs in a row, which certainly provides an upper bound on T. This random number of steps is sort of geometric (at least, can be upper bounded by a geometric RV) and so has finite expectation. So can apply OST to X at T, and we have

$\mathbb{E}[X_T] = N\cdot \mathbb{P}(X_T=N) + 0 \cdot \mathbb{P}(X_T=0) = \mathbb{E}[X_0]=k,$

from which we also derive $p_k=k/N$.

The reason we talk about gambler’s ruin is by considering the limit $N\rightarrow\infty$ with k fixed. After a moment’s thought, it’s clear we can’t really talk about stopping the process when we hit infinity, since that won’t happen at any finite time. But we can ask what’s the probability that we eventually hit zero. Then, if we imagine a barrier at level N, the probability that we hit 0 at some point is bounded below by the probability that we hit 0 before we hit level N (given that we know we hit either zero or level N with probability one), and this is $\frac{N-k}{N}$, and by choosing N large enough, we can make this as close to 1 as we want. So the only consistent option is that the probability of hitting 0 at some point is one. Hence gambler’s ruin. With probability one, ruin will occur. There’s probably a moral lesson hiding there not especially subtly.

So the deal here seems to be that if you just care about your average, it doesn’t matter how to choose to play a sequence of fair games. But what if you care about something other than your average? In any real setting, we maybe care about slightly more than this. Suppose I offer you a bet on a coin toss: you get £3 if it comes up heads, and I get £1 if it comes up tails. Sounds like a good bet, since on average you gain a pound. But what about if you get £10,003 if it comes up heads and I get £10,001 if it comes up tails? I’m guessing you’re probably not quite so keen now.

But if you were an international bank, you might have fewer reservations about the second option. My intention is not to discuss whether our valuation of money is linear here, but merely to offer motivation for the financial option I’m about to propose. The point is that we are generally risk-averse (well, most of us, most of the time) and so we are scared of possible large losses, even when there is the possibility of large profits to balance it out.

Let’s assume we have our simple random walk, and for definiteness let’s say it starts at £1. Suppose (eg as a very niche birthday present) we have the following opportunity: at any point between now and time t=5, we have the right to buy one unit of the stock for £2.

We want to work out how much this opportunity, which from now on I’m going to call an option, is worth on average. Note that now it does seem that when we choose to cash in the option will have an effect on our return, and so we will have to include this in the analysis.

Note that, once we’ve bought a unit of the stock, we have an asset which is following a simple random walk (ie sequential fair games) and so from this point on its expected value remains unchanged. So in terms of expectation, we might as well sell the stock at the same moment we buy it. So if we cash in the option when the stock is currently worth £X, we will on average have a return of £(X-2). This means that we’ll only ever consider exercising our option if the current value of the stock is greater than £2. This narrows down our strategy slightly.

This sort of option minimises the risk of a large loss, since the worst thing that happens is that you never choose to exercise your option. So if you actually paid for the right to have this option, that cost is the largest amount you can lose. In the trading world, this type of opportunity is called an American option.

The trick here is to work backwards in time, thinking about strategies. If at time t=4, the stock is worth £1, then the best that can happen is that it’s worth £2 at time t=5, and this still gains you no wealth overall. Similarly if it’s worth £0 at time t=3. So we’ve identified a region where, if the stock value enters this region, we might as well rip up our contract, because we definitely aren’t going to gain anything. Remember now that we’ve also said you won’t ever cash in if the stock’s value is at most £2, because you don’t gain anything on average.

Now suppose that the stock has value £3 at time t=4. There’s no danger of it ever getting back below £2 during the lifetime of the option, so from now on your potential return is following the trajectory of a simple random walk, ie a fair game. So on average, it makes no difference whether you cash in now, or wait until t=5, or some combination of the two. The same argument holds if the stock has value £4 at time t=3 or time t=4, and so we can identify a region where you might as well cash in.

What about the final region? If the stock value is greater than £2, but not yet in the definitely-cash-in area, what should you do? Well, if you think about it, the value of the stock is a fair game. But your return should be better than that, because the stock price doesn’t take account of the fact that you wouldn’t buy in (and make a loss overall) if the value drops below £2. So at this stage, your future options are better than playing a fair game, and so it doesn’t make sense (in terms of maximising your *average*) to cash in.

Now we can actually work backwards in time to establish how much any starting value is worth under this optimal strategy. We can fill in the values in the ‘doomed’ area (ie all zeros) and on the ‘cash in now’ area (ie current value minus 2), and construct backwards using the fact that we have a random walk.

The final answer ends up being 7/16 if the stock had value £1 at time 0. Note that the main point here is that working out the qualitative form of the strategy was the non-trivial part. Once we’d done that, everything was fairly straightforward. I claim that this was a reasonably fun adjustment to the original problem, but have minimal idea whether pricing options is in general an interesting thing to do.

Anyway, I hope that provided an interesting overview to some of the topics of interest within the question of how to choose strategies for games based on random processes.

# Fair games and the martingale strategy I

I went back to my school a couple of weeks ago and gave a talk. I felt I’d given various incarnations of a talk on card-shuffling too many times, so it was time for a new topic. The following post (and time allowing, one or two more) is pretty much what I said.

The Martingale Strategy

Suppose we bet repeatedly on the outcome of tossing a fair coin. Since it’s November, my heart is set on buying an ice cream that costs £1, so my aim is to win this amount from our game. My strategy is this:

First, I bet £1. If I win, then that’s great, because I now have made exactly enough profit to buy the ice cream. If I lose, then I play again, and this time I bet £2. Again, if I win, then my total profit is £2-£1 = £1, so I stop playing and buy the ice cream. If I lose, then I play a third time, again doubling my stake. So if I win for the first time on the seventh go, my overall profit will be

£64 – (£1+£2+£4+£8+£16+£32) = £1,

and it’s clear that this can be continued and I will eventually win a round, and at this point my total profit will be £1. So I will always eventually be able to buy my ice cream.

But, there’s nothing special about the value £1, so I could replace the words ‘ice cream’ with ‘private tropical island’, so why am I still here in the UK on a wet Monday when I could be on my beach lounger?

There are some fairly obvious reasons why the strategy I’ve described is not actually a fail-safe way to make a profit. For a start, although with probability one a head will come up eventually, there is a small positive chance that the first 200 rolls will all be tails. At this point, I would have accrued a debt of roughly $2^{200}$ pounds, and this is slightly more than the number of atoms in the universe. All this for an ice cream?

So there are major problems carrying out this strategy in a finite world. And of course, it’s no good if we stop after a very large but finite number of turns, because then there’s always this very small chance that we’ve made a very large loss, which is bad, partly because we can’t have the ice cream, but also because it exactly cancels out the chance of making our £1 profit, and so our overall average profit is exactly zero.

Though I’ve set this up in an intentionally glib fashion, as so often is the case, we might have stumbled across an interesting mathematical idea. That is, if we play a fair game a finite number of times, we have a fair game overall, meaning our overall average profit is zero. But if we are allowed to play a potentially infinite number of times, then it’s not clear how to define our overall ‘average’ profit, since we feel it ought to be zero, as an extension of the finite case, but also might be positive, because it ends up being £1 with probability one.

It’s tempting at this stage to start writing statements like

$1 \times 1 + (-\infty) \times 0=0 ,$

to justify why this might have come about, where we consider the infinitely unlikely event that is infinitely costly. But this is only convincing at the most superficial level, and so it makes more sense to think a bit more carefully about under exactly what circumstances we can extend our observation about the overall fairness of a finite sequence of individual fair games.

A second example

The previous example was based upon a series of coin tosses, and we can use exactly the same source of randomness to produce a simple random walk. This is a process that goes up or down by 1 in each time step, where each option happens with probability ½, independently of the history.

We could avoid the requirement to deal with very large bets by always staking £1, and then cashing in the first time we have a profit of £1. Then, if we start the random walk at zero, it models our profit, and we stop the first time it gets to 1. It’s not obvious whether we hit 1 with probability one. Let’s show this.

In order to hit some positive value k, the random walk must pass through 1, 2, and so on, up to (k-1) and then finally k. So $\mathbb{P}(\text{hit k}) = [\mathbb{P}(\text{hit 1})]^k$. And similarly for negative values. Also, the probability that we return to zero is the same as the probability that we ever hit 1, since after one time-step they are literally the same problem (after symmetry). So, if the probability of hitting 1 is p<1, then the number of visits to zero is geometric (supported on 1,2,3,…) with parameter p, and so

$\mathbb{E}[\text{visits to k}] = \mathbb{E}[\text{visits to zero}] \times \mathbb{P}(\text{hit k})=(1+1/p) \times p^{|k|} = (p+1)p^{|k|-1}.$

Thus, when we sum over all values of k, we are summing a pair of geometric series with exponent <1, and so we get a finite answer. But if the expected number of visits to anywhere (ie the sum across all places) is finite, this is clearly ridiculous, since we are running the process for an infinite time, and at each time-step we must be somewhere! So we must in fact have p=1, and thus another potential counter-example to the claim that a sequence of fair games can sometimes be unfair.

We might have exactly the same set of practical objections, such as this method requiring arbitrarily large liquidity (even though it doesn’t grow exponentially fast so doesn’t seem so bad).

What will actually turn out to be useful is that although the bets are now small, the average time until we hit 1 is actually infinite. Remember that, even though most things we see in real life don’t have this property, it is completely possible for a random variable to take finite values yet have infinite expectation.

Notes on the Martingale Strategy

There’s no reason why the originally proposed strategy had to be based upon fair coin tosses. This strategy might work in a more general setting, where the chance of winning on a given turn is not ½, or is not even constant. So long as at each stage you bet exactly enough that, if you win, you recoup all your losses so far, and one extra pound, this has the same overall effect.

Of course, we need to check that we do eventually win a round, which is not guaranteed if the probability of winning (conditional on not having yet won) decays sufficiently fast. If we let $p_k$ be the probability of winning on turn k, given that we haven’t previously won, then we require that the probability of never winning $\prod_{k\ge 1}(1-p_k)=0$. By taking logs and taking care of the approximations, it can be seen that the divergence or otherwise of $\sum p_k$ determines which way this falls.

In the next post, we’ll talk about how the two problems encountered here, namely allowing large increments, and considering a stopping time with infinite expectation are exactly the two cases where something can go wrong. We’ll also talk about a slightly different setting, where the choice of when to stop playing becomes a bit more dynamic and complicated.

# Ornstein-Uhlenbeck Process

A large part of my summer has been spent proving some technical results pertaining to the convergence of some functionals of a critical Frozen Percolation process. This has been worthwhile, but hasn’t involved a large amount of reading around anything in particular, which has probably contributed to the lack of posts in recent months. Perhaps a mixture of that and general laziness?

Anyway, it turns out that the limit of the discrete processes under consideration is the Ornstein-Uhlenbeck process. The sense in which this limit holds (or at least, for now, is conjectured to hold) is something for another article. However, I thought it would be worth writing a bit about this particular process and why it is interesting.

The O-U process is described by the SDE

$dX_t=-\beta (X_t-\mu)dt+\sigma dW_t,$

where W is a standard Brownian motion. We think of $\mu$ as the ‘mean’. The extent to which this behaves as a mean will be discussed shortly. The process is then mean-reverting, in the sense that the drift is directed against deviations of the process away from this mean. The parameter $\beta$ measures the extent of this mean reversion, while as usual $\sigma$ controls the magnitude of the Brownian noise.

The motivation for considering mean-reverting processes is considerable. One measure of this is how many equations with articles on Wikipedia turn out to be precisely this Ornstein-Uhlenbeck process with different context or notation. In most cases, the motivation arises because Brownian motion is for some reason unsuitable to take as a canonical random process. We will see why the O-U process is somehow the next most canonical choice for a random process.

In physics, it is sometimes unsatisfactory to model the trajectory of a particle with Brownian motion (even though this motivated the name…) as the velocities are undefined (see this post from ages ago), or infinite, depending on your definition of velocity. Using the Ornstein-Uhlenbeck process to model the velocity of a particle is often a satisfactory alternative. It is not unreasonable that there should be a mean velocity, presumably zero. The mean reversion models a frictional force from the underlying medium, while the Brownian noise describes random collisions with similar particles.

In financial applications, the Ornstein-Uhlenbeck model has been applied, apparently under the title of the Vasicek model since the 70s to describe quantities such as interest rates where there is some underlying reason to ban indefinite growth, and require mean reversion. Another setting might be a commodity which, because of external driving factors, has over the relevant time-scale well-defined mean value, around which mean-reverting fluctuations on the observed time-scale can be described. As with other financial models, it is undesirable for a process to take negative values. This can be fixed by taking a positive mean, then setting the volatility to be state dependent, decaying to zero as the state tends to zero, so for small values, the positive drift dominates. I don’t fully understand why patching this aspect is significantly more important than patching any other non-realistic properties of the model, but the resulting SDE is, at least in one particular case where the volatility is $\sqrt{X_t}$, called the Cox-Ingersoll-Ross model.

Anyway, a mathematical reason to pay particular attention to this Ornstein-Uhlenbeck process is the following. It is the unique family of continuous Markov processes to have a stationary Gaussian distribution. It is the mean-reverting property that is key. There is no chance of Brownian motion having any stationary distribution, let alone a Gaussian one. If this isn’t clear, you can convince yourself by thinking of the stationary distribution of SRW on $\mathbb{Z}$. Since the process is space-homogeneous, the only stationary measure is the uniform measure.

I want to focus on one particular property of the O-U process, through which some other aspects will be illuminated. If we take $\sigma=\beta$ and let $\beta\rightarrow\infty$, then the stationary processes converge to white noise.

First though, we should note this is perhaps the easiest SDE to solve explicitly. We consider $X_t e^{\theta t}$, and applying Ito’s lemma rapidly gives

$X_t=\mu + (X_0-\mu)e^{-\beta t}+\sigma\int_0^t e^{-\beta(t-s)}dW_s.$

W is Gaussian so the distribution of $X_t$ conditional on $X_0=x_0$ is also Gaussian, and since W is centred we can read off the expectation. Applying the Ito isometry then gives the variance. In conclusion:

$X_t\stackrel{d}{=}\mathcal{N}(\mu+(x_0-\mu)e^{-\beta t}, \frac{\sigma^2}{\beta}(1-e^{-2\beta t})).$

In particular, note that the variation has no dependence on $x_0$. So as t grows to infinity, this converges to $\mathcal{N}(\mu, \frac{\sigma^2}{\beta})$. This is, unsurprisingly, the stationary distribution of the process.

To address the white noise convergence, we need to consider $\text{Cov}(X_0,X_t)$ in stationarity. Let’s assume WLOG that $\mu=0$ so most of the expectations will vanish. We obtain

$\text{Cov}(X_0,X_t)=\mathbb{E}[X_0X_t]=\mathbb{E}_{x_0}\left[\mathbb{E}[X_t| X_0=x_0]\right]=\mathbb{E}[X_0^2 e^{-\beta t}]= \frac{sigma^2}{2\beta}2^{-\beta t}.$

If we want, the Chapman-Kolmogorov equations work particularly nicely here, and we are able to derive a PDE for the evolution of the density function, though obviously this is very related to the result above. This PDE is known as the Fokker-Planck equation.

So, in particular, when $\sigma=\beta\rightarrow \infty$, this covariance tends to 0. I’m not purporting that this constitutes a proof that the Ornstein-Uhlenbeck processes converge as processes to white noise. It’s not obvious how to define process convergence, not least because there’s flexibility about how to view white noise as a process. One doesn’t really want to define the value of white noise at a particular time, but you can consider the covariance of integrals of white noise over disjoint intervals as a limit, in similar way to convergence of finite dimensional distributions.

The fact that taking $\beta=0$ gives Brownian motion, and this case gives white noise, intermediate versions of the Ornstein-Uhlenbeck process are sometimes referred to as coloured noise.

Finally, the Ornstein-Uhlenbeck process emerges as the scaling limit of mean-reverting discrete Markov chains, analogous to Brownian motion as the scaling limit of simple random walk. One particularly nice example is the Ehrenfest Urn model. We have two urns, and 2N balls. In each time step one of the 2N balls is chosen uniformly at random, and it is moved to the other urn. So a ball is more likely to be removed from an urn with more than N balls. We can view this as a model for molecules in, say a room, with a slightly porous division between them, eg a small hole. More complicated interface models in higher dimensions lead to fascinating PDEs, such as the famous KPZ equation, which are the subject of much ongoing interest in this area.

This result can be an application of the theory of convergence of Markov chains to SDEs pioneered by Stroock and Varadhan, about which more may follow very soon. In any case, it turns out that the fluctuations in the Ehrenfest Urn model are on the scale of $\sqrt{n}$, unsurprisingly, and are given by a centred Ornstein-Uhlenbeck process.

Investigating this has reminded me how much I’ve forgotten, or perhaps how little I ever knew, about the technicalities of stochastic processes are their convergence results, so next up will probably be a summary of all the useful definitions and properties for this sort of analysis.

# Hitting Probabilities for Markov Chains

This continues my previous post on popular questions in second year exams. In the interest of keeping it under 2,500 words I’m starting a new article.

In a previous post I’ve spoken about the two types of Markov chain convergence, in particular, considering when they apply. Normally the ergodic theorem can be used to treat the case where the chain is periodic, so the transition probabilities do not converge to a stationary distribution, but do have limit points – one at zero corresponding to the off-period transitions, and one non-zero. With equal care, the case where the chain is not irreducible can also be treated.

A favourite question for examiners concerns hitting probabilities and expected hitting times of a set A. Note these are unlikely to come up simultaneously. Unless the hitting probability is 1, the expected hitting time is infinite! In both cases, we use the law of total probability to derive a family of equations satisfied by the probabilities/times. The only difference is that for hitting times, we add +1 on the right hand side, as we advance one time-step to use the law of total probability.

The case of hitting probabilities is perhaps more interesting. We have:

$h_i^A = 1,\; i\in A, \quad h_i^A=\sum_{j\in S}p_{ij}h_j^A,\; i\not\in A.$

There are two main cases of interest: where the chain is finite but has multiple closed communicating classes, and where the chain is infinite, so even though it is irreducible, a trajectory might diverge before hitting 0.

For the case of a finite non-irreducible Markov chain, this is fairly manageable, by solving backwards from states where we know the values. Although of course you could ask about the hitting probability of an open state, the most natural question is to consider the probability of ending up in a particular closed class. Then we know that the hitting probability starting from site in the closed class A is 1, and the probability starting from any site in a different closed class is 0. To find the remaining values, we can work backwards one step at a time if the set of possible transitions is sparse enough, or just solve the simultaneous equations for $\{h_i^A: i\text{ open}\}$.

We therefore care mainly about an infinite state-space that might be transient. Typically this might be some sort of birth-and-death chain on the positive integers. In many cases, the hitting probability equations can be reduced to a quadratic recurrence relation which can be solved, normally ending up with the form

$h_i=A+B\lambda^i$,

where $\lambda$ might well be q/p or similar if the chain is symmetric. If the chain is bounded, typically you might know $h_0=1, h_N=0$ or similar, and so you can solve two simultaneous equations to find A and B. For the unbounded case you might often only have one condition, so you have to rely instead on the result that the hitting probabilities are the minimal solution to the family of equations. Note that you will always have $h^i_i=1$, but with no conditions, $h^i_j\equiv 1$ is always a family of solutions.

It is not clear a priori what it means to be a minimal solution. Certainly it is not clear why one solution might be pointwise smaller than another, but in the case given above, it makes sense. Supposing that $\lambda<1$, and A+B=1 say, then as we vary the parameters, the resulting set of ‘probabilities’ does indeed vary monotically pointwise.

Why is this true? Why should the minimum solution give the true hitting probability values? To see this, take the equations, and every time an $h_i^A$ appears on the right-hand side, substitute in using the equations. So we obtain, for $i\not\in A$,

$h_i^A=\sum_{j\in A}p_{ij}+\sum_{j\not\in A} p_{ij}h_j^A,$

and after a further iteration

$h_i^A=\sum_{j_1\in A}p_{ij_1}+\sum_{j_1\not\in A, j_2\in A}p_{ij_1}p_{j_1j_2}+\sum_{j_1,j_2\not\in A}p_{ij_1}p_{j_1j_2}h_{j_2}^A.$

So we see on the RHS the probability of getting from i to A in one step, and in two steps, and if keep iterating, we will get a large sum corresponding to the probability of getting from i to A in 1 or 2 or … or N steps, plus an extra term. Note that the extra term does not have to correspond to the probability of not hitting A by time N. After all, we do not yet know that $(h_{i}^A)$ as defined by the equations gives the hitting probabilities. However, we know that the probability of hitting A within N steps converges to the probability of hitting A at all, since the sequence is increasing and bounded, so if we take a limit of both sides, we get $h_i^A$ on the left, and something at least as large as the hitting probability starting from i on the right, because of the extra positive term. The result therefore follows.

It is worth looking out for related problems that look like a hitting probability calculation. There was a nice example on one of the past papers. Consider a simple symmetric random walk on the integers modulo n, arranged clockwise in a circle. Given that you start at state 0, what is the probability that your first return to state 0 involves a clockwise journey round the circle?

Because the system is finite and irreducible, it is not particularly interesting to consider the actual hitting probabilities. Also, note that if it is convenient to do so, we can immediately reduce the problem when n is even. In two steps, the chain moves from j to j+2 and j-2 with probability ¼ each, and stays at j with probability ½. So the two step chain is exactly equivalent to the lazy version of the same dynamics on n/2.

Anyway, even though the structure is different, our approach should be the same as for the hitting probability question, which is to look one step into the future. For example, to stand a chance of working, our first two moves must both be clockwise. Thereafter, we are allowed to move anticlockwise. There is nothing special about starting at 0 in defining the original probability. We could equally well ask for the probability that starting from j, the first time we hit 0 we have moved clockwise round the circle.

The only thing that is now not obvious is how to define moving clockwise round the circle, since it is not the case that all the moves have to be clockwise to have experienced a generally clockwise journey round the circle, but we definitely don’t want to get into anything complicated like winding numbers! In fact, the easiest way to make the definition is that given the hitting time of 0 is T, we demand that the chain was at state n at time T-1.

For convenience (ie to make the equations consistent) we take $h_0=0, h_n=1$ in an obvious abuse of notation, and then

$h_j=\frac12h_{j-1}+\frac12 h_{j+1},$

from which we get

$h_j=a+bj \Rightarrow h_j=\frac{j}{n}.$

Of course, once we have this in mind, we realise that we could have cut the circle at 0 (also known as n) and unfolded it to reduce the problem precisely to symmetric gambler’s ruin. In particular, the answer to the original problem is 1/2n, which is perhaps just a little surprising – maybe by thinking about the BM approximation to simple random walk, and that BM started from zero almost certainly crosses zero infinitely many times near we might have expected this probability to decay faster. But once it is unfolded into gambler’s ruin, we have the optimal stopping martingale motivation to reassure us that this indeed looks correct.

# Sticky Brownian Motion

This follows on pretty much directly from the previous post about reflected Brownian motion. Recall that this is a process defined on the non-negative reals which looks like Brownian motion away from 0. We consider whether RBM is the only such process, and how any alternative might be constructed as a limit of discrete-time Markov processes.

One of the alternatives is called Sticky Brownian motion. This process spends more time at 0 than reflected Brownian motion. In fact it spends some positive proportion of time at 0. My main aim here is to explain why some intuitive ideas I had about how this might arise are wrong.

The first thought was to ensure that each visit to 0 last some positive measure of time. This could be achieved by staying at 0 for an Exp(1) duration, every time the process visited it. It doesn’t seem unreasonable that this might appear as the limit of a standard SRW adjusted so that on each visit to 0 the walker waits for a time given by independent geometric distributions. These distributions are memoryless, so that is fine, but by Blumenthal’s 0-1 Law, starting from 0 a Brownian motion hits zero infinitely many times before any small time t. So in fact the process described above would be identically zero as before it gets anywhere it would have to spend some amount of time at 0 given by an infinite sum of Exp(1) RVs.

We will return later to the question of why the proposed discrete-time model will still converge to reflected BM rather than anything more exotic. First though, we should discount the possibility of any intermediate level of stickiness, where the set of times spent at 0 still has measure zero, but the local time at 0 grows faster than for standard reflected BM. We can define the local time at 0 through a limit

$L_t=\lim_{\epsilon\downarrow 0}\frac{1}{2\epsilon}\text{Leb}(\{0\le s \le t: |B_t|<\epsilon\})$

of the measure of time spent very near 0, rescaled appropriately. So if the measure of the times when the process is at 0 is zero, then the local time is determined by behaviour near zero rather than by behaviour at zero. More precisely, on the interval $[-\epsilon,\epsilon]$, the process behaves like Brownian motion, except on a set of measure zero, so the local time process should look the same as that of BM itself. Note I don’t claim this as a formal proof, but I hope it is a helpful heuristic for why you can’t alter the local time process without altering the whole process.

At this stage, it seems sensible to define Sticky Brownian motion. For motivation, note that we are looking for a process which spends a positive measure of time at 0. So let’s track this as a process, say $C_t$. Then the set of times when C is increasing is sparse, as it coincides with the process being 0, but we know we cannot wait around at 0 for some interval of time without losing the Markov property. So C shares properties with the local time of a reflected BM. The only difference is that the measure of times when C is increasing is positive here, but zero for the local time.

So it makes sense to construct the extra time spent at zero from the local time of a standard reflected BM. The heuristic is that we slow down the process whenever it is at 0, so that local time becomes real time. We can also control the factor by which this slowing-down happens, so define

$\sigma(s)=\rho L(s)+s,$

where L is the local time process of an underlying reflected BM, and $\rho>0$ is a constant. So $\sigma$ is a map giving a random time-change. Unsurprisingly, we now define Sticky BM as the reflected BM with respect to this time-change. To do this formally, it is easiest to define a family of stopping times $\{\tau_t\}$, such that $\sigma(\tau_t)=t, \tau_{\sigma(s)}=s$, then if X is the reflected BM, define $Y_t=X_{\tau_t}$ for the sticky BM.

It is worth thinking about what the generator of this process should be. In particular, why should it be different to reflected BM? The key observation is that the drift of the underlying reflected BM is essentially infinite at 0. By slowing down the process at 0, this drift becomes finite. So the martingale associated with sticky BM is precisely a time-changed version of the martingale associated with the underlying reflected BM, but this time-change is precisely what is required to give a generator. We get:

$(\mathcal{L}f)(x)=\begin{cases}\frac12f''(x)&\quad x>0\\ \rho^{-1}f'(0) &\quad x=0.\end{cases}$

Now that we have the generator, it starts to become apparent how sticky BM might appear as a limit of discrete-time walks. The process must look like mean-zero, unit-variance RW everywhere except near 0, where the limiting drift should be $\rho^{-1}$. Note that when considering the limiting drift near zero, we are taking a joint limit in x and h. The order of this matters. As explained at the end of the previous article, we only need to worry about the limiting drift along sequences of $x,h$ such that $a_h(x)\rightarrow 0$. If no such sequences exist, or the limiting drift along any of these is infinite, then we actually have a reflected boundary condition.

This highlights one confusing matter about convergence of reflected processes. The boundary of the discrete-time process should converge to the boundary of the reflected process, but we also have to consider where reflective behaviour happens. Can we get sticky BM with reflection only at the boundary in the discrete-time processes? The answer turns out to be no. At the start of this article, I proposed a model of SRW with geometric waiting times whenever the origin was visiting. What is the limit of this?

The trick is to consider how long the discrete process spends near 0, after rescaling. It will spend a multiple 1/p more time at 0 itself, where p is the parameter of the geometric distribution, but no more time than expected at any point $x\in(0,\epsilon)$. But time spent in $(0,\epsilon)$ dominates time spent at 0 before this adjustment, so must also dominate it after the adjustment, so in the limit, the proportion of time spent at 0 is unchanged, and so in particular it cannot be positive.

Because of all of this, in practice it seems that most random walks we might be interested in converge (if they converge to a process at all) to a reflected SDE/diffusion etc, rather than one with sticky boundary conditions. I feel I’ve been talking a lot about Markov processes converging, so perhaps next, or at least soon, I’ll write some more technical things about exactly what conditions and methods are required to prove this.

REFERENCES

S. Varadhan – Chapter 16 from a Lecture Course at NYU can be found here.

# Reflected Brownian Motion

A standard Brownian motion is space-homogeneous, meaning that the behaviour of $B_{T+t}-B_T$ does not depend on the value of $B_T$. By Donsker’s Theorem, such a Brownian motion is also the limit in a process space of any homogeneous random walk with zero-drift and constant variance, after suitable rescaling.

In many applications, however, we are interested in real-valued continuous-time Markov processes that are defined not on the whole of the real line, but on the half-line $\mathbb{R}_{\ge 0}$. So as BM is the fundamental real-valued continuous-time Markov process, we should ask how we might adjust it so that it stays non-negative. In particular, we want to clarify uniqueness, or at least be sure we have found all the sensible ways to make this adjustment, and also to consider how Donsker’s Theorem might work in this setting.

We should consider what properties we want this non-negative BM to have. Obviously, it should be non-negative, but it is also reasonable to demand that it looks exactly like BM everywhere except near 0. But since BM has a scale-invariance property, it is essentially meaningful to say ‘near 0’, so we instead demand that it looks exactly like BM everywhere except at 0. Apart from this, the only properties we want are that it is Markov and has continuous sample paths.

A starting point is so-called reflected Brownian motion, defined by $X_t:=|B_t|$. This is very natural and very convenient for analysis, but there are some problems. Firstly, this has the property that it looks like Brownian motion everywhere except 0 only because BM is space-homogeneous but also symmetric, in the sense that $B_t\stackrel{d}{=}-B_t$. This will be untrue for essentially any other process, so as a general method for how to keep stochastic processes positive, this will be useless. My second objection is a bit more subtle. If we consider this as an SDE, we get

$dX_t=\text{sign}(B_t)dB_t.$

This is a perfectly reasonable SDE but it is undesirable, because we have a function of B as coefficient on the RHS. Ideally, increments of X would be a function of X, and the increments of B, rather than the values of B. That is, we would expect $X_{t+\delta t}-X_t$ to depend on $X_t$ and on $(B_{t+s}-B_t, 0\le s\le \delta t)$, but not on $B_t$ itself, as that means we have to keep track of extra information while constructing X.

So we need an alternative method. One idea might be to add some non-negative process to the BM so that the sum stays non-negative. If this process is deterministic and finite, there there is some positive probability that the sum will eventually be negative, so this won’t do. We are looking therefore so a process which depends on the BM. Obviously we could take $\max(-B_t,0)$, but this sum would then spend macroscopic intervals of time at 0, and these intervals would have the Raleigh distribution (for Brownian excursions) rather than the exponential distribution, hence the process given by the sum would not be memoryless and Markov.

The natural alternative is to look for an increasing process $A_t$, and then it makes sense to talk about the minimal increasing process that has the desired property. A moment’s thought suggests that $A_t=-min_{s\le t}B_t$ satisfies this. So we have the decomposition

$B_t=-A_t+S_t,$

where $S_t$ is the height of B above its running minimum. So S is an ideal alternative definition of reflecting BM. In particular, when B is away from its minimum, $dB_t=dS_t$, so this has the property that it evolves exactly as the driving Brownian motion.

What we have done is to decompose a general continuous process into the sum of a decreasing continuous process and a non-negative process. This is known as the Skorohod problem, and was the subject of much interest, even in the deterministic case. Note that process A has the property that it is locally constant almost everywhere, and is continuous, yet non-constant. Unsurprisingly, since A only changes when the underlying BM is 0, A is continuous with respect to the local time process at 0. In fact, A is the local time process of the underlying Brownian motion, by comparison with the construction by direct reflection.

One alternative approach is to look instead at the generator of the process. Recall that the generator of a process is an operator on some space of functions, with $\mathcal{L}f$ giving the infinitissimal drift of $f(X_t)$. In the case of Brownian motion, the generator $(\mathcal{L}f)(x)=\frac12 f''(x)$ for bounded smooth functions f. This is equivalent to saying that

$f(X_t)-f(X_0)-\int_0^t \frac12 f''(X_s)ds$ (*)

is a martingale. This must hold also for reflected Brownian motion, whenever x is greater than 0. Alternatively, if the function f is zero in a small neighbourhood of 0, it should have the same generator with respect to reflected BM. Indeed, for a general smooth bounded function f, we can still consider the expression (*) with respect to reflected BM. We know this expression behaves as a martingale except when X is zero. If f'(0)>0, and T is some hitting time of 0, then $f(X_{T+\delta T})-f(X_T)\ge 0$, hence the expression (*) is a submartingale. So if we restrict attention to functions with f'(0)=0, the generator remains the same. Indeed, by patching together all such intervals, it can be argued that even if f'(0) is not zero,

$f(X_t)-f(X_0)-\int_0^t \frac12 f''(X_s)ds - f'(0)A_t$

is a martingale, where A is the local time process at zero.

I was aware when I started reading about this that there was another family of processes called ‘Sticky Brownian Motion’ that shared properties with Reflected BM, in that it behaves like standard BM away from zero, but is also constrained to the non-negative reals. I think this will get too long if I also talk about that here, so that can be postponed, and for now we consider reflected BM as a limit of reflected (or other) random walks, bearing in mind that there is at least one other candidate to be the limit.

Unsurprisingly, if we have a family of random walks constrained to the non-negative reals, that are zero-drift unit-variance away from 0, then if they converge as processes, the limit is Brownian away from zero, and non-negative. Note that “away from 0” means after rescaling. So the key aspect is behaviour near zero.

What is the drift of reflected BM at 0? We might suspect it is infinite because of the form of the generator, but we can calculate it directly. Given $X_0=0$, we have:

$\frac{\mathbb{E}X_t}{t}=\frac{\mathbb{E}|B_t|}{t}=\frac{\sqrt{t}\mathbb{E}|B_1|}{t},$

so letting $t\rightarrow 0$, we see indeed that the drift is infinite at 0.

For convergence of discrete processes, we really need the generators to converge. Typically we index the discrete-time processes by the time unit h, which tends to 0, and $b_h(x),a_h(x)$ are the rescaled drift and square-drift from x. We assume that we don’t see macroscopic jumps in the limit. For the case of simple random walk reflected at 0, it doesn’t matter exactly how we construct the joint limit in h and x, as the drift is uniform on x>0, but in general this does matter. I don’t want to discuss sticky BM right now, so it’s probably easiest to be vague and say that the discrete Markov processes converge to reflected BM so long they don’t spend more time than expected near 0 in the limit, as the title ‘sticky’ might suggest.

The two ways in which this can happen is if the volatility term $a_h(x)$ is too small, in which case the process looks almost deterministic near 0, or if the drift doesn’t increase fast enough. And indeed, this leads to two conditions. The first is straightforward, if $a_h(x)$ is bounded below, in the sense that $\liminf_{h,x\rightarrow 0} a_h(x)\ge C>0$, then we have convergence to reflected BM. Alternatively, the only danger can arise down those subsequences where $a_h(x)\rightarrow 0$, so if we have that $b_h(x)\rightarrow +\infty$ whenever $h,x,a_h(x)\rightarrow 0$, then this convergence also holds.

Next time I’ll discuss what sticky BM means, what it doesn’t mean, why it isn’t easy to double the local time, and how to obtain sticky BM as a limit of discrete random walks in a similar way to the above.

REFERENCES

S. Varadhan – Chapter 16 from a Lecture Course at NYU can be found here.

# Lamperti Walks

The theory of simple random walks on the integer lattice is a classical topic in probability theory. Polya proved in the 1920s that such a SRW on $\mathbb{Z}^d$ is recurrent only for d=1 or 2. The argument is essentially combinatorial. We count the number of possible paths from 0 back to itself and show that this grows fast enough that even with the probabilistic penalty of having a particular long path we will still repeatedly see this event happening. In larger dimensions there is essentially ‘more space’ at large distances, at least comparatively, so a typical walk is more likely to escape into this space.

As Kakutani (of the product martingale theorem) said, and was subsequently quoted as the dedication on every undergraduate pdf about random walks: “A drunk man will find his way home, whereas a drunk bird may get lost forever.”

But transience in some sense a long-distance property. We can fiddle with the transition rates near zero and, so long as we don’t make anything deterministic this shouldn’t affect transience properties. Obviously if we have a (space-)homogeneous nearest-neighbour random walk on the integers with non-zero drift the process will be transient: it drifts towards positive infinity if the drift is positive. But can we have a random walk with non-zero drift, but where the drift tends to zero at large distances fast enough, and the process is still recurrent? What is the correct scaling for the decay of the drift to see interesting effects?

The answers to these questions is seen in the so-called Lamperti random walks, which were a recurring theme of the meeting on Aspects of Random Walks held in Durham this week. Thanks to the organisers for putting on such an excellent meeting. I hadn’t known much about this topic before, so thought it might be worth writing a short note.

As explained above, we consider time-homogeneous random walks. It will turn out that the exact distributions of the increments is not hugely important. Most of the properties we might care about will be determined only by the first two moments, which we define as:

$\mu_1(x)=\mathbb{E}[X_{t+1}-X_t | X_t=x],$

$\mu_2=\mathbb{E}[(X_{t+1}-X_t)^2 | X_t=x].$

Note that because the drift will be asymptotically zero, the second term is asymptotically equal to the variance of the increment. It will also turn out that the correct scaling for $\mu_1$ to see a phase transition is $\mu_1(x)\sim \frac{c}{x}$.

We begin by seeing how this works in the simplest possible example, from Harris (1952). Let’s restrict attention to a random walk on the non-negative integers, and impose the further condition that increments are +1 or -1. In the notation of a birth-and-death process from a first course on Markov chains, we can set:

$p_j:=\mathbb{P}(X_{t+1}=j+1| X_t=j), \quad q_j=1-p_j.$

We will set $p_j=\frac12 + \frac{c}{2j}$. Then a condition for transience is that

$1+\frac{q_1}{p_1}+\frac{q_1q_2}{p_1p_2}+\ldots <\infty.$

In our special case:

$\frac{q_1\ldots q_r}{p_1\ldots p_r}\approx\frac{(r-2c)(r-1-2c)(r-2-2c)\ldots}{r!}\approx \frac{1}{r^{2c}}.$

So we can deduce that this sum converges if c>1/2, giving transience. A similar, but slightly more complicated calculation specifies the two regimes of recurrence. If -1/2<=c<=1/2 then the chain is null-recurrent, meaning that the expected time to return to any given state is infinite. If c<-1/2, then it is positive recurrent.

In general, we assume $\mu_1(x)\sim \frac{c}{x}$ and $\mu_2(x)\approx s^2$. In the case above, obviously $s^2=1$. The general result is that under mild assumptions on the increment distributions, for instance a $(2+\epsilon)$-moment, if we define $r=-\frac{2c}{s^2}$, then the RW is transient if r<-1, positive-recurrent if r>1, and null-recurrent otherwise. This is the main result of Lamperti.

To explain why we have parameterised exactly like this, it makes sense to talk about the more general proof methods, as obviously the direct Markov chain calculation won’t work in general. The motivating idea is that we can deal well with the situation where the drift is zero, so let’s transform the random walk so that the drift becomes zero. A function of a Markov chain that is more stable (in some sense) that the original MC, for analysis at least, is sometimes called a Lyapunov function. Here, the sensible thing is to consider $Y_t=X_t^\gamma$, for some exponent $\gamma>0$.

So long as our distributions are fairly well-behaved (eg a finite $2+\epsilon$-moment), we can calculate the drift of Y as

$\mathbb{E}[Y_{t+1}-Y_t| X_t=x]=\frac{\gamma}{2}x^{\gamma-2}(2c+(1-\gamma)s^2) +o(x^{\gamma-2}).$

In particular, taking $\gamma=1+r$ results in a random walk that is ‘almost’ a martingale. Note that the original RW was almost a martingale, in the sense that the drift is asymptotically zero, but now it is zero to second order as well.

To draw any rigorous conclusions, we need to be careful about exactly how precise this approximation is, but we won’t worry about that now. In particular, we need to know whether we can take this approximation over the optional stopping theorem, as this allows us to say:

$\mathbb{P}(X\text{ hits }x\text{ before 0})=\mathbb{P}(Y\text{ hits }x^\gamma\text{ before 0})\sim x^{-\gamma}.$

This is particularly useful for working out the expected excursion time away from 0, which precisely leads to the condition for null-recurrence.

In his talk, Ostap Hryniv showed that this Lyapunov function analysis can be taken much further, to derive much more precise results about excursions, maxima and ergodicity. Results of Menshikov and Popov from the 90s further specify the asymptotics for the invariant distribution, if it exists, in terms of r.

One cautionary remark I should make is that earlier I implied that once we know the drift of such a random walk is zero, we have recurrence. This is true on $\mathbb{Z}$ with very mild restrictions, but is not necessarily true in higher dimensions. For example, consider the random walk on $\mathbb{R}^2$, where conditional on $X_t$, the increment is $X_{t+1}-X_t$ is of length 1 and perpendicular to the vector $X_t$. The two possible directions are equally likely. The drift is therefore 0 everything, and the second moment is also well-behaved, but note that $||X_t||^2=t^2$, just by considering Pythagoras. So in higher dimensions, we have to be a bit more careful, and put restrictions on the covariance structure of the increment distributions.

As a final comment, note that from Lamperti’s result, we can re-derive Polya’s result about SRW in higher dimensions. If we have $X_t$ an SRW on $\mathbb{Z}^d$, then consider $Y_t=||X_t||$. By considering a couple of examples in two-dimensions, it is clear that this is not Markov. But the methods we considered above for the Lamperti walks were really martingale methods rather than Markov chain methods. And indeed this process Y has asymptotically zero drift with the right scaling. Here,

$c=\frac{1}{2}(1-\frac{1}{d}),\quad s^2=\frac{1}{d},$

and so r=d-1, leading to exactly the result we know to be true, that the SRW is transient precisely in three dimensions and higher.

REFERENCES

Harris – First Passage and Recurrence Distributions (1952)

The slides from Ostap Hryniv’s talk, on which this was based, can be found here.