# Ornstein-Uhlenbeck Process

A large part of my summer has been spent proving some technical results pertaining to the convergence of some functionals of a critical Frozen Percolation process. This has been worthwhile, but hasn’t involved a large amount of reading around anything in particular, which has probably contributed to the lack of posts in recent months. Perhaps a mixture of that and general laziness?

Anyway, it turns out that the limit of the discrete processes under consideration is the Ornstein-Uhlenbeck process. The sense in which this limit holds (or at least, for now, is conjectured to hold) is something for another article. However, I thought it would be worth writing a bit about this particular process and why it is interesting.

The O-U process is described by the SDE

$dX_t=-\beta (X_t-\mu)dt+\sigma dW_t,$

where W is a standard Brownian motion. We think of $\mu$ as the ‘mean’. The extent to which this behaves as a mean will be discussed shortly. The process is then mean-reverting, in the sense that the drift is directed against deviations of the process away from this mean. The parameter $\beta$ measures the extent of this mean reversion, while as usual $\sigma$ controls the magnitude of the Brownian noise.

The motivation for considering mean-reverting processes is considerable. One measure of this is how many equations with articles on Wikipedia turn out to be precisely this Ornstein-Uhlenbeck process with different context or notation. In most cases, the motivation arises because Brownian motion is for some reason unsuitable to take as a canonical random process. We will see why the O-U process is somehow the next most canonical choice for a random process.

In physics, it is sometimes unsatisfactory to model the trajectory of a particle with Brownian motion (even though this motivated the name…) as the velocities are undefined (see this post from ages ago), or infinite, depending on your definition of velocity. Using the Ornstein-Uhlenbeck process to model the velocity of a particle is often a satisfactory alternative. It is not unreasonable that there should be a mean velocity, presumably zero. The mean reversion models a frictional force from the underlying medium, while the Brownian noise describes random collisions with similar particles.

In financial applications, the Ornstein-Uhlenbeck model has been applied, apparently under the title of the Vasicek model since the 70s to describe quantities such as interest rates where there is some underlying reason to ban indefinite growth, and require mean reversion. Another setting might be a commodity which, because of external driving factors, has over the relevant time-scale well-defined mean value, around which mean-reverting fluctuations on the observed time-scale can be described. As with other financial models, it is undesirable for a process to take negative values. This can be fixed by taking a positive mean, then setting the volatility to be state dependent, decaying to zero as the state tends to zero, so for small values, the positive drift dominates. I don’t fully understand why patching this aspect is significantly more important than patching any other non-realistic properties of the model, but the resulting SDE is, at least in one particular case where the volatility is $\sqrt{X_t}$, called the Cox-Ingersoll-Ross model.

Anyway, a mathematical reason to pay particular attention to this Ornstein-Uhlenbeck process is the following. It is the unique family of continuous Markov processes to have a stationary Gaussian distribution. It is the mean-reverting property that is key. There is no chance of Brownian motion having any stationary distribution, let alone a Gaussian one. If this isn’t clear, you can convince yourself by thinking of the stationary distribution of SRW on $\mathbb{Z}$. Since the process is space-homogeneous, the only stationary measure is the uniform measure.

I want to focus on one particular property of the O-U process, through which some other aspects will be illuminated. If we take $\sigma=\beta$ and let $\beta\rightarrow\infty$, then the stationary processes converge to white noise.

First though, we should note this is perhaps the easiest SDE to solve explicitly. We consider $X_t e^{\theta t}$, and applying Ito’s lemma rapidly gives

$X_t=\mu + (X_0-\mu)e^{-\beta t}+\sigma\int_0^t e^{-\beta(t-s)}dW_s.$

W is Gaussian so the distribution of $X_t$ conditional on $X_0=x_0$ is also Gaussian, and since W is centred we can read off the expectation. Applying the Ito isometry then gives the variance. In conclusion:

$X_t\stackrel{d}{=}\mathcal{N}(\mu+(x_0-\mu)e^{-\beta t}, \frac{\sigma^2}{\beta}(1-e^{-2\beta t})).$

In particular, note that the variation has no dependence on $x_0$. So as t grows to infinity, this converges to $\mathcal{N}(\mu, \frac{\sigma^2}{\beta})$. This is, unsurprisingly, the stationary distribution of the process.

To address the white noise convergence, we need to consider $\text{Cov}(X_0,X_t)$ in stationarity. Let’s assume WLOG that $\mu=0$ so most of the expectations will vanish. We obtain

$\text{Cov}(X_0,X_t)=\mathbb{E}[X_0X_t]=\mathbb{E}_{x_0}\left[\mathbb{E}[X_t| X_0=x_0]\right]=\mathbb{E}[X_0^2 e^{-\beta t}]= \frac{sigma^2}{2\beta}2^{-\beta t}.$

If we want, the Chapman-Kolmogorov equations work particularly nicely here, and we are able to derive a PDE for the evolution of the density function, though obviously this is very related to the result above. This PDE is known as the Fokker-Planck equation.

So, in particular, when $\sigma=\beta\rightarrow \infty$, this covariance tends to 0. I’m not purporting that this constitutes a proof that the Ornstein-Uhlenbeck processes converge as processes to white noise. It’s not obvious how to define process convergence, not least because there’s flexibility about how to view white noise as a process. One doesn’t really want to define the value of white noise at a particular time, but you can consider the covariance of integrals of white noise over disjoint intervals as a limit, in similar way to convergence of finite dimensional distributions.

The fact that taking $\beta=0$ gives Brownian motion, and this case gives white noise, intermediate versions of the Ornstein-Uhlenbeck process are sometimes referred to as coloured noise.

Finally, the Ornstein-Uhlenbeck process emerges as the scaling limit of mean-reverting discrete Markov chains, analogous to Brownian motion as the scaling limit of simple random walk. One particularly nice example is the Ehrenfest Urn model. We have two urns, and 2N balls. In each time step one of the 2N balls is chosen uniformly at random, and it is moved to the other urn. So a ball is more likely to be removed from an urn with more than N balls. We can view this as a model for molecules in, say a room, with a slightly porous division between them, eg a small hole. More complicated interface models in higher dimensions lead to fascinating PDEs, such as the famous KPZ equation, which are the subject of much ongoing interest in this area.

This result can be an application of the theory of convergence of Markov chains to SDEs pioneered by Stroock and Varadhan, about which more may follow very soon. In any case, it turns out that the fluctuations in the Ehrenfest Urn model are on the scale of $\sqrt{n}$, unsurprisingly, and are given by a centred Ornstein-Uhlenbeck process.

Investigating this has reminded me how much I’ve forgotten, or perhaps how little I ever knew, about the technicalities of stochastic processes are their convergence results, so next up will probably be a summary of all the useful definitions and properties for this sort of analysis.

# Duality for Interacting Particle Systems

Yesterday I introduced the notion of duality for two stochastic processes. My two goals for this post are to elaborate on the idea of why duality is useful, which we touched on in passing in the previous part, and to discuss duality of interacting particle systems. In the latter case, there are often nice ways to consider the forward and backward processes together that make the relation somewhat more natural.

The starting point is to assume a finite state space. This will be reasonable when we start to consider interacting particle systems, eg on $\{0,1\}^{[n]}$. As before, call the spaces R and S, and a duality function H(x,y). Since the state-spaces are finite, it is entirely natural to think of this as a matrix, and hence as an operator. Of course, a function defined on a finite state-space can be thought of as a vector, so it is clear what this operator will actually operate on. (I’ve chosen H rather than h for the duality function so it is more clear that it is acting as an operator here.)

We have some choice about which way round to define it, but for now let’s say that given some function f(.) on S

$Hf(x):=\sum_{y\in S} H(x,y)f(y).$

Note that this is a) exactly the definition of matrix (left-)multiplication; b) We should think of Hf as a function on R – perhaps (Hf)(x) might be more clear? and c) the operator H acts $\mathbb{R}^S\rightarrow \mathbb{R}^R$. If we want the corresponding operator $\mathbb{R}^R\rightarrow\mathbb{R}^S$, we simply multiply by H on the right instead.

But note also that the generator of a finite state-space Markov process is also a matrix, indeed a Q-matrix. So if we take our definition of the duality function as

$\mathcal{G}_X h(x,y)=\mathcal{G}_Y h(x,y),$

which, importantly, holds for all x,y, we can convert this into an algebraic form as

$\mathcal{G}_X H = H \mathcal{G}_Y^\dagger.$

In the same way that n-step transition probabilities for a discrete-time Markov chain are given by the product of the one-step transition matrix, general time transition probabilities for a continuous-time Markov chain are given by exponents of the Q-matrix. In particular, if X and Y have transition kernels P and Q respectively, then $P_t=e^{tG_X}$, and after doing some manipulation, we can show that

$P_t H=H Q_t^\dagger,$

also. This is really useful as in general we would hope that H might be invertible, from which we derive

$P_t=HQ_t^\dagger H^{-1}.$

So this is a powerful statement about the relationship between the evolutions of the two processes. In particular, it shows a correspondence (given by H) between left eigenvectors of P, and right eigenvectors of Q, and vice versa naturally.

The reason why this is useful rather than merely cute, is that when we re-interpret everything in terms of the original stochastic processes, we get a map between stationary distributions of X, and harmonic functions of Y. Stationary distributions are often hard to describe in any terms other than the left-1-eigenvector, or through some convergence property that is typically hard to work with. Harmonic functions, on the other hard, can be much more tractable. An example of a harmonic function is the survival probability started from a given state. This is useful for specifying the stationary distribution, but perhaps even more so for describing properties of the set of stationary distributions. In particular, uniqueness and existence are carried across this equivalence. So, for example, if the dual does not survive almost surely, then this says the only stationary measure is zero, and so the process is transient or similar.

Jan Swart’s course in Luminy last October dealt with duality, with a focus mainly on interacting particle systems. There are a couple of themes I want to talk about, without going into too much detail.

A typical interacting particle system will take place on a locally finite graph. At each vertex, there is either a particle, or there isn’t. Particles move between adjacent vertices, and sometimes interact with particles at adjacent vertices. These interactions might involve branching or coalescence. We will discuss shortly the set of possible forms such interaction might take. The state space is $\{0,1\}^{V(G)}$, with G the underlying graph. Then given a state, there is some set of actions which might happen next, and we consider the possibility that they happen with exponential rates.

At this stage, it seems like the initial configuration is important, as this affects what set of moves can happen immediately, and also thereafter. It is not clear how quickly this dependence fades. One useful idea is not to restrict ourselves to interactions involving the particles currently present in the system, but instead to consider a Poisson process of all possible interactions. Only the moves actually permitted by the current state will happen, but having this extra information allows for coupling between initial configurations.

It’s probably easier to consider a concrete example. The picture below shows the set-up for a branching random walk up an integer lattice. Each particle moves to one of the two state directly above its current state, or it branches and sends particles to both of them.In the diagram, we have glued arrows onto every state at every time, which tells us what to do if there is a particle there at each time. As a coupling, we can now think of the process as a deterministic walk through a random environment. The environment is given by some probability space, which in continuous time might have the appearance of a Poisson process on the set of ‘moves’, and the initial condition of the walk is up to us.

We can generalise this to a broader class of interacting particle systems. If we want all interactions to be between pairs of adjacent states, there are six possible things which could happen:

• Annihilation: two adjacent particles destroy each other. ( 11 -> 00 )
• Branching: one particle becomes two particles. ( 01 or 10 -> 11 )
• Coalescence: two particles merge. ( 11 -> 01 or 10 )
• Death: A particle is removed. ( 01 or 10 -> 00 )
• Exclusion: a particle moves. ( 01 -> 10 )
• Birth: a particle is created. ( 00 -> 01 or 10 )

For now we exclude the possibility of birth. Note that the way we have set this up involving two-site interactions excludes the possibility of a particle trying to move to an already-occupied site.

Let us say that in process X the rates at which each of these events happen are a, b, c, d and e, taking advantage of the helpful choice of naming. There is some flexibility about whether the rates are the same between every pair of vertices of note. For this post we assume that they are. Then it is a result of Lloyd and Sudbury that given some real $q\neq 1$, the process X’ with corresponding rates given by:

$a'=a+2q\gamma, b'=b+\gamma, c'=c-(1+q)\gamma, d'=d+\gamma, e'=e-\gamma,$

for $\gamma:= \frac{a+c-d+qb}{1-q},$

is dual to X, with duality function given by $h(Y,Z)=q^{|Y\cap Z|}$, for Y and Z possible states.

I want to make two comments:

1) This illustrates one of the differences between the dual and the time-reversal. It is clear that the time-reversal of branching is coalescence and vice versa, and exclusion is invariant under time-reversal. But the time-reversal of death is definitely birth, but there is no birth component in the dual of a process which features death. I don’t have a strong intuition for why this is the case, but see the final paragraph of this post. However, at least it seems plausible that both processes might simultaneously be recurrent, since in the dual, both the branching rate and the death rate have increased by the same amount.

2) This settles one problem of uniqueness of the dual that I mentioned last time, since we can vary q and get a different dual to the same original process. For example, in the voter model, we have b=d=1, and a=c=e=0, as in any update, the opinions of neighbours which were previously different become the same. Anyway, for any $q\in[-1,0]$ there is a choice of dual, where at the extremes q=0 corresponds to coalescing random walk, and q=-1 to annihilating random walk. (Note that the duality function for q=0 is the indicator function that the systems are different.)

As a final observation without much justification, suppose we add in arrows in the gaps of the branching random walk picture we had earlier, and direct them in the opposite direction. It turns out that this corresponds precisely to the dual of the process. This provides an appealing visual idea of why the dual of branching might be death. It also supports the general idea based on the coupling described earlier that the dual process is in some sense a deterministic walk in the opposite direction through the random environment specified by the original process.

REFERENCES

J.M. Swart – Duality and Intertwining of Markov Chains (mainly using chapters 2.1 and 2.7)

Thanks for Daniel Straulino for direction towards the branching random walk duality example.

# Convergence of Transition Probabilities

As you can see, I haven’t got round to writing a post for a while. Some of my reasons for this have been good, and some have not. One reason has been that I’ve had to give a large number of tutorials for the fourth quarter of the second year probability course here in Oxford. The second half of this course concerns discrete-time Markov chains, and the fourth problem sheet discusses various modes of convergence for such models, as well as a brief tangent onto Poisson Processes. I’ve written more about Poisson Processes than perhaps was justifiable in the past, so I thought I’d say some words about convergence of transition probabilities in discrete-time Markov chains.

Just to be concrete, let’s assume the state space K is finite, and labelled {1,2,…,k}, so that it becomes meaningful to discuss

$p_{12}^{(n)}:=\mathbb{P}(X_n=2|X_0=1).$

That is, the probability that if we start at state 1, then after n ‘moves’ we are at state 2. We are interested in the circumstances under which this converges to the stationary distribution. The heuristic is that we can view a time-step of a Markov chain as an operation on the space of distributions on K. Note that this operation is deterministic. If this sounds complicated, what we mean is that we specify an initial distribution, that is the distribution of $X_0$. If we consider the distribution of $X_1$, this is given by $\lambda P$, where $\lambda$ is the initial distribution, and P the transition matrix.

Anyway, the heuristic is that the stationary distribution is the unique fixed point of this operation on the space of distributions. It is therefore not unreasonable to assume that unless there are some periodic effects, we expect repeated use of this operation to move us closer to this fixed point.

We can further clarify this by considering the matrix form. Note that a transition P always has an eigenvalue equal to 1. This is equivalent to say that there is a solution to $\pi P=\pi$. Note it is not immediately equivalent to saying that P has a stationary distribution, as the latter must be non-negative and have elements summing to one. Only the first property is difficult, and relies on some theory or cleverness to prove. It can also be shown that all eigenvalues satisfy $|\lambda|\le 1$, and in general, there will be a single eigenvalue (ie dimension 1 eigenspace) with $|\lambda|=1$, and the rest satisfies $|\lambda|<1$. Then, if we diagonalise P, it is clear why $\pi P^n$ converges entry-wise, as $\pi UP^n U^{-1}$ converges. In the latter, only the entries in the row corresponding to $\lambda=1$ converge to something non-zero.

In summary, there is a strong heuristic for why in general, the transition probabilities should converge, and if they converge, that they should converge to the stationary distribution. In fact, we can prove that for any finite Markov chain, $p_{ij}^{(n)}\rightarrow \pi_j$, provided we two conditions hold. The conditions are that the chain is irreducible and aperiodic.

In the rest of this post, I want to discuss what might go wrong when these conditions are not satisfied. We begin with irreducibility. A chain is irreducible if it has precisely one communicating class. That means that we can get from any state to any other state, not necessarily in one step, with positive probability. One obvious reason why the statement of the theorem cannot hold in this setting is that $\pi$ is not uniquely defined when the chain is not irreducible. Suppose, for example, that we have two closed communicating classes A and B. Then, supported on each of them is an invariant distribution $\pi^A$ and $\pi^B$, so any affine combination of the two $\lambda \pi^A+(1-\lambda) \pi^B$ will give a stationary distribution for the whole chain.

In fact, the solution to this problem is not too demanding. If we are considering $p_{ij}^{(n)}$ for $i\in A$ a closed communicating class, then we know that $p_{ij}^{(n)}=0$ whenever $j\not\in A$. For the remaining j, we can use the theorem in its original form on the Markov chain, with state space reduced to A. Here, it is now irreducible.

The only case left to address is if i is in an open communicating class. In that case, it suffices to work out the hitting probabilities starting from i of each of the closed communicating classes. Provided these classes themselves satisfy the requirements of the theorem, we can write

$p_{ij}^{(n)}\rightarrow h_i^A \pi^A_j,\quad i\not\in A, j\in A.$

To prove this, we need to show that as the number of steps grows to infinity, the probability that we are in closed class A converges to $h_i^A$. Then, we decompose this large number of steps so to say that not only have we entered A with roughly the given probability, but in fact with roughly the given probability we entered A a long time in the past, and so there has been enough time for the original convergence result to hold in A.

Now we turn to periodicity. If a chain has period k, this says that we can split the state space into k classes $A_1,\ldots,A_k$, such that $p_{ij}^{(n)}=0$ whenever $n\not\equiv j-i \mod k$. Equivalently, the directed graph describing the possible transitions of the chain is k-partite. This definition makes it immediately clear that $p_{ij}^{(n)}$ cannot converge in this case. However, it is possible that $p_{ij}^{(kn)}$ will converge. Indeed, to verify this, we would need to consider the Markov chain with transition matrix $P^k$. Note that this is no longer irreducible, as it there are no transitions allowed between classes $A_1,\ldots,A_k$. Indeed, a more formal definition of the period, in terms of the lcd of possible return times allows us to conclude that there is no finer reducibility structure. That is, $A_1,\ldots,A_k$ genuinely are the closed classes when we consider the chain with matrix $P^k$. And so the Markov chain with transition matrix $P^k$ restricted to any of the $A_i$s satisfies the conditions of the theorem.

There remains one case which I’ve casually brushed over. When we were discussing the irreducible case, I said that if we had at least one communicating classes, then we could work out the limiting transition probabilities from a state in an open class to a state in a closed class by calculating the hitting probability of that closed class, then applying the standard version of the theorem to that closed class. This relies on the closed class being aperiodic.

Suppose otherwise that the destination closed class A has period k as before. If it were to be the case that the number of steps required to arrive at A had some fixed value mod k, or modulo a non-trivial divisor of k, then we certainly wouldn’t have convergence, for the same reasons as in the globally periodic case. However, we should ask whether we can ever have convergence?

In fact, the answer is yes. For concreteness, and because it’s easier to write ‘odd’ and ‘even’ than $m \mod k$, let’s assume A has size 2 and period 2. That is, once we arrive in A, thereafter we alternate deterministically between the two states. Anyway, for some large time n, we can write $p_{ca}^{(n)}$ for $a\in A, c\not\in A$ as:

$p_{ca}^{(n)}=h_i^A(n),$

where the latter term is the probability that we arrive in A at a time-step which has the same parity as n. It’s not terribly hard to come up with an example where this holds, and this idea holds in greater generality, where A has period k (and not necessarily just k states), we have to demand that the probability of arriving at a time which is a mod k is equal for all a in [0,k-1].

Of course, for applications, we don’t normally care much about irreducible chains, and we can easily remove periodicity by introducing so-called laziness, whereby on each time-step we flip a coin (biased if necessary) and stay put if it comes up heads, and apply the transition matrix if it comes up tails. Then it’s possible to get from any state to itself in one step, and so we are by construction aperiodic.