Duality for Interacting Particle Systems

Yesterday I introduced the notion of duality for two stochastic processes. My two goals for this post are to elaborate on the idea of why duality is useful, which we touched on in passing in the previous part, and to discuss duality of interacting particle systems. In the latter case, there are often nice ways to consider the forward and backward processes together that make the relation somewhat more natural.

The starting point is to assume a finite state space. This will be reasonable when we start to consider interacting particle systems, eg on \{0,1\}^{[n]}. As before, call the spaces R and S, and let H(x,y) be a duality function. Since the state-spaces are finite, it is entirely natural to think of this as a matrix, and hence as an operator. Of course, a function defined on a finite state-space can be thought of as a vector, so it is clear what this operator will actually operate on. (I’ve chosen H rather than h for the duality function so it is more clear that it is acting as an operator here.)

We have some choice about which way round to define it, but for now let’s say that given some function f(.) on S

Hf(x):=\sum_{y\in S} H(x,y)f(y).

Note that this is a) exactly the definition of matrix (left-)multiplication; b) we should think of Hf as a function on R – perhaps (Hf)(x) might be clearer; and c) the operator H acts \mathbb{R}^S\rightarrow \mathbb{R}^R. If we want the corresponding operator \mathbb{R}^R\rightarrow\mathbb{R}^S, we simply multiply by H on the right instead.

But note also that the generator of a finite state-space Markov process is also a matrix, indeed a Q-matrix. So if we take our definition of the duality function as

\mathcal{G}_X h(x,y)=\mathcal{G}_Y h(x,y),

which, importantly, holds for all x,y, we can convert this into an algebraic form as

\mathcal{G}_X H = H \mathcal{G}_Y^\dagger.

In the same way that n-step transition probabilities for a discrete-time Markov chain are given by powers of the one-step transition matrix, general-time transition probabilities for a continuous-time Markov chain are given by exponentials of the Q-matrix. In particular, if X and Y have transition kernels P and Q respectively, then P_t=e^{t\mathcal{G}_X} and Q_t=e^{t\mathcal{G}_Y}, and after doing some manipulation, we can show that

P_t H=H Q_t^\dagger,

also. This is really useful, as we might hope that H is invertible, in which case we derive

P_t=HQ_t^\dagger H^{-1}.
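As a sanity check on this algebra, here is a minimal numerical sketch, assuming numpy and scipy are available. We pick an arbitrary Q-matrix for X and an arbitrary invertible H, define \mathcal{G}_Y through the relation \mathcal{G}_X H=H\mathcal{G}_Y^\dagger, and verify that the semigroup relation P_t H=HQ_t^\dagger follows. (For a genuine dual, \mathcal{G}_Y would itself need to be a Q-matrix; the point here is only that the algebra passes from generators to semigroups.)

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# An arbitrary 3-state Q-matrix for X: non-negative off-diagonal entries,
# rows summing to zero.
G_X = rng.random((3, 3))
np.fill_diagonal(G_X, 0.0)
np.fill_diagonal(G_X, -G_X.sum(axis=1))

# An arbitrary invertible matrix standing in for the duality function H.
# Everything is real here, so dagger = transpose.
H = rng.random((3, 3)) + np.eye(3)

# Define G_Y through the duality relation G_X H = H G_Y^T.
G_Y_T = np.linalg.solve(H, G_X @ H)      # = H^{-1} G_X H = G_Y^T

t = 0.7
P_t = expm(t * G_X)                      # transition kernel of X at time t
Q_t_T = expm(t * G_Y_T)                  # = (e^{t G_Y})^T

assert np.allclose(P_t @ H, H @ Q_t_T)   # P_t H = H Q_t^T
```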

So this is a powerful statement about the relationship between the evolutions of the two processes. In particular, it shows a correspondence (given by H) between left eigenvectors of P, and right eigenvectors of Q, and vice versa naturally.

The reason why this is useful, rather than merely cute, is that when we re-interpret everything in terms of the original stochastic processes, we get a map between stationary distributions of X and harmonic functions of Y. Stationary distributions are often hard to describe in any terms other than as the left 1-eigenvector, or through some convergence property that is typically hard to work with. Harmonic functions, on the other hand, can be much more tractable. An example of a harmonic function is the survival probability started from a given state. This is useful for specifying the stationary distribution, but perhaps even more so for describing properties of the set of stationary distributions. In particular, uniqueness and existence are carried across this equivalence. So, for example, if the dual does not survive almost surely, then the only stationary measure is the trivial one, and the original process is transient or similar.

Jan Swart’s course in Luminy last October dealt with duality, with a focus mainly on interacting particle systems. There are a couple of themes I want to talk about, without going into too much detail.

A typical interacting particle system will take place on a locally finite graph. At each vertex, there is either a particle, or there isn’t. Particles move between adjacent vertices, and sometimes interact with particles at adjacent vertices. These interactions might involve branching or coalescence. We will discuss shortly the set of possible forms such interaction might take. The state space is \{0,1\}^{V(G)}, with G the underlying graph. Then given a state, there is some set of actions which might happen next, and we consider the possibility that they happen with exponential rates.

At this stage, it seems like the initial configuration is important, as this affects what set of moves can happen immediately, and also thereafter. It is not clear how quickly this dependence fades. One useful idea is not to restrict ourselves to interactions involving the particles currently present in the system, but instead to consider a Poisson process of all possible interactions. Only the moves actually permitted by the current state will happen, but having this extra information allows for coupling between initial configurations.

It’s probably easier to consider a concrete example. The picture below shows the set-up for a branching random walk up an integer lattice. Each particle moves to one of the two states directly above its current state, or it branches and sends particles to both of them. In the diagram, we have glued arrows onto every state at every time, which tell us what to do if there is a particle there at each time. As a coupling, we can now think of the process as a deterministic walk through a random environment. The environment is given by some probability space, which in continuous time might have the appearance of a Poisson process on the set of ‘moves’, and the initial condition of the walk is up to us.
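The coupling idea can be sketched in a few lines of Python (numpy assumed; the site names and arrow probabilities are my own illustrative choices). We sample the environment of arrows once, then run the deterministic dynamics from two nested initial configurations through the very same environment.

```python
import numpy as np

rng = np.random.default_rng(42)
WIDTH, STEPS = 20, 10

# The random environment: at every (time, site) we independently glue on an
# arrow telling any particle there to move up-left, up-right, or branch to
# both. The probabilities are arbitrary illustrative choices.
env = rng.choice(["left", "right", "both"], size=(STEPS, WIDTH),
                 p=[0.4, 0.4, 0.2])

def evolve(initial, env):
    """Run a particle configuration deterministically through the
    environment (sites on a cycle of length WIDTH, to keep things finite)."""
    occupied = set(initial)
    for t in range(env.shape[0]):
        nxt = set()
        for x in occupied:
            if env[t, x] in ("left", "both"):
                nxt.add((x - 1) % WIDTH)
            if env[t, x] in ("right", "both"):
                nxt.add((x + 1) % WIDTH)
        occupied = nxt
    return occupied

# Coupling: two initial conditions driven by the *same* environment.
small = evolve({10}, env)
large = evolve({5, 10, 15}, env)
assert small <= large   # monotonicity in the initial condition comes for free
```

Because the randomness lives entirely in the environment, comparisons between initial configurations (here, monotonicity) are immediate, which is exactly the point of the construction.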

We can generalise this to a broader class of interacting particle systems. If we want all interactions to be between pairs of adjacent sites, there are six possible things which could happen:

  • Annihilation: two adjacent particles destroy each other. ( 11 -> 00 )
  • Branching: one particle becomes two particles. ( 01 or 10 -> 11 )
  • Coalescence: two particles merge. ( 11 -> 01 or 10 )
  • Death: A particle is removed. ( 01 or 10 -> 00 )
  • Exclusion: a particle moves. ( 01 -> 10 )
  • Birth: a particle is created. ( 00 -> 01 or 10 )

For now we exclude the possibility of birth. Note that the way we have set this up involving two-site interactions excludes the possibility of a particle trying to move to an already-occupied site.

Let us say that in process X the rates at which each of these events happen are a, b, c, d and e, taking advantage of the helpful choice of naming. There is some flexibility about whether the rates are the same between every pair of adjacent vertices or not; for this post we assume that they are. Then it is a result of Lloyd and Sudbury that given some real q\neq 1, the process X’ with corresponding rates given by:

a'=a+2q\gamma, b'=b+\gamma, c'=c-(1+q)\gamma, d'=d+\gamma, e'=e-\gamma,

for \gamma:= \frac{a+c-d+qb}{1-q},

is dual to X, with duality function given by h(Y,Z)=q^{|Y\cap Z|}, for Y and Z possible states.

I want to make two comments:

1) This illustrates one of the differences between the dual and the time-reversal. It is clear that the time-reversal of branching is coalescence and vice versa, and that exclusion is invariant under time-reversal. But the time-reversal of death is definitely birth, yet there is no birth component in the dual of a process which features death. I don’t have a strong intuition for why this is the case, but see the final paragraph of this post. However, at least it seems plausible that both processes might simultaneously be recurrent, since in the dual, both the branching rate and the death rate have increased by the same amount.

2) This settles one problem of uniqueness of the dual that I mentioned last time, since we can vary q and get a different dual to the same original process. For example, in the voter model, we have b=d=1, and a=c=e=0, as in any update, the opinions of neighbours which were previously different become the same. Anyway, for any q\in[-1,0] there is a choice of dual, where at the extremes q=0 corresponds to coalescing random walk, and q=-1 to annihilating random walk. (Note that the duality function for q=0 is the indicator function that the two configurations are disjoint, with the convention 0^0=1.)


As a final observation without much justification, suppose we add in arrows in the gaps of the branching random walk picture we had earlier, and direct them in the opposite direction. It turns out that this corresponds precisely to the dual of the process. This provides an appealing visual idea of why the dual of branching might be death. It also supports the general idea based on the coupling described earlier that the dual process is in some sense a deterministic walk in the opposite direction through the random environment specified by the original process.

REFERENCES

J.M. Swart – Duality and Intertwining of Markov Chains (mainly using chapters 2.1 and 2.7)

Thanks to Daniel Straulino for pointing me towards the branching random walk duality example.


Duality for Stochastic Processes

In the past couple of weeks, we’ve launched a new junior probability seminar in Oxford. (If you’re reading this and would like to come, please drop me an email!) The first session featured Daniel Straulino talking about his work on the Spatial Lambda Fleming-Viot process, a model for the evolution of populations allowing for selection and geometry. A lively discussion of duality broke out halfway through, following which it seemed reasonable to devote another session to this topic. Luke Miller presented some classical and less classical examples of the theory this afternoon. I found all of this interesting and relevant, and thought it would be worth writing some things down, and tying it in with one of the courses on this subject that we attended at ALEA in Luminy last October.

The majority of this article is based on Luke’s talk. Errors, omissions and over-simplifications are of course my own.

The setup is that we have two stochastic processes X_t\in R, Y_t\in S. For now we make no assertion about whether the two state spaces R and S are the same or related, and we make no comment on the dependence relationship between X and Y. Let P_x,Q_y be the respective probability measures, representing starting from x and y respectively. Then given a bivariate, measurable function h(.,.) on R\times S such that:

E^P_x h(X_t,y)=E^Q_y h(x,Y_t),\quad \forall x,y\quad\forall t,

then we say X and Y are dual with respect to h.

The interpretation should be that X is a process forwards in time, and Y is a process backwards in time. So X_t, Y_0 represent the present, while X_0, Y_t represent the past, ie the initial time of the original process X. The fact that the relation holds for all times t allows us to carry the equality through a derivative, to obtain an equality of generators:

\mathcal{G}^X h(x,y)=\mathcal{G}^Y h(x,y),\quad \forall x,y.

On the LHS, the generator acts on x, while on the RHS it acts on y. Although it still isn’t obvious (at least to me) when a pair of processes might have this property, especially for an arbitrary function, this seems the more natural definition to think about.

Note that this does indeed require a specific function h. There were murmurings in our meeting about the possibility of two processes having a strong duality property, where this held for all h in some broad class of test functions. On further reflection, which may nonetheless be completely wrong, this seems unlikely to happen very often, except in some obviously degenerate cases, such as h constant. If it did hold, then since the set of expectations of a class of functions of a random variable determines the distribution, we would find that the instantaneous behaviour of Y is equal in distribution to the instantaneous behaviour of X when started from fixed (x,y). It seems unlikely that you might get many examples of this that are not deterministic or independent (eg two Brownian motions, or other space- and time-homogeneous Markov processes).

Anyway, a canonical example of this is the Wright-Fisher diffusion, which provides a simple model for a population which evolves in discrete-time generations. We assume that there are two types in the population: {A,a} seems to be the standard notation. Children choose their type randomly from the distribution of types in the previous generation. In other words, if there are N individuals at all times, and X_k is the number of type A individuals, then:

X_{k+1} | X_k \stackrel{d}{=} \mathrm{Bin}(N, \frac{X_k}{N}).
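A quick empirical sketch of this update rule, assuming numpy: one generation is a single binomial draw, and the number of type A individuals is preserved in expectation, which is the martingale property used just below.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000                       # population size
x = 300                        # current number of type-A individuals

# One generation: every child picks its type from the empirical distribution
# of the previous generation, i.e. X_{k+1} | X_k ~ Bin(N, X_k / N).
samples = rng.binomial(N, x / N, size=200_000)

# Martingale property: E[X_{k+1} | X_k] = N * (x / N) = x.
assert abs(samples.mean() - x) < 1.0
```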

It is not hard to see that in a diffusion limit, as the number of individuals tends to infinity, the proportion of type A individuals is a martingale, and so the generator for this process will have no f' term. In fact by checking a Taylor series, we can show that:

\mathcal{G}_{WF}f(x)=\frac{1}{2} x(1-x)f''(x),

for all f subject to mild regularity conditions. In particular, we can show that for f_n(x)=x^n, we have:

\mathcal{G}_{WF} f_n(x)=\binom{n}{2}(f_{n-1}(x)-f_n(x))

after some rearranging. This looks like the generator of a jump process, indeed a jump process where all the increments are -1. This suggests there might be a coalescent as the dual process, and indeed it turns out that Kingman’s coalescent, where every pair of blocks coalesces at the same rate, is the dual. We have the relation in expectation:

\mathbb{E}_x[X_t^n]= \mathbb{E}_n[x^{N_t}],

where the latter term is the probability generating function of the number of blocks at time t of Kingman’s coalescent started from n blocks.

In particular, we can control the moments of the Wright-Fisher diffusion using the generating function of the block-counting process of Kingman’s coalescent, which might well be easier to work with.
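The generator computation above is easy to check symbolically, eg with sympy (assumed available): apply \frac12 x(1-x)\frac{d^2}{dx^2} to x^n and compare with \binom{n}{2}(x^{n-1}-x^n).

```python
import sympy as sp

x = sp.symbols('x')

for n in range(2, 8):
    f_n = x**n
    lhs = sp.Rational(1, 2) * x * (1 - x) * sp.diff(f_n, x, 2)  # G_WF f_n
    rhs = sp.binomial(n, 2) * (x**(n - 1) - x**n)
    assert sp.simplify(lhs - rhs) == 0
```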

That’s all very elegant, but we were talking about why this might be useful in a broader setting. In this context, there seems to be an obstacle to applying the idea above more generally. An underlying idea in population genetics models is that, as well as the forward process, there is also a so-called ancestral process looking backwards in time, detailing how time-t individuals are related historically. It would be convenient if this process, which we might expect to be some nice coalescent, was the dual of the forward process.

But this seems to be a problem, as duals are a function of the duality function, so do we have uniqueness? It would not be very satisfying if there were several coalescent processes that could all be the dual of the forward process. Though some worked examples suggest this might not happen, because a dual and its duality function have to satisfy so many constraints, there seems no a priori reason why not. It seems that the strength of the results you can derive from the duality relation is only as strong as the duality relation itself. This is not necessarily a problem from the point of view of applications, so long as the duality function is something it might actually be useful to work with.

It’s getting late and this post is getting long, so I shall postpone a discussion of duality for interacting particle systems until tomorrow. In summary, by restricting to a finite state space, we can allow ourselves the option of a more algebraic approach, from which some direct uses of duality can be read off. I will also mention a non-technical but, I feel, helpful way to view many examples of duality in interacting particle systems as deterministic forward and backward walks through a random environment, in what might be considered an extreme example of coupling.


Critical Components in Erdos-Renyi

In various previous posts, I’ve talked about the phase transition in the Erdos-Renyi random graph process. Recall the definition of the process. Here we will use the Gilbert model G(n,p), where we have n vertices, and between any pair of vertices we add an edge with probability p, independently of other pairs. We are interested in the sparse scaling, where the typical vertex has degree O(1) in n, so p=c/n for constant c>0, and we assume throughout that n is large. We could alternatively have considered the Erdos-Renyi model proper, where we choose uniformly at random from the set of graphs with n vertices and some fixed number of edges. Almost all the results presented here work equally well in that setting.

As proved by Erdos and Renyi, the typical component structure of such a graph changes noticeably around the threshold c=1. Below this, in the subcritical regime, all the components are small, meaning of size at most O(log n). Above this, in the supercritical regime, there is a single giant component on some non-zero proportion of the vertices, while the rest of the graph looks subcritical. The case c=1 exhibits a phase transition between these qualitatively different behaviours. They proved that here, the largest component is with high probability O(n^{2/3}). It seems that they thought this result held whenever c=1-o(1), but it turns out that this is not the case. In this post, I will discuss some aspects of behaviour around criticality, and the tools needed to treat them.

The first question to address is this: how many components of size n^{2/3} are there? It might be plausible that there is a single such component, like the subsequent giant component. It might also be plausible that there are n^{1/3} such components, so that O(n) vertices lie on such critical components. Then it would be clear how we transition out of criticality into supercriticality – all the vertices on critical components coalesce to form the new giant component.

In fact neither of these is correct. The answer is that for every integer k>0, with high probability the k-th largest component is on the size scale n^{2/3}. This is potentially a confusing statement. It looks like there are infinitely many such components, but of course for any particular value of n this cannot be the case. We should think of there being \omega(1) components, but o(n^b) for any b>0.

The easiest way to see this is by a duality argument, as we have discussed previously for the supercritical phase. If we remove a component of size O(n^2/3), then what remains is a random graph with n-O(n^2/3) vertices, and edge probability the same as originally. It might make sense to rewrite this probability 1/n as

\frac{1}{n-O(n^{2/3})}\cdot \frac{n-O(n^{2/3})}{n}=\frac{1-O(n^{-1/3})}{n-O(n^{2/3})}.

The numerator is then essentially

1-O\left([n-O(n^{2/3})]^{-1/3}\right),

so the edge probability lies within the critical window for the reduced number of vertices.

Although we have no concrete reasoning, it seems at least plausible that this should look similar in structure to G(n,1/n). In particular, there should be another component of size

O\left([n-O(n^{2/3})]^{2/3}\right)=O(n^{2/3}).

In fact, the formal proof of this proceeds by an identical argument, only using the exploration process. Because I’ve described this several times before, I’ll be brief. We track how far we have gone through each component in a depth-first walk. In both the supercritical and subcritical cases, when we rescale correctly we get a random path which is basically deterministic in the limit (in n). For exactly the same reasons that CLT fluctuations become visible in partial sums of RVs with zero expectation, we start seeing interesting effects at criticality.

The important question is the order of rescaling to choose. At each stage of the exploration process, the number of vertices added to the stack is binomial. We want to distinguish between components of size O(n^{2/3}) so we should look at the exploration process at time sn^{2/3}. The drift of the exploration process is given by the expectation of a binomial random variable minus one (since we remove the current vertex from the stack as we finish exploring it). This is given by

\left[n-sn^{2/3}\right]\cdot \frac{1}{n}-1=-sn^{-1/3}.

Note that this is the drift in one time-step. The drift over n^{2/3} time-steps will accordingly be -sn^{1/3}. So, if we rescale time by n^{2/3} and space by n^{1/3}, we should get a nice stochastic process. Specifically, if Z is the exploration process, then we obtain:

\frac{1}{n^{1/3}}Z^{(n)}_{sn^{2/3}} \rightarrow_d W_s,

where W is a Brownian motion with inhomogeneous drift -s at time s. The net effect of such a drift at a fixed positive time is given by integrating up to that time, and hence we might say the process has quadratic drift, or is parabolic.

We should remark that our binomial expectation is not entirely correct. We have discounted those sn^{2/3} vertices that have already been explored, but we have not accounted for the vertices currently in the stack, which should also be excluded. However, we now have a heuristic for the approximate number of these. The number of vertices in the stack should be O(n^{1/3}) at all times, and so in particular will always be an order of magnitude smaller than the number of vertices already considered. Therefore, they won’t affect this drift term, though this must be accounted for in any formal proof of convergence. On the subject of which, the mode of convergence is, unsurprisingly, weak convergence uniformly on compact sets. That is, for any fixed S, the convergence holds weakly on the random functions up to time Sn^{2/3}.

Note that this process will tend to minus infinity almost surely. Component sizes are given by excursions above the running minimum. The process given by the height of the original process above the running minimum is called reflected. Essentially, we construct the reflected process by keeping the same generator when the current value is positive, and forcing the process up when it is at zero. There are various ways to construct this more formally, including as the scaling limit of simple random walks conditioned to stay non-negative.
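To make the exploration process concrete, here is a sketch (numpy assumed, all names mine) that runs a depth-first exploration of G(n,1/n) and reads component sizes off as the gaps between successive new running minima of the walk, exactly as described above.

```python
import numpy as np

def exploration_walk(n, p, rng):
    """Depth-first exploration of G(n, p). Each step removes the current
    vertex from the stack and adds its unseen neighbours, so the walk
    increment is (#new neighbours - 1); components are the excursions of
    this walk above its running minimum."""
    adj = np.triu(rng.random((n, n)) < p, 1)
    adj = adj | adj.T
    order = list(rng.permutation(n))       # order for picking fresh vertices
    unseen = set(range(n))
    walk, stack = [0], []
    while unseen or stack:
        if not stack:                      # start a new component
            v = order.pop()
            while v not in unseen:
                v = order.pop()
            unseen.remove(v)
            stack.append(v)
        v = stack.pop()
        new = [w for w in list(unseen) if adj[v, w]]
        for w in new:
            unseen.remove(w)
        stack.extend(new)
        walk.append(walk[-1] + len(new) - 1)
    return np.array(walk)

rng = np.random.default_rng(7)
n = 400
Z = exploration_walk(n, 1 / n, rng)
assert len(Z) == n + 1                     # one step per vertex

# A component ends exactly when the walk hits a new strict minimum, so the
# component sizes are the gaps between those times.
minima = [t for t in range(1, n + 1) if Z[t] < Z[:t].min()]
comp_sizes = np.diff([0] + minima)
assert comp_sizes.sum() == n
```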

The cute part of the result is that it holds equally well in a so-called critical window either side of the critical probability 1/n. When the probability is \frac{1+tn^{-1/3}}{n}, for any t\in \mathbb{R}, the same argument holds. Now the drift at time s is t-s, though everything else still holds.

This result was established by Aldous in [1], and gives a mechanism for calculating distributions of component sizes and so on through this critical window.

In particular, we are now in a position to answer the original question regarding how many such components there were. The key idea is that because whenever we exhaust a component in the exploration process, we choose a new vertex uniformly at random, we are effectively choosing a component according to the size-biased distribution. Roughly speaking, the largest components will show up near the beginning. Note that a critical O(n^{2/3}) component will not necessarily be exactly the first component in the exploration process, but the components that are explored before this will take up sufficiently few vertices that they won’t show up in the scaling of the limit.

In any case, the reflected Brownian motion ‘goes on forever’, and the drift is eventually very negative, so there cannot be infinitely wide excursions; hence there are infinitely many excursions, corresponding to \omega(1) critical components.

If we care about the number of cycles, we can also treat this via the exploration process. Note that any depth-first search only exposes a spanning tree of each component of the host graph. When we are exploring a vertex, there could be extra edges to other vertices in the stack, but not to vertices we’ve already finished exploring (otherwise the edge would have been exposed then). So the expected number of excess edges at a vertex is proportional to the height of the exploration process at that vertex, and the overall expected number of excess edges, conditional on the exploration process, is proportional to the area under the curve. This carries over perfectly well into the stochastic process limit. It is then a calculation to verify that the area under the curve is almost surely infinite, and thus that we expect there to be infinitely many cycles in a critical random graph.

REFERENCES

[1] Aldous D. – Brownian excursions, critical random graphs and the multiplicative coalescent

Subordinators and the Arcsine rule

After the general discussion of Levy processes in the previous post, we now discuss a particular class of such processes. The majority of content and notation below is taken from chapters 1-3 of Jean Bertoin’s Saint-Flour notes.

We say X_t is a subordinator if:

  • It is a right-continuous adapted stochastic process, started from 0.
  • It has stationary, independent increments.
  • It is increasing.

Note that the first two conditions are precisely those required for a Levy process. We could also allow the process to take the value \infty, where the hitting time of infinity represents ‘killing’ the subordinator in some sense. If this hitting time is almost surely infinite, we say it is a strict subordinator. There is little to be gained right now from considering anything other than strict subordinators.

Examples

  • A compound Poisson process, with finite jump measure supported on [0,\infty). Hereafter we exclude this case, as it is better dealt with by more elementary methods.
  • A so-called stable Levy process, where \Phi(\lambda)=\lambda^\alpha, for some \alpha\in(0,1). (I’ll define \Phi very soon.) Note that checking that the sample paths are increasing requires only that X_1\geq 0 almost surely.
  • The hitting time process for Brownian Motion. Note that this does indeed have jumps as we would need. (This has \Phi(\lambda)=\sqrt{2\lambda}.)

Properties

  • In general, we describe Levy processes by their characteristic exponent. As a subordinator takes values in [0,\infty), we can use the Laplace exponent instead:

\mathbb{E}\exp(-\lambda X_t)=:\exp(-t\Phi(\lambda)).

  • We can refine the Levy-Khintchine formula:

\Phi(\lambda)=k+d\lambda+\int_{[0,\infty)}(1-e^{-\lambda x})\Pi(dx),

  • where k is the kill rate (in the non-strict case). Because the process is increasing, it must have bounded variation, and so the quadratic part vanishes, and we have a stronger condition on the Levy measure: \int(1\wedge x)\Pi(dx)<\infty.
  • The expression \bar{\Pi}(x):=k+\Pi((x,\infty)) for the tail of the Levy measure is often more useful in this setting.
  • We can think of this decomposition as the sum of a drift, and a PPP with characteristic measure \Pi+k\delta_\infty. As we said above, we do not want to consider the case that X is a step process, so either d>0 or \Pi((0,\infty))=\infty is enough to ensure this.
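As a numerical sanity check of the refined formula (scipy assumed), take k=d=0 and the Levy measure \Pi(dx)=(2\pi)^{-1/2}x^{-3/2}dx of the Brownian hitting-time subordinator mentioned above; the integral should reproduce \Phi(\lambda)=\sqrt{2\lambda}.

```python
import numpy as np
from scipy.integrate import quad

# Phi(lambda) = int_0^inf (1 - e^{-lambda x}) Pi(dx) with
# Pi(dx) = x^{-3/2} / sqrt(2 pi) dx  (and k = d = 0).
def phi(lam):
    integrand = lambda x: (1 - np.exp(-lam * x)) * x**-1.5 / np.sqrt(2 * np.pi)
    # split at 1 to help quad with the integrable singularity at 0
    a, _ = quad(integrand, 0, 1)
    b, _ = quad(integrand, 1, np.inf)
    return a + b

for lam in [0.5, 1.0, 2.0, 5.0]:
    assert abs(phi(lam) - np.sqrt(2 * lam)) < 1e-5
```

Note the integrand behaves like \lambda x^{-1/2} near 0, so the condition \int(1\wedge x)\Pi(dx)<\infty is exactly what makes the integral converge.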

Analytic Methods

We give a snapshot of a couple of observations which make these nice to work with. Define the renewal measure U(dx) by:

\int_{[0,\infty)}f(x)U(dx)=\mathbb{E}\left(\int_0^\infty f(X_t)dt\right).

If we want to know the distribution function of U, it will suffice to take f(y)=1_{\{y\leq x\}} in the above.

The reason to exclude step processes specifically is to ensure that X has a continuous inverse:

L_x=\sup\{t\geq 0:X_t\leq x\} so U(x)=\mathbb{E}L_x is continuous.

In fact, this renewal measure characterises the subordinator uniquely, as we see by taking the Laplace transform:

\mathcal{L}U(\lambda)=\int_{[0,\infty)}e^{-\lambda x}U(dx)=\mathbb{E}\int e^{-\lambda X_t}dt

=\int \mathbb{E}e^{-\lambda X_t}dt=\int\exp(-t\Phi(\lambda))dt=\frac{1}{\Phi(\lambda)}.

The Arcsine Law

X is Markov, which induces a so-called regenerative property on the range of X, \mathcal{R}. Formally, given s, we do not always have s\in\mathcal{R} (as the process might jump over s), but we can define D_s=\inf\{t>s:t\in\mathcal{R}\}. Then

\{v\geq 0:v+D_s\in\mathcal{R}\}\stackrel{d}{=}\mathcal{R}.

In fact, the converse holds as well. Any random set with this regenerative property is the range of some subordinator. Note that D_s is some kind of dual to X, since it is increasing, and the regenerative property induces some Markovian properties.

In particular, we consider the last passage time g_t=\sup\{s<t:s\in\mathcal{R}\}, in the case of a stable subordinator with \Phi(\lambda)=\lambda^\alpha. Here, \mathcal{R} is self-similar with scaling exponent \alpha. The distribution of \frac{g_t}{t} is thus independent of t. In this situation, we can derive the generalised arcsine rule for the distribution of g_1:

\mathbb{P}(g_1\in ds)=\frac{\sin \alpha\pi}{\pi}s^{\alpha-1}(1-s)^{-\alpha}ds.

The most natural application of this is to the hitting time process of Brownian Motion, which is stable with \alpha=\frac12. Its range is the set of times at which B attains its running supremum S, so g_1 is the last time before 1 at which S_t-B_t=0. Furthermore, we have equality in distribution of the processes (see previous posts on excursion theory and the short aside which follows):

(S_t-B_t)_{t\geq 0}\stackrel{d}{=}(|B_t|)_{t\geq 0}.

So g_1 gives the time of the last zero of BM before time 1, and the arcsine law shows that its distribution is given by:

\mathbb{P}(g_1\leq t)=\frac{2}{\pi}\text{arcsin}\sqrt{t}.
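The arcsine law is easy to see in simulation (numpy assumed): the rescaled last zero of a simple random walk has mean 1/2, even though, by the arcsine density, it piles up near the two endpoints.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_paths = 1000, 5000

steps = rng.choice([-1, 1], size=(n_paths, n_steps))
paths = np.cumsum(steps, axis=1)

def last_zero(path):
    """Last return to zero before time n_steps, rescaled to [0, 1]
    (0 if the walk never returns)."""
    zeros = np.flatnonzero(path == 0)
    return (zeros[-1] + 1) / n_steps if zeros.size else 0.0

g = np.array([last_zero(p) for p in paths])

# The arcsine distribution is symmetric about 1/2, so E[g_1] = 1/2.
assert abs(g.mean() - 0.5) < 0.02

# Sanity check of the formula itself: (2/pi) arcsin(sqrt(1/4)) = 1/3.
assert abs((2 / np.pi) * np.arcsin(np.sqrt(0.25)) - 1 / 3) < 1e-12
```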

The Levy-Khintchine Formula

Because of a string of coincidences involving my choice of courses for Part III and various lecturers’ choices about course content, I didn’t learn what a Levy process was until a few weeks ago. Trying to get my head around the Levy-Khintchine formula took a little while, so the following is what I would have liked to have been able to find back then.

A Levy process is an adapted stochastic process started from 0 at time zero, and with stationary, independent increments. This is reminiscent, indeed a generalisation, of the definition of Brownian motion. In that case, we were able to give a concrete description of the distribution of X_1. For a general Levy process, we have

X_1=X_{1/n}+(X_{2/n}-X_{1/n})+\ldots+(X_1-X_{1-1/n}).

So the distribution of X_1 is infinitely divisible, that is, it can be expressed as the distribution of the sum of n iid random variables for every n. Viewing this definition in terms of convolutions of distributions may be more helpful, especially as we will subsequently consider characteristic functions. If this is the first time you have seen this property, note that it is not universal. For example, it is not clear how to write a U[0,1] random variable as a convolution of two iid RVs (and indeed no bounded non-constant random variable is infinitely divisible). Note that exactly the same argument shows that the distribution of every X_t is infinitely divisible.
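For a concrete example of infinite divisibility, assuming scipy: a Poisson(\lambda) distribution is exactly the n-fold convolution of Poisson(\lambda/n) distributions, for any n.

```python
import numpy as np
from scipy.stats import poisson

lam, n = 3.0, 4
support = np.arange(60)   # Poisson(3) mass beyond 59 is negligible

# Build Poisson(lam) as the convolution of n iid Poisson(lam/n) pieces.
piece = poisson.pmf(support, lam / n)
conv = piece
for _ in range(n - 1):
    # truncating the convolution keeps indices < 60 exact
    conv = np.convolve(conv, piece)[:len(support)]

assert np.allclose(conv, poisson.pmf(support, lam), atol=1e-12)
```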

It will be most convenient to work with the characteristic functions

\mathbb{E}\exp(i\langle \lambda,X_t\rangle).

By stationarity of increments, we can show that this is equal to

\exp(-\Psi(\lambda)t)\quad\text{where}\quad \mathbb{E}\exp(i\langle \lambda,X_1\rangle)=:\exp(-\Psi(\lambda)).

This function \Psi(\lambda) is called the characteristic exponent. The argument resembles that used for Cauchy’s functional equation: deal first with the rationals using stationarity of increments, then lift to the reals via the (right-)continuity of

t\mapsto \mathbb{E}\exp(i\langle \lambda,X_t\rangle).

As ever, \Psi(\lambda) uniquely determines the distribution of X_1, and so it also uniquely determines the distribution of the Levy process. The only condition on \Psi is that it be the characteristic exponent of an infinitely divisible distribution. This condition is given explicitly by the Levy-Khintchine formula.

Levy-Khintchine

\Psi(\lambda) is the characteristic exponent of an infinitely divisible distribution iff

\Psi(\lambda)=i\langle a,\lambda\rangle +\frac12 Q(\lambda)+\int_{\mathbb{R}^d}(1-e^{i\langle \lambda,x\rangle}+i\langle \lambda,x\rangle 1_{|x|<1})\Pi(dx).

for a\in\mathbb{R}^d, Q a quadratic form on \mathbb{R}^d, and \Pi a so-called Levy measure satisfying \int (1\wedge |x|^2)\Pi(dx)<\infty.

This looks a bit arbitrary, so first let’s explain what each of these terms ‘means’.

  • i\langle a,\lambda\rangle comes from a drift of -a. Note that a deterministic linear function is a (not especially interesting) Levy process.
  • \frac12Q(\lambda) comes from a Brownian part \sqrt{Q}B_t.

The rest corresponds to the jump part of the process. Note that a Poisson process is an example of a Levy process, which is why we might consider jumps in the first place. The reason why there is an indicator function floating around is that we have to think about two regimes separately, namely large and small jumps. Jumps of size bounded away from zero cannot happen too often, as otherwise the process might explode off to infinity in finite time with positive probability. On the other hand, infinitesimally small jumps can happen very often (say on a dense set) so long as everything is controlled to prevent an explosion on the macroscopic scale.

There is no canonical choice for where the divide between these regimes happens, but conventionally this is taken to be at |x|=1. The restriction on the Levy measure near 0 ensures that the sum of the squares of all jumps up to any finite time converges.
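To make the drift + Brownian + jumps decomposition concrete, here is a minimal simulation sketch. Everything here (the function simulate_levy and all parameter values) is illustrative rather than from the post: a finite-intensity compound Poisson part stands in for the large-jump component \Pi\cdot 1_{|x|\geq 1}.

```python
import numpy as np

def simulate_levy(a, sigma, jump_rate, jump_sampler, T=1.0, n=1000, rng=None):
    """Simulate a 1d Levy process with drift -a, Brownian coefficient sigma,
    and compound Poisson jumps (finite intensity, standing in for the
    large-jump part of the Levy measure) on a grid of n steps."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n
    # Drift and Brownian increments over each grid interval.
    increments = -a * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    # Compound Poisson part: number of jumps per interval, then jump sizes.
    counts = rng.poisson(jump_rate * dt, size=n)
    jumps = np.array([jump_sampler(rng, k).sum() if k else 0.0 for k in counts])
    return np.concatenate(([0.0], np.cumsum(increments + jumps)))

# Example: jumps uniform on [1, 2] (all of size >= 1) arriving at rate 0.5.
rng = np.random.default_rng(42)
path = simulate_levy(a=0.1, sigma=0.3, jump_rate=0.5,
                     jump_sampler=lambda rng, k: rng.uniform(1.0, 2.0, size=k),
                     rng=rng)
```

Because the jump intensity is finite here, no compensation is needed; the compensated small-jump part would require the truncation-and-limit procedure discussed below.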

  • \Pi\cdot 1_{|x|\geq 1} gives the intensity of a standard compound Poisson process. The jumps are well-spaced, and so it is a relatively simple calculation to see that the contribution to the characteristic exponent is

\int_{\mathbb{R}^d}(1-e^{i\langle \lambda,x\rangle})1_{|x|\geq 1}\Pi(dx).
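That calculation is just conditioning on the Poisson number of jumps: writing \mu=\Pi(\{|x|\geq 1\}) for the jump rate and F=\Pi(\cdot\cap\{|x|\geq 1\})/\mu for the jump distribution,

\mathbb{E}e^{i\langle \lambda,X_t\rangle}=\sum_{n\geq 0}e^{-\mu t}\frac{(\mu t)^n}{n!}\left(\int e^{i\langle \lambda,x\rangle}F(dx)\right)^n=\exp\left(-t\int_{\mathbb{R}^d}(1-e^{i\langle \lambda,x\rangle})1_{|x|\geq 1}\Pi(dx)\right).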

The intensity \Pi\cdot 1_{|x|<1} can give infinitely many jumps in finite time, and if the mean of this measure is not 0, the sum of these jumps need not converge. We compensate by subtracting a drift at rate

\int_{\mathbb{R}^d}x1_{|x|<1}\Pi(dx).

To make this more rigorous, we should really consider 1_{\epsilon<|x|<1} then take a limit, but this at least explains where all the terms come from. Linearity allows us to interchange integrals and inner products, to get the term

\int_{\mathbb{R}^d}(1-e^{i\langle \lambda,x\rangle}+i\langle\lambda,x\rangle 1_{|x|<1})\Pi(dx).

If the process has bounded variation, then we must have Q=0, and also

\int (1\wedge |x|)\Pi(dx)<\infty,

that is, not too many jumps on an |x| scale. In this case, this drift component is well-defined and linear in \lambda, so it can be incorporated with the drift term at the beginning of the Levy-Khintchine expression. If not, then there are some \lambda for which it does not exist.
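Explicitly, when \int(1\wedge|x|)\Pi(dx)<\infty and Q=0, the compensating term splits off as a genuine linear term and can be absorbed into the drift:

\Psi(\lambda)=i\left\langle a+\int_{|x|<1}x\,\Pi(dx),\,\lambda\right\rangle+\int_{\mathbb{R}^d}(1-e^{i\langle \lambda,x\rangle})\Pi(dx).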

There are some other things to be said about Levy processes, including

  • Stable Levy processes, where \Psi(k\lambda)=k^\alpha \Psi(\lambda), which induces the rescaling-invariance property k^{-1/\alpha}X_{kt}\stackrel{d}{=}X_t. The distribution of each X_t is then also a stable distribution.
  • Resolvents, where instead of working with the process itself, we work with the distribution of the process at a random exponential time.
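Two standard examples of the stable scaling: Brownian motion has \Psi(\lambda)=\frac12|\lambda|^2, so \alpha=2, while the Cauchy process has \Psi(\lambda)=c|\lambda|, so \alpha=1. Indeed, for k>0,

\tfrac12|k\lambda|^2=k^2\cdot\tfrac12|\lambda|^2\quad\text{and}\quad c|k\lambda|=k\cdot c|\lambda|,

recovering the familiar facts that k^{-1/2}B_{kt} is again a Brownian motion and k^{-1}X_{kt} is again a Cauchy process.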

Martingales

Definition

Let X_n be a stochastic process, integrable and adapted to a filtered probability space (\Omega,\mathcal{F},(\mathcal{F}_n),\mathbb{P}). Then X is a martingale if \mathbb{E}[X_n|\mathcal{F}_m]=X_m almost surely whenever n\geq m.

It is natural to think about a martingale as defined in the context of a process evolving in time. Then this definition is very reasonable:

  • Integrable: the entire construction is about taking expectations. So these need to exist.
  • Adapted: X_n can’t be affected by what happens after time n. It must be defined by what has happened up to time n.
  • Expectation condition: if you look at the process at a given time m, the best estimate for what X will be in the future is in fact what it is now. As you might expect, it is sufficient that \mathbb{E}[X_{n+1}|\mathcal{F}_n]=X_n almost surely for each n.
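As a minimal sketch of this one-step-implies-all-steps claim (the helper cond_exp and the random-walk setting are illustrative, not from the post), we can check it exactly for a simple symmetric random walk by enumerating all equally likely continuations of an observed prefix:

```python
from itertools import product
from fractions import Fraction

def cond_exp(m, n, prefix):
    """E[X_n | F_m] for a simple symmetric +/-1 random walk, where `prefix`
    holds the observed first m steps: average X_n over all equally likely
    continuations of length n - m, computed exactly with Fractions."""
    continuations = list(product((+1, -1), repeat=n - m))
    total = Fraction(0)
    for cont in continuations:
        total += sum(prefix) + sum(cont)
    return total / len(continuations)

# The one-step condition E[X_{m+1}|F_m] = X_m propagates to every n >= m
# (this is the tower property): the conditional mean is always X_m.
for prefix in product((+1, -1), repeat=2):
    for n in (3, 4, 5):
        assert cond_exp(2, n, prefix) == sum(prefix)
```

The assertion holds because each continuation has mean increment zero, which is exactly the expectation condition above.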

Motivation

There are many situations where the expected change in a variable over a time period is zero, whatever the value of the variable is at the start. For example, gambling. For illustration, assume we are speculating on the outcomes of tossing a coin repeatedly. You might have a complicated strategy, for example ‘double your stake when you lose’ (the so-called martingale strategy), or anything else. But ultimately, you can’t see into the future. So before every coin toss, you have to decide your stake, based on what’s happened up until now, and you will win or lose with equal probability, so your expected gain is 0, regardless of how you make your stake choice. Thus under any strategy determined without looking into the future (called previsible), the process recording your winnings is a martingale.
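This 'no previsible strategy gives an edge' fact can be checked by exact brute force. In the sketch below (expected_winnings and doubling are hypothetical helpers, not from the post), we enumerate all equally likely coin sequences and compute the expected winnings under a stake rule that sees only past tosses:

```python
from itertools import product
from fractions import Fraction

def expected_winnings(n, stake_rule):
    """Expected total winnings after n fair coin tosses, where stake_rule
    maps the history (a tuple of +1/-1 outcomes so far) to the next stake.
    Previsibility is enforced by passing only path[:i] at step i.
    Computed exactly by enumerating all 2^n equally likely paths."""
    total = Fraction(0)
    for path in product((+1, -1), repeat=n):
        winnings = 0
        for i, toss in enumerate(path):
            winnings += stake_rule(path[:i]) * toss
        total += winnings
    return total / 2**n

# The 'martingale' doubling strategy: double the stake after every loss.
def doubling(history):
    stake = 1
    for toss in reversed(history):
        if toss == -1:
            stake *= 2
        else:
            break
    return stake

assert expected_winnings(5, doubling) == 0       # doubling gives no edge
assert expected_winnings(5, lambda h: 1) == 0    # nor do constant stakes
```

The expectation is exactly zero for any stake rule, because the stake at step i is independent of toss i, which itself has mean zero; this is a finite instance of the martingale transform.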