# Duality for Stochastic Processes

In the past couple of weeks, we’ve launched a new junior probability seminar in Oxford. (If you’re reading this and would like to come, please drop me an email!) The first session featured Daniel Straulino talking about his work on the Spatial Lambda Fleming-Viot process, a model for the evolution of populations allowing for selection and geometry. A lively discussion of duality broke out halfway through, following which it seemed reasonable to devote another session to this topic. Luke Miller presented some classical and less classical examples of the theory this afternoon. I found all of this interesting and relevant, and thought it would be worth writing some things down, and tying it in with one of the courses on this subject that we attended at ALEA in Luminy last October.

The setup is that we have two stochastic processes $X_t\subset R, Y_t\subset S$. For now we make no assertion about whether the two state spaces R and S are the same or related, and we make no comment on the dependence relationship between X and Y. Let $P_x,Q_y$ be the respective probability measures, representing starting from x and y respectively. Then given a bivariate, measurable function h(.,.) on R x S, such that:

$E^P_x h(X_t,y)=E^Q_y h(x,Y_t),\quad \forall x,y\quad\forall t,$

then we say X and Y are dual with respect to h.

The interpretation should be that X is a process forwards in time, and Y is a process backwards in time. So $X_t, Y_0$ represent the present, while $X_0, Y_t$ represent the past, which is the initial time for original process X. The fact that the result holds for all times t allows us to carry the equality through a derivative, to obtain an equality of generators:

$\mathcal{G}^X h(x,y)=\mathcal{G}^Y h(x,y),\quad \forall x,y.$

On the LHS, the generator acts on x, while on the RHS it acts on y. Although it still isn’t obvious (at least to me) when a pair of processes might have this property, especially for an arbitrary function, this seems the more natural definition to think about.

Note that this does indeed require a specific function h. There were murmurings in our meeting about the possibility of a two processes having a strong duality property, where this held for all h in some broad class of test functions. On more reflection, which may nonetheless be completely wrong, this seems unlikely to happen very often, except in some obviously degenerate cases, such as h constant. If this holds, then as the set of expectations of a class of functions of a random variable determines the distribution, we find that the instantaneous behaviour of Y is equal in distribution to the instantaneous behaviour of X when started from fixed (x,y). It seems unlikely that you might get many examples of this that are not deterministic or independent (eg two Brownian motions, or other space-, time-homogeneous Markov process).

Anyway, a canonical example of this is the Wright-Fisher diffusion, which provides a simple model for a population which evolves in discrete-time generations. We assume that there are two types in the population: {A,a} seems to be the standard notation. Children choose their type randomly from the distribution of types in the previous generation. In other words, if there are N individuals at all times, and $X_k$ is the number of type A individuals, then:

$X_{k+1} | X_k \stackrel{d}{=} \mathrm{Bin}(N, \frac{X_k}{N}).$

It is not hard to see that in a diffusion limit as the number of individuals tends to infinity, the proportion of type A individuals is a martingale, and so the generator for this process will not depend on f’. In fact by checking a Taylor series, we can show that:

$\mathcal{G}_{WF}f(x)=\frac{1}{2} x(1-x)f''(x),$

for all f subject to mild regularity conditions. In particular, we can show that for $f_n(x)=x^n$, we have:

$\mathcal{G}_{WF} f_n(x)=\binom{n}{2}(f_{n-1}(x)-f_n(x))$

after some rearranging. This looks like the generator of a jump process, indeed a jump process where all the increments are -1. This suggests there might be a coalescent as the dual process, and indeed it turns out that Kingman’s coalescent, where any pair of blocks coalesce at uniform rate, is the dual. We have the relation in expectation:

$\mathbb{E}_x[X_t^n]= \mathbb{E}_n[x^{N_t}],$

where the latter term is the moment generating function of the number of blocks at time t of Kingman’s coalescent started from n blocks.

In particular, we can control the moments of the Wright-Fisher diffusion using the mgf of the Kingman’s coalescent, which might well be easier to work with.

That’s all very elegant, but we were talking about why this might be useful in a broader setting. In the context of this question, there seems to be an obstacle towards applying this idea above more generally. This is an underlying idea in population genetics models that as well as the forward process, there is also a so-called ancestral process looking backwards in time, detailing how time t individuals are related historically. It would be convenient if this process, which we might expect to be some nice coalescent, was the dual of the forward process.

But this seems to be a problem, as duals are a function of the duality function, so do we have uniqueness? It would not be very satisfying if there were several coalescents processes that could all be the dual of the forward process. Though some worked examples suggest this might not happen, because a dual and its duality function has to satisfy too many constaints, there seems no a priori reason why not. It seems that the strength of the results you can derive from the duality relation is only as strong as the duality relation itself. This is not necessarily a problem from the point of view of applications, so long as the duality function is something it might actually be useful to work with.

It’s getting late and this text is getting long, so I shall postpone a discussion of duality for interacting particle systems until tomorrow. In summary, by restricting to a finite state space, we can allow ourselves the option of a more algebraic approach, from which some direct uses of duality can be read off. I will also mention a non-technical but I feel helpful way to view many examples of duality in interacting particle systems as deterministic forward and backwards walks through a random environment, in what might be considering an extreme example of coupling.

# Coalescence 1: What is it, and why do we care?

As part of Part III, instead of sitting an extra exam paper I am writing an essay. I have chosen the topic of ‘Multiplicative Coalescence’. I want to avoid contravening plagiarism rules, which don’t allow you to quote your own words without a proper citation, which I figure is tricky on a blog, nor open publishing of anything you intend to submit. So just to be absolutely sure, I’m going to suppress this series of posts until after May 4th, when everything has to be handed in.

———–

Informal Description

Coalescence refers to a process in which particles join together over time. An example might be islands of foam on the surface of a cup of coffee. When two clumps meet, they join, and will never split. In this example, a model would need to take into account the shape of all the islands, their positions, their velocities, and boundary properties. To make things tractable, we need to distance ourselves from the idea that particles merge through collisions, which are highly physical and complicated, and instead just consider that they merge.

Description of the Model

When two particles coalesce, it is natural to assume that mass is conserved, as this will be necessary in any physical application. With this in mind, it makes sense to set up the entire model using only the masses of particles. Define the kernel K(x,y) which describes the relative rate or likelihood of the coalescence {x,y} -> x+y. This has a different precise meaning in different contexts. Effectively, we are making a mean-field assumption that all the complications of a physical model as described above can be absorbed into this coalescent kernel, either because the number of particles is large, or because the other effects are small.

When there is, initially, a finite number of particles, the process is stochastic. Coalescence almost surely happen one at a time, and so we can view the process as a continuous time Markov Chain with state space the set of relevant partitions of the total mass present. The transition rate p(A,B) is given by K(x,y) when the coalescence {x,y} -> x+y transforms partition into B, and 0 otherwise. An observation is that the process recording the number of {x,y} -> x+y coalescences is an inhomogeneous Poisson process with local intensity n(x,t)n(y,t)K(x,y) where n(x,t) is the number of particles with mass at time t.

This motivates the move to an infinite-volume setting. Suppose that there are infinitely many particles, so that coalescences are occurring continuously. The rate of {x,y} -> x+y coalescences is still n(x,t)n(y,t)K(x,y) but now n(x,t) specifies the density of particles with mass at time t. Furthermore, because of the continuum framework, this rate is now deterministic rather than stochastic. This is extremely important, as by removing the probability from a probabilistic model, it can be treated as a large but simple ODE.

Two Remarks

1) Once this introduction is finished, we shall be bringing our focus onto multiplicative coalescence, where K(x,y) = xy. In particular, this is a homogeneous function, as are the other canonical kernels. This means that considering K(x,y) = cxy is the same as performing a constant factor time-change when K(x,y) = xy. Similarly, it is not important how the density n(x,t) is scaled as this can also be absorbed with a time-change. In some contexts, it will be natural and useful to demand that the total density be 1, but this will not always be possible. In general it is convenient to absorb as much as possible into the time parameter, particularly initial conditions, as will be discussed.

2) Working with an infinite volume of particles means that mass is no longer constrained to finitely many values. Generally, it is assumed that the masses are discrete, taking values in the positive integers, or continuous, taking values in the positive reals. In this case, the rate of coalescences between particles with masses in (x, x+dx) and (y,y+dy) is n(x,t)n(y,t)K(x,y)dxdy. The main difference between these will arise when we try to view the process as limits of finite processes. Continue reading