Coalescence 1: What is it, and why do we care?

As part of Part III, instead of sitting an extra exam paper I am writing an essay. I have chosen the topic of ‘Multiplicative Coalescence’. I want to avoid contravening plagiarism rules, which don’t allow you to quote your own words without a proper citation, which I figure is tricky on a blog, nor open publishing of anything you intend to submit. So just to be absolutely sure, I’m going to suppress this series of posts until after May 4th, when everything has to be handed in.


Informal Description

Coalescence refers to a process in which particles join together over time. An example might be islands of foam on the surface of a cup of coffee. When two clumps meet, they join, and will never split. In this example, a model would need to take into account the shape of all the islands, their positions, their velocities, and boundary properties. To make things tractable, we need to distance ourselves from the idea that particles merge through collisions, which are highly physical and complicated, and instead just consider that they merge.

Description of the Model

When two particles coalesce, it is natural to assume that mass is conserved, as this will be necessary in any physical application. With this in mind, it makes sense to set up the entire model using only the masses of particles. Define the kernel K(x,y) which describes the relative rate or likelihood of the coalescence {x,y} -> x+y. This has a different precise meaning in different contexts. Effectively, we are making a mean-field assumption that all the complications of a physical model as described above can be absorbed into this coalescent kernel, either because the number of particles is large, or because the other effects are small.

When there is, initially, a finite number of particles, the process is stochastic. Coalescence almost surely happen one at a time, and so we can view the process as a continuous time Markov Chain with state space the set of relevant partitions of the total mass present. The transition rate p(A,B) is given by K(x,y) when the coalescence {x,y} -> x+y transforms partition into B, and 0 otherwise. An observation is that the process recording the number of {x,y} -> x+y coalescences is an inhomogeneous Poisson process with local intensity n(x,t)n(y,t)K(x,y) where n(x,t) is the number of particles with mass at time t.

This motivates the move to an infinite-volume setting. Suppose that there are infinitely many particles, so that coalescences are occurring continuously. The rate of {x,y} -> x+y coalescences is still n(x,t)n(y,t)K(x,y) but now n(x,t) specifies the density of particles with mass at time t. Furthermore, because of the continuum framework, this rate is now deterministic rather than stochastic. This is extremely important, as by removing the probability from a probabilistic model, it can be treated as a large but simple ODE.

Two Remarks

1) Once this introduction is finished, we shall be bringing our focus onto multiplicative coalescence, where K(x,y) = xy. In particular, this is a homogeneous function, as are the other canonical kernels. This means that considering K(x,y) = cxy is the same as performing a constant factor time-change when K(x,y) = xy. Similarly, it is not important how the density n(x,t) is scaled as this can also be absorbed with a time-change. In some contexts, it will be natural and useful to demand that the total density be 1, but this will not always be possible. In general it is convenient to absorb as much as possible into the time parameter, particularly initial conditions, as will be discussed.

2) Working with an infinite volume of particles means that mass is no longer constrained to finitely many values. Generally, it is assumed that the masses are discrete, taking values in the positive integers, or continuous, taking values in the positive reals. In this case, the rate of coalescences between particles with masses in (x, x+dx) and (y,y+dy) is n(x,t)n(y,t)K(x,y)dxdy. The main difference between these will arise when we try to view the process as limits of finite processes. Continue reading


Missing in Action?


Not much maths has been appearing here in the past few weeks. But I have been working…

As part of Part III, instead of sitting an extra exam paper I am writing an essay. I have chosen the topic of ‘Multiplicative Coalescence’ so have been hard at work reading various papers and articles. I’ve been writing some posts about the topic as practice for writing up the essay – in fact, it’s entirely possible that large chunks will end up featuring verbatim. As a result, to ensure I stay firmly on the correct side of the rules about plagiarism, I’m going to wait until after exam results are announced on June 20th before making these posts visible.

SLE revision 1: Properties of Random Sets

Prof. Werner’s excellent Part III course ‘Topics in Conformal Invariance and Randomness’ has recently finished, and I’ve been doing some revision. The course begins with a general discussion of some of the ideas useful in demanding some form of regularity for random paths or random sets in a domain. For example, for continuous time processes, we can define a Markovian property: this is both easy and natural, mainly because the state space, assuming it is \mathbb{R}^d is homogenous, which is not a luxury in, say, the unit disc. In two dimensions, things are particularly tractable because of the equivalence to the complex plane, and from this we develop the Schramm-Loewner evolution, and we examine its properties. In particular, SLEs with some exponents arise as a limit of discrete processes, with wide-ranging applications. In this first note, we motivate and explain some properties that we might wish random sets to have.

conformal map is an invertible map between domains in the complex plane which preserves angles. Riemann’s mapping theorem states that there exists a conformal map from any non-empty, simply connected domain to the open unit disc. We have some freedom to control one point, and the boundary is mapped to the boundary.

Conformal Invariance: Given a simply connected domain D, and conformal \phi:D\rightarrow\mathbb{U}, then \mathcal{B}^D a process defined on domains D is conformally invariant if


This says that the law of the process is preserved under the transformation.

The notation chosen is deliberate. The best example is Brownian paths: take B a Brownian motion started at 0, and T^D the exit time of domain D, then set \mathcal{B}=\{B_t,t\leq T^D\} the path in D. Informally, conformal invariance for all domains with 0\in D, follows because BM is isotropic, that is, the angle taken after a time t, whatever that means, is uniformly distributed. Modulo Markov technicalities, this property is preserved under a conformal map because they preserve angles.

Conformal Restriction: This is essentially the same as conformal invariance, but in the special case where one of the domains is contained in the other. Although less general, by viewing everything in the context of the laws of processes in the larger domain, we can in fact show an equality for a given single process with conditioning, rather than effectively two unrelated processes. We assume the reference domain is the unit disc.

Concretely, we can consider a random set K in the unit disc with law P^K, and for a subset U\subset\mathbb{U} which contains 0 and 1, define the conformal map \phi_U:U\rightarrow \mathbb{U} that preserves 0 and 1. Then set P_U^K to be the law of \phi_U^{-1}(K), which gives a law for random sets in U. We say K satisifies conformal restriction if:

P_U^K=P|_{\{K\subset U\}}

Observe that applying \phi_U to both sides of the definition gives conformal invariance for this pair of domains. Continue reading

Simpson’s Paradox – making sense of statistical confusion

Test 1 Test 2
Success Failure Success (%) Success Failure Success (%)
Drug 30 70 30 30 20 60
Placebo 10 40 20 100 100 50

Consider the introduction of a new drug, which needs to be tested against a placebo. A first test suggests that the drug is an improvement, in so far as its effects are better than doing nothing. As is usual practice, data only becomes respected when its conclusions are replicated, so a second test, with fresh subjects is performed, and a similar improvement is seen. Flushed with success, the scientist writes up his findings, combining the results of the two tests for maximum impact.

Success Failure Success (%)
Drug 60 90 40
Placebo 110 140 44

But how can this have happened? Whereas the drug was clearly significantly better than the placebo in both of the tests, when the results are combined, it has performed worse. This is often called Simpson’s Paradox, but like most paradoxes, it points to a flaw in the mathematican rather than in the maths. Let’s think about some interpretations of this confusing phenomenon.

Firstly, even the most casual second glance at the original data table makes it seem less surprising that this problem has arisen. The performance of both drug and placebo was significantly better in the second test. However, in the second test, there were loads of placebo subjects, and many fewer drug subjects. A sensible thing to consider is to take the situation to a natural extreme. Suppose the second test had only one subject on the drug, and it had been a success. Then, we would have claimed 100% performance on that test, but this wouldn’t have influenced the overall performance across both surveys enormously. If they had only tested one person, this would have immediately flagged up suspicions when we glanced at the table, but in the information given, we were lulled into a false sense of security because the data sizes didn’t seem too weird.

Secondly, when thinking about any kind of statistical analysis, data needs to be taken in the context in which it is provided. However, this ‘paradox’ is entirely arithmetic. There is no explicit reference to the pharmaceutical nature of the situation in the data. Suppose instead we had presented the figures in a sporting context.

Australia Bangladesh
Innings Runs Average Innings Runs Average
Cook 2 60 30 2 120 60
Strauss 1 20 20 4 200 50

This table presents essentially the same numerical information in the context of two batsmen’s performances against two different countries. Against both Australia and Bangladesh, Cook has a higher average than Strauss, but as before, when the total average is calculated, Strauss with 44 emerges above Cook with 40. But in this setting, it all makes a lot more sense. Of course Strauss’ average will be relatively inflated because he has spent much more time playing against easier opposition.

Perhaps the most intelligent remark to make is that given the original data, to add the results of the two tests is absurd. The success or otherwise of a drug should be reasonably constant (to within whatever we mean by reasonable statistical error) across two tests. If we get wildly different figures, especially for the placebo data (!), then the only conclusion can be that there are some other factors affecting the patients which we haven’t accounted for, but which are having a much greater effect than the drug! (Note that this doesn’t apply in the cricket framework, because there is no reason to suppose that a batsman’s performance in one series should be similar to any other. But in this case, we have previously justified this adding of contrasting data, so all is fine.)

Most articles about a paradox would conclude by giving general advice about avoiding the pitfalls. And I guess in this situation it’s not too tricky: relative sample sizes are a factor in determining net averages; and adding test results which are completely different can be dangerous even if they both give the same crude conclusion. Perhaps the best moral might be that, while school statistics courses are often forced to encourage an attitude of ‘if in doubt, draw a histogram’, having a think about the underlying situation can be useful before getting deep into some number-crunching.

Remarkable fact about Brownian Motion #4: The Dirichlet Problem

So this property of Brownian Motion is so elegant, in my opinion, that when I was recently asked what my ‘favourite theorem’ was, I suggested this. With this result, we can use this probabilistic structure to specify solutions to an important PDE, with boundary conditions, over a large class of domains.

Given a domain D, Laplace’s equation is: \Delta u=0 on D, and u=f on the boundary dD, where f is any continuous function defined there. This PDE arises wherever the notion of potentials is defined, for example electromagnetism, fluids and thermodynamics.

Theorem: Given suitable regularity conditions on D to be discussed later, Laplace’s equation has a unique solution, given by:


Notation: First, what does this mean? Define T_D:=\inf\{t:B_t\not\in D\}, to be the time at which a Brownian Motion leaves the domain D. This is a stopping time, and so will be suitable for application of the Strong Markov Property. \mathbb{E}_x means that we are taking expectation with respect to a BM started at x. So informally, we are defining u(x) as: start a BM at x; see where it hits the boundary of D; record the value of f at that point. Then set u(x) to be the expected value of this process.

Existence: First, we are going to check that the solution conjectured is a solution. We will need a lemma:

Lemma: A locally-bounded function u satisfies \Delta u=0 on a domain D if and only if it has the property that for every closed ball \bar{B(x,r)}\subset D we have:


where \sigma_{x,r} is the surface area measure on the boundary S(x,r) of the ball radius r centred on x. Essentially, this says that u(x) is equal to the average value of u on a ball around x.

Proof of Theorem: First, existence. Set u as specified in the statement of the theorem. Given a Brownian Motion started at x, we have stopping times T_r<T_D corresponding to the hitting times of the ball radius r around x and the boundary dD. The domination condition holds by continuity provided B(x,r) is contained within D. So we may apply the Strong Markov Property:


By definition, the left hand expression is u(x). But also, because the distribution of B_{T_r} is uniform on S(x,r), the right hand side is equal to:


and so by the lemma, this guarantees that the function u is harmonic on the interior of D.

The lemma can also be used to show uniqueness. Continue reading

Random Walks and Spanning Trees


In this post, I’m going to talk about probability distributions on graphs. In particular, consider paths on a graph, and how to assign a distribution to these in a natural way. One way is to consider a standard random walk, treating the vertices as states of a discrete Markov chain. But, in many situations, this isn’t really enough. We might want paths, that is, walks which are self-avoiding. But even in regular lattices like \mathbb{Z}^d, it is hard to say a great deal about the set of self-avoiding walks (SAWs) of length n. We would prefer a distribution which has a natural product form, or which at least we can sample from without large combinatorial calculations.

A spanning tree is a connected graph without cycles. The set of edges is maximal, in the sense that adding any further edge creates a cycle, and also minimal, as removing one will disconnect the graph. Counting the number of spanning trees on labelled vertices is harder than one might suspect, and is possibly worth a post by itself. However, in general the uniform distribution on spanning trees is a useful object to consider. Any spanning tree contains a unique path between two vertices of the graph, and so a Uniform Spanning Tree (UST) induces a distribution on paths.

An alternative construction is to take a random walk and remove the cycles. This is not well-defined unless you specify a canonical order in which to remove them, and the obvious option is to remove the cycles in the order in which they appear. That is, every time you end up at a vertex which you have already visited, you remove all the edges traversed since you were last at v. This gives a Loop-Erased Random Walk (LERW), and from this another measure on paths between two vertices (subject to connectivity conditions which guarantee that the LERW almost surely hits the target vertex eventually).

At an informal level, the difference between these measures is significant. Inducing a measure down from a natural but potentially unwieldy uniform distribution is theoretically natural, but hard to work with. On the other hand a computer could sample from LERW, just by performing a random walk and storing the history suitably, which immediately makes it useful. So, the following theorem is elegant in its own right, but in particular as a bridge between these two frameworks.

Theorem: The measures on paths from to in a finite graph G induced by UST and generated by LERW are the same.

Proof: Some books give proofs which involve coupling a LERW construction with a sequence of STs directly, driven by an underlying random walk on the graph. The difficulty in this approach is that to prove that the uniform distribution on spanning trees is stationary often requires the random walk to be in equilibrium, a criterion which causes difficulties when it come to starting at x later in the proof. Essentially, the difficulty in many LERW constructions is that the edges being removed are incident to the wrong vertex in the random walk, and so RW equilibrium is required to show that the spanning tree transitions are reversible. Instead, we proceed by an argument motivated by the fact that LERWs appear ‘backwards’ in UST constructions. Continue reading