Sharpness of Phase Transition in Percolation

On the flight to Romania a couple of weeks ago, I read this very nice paper by Duminil-Copin and Tassion in which, as a corollary to a more general result in a companion paper, they provide a short proof of a result about percolation on \mathbb{Z}^d. Mitch talked about this paper at our final Junior Probability Seminar of this term yesterday, so it seemed appropriate to write up something about this nice argument. I must confess I know relatively little about these problems, and in particular know nothing about how the original authors, Aizenman and Barsky (1987) and Menshikov (1986), approached this problem, but experts have said that this argument is substantially easier to digest.

Rather than reference the first paper repeatedly, I remark now that everything which follows comes from there.

We consider conventional bond percolation on the edges of \mathbb{Z}^d, for d\ge 2, and are interested in whether the origin percolates with positive probability, that is, whether the origin is contained in an infinite open component. As usual we define p_c=\sup\{p: \mathbb{P}_p(0\leftrightarrow\infty)=0\} to be the critical probability above which percolation happens with positive probability. Defining \theta(p)=\mathbb{P}_p(0\leftrightarrow\infty), we do not know whether \theta(p_c)=0 for some values of d, notably d=3.

If the origin is connected to infinity, it is by definition connected to the boundary of every \Lambda_n:=[-n,n]^d. Any open path from the origin to \partial \Lambda_n contains an open self-avoiding walk of length n starting from 0, and the number of such walks grows at most as fast as (2d-1)^n. In particular, we know that p_c\ge \frac{1}{2d-1}, but also, for any p<\frac{1}{2d-1}, the probability \mathbb{P}_p[0\leftrightarrow \partial\Lambda_n] decays exponentially in n. We would expect this in fact to hold for all p<p_c, and this is the first statement the authors prove, called Item 1. They also show that the percolation probability grows at least linearly beyond p_c, specifically (called Item 2)

\theta(p)\ge \frac{p-p_c}{p(1-p_c)}.

The proof proceeds by considering the following function of finite subsets S which contain the origin:

\varphi_p(S):= p\sum_{(x,y)\in\Delta S} \mathbb{P}_p[0\stackrel{S}{\leftrightarrow} x],\quad S\subset \mathbb{Z}^d.

In words, this is p times the expected number of boundary edges of S whose endpoint inside S is connected to the origin by an open path within S; equivalently, it is the expected number of open boundary edges whose inner endpoint is connected to the origin within S. So this gives a measure of how likely we are to escape S, and in particular an upper bound on the probability that an open path exists from 0 to outside S. The authors then define the alternative critical probability

\tilde p_c := \sup\{p\text{ s.t. }\exists\text{ a finite set }S\ni 0\text{ with }\varphi_p(S)<1\}.

They will show that \tilde p_c satisfies the statements of both Item 1 and Item 2. Item 2 for \tilde p_c implies \tilde p_c\le p_c, and Item 1 for \tilde p_c implies p_c\le \tilde p_c, so this is exactly what we need.
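To get a feel for \varphi_p(S), here is a quick Monte Carlo sketch (mine, not from the paper) which estimates it for S a small box in d=2; the box size, number of trials and the restriction to two dimensions are arbitrary choices made purely for illustration.

```python
import random
from itertools import product

def phi_p_estimate(p, m=2, trials=2000, seed=0):
    """Monte Carlo estimate of phi_p(S) for S = [-m,m]^2 in Z^2 (d=2 for speed):
    phi_p(S) = p * sum over boundary edges (x,y) of P_p[0 <-> x within S].
    Illustrative sketch only, not the authors' computation."""
    rng = random.Random(seed)
    S = set(product(range(-m, m + 1), repeat=2))

    def neighbours(v):
        a, b = v
        return [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]

    # boundary edges: (x, y) with x in S and y outside S
    boundary = [(x, y) for x in S for y in neighbours(x) if y not in S]

    total = 0.0
    for _ in range(trials):
        # sample the edges inside S: each open independently with probability p
        open_nbrs = {v: [] for v in S}
        for v in S:
            for w in neighbours(v):
                if w in S and v < w and rng.random() < p:
                    open_nbrs[v].append(w)
                    open_nbrs[w].append(v)
        # BFS from the origin using only open edges within S
        comp, stack = {(0, 0)}, [(0, 0)]
        while stack:
            v = stack.pop()
            for w in open_nbrs[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        total += sum(1 for (x, _) in boundary if x in comp)
    return p * total / trials

# For p well below 1/2 the estimate drops below 1, witnessing p < tilde_p_c.
for p in (0.3, 0.4, 0.5, 0.6):
    print(p, round(phi_p_estimate(p), 3))
```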

They show Item 1 first. We take a set S containing the origin for which \varphi_p(S)<1, and some box \Lambda_L which strictly contains S. Now we consider the probability of escaping from a box of size kL. The reason why this definition of S works so nicely is that it lets us split the event of escaping from \Lambda_{kL} into events depending on disjoint sets of edges, so we can use independence.

We decompose the path from 0 to \partial\Lambda_{kL} based on the first time it leaves S. We are mindful that there might be lots of paths from 0 to this boundary. The way we are bounding means it doesn’t matter if we have lots of suitable paths, but they should all spend a maximal number of steps in S, in the sense that whenever the path re-enters S, say at a vertex z, there is no open path from 0 to z contained in S. Let x be the vertex of \partial S from which the path leaves S for the first time. Then, for all vertices y appearing later in the path, 0\stackrel{S}{\not\leftrightarrow}y.

So for any suitable path, take y to be the vertex directly following x, so that (x,y)\in\Delta S. If we take \mathcal{C} to be the set of vertices z for which 0\stackrel{S}{\leftrightarrow}z, we can split the expression according to the realisation of \mathcal{C} to obtain:

\mathbb{P}_p[0\leftrightarrow \partial \Lambda_{kL}]\le p \sum_{(x,y)\in\Delta S}\sum_{C\subset S} \mathbb{P}_p[0\stackrel{S}{\leftrightarrow} x,\mathcal{C}=C] \mathbb{P}_p[y\stackrel{\mathbb{Z}^d\backslash C}{\leftrightarrow}\partial\Lambda_{kL}].

Splitting based on C gives us independence between these sets of edges, but then we immediately forget about C. Irrespective of the choice of y (recall y\in\Lambda_L, since S\subset\Lambda_L), this final probability is certainly bounded by \mathbb{P}_p[0\leftrightarrow \partial \Lambda_{(k-1)L}], by translation invariance, while the factor p and the first probability can be summed over C and over the boundary edges to give \varphi_p(S). They obtain:

\mathbb{P}_p[0\leftrightarrow \partial \Lambda_{kL}] \le \varphi_p(S)\,\mathbb{P}_p[0\leftrightarrow \partial \Lambda_{(k-1)L}] \le \varphi_p(S)^{k-1},

where the final relation holds by induction, and clearly gives exponential decay as required.

For Item 2 we use Russo’s formula. Here we have a slightly simpler setting than the most general version, since the event under consideration A_n=\{0\leftrightarrow \partial\Lambda_n\} is increasing with respect to adding edges, and is a function of only finitely many edges. We consider \frac{d}{dp}\mathbb{P}_p[A] under a coupling in which each closed edge becomes open independently at rate \frac{1}{1-t} at time t. (This rate is chosen so that an edge is open at time t with probability exactly t, avoiding any reparameterisation between time and probability; we then evaluate everything at time t=p.)

Just for ease, we only consider the right-derivative at p. Then with \mathbb{P} as the law of the coupled process:

\frac{d}{dp}\mathbb{P}_p[A] \approx \frac{1}{\epsilon} \mathbb{P}[A\text{ holds at }p+\epsilon,\text{ but not at }p]

= \frac{1}{\epsilon} \sum_{e\in E}\mathbb{P}[e\text{ added between }p,p+\epsilon\text{ and }e\text{ completes event }A]

+ \frac{1}{\epsilon}\mathbb{P}[\text{two or more edges relevant to }A\text{ added}].

Since the number of edges whose states determine A is finite, this second term vanishes as \epsilon\rightarrow 0. So

=\frac{1}{\epsilon}\sum_{e\in E} \frac{\epsilon}{1-p}\, \mathbb{P}(A \text{ doesn't hold at }p\text{, and edge }e\text{ pivotal at }p).

Taking the limit in \epsilon in this example gives

\frac{d}{dp}\mathbb{P}_p[0\leftrightarrow \partial\Lambda_n] = \frac{1}{1-p} \sum_{e\in\Lambda_n} \mathbb{P}_p[e\text{ pivotal, }0\not\leftrightarrow \partial \Lambda_n].

The argument then proceeds in a similar way to Item 1, decomposing \{0\not\leftrightarrow \partial \Lambda_n\} by conditioning on the set of vertices \mathcal{S} from which it is not possible to get to \partial \Lambda_n. In particular, this set is an excellent candidate to view as S, since on this event it contains 0 by definition. Once we have specified \mathcal{S} we know which edges might be pivotal, namely those across the boundary of \mathcal{S}. Crucially, the event \{\mathcal{S}=S\} only depends on edges with at least one endpoint outside S, so it is independent of the event \{0\stackrel{S}{\leftrightarrow}x\} whenever x\in\partial S. So applying this version of Russo gives

\frac{d}{dp}\mathbb{P}_p[0\leftrightarrow \partial\Lambda_n] = \frac{1}{1-p}\sum_{0\in S\subset \Lambda_n} \sum_{(x,y)\in\Delta S} \mathbb{P}_p[0\stackrel{S}{\leftrightarrow} x]\mathbb{P}_p[\mathcal{S}=S].

It is clear where \varphi_p(S) might turn up within the sum (after introducing a factor of \frac1p), so for a bound we can take \inf_S \varphi_p(S) outside the sum, note that \sum_S \mathbb{P}_p[\mathcal{S}=S]=1-\mathbb{P}_p[0\leftrightarrow\partial\Lambda_n], and arrive at

\frac{d}{dp}\mathbb{P}_p[0\leftrightarrow \partial\Lambda_n] \ge \frac{1}{p(1-p)}\Big(\inf_{0\in S\subset \Lambda_n} \varphi_p(S)\Big)\big(1-\mathbb{P}_p[0\leftrightarrow \partial \Lambda_n]\big).

It wasn’t immediately clear to me that this implied the given form of Item 2 (though it certainly is consistent). I think perhaps I was trying to be too complicated and thinking about Gronwall’s lemma, when in fact everything really follows from bounding \inf \varphi_p(S) below by 1 (since we have assumed p>\tilde p_c here), then integrating the differential inequality

\frac{d}{dp}\left[ \frac{p}{1-p}f(p) \right] = \frac{p}{1-p}f'(p)+\frac{1}{(1-p)^2}f(p) \ge \frac{1}{(1-p)^2}.
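Spelling out the integration step, with f(p)=\mathbb{P}_p[0\leftrightarrow\partial\Lambda_n]: integrating the above from \tilde p_c to p, and using f(\tilde p_c)\ge 0, gives

\frac{p}{1-p}f(p) \ge \int_{\tilde p_c}^p \frac{ds}{(1-s)^2} = \frac{1}{1-p}-\frac{1}{1-\tilde p_c} = \frac{p-\tilde p_c}{(1-p)(1-\tilde p_c)},

and rearranging yields f(p)\ge \frac{p-\tilde p_c}{p(1-\tilde p_c)}. Letting n\rightarrow\infty then gives exactly the statement of Item 2, with \tilde p_c in place of p_c.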

I include this not because it’s an interesting part of the argument (I don’t think it is really) but because I struggled to digest it first time round.

What is interesting is how well this choice to consider \varphi_p(S) works out. In both parts of the argument, sets which work well for splitting the crossing probabilities into disjoint edge events mesh nicely with considering this function after conditioning on sub- or super-criticality with respect to \tilde p_c.

Noise Sensitivity and Influence 1

Last week, Simon gave a junior probability seminar about noise sensitivity, and its applications in particular to dynamic critical percolation, so I thought I would write something about this topic based on his talk, and some other sources.

The motivation is critical site percolation on the triangular lattice. This has critical probability p_c=\frac12, and, at this critical probability, almost surely there is no infinite component. But it turns out that there is ‘almost’ an infinite component, in the following technical sense. Dynamical site percolation is given by switching the state of each site according to independent Poisson processes with the same parameter. Then, by construction, if we start at time 0 with the critical probability, then at any fixed time T>0, the configuration is distributed as critical site percolation, and hence almost surely has no infinite component. If, however, we consider the interval of times [0,T], then almost surely there *is* a time when there is an infinite component, and indeed the set of such times has positive Hausdorff dimension.

Definitions

Noise sensitivity addresses the question of how likely some function of a product measure is to change under small perturbations of the state. To give a precise definition, we take a function f_n:\{0,1\}^n \rightarrow \{0,1\} on a product space, with product measure \mu. Then given \omega\in\Omega_n=\{0,1\}^n, define \omega^\epsilon to be the outcome when each component of \omega is resampled independently with probability \epsilon. Note that it matters quantitatively whether we consider ‘resampled’ or ‘swapped’, but it doesn’t affect any of the results qualitatively.

Then the noise stability of f at level \epsilon is defined to be

S_\epsilon(f):=\text{Cov}\big(f(\omega),f(\omega^\epsilon)\big),

where \omega \sim \mu. Note that \omega^\epsilon\sim \mu also, but obviously \omega,\omega^\epsilon are not independent. Noise stability measures the magnitude of this dependence. The sequence of functions (f_n) is said to be noise-sensitive if S_\epsilon(f_n)\rightarrow 0 as n\rightarrow\infty, for every fixed \epsilon>0.

A related concept is influence. This should be viewed as a local version of noise sensitivity, where we consider the likelihood that the function will change as a result of swapping a single fixed component of the state. Perhaps abusing standard notation, define \omega^{(i)} to be the result of swapping the ith component of \omega. Then the influence of the ith coordinate is

I_i(f) := \mu(\{\omega\in\Omega: f(\omega)\ne f(\omega^{(i)})\}).

Examples and Results

As in so many models, we are interested in the relationship between the local setting and the global setting. Benjamini, Kalai and Schramm (99) showed that, in the case of the uniform product measure,

\sum_{i\in[n]} I_i(f_n)^2 \stackrel{n\rightarrow\infty}\rightarrow 0,

is a sufficient condition for noise-sensitivity of (f_n). Note the converse is trivially false, by considering the function f(\omega) that records the parity of the number of 1s in \omega. Here, the influence of every coordinate is 1, since changing a single component always changes the value of the function. But the sequence is noise-sensitive: resampling each component with probability \epsilon flips a Bin(n,\epsilon/2)-distributed number of coordinates, whose parity is roughly uniform for large n, so the value of the function after the perturbation is almost independent of its value before.
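To make the definitions concrete, here is a small simulation sketch (my own illustration, not from [1]) estimating the influences and the covariance \text{Cov}(f(\omega),f(\omega^\epsilon)) for parity and for majority; the values of n, \epsilon and the trial count are arbitrary.

```python
import random

def estimate(f, n, eps=0.2, trials=4000, seed=1):
    """Monte Carlo estimates of the influences I_i(f) and of Cov(f(w), f(w^eps))
    under the uniform product measure. Purely an illustrative sketch."""
    rng = random.Random(seed)
    flips = [0] * n
    s1 = s2 = s12 = 0.0
    for _ in range(trials):
        w = [rng.randint(0, 1) for _ in range(n)]
        fw = f(w)
        # influences: flip each coordinate in turn and see whether f changes
        for i in range(n):
            w[i] ^= 1
            if f(w) != fw:
                flips[i] += 1
            w[i] ^= 1
        # noise: resample each coordinate independently with probability eps
        we = [rng.randint(0, 1) if rng.random() < eps else wi for wi in w]
        fwe = f(we)
        s1 += fw; s2 += fwe; s12 += fw * fwe
    infl = [c / trials for c in flips]
    cov = s12 / trials - (s1 / trials) * (s2 / trials)
    return infl, cov

parity = lambda w: sum(w) % 2
majority = lambda w: int(sum(w) > len(w) / 2)

for name, f in (("parity", parity), ("majority", majority)):
    infl, cov = estimate(f, n=51)
    print(name, "sum I_i^2 =", round(sum(x * x for x in infl), 2),
          "Cov(f(w), f(w^eps)) =", round(cov, 3))
```

For parity the sum of squared influences is as large as it can be, while the noise covariance is essentially zero; for majority both quantities are of moderate size.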

However, if we restrict to the class of monotone increasing functions, where changing a 0 to a 1 can never cause f to decrease, the converse is true. Outside the setting of uniform product measure, lots of versions of this theorem are open.

This perhaps seems counter-intuitive. After all, we would expect high global sensitivity to follow from high local sensitivity. I don’t have a good counter-intuition for why this should be true. The original authors themselves describe this as ‘paradoxical’ in [1].

It is natural to ask which component has the largest influence, and in particular whether we can guarantee the existence of a component with influence at least some particular value. Obviously, if the Boolean function is constant, all the influences are zero, and other examples, such as the indicator function of a specific state, will also have very small influences. But such functions also have small variance (under the uniform product measure), so we need to make comparison to this. We can show that

\text{Var}(f)\le \frac14 \sum_{i\in[n]}I_i(f),

which is a (discrete) example of a Poincare inequality, where deviations of a function from its mean on a particular space in L^p are bounded in terms of the derivative of the function. Here, influence is a discrete analogue of derivative.

Proof: We observe first that if \mathbb{P}(f(w)=1)=p, then \mathbb{E}[f(w)]=p, and

\text{Var}(f)=p(1-p)^2+(1-p)p^2=p(1-p)=\frac12 \mathbb{P}(f(w)\ne f(\bar w)),

where w,\bar w are independent realisations of the underlying measure. We move from w\mapsto \bar w one component at a time, examining the influence at each step. Let v^{(i)}\in \{0,1\}^n be given by the first i components of \bar w and the final (n-i) components of w. Then v^{(0)}=w, v^{(n)}=\bar w. Finally, define

J:=\{i:w_i\ne \bar w_i\}.

We also say that a state is i-pivotal if changing the ith component changes the value of the function. Note that if f(w)\ne f(\bar w), then there exists some 0\le i<n such that f(v^{(i)})\ne f(v^{(i+1)}). In particular:

\{f(w)\ne f(\bar w)\}\subset \{1\in J, v^{(0)}\text{ 1-pivotal}\}\cup

\{2\in J, v^{(1)}\text{ 2-pivotal}\}\cup\ldots\cup\{n\in J, v^{(n-1)}\text{ n-pivotal}\}.

Note now that for any 1\le i\le n, v^{(i-1)} has uniform product measure on \Omega, since it is an interpolation of sorts between independent realisations of the uniform product measure. Furthermore, since we haven’t interpolated at component i yet, it is independent of the event \{i \in J\}. Thus

\mathbb{P}(i\in J, v^{(i-1)}\text{ i-pivotal})=\frac12 I_i(f).

Putting everything together with a union bound gives

\text{Var}(f)=\frac12 \mathbb{P}(f(w)\ne f(\bar w))\le \frac14 \sum_{i\in [n]}I_i(f),

as desired.
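As a sanity check on the statement (again my own sketch, not from the notes), one can verify the inequality exhaustively for any small example:

```python
from itertools import product

def check_poincare(f, n):
    """Exhaustively compute Var(f) and (1/4) * sum_i I_i(f) under the uniform
    measure on {0,1}^n. Illustrative check for small n only."""
    states = list(product((0, 1), repeat=n))
    vals = {w: f(w) for w in states}
    mean = sum(vals.values()) / len(states)
    var = sum((v - mean) ** 2 for v in vals.values()) / len(states)
    infl = []
    for i in range(n):
        flipped = sum(1 for w in states
                      if vals[w] != vals[w[:i] + (1 - w[i],) + w[i + 1:]])
        infl.append(flipped / len(states))
    return var, sum(infl) / 4

f = lambda w: int(sum(w) >= 2)  # an increasing example: 'at least two 1s'
print(check_poincare(f, 4))     # the first number should not exceed the second
```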

We might ask when this result is tight. It is certainly not tight if, during the interpolation procedure, the value of f changes many times. Since a consequence of this Poincare inequality is that there is some component i such that I_i(f)\ge \frac{4\text{Var}(f)}{n}, we might conjecture that there is a better lower bound.

It turns out, as shown in [2], that we can add an extra logarithmic factor. Precisely, for some absolute constant c>0, there exists an i such that:

I_i(f)\ge c \text{Var}(f)\frac{\log n}{n}.

Indeed, there’s an example which shows that this is optimal. Partition the n components into groups, which we call tribes, of size roughly \log_2 n - \log_2\log_2 n. The function is 1 if there exists at least one tribe in which all the components are 1. A particular component is then pivotal precisely when all the other components in its tribe are already 1 and no other tribe is complete; the second requirement only changes the probability by a constant factor. Thus

I_i(f)\approx \left(\frac{1}{2}\right)^{\log n - \log\log n}= \frac{\log n}{n}.
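A quick numerical check of this order of magnitude (my own illustration; as noted above, the exact pivotality probability also accounts for the requirement that no other tribe is complete):

```python
from math import log2

def tribe_influence(n):
    """Exact influence of one coordinate in the tribes function with tribe size
    k ~ log2(n) - log2(log2(n)): pivotal iff the rest of its tribe is all 1s
    and no other tribe is already complete. Illustrative choice of k."""
    k = max(1, round(log2(n) - log2(log2(n))))
    num_tribes = n // k
    return 2 ** (-(k - 1)) * (1 - 2 ** (-k)) ** (num_tribes - 1)

for n in (2 ** 8, 2 ** 12, 2 ** 16):
    print(n, tribe_influence(n), log2(n) / n)  # both are of order (log n)/n
```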

One consequence of this is that, by BKS, the function is noise-sensitive. Is this clear directly? Note that if \omega contains a tribe of 1s, then in forming \omega^\epsilon roughly \epsilon(\log_2 n-\log_2 \log_2 n) of that tribe’s components get resampled, so for large n it is very unlikely that the same tribe is still all 1s in \omega^\epsilon. So heuristically, the presence of a tribe in \omega^\epsilon feels almost independent of the presence of a tribe in \omega. But because of the (approximate) symmetry of the setup, it is much easier to work with influences, and use BKS.

I will continue the discussion of the Fourier techniques used to derive some of these results in a new post soon.

References

[1] – Benjamini, Kalai, Schramm (1999) – Noise Sensitivity of Boolean Functions and Applications to Percolation. (arXiv)

[2] – Kahn, Kalai, Linial (1988) – The influence of variables on Boolean Functions. (link)

Garban, Steif – Noise sensitivity of Boolean functions and Percolation. (arXiv) I have borrowed heavily from these excellent notes. The discrete Poincare inequality appears in Section 1.2, with an outline of the solution in Exercise 1.5.

Discontinuous Phase Transitions

Yesterday, Demeter Kiss from Cambridge gave a seminar in Oxford about a model for self-destructive percolation on \mathbb{Z}^2 that had implications for the (non-)existence of an infinite-parameter forest fire model on the same lattice. I enjoyed talking about this and about his recent work on the related model of frozen percolation on \mathbb{Z}^2. Considering these models in the lattice setting presents a whole range of interesting geometric challenges that are not present in the mean-field case that has mainly occupied my research direction so far.

The afternoon’s discussion included lots of open problems about percolation. Several of these are based around continuity of the phase transition, so I thought I would write a quick post about some simple examples of this, and one example where it does not hold.

A helpful base example is bond percolation on the lattice \mathbb{Z}^2. Here, we specify some probability p in [0,1], and we declare edges of the lattice open with probability p, independently of each other. We then consider the graph induced by the open edges. We say that percolation occurs if the origin is contained in an infinite open component. The terminology arises from the interpretation as fluid being added at the origin and flowing down open edges. We define \theta(p) to be the probability that the origin is in an infinite component when the parameter is p. By translation-invariance, we can get some sort of 0-1 law, to conclude that there is an infinite component somewhere in the system with probability either 0 or 1, depending on whether \theta(p) is positive or zero. Indeed, we can further show that if it is positive, then with probability 1 there is a unique infinite component.

We define the critical probability p_c:= \inf\{p:\theta(p)>0\}. A question worth asking is then: what is \theta(p_c)? In some examples, we can find p_c, but we cannot prove that \theta(p) is continuous around p_c. In the case of \mathbb{Z}^2 continuity is known, and it is known from work of Kesten that p_c=1/2. See below for a plot of \theta(p) in this setting (obtained from this blog, though possibly originating elsewhere).

The aim is to find an example where we do not have such a continuous phase transition. The original work on frozen percolation took place on trees, and one of Kiss’s results confirms that these show qualitatively different phenomena to the same process on the lattice. In some sense, trees lie halfway between a lattice and a mean-field model, since there is often some independence when we look down the tree from a given generation, if it is well-defined to use such language.

Anyway, first we consider percolation on an infinite regular rooted k-ary tree. This means we have a root, which has k children, each of which in turn has k children, and so on. As before we consider bond percolation with parameter p. In this setting, we have a language to describe the resulting open component of the root. The offspring distribution of any vertex in the open component is given by Bin(k,p) independently of everything else, so we can view this component as the realisation of a Galton-Watson tree with this offspring distribution. This distribution has finite mean kp, and so we can state explicitly when the survival probability is positive. This happens when the mean is greater than 1, ie p>1/k.

For our actual example, we will consider the survival probability, but the technicalities are easier to explain if we look at the extinction probability, now using the language of branching processes. Suppose the offspring distribution has pgf given by

f(x)=p_0+p_1x+p_2x^2+\ldots.

Then the extinction probability q satisfies f(q)=q. I want to pause to consider what happens if this equation has multiple solutions. Indeed, in most interesting cases it will have multiple solutions, since f(1) will always be 1 if it is a non-defective offspring distribution. It is typically stated that the extinction probability q is the smallest non-negative solution to this equation. I want to discuss why that is the case.

To approach this, we have to consider what extinction means. It is the increasing limit of the events {we are extinct after n generations}. Let the probabilities of these events be q_n, so q_0=0. Then by a straightforward coupling argument, we must have

0=q_0\le q_1\le q_2 \le\ldots\le q:= \lim q_n \le 1.

But, by the same generating function argument as before, q_{n+1}=f(q_n)\ge q_n. So if we split [0,1] into regions A where f(x)\ge x and B where f(x)<x, all the (q_n)s must occur in the former, and so since it is closed, their limit must be in A also. Note that if f(x) intersects x lots of times, then region A is not necessarily connected. In the diagram below, in moving from q_n to q_{n+1} we might jump across part of B.

This is bad, as we are trying to prove that q is the right-hand boundary of the connected component of A containing 0, i.e. the smallest fixed point. But such a jump cannot happen, as f is monotonic: if z is a root of f(x)=x lying between the hypothesised q_n<q_{n+1}, then q_{n+1}=f(q_n)\le f(z)=z < q_{n+1}, a contradiction.
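As an illustration of the previous two paragraphs (my own sketch, not from the references), the iteration q_{n+1}=f(q_n) started from q_0=0 does indeed converge to the smallest fixed point:

```python
def extinction_probability(pgf, tol=1e-12, max_iter=100_000):
    """Iterate q_{n+1} = f(q_n) from q_0 = 0; the increasing limit is the
    smallest fixed point of the pgf, i.e. the extinction probability."""
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

# Bin(3, p) offspring: percolation on the rooted ternary tree, here with p = 0.5.
p = 0.5
print(extinction_probability(lambda s: (1 - p + p * s) ** 3))  # ~0.236 < 1, since 3p > 1
```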

Ok, so now we are ready to consider our counterexample to continuity at the percolation threshold. See the references for a link to the original source of this example. We have to choose a slightly more complicated event than mere survival or extinction. We consider bond percolation as before on the infinite ternary tree, where every vertex has precisely 3 offspring. Our percolation event is now that the root is the root of an infinite binary tree: that is, the root has at least two children joined to it by open edges, each of which has at least two such children of its own, and so on.

If we set this probability equal to q, and the probability of an edge being open equal to p, then we have the recurrence:

q=3p^2(1-p)q^2+p^3[3q^2(1-q)+q^3].

The first term corresponds to the root having two open edges to offspring, and the second to the root having all three open edges to offspring. After manipulating, we end up with

q\left[2p^3q^2-3p^2q+1\right]=0.

We are therefore interested in roots of the quadratic lying between 0 and 1. The discriminant can be evaluated as

\Delta=p^3(9p-8),

and so there are no real roots where p<8/9. But when p=8/9, we have a repeated root at q=27/32, which is obviously not zero!
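To see the jump numerically, here is a small sketch (my own illustration) which iterates the recursion downwards from q=1; since the map is increasing, the limit is the largest fixed point in [0,1], and the number of iterations is an arbitrary choice.

```python
def binary_subtree_prob(p, iters=200_000):
    """Probability that the root of the p-percolated ternary tree is the root of
    an infinite binary tree, obtained by iterating q -> 3p^2 q^2 - 2p^3 q^3
    downwards from q = 1; the limit is the largest fixed point in [0,1]."""
    q = 1.0
    for _ in range(iters):
        q = 3 * p * p * q * q - 2 * p ** 3 * q ** 3
    return q

for p in (0.85, 0.88, 8 / 9, 0.9):
    print(round(p, 4), round(binary_subtree_prob(p), 4))
# The value jumps from 0 to roughly 27/32 = 0.84375 at p = 8/9.
```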

This equation is qualitatively different to the previous one for the extinction probability of a Galton-Watson tree. There (say for the binary tree), we had a quadratic with one root at 1. As we varied p, the other root moved continuously from above one to below one, so it passed through 1, giving continuity at the critical probability. Here, we have a cubic with one root fixed at 0. But now the other roots are complex for small p, meaning that the local minimum of the cubic lies strictly above the x-axis. As p reaches the critical value, this local minimum touches and then passes below the x-axis, and suddenly we have a repeated root, not at zero.

I would like to have a neat probabilistic heuristic for this result, without having to make reference to generating functions. At the moment, the best I can come up with is to say that the original problem is simple, in the sense that the critical probability is as small as it could be while still making sense in expectation. To be concrete, when the mean of the offspring distribution is less than 1, the expected size of the nth generation tends to zero, so there certainly could not be positive probability of having an infinite component.

Whereas in the binary tree example, we only require p=2/3 to have, in expectation, the right number of open edges to theoretically allow an infinite binary tree. If we think of percolation as a dynamic process by coupling in p, essentially as we move from p=2/3 to p=8/9 we need to add enough edges near the origin to be able to take advantage of the high density of edges available far from the origin. The probability of this working given you start from n vertices grows much faster (as n grows) than in the original problem, so you might expect a faster transition.

This is so content-free I’m reluctant even to call it a heuristic. I would be very interested to hear of any more convincing argument for this phenomenon!

REFERENCES

Dekking, Pakes – On family trees and subtrees of simple branching processes (link)


Characterisations of Geometric Random Graphs

Continuing the LMS-EPSRC summer school on Random Graphs, Geometry and Asymptotic Structure, we’ve now had three of the five lectures by Mathew Penrose on Geometric Random Graphs.

The basic idea is that instead of viewing a graph entirely abstractly, we now place the vertices in the plane, or some other real space. In many network situations, we would expect connectivity to depend somehow on distance. Agents or sites which are close together might be considered more likely to have the sort of relationship indicated by being connected with an edge. In the model discussed in this course, this dependence is deterministic. We have some parameter r, and once we have chosen the location of all the vertices, we connect a pair of vertices if the distance between them is less than r.

For the purposes of this, we work in a compact space [0,1]^d, and we are interested in the limit as the number of vertices n grows to infinity. To avoid the graph getting too connected, as in the standard random graph model, we take r to be a decreasing function of n. Anyway, we place the n points into the unit hypercube uniformly at random, and then the edges are specified by the adjacency rule above. In general, because r_n will be o(1), we won’t have to worry too much about boundary effects. The proportion of vertices within r_n of the boundary of the cube will be o(1). For some results this is a genuine problem, in which case it may be easier to work on the torus.

In G(n,p), the order of np in the limit determines the qualitative structure of the graph. This is the expected degree of a given fixed vertex. In this geometric model, the relevant parameter is nr_n^d, where d is the dimension of the hypercube. If this parameter tends to 0, we say the graph is sparse, and dense if it tends to infinity. The intermediate case is called a thermodynamic limit. Note that the definition of sparse here is slightly different from G(n,p).

Much of the content of the first three lectures has been verifying that the distributions of various quantities in the graph, for example the total number of edges, are asymptotically Poisson. Although sometimes arguments are applicable over a broad spectrum, we also sometimes have to use different calculations for different scaling windows. For example, it is possible to show convergence to a Poisson distribution for the number of edges in the sparse case, from which we get an asymptotic normal approximation almost for free. In the denser regimes, the argument is somewhat more technical, with some substantial moment calculations.

A useful tool in these calculations is a set of bounds derived via Stein’s method for sums of ‘almost independent’ random variables. For example, the events that each of two pairs of vertices is joined by an edge are independent in this setting if the pairs are disjoint, and the dependence is still only mild if they share a vertex. An effective description is via a so-called dependency graph, where we view the random variables as the vertices of a graph, with an edge between them if there is some dependence. This description doesn’t have any power in itself, but it does provide a concise notation for what would otherwise be very complicated, and we are able to show versions of the Poisson limit theorem (binomial converges to Poisson) and of the CLT via these that are exactly as required for this purpose.

In particular, we are able to show that if E_n is the total number of edges and \lambda_n is the expected total number of edges, then under a broad set of scaling regimes d_{TV}(E_n,\mathrm{Po}(\lambda_n))\rightarrow 0 as n grows. This convergence in total variation distance is as strong a result as one could hope for, and when \lambda_n\rightarrow\infty, we can derive a normal approximation as well.
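As a crude illustration of this Poisson behaviour (my own sketch, not from the course, with arbitrary choices of n, r and the number of repetitions), one can simulate the edge count in a sparse-ish regime and compare its mean and variance:

```python
import random

def rgg_edge_count(n, r, seed=None):
    """Edge count of a random geometric graph: n uniform points in [0,1]^2,
    joined when within Euclidean distance r. Brute-force O(n^2) sketch."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    r2 = r * r
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pts[i][0] - pts[j][0]) ** 2 + (pts[i][1] - pts[j][1]) ** 2 < r2)

# n * r^2 is small, so the edge count should be close to a Poisson variable.
n, r, trials = 150, 0.02, 100
counts = [rgg_edge_count(n, r, seed=s) for s in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(round(mean, 2), round(var, 2))  # mean and variance roughly agree, as for Poisson
```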

At this point it is worth discussing an alternative specification of the model. Recall that for a standard homogeneous random graph, we have the choice of G(n,m) and G(n,p) as definitions. G(n,m) is the finer measure, and G(n,p) can be viewed as a weighted mixture of the G(n,m)s. We can’t replicate this directly in the geometric setting because the edges and non-edges are a deterministic function of the vertex locations. What we can randomise is the number of vertices. Since we are placing the vertices uniformly at random, it makes sense to consider as an alternative a Poisson Point Process with intensity n. The number of vertices we get overall will be distributed as Po(n), which is concentrated near n, in the same manner as G(n,c/n).

As in G(n,p), this is a less basic model because it is a mixture of the fixed-vertex models. Let’s see how we would go about extending the total variation convergence result to this slightly different setting without requiring a more general version of the Poisson Approximation Lemma. To avoid having to define everything again, we add a ‘ to indicate that we are talking about the Poisson Point Process case. Writing d(.,.) for total variation distance, the result we have is:

\lim_{n\rightarrow\infty} d(E_n,\mathrm{Po}(\lambda_n))=0.

We want to show that

\lim_{n\rightarrow\infty}d(E_n',\mathrm{Po}(\lambda_n'))=0,

which we can decompose in terms of expectations in the original model by conditioning on N_n

\leq \lim_{n\rightarrow\infty}\mathbb{E}\Big[\mathbb{E}[d(E_{N_n},\mathrm{Po}(\lambda_n')) | N_n]\Big],

where the outer expectation is over N_n. The observation here is that the number of points given by the Poisson process induces a measure on distributions, the overwhelming majority of which look quite like Poisson distributions with parameter n. The reason we have an inequality is that we are applying the triangle inequality in the sum giving total variation distance:

d(X,Y)=\sum_{k\geq 0}|\mathbb{P}(X=k)-\mathbb{P}(Y=k)|.

From this, we use the triangle inequality again:

\leq \lim_{n\rightarrow\infty} \mathbb{E}\Big[\mathbb{E}[d(E_{N_n},\mathrm{Po}(\lambda_{N_n})) | N_n]\Big]

+\lim_{n\rightarrow\infty}\mathbb{E}\Big[\mathbb{E}[d(\mathrm{Po}(\lambda_{N_n}),\mathrm{Po}(\lambda_n')) | N_n]\Big].

Then, by a large deviations argument, we have that for any \epsilon>0, \mathbb{P}(|N_n-n|\geq \epsilon n)\rightarrow 0 exponentially in n. Also, total variation distance is, by definition, uniformly bounded. In the first term, the inner conditioning on N_n is irrelevant, and we have that E_{N_n} converges to the Poisson distribution for any fixed N_n\in (n(1-\epsilon),n(1+\epsilon)). Furthermore, we showed in the proof of the non-PPP result that this convergence is uniform in this interval. (This is not surprising – the upper bound is some well-behaved polynomial in 1/n.) So with probability 1- e^{-\Theta(n)}, N_n is in the region where this convergence happens, and elsewhere the expected TV distance is bounded, so the overall expectation tends to 0. With a similar LD argument, for the second term it suffices to prove that when \lambda\rightarrow\mu, we must have d(\mathrm{Po}(\lambda),\mathrm{Po}(\mu))\rightarrow 0. This is ‘obviously’ true. Formally, it is probably easiest to couple the distributions \mathrm{Bin}(n,\lambda/n),\mathrm{Bin}(n,\mu/n) in the obvious way, and carry the convergence of TV distance as the parameter varies through the convergence in n.

That all sounded a little bit painful, but is really just the obvious thing to do with each term – it’s only the language that’s long-winded!

Anyway, I’m looking forward to seeing how the course develops. In particular, when you split the space into small blocks, the connectivity properties resemble those of (site) percolation, so I wonder whether there will be concrete parallels. Also, after reading about some recent results concerning the metric structure of the critical components in the standard random graph process, it will be interesting to see how these compare to the limit of a random graph process which comes equipped with metric structure for free!

Random Interlacements

In this post, I want to talk about another recently-introduced model that’s generating a lot of interest in probability theory, Sznitman’s model of random interlacements. We also want to see, at least heuristically, how this relates to more familiar models.

We fix our attention on a lattice, which we assume to be \mathbb{Z}^d. We are interested in the union of an infinite collection of simple random walks on the lattice. The most sensible thing to consider is not a collection of random walks from a random set of starting points, but rather a family of trajectories, that is, doubly-infinite random walk paths defined on times (-\infty,\infty). We will want this family to have some obvious properties, such as translation invariance, in order to make analysis possible and ideally obtain some 0-1 laws. The natural thing to do is then to choose the trajectories through a Poisson Point Process. The tricky part will be finding an intensity measure that has all the properties we want, and gives trajectories that genuinely do look like SRWs, and, most importantly, have a union that is neither too sparse nor too dense. For example, it wouldn’t be very interesting if with high probability every point appeared in the union…

For reasons we will mention shortly, we are interested in the complement of the union of the trajectories. We call this the vacant set. We will find an intensity which we can freely scale by some parameter u\in\mathbb{R}^+, which will give us a threshold for the complement to contain an infinite component. This is in the same sense as the phase transition for Bernoulli percolation. That is, there is a critical value u^* say, such that for u<u^* the vacant set contains an infinite component (or percolates) almost surely, and almost surely it does not when u>u^*. A later result of Teixeira shows that, as in percolation, this infinite component is unique.

Let us first recall why it is not interesting to consider this process for d=1 or 2. On \mathbb{Z}, a single SRW almost surely hits every integer point, since it visits arbitrarily large and arbitrarily small integers. For d=2, the SRW is recurrent, and so consists of a countably infinite sequence of excursions from (0,0). Note that the probability that an excursion from 0 hits some point (x,y) is non-zero, as it is at least 2^{-2(|x|+|y|)} for example. Therefore, almost surely the SRW hits (x,y), and so almost surely it hits every point.

Therefore it is only for d\ge 3 that we start seeing interesting effects. It is worth mentioning at this point some of the problems that motivated considering this model. First is the disconnection time of a discrete cylinder by a simple random walk. For example, Sznitman considers the random walk on \mathbb{Z}\times (\mathbb{Z}/N\mathbb{Z})^d. Obviously, it is more interesting to consider how long it takes a (1-dimensional in the natural sense) path to disconnect a (d\ge 3)-dimensional set than a 2-dimensional one, as the latter is given just by the first time the path self-intersects.

More generally, we might be interested in random walks up to some time an order of magnitude smaller than the cover time. Recall the cover time is the time to hit each point of the set. For example, for the random walk on the d-dimensional torus (\mathbb{Z}/N\mathbb{Z})^d the cover time (as discussed in Markov Chains and Mixing Times posts) is N^d \log N, but the log N represents in some sense only the ‘final few’ vertices. So we should ask what the set of unhit vertices looks like at time N^d. And it turns out that for large N, the structure of this vacant set is related to the vacant set in the random interlacement model, in a local sense.

Anyway, the main question to ask is: what should the intensity measure be?

We patch it together locally. Start with the observation that transience of the random walk means that almost surely a trajectory spends only finitely many steps in a fixed finite set K. So we index all the trajectories which hit K by the first time they hit K. Given that a trajectory hits K, it is clear what the conditional distribution of this hitting point should be. Recall that SRW on \mathbb{Z}^d is reversible, so we consider the SRW backwards from this hitting time. Then the probability that the hitting point is x (on the boundary of K) is proportional to the probability that a SRW started from x goes to infinity without hitting K again. So once we’ve settled on the distribution of the hitting point x, it is clear how to construct all the trajectories through K. We pick x on the boundary of K according to this distribution, and take the union of an SRW starting from x conditioned not to hit K again, and an SRW starting from x with no conditioning. These correspond to the trajectory before and after the hitting time, respectively.

In fact, it turns out that this is enough. Suppose we demand that the probability that the hitting point is x is equal to the probability that a SRW started from x goes to infinity without hitting K again (rather than merely proportional to it). Sznitman proves that there is a unique measure on the set of trajectories that restricts to this measure for every choice of K. Furthermore, the Poisson Point Process with the globally-defined intensity unsurprisingly restricts to a PPP with the intensity specific to K.

We have not so far said anything about trajectories which miss this set K. Note that under any sensible intensity with the translation-invariance property, the intensity measure of the trajectories which hit K must be positive, since we can cover \mathbb{Z}^d with countably many copies of K. So the number of trajectories hitting K is a Poisson random variable.

Recall how we defined the probability that the hitting point of K was some point x on the boundary. The sum of these probabilities is called the capacity of K. It follows that this, scaled by the free parameter u, is the parameter of the Poisson random variable. That is, the probability that no trajectory passes through K is:

\exp(-u\mathrm{cap}(K)),

recalling that u is the free parameter in the intensity. This is the most convenient framework through which to start analysing the probability that there is an infinite connected set which is hit by no trajectory.
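Here is a small Monte Carlo sketch (my own, not from the paper) of this formula for a small box K in \mathbb{Z}^3: the capacity is estimated as a sum of escape probabilities, with the walk truncated after finitely many steps, and the level u is an arbitrary illustrative choice.

```python
import math
import random

def capacity_estimate(m=1, steps=1_000, trials=100, seed=2):
    """Monte Carlo estimate of cap(K) for K = [-m,m]^3 in Z^3: the sum over
    x in K of the probability that SRW started at x never revisits K.
    Truncating at `steps` steps slightly inflates each escape probability;
    this is only an illustrative sketch."""
    rng = random.Random(seed)
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    coords = range(-m, m + 1)
    K = {(x, y, z) for x in coords for y in coords for z in coords}

    def escapes(start):
        x, y, z = start
        for _ in range(steps):
            dx, dy, dz = rng.choice(moves)
            x, y, z = x + dx, y + dy, z + dz
            if (x, y, z) in K:
                return False   # revisited K: not an escape
        return True            # still away after `steps` steps: treat as escaped

    return sum(sum(escapes(v) for _ in range(trials)) / trials for v in K)

cap, u = capacity_estimate(), 0.5
print(cap, "P[no trajectory hits K] ~", round(math.exp(-u * cap), 4))
```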

We conclude by summarising Sznitman’s Remark 1.2, explaining why it is preferable to work with the space of trajectories rather than the space of paths. Note that if we are working with paths, and we want translation invariance, then this restricts to translation invariance of the distribution of starting points as well, so it is in fact a stronger condition. Note then that either the intensity of starting at 0 is zero, in which case there are no trajectories at all, or it is positive, in which case the set of starting points looks like Bernoulli site percolation.

However, the results about capacity would still hold if there were a measure that restricted satisfactorily. And so the capacity of K would still be the measure of paths hitting K, which would be at least the measure of paths started in K. But by translation invariance, this grows linearly with |K|, whereas capacity grows at most as fast as the size of the set of boundary points of K, which will be an order of magnitude smaller when K is, for example, a large ball.

REFERENCES

This was mainly based on

Sznitman – Vacant Set of Random Interlacements and Percolation (0704.2560)

Also

Sznitman – Random Walks on Discrete Cylinders and Random Interlacements (0805.4516)

Teixeira – On the Uniqueness of the Infinite Cluster of the Vacant Set of Random Interlacements (0805.4106)

and some useful slides by the same author (teixeira.pdf)

Independence and Association

Back when we did GCSE probability, we gave a definition of independent events as:

A and B are said to be independent if \mathbb{P}(A)\mathbb{P}(B)=\mathbb{P}(A\cap B).

We might also apply Bayes’ definition of conditional probability to say

\mathbb{P}(A|B)=\mathbb{P}(A)\quad\iff\quad A,B\text{ independent}\quad\iff\quad\mathbb{P}(B|A)=\mathbb{P}(B)

provided all the terms exist. (Eg the definition of \mathbb{P}(B|A) is at the very least non-obvious if the probability of A is 0.) In my opinion, this is a more naturally intuitive definition. For example, I think that when you toss two coins, the fact that the probability of the second coin being a tail is unaffected by whether the first is heads is more naturally ‘obvious’ than the fact that the joint probability of the two events is 1/4.

But, before getting too into anything philosophical, it is worth thinking about an equivalent situation for non-independent events. We remark that by an identical argument to above:

\mathbb{P}(A|B)\geq\mathbb{P}(A)\quad\iff\quad \mathbb{P}(A\cap B)\geq\mathbb{P}(A)\mathbb{P}(B)\quad\iff\quad\mathbb{P}(B|A)\geq\mathbb{P}(B)

Informally, this says that if we know A occurs, it increases the likelihood of B occurring. If we were talking about two random variables, we might say that they were positively correlated. But of course, by considering the RVs 1_A,1_B, the result above is precisely the statement that the indicator functions have positive correlation.

Aim: To find a sufficient condition for positive correlation of random variables in a product measure.

Consider the following. Suppose A is an event which is positively correlated with the appearance of each edge. We might suspect that two such events A and B would be positively correlated. Instead, we consider a more concrete description. Recall that an event A is a subset of \Omega=\{0,1\}^E. Given w\in\Omega, e\in E, we define w^e\in\Omega by taking w and setting edge e to be open (note it may be open already). Now, we say event A is increasing if

\forall w\in\Omega,\forall e\in E: w\in A\Rightarrow w^e\in A.

Note that this certainly implies the property previously mentioned, but the converse is not necessarily true.

Anyway, our revised aim will be to show that increasing events A and B are positively correlated for product measure.

For now, we approach the problem from the other direction, namely we attempt to find which measures on \{0,1\}^E have the property that A and B are positively correlated for all increasing A, B. Note that as before, we can think of this as \mathbb{E}1_A1_B\geq\mathbb{E}1_A\mathbb{E}1_B, and again here it is useful to rephrase our framework in terms of random variables. There is a natural (product) partial ordering of \Omega=\{0,1\}^E, and from this there is an easy notion of increasing random variables. Recall a random variable is defined as a measurable map \Omega\rightarrow\mathbb{R} so no further work is required.

X is increasing if w\geq w'\Rightarrow X(w)\geq X(w').

So we clarify our aim, which is to find a condition on the measure \mu such that \mu(XY)\geq \mu(X)\mu(Y) for all increasing X, Y. When this occurs, we say \mu is positively associated. Note that this is equivalent to \mu(A\cap B)\geq \mu(A)\mu(B) for all increasing events A, B. Why? We can build up X and Y from increasing indicator functions like \{X\geq x\} in a usual monotone class argument.

On the way, we need a partial ordering on the set of probability measures. Obviously, if \mu(A)\leq \nu(A) for all events A, then in fact \mu=\nu! So instead we say \mu\leq_{st}\nu if \mu(A)\leq \nu(A) for all increasing A. This is called the stochastic ordering, and there is a technical result of Strassen, proving the intuitively obvious claim that if \mu_1\leq \mu_2, then we can couple the measures in a natural way. Formally:

Theorem: \mu_1\leq\mu_2 \iff \exists a probability measure \nu on \Omega^2 such that the marginals are \mu_1,\mu_2 and

\nu(\{(w_1,w_2):w_1\leq w_2\})=1.

Our main result will be the FKG inequality which asserts that when \mu satisfies the following FKG lattice property

\mu(w_1\vee w_2)\mu(w_1\wedge w_2)\geq \mu(w_1)\mu(w_2),\quad\forall w_1,w_2\in\Omega

then \mu is positively associated. We will prove the case |E|<\infty.

We proceed by showing that \mu_1\leq\mu_2, where \mu_2 is proportional to Y\mu_1 (rescaled to be a probability measure), for Y a positive increasing RV. Applying the definition of stochastic ordering to an increasing X then gives \mu_1(XY)/\mu_1(Y)=\mu_2(X)\ge\mu_1(X), which is exactly positive association. [Note that we are now suppressing the ‘st’ subscript, as context makes the use clear.]

To show this, we prove the more general Holley’s Theorem:

This states that if two positive probability measures satisfy a related lattice condition:

\mu_2(w_1\vee w_2)\mu_1(w_1\wedge w_2)\geq \mu_1(w_1)\mu_2(w_2)\quad\forall w_1,w_2\in\Omega

then we have the stochastic domination result: \mu_1\leq \mu_2.

Note that the lattice condition states, very informally, that adding edges results in a greater relative increase with respect to the measure \mu_2, which has a natural similarity to the definition of stochastic domination.

We prove this, perhaps unexpectedly, by resorting to a Markov chain. We note that there is a Markov chain on \Omega with equilibrium distribution given by \mu_1. This is simple: the non-zero transition rates are those given by the addition or removal of a single edge. Assume that edges are added at unit rate, and that edges are removed with rate G(w^e,w_e)=\frac{\mu_1(w_e)}{\mu_1(w^e)}, where w_e denotes w with edge e closed.

Similarly, we can construct a Markov chain on state space \Omega^2, where the non-zero transitions are given by the addition of an edge to both states in the pair, the removal of an edge from both states in the pair, and the removal of an edge from only the first state in the pair. Note that, as before, we may be ‘adding’ an edge which is already present. Assuming we start in this set, this choice means that we are restricting the sample space to \{(\pi,w):\pi\leq w\}. We need the transition rate of the third type of transition to have the form \frac{\mu_1(\pi_e)}{\mu_1(\pi^e)}-\frac{\mu_2(w_e)}{\mu_2(w^e)}, so that the marginal rates are correct. The lattice condition precisely confirms that this is non-negative, and thus we have a well-constructed Markov chain. The marginals have equilibrium distributions \mu_1,\mu_2 by construction, and by the general theory of Markov chains, there is an equilibrium distribution on \{(\pi,w):\pi\leq w\}, and this leaves us in precisely the right position to apply Strassen to conclude the result.

Summary of consequences: We have demonstrated that product measure is positively associated, as it certainly satisfies the FKG condition. Recall that this is what we had suspected intuitively for reasons given at the start of this account. Next time, I will talk about the most natural companion result, the BK inequality, and the stronger Reimer’s Inequality.
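As a crude sanity check on this summary (my own sketch, not part of the course), one can enumerate every increasing \{0,1\}-valued function on a small product space and verify positive association directly under a product measure; the choices n=3 and p=0.6 are arbitrary.

```python
from itertools import product

n, p = 3, 0.6
states = list(product((0, 1), repeat=n))
# product (Bernoulli) measure on {0,1}^n
weight = {w: p ** sum(w) * (1 - p) ** (n - sum(w)) for w in states}

def leq(w, v):
    return all(a <= b for a, b in zip(w, v))

def is_increasing(f):
    return all(f[w] <= f[v] for w in states for v in states if leq(w, v))

def mean(f):
    return sum(weight[w] * f[w] for w in states)

# enumerate all increasing {0,1}-valued functions on {0,1}^n
increasing = [dict(zip(states, bits))
              for bits in product((0, 1), repeat=len(states))
              if is_increasing(dict(zip(states, bits)))]

# check E[fg] >= E[f]E[g] for every pair: positive association
ok = all(mean({w: f[w] * g[w] for w in states}) >= mean(f) * mean(g) - 1e-12
         for f in increasing for g in increasing)
print(len(increasing), "increasing functions; positive association holds:", ok)
```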

References: Both the motivation and the material are derived from Prof. Grimmett’s Part III course, Percolation and Related Topics, which was one of the mathematical highlights of the year. This account of the subject is a paraphrase of his lecture notes, which were themselves based on his book Probability on Graphs. Mistakes, naturally, are mine. Background on the course, and an online source of the book, can be found on the course website here.