DGFF 3 – Gibbs-Markov property for entropic repulsion

In the previous post, we saw that it isn’t much extra effort to define the DGFF with non-zero boundary conditions, by adding onto the zero-BC DGFF the unique (deterministic) harmonic function which extends the boundary values into the domain. We also saw how a Gibbs-Markov property applies, whereby the values taken by the field on some sub-region A\subset D depend on the values taken on D\backslash A only through values taken on \partial A.

In this post, we look at how this property and some other methods are applied by Deuschel [1] to study the probability that the DGFF on a large box in \mathbb{Z}^d is positive ‘everywhere’. This event can be interpreted in a couple of ways, all of which are referred to there as entropic repulsion. Everything which follows is either taken directly or paraphrased directly from [1]. I have tried to phrase this in a way which avoids repeating most of the calculations, instead focusing on the methods and the motivation for using them.

Fix dimension d\ge 2 throughout. We let P^0_N be the law of the DGFF on V_N:=[-N,N]^d\subset \mathbb{Z}^d with zero boundary conditions. Then for any subset A\subset \mathbb{Z}^d, in an intuitively-clear abuse of notation, we let

\Omega^+(A):= \{ h_x\ge 0, x\in A\},

be the event that some random field h takes only non-negative values on A. The goal is to determine P^0_N ( \Omega^+(V_N)). But for the purposes of this post, we will focus on showing bounds on the probability that the field is non-negative on a thin annulus near the boundary of V_N, since this is a self-contained step in the argument which contains a blog-friendly number of ideas.

We set (L_N) to be a sequence of integers greater than one (to avoid dividing by zero in the statement), for which \frac{L_N}{N}\rightarrow 0. We now define for each N, the annulus

W_N = \{v\in V_N: L_N\le d_{\mathbb{Z}^d}(v, V_N^c)\le 2L_N \}

with radius L_N set a distance L_N inside the box V_N. We aim to control P^N_0 (\Omega^+(W_N)). This forms middle steps of Deuschel’s Propositions 2.5 and 2.9, which discuss P^N_0(\Omega^+(V_{N-L_N})). Clearly there is the upper bound

P^N_0(\Omega^+(V_{N-L_N})) \le P^N_0(\Omega^+(W_N)) (1)

and a lower bound on P^N_0(\Omega^+(V_{N-L_N})) is obtained in the second proposition by considering the box as a union of annuli then combining the bounds on each annulus using the FKG inequality.

Upper bound via odds and evens

After removing step (1), this is Proposition 2.5:

\limsup_{N\rightarrow \infty} \frac{L_N}{N^{d-1} \log L_N} \log P^N_0(\Omega^+(W_N)) < 0. (2)

This is giving a limiting upper bound on the probability of the form L_N^{-CN^{d-1}/L_N}, though as with all LDP estimates, the form given at (2) is more instructive.

Morally, the reason why it is unlikely that the field should be non-negative everywhere within the annulus is that the distribution at each location is centred, and even though any pair of values are positively correlated, this correlation is not strong enough to avoid this event being unlikely. But this is hard to corral into an upper bound argument directly. In many circumstances, we want to prove upper bounds for complicated multivariate systems by projecting to get an unlikely event for a one-dimensional random variable, or a family of independent variables, even if we have to throw away some probability. We have plenty of tools for tail probabilities in both of these settings. Since the DGFF is normal, a one-dimensional RV that is a linear combination (eg the sum) of all the field heights is a natural candidate. But in this case we would have thrown away too much probability, since the only way we could dominate is to demand that the sum \sum_{x\in W_N}h^N_x\ge 0, which obviously has probability 1/2 by symmetry. (3)

So Deuschel splits W_N into W_N^o,W_N^e, where the former includes all vertices with odd total parity in W_N and the latter includes all the vertices with even total parity in the interior of W_N. (Recall that \mathbb{Z}^d is bipartite in exactly this fashion). The idea is to condition on h^N\big|_{W^o_N}. But obviously each even vertex is exactly surrounded by odd vertices. So by the Gibbs-Markov property, conditional on the odd vertices, the values of the field at the even vertices are independent. Indeed, if for each v\in W_N^e we define \bar h_v to be the average of its neighbours (which is measurable w.r.t to the sigma-algebra generated by the odd vertices), then

\{h_v: v\in W_N^e \,\big|\, \sigma(h_w: w\in W_N^o)\},

is a collection of independent normals with variance one, and where the mean of h_v is \bar h_v.

To start finding bounds, we fix some threshold m=m_N\gg 1 to be determined later, and consider the odd-measurable event A_N that at most half of the even vertices v have \bar h_v\ge m. So A_N^c\cap \Omega^+(W_N) says that all the odd vertices are non-negative and many are quite large. This certainly feels like a low-probability event, and unlike at (3), we might be able to obtain good tail bounds by projection into one dimension.

In the other case, conditional on A_N, there are a large number of even vertices with conditional mean at most m, and so we can control the probability that at least one is negative as a product

(1-\varphi(m))^{\frac12 |W_N^e|}. (4)

Note that for this upper bound, we can completely ignore the other even vertices (those with conditional mean greater than m).

So we’ll go back to A_N^c \cap \Omega^+(W_N). For computations, the easiest one-dimensional variable to work with is probably the mean of the \bar h_vs across v\in W_N^e, since on A_N^c\cap \Omega^+(W_N) this is at least \frac{m}{2}. Rather than focus on the calculations themselves involving

\bar S^e_N:= \frac{1}{|W_N^e|} \sum\limits_{v\in W_N^e} \bar h_v,

let us remark that it is certainly normal and centered, and so there are many methods to bound its tail, for example

P^0_N \left( \bar S^e_N \ge \frac{m}{2} \right) \le \exp\left( \frac{-m^2}{8\mathrm{Var}(\bar S^e_N)} \right), (5)

as used by Deuschel just follows from an easy comparison argument within the integral of the pdf. We can tackle the variance using the Green’s function for the random walk (recall the first post in this set). But before that, it’s worth making an observation which is general and useful, namely that \bar S^e_N is the expectation of

S^e_N:= \sum{1}{|W_N^e|}\sum\limits_{v\in W_N^e} h_v

conditional on the odds. Directly from the law of total variance, the variance of any random variable X is always larger than the variance of \mathbb{E}[X|Y].

So in this case, we can replace \mathrm{Var}(\bar S^e_N) in (5) with \mathrm{Var}(S^e_N), which can be controlled via the Green’s function calculation.

Finally, we choose m_N so that the probability at (4) matches the probability at (5) in scale, and this choice leads directly to (2).

In summary, we decomposed the event that everything is non-negative into two parts: either there are lots of unlikely local events in the field between an even vertex and its odd neighbours, or the field has to be atypically large at the odd sites. Tuning the parameter m_N allows us to control both of these probabilities in the sense required.

Lower bound via a sparse sub-lattice

To get a lower bound on the probability that the field is non-negative on the annulus, we need to exploit the positive correlations in the field. We use a similar idea to the upper bound. If we know the field is positive and fairly large in many places, then it is increasingly likely that it is positive everywhere. The question is how many places to choose?

We are going to consider a sub-lattice that lives in a slightly larger region than W_N itself, and condition the field to be larger than m=m_N everywhere on this lattice. We want the lattice to be sparse enough that even if we ignore positive correlations, the chance of this happening is not too small. But we also want the lattice to be dense enough that, conditional on this event, the chance that the field is actually non-negative everywhere in W_N is not too small either.

To achieve this, Deuschel chooses a sub-lattice of width \lfloor\epsilon L_N^{2/d}\rfloor, and sets \Lambda_N(\epsilon) to be the intersection of this with the annulus with radii [N-\frac{5}{2}L_N, N-\frac{1}{2}L_N], to ensure it lives in a slightly larger region than W_N itself. The scaling of this sub-lattice density is such that when a random walk is started at any v\in W_N, the probability that the RW hits \Lambda_N(\epsilon) before \partial V_N is asymptotically in (0,1). (Ie, not asymptotically zero or one – this requires some definitely non-trivial calculations.) In particular, for appropriate (ie large enough) choice of \epsilon, this probability is at least 1/2 for all v\in W_N. This means that after conditioning on event B_N:=\{h_v\ge m : v\in \Lambda_N(\epsilon)\}, the conditional expectation of h_w is at least \frac{m}{2} for all w\in W_N\backslash \Lambda_N(\epsilon). Again this uses the Gibbs-Markov property and the Gaussian nature of the field. In particular, this conditioning means we are left with the DGFF on V_N\backslash \Lambda_N(\epsilon), ie with boundary \partial V_N\cup \Lambda_N(\epsilon), and then by linearity, the mean at non-boundary points is given by the harmonic extension, which is linear (and so increasing) in the boundary values.

At this point, the route through the calculations is fairly clear. Since we are aiming for a lower bound on the probability of the event \Omega^+(W_N), it’s enough to find a lower bound on P^0_N(\Omega^+(W_N)\cap B).

Now, by positive correlation (or, formally, the FKG inequality) we can control P^0_N(B) just as a product of the probabilities that the field exceeds the threshold at each individual site in \Lambda_N(\epsilon). Since the value of the field at each site is normal with variance at least 1 (by definition), this is straightforward.

Finally, we treat P^0_N(\Omega^+(W_N) \,\big|\, B). We’ve established that, conditional on B, the mean at each point of W_N\backslash \Lambda_N(\epsilon) is at least \frac{m}{2}, and we can bound the variance above too. Again, this is a conditional variance, and so is at most the corresponding original variance, which is bounded above by \sigma_N^2:=\mathrm{Var}(h^N_0). (This fact that the variance is maximised at the centre is intuitively clear when phrased in terms of occupation times, but the proof is non-obvious, or at least non-obvious to me.)

Since each of the event h_v^N\ge 0 for v\in W_N\backslash \Lambda_N(\epsilon) is positively correlated with B, we can bound the probability it holds for all v by the product of the probabilities that it holds for each v. But having established that the conditional mean is at least \frac{m_N}{2} for each v, and the variance is uniformly bounded above (including in N), this gives an easy tail bound of the form we require.

Again it just remains to choose the sequence of thresholds m_N to maximise the lower bound on the probability that we’ve found in this way. In both cases, it turns out that taking m_N= \sqrt{C\log N} is sensible, and this turns out to be linked to the scaling of the maximum of the DGFF, which we will explore in the future.

References

[1] – J-D Deuschel, Entropic Repulsion of the Lattice Free Field, II. The 0-Boundary Case. Available at ProjectEuclid.

Advertisement

Independence and Association

Back when we did GCSE probability, we gave a definition of independent events as:

A and B are said to be independent if \mathbb{P}(A)\mathbb{P}(B)=\mathbb{P}(A\cap B).

We might also apply Bayes’ definition of conditional probability to say

\mathbb{P}(A|B)=\mathbb{P}(A)\quad\iff\quad A,B\text{ independent}\quad\iff\quad\mathbb{P}(B|A)=\mathbb{P}(B)

provided all the terms exist. (Eg the definition of \mathbb{P}(B|A) is at the very least non-obvious if the probability of A is 0.) In my opinion, this is a more naturally intuitive definition. For example, I think that when you toss two coins, the fact that the probability of the second coin being a tail is unaffected by whether the first is heads is more naturally ‘obvious’ than the fact that the joint probability of the two events is 1/4.

But, before getting too into anything philosophical, it is worth thinking about an equivalent situation for non-independent events. We remark that by an identical argument to above:

\mathbb{P}(A|B)\geq\mathbb{P}(A)\quad\iff\quad \mathbb{P}(A\cap B)\geq\mathbb{P}(A)\mathbb{P}(B)\quad\iff\quad\mathbb{P}(B|A)\geq\mathbb{P}(B)

Informally, this says that if we know A occurs, it increases the likelihood of B occuring. If we were talking about two random variables, we might say that they were positively correlated. But of course, by considering RVs 1_A,1_B, the result above is precisely the statement that the indicator functions have positive correlation.

Aim: To find a sufficient condition for positive correlation of random variables in a product measure.

Consider the following. Suppose A is an event which is positively correlated with the appearance of each edge. We might suspect that two such events A and B would be positively correlated. Instead, we consider a more concrete description. Recall that an event A is a subset of \Omega=\{0,1\}^E. Given w\in\Omega,e\in E, we say w^e\in\Omega defined by taking w and setting edge e to be open (note it may be open already). Now, we say event A is increasing, if

\forall w\in\Omega,\forall e\in E: w\in A\Rightarrow w^e\in A.

Note that this certainly implies the property previously mentioned, but the converse is not necessarily true.

Anyway, our revised aim will be to show that increasing events A and B are positively correlated for product measure.

For now, we approach the problem from the other direction, namely we attempt to find which measures on \{0,1\}^E have the property that A and B are positively correlated for all increasing A, B. Note that as before, we can think of this as \mathbb{E}1_A1_B\geq\mathbb{E}1_A\mathbb{E}1_B, and again here it is useful to rephrase our framework in terms of random variables. There is a natural (product) partial ordering of \Omega=\{0,1\}^E, and from this there is an easy notion of increasing random variables. Recall a random variable is defined as a measurable map \Omega\rightarrow\mathbb{R} so no further work is required.

X is increasing if w\geq w'\Rightarrow X(w)\geq X(w').

So we clarify our aim, which is to find a condition on the measure \mu such that \mu(XY)\geq \mu(X)\mu(Y) for all increasing X, Y. When this occurs, we say \mu is positively associated. Note that this is equivalent to \mu(A\cap B)\geq \mu(A)\mu(B) for all increasing events A, B. Why? We can build up X and Y from increasing indicator functions like \{X\geq x\} in a usual monotone class argument.

On the way, we need a partial ordering on the set of probability measures. Obviously, if \mu(A)\leq \nu(A) for all events A, then in fact \mu=\nu! So instead we say \mu\leq_{st}\nu if \mu(A)\leq \nu(A) for all increasing A. This is called the stochastic ordering, and there is a technical result of Strassen, proving the intuitively obvious claim that if \mu_1\leq \mu_2, then we can couple the measures in a natural way. Formally:

Theorem: \mu_1\leq\mu_2 \iff \exists a probability measure \nu on \Omega^2 such that the marginals are \mu_1,\mu_2 and

\nu(\{(w_1,w_2):w_1\leq w_2\})=1.

Our main result will be the FKG inequality which asserts that when \mu satisfies the following FKG lattice property

\mu(w_1\vee w_2)\mu(w_1\wedge w_2)\geq \mu(w_1)\mu(w_2),\quad\forall w_1,w_2\in\Omega

then \mu is positively associated. We will prove the case |E|<\infty.

We proceed by showing that \mu_1\leq\mu_2\propto Y\mu_1, rescaled, for Y an increasing RV. [Note that we are now suppressing the ‘st’ subscript, as context makes the use clear.]

To show this, we prove the more general Holley’s Theorem:

This states that if two positive probability measures satisfy a related lattice condition:

\mu_2(w_1\vee w_2)\mu_1(w_1\wedge w_2)\geq \mu_1(w_1)\mu_2(w_2)\quad\forall w_1,w_2\in\Omega

then we have the stochastic domination result: \mu_1\leq \mu_2.

Note that the lattice condition states, very informally, that adding edges results in a greater relative increase with respect to the measure \mu_2, which has a natural similarity to the definition of stochastic domination.

We prove this, perhaps unexpectedly, by resorting to a Markov chain. We note that there is a Markov chain on \Omega with equilibrium distribution given by \mu_1. This is simple: the non-zero transition rates are those given by the addition or removal of a single edge. Assume that edges are added at unit rate, and that edges are removed with rate: G(w^e,w_e)=\frac{\mu_1(w_e)}{\mu_1(w^e)}.

Similarly, we can construct a Markov chain on state space \Omega^2, where non-zero transitions are given by the addition of an edge to both states in the pair, the removal of an edge from both states in the pair, and the removal of an edge from only the first edge in the pair. Note that, as before, we may be ‘adding’ an edge which is already present. Assuming we start in this set, this choice means that we are restricting the sample space to \{(\pi,w):\pi\leq w\}. We need the transition rate of the third type of transition to have the form: \frac{\mu_1(\pi_e)}{\mu_1(\pi^e)}-\frac{\mu_2(w_e)}{\mu_2(w^e)}. So the lattice condition precisely confirms that this is non-negative, and thus we have a well-constructed Markov chain. The marginals have equilibrium distributions \mu_1,\mu_2 by construction, and by the general theory of Markov chains, there is an equilibrium distribution, and this leaves us in precisely the right position to apply Strassen to conclude the result.#

Summary of consequences: We have demonstrated that product measure is positively associated, as it certainly satisfies the FKG condition. Recall that this is what we had suspected intuitively for reasons given at the start of this account. Next time, I will talk about the most natural companion result, the BK inequality, and the stronger Reimer’s Inequality.

References: Both the motivation and the material is derived from Prof. Grimmett’s Part III course, Percolation and Related Topics, which was one of the mathematical highlights of the year. This account of the subject is a paraphrase of his lecture notes, which were themselves based on his book Probability on Graphs. Mistakes, naturally, are mine. Background on the course, and an online source of the book can be found on the course website here.