DGFF 4 – Properties of the Green’s function

I’m at UBC this month for the PIMS probability summer school. One of the long courses is being given by Marek Biskup about the Discrete Gaussian Free Field (notes and outline here) so this seems like a good moment to revive the sequence of posts about the DGFF. Here’s DGFF1, DGFF2, DGFF3 from November.

The first draft of this post was about the maximum of the DGFF in a large box V_N, and also about the Green’s function G^{V_N}(x,y), which specifies the covariance structure of the DGFF. That draft became too long, so I’m splitting it into two somewhat shorter posts. As we’ll see, some understanding and standard estimates of the Green’s function are enough to say quite a bit about the maximum. In this first post, we’ll explore some ‘low-hanging fruit’ concerning the Green’s function, as defined through a simple random walk, which are useful but rarely explained in the DGFF literature.

Symmetry of Green’s function

We start with one of these low-hanging fruit. If G^{V_N} is to be a covariance matrix, it has to be symmetric. The argument in the first post, showing that the definition of the DGFF as a random field with a given Hamiltonian is equivalent to \mathcal{N}(0,G^{V_N}), can certainly be viewed as a proof of symmetry. However, it would be satisfying if there were a direct argument in the language of the definition of the Green’s function.

To make this self-contained, recall the random walk definition of G^{V_N}(x,y). Let (S_m)_{m\ge 0} be simple random walk on \mathbb{Z}^d, and let \mathbb{P}_x,\,\mathbb{E}_x denote probability and expectation for the walk started at x\in V_N. As usual, let \tau_y,\,\tau_A denote the hitting times of a vertex y and a set A respectively. Then

G^{V_N}(x,y):= \mathbb{E}_x \left[ \sum_{m=0}^{\tau_{\partial V_N}}1_{(S_m=y) }\right].

That is, G^{V_N}(x,y) is the expected number of visits to y by a random walk from x, before it exits V_N.

Let’s drop the superscript for now, as everything should hold for a more general subset of the lattice. I don’t think it’s immediately obvious at the level of Markov chains why G(x,y)=G(y,x). In particular, it’s not the case that

\mathbb{P}_x(\tau_y < \tau_{D^c}) = \mathbb{P}_y(\tau_x <\tau_{D^c}),

and it feels that we can’t map between paths x \to \partial D and y\to \partial D in a way that preserves the number of visits to y and x, respectively. However, we can argue that for any m

\mathbb{P}_x(S_m=y, \tau_{D^c}>m) = \mathbb{P}_y(S_m=x, \tau_{D^c}>m),

by looking at the suitable paths of (S_m). That is, if we have a path x=S_0,S_1,\ldots,S_m=y that stays within D, then the probability of seeing this path starting from x and its reverse direction starting from y are equal. Why? Because

\mathbb{P}_x(S_0=x,S_1=v_1,\ldots,S_{m-1}=v_{m-1},S_m=y) = \prod_{\ell=0}^{m-1} \frac{1}{\mathrm{deg}(v_\ell)},

and

\mathbb{P}_y(S_0=y,S_1=v_{m-1},\ldots,S_{m-1}=v_1, S_m=x) = \prod_{\ell=0}^{m-1} \frac{1}{\mathrm{deg}(v_{m-\ell})} = \prod_{\ell=1}^m \frac{1}{\mathrm{deg}(v_\ell)}.

Since D\subset \mathbb{Z}^d and x,y are in the interior of D, we must have \mathrm{deg}(x)=\mathrm{deg}(y), and so these two expressions are equal. Summing over all such two-way paths, and then all m gives the result.
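If you’d like to see this symmetry concretely, here is a quick numerical sketch (my own addition, not from any reference; the box size and the use of numpy are arbitrary choices). Killing the walk on the boundary means the Green’s matrix is just (I-P)^{-1}, where P is the transition matrix restricted to the interior:

```python
import numpy as np

n = 3  # domain D = [-n, n]^2; the walk is killed on the boundary
interior = [(i, j) for i in range(-n + 1, n) for j in range(-n + 1, n)]
idx = {v: k for k, v in enumerate(interior)}

P = np.zeros((len(interior),) * 2)
for (i, j), a in idx.items():
    for di, dj in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        if (i + di, j + dj) in idx:          # steps onto the boundary are killed
            P[a, idx[(i + di, j + dj)]] = 0.25

# G(x,y) = sum_m P^m(x,y) = (I - P)^{-1}: expected visits to y before exiting
G = np.linalg.inv(np.eye(len(interior)) - P)

assert np.allclose(G, G.T)                   # the symmetry argued above
```

Of course, in this setting P is itself a symmetric matrix, which is precisely the reversibility that drives the path-reversal argument above.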

Fixing one argument

We now focus on G^D(\cdot,y), where the second argument is fixed. This is the solution to the Poisson equation

\Delta G^D(\cdot,y) = -\delta_y(\cdot),\quad G^D(x,y)=0,\; \forall x\in \partial D.

To see this, we can use a standard hitting probability argument (as here) with the Markov property. This is harmonic in D\backslash \{y\}, and since we know

G^D(y,y)= \frac{1}{\mathbb{P}_y(\text{RW hits }\partial D\text{ before returning to }y)},

this uniquely specifies G^D(\cdot,y). Anyway, since harmonic functions achieve their maxima on the boundary (here \{y\}\cup \partial D, and G^D(\cdot,y) vanishes on \partial D), we have G(y,y)\ge G(x,y) for all x\in D. We can also see this from the SRW definition as

G(x,y)=G(y,x) = \mathbb{P}_y (\tau_x < \tau_{\partial D} ) G(x,x) \le G(x,x).
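Again, this is easy to check numerically on a small (here deliberately non-square) domain; the following sketch is my own and not from any reference:

```python
import numpy as np

# interior of a 6x4 rectangle; steps leaving the interior are killed
interior = [(i, j) for i in range(1, 5) for j in range(1, 3)]
idx = {v: k for k, v in enumerate(interior)}
P = np.zeros((len(interior),) * 2)
for v, a in idx.items():
    for d in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        w = (v[0] + d[0], v[1] + d[1])
        if w in idx:
            P[a, idx[w]] = 0.25
G = np.linalg.inv(np.eye(len(interior)) - P)

diag = np.diag(G)
assert np.all(G <= diag[None, :] + 1e-12)  # G(x,y) <= G(y,y)
assert np.all(G <= diag[:, None] + 1e-12)  # G(x,y) <= G(x,x)
```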

Changing the domain

Now we want to consider nested domains D\subset E, and compare G^D(\cdot,\cdot) and G^E(\cdot,\cdot) on D\times D. The idea is that for SRW started from x\in D, we have \tau_{\partial D}\le \tau_{\partial E}, since one boundary is contained within the other. From this, we get

G^D(x,y)\le G^E(x,y),\quad \forall x,y\in D,

and we will use the particular case y=x.

For example, if x\in V_N, the box with width N, then the box with width 2N centred on x contains the whole of V_N. So, if we set \bar {V}_{2N}:= [-N,N]^d, then by translation invariance and the monotonicity above, we have

G^{V_N}(x,x)\le G^{\bar{V}_{2N}}(0,0),\quad x\in V_N.
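Here’s a small numerical illustration of both the domain monotonicity and the centred-box bound, with N=4 (my own sketch; the sizes are arbitrary):

```python
import numpy as np

def green(interior):
    """Green's matrix of SRW killed on first exiting the given interior set."""
    idx = {v: k for k, v in enumerate(interior)}
    P = np.zeros((len(interior),) * 2)
    for v, a in idx.items():
        for d in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            w = (v[0] + d[0], v[1] + d[1])
            if w in idx:
                P[a, idx[w]] = 0.25
    return idx, np.linalg.inv(np.eye(len(interior)) - P)

# V_N = [0,4]^2 (interior sites 1..3) sits inside bar V_{2N} = [-4,4]^2
idx_s, G_s = green([(i, j) for i in range(1, 4) for j in range(1, 4)])
idx_b, G_b = green([(i, j) for i in range(-3, 4) for j in range(-3, 4)])

center = G_b[idx_b[(0, 0)], idx_b[(0, 0)]]
# monotonicity in the domain, for all pairs in the smaller box
assert all(G_s[idx_s[v], idx_s[w]] <= G_b[idx_b[v], idx_b[w]] + 1e-12
           for v in idx_s for w in idx_s)
# the centred-box bound G^{V_N}(x,x) <= G^{bar V_{2N}}(0,0)
assert all(G_s[a, a] <= center + 1e-12 for a in idx_s.values())
```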

As we’ll see when we study the maximum of the DGFF on V_N, uniform control over the pointwise variance will be a useful tool.

Maximising the Green’s function

The idea of bounding G^{V_N}(x,x) by G^{\bar V_{2N}}(0,0) for any x\in V_N is clever and useful. But a more direct approach would be to find the value of x that maximises G^{V_N}(x,x). We would conjecture that when V_N has a central vertex, then this is the maximiser.

We can prove this directly from the definition of the Green’s function in terms of random walk occupation times. Let’s assume we are working with \bar{V}_N for even N, so that 0 is the central vertex. Again, since

G^D(x,x)=\frac{1}{\mathbb{P}_x(\text{RW hits }\partial D\text{ before returning to }x)}, (*)

it would suffice to show that this escape probability is minimised when x=0. This feels right, since 0 is furthest from the boundary. But other points are closer to the boundary in some directions and further away in others, so we can’t argue simply by conditioning on the maximum distance from its starting point achieved by an excursion of SRW (by vertex-transitivity, excursions look the same from all starting points): even allowing for the four possible rotations, for an excursion of diameter slightly larger than N, starting at the centre is maximally bad.
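Incidentally, the identity (*) can be verified numerically: compute G by inverting I-P, and compute the escape probability independently by solving the Dirichlet problem for h(y)=\mathbb{P}_y(\tau_{\partial D}<\tau_x). This sketch (my own, with an arbitrarily small box) checks that the product is 1:

```python
import numpy as np

n = 2  # D = [-2, 2]^2, with x the central vertex
interior = [(i, j) for i in range(-n + 1, n) for j in range(-n + 1, n)]
idx = {v: k for k, v in enumerate(interior)}
m = len(interior)
steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]

# Green's matrix via G = (I - P)^{-1}
P = np.zeros((m, m))
for v, a in idx.items():
    for d in steps:
        w = (v[0] + d[0], v[1] + d[1])
        if w in idx:
            P[a, idx[w]] = 0.25
G = np.linalg.inv(np.eye(m) - P)

# Independently solve for h(y) = P_y(hit boundary before hitting x):
# h(x) = 0, h = 1 on the boundary, h harmonic elsewhere.
x = (0, 0)
A = np.zeros((m, m)); b = np.zeros(m)
for v, a in idx.items():
    A[a, a] = 1.0
    if v != x:
        for d in steps:
            w = (v[0] + d[0], v[1] + d[1])
            if w in idx:
                A[a, idx[w]] -= 0.25
            else:
                b[a] += 0.25          # boundary sites contribute h = 1
h = np.linalg.solve(A, b)

# escape probability = average of h over the four neighbours of x
p_escape = sum(0.25 * (h[idx[(x[0] + d[0], x[1] + d[1])]]
                       if (x[0] + d[0], x[1] + d[1]) in idx else 1.0)
               for d in steps)
assert abs(G[idx[x], idx[x]] * p_escape - 1.0) < 1e-10
```

The point is that the number of visits to x is geometric with parameter the escape probability, so its expectation is the reciprocal.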

However, intuitively it does feel as if being closer to the boundary makes you more likely to escape earlier. In fact, with a bit more care, we can couple the SRW started from 0 and the SRW started from r=(r^x,r^y)\ne 0 such that the latter always exits first. For convenience we’ll assume also that r^x,r^y are both even.

I couldn’t find any reference to this, so I don’t know whether it’s well-known or not. The following argument involves projecting onto each axis, and doing separate couplings for transitions in the x-direction and transitions in the y-direction. We assume WLOG that r is in the upper-right quadrant. Then, let 0=S_0,S_1,S_2,\ldots be SRW started from 0, and we will construct r=R_0,R_1,R_2,\ldots on the same probability space as (S_m)_{m\ge 0} as follows. For every m, we set the increment R_{m+1}-R_m to be \pm(S_{m+1}-S_m). It remains to specify the sign, which will be determined by the direction of the S-increment, and a pair of stopping times. The marginal is therefore again an SRW, started from r. Temporarily, we use the unusual notation S_m= (S^x_m,S^y_m) for the coordinates of S_m.

So, if S_{m+1}-S_m=(1,0), (-1,0), ie S moves left or right, then we set

R_{m+1}-R_m = \begin{cases} -(S_{m+1}-S_m) &\quad \text{if }m<T^x\\ S_{m+1}-S_m&\quad \text{if }m\ge T^x,\end{cases} (*)

where T^x:= \min\{m\,:\, R^x_m=S^x_m\}. That is, R^x moves in the opposing direction to S^x until the first time when they are equal (hence the parity requirement), and then they move together. WLOG assume that r^x>0. Then suppose S^x_m=\pm N and such m is minimal. Then by construction, if m\ge T^x, then R^x_m=\pm N also. If m<T^x, then we must have S^x_m=-N, and so since R^x's trajectory is a mirror image of S^x's, in fact R^x_m = N+r^x>N, so R^x hit +N first. In both cases, we see that R^x hits \pm N at the same time or before S^x.

In other words, until they meet, R^x‘s trajectory is the mirror image of S^x‘s in the level r^x/2. Since r^x is even, r^x/2 is a lattice site, and T^x is precisely the first time at which S^x_m=R^x_m=r^x/2 (this is where the parity condition on r is used). So (*) is the mechanism for m<T^x, and then for m\ge T^x, we take S^x_m=R^x_m.

Similarly, if S_{m+1}-S_m =(0,1), (0,-1), ie S moves up or down, then we set

R_{m+1}-R_m = \begin{cases} -(S_{m+1}-S_m)&\quad \text{if }m<T^y\\ S_{m+1}-S_m&\quad \text{if }m\ge T^y,\end{cases}

with corresponding definition of the stopping time T^y.

This completes the coupling, and by considering T^x\wedge T^y, we have shown that the exit time for the walk started from zero dominates the exit time for the walk started from r. Recall that so far we are in the case where the box has even width and r=(r^x,r^y) has even coordinates.
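Since the coupling is completely deterministic given the increments of S, we can verify the projected (one-dimensional) version of it exhaustively. The following sketch (my own; the parameters N=4, r=2 and path length 12 are arbitrary) checks that the mirrored walk exits {-N,N} no later than S along every possible increment sequence:

```python
from itertools import product

N, r, L = 4, 2, 12   # boundary at +-N, even starting point r, path length L
INF = 10 ** 9        # sentinel for "has not exited within L steps"

def exit_times(eps):
    """First times S (from 0) and its coupled mirror R (from r) hit -N or N."""
    S, R, met = 0, r, False
    tau_S = tau_R = INF
    for m, e in enumerate(eps, start=1):
        S += e
        R += e if met else -e      # mirror the increment until the walks meet
        if R == S:
            met = True             # from now on they move together
        if tau_S == INF and abs(S) == N:
            tau_S = m
        if tau_R == INF and abs(R) == N:
            tau_R = m
    return tau_S, tau_R

# over every +-1 increment sequence, the off-centre walk exits no later
always_first = all(tR <= tS for tS, tR in
                   (exit_times(eps) for eps in product((-1, 1), repeat=L)))
assert always_first
```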

This exit time comparison isn’t exactly what we need to compare G^{\bar V_N}(0,0) and G^{\bar V_N}(r,r). It’s worth remarking at this stage that if all we cared about was the Green’s function on the integer line [-N,N], we would have an easier argument, as by the harmonic property of G(\cdot,y)

G^{[-N,N]}(0,r)=\frac{N-r}{N}G^{[-N,N]}(0,0),

G^{[-N,N]}(r,0) = \frac{N}{N+r}G^{[-N,N]}(r,r),

and so, since G(0,r)=G(r,0), comparing these two expressions gives G(0,0)=\frac{N^2}{N^2-r^2}G(r,r)>G(r,r). To lift from 1D to 2D directly, we need a bit more than this. It’s possible that S returns to its starting point in each coordinate more often than R does, but never in both coordinates at the same time. Fortunately, the coupling we defined slightly earlier does give us a bit more control.
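These one-dimensional identities are also easy to confirm numerically (my own sketch, with arbitrary choices of N and r):

```python
import numpy as np

N, r = 8, 3
sites = list(range(-N + 1, N))          # interior of [-N, N]
idx = {s: k for k, s in enumerate(sites)}
P = np.zeros((len(sites),) * 2)
for s, a in idx.items():
    for nb in (s - 1, s + 1):
        if nb in idx:                    # steps to +-N are killed
            P[a, idx[nb]] = 0.5
G = np.linalg.inv(np.eye(len(sites)) - P)

g = lambda x, y: G[idx[x], idx[y]]
assert np.isclose(g(0, r), (N - r) / N * g(0, 0))
assert np.isclose(g(r, 0), N / (N + r) * g(r, r))
assert g(0, 0) > g(r, r)                 # the central vertex maximises the diagonal
```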

Let \tau^x(S), \tau^x(R) be the first times that S^x, R^x hit \pm N. Under this coupling, for any m\ge 0

\mathbb{P}(S^x_m=0, m<T^x) = \mathbb{P}(R^x_m=r^x, m<T^x)

since these events are literally equal. Since we showed that \tau^x(R)\le \tau^x(S) almost surely, we can further deduce

\mathbb{P}(S^x_m=0,m<T^x\wedge \tau^x(S)) \ge \mathbb{P}(S^x_m=0,m<T^x\wedge \tau^x(R))

=\mathbb{P}(R^x_m=r^x, m <T^x \wedge \tau^x(R)).

To address the corresponding events for which m\ge T^x, we apply the strong Markov property at T^x, to obtain SRW (Z_m) started from r/2 (writing r for r^x from now on, since we are working with a single coordinate), and let \tau_{-N},\tau_{+N} be the hitting times of -N,+N respectively and \tau_{\pm N}=\tau_{-N}\wedge \tau_{+N}. It will now suffice to prove that

\mathbb{P}(Z_m=0, m< \tau_{\pm N}) \ge \mathbb{P}(Z_m=r,m<\tau_{\pm N}), (**)

as then we can apply the law of total probability and sum over values of T^x and m\ge 0.

To prove this result, we consider the following map between trajectories of length m from r/2 to \{0,r\}. We decompose a trajectory into excursions away from r/2, followed by a final meander from r/2 to \{0,r\} that stays on one side of r/2. We construct the new trajectory by preserving all the initial excursions, but reversing all the steps of the final meander. So if the original trajectory ended up at 0, the image ends up at r, and vice versa. Trivially, the initial excursions in the image only hit \pm N if the excursions in the original trajectory did so too. But it’s also easy to see, by a similar argument to the coupling at the start of this section, that if the original trajectory ends at r and does not hit \pm N, then the image, which ends at 0, does not hit \pm N either. However, the converse is not true. So we conclude (**), and thus

\mathbb{P}(S_m^x=0,\, m<\tau^x(S)) \ge \mathbb{P}(R_m^x=r^x,\, m<\tau^x(R))

for all m by combining everything we have seen so far. And so we can now lift to a statement about S_m itself, by treating the two coordinates separately.

 

The remaining cases for r require a little more care over the definition of T^x, though the same projection argument works, for fundamentally the same reason. (Note that in the above argument, if S^x_m=-N and m<T^x, then in fact R^x_m\ge N+2, and so it’s not hard to convince yourself that a sensible adjustment to the stopping time will allow a corresponding result with R^x_m\ge N+1 in the odd r^x case.) The case for N odd is harder, since in one dimension there are two median sites, and it’s clear by symmetry that we can’t couple them such that RW from one always exits at least as early as RW from the other. However, the distributions of exit times started from these two sites are the same (by symmetry), and so although we can’t find a coupling, we can use similar stopping times to obtain a result in probability.

In the next post, we’ll see how to apply this uniform bound on G^{V_N}(x,x) to control the maximum of the DGFF on V_N. In particular, we address how the positive correlations of DGFF influence the behaviour of the maximum by comparison with independent Gaussians at each site.


Popoviciu’s Inequality

I’ve just returned to the UK after an excellent stay at the University of British Columbia. More about that will follow in some posts which are being queued. Anyway, I flew back in time to attend the last day of the camp held at Oundle School to select the UK team for this year’s International Mathematical Olympiad, to be held in Cape Town in early July. I chose to give a short session on inequalities, which is a topic I did not enjoy as a student and do not enjoy now, but perhaps that makes it a particularly suitable choice?

We began with a discussion of convexity. Extremely occasionally in olympiads, and merely slightly occasionally in real life, an inequality arises which can be proved by showing that a given function is convex in all its arguments, hence its maximum must be attained at a boundary value in each variable.

In general though, our main experience of convexity will be through the medium of Jensen’s inequality. A worthwhile check is to consider one form of the statement of Jensen’s inequality, with two arguments. We are always given a convex function f defined on an interval I=[a,b], and x,y\in I, and weights \alpha,\beta which sum to 1. Then

\alpha f(x)+\beta f(y)\ge f(\alpha x+\beta y).

How do we prove this? Well, in fact this is the natural definition of convexity for a function. There had initially been vague murmurings that convexity should be defined as a property of the second derivative of the function. But this is somewhat unsatisfactory, as the function f(x)=|x| is certainly convex, but the second derivative does not exist at x=0. One could argue that the second derivative may not be finite at x=0, but is nonetheless positive by defining it as a limit which happens to be infinite in this case. However, I feel it is uncontroversial to take the case of Jensen given above as the definition of convexity. It is after all a geometric property, so why raise objections to a geometric definition?

The general statement of Jensen’s inequality, with the natural definitions, is

\sum_{i} \alpha_i f(x_i)\ge f(\sum_{i}\alpha_ix_i).

This is sometimes called Weighted Jensen in the olympiad community, with ‘ordinary’ Jensen following when the weights are all 1/n. In a probabilistic context, we write

\mathbb{E}[f(X)]\ge f(\mathbb{E}X),

for X any random variable supported on the domain of f. Naturally, X can be continuous as well as discrete, giving an integral version of the discretely weighted statement.

Comparing ‘ordinary’ Jensen and ‘weighted’ Jensen, we see an example of the situation where the more general result is easier to prove. As is often the case in these situations, this arises because the more general conditions allow more ‘elbow room’ to perform an inductive argument. A stronger statement means that assuming the induction hypothesis is more useful! Anyway, I won’t digress too far onto the proof of discrete ‘weighted’ Jensen as it is a worthwhile exercise for olympiad students.
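As a concrete illustration (my own, with an arbitrary convex function and arbitrary weights), weighted Jensen can be checked directly:

```python
import math

# f = exp is convex; the points and weights below are arbitrary choices
xs = [-1.0, 0.5, 2.0, 3.5]
alphas = [0.1, 0.2, 0.3, 0.4]          # non-negative weights summing to 1
assert abs(sum(alphas) - 1.0) < 1e-12

lhs = sum(a * math.exp(x) for a, x in zip(alphas, xs))   # sum a_i f(x_i)
rhs = math.exp(sum(a * x for a, x in zip(alphas, xs)))   # f(sum a_i x_i)
assert lhs >= rhs
```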

What I wanted to discuss principally was an inequality due to Tiberiu Popoviciu:

\frac13[f(x)+f(y)+f(z)]+f(\frac{x+y+z}{3})\ge \frac23[f(\frac{x+y}{2})+f(\frac{y+z}{2})+f(\frac{z+x}{2})].

We might offer the following highly vague intuition. Jensen asserts that for sums of the form \sum f(x_i), you get larger sums if the points are more spread out. The effect of taking the mean is immediately to bring all the points as close together as possible. But Popoviciu says that this effect is so pronounced that even with only half the weight on the outer points (and the rest as close together as possible), it still dominates a system with the points twice as close together.
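For the sceptical, here is a brute-force check of Popoviciu on a grid of triples, for the arbitrary convex choice f(t)=t^2 (my own sketch, and of course not a proof):

```python
import itertools

f = lambda t: t * t                     # a convex test function

def popoviciu_gap(x, y, z):
    """LHS minus RHS of Popoviciu's inequality; should be non-negative."""
    lhs = (f(x) + f(y) + f(z)) / 3 + f((x + y + z) / 3)
    rhs = (2 / 3) * (f((x + y) / 2) + f((y + z) / 2) + f((z + x) / 2))
    return lhs - rhs

pts = [k / 2 for k in range(-6, 7)]
min_gap = min(popoviciu_gap(x, y, z)
              for x, y, z in itertools.product(pts, repeat=3))
assert min_gap >= -1e-12                # equality occurs when x = y = z
```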

So how to prove it? I mentioned that there is, unsurprisingly, a weighted version of this result, which was supposed to act as a hint to avoid getting too hung up about midpoints. One can draw nice diagrams with a triangle of points (x,f(x)), (y,f(y)), (z,f(z)) and draw midpoints, medians and centroids, but the consensus seemed to be that this didn’t help much.

I had tried breaking up the LHS into three symmetric portions, and using weighted Jensen to obtain terms on the RHS, but this also didn’t yield much, so I warned the students against this approach unless they had a specific reason to suppose it might succeed.

Fortunately, several of the students decided to ignore this advice, and though most ran into a similar problem to the one I had experienced, Joe found that by actively avoiding symmetry, a decomposition into two applications of Jensen could be made. First we assume WLOG that x\le y \le z, and so by standard Jensen, we have

\frac13[f(x)+f(y)]\ge \frac23 f(\frac{x+y}{2}).

It remains to show

\frac13 f(z)+f(\frac{x+y+z}{3})\ge \frac23[f(\frac{x+z}{2})+f(\frac{y+z}{2})].

If we multiply by 3/4, then we have an expression on each side that looks like the LHS of Weighted Jensen. At this point, it is worth getting geometric again. One way to visualise Jensen is that for a convex function, a chord between two points on the function lies above the function. (For standard Jensen with two variables, in particular the midpoint lies above the function.) But indeed, suppose we have values x_1<x_2<y_2<y_1, then the chord between f(x_1),f(y_1) lies above the chord between f(x_2),f(y_2). Making precisely such a comparison gives the result required above. If you want to be more formal about it, you could consider replacing the values of f between x_2,y_2 with a straight line, then applying Jensen to this function. Linearity allows us to move the weighting in and out of the brackets on the right hand side, whenever the mean lies in this straight line interval.

(Diagram: chords of a convex function.) Hopefully the diagram above helps. Note that we can compare the heights of the blue points (with the same abscissa), but obviously not the red points!

In any case, I was sceptical about whether this method would work for the weighted version of Popoviciu’s inequality

\alpha f(x)+\beta f(y) + \gamma f(z)+f(\alpha x+\beta y+\gamma z)\ge

(\alpha+\beta)f(\frac{\alpha x+\beta y}{\alpha+\beta})+(\beta+\gamma)f(\frac{\beta y + \gamma z}{\beta+\gamma})+(\gamma+\alpha)f(\frac{\gamma z+\alpha x}{\gamma+\alpha}).

It turns out though, that it works absolutely fine. I would be interested to see a solution to the original statement making use of the medians and centroid, as then by considering general Cevians the more general inequality might follow.

That’s all great, but my main aim had been to introduce one trick which somewhat trivialises the problem. Note that in the original statement of Popoviciu, we have a convex function, but we only evaluate it at seven points. So for given x,y,z, it makes no difference if we replace the function f with a piecewise linear function going through the correct seven points. This means that if we can prove the inequality for any convex piecewise linear function with at most eight linear parts, then we are done.

(There’s a subtlety here. Note that we will prove the inequality for all such functions and all x,y,z, but we will only use this result when x,y,z and their means are the points where the function changes gradient.)

So we consider the function

g_a(x)=\begin{cases}0& x\le 0\\ ax & x\ge 0\end{cases}

for some positive value of a. It is not much effort to check that this satisfies Popoviciu. It is also easy to check that constant functions and linear functions g(x)=bx satisfy the inequality. We now prove that we can write any convex piecewise linear function as a sum of functions which satisfy the inequality, and hence the piecewise linear function satisfies the inequality.

Suppose we have a convex piecewise linear function h(x) where x_1<x_2<\ldots<x_n are the points where the derivative changes. We write

a_i=h'(x_i+)-h'(x_i-),\quad a_0=h'(x_1-),

for the change in gradient of h around point x_i. Crucially, because h is convex, we have a_i\ge 0. Then we can write h as

h(x)=C+ a_0x+g_{a_1}(x-x_1)+\ldots+g_{a_{n}}(x-x_n),

for a suitable choice of the constant C. According to [1], this result appears as an intermediate step in a short paper of Hardy, Littlewood and Pólya, which I can’t currently find online. Note that inequalities are preserved under addition (but not under subtraction), so it follows that h satisfies Popoviciu, and so the original function f satisfies it too for the chosen values of x,y,z. These were arbitrary (though they were used to construct h), and hence f satisfies the inequality for all x,y,z.
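To make the decomposition concrete, here is a small worked example (my own; the particular h, its kinks and gradient jumps are chosen purely for illustration):

```python
import numpy as np

# a convex piecewise-linear target, with kinks at 0 and 1
h = lambda x: abs(x) + np.maximum(x - 1.0, 0.0)

# kinks (x_i, a_i) with a_i = h'(x_i+) - h'(x_i-), plus the leftmost slope a_0
kinks = [(0.0, 2.0), (1.0, 1.0)]
a0, C = -1.0, 0.0

g = lambda a, x: a * np.maximum(x, 0.0)    # the building block g_a
recon = lambda x: C + a0 * x + sum(g(a, x - xi) for xi, a in kinks)

xs = np.linspace(-5, 5, 201)
assert np.allclose(h(xs), recon(xs))       # decomposition reproduces h exactly
```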

Some further generalisations can be found in [1]. With more variables, there are more interesting combinatorial aspects that must be checked, regarding the order of the various weighted means.

[1] – D. Grinberg – Generalizations of Popoviciu’s Inequality. arXiv


A Month in Vancouver

I’ve spent the past ~3 weeks at the University of British Columbia in Vancouver, working on some projects related to forest fires and frozen percolation with Balazs Rath. In advance, I’d imagined that this might be a prolific time for new articles on this blog, but in the end it was more like the exact opposite. After spending all of every day playing with technical details on our models (which do not lend themselves naturally to informal writing…) if I felt like doing maths in the evenings, it was more important to keep working on the questions in hand.

Anyway, I had a great time mathematically and otherwise, so I feel a shameless tourism post is justified to report what I enjoyed most about Vancouver.

Asian Food

It sounds ridiculous to return from a trip and reference this as the highlight, but the most noticeable long-term result of a month in Vancouver is that I will probably never be able to order sushi in the UK again. The typical house roll cost about £3 and contained so much seafood that it was essentially impossible to manipulate. Or maybe that was just my (lack of) chopstick skills? Anyway, as I’ve observed in other parts of Canada, groceries are bizarrely expensive relative to eating out, at least compared with Europe. By my third week I had basically given up and found a different East Asian cuisine every other night. I ate Japanese, Chinese, Korean, Thai, Malaysian, Taiwanese, Vietnamese and others that didn’t feel constrained by a single national identity. All were uniformly excellent, unfailingly welcoming, and made commendable efforts to include smoked salmon in every single dish.

Outdoor Pursuits

For a country where a donut shop is an integral part of the national identity, the total absence of fat people was immediately noticeable. In their place was a city where everyone seemed to go running, hiking, skiing, quad-biking, skidoo-ing and so on at the slightest opportunity. I haven’t seen that much lycra since breakfast in college on the morning of May Bumps…

For a major world city, the wilderness was tantalisingly close, and remarkably accessible. The public transit system spread far into the mountains and up the coast, and offered you any 90 minutes worth of travel for $2. A breezy ferry trip across the sound to Bowen Island offered the chance to hike up Mt Gardner, an unrelenting and totally solitary trek up through the pine forest. I was conscious that if I slipped somewhere, there wouldn’t be much left to find once the bears had got to me, but the views over the inlet from the summit made it all worthwhile. A trip to Mt Seymour on Victoria Day, then down past the vertigo-inducing Lynn Suspension bridge and a trail back towards civilisation was equally successful. On both occasions, I felt I’d very much earned that donut.

UBC

Of course, the main reason for the trip was to do worthwhile mathematics, and I certainly haven’t experienced a nicer university campus to do exactly that. A five minute walk and 500 steps away from the wisteria-covered maths department is the 7km long Wreck Beach, a sunny haven for Vancouver’s hipsters and nudists, protected from direct exposure to the Pacific by Vancouver Island. Of more interest to me were the 30km or so of trails through the forest in the Pacific Spirit Regional Park, directly across the road from where I was staying in Wesbrook Village at the south end of the campus.

Most importantly, I’m extremely grateful for the invitation to work at UBC for these past few weeks. It’s been productive, and an excellent opportunity to meet some new people and learn about some interesting aspects of probability. I very much hope an opportunity will arise to go back soon!