# Branching Random Walk and Amenability

This post is about some of the things I learned in an interesting given by Elisabetta Candellero in Oxford last week, based on joint work with Matt Roberts. The paper on which this is based can be found here. The main thing I want to talk about are some properties of graphs which were mentioned near the beginning which I hadn’t heard about before.

Branching Random Walk (hereafter BRW) is a model to which much attention has been paid, because of its natural applications in a range of physical and genetic settings. As with many of the best models, the definition is pretty much in the title. We take the ingredients for a random walk on a graph, which is a graph, and a transition matrix P on that graph. For most of the time we will consider simple random walk, so the graph G exactly specifies P. This requires the additional condition that the graph G is locally finite. We will introduce a branching mechanism, so at discrete times {0,1,2,…} we will track both the number of particles, and their current locations. We start at time 0 with a single particle at some vertex. Then at each time-step, all the vertices present die, and each gives birth independently to some number of offspring according to a fixed probability distribution $\mu$. These offspring then perform one move according to transition matrix P. Note that if you want the system to carry the appearance of having no death, then taking the support of the offspring distribution to be {1,2,3,…} achieves precisely this. The properties we consider will not be very interesting unless G is infinite, so assume that from now on.

There are almost limitless ways we could think of to generalise these dynamics. The offspring distribution could be allowed to depend on the vertex the particle is occupying. The joint transition probabilities of the offspring at a vertex could be biased in favour or against the offspring moving to the same site next. The environment could be chosen in advance before the process starts, but random.

The classical question about BRW is that of recurrence and transience. The definition extends naturally from that of a Markov chain (which any non-branching random walk on a graph is). As in that setting, we say a BRW is recurrent if every vertex is almost surely visited infinitely often by particles of the graph.

Heuristically, we should observe that in some sense, it is quite difficult for simple random walk on an infinite graph to be recurrent. We have examples in $\mathbb{Z},\mathbb{Z}^2$, but these are about as ‘small’ as an infinite graph can be. An idea might be that if the number of sites some distance away from where we start grows rapidly as the distance grows, then there isn’t enough ‘pull’ back to visit the sites near where we start infinitely often. Extending this argument, it is easier for a BRW to be recurrent, as we have the option to make the branching rate large, which means that there are lots of particles at large times, hence more possibility for visiting everywhere. Note that if the offspring distribution is subcritical, we don’t stand a chance of having interesting properties. If we ignore the random walk part, we just have a subcritical Galton-Watson process, which dies out almost surely.

We need a measure of the concept discussed in the heuristic for how fast the number of vertices in the graph grows as we consider bands of vertices further and further away from the starting vertex. The standard measure for this is the spectral radius, which is defined not in terms of number of vertices, but through the limiting probability of returning to a fixed vertex at large time n. Precisely

$\rho:= \limsup \mathbb{P}_i(X_n=i)^{1/n},$

so in some approximation sense

$\mathbb{P}_i(X_n=i)\sim \rho^{n},$

which explains why $\rho\le 1$. Note that by considering the sum of such terms, if simple random walk on G is recurrent, then $\rho=1$, but the converse does not hold. (Consider SRW on $\mathbb{Z}^3$ for example.)

It’s also worth remarking that $\rho$ is a class property. In particular, for a connected graph, the value of $\rho$ is independent of i. This is not surprising, as if d is the graph distance between vertices i and j, then

$p_{ii}^{(n)}\ge p_{ij}^{(d)}p_{jj}^{(n-2d)}p_{ji}^{(d)},$

and vice versa, which enables us to sandwich usefully for the limits.

Really, $\rho$ is a function of the transition matrix P. In fact, we can be more specific, by considering diagonalising P. The only case we care about is when P is infinite, so this is not especially nice, but it makes it clear why $p_{ii}^{(n)}$ decays like $|\rho|^n$ where $\rho$ is the largest eigenvalue of P. Indeed this is an alternative definition of the spectral radius. Note that Perron-Frobenius theory (which seems to keep coming up on the blog this week…) says that since $|\rho|\le 1$, then if $|\rho|=1$, we must have $\rho=1$. So the spectral radius being 1 is precisely equivalent to having an invariant measure. We don’t know whether we can normalise it, but P-F guarantees the relevant left-eigenvector is non-negative, and hence a measure.

Next we give this situation a name. Say that a random walk is amenable if $\rho(P)=1$. We can extend this property to say that a graph is amenable if SRW on it is amenable.

This is not the standard definition of amenability. This property is originally defined (by von Neumann) in the context of groups. A group G is said to be amenable if there exists a left-invariant probability measure on G, ie $\mu$ such that

$\forall A\subset G, \forall g\in G, \mu(gA)=A.$

The uniform distribution shows that any finite group is amenable.

It turns out that in general there are several conditions for a group which are equivalent to amenability. One is that, given G finitely generated by B, the Cayley graph for G with edges given by elements of B does not satisfy a strong isoperimetric inequality. Such an inequality is an alternative way of saying that the graph grows rapidly. It says that the size of the boundary of a subset of the vertices is uniformly large relative to the size of the set. Precisely, there exists a constant c>0 such that whenever U is a finite subset of the vertices, we have $|\partial U|\ge c|U|$. (Note that finiteness of U is important – we would not expect results like this to hold for very large subsets.)

Kesten proved that it is further equivalent to the statement that simple random walk on $Cay(G,B)$ is amenable in our original sense. This technical and important result links the two definitions.

We finish by declaring the main classical result in BRW, which is a precise condition for transience. As motivated earlier, the rate of branching and the spectral radius have opposing effects on whether the system is recurrent or transient. Note that at some large time, the expected number of particles which have returned to the starting vertex is given by the expected number of particles in the system multiplied by the probability that any one of them is back at its origin, ie $\sim \mu^n\rho^n$. So the probability that there is a particle back at the origin at this time is (crudely transferring from expectation to probability) $1\wedge (\mu \rho)^n$. We can conclude that the chain is recurrent if $\mu > \rho^{-1}$ and transient if $\mu<\rho^{-1}$. This result is due to Benjamini and Peres.

The remaining case, when $\mu=\rho^{-1}$ is called, unsurprisingly, critical BRW. It was proved in ’06 by Gantert and Muller that, in fact, all critical BRWs are transient too. This must exclude the amenable case, as we could think of SRW on $\mathbb{Z}$ as a critical BRW by taking the branching distribution to be identically one, as the spectral radius is also 1.

In the end, the material in this post is rather preliminary to the work presented in EC’s talk, which concerned the trace of BRW, and whether there are infinitely many essentially different paths to infinity taken by the particles of the BRW. They show that this holds in a broad class of graphs with symmetric properties.

# Invariant Distributions of Markov Chains

My lecture course in Linyi was all about Markov chains, and we spent much of the final two sessions discussing the properties of invariant distributions. I was not surprised, however, that none of the class chose this topic as the subject for a presentation to give after the end of the teaching week. One of the main problems is that so many rather similar properties are introduced roughly simultaneously. As we did in the class, I thought it was worth making some sort of executive summary, as a mixture of revision and interest.

Definition: $\pi$ is an invariant measure if $\pi P=\pi$. If in addition $\sum_{i\in I}\pi_i=1$, then we say it is an invariant distribution. Of course, if I is finite, then any invariant measure can be normalised to give an invariant distribution.

The key initial questions are about existence and uniqueness. First, if there are multiple communicating classes, then an invariant measure (resp. distribution) is a linear (resp. affine) combination of the invariant measures / distributions on each (closed) class. So we restrict attention to irreducible Markov chains.

In the finite case, P is a stochastic matrix so has a column eigenvector with eigenvalue 1, namely the vector with all entries equal to 1. Thus, by reference to general theory in linear algebra, P has a row eigenvector $\pi$ with eigenvalue 1. To paraphrase a remark made by one of my students, what is not clear is that this should be a measure. Demonstrating that this is true is rather non-trivial I think, normally done by reference to the rather more general Perron-Frobenius theorem, though on the flight home I came up with a short argument using Lagrangian duality. For now, we accept existence in the finite case, and note that we typically show existence by showing that the vector of expected time spent in each state between successive visits to a fixed reference state satisfies the properties of an invariant measure.

This is a good moment to note that recurrence is not a necessary condition for the existence of an invariant measure. For example, the random walk on $\mathbb{Z}^3$ is transient, but the uniform measure is invariant. However, it is not a sufficient condition for the existence of an invariant distribution either. (Of course, an irreducible finite chain is always recurrent, and always has an invariant distribution, so now we are considering only the infinite state space case.) The random walk on $\mathbb{Z}^2$ is recurrent, but the invariant measure is not normalisable.

The property we in fact need is positive recurrence. This says that the expected return time to each point is finite. Again, this is a class property. This is a common requirement in probabilistic arguments: almost surely finite is often not strong enough to show results if the expectation is infinite (see for example the various requirements for the optional stopping theorem). If this holds, then $\pi_i=\frac{1}{\mathbb{E}T_i}$, where $T_i$ is the the return time starting from some $i\in I$.

The final question is ‘Why are we interested?’ One of the best answers is to look at convergence properties. A simple suggestion is this: if we start in equilibrium, then $X_0,X_1,X_2,\ldots$ are all equal in distribution. Note that the dependence structure remains complicated, and much much more interesting than the individual distributions. Next, we observe that a calculation of n-step transition probabilities for a finite chain will typically involve a linear combination of nth powers of eigenvalues. One of the eigenvalues is 1, and the others lie strictly between -1 and 1. We observe in examples that the constant coefficient in $p_{ij}^{(n)}$ is generally a function of j alone, and so $p_{ij}^{(n)}\rightarrow\lambda_j$, some distribution on I. By considering $P^{n+1}=P\cdot P^n$, it is easy to see that if this converges, $(\lambda_j)_{j\in I}$ is an invariant distribution. The classic examples which do not work are

$P=\begin{pmatrix}0&1\\1&0\end{pmatrix}$ and $P=\begin{pmatrix}0&1&0\\ 0&0&1\\1&0&0\end{pmatrix}$,

as then the distribution of $X_n$ is a function of the remainder of n modulo 3 alone. With a little thought, we can give a precise classification of such chains which force you to be in particular proper subsets of the state space at regular times n. Chains without this property are called aperiodic, and we can show that distributions for such chains converge to the equilibrium distribution as $n\rightarrow\infty$.