Multitype Branching Processes

One of the fundamental objects in classical probability theory is the Galton-Watson branching process. This is defined to be a model for the growth of a population, where each individual in a generation gives birth to some number (possibly zero) of offspring, who form the next generation. Crucially, the numbers of offspring of the individuals are IID, with the same distribution both within generations and between generations.

There are several ways one might generalise this, such as non-IID offspring distributions, or pairs of individuals producing some number of offspring, but here we consider the situation where each individual has some type, and different types have different offspring distributions. Note that if there are K types, say, then the offspring distributions should now be supported on $\mathbb{Z}_{\ge 0}^K$. Let’s say the offspring distribution from a parent of type i is $\mu^{(i)}$.

The first question to address is one of survival. Recall that if we want to know whether a standard Galton-Watson process has positive probability of having infinite size, that is never going extinct, we only need to know the expectation of the offspring distribution. If this is less than 1, then the process is subcritical and is almost surely finite. If it is greater than 1, then it is supercritical and survives with positive probability. If the expectation is exactly 1 (and the variance is finite) then the process is critical and although it is still almost surely finite, the overall population size has a power-law tail, and hence (or otherwise) the expected population size is infinite.

We would like a similar result for the multitype process, saying that we do not need to know everything about the distribution to decide what the survival probability should be.

The first thing to address is why we can’t just reduce the multitype change to the monotype setting. It’s easiest to assume that we know the type of the root in the multitype tree. The case where the type of the root is random can be reconstructed later. Anyway, suppose now that we want to know the offspring distribution of a vertex in the m-th generation. To decide this, we need to know the probability that this vertex has a given type, say type j. To calculate this, we need to work out all the type possibilities for the first m generations, and their probabilities, which may well include lots of complicated size-biasing. Certainly it is not easy, and there’s no reason why these offspring distributions should be IID. The best we can say is that they should probably be exchangeable within each generation.

Obviously if the offspring distribution does not depend on the parent’s type, then we have a standard Galton-Watson tree with types assigned in an IID manner to the realisation. If the types are symmetric (for example if M, to be defined, is invariant under permuting the indices) then life gets much easier. In general, however, it will be more complicated than this.

We can however think about how to decide on survival probability. We consider the expected number of offspring, allowing both the type of the parent and the type of the child to vary. So define $m_{ij}$ to be the expected number of type j children born to a type i parent. Then write these in a matrix $M=(m_{ij})$.

One generalisation is to consider a Galton-Watson forest started from some positive number of roots of various types. Suppose we have a vector $\nu=(\nu_i)$ listing the number of roots of each type. Then the expected number of descendents of each type at generation n is given by the vector $\nu M^n$.

Let $\lambda$ be the largest eigenvalue of M. As for the transition matrices of Markov chains, the Perron-Frobenius theorem applies here, which confirms that, because the entries of M are positive, the eigenvalue with largest modulus is simple and real, and the associated eigenvector has entirely positive entries. [In fact we need a couple of extra conditions on M, including that it is possible to get from any type to any other type – we say irreducible – but that isn’t worth going into now.]

So in fact the total number of descendents at generation n grows like $\lambda^n$ in expectation, and so we have the same description of subcriticality and supercriticality. We can also make a sensible comment about the left-$\lambda$-eigenvector of M. This is the limiting proportion of the different types of vertices.

It’s a result (eg. [3]) that the height profile of a depth-first search on a standard Galton-Watson tree converges to Brownian Motion. Another way to phrase this is that a GW tree conditioned to have some size N has the Brownian Continuum Random Tree as a scaling limit as N grows to infinity. Miermont [4] proves that this result holds for the multitype tree as well. In the remainder of this post I want to discuss one idea along the way to the proof, and one application.

I said initially that there wasn’t a trivial reduction of a multitype process to a monotype process. There is however a non-trivial embedding of a monotype process in a multitype process. Consider all the vertices of type 1, and all the paths between such vertices. Then draw a new tree consisting of just the type 1 vertices. Two of these are joined by an edge if there is no other type 1 vertex on the unique path between them in the original tree. If that definition is confusing, think of the most sensible way to construct a tree on the type 1 vertices from the original, and you’ve probably chosen this definition.

There are two important things about this new tree. 1) It is a Galton-Watson tree, and 2) if the original tree is critical, then this reduced tree is also critical. Proving 1) is heavily dependent on exactly what definitions one takes for both the multitype branching mechanism and the standard G-W mechanism. Essentially, at a type 1 vertex, the number of type 1 descendents is not dependent on anything that happened at previous generations, nor in other branches of the original tree. This gives IID offspring distributions once it is formalised. As for criticality, we note that by the matrix argument given before, under the irreducibility condition discussed, the expectation of the total population size is infinite iff the expected number of type 1 vertices is also infinite. Since the proportion of type 1 vertices is given by the first element of the left eigenvector, which is positive, we can make a further argument that the number of type 1 vertices has a power-law tail iff the total population size also has a power-law tail.

I want to end by explaining why I was thinking about this model at all. In many previous posts I’ve discussed the forest fire model, where occasionally all the edges in some large component are deleted, and the component becomes a set of singletons again. We are interested in the local limit. That is, what do the large components look like from the point of view of a single vertex in the component? If we were able to prove that the large components have BCRT as the scaling limit, this would answer this question.

This holds for the original random graph process. There are two sensible ways to motivate this. Firstly, given that a component is a tree (which it is with high probability if its size is O(1) ), its distribution is that of the uniform tree, and it is known that this has BCRT as a scaling limit [1]. Alternatively, we know that the components have a Poisson Galton-Watson process as a local limit by the same argument used to calculate the increments of the exploration process. So we have an alternative description of the BCRT appearing: the scaling limit of G-W trees conditioned on their size.

Regarding the forest fires, if we stop the process at some time T>1, we know that some vertices have been burned several times and some vertices have never received an edge. What is clear though is that if we specify the age of each vertex, that is, how long has elapsed since it was last burned; conditional on this, we have an inhomogeneous random graph. Note that if we have two vertices of ages s and t, then the probability that there is an edge between them is $1-e^{-\frac{s\wedge t}{n}}$, ie approximately $\frac{s\wedge t}{n}$. The function giving the probabilities of edges between different types of vertices is called the kernel, and here it is sufficiently well-behaved (in particular, it is bounded) that we are able to use the results of Bollobas et al in [2], where they discuss general sparse inhomogeneous random graphs. They show, among many other things, that in this setting as well the local limit is a multitype branching process.

So in conclusion, we have almost all the ingredients towards proving the result we want, that forest fire components have BCRT scaling limit. The only outstanding matter is that the Miermont result deals with a finite number of types, whereas obviously in the setting where we parameterise by age, the set of types is continuous. In other words, I’m working hard!

References

[1] Aldous – The Continuum Random Tree III

[2] Bollobas, Janson, Riordan – The phase transition in inhomogeneous random graphs

[3] Le Gall – Random Trees and Applications

[4] Miermont – Invariance principles for spatial multitype Galton-Watson trees

Discontinuous Phase Transitions

Yesterday, Demeter Kiss from Cambridge gave a seminar in Oxford about a model for self-destructive percolation on $\mathbb{Z}^2$ that had implications for the (non-)existence of an infinite-parameter forest fire model on the same lattice. I enjoyed talking about this and his recent work on the related model of frozen percolation on $\mathbb{Z}^2$. Considering these models in the lattice setting present a whole range of interesting geometric challenges that are not present in the mean-field case that has mainly occupied my research direction so far.

The afternoon’s discussion included lots of open problems about percolation. Several of these are based around continuity of the phase transition, so I thought I would write a quite post about some simple examples of this, and one example where it does not hold.

A helpful base example is bond percolation on the lattice $\mathbb{Z}^2$. Here, we specify some probability p in [0,1], and we declare edges of the lattice open with probability p, independently of each other. We then consider the graph induced by the open edges. We say that percolation occurs if the origin is contained in an infinite open component. The terminology arises from the interpretation as fluid being added at the origin and flowing down open edges. We define $\theta(p)$ to be the probability that the origin is in an infinite component when the parameter is p. By translation-invariance, we can get some sort of 0-1 law, to conclude that there is an infinite component somewhere in the system with probability either 0 or 1, depending on whether $\theta(p)$ is positive or zero. Indeed, we can further show that if it is positive, then with probability 1 there is a unique infinite component.

We define the critical probability $p_c:= \inf\{\theta(p)>0\}$. A question worth asking is then, what is $\theta(p_c)$? In some examples, we can find $p_c$, but we cannot prove that $\theta(p)$ is continuous around $p_c$. In the case of $\mathbb{Z}^2$ this is known, and it is known from work of Kesten that $p_c=1/2$. See below for a plot of $\theta(p)$ in this setting (obtained from this blog, though possibly originating elsewhere).

The aim is to find an example where we do not have such a continuous phase transition. The original work on frozen percolation took place on trees, and one of Kiss’s results is confirms that these show qualitatively different phenomena to the same process on the lattice. In some sense, trees lie halfway between a lattice and a mean-field model, since there is often some independence when we look down the tree from a given generation, if it is well-defined to use such language.

Anyway, first we consider percolation on an infinite regular rooted k-ary tree. This means we have a root, which has k children, each of which in turn has k children, and so on. As before we consider bond percolation with parameter p. In this setting, we have a language to describe the resulting open component of the root. The offspring distribution of any vertex in the open component is given by Bin(k,p) independently of everything else, so we can view this component as the realisation of a Galton-Watson tree with this offspring distribution. This distribution has finite mean kp, and so we can state explicitly when the survival probability is positive. This happens when the mean is greater than 1, ie p>1/k.

For our actual example, we will consider the survival probability, but the technicalities are easier to explain if we look at the extinction probability, now using the language of branching processes. Suppose the offspring distribution has pgf given by

$f(x)=p_0+p_1x+p_2x^2+\ldots.$

Then the extinction probability q satisfies f(q)=q. I want to pause to consider what happens if this equation has multiple solutions. Indeed, in most interesting cases it will have multiple solutions, since f(1) will always be 1 if it is a non-defective offspring distribution. It is typically cited that: the extinction probability q is the smallest solution to this equation. I want to discuss why that is the case.

To approach this, we have to consider what extinction means. It is the limit in the event sense of the events {we are extinct after n generations}. Let the probabilities of these events be $q_n$, so $q_0=0$. Then by a straightforward coupling argument, we must have

$0=q_0\le q_1\le q_2 \le\ldots\le q:= \lim q_n \le 1.$

But, by the same generating function argument as before, $q_{n+1}=f(q_n)\ge q_n$. So if we split [0,1] into regions A where $f(x)\ge x$ and B where $f(x), all the $(q_n)$s must occur in the former, and so since it is closed, their limit must be in A also. Note that if f(x) intersects x lots of times, then region A is not necessarily connected. In the diagram below, in moving from $q_n$ to $q_{n+1}$ we might jump across part of B.

This is bad, as we are trying to prove that q is the right boundary of the connected component of A containing 0. But this cannot happen, as f is monotonic. So if one of the roots of f(x)=x in between the hypothesised $q_n is called z, then $f(q_n)< f(z)=z < q_{n+1}$, a contradiction.

Ok, so now we are ready to consider our counterexample to continuity over the percolation threshold. See references for a link to the original source of this example. We have to choose a slightly more complicated event than mere survival or extinction. We consider bond percolation as before on the infinite ternary tree, where every vertex has precisely 3 offspring. Our percolation event is now that the root is the root of an infinite binary tree. That is, the root has at least two children, each of which have at least two children, each of which, and so on.

If we set this probability equal to q, and the probability of an edge being open equal to p, then we have the recurrence:

$q=3p^2(1-p)q^2+p^3[3q^2(1-q)+q^3].$

The first term corresponds to the root having two open edges to offspring, and the second to the root having all three open edges to offspring. After manipulating, we end up with

$q\left[2p^3q^2-3p^2q+1\right]=0.$

We are therefore interested in roots of the quadratic lying between 0 and 1. The discriminant can be evaluated as

$\Delta=p^3(9p-8),$

and so there are no real roots where p<8/9. But when p=8/9, we have a repeated root at q=27/32, which is obviously not zero!

This equation is qualitatively different to the previous one for the extinction probability of a Galton-Watson tree. There, we had a quadratic, with one root at 1. As we varied p, the other root moved continuously from greater than one to less than one, so it passed through 1, giving continuity at the critical probability. Here, we have a cubic, again with one root at 1. But now the other roots are complex for small p, meaning that the local minimum of the cubic lies above the x-axis. As p gets to the critical value, it the local minimum passes below the x-axis, and suddenly we have a repeated root, not at zero.

I would like to have a neat probabilistic heuristic for this result, without having to make reference to generating functions. At the moment, the best I can come up with is to say that the original problem is simple, in the sense that the critical probability is as small as it could be while still making sense in expectation. To be concrete, when the mean of the offspring generation is less than 1, the expected size of the nth generation tends to zero, so there certainly could not be positive probability of having an infinite component.

Whereas in the binary tree example, we only require p=2/3 to have, in expectation, the right number of open edges to theoretically allow an infinite binary tree. If we think of percolation as a dynamic process by coupling in p, essentially as we move from p=2/3 to p=8/9 we need to add enough edges near the origin to be able to take advantage of the high density of edges available far from the origin. The probability of this working given you start from n vertices grows much faster (as n grows) than in the original problem, so you might expect a faster transition.

This is so content-free I’m reluctant even to call it a heuristic. I would be very interested to hear of any more convincing argument for this phenomenon!

REFERENCES

Dekking, Pakes – On family trees and subtrees of simple branching processes (link)

Recent Research Activity

I’ve spent this week in Luminy, near Marseille, attending a summer school run by ALEA, the organisation of French probabilists. We’ve been staying in CIRM, a dedicated maths research conference centre at the edges of the calanques, the area of mountains and jagged coastal inlets between Marseille and Cassis. The walking possibilities have been excellent, as have the courses and lectures, on a range of topics in probability theory.

Anyway, the time here has been an excellent moment to reflect on my research progress, and try to come up with the sort of fresh ideas that are perhaps slightly inhibited by sitting at a desk with an endless supply of paper on which to try calculations. When I get back, I have to submit a first-year report, so at least for a little while I will have to suppress the desire to make further progress and instead diligently assemble the progress I have made.

The Model

I’ve defined some of these processes in past posts, but I see no harm in doing so again. We take the standard Erdos-Renyi random graph process, where edges are added one-at-a-time uniformly at random between n vertices, and amend it by adding a deletion mechanism. The aim is to arrive at a process which looks in equilibrium more like the critical random graph than either the subcritical or supercritical regimes, where the components are very small, and dominated by one giant component respectively. Rath, Toth and others have studied the process where each vertex is hit by lightning at uniform rate. When this happens, we delete all the edges in the component containing that vertex. Naturally, big components will be hit by lightning more often than small components, and so this acts as a mechanism to prevent the formation of giant components, if scaled correctly.

We take a different approach. We observe that criticality in the original random graph process is denoted by the first appearance of a giant component, but also by the first appearance of a) lots of cycles, and b) large cycles. In particular, it is very unlikely that a giant component could form without containing any cycles. We will therefore use the appearance of a cycle to trigger some form of deletion mechanism.

Our final goal is to treat the so-called ‘Cycle Deletion’ model. Here, whenever a cycle appears, we delete all the edges in that cycle immediately. There are several challenges in treating this model, because the rate at which cycles emerge in a tree is a function of the tree structure. The trees in this model will not be Uniform Spanning Trees (though it is very possible that they will be ‘almost USTs’ in some sense – we need to investigate this further) so it will be hard to make nice statements about the rates. For the standard random graph process, if we are only interested in the sizes of the components, we are actually allowed to ignore the graph structure entirely. The component sizes evolve as a discrete, stochastic version of the multiplicative coalescent (sometimes called a Marcus-Lushnikov process). We would like a deletion mechanism that has a nice interpretation as a fragmentation operation in the same sense. The rate at which a component fragments will be quadratic in the size of the component, since there are $O(k^2)$ possible edges between k vertices forming a component, and adding any of precisely these will create a cycle.

I’ve talked previously about how to overcome the problems with the tree structure in Cycle Deletion with the so-called Uniform Cycle Deleting model. In any case, as a starting point we might consider the Cycle-Induced Forest Fire model. Here, whenever a cycle appears, we delete all the edges, including the new one, in the whole component which contains the cycle.

We suspect this model may resemble the critical random graph at all times. The main characteristic of G(n,1/n) is that the largest component is of size O(n^2/3), and indeed there are arbitrarily many components of this size, with high probability in the limit. Since CIFF is recurrent for any fixed n, meaning that it will visit any state infinitely often (rather than tending to infinity or similar), we should ask what the largest component is typically in the equilibrium distribution. Our aim is to prove that it is O(n^2/3). We might suspect that the typical size of the largest component will be greater in the Cycle Deletion model, since each fragmentation event is less severe there, removing fewer edges.

An Upper Bound

The nice thing about Markov chains is that they have an ergodic property, which means that if you run them for long enough, the proportion of time spent in any state is given by the stationary probability of being in that state. It doesn’t matter whether or not you start in equilibrium, since it will converge anyway. Thus it is meaningful to talk about properties like the average number of isolated vertices as a time-average as well as an average with respect to some distribution.

This quantity is the key to an upper bound. We can equally talk about the average change in the number of isolated vertices in a time-step. This will increase when a component fragments, and will decrease when an isolated vertex coalesces with another component. In particular, the largest possible decrease in the number of isolated vertices in a single time-step is 2, corresponding to an edge appearing between two isolated vertices.

Suppose that with probability $\Theta(1)$ there is a component of size $n^\alpha$ for some $\alpha>2/3$. Then such a component makes a contribution to the expected change in the number of isolated vertices of

$\Theta(1) n^\alpha \left(\frac{n^\alpha}{n}\right)^2.$ (*)

Where does this come from? Well, we are tracking the contributions from the event that the largest component is of this size and that it fragments, giving $n^\alpha$ new isolated vertices. So the $\Theta(1)$ accounts for the probability that there is such a component to begin with. Then, conditional on that, the probability that it gets fragmented in the next time-step is the probability that both ends of the next edge added lie in that component. Since the edge is chosen uniformly at random, the probability of this is $n^\alpha/n$. Note that this is under a slightly odd definition of an edge, that allows loops. Basically, I don’t want to have lots of correction terms involving $\binom{n}{2}$ floating around. However, it would make no difference to the orders of magnitude if we to do it with these.

So, this is only one contribution to the typical rate of gain of isolated vertices. Now note that if $\alpha>2/3$, then this expression is >> 1. This is bad since the negative contributions to this expected flux in the number of isolated vertices is O(1). So this suggests that over time, the number of isolated vertices will keep growing. This is obviously ridiculous since a) we are in equilibrium, so the expected flux should be 0 and b) the number of isolated vertices cannot exceed n, for clear reasons.

This gives us an upper bound of n^2/3 as the typical scale of the largest component. We can come up with a similar argument for the cycle deleting model. The most helpful thing to track there is the number of edges in the graph. Note that since the graph is at all times a forest on n vertices, the number of edges is equal to n minus the number of (tree) components. We use the fact that the typical fragmentation of a component of size k creates $O(\sqrt{k})$ new components. It is possible to argue via isolated vertices here too, but the estimates are harder, or at least less present in the literature.

Lower Bounds?

The problem with lower bounds is that it is entirely possible that the flux in the number of isolated vertices is not driven by typical behaviour. Suppose for example we had a different rule. We begin a random graph process, and the first time we see a cycle in a component with size larger than n^2/3, we delete all the edges in the whole graph. Then we will see a sequence of random graph processes starting with the empty graph and stopped at some point close to criticality (in fact, with high probability in the *critical window*), and these will all be glued together. So then, most of the time the process will look subcritical, but the gains in isolated vertices will occur only during the critical periods, which are only an asymptotically small proportion of the time.

At the moment, my approach to the lower bound is instead to prove that the upper bound is tight. I mean this in the following sense. Suppose we wanted to be sure that (*) was in fact equal to the average rate of gain of isolated vertices. We would have to check the following:

• That the total contributions from all other components were similar or smaller than from the component(s) of size roughly $n^{\alpha}$.
• That there were only a few components of size $n^{\alpha}$. In particular, the estimate would be wrong if there were $n^\epsilon$ such components for any $\epsilon>0$.
• That it cannot be the case that for example, some small proportion of the time there is a component of size roughly $n^{\alpha+\epsilon}$, and over a large enough time these make a greater contribution to the average gain in isolated vertices.

A nice way to re-interpret this is to consider some special vertex and track the size of its component in time. It will be involved in repeated fragmentations over the course of time, so it is meaningful to talk about the distribution of the size of the component containing the vertex when it is fragmented. Our aim is to show that this distribution is concentrated on the scaling $O(n^\alpha)$.

So this has turned out to be fairly hard. Rather than try to explain some of the ideas I’ve employed in attempting to overcome this, I will finish by giving one reason why it is hard.

We have seen that the component sizes in random graphs evolve as the multiplicative coalescent, but at a fixed moment in time, we can derive good estimates from an analogy with branching processes. We might like to do that here. If we know what the system looks like most of the time, we might try to ‘grow’ a multiplicative coalescent, viewing it like a branching process, with distribution given by the typical distribution. The problem is that when I do this, I find that the expectation of the offspring distribution is $\Theta(1)$. This looks fine, since 1 is the threshold for extinction with probability 1. However, throughout the analysis, I have only been paying attention to the exponent of n in all the time and size estimates. For example, I view $n^\alpha$ and $n^\alpha \log n$ as the same. This is a problem, as when I say the expectation is $\Theta(1)$, I am really saying it is $\sim n^0$. This means it could be $\frac{1}{\log n}$ or $\log n$. Of course, there is a massive difference between these, since a branching process grows expectationally!

So, this approach appears doomed in its current form. I have some other ideas, but a bit more background may be required before going into those. I’m going to be rather busy with teaching on my return to the office, so unfortunately it is possible that there may be many posts about second year probability and third year applied probability before anything more about CIFF.

The Contour Process

As I explained in my previous post, I haven’t been reading around as much as I would generally like to recently. A few days in London staying with my parents and catching up with some friends has therefore been a good chance to get back into the habit of leafing through papers and Pitman’s book among other things.

This morning’s post should be a relatively short one. I’m going to define the contour process, a function of a (random or deterministic) tree, related to the exploration process which I have mentioned a few times previously. I will then use this to prove a simple but cute result equating in distribution the sizes of two different branching processes via a direct bijection.

The Contour Process

To start with, we have to have a root, and from that root we label the tree with a depth-first labelling. An example of this is given below. It is helpful at this stage to conceive this process as an explorer walking on the tree, and turning back on themselves only when there is no option to visit a vertex they haven’t already seen. So in the example tree shown, the depth-first exploration visits vertex V_2 exactly four times. Note that with this description, it is clear that the exploration traverses every edge exactly twice, and so the length of the sequence is 2n-1, where n is the number of vertices in the tree since obviously, we start and end at the root.

Another common interpretation of this depth-first exploration is to take some planar realisation of the tree. (Note trees are always planar – proof via induction after removing a leaf.) Then if you treat the tree as a hedge and starting at the root walk along, following the outer boundary with your right hand, this exactly recreates the process.

The height of a tree at a particular vertex is simply the graph distance between that vertex and the root. So when we move from one vertex to an adjacent vertex, the height must increase or decrease by 1.

The contour process is the sequence of heights seen along the depth-first exploration. It is therefore a sequence:

$0=h_0,h_1,\ldots,h_{2n-1}=0,\quad h_i\geq 0,$

and such that $|h_{i+1}-h_i|=1$.

Note that though the contour process uniquely determines the tree structure, the choice of depth-first labelling is a priori non-canonical. For example, in the display above, V_3 might have been explored before V_2. Normally this is resolved by taking the suitable vertex with the smallest label in the original tree to be next. It makes little difference to any analysis to choose the ordering of descendents of some vertex in a depth-first labelling randomly. Note that this explains why it is rather hard to recover Cayley’s theorem about the number of rooted trees on n vertices from this characterisation. Although the number of suitable contour functions is possible to calculate, we would require a complicated multiplicative correction for labelling if we wanted to recover the number of trees.

The only real observation about the uses of the contour process at this stage is that it is not in general a random walk with IID increments for a Galton-Watson branching process. This equivalence is what made the exploration process so useful. In particular, it made it straightforward, at least heuristically, to see why large trees might have a limit interpretation through Brownian excursions. If for example, the offspring distribution is bounded above, say by M, then the contour process certainly cannot be a random walk, as if we have visited a particular vertex exactly M+1 times, then it cannot have another descendent, and so we must return closer to the root at the next step.

I want to mention that in fact Aldous showed his results on scaling limits towards the Continuum Random Tree through the contour process rather than the exploration process. However, I don’t want to say any more about that right now.

A Neat Equivalence

What I do want to talk about is the following distribution on the positive integers. This comes up in Balazs Rath and Balint Toth’s work on forest-fires on the complete graph that I have been reading about recently. The role of this distribution is a conjectured equilibrium distribution for component size in a version of the Erdos-Renyi process where components are deleted (or ‘struck by lightning’) at a rate tuned so that giant components ‘just’ never emerge.

This distribution has the possibly useful property that it is the distribution of the total population size in a Galton-Watson process with Geom(1/2) offspring distribution. It is also the distribution of the total number of leaves in a critical binary branching process, where every vertex has either two descendents or zero descendents, each with probability 1/2. Note that both of these tree processes are critical, as the expected number of offspring is 1 in each case. This is a good start, as it suggests that the relevant equilibrium distribution should also have the power-law tail that is found in these critical branching processes. This would confirm that the forest-fire model exhibits self-organised criticality.

Anyway, as a sanity check, I tried to find a reason why, ignoring the forest-fires for now, these two distributions should be the same. One can argue using generating functions, but there is also the following nice bijective argument.

We focus first on the critical Geometric branching process. We examine its contour function. As explained above, the contour process is not in general a random walk with IID increments. However, for this particular case, it is. The geometric distribution should be viewed as the family of discrete memoryless distributions.

This is useful for the contour process. Note that if we are at vertex V for the (m+1)th time, that is we have already explored m of the edges out of V, then the probability that there is at least one further edge is 1/2, independently of the history of the exploration, as the offspring distribution is Geometric(1/2), which we can easily think of as adding edges one at a time based on independent fair coin tosses until we see a tail for example. The contour process for this random tree is therefore a simple symmetric random walk on Z. Note that this will hit -1 at some point, and the associated contour process is the RW up to the final time it hits 0 before hitting -1. We can check that this obeys the clear rule that with probability 1/2 the tree is a single vertex.

Now we consider the other model, the Galton-Watson process with critical binary branching mechanism. We should consider the exploration process. Recall that the increments in this process are given by the offspring distribution minus one. So this random sequence also behaves as a simple symmetric random walk on Z, again stopped when we hit -1.

To complete the bijective argument, we have to relate leaves in the binary process to vertices in the geometric one. A vertex is a leaf if it has no offspring, so the number of leaves is the number of times before the hitting time of -1 that the exploration process decreases by 1. (*)

Similarly for the contour process. Note that there is bijection between the set of vertices that aren’t the root and the set of edges. The contour process explores every edge exactly twice, once giving an increase of 1 and once giving a decrease of 1. So there is a bijection between the times that the contour process decreases by 1 and the non-root vertices. But the contour process was defined only up to the time we return to the root. This is fine if we know in advance how large the tree is, but we don’t know which return to the root is the final return to the root. So if we extend the random walk to the first time it hits -1, the portion up until the last increment is the contour process, and the final increment must be a decrease by 1, hence there is a bijection between the number of vertices in the Geom(1/2) G-W tree and the number of times that the contour process decreases by 1 before the hitting time of -1. Comparing with (*) gives the result.

Recent Progress and Gromov-Hausdorff Convergence

For the past few weeks I’ve been working on the problem of Cycle-Induced Forest Fires, which I’ve referred to in passing in some recent posts. The aim has been to find a non-contrived process which exhibits self-organised criticality, that is, where the process displays critical characteristics (scaling laws, multiple components at the largest order of magnitude) forever. Note that this is in contrast to the conventional Erdos-Renyi graph process, which is only critical at a single time n/2.

The conjecture is that the largest component in equilibrium typically has size on a scale of n^2/3. An argument based on the equilibrium proportion of isolated vertices gives an upper bound on this exponent. The working argument I have for the lower bound at the moment can comfortably fit on the back of a napkin, with perhaps some context provided verbally. Of course, the current full text is very much larger than that, mainly because the napkin would feature assertions like “event A happens at time $O(n^\beta)$“; whereas the more formal argument has to go like:

“With high probability as $n\rightarrow\infty$, event A happens between times $n^{\beta-\epsilon},n^{\beta+\epsilon},$ for any suitably small $\epsilon>0.$ Furthermore, the probability that A happens after this upper threshold decays exponentially with n for fixed $\epsilon$, and the probability that A happens before the lower threshold is at most $n^{-\epsilon}$. Finally, this is under the implicit assumption that there will be no fragmentations before event A, and this holds with probability $1-o(1)$ etc.”

It’s got to the point where I’ve exhausted the canonical set of symbols for small quantities: $\epsilon,\delta,(\eta ?)$.

This has been a very long way of setting up what was going to be my main point, which is that at many points during undergraduate mathematics, colleagues (and occasionally to be honest, probably myself too) have complained that they “don’t want to have anything to do with analysis. They just want to focus on algebra / number theory / statistics / fluids…” Anyway, the point of this ramble was that I think I’ve realised that it is very hard to think about any sort of open problem without engaging with the sort of ideas that a few years ago I would have thought of (and possibly dismissed) as ‘analysis’.

Much of my working on this problem has been rather from first principles, so haven’t been thinking much about any neat less elementary theory recently.

Ok, so on with the actual post now.

Last month I talked about local limits of graphs, which describe convergence in distribution of (local) neighbourhood structure about a ‘typical’ vertex. This is the correct context in which to make claims like “components of $G(n,\frac{\lambda}{n})$ look like Galton-Watson trees with offspring distribution $\text{Po}(\lambda)$“.

Even from this example, we can see a couple of drawbacks and omissions from this limiting picture. In the sub-critical regime, this G-W tree will be almost surely finite, but the number of vertices in the graph is going to infinity. More concretely, the limit description only tells us about a single component. If we wanted to know about a second component, in this case, it would be roughly independent of the size of the first component, and with the same distribution, but if we wanted to know about all components, it would get much more complicated.

Similarly, this local limit description isn’t particularly satisfactory in the supercritical regime. When the component in question is finite, this description is correct, but with high probability we have a giant component, and so the ‘typical’ vertex is with some positive probability in the giant component. This is reflected by the fact that the G-W tree with supercritical offspring distribution is infinite with some positive probability. However, the giant component does not look like a $\text{Po}(\lambda)$ G-W tree. As we exhaust O(n) vertices, the offspring distribution decreases, in expectation at least. In fact, without the assumption that the giant component is with high probability unique (so $\frac{L_1}{n}=1-\mathbb{P}(|C(v)|<\infty$), we can’t even deduce the expected size of the giant component from the local limit result.

This is all unsurprising. By definition a local limit describes the structure near some vertex. How near? Well, finitely near. It can be arbitrarily large, but still finite, so in particular, the change in the offspring distribution after O(n) vertices as mentioned above will not be covered.

So, if we want to learn more about the global structure of a large discrete object, we need to consider a different type of limit. In particular, the limit will not necessarily be a graph. Rather than try to define a priori a ‘continuum’ version of a graph, it is sensible to generalise from the idea that a graph is a discrete object and instead consider it as a metric space.

In this article, I don’t want to spend much time at all thinking about how to encode a finite graph as a metric space. We have a natural notion of graph distance between vertices, and it is not hard to extend this to points on edges. Alternatively, for sparse graphs, we have an encoding through various functions, which live in some (metric) function space.

However, in general, the graph will be a metric object itself, rather than necessarily a subset of a global metric space. We will be interested in convergence, so we need a suitable style of convergence of different metric spaces.

The natural candidate for this is the Gromov-Hausdorff metric, and the corresponding Gromov-Hausdorff convergence.

The Hausdorff distance between two subsets X, Y of a metric space is defined as follows. Informally, we say that $d_H(X,Y)<\epsilon$ if any point of X is within distance $\epsilon$ from some point of Y, in the sense of the original metric. Formally

$d_H(X,Y):=\max \{\sup_{x\in X}\inf_{y\in Y}d(x,y), \sup_{y\in Y}\inf_{x\in X}d(x,y)\}.$

It is not particularly illuminating to prove that this is in fact a metric. In fact, it isn’t a metric as the definition stands, but rather a pseudo-metric, which is exactly the same, only allowing d(X,Y)=0 when X and Y are not equal. Note that

$d(X^\circ,\bar X)=0,$

for any set X, so this gives an example, provided X is not both open and closed. Furthermore, if the underlying metric space is unbounded, then the Hausdorff distance between two sets might be infinite. For example in $\mathbb{R}$,

$d_H(\mathbb{R}_{<0},\mathbb{R}_{>0})=\infty.$

We can overcome this pair of objections by restricting attention to closed, bounded sets. In practice, many spaces under consideration will be real in flavour, so it makes sense to define this for compact sets when appropriate.

But this still leaves the underlying problem, which is how to define a distance function on metric spaces. If two metric spaces X and Y were both subspaces of some larger metric space then it would be easy, as we now have the Hausdorff distance. So this is in fact how we proceed in general. We don’t need any knowledge of this covering space a priori, we can just choose the one which minimises the resulting Hausdorff distance. That is

$d_{GH}(X,Y)=\inf\{d_H(\phi(X),\psi(Y))\},$

where the infimum is taken over all metric spaces (E,d), and isometric embeddings $\phi: X\rightarrow E, \psi: Y\rightarrow E$.

The first observation is that this will again be a pseudometric unless we demand that X, Y be closed and bounded. The second is that this index set is not a set. Fortunately, this is quickly rectified. Instead consider all metrics on the disjoint union of sets X and Y, which is set, and contains the subset of those metrics which restrict to the correct metric on each of X and Y. It can be checked that this forms a true metric on the set of compact metric spaces up to isometry.

We have an alternative characterisation. Given compact sets X and Y, a correspondence between X and Y is a set of pairs in $X\times Y$, such that both projection maps are surjective. Ie for any x in X, there is some pair (x,y) in the correspondence. Let $\mathcal{C}(X,Y)$ be the set of such correspondences. We then define the distortion of correspondence $\mathcal{R}$ by:

$\text{dis}(\mathcal{R}):=\sup\{|d_X(x_1,x_2)-d_2(y_1,y_2)|: (x_i,y_i)\in\mathcal{R}\}.$

Then

$d_{GH}(X,Y)=\frac{1}{2}\inf_{\mathcal{R}\in\mathcal{C}(X,Y)}\text{dis}(\mathcal{R}).$

In particular, this gives another reason why we don’t have to worry about taking an infimum over a proper class.

Gromov-Hausdorff convergence then has the natural definition. Note that this does not respect topological equivalence, ie homeomorphism. For example,

$\bar{B(0,\frac{1}{n})}\stackrel{GH}{\rightarrow} \{0\},$

where the latter has the trivial metric. In particular, although all the closed balls are homeomorphic, the G-H limit is not.

A final remark is that the trees we might be looking at are not necessarily compact, so it is useful to have a notion of how this might be extended to non-compact spaces. The answer is to borrow the idea from local limits of considering large finite balls around a fixed central point. In the case of trees, this is particularly well-motivated, as it is often quite natural to have a canonical choice for the ‘root’.

Then with identified points $p_n\in X_n$, say $(X_n,p_n)\rightarrow (X,p)$ if for any R>0 the R-ball around p_n in X_n converges to the R-ball around p in X. We adjust the definition of distortion to include the condition that the infimum be over correspondences for which $(p_X,p_Y)$ is an element.

REFERENCES

This article was based on some lecture notes by Jean-Francois Le Gall from the Clay Institute Summer School which can be found on the author’s website here (about halfway down the page). This material is in chapter 3. I also used Nicolas Curien’s tutorials on this chapter to inform some of the examples. The resolution of the proper class problem was mentioned by several sources I examined. These notes by Jan Christina were among the best.