# Lecture 9 – Inhomogeneous random graphs

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

As we enter the final stages of the semester, I want to discuss some extensions to the standard Erdos-Renyi random graph which has been the focus of most of the course so far. In doing so, we can revisit material that we have already covered, and discover how easily one can extend this directly to more exotic settings.

The focus of this lecture was the model of inhomogeneous random graphs (IRGs) introduced by Soderberg [Sod02] and first studied rigorously by Bollobas, Janson and Riordan [BJR07]. Soderberg and this blog post address the case where vertices have a type drawn from a finite set. BJR address the setting with more general typespaces, in particular a continuum of types. This generalisation is essential if one wants to use IRGs to model effects more sophisticated than those of the classical Erdos-Renyi model G(n,c/n), but most of the methodology is present in the finite-type setting, and avoids the operator theory language which is perhaps intimidating for a first-time reader.

Inhomogeneous random graphs

Throughout, $k\ge 2$ is fixed. A graph with k types is a graph G=(V,E) together with a type function $V\to \{1,\ldots,k\}$. We will refer to a $k\times k$ symmetric matrix with non-negative entries as a kernel.

Given $n\in\mathbb{N}$ and a vector $p=(p_1,\ldots,p_k)\in\mathbb{N}_0^k$ satisfying $\sum p_i=n$, and $\kappa$ a kernel, we define the inhomogeneous random graph $G^n(p,\kappa)$ with k types as:

• the vertex set is [n],
• types are assigned uniformly at random to the vertices such that exactly $p_i$ vertices have type i.
• Conditional on these types, each edge $v\leftrightarrow w$ (for $v\ne w\in [n]$) is present, independently, with probability

$1 - \exp\left(-\frac{\kappa_{\mathrm{type}(v),\mathrm{type}(w)} }{n} \right).$

Notes on the definition:

• Alternatively, we could assign the types so that vertices $\{1,\ldots,p_1\}$ have type 1, $\{p_1+1,\ldots,p_1+p_2\}$ have type 2, etc etc. This makes no difference except in terms of the notation we have to use if we want to use exchangeability arguments later.
• An alternative model considers some distribution $\pi$ on [k], and assigns the types of the vertices of [n] in an IID fashion according to $\pi$. Essentially all the same results hold for these two models. (For example, this model with ‘random types’ can be studied by quenching the number of each type!) Often one works with whichever model seems easier for a given proof.
• Note that the edge probability given is $\approx \frac{\kappa_{\mathrm{type}(v),\mathrm{type}(w)}}{n}$. The exponential form has a more natural interpretation if we ever need to turn the IRGs into a process. Additionally, it avoids the requirement to treat small values of n (for which, a priori, $k/n$ might be greater than 1) separately.

In the above example, one can see that, roughly speaking, red vertices are more likely to be connected to each other than blue vertices. However, for both colours, they are more likely to be connected to a given vertex of the same colour than a vertex of the opposite colour. This might, for example, correspond to the kernel $\begin{pmatrix}3&1\\1&2\end{pmatrix}$.

The definition given above corresponds to a sparse setting, where the typical vertex degrees are $\Theta(1)$. Obviously, one can set up an inhomogeneous random graph in a dense regime by an identical argument.

From an applications point of view, it’s not hard to imagine that an IRG of some flavour might be a good model for many phenomena observed in reality, especially when a mean-field assumption is somewhat appropriate. The friendships of boys and girls in primary school seems a particularly resonant example, though doubtless there are many others.

One particular application is to recover the types of the vertices from the topology of the graph. That is, if you see the above picture without the colours, can you work out which vertices are red, and which are blue? (Assuming you know the kernel.) This is clearly impossible to do with anything like certainty in the sparse setting – how does one decide about isolated vertices, for example? The probabilities that a red vertex is isolated and that a blue vertex is isolated differ by a constant factor in the $n\rightarrow\infty$ limit. But in the dense setting, one can achieve this with high confidence. When studying such statistical questions, these IRGs are often referred to as stochastic block models, and the recent survey of Abbe [Abbe] gives a very rich history of this type of problem in this setting.

Poisson multitype branching processes

As in the case of the classical random graph G(n,c/n), we learn a lot about the IRG by studying its local structure. Let’s assume from now on that we are given a sequence of IRGs $G^n(p^n,\kappa)$ for which $\frac{p^n}{n}\rightarrow \pi$, where $\pi=(\pi_1,\ldots,\pi_k)\in[0,1]^k$ satisfies $||\pi||_1=1$.

Now, let $v^n$ be a uniformly-chosen vertex in [n]. Clearly $\mathrm{type}(v^n)\stackrel{d}\rightarrow \pi$, with the immediate mild notation abuse of viewing $\pi$ as a probability distribution on [k].

Then, conditional on $\mathrm{type}(v^n)=i$:

• when $j\ne i$, the number of type j neighbours of $v^n$ is distributed as $\mathrm{Bin}\left(p_j,1-\exp\left(-\frac{\kappa_{i,j}}{n}\right)\right)$.
• the number of type i neighbours of $v^n$ is distributed as $\mathrm{Bin}\left( p_i-1,1-\exp\left(-\frac{\kappa_{i,i}}{n}\right)\right)$.

Note that $p_j\left[1-\exp\left(-\frac{\kappa_{i,j}}{n}\right)\right]\approx \frac{p_j\cdot \kappa_{i,j}}{n} \approx \kappa_{i,j}\pi_j$, and similarly in the case j=i, so in both cases, the number of neighbours of type j is distributed approximately as $\mathrm{Poisson}(\kappa_{i,j}\pi_j)$.

This motivates the following definition of a branching process tree, whose vertices have k types. Continue reading

# Duality for Interacting Particle Systems

Yesterday I introduced the notion of duality for two stochastic processes. My two goals for this post are to elaborate on the idea of why duality is useful, which we touched on in passing in the previous part, and to discuss duality of interacting particle systems. In the latter case, there are often nice ways to consider the forward and backward processes together that make the relation somewhat more natural.

The starting point is to assume a finite state space. This will be reasonable when we start to consider interacting particle systems, eg on $\{0,1\}^{[n]}$. As before, call the spaces R and S, and a duality function H(x,y). Since the state-spaces are finite, it is entirely natural to think of this as a matrix, and hence as an operator. Of course, a function defined on a finite state-space can be thought of as a vector, so it is clear what this operator will actually operate on. (I’ve chosen H rather than h for the duality function so it is more clear that it is acting as an operator here.)

We have some choice about which way round to define it, but for now let’s say that given some function f(.) on S

$Hf(x):=\sum_{y\in S} H(x,y)f(y).$

Note that this is a) exactly the definition of matrix (left-)multiplication; b) We should think of Hf as a function on R – perhaps (Hf)(x) might be more clear? and c) the operator H acts $\mathbb{R}^S\rightarrow \mathbb{R}^R$. If we want the corresponding operator $\mathbb{R}^R\rightarrow\mathbb{R}^S$, we simply multiply by H on the right instead.

But note also that the generator of a finite state-space Markov process is also a matrix, indeed a Q-matrix. So if we take our definition of the duality function as

$\mathcal{G}_X h(x,y)=\mathcal{G}_Y h(x,y),$

which, importantly, holds for all x,y, we can convert this into an algebraic form as

$\mathcal{G}_X H = H \mathcal{G}_Y^\dagger.$

In the same way that n-step transition probabilities for a discrete-time Markov chain are given by the product of the one-step transition matrix, general time transition probabilities for a continuous-time Markov chain are given by exponents of the Q-matrix. In particular, if X and Y have transition kernels P and Q respectively, then $P_t=e^{tG_X}$, and after doing some manipulation, we can show that

$P_t H=H Q_t^\dagger,$

also. This is really useful as in general we would hope that H might be invertible, from which we derive

$P_t=HQ_t^\dagger H^{-1}.$

So this is a powerful statement about the relationship between the evolutions of the two processes. In particular, it shows a correspondence (given by H) between left eigenvectors of P, and right eigenvectors of Q, and vice versa naturally.

The reason why this is useful rather than merely cute, is that when we re-interpret everything in terms of the original stochastic processes, we get a map between stationary distributions of X, and harmonic functions of Y. Stationary distributions are often hard to describe in any terms other than the left-1-eigenvector, or through some convergence property that is typically hard to work with. Harmonic functions, on the other hard, can be much more tractable. An example of a harmonic function is the survival probability started from a given state. This is useful for specifying the stationary distribution, but perhaps even more so for describing properties of the set of stationary distributions. In particular, uniqueness and existence are carried across this equivalence. So, for example, if the dual does not survive almost surely, then this says the only stationary measure is zero, and so the process is transient or similar.

Jan Swart’s course in Luminy last October dealt with duality, with a focus mainly on interacting particle systems. There are a couple of themes I want to talk about, without going into too much detail.

A typical interacting particle system will take place on a locally finite graph. At each vertex, there is either a particle, or there isn’t. Particles move between adjacent vertices, and sometimes interact with particles at adjacent vertices. These interactions might involve branching or coalescence. We will discuss shortly the set of possible forms such interaction might take. The state space is $\{0,1\}^{V(G)}$, with G the underlying graph. Then given a state, there is some set of actions which might happen next, and we consider the possibility that they happen with exponential rates.

At this stage, it seems like the initial configuration is important, as this affects what set of moves can happen immediately, and also thereafter. It is not clear how quickly this dependence fades. One useful idea is not to restrict ourselves to interactions involving the particles currently present in the system, but instead to consider a Poisson process of all possible interactions. Only the moves actually permitted by the current state will happen, but having this extra information allows for coupling between initial configurations.

It’s probably easier to consider a concrete example. The picture below shows the set-up for a branching random walk up an integer lattice. Each particle moves to one of the two state directly above its current state, or it branches and sends particles to both of them.In the diagram, we have glued arrows onto every state at every time, which tells us what to do if there is a particle there at each time. As a coupling, we can now think of the process as a deterministic walk through a random environment. The environment is given by some probability space, which in continuous time might have the appearance of a Poisson process on the set of ‘moves’, and the initial condition of the walk is up to us.

We can generalise this to a broader class of interacting particle systems. If we want all interactions to be between pairs of adjacent states, there are six possible things which could happen:

• Annihilation: two adjacent particles destroy each other. ( 11 -> 00 )
• Branching: one particle becomes two particles. ( 01 or 10 -> 11 )
• Coalescence: two particles merge. ( 11 -> 01 or 10 )
• Death: A particle is removed. ( 01 or 10 -> 00 )
• Exclusion: a particle moves. ( 01 -> 10 )
• Birth: a particle is created. ( 00 -> 01 or 10 )

For now we exclude the possibility of birth. Note that the way we have set this up involving two-site interactions excludes the possibility of a particle trying to move to an already-occupied site.

Let us say that in process X the rates at which each of these events happen are a, b, c, d and e, taking advantage of the helpful choice of naming. There is some flexibility about whether the rates are the same between every pair of vertices of note. For this post we assume that they are. Then it is a result of Lloyd and Sudbury that given some real $q\neq 1$, the process X’ with corresponding rates given by:

$a'=a+2q\gamma, b'=b+\gamma, c'=c-(1+q)\gamma, d'=d+\gamma, e'=e-\gamma,$

for $\gamma:= \frac{a+c-d+qb}{1-q},$

is dual to X, with duality function given by $h(Y,Z)=q^{|Y\cap Z|}$, for Y and Z possible states.

I want to make two comments:

1) This illustrates one of the differences between the dual and the time-reversal. It is clear that the time-reversal of branching is coalescence and vice versa, and exclusion is invariant under time-reversal. But the time-reversal of death is definitely birth, but there is no birth component in the dual of a process which features death. I don’t have a strong intuition for why this is the case, but see the final paragraph of this post. However, at least it seems plausible that both processes might simultaneously be recurrent, since in the dual, both the branching rate and the death rate have increased by the same amount.

2) This settles one problem of uniqueness of the dual that I mentioned last time, since we can vary q and get a different dual to the same original process. For example, in the voter model, we have b=d=1, and a=c=e=0, as in any update, the opinions of neighbours which were previously different become the same. Anyway, for any $q\in[-1,0]$ there is a choice of dual, where at the extremes q=0 corresponds to coalescing random walk, and q=-1 to annihilating random walk. (Note that the duality function for q=0 is the indicator function that the systems are different.)

As a final observation without much justification, suppose we add in arrows in the gaps of the branching random walk picture we had earlier, and direct them in the opposite direction. It turns out that this corresponds precisely to the dual of the process. This provides an appealing visual idea of why the dual of branching might be death. It also supports the general idea based on the coupling described earlier that the dual process is in some sense a deterministic walk in the opposite direction through the random environment specified by the original process.

REFERENCES

J.M. Swart – Duality and Intertwining of Markov Chains (mainly using chapters 2.1 and 2.7)

Thanks for Daniel Straulino for direction towards the branching random walk duality example.

# Multitype Branching Processes

One of the fundamental objects in classical probability theory is the Galton-Watson branching process. This is defined to be a model for the growth of a population, where each individual in a generation gives birth to some number (possibly zero) of offspring, who form the next generation. Crucially, the numbers of offspring of the individuals are IID, with the same distribution both within generations and between generations.

There are several ways one might generalise this, such as non-IID offspring distributions, or pairs of individuals producing some number of offspring, but here we consider the situation where each individual has some type, and different types have different offspring distributions. Note that if there are K types, say, then the offspring distributions should now be supported on $\mathbb{Z}_{\ge 0}^K$. Let’s say the offspring distribution from a parent of type i is $\mu^{(i)}$.

The first question to address is one of survival. Recall that if we want to know whether a standard Galton-Watson process has positive probability of having infinite size, that is never going extinct, we only need to know the expectation of the offspring distribution. If this is less than 1, then the process is subcritical and is almost surely finite. If it is greater than 1, then it is supercritical and survives with positive probability. If the expectation is exactly 1 (and the variance is finite) then the process is critical and although it is still almost surely finite, the overall population size has a power-law tail, and hence (or otherwise) the expected population size is infinite.

We would like a similar result for the multitype process, saying that we do not need to know everything about the distribution to decide what the survival probability should be.

The first thing to address is why we can’t just reduce the multitype change to the monotype setting. It’s easiest to assume that we know the type of the root in the multitype tree. The case where the type of the root is random can be reconstructed later. Anyway, suppose now that we want to know the offspring distribution of a vertex in the m-th generation. To decide this, we need to know the probability that this vertex has a given type, say type j. To calculate this, we need to work out all the type possibilities for the first m generations, and their probabilities, which may well include lots of complicated size-biasing. Certainly it is not easy, and there’s no reason why these offspring distributions should be IID. The best we can say is that they should probably be exchangeable within each generation.

Obviously if the offspring distribution does not depend on the parent’s type, then we have a standard Galton-Watson tree with types assigned in an IID manner to the realisation. If the types are symmetric (for example if M, to be defined, is invariant under permuting the indices) then life gets much easier. In general, however, it will be more complicated than this.

We can however think about how to decide on survival probability. We consider the expected number of offspring, allowing both the type of the parent and the type of the child to vary. So define $m_{ij}$ to be the expected number of type j children born to a type i parent. Then write these in a matrix $M=(m_{ij})$.

One generalisation is to consider a Galton-Watson forest started from some positive number of roots of various types. Suppose we have a vector $\nu=(\nu_i)$ listing the number of roots of each type. Then the expected number of descendents of each type at generation n is given by the vector $\nu M^n$.

Let $\lambda$ be the largest eigenvalue of M. As for the transition matrices of Markov chains, the Perron-Frobenius theorem applies here, which confirms that, because the entries of M are positive, the eigenvalue with largest modulus is simple and real, and the associated eigenvector has entirely positive entries. [In fact we need a couple of extra conditions on M, including that it is possible to get from any type to any other type – we say irreducible – but that isn’t worth going into now.]

So in fact the total number of descendents at generation n grows like $\lambda^n$ in expectation, and so we have the same description of subcriticality and supercriticality. We can also make a sensible comment about the left-$\lambda$-eigenvector of M. This is the limiting proportion of the different types of vertices.

It’s a result (eg. [3]) that the height profile of a depth-first search on a standard Galton-Watson tree converges to Brownian Motion. Another way to phrase this is that a GW tree conditioned to have some size N has the Brownian Continuum Random Tree as a scaling limit as N grows to infinity. Miermont [4] proves that this result holds for the multitype tree as well. In the remainder of this post I want to discuss one idea along the way to the proof, and one application.

I said initially that there wasn’t a trivial reduction of a multitype process to a monotype process. There is however a non-trivial embedding of a monotype process in a multitype process. Consider all the vertices of type 1, and all the paths between such vertices. Then draw a new tree consisting of just the type 1 vertices. Two of these are joined by an edge if there is no other type 1 vertex on the unique path between them in the original tree. If that definition is confusing, think of the most sensible way to construct a tree on the type 1 vertices from the original, and you’ve probably chosen this definition.

There are two important things about this new tree. 1) It is a Galton-Watson tree, and 2) if the original tree is critical, then this reduced tree is also critical. Proving 1) is heavily dependent on exactly what definitions one takes for both the multitype branching mechanism and the standard G-W mechanism. Essentially, at a type 1 vertex, the number of type 1 descendents is not dependent on anything that happened at previous generations, nor in other branches of the original tree. This gives IID offspring distributions once it is formalised. As for criticality, we note that by the matrix argument given before, under the irreducibility condition discussed, the expectation of the total population size is infinite iff the expected number of type 1 vertices is also infinite. Since the proportion of type 1 vertices is given by the first element of the left eigenvector, which is positive, we can make a further argument that the number of type 1 vertices has a power-law tail iff the total population size also has a power-law tail.

I want to end by explaining why I was thinking about this model at all. In many previous posts I’ve discussed the forest fire model, where occasionally all the edges in some large component are deleted, and the component becomes a set of singletons again. We are interested in the local limit. That is, what do the large components look like from the point of view of a single vertex in the component? If we were able to prove that the large components have BCRT as the scaling limit, this would answer this question.

This holds for the original random graph process. There are two sensible ways to motivate this. Firstly, given that a component is a tree (which it is with high probability if its size is O(1) ), its distribution is that of the uniform tree, and it is known that this has BCRT as a scaling limit [1]. Alternatively, we know that the components have a Poisson Galton-Watson process as a local limit by the same argument used to calculate the increments of the exploration process. So we have an alternative description of the BCRT appearing: the scaling limit of G-W trees conditioned on their size.

Regarding the forest fires, if we stop the process at some time T>1, we know that some vertices have been burned several times and some vertices have never received an edge. What is clear though is that if we specify the age of each vertex, that is, how long has elapsed since it was last burned; conditional on this, we have an inhomogeneous random graph. Note that if we have two vertices of ages s and t, then the probability that there is an edge between them is $1-e^{-\frac{s\wedge t}{n}}$, ie approximately $\frac{s\wedge t}{n}$. The function giving the probabilities of edges between different types of vertices is called the kernel, and here it is sufficiently well-behaved (in particular, it is bounded) that we are able to use the results of Bollobas et al in [2], where they discuss general sparse inhomogeneous random graphs. They show, among many other things, that in this setting as well the local limit is a multitype branching process.

So in conclusion, we have almost all the ingredients towards proving the result we want, that forest fire components have BCRT scaling limit. The only outstanding matter is that the Miermont result deals with a finite number of types, whereas obviously in the setting where we parameterise by age, the set of types is continuous. In other words, I’m working hard!

References

[1] Aldous – The Continuum Random Tree III

[2] Bollobas, Janson, Riordan – The phase transition in inhomogeneous random graphs

[3] Le Gall – Random Trees and Applications

[4] Miermont – Invariance principles for spatial multitype Galton-Watson trees

# Discontinuous Phase Transitions

Yesterday, Demeter Kiss from Cambridge gave a seminar in Oxford about a model for self-destructive percolation on $\mathbb{Z}^2$ that had implications for the (non-)existence of an infinite-parameter forest fire model on the same lattice. I enjoyed talking about this and his recent work on the related model of frozen percolation on $\mathbb{Z}^2$. Considering these models in the lattice setting present a whole range of interesting geometric challenges that are not present in the mean-field case that has mainly occupied my research direction so far.

The afternoon’s discussion included lots of open problems about percolation. Several of these are based around continuity of the phase transition, so I thought I would write a quite post about some simple examples of this, and one example where it does not hold.

A helpful base example is bond percolation on the lattice $\mathbb{Z}^2$. Here, we specify some probability p in [0,1], and we declare edges of the lattice open with probability p, independently of each other. We then consider the graph induced by the open edges. We say that percolation occurs if the origin is contained in an infinite open component. The terminology arises from the interpretation as fluid being added at the origin and flowing down open edges. We define $\theta(p)$ to be the probability that the origin is in an infinite component when the parameter is p. By translation-invariance, we can get some sort of 0-1 law, to conclude that there is an infinite component somewhere in the system with probability either 0 or 1, depending on whether $\theta(p)$ is positive or zero. Indeed, we can further show that if it is positive, then with probability 1 there is a unique infinite component.

We define the critical probability $p_c:= \inf\{\theta(p)>0\}$. A question worth asking is then, what is $\theta(p_c)$? In some examples, we can find $p_c$, but we cannot prove that $\theta(p)$ is continuous around $p_c$. In the case of $\mathbb{Z}^2$ this is known, and it is known from work of Kesten that $p_c=1/2$. See below for a plot of $\theta(p)$ in this setting (obtained from this blog, though possibly originating elsewhere).

The aim is to find an example where we do not have such a continuous phase transition. The original work on frozen percolation took place on trees, and one of Kiss’s results is confirms that these show qualitatively different phenomena to the same process on the lattice. In some sense, trees lie halfway between a lattice and a mean-field model, since there is often some independence when we look down the tree from a given generation, if it is well-defined to use such language.

Anyway, first we consider percolation on an infinite regular rooted k-ary tree. This means we have a root, which has k children, each of which in turn has k children, and so on. As before we consider bond percolation with parameter p. In this setting, we have a language to describe the resulting open component of the root. The offspring distribution of any vertex in the open component is given by Bin(k,p) independently of everything else, so we can view this component as the realisation of a Galton-Watson tree with this offspring distribution. This distribution has finite mean kp, and so we can state explicitly when the survival probability is positive. This happens when the mean is greater than 1, ie p>1/k.

For our actual example, we will consider the survival probability, but the technicalities are easier to explain if we look at the extinction probability, now using the language of branching processes. Suppose the offspring distribution has pgf given by

$f(x)=p_0+p_1x+p_2x^2+\ldots.$

Then the extinction probability q satisfies f(q)=q. I want to pause to consider what happens if this equation has multiple solutions. Indeed, in most interesting cases it will have multiple solutions, since f(1) will always be 1 if it is a non-defective offspring distribution. It is typically cited that: the extinction probability q is the smallest solution to this equation. I want to discuss why that is the case.

To approach this, we have to consider what extinction means. It is the limit in the event sense of the events {we are extinct after n generations}. Let the probabilities of these events be $q_n$, so $q_0=0$. Then by a straightforward coupling argument, we must have

$0=q_0\le q_1\le q_2 \le\ldots\le q:= \lim q_n \le 1.$

But, by the same generating function argument as before, $q_{n+1}=f(q_n)\ge q_n$. So if we split [0,1] into regions A where $f(x)\ge x$ and B where $f(x), all the $(q_n)$s must occur in the former, and so since it is closed, their limit must be in A also. Note that if f(x) intersects x lots of times, then region A is not necessarily connected. In the diagram below, in moving from $q_n$ to $q_{n+1}$ we might jump across part of B.

This is bad, as we are trying to prove that q is the right boundary of the connected component of A containing 0. But this cannot happen, as f is monotonic. So if one of the roots of f(x)=x in between the hypothesised $q_n is called z, then $f(q_n)< f(z)=z < q_{n+1}$, a contradiction.

Ok, so now we are ready to consider our counterexample to continuity over the percolation threshold. See references for a link to the original source of this example. We have to choose a slightly more complicated event than mere survival or extinction. We consider bond percolation as before on the infinite ternary tree, where every vertex has precisely 3 offspring. Our percolation event is now that the root is the root of an infinite binary tree. That is, the root has at least two children, each of which have at least two children, each of which, and so on.

If we set this probability equal to q, and the probability of an edge being open equal to p, then we have the recurrence:

$q=3p^2(1-p)q^2+p^3[3q^2(1-q)+q^3].$

The first term corresponds to the root having two open edges to offspring, and the second to the root having all three open edges to offspring. After manipulating, we end up with

$q\left[2p^3q^2-3p^2q+1\right]=0.$

We are therefore interested in roots of the quadratic lying between 0 and 1. The discriminant can be evaluated as

$\Delta=p^3(9p-8),$

and so there are no real roots where p<8/9. But when p=8/9, we have a repeated root at q=27/32, which is obviously not zero!

This equation is qualitatively different to the previous one for the extinction probability of a Galton-Watson tree. There, we had a quadratic, with one root at 1. As we varied p, the other root moved continuously from greater than one to less than one, so it passed through 1, giving continuity at the critical probability. Here, we have a cubic, again with one root at 1. But now the other roots are complex for small p, meaning that the local minimum of the cubic lies above the x-axis. As p gets to the critical value, it the local minimum passes below the x-axis, and suddenly we have a repeated root, not at zero.

I would like to have a neat probabilistic heuristic for this result, without having to make reference to generating functions. At the moment, the best I can come up with is to say that the original problem is simple, in the sense that the critical probability is as small as it could be while still making sense in expectation. To be concrete, when the mean of the offspring generation is less than 1, the expected size of the nth generation tends to zero, so there certainly could not be positive probability of having an infinite component.

Whereas in the binary tree example, we only require p=2/3 to have, in expectation, the right number of open edges to theoretically allow an infinite binary tree. If we think of percolation as a dynamic process by coupling in p, essentially as we move from p=2/3 to p=8/9 we need to add enough edges near the origin to be able to take advantage of the high density of edges available far from the origin. The probability of this working given you start from n vertices grows much faster (as n grows) than in the original problem, so you might expect a faster transition.

This is so content-free I’m reluctant even to call it a heuristic. I would be very interested to hear of any more convincing argument for this phenomenon!

REFERENCES

Dekking, Pakes – On family trees and subtrees of simple branching processes (link)