Kernels of critical graph components

This post is motivated by G(N,p), the classical Erdos-Renyi random graph, specifically its critical window, when p=p(N)=\frac{1}{N}(1+\lambda N^{-1/3}).

We start with the following observation, which makes no restriction on p. Suppose a component of G(N,p) is a tree. Then, the graph geometry of this component is that of a uniform random tree on the appropriate number of vertices. This is deliberately informal. To be formal, we’d have to say “condition on a particular subset of vertices forming a tree-component” and so on. But the formality is broadly irrelevant, because at the level of metric scaling limits, if we want to describe the structure of a tree component, it doesn’t matter whether it has \log N or \frac{1}{7}N vertices, because in both cases the tree structure is uniform. The only thing that changes is the scaling factor.

In general, when V vertices form a connected component of a graph with E edges, we define the excess to be E-V+1. So the excess is non-negative, and is zero precisely when the component is a tree. I’m reluctant to say that the excess counts the number of cycles in the component, but certainly it quantifies the amount of cyclic structure present. We will sometimes, in a mild abuse of notation, talk about excess edges. But note that for a connected component with positive excess, there is a priori no way to select which edges would be the excess edges. In a graph process, or when there is some underlying exploration of the component, there sometimes might be a canonical way to classify the excess edges, though it’s worth remarking that the risk of size-biasing errors is always extremely high in this sort of situation.

Returning to the random graph process, as so often there are big changes around criticality. In the subcritical regime, the components are small, and most of them, even the largest with high probability, are trees. In the supercritical regime, the giant component has excess \Theta(N), which is qualitatively very different.

It feels like every talk I’ve ever given has begun with an exposition of Aldous’s seminal paper [Al97] giving a distributional scaling limit of the sizes of critical components in the critical window, and a relation between the process on this time-scale and the multiplicative coalescent. And it remains relevant here, because the breadth-first exploration process can also be used to track the number of excess edges.

In a breadth-first exploration, we have a stack of vertices we are waiting to explore. We pick one and look at its neighbours restricted to the rest of the graph, that is without the vertices we have already fully explored, and also without the other vertices in the stack. That’s the easiest way to handle the total component size. But we can simultaneously track how many times we would have joined to a neighbour within the stack, each of which contributes an excess edge, and Aldous derives a joint distributional scaling limit for the sizes of the critical components and their excesses. (Note that in this case, there is a canonical notion of excess edge, but it depends not just on the graph structure, but also on the extra randomness of the ordering within the breadth-first search.)
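To make the bookkeeping concrete, here is a minimal sketch in Python of such an exploration (the function and variable names are mine, not Aldous’s): it samples G(N,p), explores each component breadth-first, and counts an excess edge each time the vertex being explored has a neighbour that is already waiting in the queue.

```python
import random
from collections import deque

def sample_gnp_adjacency(n, p, rng):
    """Adjacency lists for one sample of the Erdos-Renyi graph G(n, p)."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def component_sizes_and_excesses(n, p, seed=0):
    """Breadth-first exploration of G(n, p).

    Returns a list of (size, excess) pairs, one per connected component.
    An edge counts towards the excess when it joins the vertex currently
    being explored to a vertex already waiting in the queue: each such edge
    closes a cycle, so the count equals E - V + 1 for the component.
    """
    rng = random.Random(seed)
    adj = sample_gnp_adjacency(n, p, rng)

    UNSEEN, QUEUED, EXPLORED = 0, 1, 2
    status = [UNSEEN] * n
    results = []

    for start in range(n):
        if status[start] != UNSEEN:
            continue
        status[start] = QUEUED
        queue = deque([start])
        size, excess = 0, 0
        while queue:
            u = queue.popleft()
            status[u] = EXPLORED
            size += 1
            for w in adj[u]:
                if status[w] == UNSEEN:       # tree edge: discover w
                    status[w] = QUEUED
                    queue.append(w)
                elif status[w] == QUEUED:     # excess edge: closes a cycle
                    excess += 1
                # if w is already EXPLORED, the edge was accounted for earlier
        results.append((size, excess))
    return results

# e.g. at the centre of the critical window (lambda = 0) with N = 2,000:
# print(sorted(component_sizes_and_excesses(2000, 1 / 2000), reverse=True)[:5])
```

Running this near criticality, the largest components returned have size of order N^{2/3} and O(1) excess, in line with the scaling limit described next.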

Roughly speaking, we consider the reflected exploration process, and its scaling limit, which is a reflected parabolically-drifting Brownian motion (though the details of this are not important at this level of exposition, except that it’s a well-behaved non-negative process that hits zero often). The component sizes are given by the widths of the excursions above zero, scaled up by a factor of N^{2/3}. Then conditional on the shape of the excursion, the excess is Poisson with parameter the area under the excursion, with no rescaling. That is, a critical component has \Theta(1) excess.

So, with Aldous’s result in the background, when we ask about the metric structure of these critical components, we are really asking: “what does a uniformly-chosen connected component with fixed excess look like when the number of vertices grows?”

I’ll try to keep notation light, but let’s say T(n,k) is a uniform choice from connected graphs on n vertices with excess k.

[Note, the separation of N and n is deliberate, because in the critical window, the connected components have size n = \Theta(N^{2/3}), so I want to distinguish the two problems.]

In this post, we will mainly address the question: “what does the cycle structure of T(n,k) look like for large n?” When k=0, we have a uniform tree, and the convergence of this to the Brownian CRT is now well-known [CRT2, LeGall]. We hope for results with a similar flavour for positive excess k.

2-cores and kernels

First, we have to give a precise statement of what it means to study just the cycle structure of a connected component. From now on I will assume we are always working with a connected graph.

There are several equivalent definitions of the 2-core C(G) of a graph G:

  • When the excess is positive, there are some cycles. The 2-core is the union of all edges which form part of some cycle, and any edges which lie on a path between two edges which both form part of some cycle.
  • C(G) is the maximal induced subgraph where all degrees are at least two.
  • If you remove all the leaves from the graph, then all the leaves from the remaining graph, and continue, the 2-core is the state you arrive at where there are no leaves. (This peeling procedure is sketched in code just below.)
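For definiteness, the leaf-peeling description translates directly into code; here is a minimal sketch using networkx (the built-in nx.k_core(G, k=2) performs the same peeling).

```python
import networkx as nx

def two_core(G):
    """Return the 2-core of G by repeatedly deleting vertices of degree <= 1.

    For a connected graph with positive excess this is the subgraph described
    above; for a tree the result is empty. networkx's nx.k_core(G, k=2) does
    the same thing.
    """
    H = G.copy()
    while True:
        low_degree = [v for v, d in H.degree() if d <= 1]
        if not low_degree:
            return H
        H.remove_nodes_from(low_degree)
```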

It’s very helpful to think of the overall structure of the graph as consisting of the 2-core, with pendant trees ‘hanging off’ the 2-core. That is, we can view every vertex of the 2-core as the root of a (possibly size 1) tree. This is particularly clear if we remove all the edges of the 2-core from the graph. What remains is a forest, with one tree for each vertex of the 2-core.

In general, the k-core is the maximal induced subgraph where all degrees are at least k. The core is generally taken to be something rather different. For this post (and any immediate sequels) I will never refer to the k-core for k>2, and certainly not to the traditional core. So I write ‘core’ for ‘2-core’.

As you can see in the diagram, the core consists of lots of paths, and topologically, the lengths of these paths are redundant. So we will often consider instead the kernel, K(G), which is constructed by taking the core and contracting all the paths between vertices of degree greater than 2. The resulting graph has minimum degree at least three. So far we’ve made no comment about the simplicity of the original graphs, but certainly the kernel need not be simple. It will regularly have loops and multiple edges.

Kernels of critical components

To recap, we can deconstruct a connected graph as follows. It has a kernel, and each edge of the kernel corresponds to a path of some length in the core. The rest of the graph consists of trees hanging off the core vertices.
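To pin this down, here is a sketch of how one might extract the kernel from the core, assuming the core is a simple connected graph with at least one vertex of degree three (so k\ge 2); each kernel edge records the length of the core path it replaces. The function names are mine.

```python
import networkx as nx

def kernel(core):
    """Contract the 2-core to its kernel: a multigraph on the vertices of
    degree >= 3, with one edge per maximal path through degree-2 vertices.

    Assumes the core is a simple connected graph with at least one vertex of
    degree >= 3 (i.e. excess k >= 2), so the pure-cycle case k = 1 is excluded.
    """
    branch = {v for v, d in core.degree() if d >= 3}
    K = nx.MultiGraph()
    K.add_nodes_from(branch)
    used = set()   # core edges already absorbed into some kernel edge

    for v in branch:
        for w in core.neighbors(v):
            if frozenset((v, w)) in used:
                continue
            # walk away from v through degree-2 vertices until the next branch vertex
            path_len, prev, cur = 1, v, w
            used.add(frozenset((v, w)))
            while cur not in branch:
                nxt = next(x for x in core.neighbors(cur) if x != prev)
                used.add(frozenset((cur, nxt)))
                prev, cur = cur, nxt
                path_len += 1
            K.add_edge(v, cur, length=path_len)
    return K
```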

For now, we ask about the distribution of the kernel of a T(n,k). You might notice that the case k=1 is slightly awkward, as when the core consists of a single cycle, it’s somewhat ambiguous how to define the kernel. Everything we do is easily fixable for k=1, but rather than carry separate cases, we handle the case k\ge 2.

We first observe that fixing k doesn’t determine the number of vertices or edges in the kernel. For example, both of the following pictures could correspond to k=3:

However, with high probability the kernel is 3-regular, which suddenly makes the previous post relevant. As I said earlier, it can introduce size-biasing errors to add the excess edges one-at-a-time, but these should be constant factor errors, not scaling errors. So imagine the core of a large graph with excess k=2. For the sake of argument, assume the kernel has the dumbbell / handcuffs shape. Now add an extra edge somewhere. It’s asymptotically very unlikely that this is incident to one of the two vertices with degree three in the core. Note it would need to be incident to both to generate the right-hand picture above. Instead, the core will gain two new vertices of degree three.

Roughly equivalently, once the size of the core is fixed (and large) we have to make a uniform choice from connected graphs of this size where almost every vertex has degree 2, and \Theta(1) of the rest have degree 3 or higher. But the sum of the degrees is fixed, because the excess is fixed. If there are n vertices in the core, then there are a factor \Theta(n) more graphs where all the vertices have degree 2 or 3 than graphs where some vertex has degree at least 4. Let’s state this formally.

Proposition: The kernel of a uniform connected graph with n vertices and excess k\ge 2 is, with high probability as n\rightarrow\infty, 3-regular.

This is proved rather more formally as part of Theorem 7 of [JKLP], essentially as a corollary after some very comprehensive generating function setup; and in [LPW] with a more direct computation.

In the previous post, we introduced the configuration model as a method for constructing regular graphs (or any graphs with fixed degree sequence). We observe that, conditional on the event that the resulting graph is simple, it is in fact uniformly-distributed among simple graphs. When the graph is allowed to be a multigraph, this is no longer true. However, in many circumstances, as remarked in (1.1) of [JKLP], for most applications the configuration model measure on multigraphs is the most natural.

Given a 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges, and K a uniform choice from the configuration model with these parameters, we have

\mathbb{P}\left( K \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1},

where t(H) is the number of loops in H, and mult(e) the multiplicity of an edge e. This might seem initially counter-intuitive, because it looks as if we are biasing against graphs with multiple edges, when perhaps our intuition is that because there are more ways to form a set of multiple edges we should bias in favour of it.

I think it’s most helpful to look at a diagram of a multigraph as shown, and ask how to assign stubs to edges. At a vertex with degree three, all stub assignments are different, that is 3!=6 possibilities. At the multiple edge, however, we care which stubs match with which stubs, but we don’t care about the order within the multi-edge. Alternatively, there are three choices of how to divide each vertex’s stubs into (two for the multi-edge, one for the rest), and then two choices for how to match up the multi-edge stubs: that is, 3\times 3\times 2=18 assignments in total rather than 6\times 6=36, which is exactly the discount factor of 2.
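As a sanity check on these weights, here is a small simulation sketch for the case k=2 (two kernel vertices, three kernel edges): sample uniform stub-matchings and compare the empirical frequencies of the two possible shapes with the formula above. The helper names are mine.

```python
import random
from collections import Counter

def sample_config_multigraph(degrees, rng):
    """One sample from the configuration model: uniformly pair up the stubs."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    return [tuple(sorted(stubs[i:i + 2])) for i in range(0, len(stubs), 2)]

def classify(edges):
    """Return (number of loops, sorted multiset of edge multiplicities)."""
    mult = Counter(edges)
    loops = sum(m for (u, v), m in mult.items() if u == v)
    return loops, tuple(sorted(mult.values()))

rng = random.Random(1)
counts = Counter(classify(sample_config_multigraph([3, 3], rng))
                 for _ in range(100_000))
# With two degree-3 vertices, only two shapes arise:
#  - triple edge:         0 loops, multiplicities (3,)    -> weight 1/3! = 1/6
#  - dumbbell/handcuffs:  2 loops, multiplicities (1,1,1) -> weight 1/2^2 = 1/4
# so the empirical frequencies should be close to 2/5 and 3/5 respectively.
print(counts)
```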

We mention this because in fact K(T(n,k)) converges in distribution to this uniform configuration model. Once you know that K(T(n,k)) is with high probability 3-regular, then again it’s probably easiest to think about the core, indeed you might as well condition on its total size and number of degree 3 vertices. It’s then not hard to convince yourself that a uniform choice induces a uniform choice of kernel. Again, let’s state that as a proposition.

Proposition: For any 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges as before,

\lim_{n\rightarrow\infty}\mathbb{P}\left( K(T(n,k)) \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1}.

As we said before, the kernel describes the topology of the core. To reconstruct the graph, we need to know the lengths in the core, and then how to glue pendant trees onto the core. But this final stage depends on k only through the total length of paths in the core. Given that information, it’s a combinatorial problem, and while I’m not claiming it’s easy, it’s essentially the same as for the case with k=1, and is worth treating separately.

It is worth clarifying a couple of things first though. Even the outline of methods above relies on the fact that the size of the core diverges as n grows. Again, the heuristic is that up to size-biasing errors, T(n,k) looks like a uniform tree with some uniformly-chosen extra edges. But distances in T(n,k) scale like n^{1/2} (and thus in critical components of G(N,p) scale like N^{1/3}). And the core will be roughly the set of edges on tree paths between the endpoints of those uniformly-chosen extra edges, and so will also have length \Theta(n^{1/2}).

Once you have conditioned on the kernel structure, and the (large) number of internal vertices on paths in the core (ie the length of the core), it is natural that the assignment of the degree-2 vertices to core paths / kernel edges is uniform. A consequence of this is that if you record (Y_1,\ldots,Y_m) the lengths of paths in the core, where m=3(k-1), then

\frac{(Y_1,\ldots,Y_m)}{\sum Y_i} \stackrel{d}\rightarrow \mathrm{Dirichlet}(1,1,\ldots,1).

This is stated formally as Corollary 7 b) of [ABG09]. It’s worth noting that this confirms that the lengths of core paths are bounded in probability away from zero after the appropriate rescaling. In seeking a metric scaling limit, this is convenient as it means there’s no danger that two of the degree-3 vertices end up in ‘the same place’ in the scaling limit object.

To recap, the only missing ingredients now to give a complete limiting metric description of T(n,k) are 1) a distributional limit of the total core length; 2) some appropriate description of the set of pendant trees conditional on the size of the pendant forest. [ABG09] show the first of these. As remarked before, all the content of the second of these is encoded in the unicyclic k=1 case, which I have written about before, albeit slightly sketchily, here. (Note that in that post we get around size-biasing by counting a slightly different object, namely unicyclic graphs with an identified cyclic edge.)

However, [ABG09] also propose an alternative construction, which you can think of as glueing CRTs directly onto the stubs of the kernel (with the same distribution as before). The proof that this construction works isn’t as painful as one might fear, and allows a lot of the other metric distributional results to be read off as corollaries.

References

[ABG09] – Addario-Berry, Broutin, Goldschmidt – Critical random graphs: limiting constructions and distributional properties

[CRT2] – Aldous – The continuum random tree: II

[Al97] – Aldous – Brownian excursions, critical random graphs and the multiplicative coalescent

[JKLP] – Janson, Knuth, Luczak, Pittel – The birth of the giant component

[LeGall] – Le Gall – Random trees and applications

[LPW] – Luczak, Pittel, Wierman – The structure of a random graph at the point of the phase transition

 

Real Trees – Root Growth and Regrafting

Two weeks ago in our reading group meeting, Raphael told us about Chapter Five which introduces root growth and regrafting. One of the points of establishing the Gromov-Hausdorff topology in this book was to provide a more natural setting for a discussion of tree-valued processes. Indeed in what follows, one can imagine how to start the construction of a similar process for the excursions which can be used to encode real trees, involving cutting off sub-excursions above one-sided local minima, then glueing them back in elsewhere. But taking account of the equivalence structure will be challenging, and it is much nicer to be able to describe cutting a tree in two by removing a single point without having to worry about quotient maps.

We have seen in Chapter Two an example of a process defined on the family of rooted trees with n labelled vertices which has the uniform rooted tree as an invariant distribution. Given a rooted tree with root p, we choose uniformly at random a vertex p’ in [n] to be the new root. Then if p’=p we do nothing, otherwise we remove the unique first edge in the path from p’ to p, giving two trees. Adding an edge from p to p’ completes the step and gives a new tree with p’ as root. We might want to take a metric limit of these processes as n grows and see whether we end up with a stationary real tree-valued process whose marginals are the BCRT.
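Here is a minimal sketch in Python of one step of this discrete chain, representing the tree as an undirected networkx graph together with a root label; the starting tree and the function name are just for illustration.

```python
import random
import networkx as nx

def root_regraft_step(T, root, rng):
    """One step of the discrete chain on rooted labelled trees.

    T is an undirected tree (networkx Graph) on {0, ..., n-1}, rooted at `root`.
    Choose a uniform vertex r. If r != root, delete the first edge on the path
    from r to root, then add the edge (root, r); the new root is r. The uniform
    rooted labelled tree is invariant for this chain.
    """
    r = rng.randrange(T.number_of_nodes())
    if r == root:
        return T, root
    path = nx.shortest_path(T, r, root)   # r = path[0], ..., root = path[-1]
    T.remove_edge(path[0], path[1])       # first edge on the path from r to root
    T.add_edge(root, r)
    return T, r

# toy usage: the starting tree does not matter, so begin from a path
rng = random.Random(0)
T, root = nx.path_graph(10), 0
for _ in range(1000):
    T, root = root_regraft_step(T, root, rng)
```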

To see non-trivial limiting behaviour, it is most interesting to consider the evolution of a particular subtree (which includes the root) through this process. If the vertex chosen for cutting lies in our observed subtree, then the subtree undergoes a prune and regraft operation. On the other hand, if the vertex chosen for cutting does not lie in the subtree, then we do not see any effect of the pruning, except the addition of a new vertex below the original root, which becomes the new root. So essentially, from the point of view of our observed subtree, the root is growing.

Now we can think about interpreting the dynamics of a natural limit process acting on real trees. The key idea is that we don’t change the set on which the tree is supported much, but instead just change the metric. In particular, we will keep the original tree, and add on length at unit rate. Of course, where this length gets added on entirely determines the metric structure of the tree, but that doesn’t stop us giving a simple ‘name’ for the extra length. If we consider a process X^T starting from a particular finite subtree T, then at time t, the tree X^T_t has vertex set T \coprod (0,t]. (Finite subtree here means that it has finite total length.)

Root regrafting should happen at a rate proportional to the total length of the current observed tree. This is reasonable since after all it is supported within a larger tree, so in the discrete case the probability of a prune-regrafting event happening within a given observed subtree is proportional to the number of vertices in that subtree, which scales naturally as length in the real tree limit. It turns out that to get unit rate root growth with \Theta(1) rate prune-regrafting, we should consider subtrees of size \sqrt{n} within a host tree of size n as n\rightarrow\infty. We also rescale the lengths by \frac{1}{\sqrt{n}}, and time by \sqrt{n} so we actually see prune-regraft events.

Furthermore, if the subtree is pruned, the location of the pruning is chosen uniformly by length of the current observed subtree. So we can view the pruning process as being driven by a Poisson point process with intensity given by the instantaneous length measure of the tree, which at time t has vertex set T\coprod (0,t]. It will turn out to be consistent that there is a ‘piecewise isometry’ for want of a better phrase between the metric (and thus length measure) on X^T_t and the canonical induced measure on T\coprod (0,t], so we can describe the instances and locations of the pruning events via a pair of PPPs. The first is supported on T \times [0,\infty), and the second on \{(t,x): 0 \le x \le t\}, since we only ‘notice’ pruning at the point labelled x if the pruning happens at some time t after x was created.

If we start from a compact tree T, then the total intensity of this pair is finite up to some time t, and so we have a countable sequence \tau_0=0<\tau_1<\tau_2<\ldots of times for pruning events. It is easy to describe (but a bit messy to notate) the evolution of the metric between these pruning times. Essentially the distance between any pair of points in the observed tree at time \tau_m with root \rho_{\tau_m} is constant between times \tau_m,\tau_{m+1}, and new points are added so that the distance between \rho_{\tau_m} and any new point a\in(\tau_m,\tau_{m+1}] is a-\tau_m, and everything else follows from straightforward consideration of geodesics.

When a pruning event happens at point x_m at time \tau_m, distances are preserved within the subtree above x_m in X^T_{\tau_m -}, and within the rest of the tree. Again, an expression for the cross distances is straightforward but requires a volume of notation not ideally suited to this medium.

The natural thing to consider is the coupled processes started from different subtrees (again both must contain the original root) of the same host tree. Say T^1,T^2\le T; then it is relatively easy to check that X^{T^1}_t,X^{T^2}_t \le X^T_t \,\forall t, when we drive the processes by consistent coupled Poisson processes. Furthermore, it is genuinely obvious that the Hausdorff distance between X^{T^1}_t,X^{T^2}_t, here viewed as compact subsets of (X^T_t, d^T_t), remains constant during the root growth phase.

Less obvious but more important is that the Hausdorff distance decreases during regrafting events. Suppose that just before a regrafting event, the two subtrees are T’ and T”, and the Hausdorff distance between them is \epsilon. This Hausdorff distance is with respect to the metric on the whole tree T. [Actually this is a mild abuse of notation – I’m now taking T to be the whole tree just before the regraft, rather than the tree at time 0.]

So for any a\in T', we can choose b\in T'' such that d_T(a,b)\le \epsilon. This is preserved under the regraft unless the pruning point lies on the geodesic segment (in T) between a and b. But in that case, the distance between a and the pruning point is again at most \epsilon, and so after the regrafting, a is at most \epsilon away from the new root, which is in both subtrees, and in particular the regrafted version of T”.

This is obviously a useful first step on the path to proving any kind of convergence result. There are some technicalities which we have skipped over. It is fairly natural that this leads to a Markov process when the original tree is finite, but it is less clear how to define these dynamics when the total tree length is infinite, as we don’t want regrafting events to be happening continuously unless we can bound their net effect in some sense.

Last week, Franz showed us how to introduce the BCRT into matters. Specifically, that BCRT is the unique stationary distribution for this process. After a bit more work, the previous result says that for convergence properties it doesn’t matter too much what tree we start from, so it is fine to start from a single point. Then, the cut points and growth mechanism correspond very well to the Poisson line-breaking construction of the BCRT. With another ‘grand coupling’ we can indeed construct them simultaneously. Furthermore, we can show weak convergence of the discrete-world Markov chain tree algorithm to the process with these root growth with regrafting dynamics.

It does seem slightly counter-intuitive that a process defined on the whole of the discrete tree converges to a process defined through subtrees. Evans remarks in the introduction to the chapter that this is a consequence of having limits described as compact real trees. Then limitingly almost all vertices are close to leaves, so in a Hausdorff sense, considering only \sqrt{n} of the vertices (ie a subtree) doesn’t really make any difference after rescaling edge lengths. I feel I don’t understand exactly why it’s ok to take the limits in this order, but I can see why this might work after more checking.

Tomorrow, we will have our last session, probably discussing subtree prune-and-regraft, where the regrafting does not necessarily happen at the root.

Random Maps 3 – Leaves and Geodesics in BCRT

Recall in the previous two posts, we’ve introduced some of the background to maps on various surfaces. In particular, we’ve introduced the remarkable Cori-Vauquelin-Schaeffer bijection which maps between plane trees labelled with uniform increments and quadrangulations of the sphere, up to some careful fiddling around with rooting and pointing an edge.

We are interested in the case where we choose uniformly a large element from these classes. We want to derive a scaling limit for the uniform planar quadrangulation, and we hope that we will be able to carry some properties of the scaling limit of the labelled trees, which may well be simpler, across the CVS bijection. It is convenient that the vertices of the plane tree become the vertices of the quadrangulation. We are looking to find some sort of metric limit, in the Gromov-Hausdorff sense, and so it will remain to deduce exactly how to use the labelling obtained from the tree to gain information about distances in the (limiting) quadrangulation.

Of course, all of this relies on the fact that there is a nice limit for the ordered plane trees in the first place. Unsurprisingly, it turns out that this is Aldous’s Brownian continuum random tree. The easiest way to see this is to consider the contour process of the ordered plane tree. This is chosen uniformly at random from the set of paths from (0,0) to (2n,0) with increments of size {-1,1} and which stay non-negative. It is thus precisely a simple random walk started at (0,0) conditioned to hit (2n,0) and to be non-negative. Since SRW suitably rescaled converges to Brownian motion, it is unsurprising (but not totally trivial) that this conditioned object converges to a Brownian excursion.

The Brownian excursion can be viewed as a continuous analogue of the contour process for the BCRT, but it is more natural to consider this convergence in the Gromov-Hausdorff topology. In this setting, we say that for a large value of n, the tree is ‘roughly isometric’ to the BCRT in distribution. Here, roughly isometric means the two metric spaces can be embedded isometrically into a common metric space such that they are close together, now in the sense of Hausdorff distance.

At this point, it is worth thinking about this interpretation of the BCRT. We have previously considered this as the scaling limit of a uniformly chosen Cayley tree, that is any unrooted tree on n labelled vertices. Essentially, we are now specifying that the BCRT can carry extra information, namely a root, and geometric information about the order of branches. The root is uncontroversial. Canonically, the root of the BCRT will be at the point associated with time 0 in the driving Brownian excursion. However, we can easily check that the distribution of a uniform rooted plane tree is invariant under re-rooting, and so any argument we have for convergence of the rooted trees to the BCRT will work with the root in a different place. Applying something like a tower law, we conclude that the convergence works when the root is chosen uniformly in the limit.

One potential problem to be discussed is what it means to choose a point uniformly in the limit. We have two possible approaches. One is to consider Lebesgue measure on any path in the BCRT, and glue these together. However, we have a uniform stick-breaking construction of the BCRT, and one consequence of the construction is that the total length of sticks required is infinite, so this won’t work.

The other option is to project Lebesgue measure on [0,1] via the same map that sends points on the Brownian excursion to points in the tree. Note that the so-called real tree is constructed from the excursion by identifying points s and t where f(s)=f(t), and f(x)>f(s) for x in (s,t). But then we might wonder whether this can really be said to be ‘uniform’, since different points in the BCRT will have a different number of pre-images in [0,1]. In fact though, it turns out that in this sense, projected-Leb[0,1]-almost all the points in the BCRT are leaves.

To prove this, naturally we first need to define a leaf, in the setting of these continuum trees. The degree of a vertex is an idea we might keep in mind, but we can’t use this, as we don’t have vertices any more. However, we have a continuous analogue of degree, given by counting the number of connected components remaining after removing a vertex. In particular, we can define the set of leaves as

\mathcal{L}(\mathcal{T}):=\{x\in\mathcal{T}:\mathcal{T}\backslash \{x\}\text{ is connected}\}.

We will give a sketch proof of this result about leaves shortly. First, we clarify some notation, and consider properties of geodesics (shortest-length paths) in the tree.

Define \check{f}(u,v):=\min_{x\in[u\wedge v,u\vee v]} f(x) to be the minimum value attained by f between u and v. Consider a time x at which this minimum is attained. Then, projecting onto the tree, p(x) is the ‘most recent common ancestor’ of the points p(u) and p(v). We can make this a bit more precise by considering geodesics in the tree starting at the root. Analogous to the unique path property in a discrete tree, in this continuous setting there is a unique path from the root to any given point, along which the height is strictly increasing. This is not surprising. It follows from one of the definitions of a real tree that the length of the path from p(0) to p(s) should be f(s), and so there is a unique isometric embedding of [0,f(s)] into \mathcal{T}_f which starts at p(0) and ends at p(s). Anyway, under this identification, p(x), where x is a time at which \check{f}(s,t) is attained, gives the point at which the geodesics from p(s) to p(0) and from p(t) to p(0) meet.


Furthermore, we can now describe the distance in the tree between p(s) and p(t). This is given by

d_f(s,t):= f(s)+f(t)-2\check{f}(s,t),

and with the geodesic picture, it is easy to see why. Consider a time x at which the minimum \check{f}(s,t) is achieved. As we have said, p(x) lies on the geodesics from p(s) to p(0) and from p(t) to p(0), and paths between points are unique, so (provided p(x) is distinct from both) any path from p(s) to p(t) passes through p(x). So the geodesic from p(s) to p(t) is the concatenation of the geodesic from p(s) to p(x) and the geodesic from p(x) to p(t). But these are subsets of the two geodesics discussed, and their respective lengths are f(s)-\check{f}(s,t) and f(t)-\check{f}(s,t).
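For a discretised excursion this distance is a one-liner; the following sketch (the grid and the toy ‘tent’ function are mine) also illustrates how two different times can code the same point of the tree.

```python
import numpy as np

def tree_distance(f, s, t):
    """Distance in the real tree coded by the excursion f between p(s) and p(t).

    f is an array of values f[0..m] on a grid of [0, 1]; s and t are grid
    indices. This is d_f(s, t) = f(s) + f(t) - 2 * min_{[s, t]} f.
    """
    lo, hi = min(s, t), max(s, t)
    return f[s] + f[t] - 2 * np.min(f[lo:hi + 1])

# toy check on a deterministic 'tent' excursion, which codes a single segment
m = 1000
grid = np.linspace(0.0, 1.0, m + 1)
f = np.minimum(grid, 1.0 - grid)
print(tree_distance(f, m // 4, m // 2))      # ~0.25: two points on the same branch
print(tree_distance(f, m // 4, 3 * m // 4))  # ~0.0: these two times code the same point
```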

We can now give a sketch proof of the result that almost all the support of \lambda_f, the projection of Lebesgue measure from [0,1] onto \mathcal{T}_{f}, is on \mathcal{L}(\mathcal{T}_f).

Given s,t\in[0,1], suppose we are removing p(s), and this separates p(t) from the root, which is canonically p(0). Without loss of generality, take t>s. Now suppose that \check{f}(s,t)<f(s), and that, as before, this infimum is attained at x\in[s,t]. Then the geodesic from p(0) to p(t) will pass through p(x), but not through p(s), so in particular, removing p(s) cannot disconnect p(t) from the root.

Thus, p(s) is not a leaf if and only if there exists some small window [s,t] such that f(s)\le f(x),\;\forall x\in[s,t]. By Blumenthal’s 0-1 law, for fixed s, this happens with probability 0 if f is Brownian motion. Here, f is not Brownian motion, but a Brownian excursion with length 1. However, Blumenthal’s 0-1 law depends on the instantaneous behaviour after time s, ie the sigma field \mathcal{F}_s^+. So, for s\in(0,1), the value of a Brownian motion at time 1 is independent of this sigma field, so if we imagine Brownian excursion as a ‘conditioned’ Brownian motion, this conditioning should have no effect on the conclusion of this corollary to Blumenthal’s 0-1 law.

This is not a formal argument, but it sketches why with probability 1, p(s) is a leaf for each s\in(0,1), from which the result follows.

Multitype Branching Processes

One of the fundamental objects in classical probability theory is the Galton-Watson branching process. This is defined to be a model for the growth of a population, where each individual in a generation gives birth to some number (possibly zero) of offspring, who form the next generation. Crucially, the numbers of offspring of the individuals are IID, with the same distribution both within generations and between generations.

There are several ways one might generalise this, such as non-IID offspring distributions, or pairs of individuals producing some number of offspring, but here we consider the situation where each individual has some type, and different types have different offspring distributions. Note that if there are K types, say, then the offspring distributions should now be supported on \mathbb{Z}_{\ge 0}^K. Let’s say the offspring distribution from a parent of type i is \mu^{(i)}.

The first question to address is one of survival. Recall that if we want to know whether a standard Galton-Watson process has positive probability of having infinite size, that is never going extinct, we only need to know the expectation of the offspring distribution. If this is less than 1, then the process is subcritical and is almost surely finite. If it is greater than 1, then it is supercritical and survives with positive probability. If the expectation is exactly 1 (and the variance is finite) then the process is critical and although it is still almost surely finite, the overall population size has a power-law tail, and hence (or otherwise) the expected population size is infinite.

We would like a similar result for the multitype process, saying that we do not need to know everything about the distribution to decide what the survival probability should be.

The first thing to address is why we can’t just reduce the multitype case to the monotype setting. It’s easiest to assume that we know the type of the root in the multitype tree. The case where the type of the root is random can be reconstructed later. Anyway, suppose now that we want to know the offspring distribution of a vertex in the m-th generation. To decide this, we need to know the probability that this vertex has a given type, say type j. To calculate this, we need to work out all the type possibilities for the first m generations, and their probabilities, which may well include lots of complicated size-biasing. Certainly it is not easy, and there’s no reason why these offspring distributions should be IID. The best we can say is that they should probably be exchangeable within each generation.

Obviously if the offspring distribution does not depend on the parent’s type, then we have a standard Galton-Watson tree with types assigned in an IID manner to the realisation. If the types are symmetric (for example if M, to be defined, is invariant under permuting the indices) then life gets much easier. In general, however, it will be more complicated than this.

We can however think about how to decide on survival probability. We consider the expected number of offspring, allowing both the type of the parent and the type of the child to vary. So define m_{ij} to be the expected number of type j children born to a type i parent. Then write these in a matrix M=(m_{ij}).

One generalisation is to consider a Galton-Watson forest started from some positive number of roots of various types. Suppose we have a vector \nu=(\nu_i) listing the number of roots of each type. Then the expected number of descendents of each type at generation n is given by the vector \nu M^n.

Let \lambda be the largest eigenvalue of M. As for the transition matrices of Markov chains, the Perron-Frobenius theorem applies here, which confirms that, because the entries of M are positive, the eigenvalue with largest modulus is simple and real, and the associated eigenvector has entirely positive entries. [In fact we need a couple of extra conditions on M, including that it is possible to get from any type to any other type – we say irreducible – but that isn’t worth going into now.]

So in fact the total number of descendents at generation n grows like \lambda^n in expectation, and so we have the same description of subcriticality and supercriticality. We can also make a sensible comment about the left-\lambda-eigenvector of M: its entries give the limiting proportions of the different types of vertices.
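In practice this classification reduces to a single eigenvalue computation; here is a sketch with numpy (the example matrix is an arbitrary illustration, not taken from any particular model).

```python
import numpy as np

def criticality(M):
    """Classify a multitype Galton-Watson process via the Perron root of its
    mean matrix M, where M[i][j] is the expected number of type-j children of
    a type-i parent (M assumed irreducible with non-negative entries).

    Returns the Perron eigenvalue and the normalised left eigenvector, whose
    entries give the limiting type proportions as discussed above.
    """
    M = np.asarray(M, dtype=float)
    eigvals, left_vecs = np.linalg.eig(M.T)   # eigenvectors of M.T = left eigenvectors of M
    i = np.argmax(eigvals.real)
    lam = eigvals[i].real
    v = np.abs(left_vecs[:, i].real)
    return lam, v / v.sum()

# a toy two-type example: type-0 parents average 0.5 children of each type,
# type-1 parents average 1.5 type-0 children and 0.5 type-1 children
M = [[0.5, 0.5],
     [1.5, 0.5]]
lam, props = criticality(M)
print(lam)    # ~1.37 > 1, so this process is supercritical
print(props)  # limiting type proportions
```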

It’s a result (eg. [3]) that the height process given by a depth-first exploration of a critical Galton-Watson tree converges, suitably rescaled, to Brownian motion. Another way to phrase this is that a GW tree conditioned to have some size N has the Brownian Continuum Random Tree as a scaling limit as N grows to infinity. Miermont [4] proves that this result holds for the multitype tree as well. In the remainder of this post I want to discuss one idea along the way to the proof, and one application.

I said initially that there wasn’t a trivial reduction of a multitype process to a monotype process. There is however a non-trivial embedding of a monotype process in a multitype process. Consider all the vertices of type 1, and all the paths between such vertices. Then draw a new tree consisting of just the type 1 vertices. Two of these are joined by an edge if there is no other type 1 vertex on the unique path between them in the original tree. If that definition is confusing, think of the most sensible way to construct a tree on the type 1 vertices from the original, and you’ve probably chosen this definition.

There are two important things about this new tree. 1) It is a Galton-Watson tree, and 2) if the original tree is critical, then this reduced tree is also critical. Proving 1) is heavily dependent on exactly what definitions one takes for both the multitype branching mechanism and the standard G-W mechanism. Essentially, at a type 1 vertex, the number of type 1 descendents is not dependent on anything that happened at previous generations, nor in other branches of the original tree. This gives IID offspring distributions once it is formalised. As for criticality, we note that by the matrix argument given before, under the irreducibility condition discussed, the expectation of the total population size is infinite iff the expected number of type 1 vertices is also infinite. Since the proportion of type 1 vertices is given by the first element of the left eigenvector, which is positive, we can make a further argument that the number of type 1 vertices has a power-law tail iff the total population size also has a power-law tail.

I want to end by explaining why I was thinking about this model at all. In many previous posts I’ve discussed the forest fire model, where occasionally all the edges in some large component are deleted, and the component becomes a set of singletons again. We are interested in the local limit. That is, what do the large components look like from the point of view of a single vertex in the component? If we were able to prove that the large components have BCRT as the scaling limit, this would answer this question.

This holds for the original random graph process. There are two sensible ways to motivate this. Firstly, given that a component is a tree (which it is with high probability if its size is O(1) ), its distribution is that of the uniform tree, and it is known that this has BCRT as a scaling limit [1]. Alternatively, we know that the components have a Poisson Galton-Watson process as a local limit by the same argument used to calculate the increments of the exploration process. So we have an alternative description of the BCRT appearing: the scaling limit of G-W trees conditioned on their size.

Regarding the forest fires, if we stop the process at some time T>1, we know that some vertices have been burned several times and some vertices have never received an edge. What is clear though is that if we specify the age of each vertex, that is, how long has elapsed since it was last burned, then conditional on this, we have an inhomogeneous random graph. Note that if we have two vertices of ages s and t, then the probability that there is an edge between them is 1-e^{-\frac{s\wedge t}{n}}, ie approximately \frac{s\wedge t}{n}. The function giving the probabilities of edges between different types of vertices is called the kernel, and here it is sufficiently well-behaved (in particular, it is bounded) that we are able to use the results of Bollobas et al in [2], where they discuss general sparse inhomogeneous random graphs. They show, among many other things, that in this setting as well the local limit is a multitype branching process.
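For concreteness, here is a sketch of how one would sample this age-kernel graph given an assignment of ages; the function name is mine, and this is nothing more than the edge-probability formula above applied pair by pair.

```python
import math
import random

def age_kernel_graph(ages, rng):
    """Sample the inhomogeneous random graph in which vertices u, v with ages
    s, t are joined independently with probability 1 - exp(-min(s, t) / n),
    which is approximately min(s, t) / n when n is large."""
    n = len(ages)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < 1.0 - math.exp(-min(ages[u], ages[v]) / n):
                edges.append((u, v))
    return edges
```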

So in conclusion, we have almost all the ingredients towards proving the result we want, that forest fire components have BCRT scaling limit. The only outstanding matter is that the Miermont result deals with a finite number of types, whereas obviously in the setting where we parameterise by age, the set of types is continuous. In other words, I’m working hard!

References

[1] Aldous – The Continuum Random Tree III

[2] Bollobas, Janson, Riordan – The phase transition in inhomogeneous random graphs

[3] Le Gall – Random Trees and Applications

[4] Miermont – Invariance principles for spatial multitype Galton-Watson trees


The Contour Process

As I explained in my previous post, I haven’t been reading around as much as I would generally like to recently. A few days in London staying with my parents and catching up with some friends has therefore been a good chance to get back into the habit of leafing through papers and Pitman’s book among other things.

This morning’s post should be a relatively short one. I’m going to define the contour process, a function of a (random or deterministic) tree, related to the exploration process which I have mentioned a few times previously. I will then use this to prove a simple but cute result equating in distribution the sizes of two different branching processes via a direct bijection.

The Contour Process

To start with, we have to have a root, and from that root we label the tree with a depth-first labelling. An example of this is given below. It is helpful at this stage to conceive of this process as an explorer walking on the tree, and turning back on themselves only when there is no option to visit a vertex they haven’t already seen. So in the example tree shown, the depth-first exploration visits vertex V_2 exactly four times. Note that with this description, it is clear that the exploration traverses every edge exactly twice, and so the length of the sequence is 2n-1, where n is the number of vertices in the tree, since we start and end at the root.

Another common interpretation of this depth-first exploration is to take some planar realisation of the tree. (Note trees are always planar – proof via induction after removing a leaf.) Then if you treat the tree as a hedge and starting at the root walk along, following the outer boundary with your right hand, this exactly recreates the process.

The height of a tree at a particular vertex is simply the graph distance between that vertex and the root. So when we move from one vertex to an adjacent vertex, the height must increase or decrease by 1.

The contour process is the sequence of heights seen along the depth-first exploration. It is therefore a sequence:

0=h_0,h_1,\ldots,h_{2n-2}=0,\quad h_i\geq 0,

and such that |h_{i+1}-h_i|=1.
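Concretely, the contour process can be read off a rooted plane tree by a depth-first traversal; here is a sketch (the representation of the tree via ordered child lists is mine).

```python
def contour_process(children, root=0):
    """Contour process of a rooted plane tree.

    `children[v]` is the ordered list of children of v. The walk visits the
    tree depth-first, recording the height (graph distance from the root)
    each time it sits at a vertex, so the output has length 2n - 1 for a
    tree on n vertices and starts and ends at height 0.
    """
    heights = []

    def visit(v, h):
        heights.append(h)
        for c in children[v]:
            visit(c, h + 1)
            heights.append(h)      # step back down to v after exploring c

    visit(root, 0)
    return heights

# the rooted plane tree 0 -> (1, 2), 1 -> (3,)
print(contour_process({0: [1, 2], 1: [3], 2: [], 3: []}))
# [0, 1, 2, 1, 0, 1, 0]   (length 2*4 - 1 = 7)
```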

Note that though the contour process uniquely determines the tree structure, the choice of depth-first labelling is a priori non-canonical. For example, in the display above, V_3 might have been explored before V_2. Normally this is resolved by taking the suitable vertex with the smallest label in the original tree to be next. It makes little difference to any analysis to choose the ordering of descendents of some vertex in a depth-first labelling randomly. Note that this explains why it is rather hard to recover Cayley’s theorem about the number of rooted trees on n vertices from this characterisation. Although the number of suitable contour functions is possible to calculate, we would require a complicated multiplicative correction for labelling if we wanted to recover the number of trees.

The only real observation about the uses of the contour process at this stage is that it is not in general a random walk with IID increments for a Galton-Watson branching process. This equivalence is what made the exploration process so useful. In particular, it made it straightforward, at least heuristically, to see why large trees might have a limit interpretation through Brownian excursions. If for example, the offspring distribution is bounded above, say by M, then the contour process certainly cannot be a random walk, as if we have visited a particular vertex exactly M+1 times, then it cannot have another descendent, and so we must return closer to the root at the next step.

I want to mention that in fact Aldous showed his results on scaling limits towards the Continuum Random Tree through the contour process rather than the exploration process. However, I don’t want to say any more about that right now.

A Neat Equivalence

What I do want to talk about is the following distribution on the positive integers. This comes up in Balazs Rath and Balint Toth’s work on forest-fires on the complete graph that I have been reading about recently. The role of this distribution is a conjectured equilibrium distribution for component size in a version of the Erdos-Renyi process where components are deleted (or ‘struck by lightning’) at a rate tuned so that giant components ‘just’ never emerge.

This distribution has the possibly useful property that it is the distribution of the total population size in a Galton-Watson process with Geom(1/2) offspring distribution. It is also the distribution of the total number of leaves in a critical binary branching process, where every vertex has either two descendents or zero descendents, each with probability 1/2. Note that both of these tree processes are critical, as the expected number of offspring is 1 in each case. This is a good start, as it suggests that the relevant equilibrium distribution should also have the power-law tail that is found in these critical branching processes. This would confirm that the forest-fire model exhibits self-organised criticality.

Anyway, as a sanity check, I tried to find a reason why, ignoring the forest-fires for now, these two distributions should be the same. One can argue using generating functions, but there is also the following nice bijective argument.

We focus first on the critical Geometric branching process. We examine its contour function. As explained above, the contour process is not in general a random walk with IID increments. However, for this particular case, it is. The key point is that the geometric distributions are precisely the discrete memoryless distributions.

This is useful for the contour process. Note that if we are at vertex V for the (m+1)th time, that is we have already explored m of the edges out of V, then the probability that there is at least one further edge is 1/2, independently of the history of the exploration, as the offspring distribution is Geometric(1/2), which we can easily think of as adding edges one at a time based on independent fair coin tosses until we see a tail for example. The contour process for this random tree is therefore a simple symmetric random walk on Z. Note that this will hit -1 at some point, and the associated contour process is the RW up to the final time it hits 0 before hitting -1. We can check that this obeys the clear rule that with probability 1/2 the tree is a single vertex.

Now we consider the other model, the Galton-Watson process with critical binary branching mechanism. We should consider the exploration process. Recall that the increments in this process are given by the offspring distribution minus one. So this random sequence also behaves as a simple symmetric random walk on Z, again stopped when we hit -1.

To complete the bijective argument, we have to relate leaves in the binary process to vertices in the geometric one. A vertex is a leaf if it has no offspring, so the number of leaves is the number of times before the hitting time of -1 that the exploration process decreases by 1. (*)

Similarly for the contour process. Note that there is bijection between the set of vertices that aren’t the root and the set of edges. The contour process explores every edge exactly twice, once giving an increase of 1 and once giving a decrease of 1. So there is a bijection between the times that the contour process decreases by 1 and the non-root vertices. But the contour process was defined only up to the time we return to the root. This is fine if we know in advance how large the tree is, but we don’t know which return to the root is the final return to the root. So if we extend the random walk to the first time it hits -1, the portion up until the last increment is the contour process, and the final increment must be a decrease by 1, hence there is a bijection between the number of vertices in the Geom(1/2) G-W tree and the number of times that the contour process decreases by 1 before the hitting time of -1. Comparing with (*) gives the result.
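If you would rather check the identity numerically than via the bijection, here is a quick simulation sketch; the cap on the population size is only there to keep run times reasonable, since critical processes have heavy-tailed total sizes, and the function names are mine.

```python
import random
from collections import Counter

def total_size_geometric(rng, cap=10_000):
    """Total population of a GW tree with Geometric(1/2) offspring on {0,1,2,...}.

    Returns None if the population exceeds `cap`."""
    alive, total = 1, 0
    while alive and total < cap:
        alive -= 1
        total += 1
        # number of children: count fair-coin heads before the first tail
        while rng.random() < 0.5:
            alive += 1
    return total if not alive else None

def leaves_binary(rng, cap=10_000):
    """Number of leaves of a GW tree with offspring 0 or 2, each w.p. 1/2."""
    alive, leaves, seen = 1, 0, 0
    while alive and seen < cap:
        alive -= 1
        seen += 1
        if rng.random() < 0.5:
            alive += 2
        else:
            leaves += 1
    return leaves if not alive else None

rng = random.Random(2)
samples = 20_000
geo = Counter(total_size_geometric(rng) for _ in range(samples))
binl = Counter(leaves_binary(rng) for _ in range(samples))
# The two empirical distributions should agree: both put mass ~1/2 on 1,
# ~1/8 on 2, and so on.
for k in range(1, 6):
    print(k, geo[k] / samples, binl[k] / samples)
```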

Local Limits

In several previous posts, I have talked about scaling limits of various random graphs. Typically in this situation we are interested in convergence of large-scale properties of the graph as the size grows to some limit. These properties will normally be metric in flavour: diameter, component size and so on. To describe convergence of these properties, we divide by the relevant scale, which will often be some simple function of n. If we are looking to find an actual limit object, this is even more important. This is rather similar to describing properties of centred random walks. There, if we run the walk for time n, we have to rescale by \frac{1}{\sqrt{n}} to see the fluctuations on a finite positive scale.

One of the best examples is Aldous’ Continuum Random Tree which we can view as the limit of a Galton-Watson tree conditioned to have total size n, as n tends to infinity. Because of the exploration process or contour process interpretation, where these functions behave rather like a random walk, the correct scaling in this context is again \frac{1}{\sqrt{n}}. The point about this convergence is that it is realised entirely as a convergence of some function that represents the tree. For each finite n, it is clear that the tree with n vertices is a graph, but this is neither clear nor true for the limit object. Although it does indeed have no cycles, if nothing else, if the CRT were a graph it would have [0,1] as vertex set, and then it would be highly non-obvious how to define the edges.

Local limits aim to give convergence towards a (discrete) infinite graph. The sort of properties we are looking for are now local properties such as degrees and correlations of degrees. These don’t require knowledge of the whole graph, only of some finite subset. First consider the possibility that the sequence of deterministic graphs has the property:

G_1\leq G_2\leq G_3\leq\ldots

where \leq denotes an induced subgraph. Then it is relatively clear what the limit should be, as it is well-defined to take a union. This won’t work directly for a limit of random graphs, because the above relation in probability doesn’t even really make sense if we have a different probability space for each finite graph. This is a general clue that we should be looking to use convergence in distribution rather than anything stronger.

In the previous example, suppose the first finite graph G_1 consists of a single vertex v. If the limit graph (remember this is just the union, since that is well-defined) has bounded degrees, then there is some N such that G_N contains all the information we might want about the limiting neighbourhood of vertex v. For some larger N, G_N contains all the vertices and edges within distance r from our starting vertex v that appear in the limit graph.

This is all the motivation we require for a genuine definition. We will define our limit in terms of neighbourhoods, so we need some mechanism to choose the central vertex of such a neighbourhood. The answer is to consider rooted graphs, that is, a graph with an identified vertex. We can introduce randomness by specifying a random graph, or by giving a distribution for the choice of root. If G is finite, the canonical choice is to choose the root uniformly from the set of vertices. This isn’t an option for an infinite graph, so we define the system as (G, p) where G is a (for now deterministic) graph, and p is a probability measure on V(G).

We say that the limit of finite (G_n) is the random rooted infinite graph (G, p) if the neighbourhoods of G_n around a randomly chosen vertex converge in distribution to the neighbourhoods of G around p. Formally, say (G_n,v_n)\stackrel{d}{\rightarrow} (G,p) if for all r>0, for any finite rooted graph (H,w), the probability that (H,w) is isomorphic to the ball of radius r in G_n centred at the uniformly chosen root v_n converges to the probability that (H,w) is isomorphic to the ball of radius r in G centred at v, where v is distributed according to the measure p.
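To make this definition concrete, here is a sketch using networkx of how one might estimate, by sampling roots, the probability that the r-ball around a uniform vertex matches a given rooted graph; rooted_ball and neighbourhood_probability are my own helper names.

```python
import random
import networkx as nx
from networkx.algorithms.isomorphism import categorical_node_match

def rooted_ball(G, v, r):
    """The ball of radius r around v in G, with the root marked by an attribute."""
    B = nx.ego_graph(G, v, radius=r).copy()
    nx.set_node_attributes(B, False, "root")
    B.nodes[v]["root"] = True
    return B

def neighbourhood_probability(G, H_rooted, r, samples=1000, seed=0):
    """Estimate the probability that the r-ball around a uniformly chosen
    vertex of G is isomorphic, as a rooted graph, to H_rooted (which should
    carry the same 'root' attribute, e.g. be built via rooted_ball)."""
    rng = random.Random(seed)
    nodes = list(G.nodes)
    match = categorical_node_match("root", False)
    hits = sum(
        nx.is_isomorphic(rooted_ball(G, rng.choice(nodes), r), H_rooted,
                         node_match=match)
        for _ in range(samples)
    )
    return hits / samples

# e.g. the probability that a uniform vertex of G(n, 2/n) is isolated:
# G = nx.erdos_renyi_graph(2000, 2 / 2000, seed=1)
# H = rooted_ball(nx.empty_graph(1), 0, 1)      # a single rooted vertex
# print(neighbourhood_probability(G, H, r=1))   # should be close to e^{-2}
```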

Informally, we might say that if we zoom in on an average vertex in G_n for large n, the neighbourhood looks the same as the neighbourhood around the root in (G, p). We now consider three examples.

1) When we talk about approximating the component size in a sparse Erdos-Renyi random graph by a \text{Po}(\lambda) branching process, this is exactly the limit sense we mean. The approximation fails if we fix n and take the neighbourhood size very large (eg radius n), but for finite neighbourhoods, or any radius growing more slowly than n, the approximation is good.

2) To emphasise why rooting the finite graphs makes a difference, consider the full binary tree with n levels (so 2^n-1 vertices). If we fix the root, then the limit is the infinite-level binary tree, though this isn’t especially surprising or interesting.

Things get a bit more complicated if we root randomly. Remember that the motivation for random rooting is that we want to know the local structure around a vertex chosen at random in many applications. If we definitely know what vertex we are going to choose, we know the local structure a priori. Note that in an n-level binary tree, 2^{n-1} vertices are leaves, not counting the base of the tree, and 2^{n-2} are distance 1 from a leaf, and 2^{n-3} are distance 2 from a leaf and so on.

This gives us a precise description of the limiting local neighbourhood structure. The resulting limiting object is called the canopy tree. One picture of this can be found on page 6 of this paper. A verbal description is also possible. Consider the set of non-negative integers, arranged in the usual manner on the real line, with edges between adjacent elements. The distribution of the root will be supported on this set of vertices, corresponding to the distance from the leaves in the pre-limit graph. So we have mass 1/2 at 0, 1/4 at 1, 1/8 at 2 and so on. We then connect each vertex k to a full k-level binary tree. The resulting canopy tree looks like an infinite-level full binary tree, viewed from the leaves, which is of course a reasonable heuristic, since that is where the mass is concentrated if we randomly root.

3) In particular, the limit is not the infinite-level binary tree. The canopy tree and the infinite-level binary tree have qualitatively different properties. Simple random walk on the canopy tree is recurrent for example. In fact, a result of Benjamini and Schramm, as explained in this review by Curien, says that any local limit of uniformly bounded degree, uniformly rooted, planar graphs is recurrent for SRW. The infinite-level binary tree can be expressed as a local limit if we choose the root distribution sensibly, using large random 3-regular graphs. The previous result does not apply because the random 3-regular graphs are not almost surely planar.

REFERENCES:

– Much of this article is a paraphrase of a section of Itai Benjamini’s mini-course at the DSSA in Haifa March 2013.

– As well as the review paper linked above, these notes by David Aldous were very useful.