Kernels of critical graph components

This post is motivated by G(N,p), the classical Erdos-Renyi random graph, specifically its critical window, when p=p(N)=\frac{1}{N}(1+\lambda N^{-1/3}).

We start with the following observation, which makes no restriction on p. Suppose a component of G(N,p) is a tree. Then, the graph geometry of this component is that of a uniform random tree on the appropriate number of vertices. This is deliberately informal. To be formal, we’d have to say “condition on a particular subset of vertices forming a tree-component” and so on. But the formality is broadly irrelevant, because at the level of metric scaling limits, if we want to describe the structure of a tree component, it doesn’t matter whether it has \log N or \frac{1}{7}N vertices, because in both cases the tree structure is uniform. The only thing that changes is the scaling factor.

In general, when V vertices form a connected component of a graph with E edges, we define the excess to be E-V+1. So the excess is non-negative, and is zero precisely when the component is a tree. I’m reluctant to say that the excess counts the number of cycles in the component, but certainly it quantifies the amount of cyclic structure present. We will sometimes, in a mild abuse of notation, talk about excess edges. But note that for a connected component with positive excess, there is a priori no way to select which edges would be the excess edges. In a graph process, or when there is some underlying exploration of the component, there sometimes might be a canonical way to classify the excess edges, though it’s worth remarking that the risk of size-biasing errors is always extremely high in this sort of situation.
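As a concrete illustration of the definition, here is a minimal Python sketch which computes the excess E-V+1 of each connected component of a graph given as an edge list. The function name component_excesses and the input format (vertices labelled 0,...,num_vertices-1) are my own choices for illustration, not anything canonical.

```python
from collections import defaultdict

def component_excesses(num_vertices, edges):
    """Return a list of (V, E, excess) triples, one per connected component.

    Vertices are 0..num_vertices-1 and edges is a list of pairs (u, v).
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen = [False] * num_vertices
    result = []
    for root in range(num_vertices):
        if seen[root]:
            continue
        seen[root] = True
        stack = [root]
        v_count = 0
        degree_sum = 0
        while stack:
            u = stack.pop()
            v_count += 1
            degree_sum += len(adj[u])
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        e_count = degree_sum // 2  # every edge is counted once from each end
        result.append((v_count, e_count, e_count - v_count + 1))
    return result
```

For example, a path on three vertices has excess 0, a triangle has excess 1, and an isolated vertex has excess 0.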

Returning to the random graph process, as so often there are big changes around criticality. In the subcritical regime, the components are small, and most of them, even the largest with high probability, are trees. In the supercritical regime, the giant component has excess \Theta(N), which is qualitatively very different.

It feels like every talk I’ve ever given has begun with an exposition of Aldous’s seminal paper [Al97] giving a distributional scaling limit of the sizes of critical components in the critical window, and a relation between the process on this time-scale and the multiplicative coalescent. And it remains relevant here, because the breadth-first exploration process can also be used to track the number of excess edges.

In a breadth-first exploration, we have a stack of vertices we are waiting to explore. We pick one and look at its neighbours restricted to the rest of the graph, that is without the vertices we have already fully explored, and also without the other vertices in the stack. That’s the easiest way to handle the total component size. But we can simultaneously track how many times we would have joined to a neighbour within the stack, which leads to an excess edge, and Aldous derives a joint distributional scaling limit for the sizes of the critical components and their excesses. (Note that in this case, there is a canonical notion of excess edge, but it depends not just on the graph structure, but also on the extra randomness of the ordering within the breadth-first search.)

Roughly speaking, we consider the reflected exploration process, and its scaling limit, which is a reflected parabolically-drifting Brownian motion (though the details of this are not important at this level of exposition, except that it’s a well-behaved non-negative process that hits zero often). The component sizes are given by the widths of the excursions above zero, scaled up by a factor N^{2/3}. Then conditional on the shape of the excursion, the excess is Poisson with parameter the area under the excursion, with no rescaling. That is, a critical component has \Theta(1) excess.
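The exploration with simultaneous excess tracking can be sketched in a few lines of Python. This is a counts-only simulation under my own simplifying assumptions: it records how many vertices are unseen and how many are in the stack, rather than which ones, so it gives the joint law of (size, excess) for each component but not the graph itself. The names explore_components and binomial are hypothetical helpers, and the binomial sampler is deliberately naive.

```python
import random

def binomial(n, p, rng):
    """Naive Binomial(n, p) sample: count successes in n independent trials."""
    return sum(rng.random() < p for _ in range(n))

def explore_components(N, p, seed=0):
    """Explore G(N, p) one vertex at a time.

    Returns a list of (size, excess) pairs, in order of exploration.
    """
    rng = random.Random(seed)
    unseen = N
    components = []
    while unseen > 0:
        unseen -= 1              # start a new component from a fresh vertex
        stack = 1
        size = 0
        excess = 0
        while stack > 0:
            stack -= 1           # take one vertex off the stack and explore it
            size += 1
            # Edges to unseen vertices discover new members of the component.
            new = binomial(unseen, p, rng)
            # Edges to vertices still waiting in the stack are excess edges.
            excess += binomial(stack, p, rng)
            unseen -= new
            stack += new
        components.append((size, excess))
    return components
```

Taking p=(1+\lambda N^{-1/3})/N and sorting the output by size gives a quick way to look at the joint behaviour of sizes and excesses in the critical window, at least for moderate N.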

So, with Aldous’s result in the background, when we ask about the metric structure of these critical components, we are really asking: “what does a uniformly-chosen connected component with fixed excess look like when the number of vertices grows?”

I’ll try to keep notation light, but let’s say T(n,k) is a uniform choice from connected graphs on n vertices with excess k.

[Note, the separation of N and n is deliberate, because in the critical window, the connected components have size n = \Theta(N^{2/3}), so I want to distinguish the two problems.]

In this post, we will mainly address the question: “what does the cycle structure of T(n,k) look like for large n?” When k=0, we have a uniform tree, and the convergence of this to the Brownian CRT is now well-known [CRT2, LeGall]. We hope for results with a similar flavour for positive excess k.

2-cores and kernels

First, we have to give a precise statement of what it means to study just the cycle structure of a connected component. From now on I will assume we are always working with a connected graph.

There are several equivalent definitions of the 2-core C(G) of a graph G:

  • When the excess is positive, there are some cycles. The 2-core is the union of all edges which form part of some cycle, and any edges which lie on a path between two edges which both form part of some cycle.
  • C(G) is the maximal induced subgraph where all degrees are at least two.
  • If you remove all the leaves from the graph, then all the leaves from the remaining graph, and continue, the 2-core is the state you arrive at where there are no leaves.
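The third definition translates directly into an algorithm. Here is a minimal Python sketch, assuming a simple graph given as an edge list; the function name two_core is my own.

```python
from collections import defaultdict

def two_core(edges):
    """Return the vertex set of the 2-core, by repeatedly removing leaves."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    leaves = [v for v in adj if len(adj[v]) == 1]
    while leaves:
        v = leaves.pop()
        if len(adj[v]) != 1:
            continue             # v was already removed in an earlier step
        (u,) = adj[v]            # the unique neighbour of the leaf v
        adj[u].discard(v)
        adj[v].clear()
        if len(adj[u]) == 1:
            leaves.append(u)     # removing v has turned u into a leaf
    return {v for v in adj if adj[v]}
```

On a tree this returns the empty set, consistent with the excess being zero.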

It’s very helpful to think of the overall structure of the graph as consisting of the 2-core, with pendant trees ‘hanging off’ the 2-core. That is, we can view every vertex of the 2-core as the root of a (possibly size 1) tree. This is particularly clear if we remove all the edges of the 2-core from the graph. What remains is a forest, with one tree for each vertex of the 2-core.

In general, the k-core is the maximal induced subgraph where all degrees are at least k. The core is generally taken to be something rather different. For this post (and any immediate sequels) I will never refer to the k-core for k>2, and certainly not to the traditional core. So I write ‘core’ for ‘2-core’.

As you can see in the diagram, the core consists of lots of paths, and topologically, the lengths of these paths are redundant. So we will often consider instead the kernel, K(G), which is constructed by taking the core and contracting all the paths between vertices of degree greater than 2. The resulting graph has minimal degree at least three. So far we’ve made no comment about the simplicity of the original graphs, but certainly the kernel need not be simple. It will regularly have loops and multiple edges. The kernel of the graph and core in the previous diagram is therefore this:

Kernels of critical components

To recap, we can deconstruct a connected graph as follows. It has a kernel, and each edge of the kernel is a path of some length in the core. The rest of the graph consists of trees hanging off from the core vertices.
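To make the passage from core to kernel concrete, here is a short Python sketch which contracts degree-2 vertices one at a time. It assumes the input is already the edge list of a core (all degrees at least two), and returns the kernel as a multigraph edge list, where a pair (v, v) denotes a loop. The function name kernel is my own, and the implementation favours clarity over speed.

```python
def kernel(core_edges):
    """Contract all degree-2 vertices of a core; return a multigraph edge list."""
    edges = [tuple(e) for e in core_edges]
    while True:
        degree = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1   # a loop (u == v) contributes 2
        # Find a degree-2 vertex whose degree does not come from a single loop.
        target = None
        for v, d in degree.items():
            if d == 2 and (v, v) not in edges:
                target = v
                break
        if target is None:
            return edges
        incident = [e for e in edges if target in e]
        rest = [e for e in edges if target not in e]
        (a1, b1), (a2, b2) = incident
        x = b1 if a1 == target else a1
        y = b2 if a2 == target else a2
        edges = rest + [(x, y)]                # join the two neighbours directly
```

For two triangles joined by a path, this returns two loops joined by a single edge, which is the dumbbell shape. For a single cycle it stops at one vertex carrying a loop, which reflects the ambiguity in the case k=1.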

For now, we ask about the distribution of the kernel of a T(n,k). You might notice that the case k=1 is slightly awkward, as when the core consists of a single cycle, it’s somewhat ambiguous how to define the kernel. Everything we do is easily fixable for k=1, but rather than carry separate cases, we handle the case k\ge 2.

We first observe that fixing k doesn’t determine the number of vertices or edges in the kernel. For example, both of the following pictures could correspond to k=3:

However, with high probability the kernel is 3-regular, which suddenly makes the previous post relevant. As I said earlier, it can introduce size-biasing errors to add the excess edges one-at-a-time, but these should be constant factor errors, not scaling errors. So imagine the core of a large graph with excess k=2. For the sake of argument, assume the kernel has the dumbbell / handcuffs shape. Now add an extra edge somewhere. It’s asymptotically very unlikely that this is incident to one of the two vertices with degree three in the core. Note it would need to be incident to both to generate the right-hand picture above. Instead, the core will gain two new vertices of degree three.

Roughly equivalently, once the size of the core is fixed (and large) we have to make a uniform choice from connected graphs of this size where almost every vertex has degree 2, and the remaining \Theta(1) vertices have degree 3 or higher. But the sum of the degrees is fixed, because the excess is fixed. If there are n vertices in the core, then there are a factor \Theta(n) more graphs where all the vertices have degree 2 or 3, than graphs where a vertex has degree at least 4. Let’s state this formally.

Proposition: The kernel of a uniform graph with n vertices and excess k\ge 2 is, with high probability as n\rightarrow\infty, 3-regular.

This is proved rather more formally as part of Theorem 7 of [JKLP], essentially as a corollary after some very comprehensive generating function setup; and in [LPW] with a more direct computation.

In the previous post, we introduced the configuration model as a method for constructing regular graphs (or any graphs with fixed degree sequence). We observe that, conditional on the event that the resulting graph is simple, it is in fact uniformly-distributed among simple graphs. When the graph is allowed to be a multigraph, this is no longer true. However, as remarked in (1.1) of [JKLP], for most applications the configuration model measure on multigraphs is the most natural.

Given a 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges, and K a uniform choice from the configuration model with these parameters, we have

\mathbb{P}\left( K \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1},

where t(H) is the number of loops in H, and mult(e) the multiplicity of an edge e. This might seem initially counter-intuitive, because it looks like we are biasing against graphs with multiple edges, when perhaps our intuition is that because there are more ways to form a set of multiple edges we should bias in favour of it.

I think it’s most helpful to look at a diagram of a multigraph as shown, and ask how to assign stubs to edges. At a vertex with degree three, all stub assignments are different, that is 3!=6 possibilities. At the multiple edge, however, we care which stubs match with which stubs, but we don’t care about the order within the multi-edge. Alternatively, there are three choices of how to divide each vertex’s stubs into (2 for the multi-edge, 1 for the rest), and then two choices for how to match up the multi-edge stubs, ie 18 in total = 36/2, and a discount factor of 2.
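The smallest case k=2 can be checked by brute force. The following Python sketch enumerates every pairing of the six stubs of two degree-3 vertices, groups the pairings by the labelled multigraph they produce, and compares the counts with the weights in the formula above. The function names are my own, and I am reading the product as running over distinct edges of H.

```python
from collections import Counter
from math import factorial

def pairings(stubs):
    """Yield every perfect matching of a list of stubs (matched by position)."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def pairing_counts(degrees):
    """Count, for each labelled multigraph, the stub pairings that produce it."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    counts = Counter()
    for matching in pairings(stubs):
        key = tuple(sorted(tuple(sorted(pair)) for pair in matching))
        counts[key] += 1
    return counts

def inverse_weight(graph):
    """Return 2^{t(H)} times the product of mult(e)! over distinct edges e."""
    mult = Counter(graph)
    loops = sum(m for (u, v), m in mult.items() if u == v)
    value = 2 ** loops
    for m in mult.values():
        value *= factorial(m)
    return value

counts = pairing_counts([3, 3])
for graph, count in counts.items():
    print(graph, count, count * inverse_weight(graph))
```

Of the 15 pairings, 6 give the triple edge and 9 give the dumbbell, and in both cases the count multiplied by 2^{t(H)} \prod \mathrm{mult}(e)! equals 36 = 3!\cdot 3!, so the probabilities are indeed proportional to the stated weights.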

We mention this because in fact K(T(n,k)) converges in distribution to this uniform configuration model. Once you know that K(T(n,k)) is with high probability 3-regular, then again it’s probably easiest to think about the core, indeed you might as well condition on its total size and number of degree 3 vertices. It’s then not hard to convince yourself that a uniform choice induces a uniform choice of kernel. Again, let’s state that as a proposition.

Proposition: For any 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges as before,

\lim_{n\rightarrow\infty}\mathbb{P}\left( K(T(n,k)) \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1}.

As we said before, the kernel describes the topology of the core. To reconstruct the graph, we need to know the lengths in the core, and then how to glue pendant trees onto the core. But this final stage depends on k only through the total length of paths in the core. Given that information, it’s a combinatorial problem, and while I’m not claiming it’s easy, it’s essentially the same as for the case with k=1, and is worth treating separately.

It is worth clarifying a couple of things first though. Even the outline of methods above relies on the fact that the size of the core diverges as n grows. Again, the heuristic is that up to size-biasing errors, T(n,k) looks like a uniform tree with some uniformly-chosen extra edges. But distances in T(n,k) scale like n^{1/2} (and thus in critical components of G(N,p) scale like N^{1/3}). And the core will be roughly the set of edges on paths between the uniformly-chosen pairs of vertices, and so will also have length \Theta(n^{1/2}).

Once you have conditioned on the kernel structure, and the (large) number of internal vertices on paths in the core (ie the length of the core), it is natural that the assignment of the degree-2 vertices to core paths / kernel edges is uniform. A consequence of this is that if you record (Y_1,\ldots,Y_m) the lengths of paths in the core, where m=3(k-1), then

\frac{(Y_1,\ldots,Y_m)}{\sum Y_i} \stackrel{d}\rightarrow \mathrm{Dirichlet}(1,1,\ldots,1).

This is stated formally as Corollary 7 b) of [ABG09]. It’s worth noting that this confirms that the lengths of core paths are bounded in probability away from zero after the appropriate rescaling. In seeking a metric scaling limit, this is convenient as it means there’s no danger that two of the degree-3 vertices end up in ‘the same place’ in the scaling limit object.
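For intuition about the Dirichlet limit, here is a small Python sketch of the discrete picture. It assumes, as a simplification of the heuristic above, that the numbers of internal vertices on the m kernel edges form a uniformly chosen composition of the total into m non-negative parts; the function name uniform_composition is my own.

```python
import random

def uniform_composition(total, parts, rng):
    """Sample non-negative integers (Y_1, ..., Y_parts) summing to total,
    uniformly over all such tuples, via stars and bars."""
    # Choose where the parts-1 bars sit among total + parts - 1 slots.
    bars = sorted(rng.sample(range(total + parts - 1), parts - 1))
    lengths = []
    previous = -1
    for b in bars:
        lengths.append(b - previous - 1)   # stars strictly between two bars
        previous = b
    lengths.append(total + parts - 1 - previous - 1)
    return lengths

rng = random.Random(0)
k = 3
lengths = uniform_composition(10_000, 3 * (k - 1), rng)
fractions = [y / sum(lengths) for y in lengths]
```

For a large total, the vector of fractions is approximately a uniform point of the simplex, which is exactly the Dirichlet(1,1,\ldots,1) distribution.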

To recap, the only missing ingredients now to give a complete limiting metric description of T(n,k) are 1) a distributional limit of the total core length; 2) some appropriate description of set of pendant trees conditional on the size of the pendant forest. [ABG09] show the first of these. As remarked before, all the content of the second of these is encoded in the unicyclic k=1 case, which I have written about before, albeit slightly sketchily, here. (Note that in that post we get around size-biasing by counting a slightly different object, namely unicyclic graphs with an identified cyclic edge.)

However, [ABG09] also propose an alternative construction, which you can think of as glueing CRTs directly onto the stubs of the kernel (with the same distribution as before). The proof that this construction works isn’t as painful as one might fear, and allows a lot of the other metric distributional results to be read off as corollaries.

References

[ABG09] – Addario-Berry, Broutin, Goldschmidt – Critical random graphs: limiting constructions and distributional properties

[CRT2] – Aldous – The continuum random tree: II

[Al97] – Aldous – Brownian excursions, critical random graphs and the multiplicative coalescent

[JKLP] – Janson, Knuth, Luczak, Pittel – The birth of the giant component

[LeGall] – Le Gall – Random trees and applications

[LPW] – Luczak, Pittel, Wierman – The structure of a random graph at the point of the phase transition

 


Critical Components in Erdos-Renyi

In various previous posts, I’ve talked about the phase transition in the Erdos-Renyi random graph process. Recall the definition of the process. Here we will use the Gilbert model G(n,p), where we have n vertices, and between any pair of vertices we add an edge, independently of other pairs, with probability p. We are interested in the sparse scaling, where the typical vertex has degree O(1) in n, and so p=c/n for constant c>0, and we assume throughout that n is large. We could instead have considered the alternative Erdos-Renyi model, where we choose uniformly at random from the set of graphs with n vertices and some fixed number of edges. Almost all the results presented work equally well in this setting.

As proved by Erdos and Renyi, the typical component structure of such a graph changes noticeably around the threshold c=1. Below this, in the subcritical regime, all the components are small, meaning of size O(\log n). Above this, in the supercritical regime, there is a single giant component on some non-zero proportion of the vertices. The rest of the graph looks subcritical. The case c=1 exhibits a phase transition between these qualitatively different behaviours. They proved that here, the largest component is with high probability O(n^{2/3}). It seems that they thought this result held whenever c=1-o(1), but it turns out that this is not the case. In this post, I will discuss some aspects of behaviour around criticality, and the tools needed to treat them.

The first question to address is this: how many components of size n^{2/3} are there? It might be plausible that there is a single such component, like for the subsequent giant component. It might also be plausible that there are n^{1/3} such components, so O(n) vertices are on such critical components. Then it would be clear how we transition out of criticality into supercriticality: all the vertices on critical components coalesce to form the new giant component.

In fact neither of these is correct. The answer is that for all integers k>0, with high probability the k-th largest component is on a size scale of n^{2/3}. This is potentially a confusing statement. It looks like there are infinitely many such components, but of course for any particular value of n, this cannot be the case. We should think of there being \omega(1) components, but o(n^b) for any b>0.
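A quick way to get a feel for this is to sample G(n,p) directly and look at the largest few component sizes on the scale n^{2/3}. The following Python sketch does this with a union-find structure; the function name component_sizes is my own, and the double loop over pairs is quadratic in n, so it is only intended for moderate n.

```python
import random

def component_sizes(n, p, seed=0):
    """Sample G(n, p) and return its component sizes, largest first."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:          # the edge {u, v} is present
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    sizes = {}
    for x in range(n):
        r = find(x)
        sizes[r] = sizes.get(r, 0) + 1
    return sorted(sizes.values(), reverse=True)

n = 1000
rescaled = [s / n ** (2 / 3) for s in component_sizes(n, 1 / n)[:5]]
```

The list rescaled holds the five largest component sizes divided by n^{2/3}. A single sample proves nothing, of course, but repeating this for several seeds and values of n is a reasonable sanity check on the claimed scale.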

The easiest way to see this is by a duality argument, as we have discussed previously for the supercritical phase. If we remove a component of size O(n^{2/3}), then what remains is a random graph with n-O(n^{2/3}) vertices, and edge probability the same as originally. It might make sense to rewrite this probability 1/n as

\frac{1}{n-O(n^{2/3})}\cdot \frac{n-O(n^{2/3})}{n}=\frac{1-O(n^{-1/3})}{n-O(n^{2/3})}.

The approximation in the final numerator is basically the same as

1-O\left([n-O(n^{2/3})]^{-1/3}\right).

Although we have no concrete reasoning, it seems at least plausible that this should look similar in structure to G(n,1/n). In particular, there should be another component of size

O\left([n-O(n^{2/3})]^{2/3}\right)=O(n^{2/3}).

In fact, the formal proof of this proceeds by an identical argument, only using the exploration process. Because I’ve described this several times before, I’ll be brief. We track how far we have gone through each component in a depth-first walk. In both the supercritical and subcritical cases, when we scale correctly we get a random path which is basically deterministic in the limit (in n). For exactly the same reasons as visible CLT fluctuations for partial sums of RVs with expectation zero, we start seeing interesting effects at criticality.

The important question is the order of rescaling to choose. At each stage of the exploration process, the number of vertices added to the stack is binomial. We want to distinguish between components of size O(n^{2/3}) so we should look at the exploration process at time sn^{2/3}. The drift of the exploration process is given by the expectation of a binomial random variable minus one (since we remove the current vertex from the stack as we finish exploring it). This is given by

\mathbb{E}\left[\mathrm{Bin}\left(n-sn^{2/3},\tfrac{1}{n}\right)\right]-1=\left[n-sn^{2/3}\right]\cdot \frac{1}{n}-1=-sn^{-1/3}.

Note that this is the drift in one time-step. The drift in n^{2/3} time-steps will accordingly be -sn^{1/3}. So, if we rescale time by n^{2/3} and space by n^{1/3}, we should get a nice stochastic process. Specifically, if Z is the exploration process, then we obtain:

\frac{1}{n^{1/3}}Z^{(n)}_{sn^{2/3}} \rightarrow_d W_s,

where W is a Brownian motion with inhomogeneous drift -s at time s. The net effect of such a drift at a fixed positive time is given by integrating up to that time, and hence we might say the process has quadratic drift, or is parabolic.

We should remark that our binomial expectation is not entirely correct. We have discounted those sn^{2/3} vertices that have already been explored, but we have not accounted for the vertices currently in the stack. We should also be avoiding considering these. However, we now have a heuristic for the approximate number of these. The number of vertices in the stack should be O(n^{1/3}) at all times, and so in particular will always be an order of magnitude smaller than the number of vertices already considered. Therefore, they won’t affect this drift term, though this must be accounted for in any formal proof of convergence. On the subject of which, the mode of convergence is, unsurprisingly, weak convergence uniformly on compact sets. That is, for any fixed S, the convergence holds weakly on the random functions up to time Sn^{2/3}.

Note that this process will tend to minus infinity almost surely. Component sizes are given by excursions above the running minimum. The process given by the height of the original process above the running minimum is called reflected. Essentially, we construct the reflected process by having the same generator when the current value is positive, and forcing the process up when it is at zero. There are various ways to construct this more formally, including as the scaling limit of some simple random walks conditioned to stay non-negative.

The cute part of the result is that it holds equally well in a so-called critical window either side of the critical probability 1/n. When the probability is \frac{1+tn^{-1/3}}{n}, for any t\in \mathbb{R}, the same argument holds. Now the drift at time s is t-s, though everything else still holds.
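To spell out the one-step drift in the critical window, under the same heuristic as before (ignoring the vertices currently in the stack), the binomial expectation minus one at time sn^{2/3} becomes

\left[n-sn^{2/3}\right]\cdot \frac{1+tn^{-1/3}}{n}-1=(t-s)n^{-1/3}-st\,n^{-2/3}.

Over n^{2/3} time-steps this accumulates to (t-s)n^{1/3}-st, and after rescaling space by n^{1/3} the second term vanishes in the limit, leaving drift t-s as claimed.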

This result was established by Aldous in [1], and gives a mechanism for calculating distributions of component sizes and so on through this critical window.

In particular, we are now in a position to answer the original question regarding how many such components there were. The key idea is that because whenever we exhaust a component in the exploration process, we choose a new vertex uniformly at random, we are effectively choosing a component according to the size-biased distribution. Roughly speaking, the largest components will show up near the beginning. Note that a critical O(n^{2/3}) component will not necessarily be exactly the first component in the exploration process, but the components that are explored before this will take up sufficiently few vertices that they won’t show up in the scaling of the limit.

In any case, the reflected Brownian motion ‘goes on forever’, and the drift is eventually very negative, so there cannot be infinitely wide excursions, hence there are infinitely many such critical components.

If we care about the number of cycles, we can treat this also via the exploration process. Note that in any depth-first search we are necessarily only interested in a spanning tree of the host graph. Anyway, when we are exploring a vertex, there could be extra edges to other vertices in the stack, but not to vertices we’ve already finished exploring (otherwise the edge would have been exposed then). So the expected number of excess edges into a vertex is proportional to the height of the exploration process at that vertex. So the overall expected number of excess edges, conditional on the exploration process is the area under the curve. This carries over perfectly well into the stochastic process limit. It is then a calculation to verify that the area under the curve is almost surely infinite, and thus that we expect there to be infinitely many cycles in a critical random graph.

REFERENCES

[1] Aldous D. – Brownian excursions, critical random graphs and the multiplicative coalescent

Recent Research Activity

I’ve spent this week in Luminy, near Marseille, attending a summer school run by ALEA, the organisation of French probabilists. We’ve been staying in CIRM, a dedicated maths research conference centre at the edges of the calanques, the area of mountains and jagged coastal inlets between Marseille and Cassis. The walking possibilities have been excellent, as have the courses and lectures, on a range of topics in probability theory.

Anyway, the time here has been an excellent moment to reflect on my research progress, and try to come up with the sort of fresh ideas that are perhaps slightly inhibited by sitting at a desk with an endless supply of paper on which to try calculations. When I get back, I have to submit a first-year report, so at least for a little while I will have to suppress the desire to make further progress and instead diligently assemble the progress I have made.

The Model

I’ve defined some of these processes in past posts, but I see no harm in doing so again. We take the standard Erdos-Renyi random graph process, where edges are added one-at-a-time uniformly at random between n vertices, and amend it by adding a deletion mechanism. The aim is to arrive at a process which looks in equilibrium more like the critical random graph than either the subcritical or supercritical regimes, where the components are very small, and dominated by one giant component respectively. Rath, Toth and others have studied the process where each vertex is hit by lightning at uniform rate. When this happens, we delete all the edges in the component containing that vertex. Naturally, big components will be hit by lightning more often than small components, and so this acts as a mechanism to prevent the formation of giant components, if scaled correctly.

We take a different approach. We observe that criticality in the original random graph process is denoted by the first appearance of a giant component, but also by the first appearance of a) lots of cycles, and b) large cycles. In particular, it is very unlikely that a giant component could form without containing any cycles. We will therefore use the appearance of a cycle to trigger some form of deletion mechanism.

Our final goal is to treat the so-called ‘Cycle Deletion’ model. Here, whenever a cycle appears, we delete all the edges in that cycle immediately. There are several challenges in treating this model, because the rate at which cycles emerge in a tree is a function of the tree structure. The trees in this model will not be Uniform Spanning Trees (though it is very possible that they will be ‘almost USTs’ in some sense – we need to investigate this further) so it will be hard to make nice statements about the rates. For the standard random graph process, if we are only interested in the sizes of the components, we are actually allowed to ignore the graph structure entirely. The component sizes evolve as a discrete, stochastic version of the multiplicative coalescent (sometimes called a Marcus-Lushnikov process). We would like a deletion mechanism that has a nice interpretation as a fragmentation operation in the same sense. The rate at which a component fragments will be quadratic in the size of the component, since there are O(k^2) possible edges between k vertices forming a component, and adding any of precisely these will create a cycle.

I’ve talked previously about how to overcome the problems with the tree structure in Cycle Deletion with the so-called Uniform Cycle Deleting model. In any case, as a starting point we might consider the Cycle-Induced Forest Fire model. Here, whenever a cycle appears, we delete all the edges, including the new one, in the whole component which contains the cycle.
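Since every component in the Cycle-Induced Forest Fire model is a tree, the dynamics of the component sizes can be simulated without storing any edges. Here is a minimal Python sketch. The function name simulate_ciff is my own, and I assume each new edge has both endpoints chosen independently and uniformly, so loops are allowed and a loop at a vertex counts as a cycle.

```python
import random

def simulate_ciff(n, steps, seed=0):
    """Run `steps` edge arrivals of a cycle-induced forest fire on n vertices.

    Returns the component sizes at the end, largest first.
    """
    rng = random.Random(seed)
    label = list(range(n))                 # label[v] names the component of v
    members = {v: [v] for v in range(n)}   # component name -> its vertices
    for _ in range(steps):
        u, v = rng.randrange(n), rng.randrange(n)
        a, b = label[u], label[v]
        if a == b:
            # The new edge closes a cycle: burn the whole component,
            # returning all of its vertices to isolated singletons.
            for w in members.pop(a):
                label[w] = w
                members[w] = [w]
        else:
            # No cycle: merge the smaller component into the larger one.
            if len(members[a]) < len(members[b]):
                a, b = b, a
            for w in members.pop(b):
                label[w] = a
                members[a].append(w)
    return sorted((len(vs) for vs in members.values()), reverse=True)
```

Running this for many more than n steps and recording the largest entry of the output at regular intervals gives an empirical view of the typical size of the largest component in equilibrium.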

We suspect this model may resemble the critical random graph at all times. The main characteristic of G(n,1/n) is that the largest component is of size O(n^{2/3}), and indeed there are arbitrarily many components of this size, with high probability in the limit. Since CIFF is recurrent for any fixed n, meaning that it will visit any state infinitely often (rather than tending to infinity or similar), we should ask what the largest component is typically in the equilibrium distribution. Our aim is to prove that it is O(n^{2/3}). We might suspect that the typical size of the largest component will be greater in the Cycle Deletion model, since each fragmentation event is less severe there, removing fewer edges.

An Upper Bound

The nice thing about Markov chains is that they have an ergodic property, which means that if you run them for long enough, the proportion of time spent in any state is given by the stationary probability of being in that state. It doesn’t matter whether or not you start in equilibrium, since it will converge anyway. Thus it is meaningful to talk about properties like the average number of isolated vertices as a time-average as well as an average with respect to some distribution.

This quantity is the key to an upper bound. We can equally talk about the average change in the number of isolated vertices in a time-step. This will increase when a component fragments, and will decrease when an isolated vertex coalesces with another component. In particular, the largest possible decrease in the number of isolated vertices in a single time-step is 2, corresponding to an edge appearing between two isolated vertices.

Suppose that with probability \Theta(1) there is a component of size n^\alpha for some \alpha>2/3. Then such a component makes a contribution to the expected change in the number of isolated vertices of

\Theta(1) n^\alpha \left(\frac{n^\alpha}{n}\right)^2. (*)

Where does this come from? Well, we are tracking the contributions from the event that the largest component is of this size and that it fragments, giving n^\alpha new isolated vertices. So the \Theta(1) accounts for the probability that there is such a component to begin with. Then, conditional on that, the probability that it gets fragmented in the next time-step is the probability that both ends of the next edge added lie in that component. Since the edge is chosen uniformly at random, each end lies in the component with probability n^\alpha/n, so the probability of this is (n^\alpha/n)^2. Note that this is under a slightly odd definition of an edge, that allows loops. Basically, I don’t want to have lots of correction terms involving \binom{n}{2} floating around. However, it would make no difference to the orders of magnitude if we were to do it with these.

So, this is only one contribution to the typical rate of gain of isolated vertices. Now note that if \alpha>2/3, then this expression is \gg 1. This is bad since the negative contributions to this expected flux in the number of isolated vertices are O(1). So this suggests that over time, the number of isolated vertices will keep growing. This is obviously ridiculous since a) we are in equilibrium, so the expected flux should be 0 and b) the number of isolated vertices cannot exceed n, for clear reasons.
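To make the arithmetic behind the threshold explicit, collecting the powers of n in (*) gives

\Theta(1)\, n^\alpha \left(\frac{n^\alpha}{n}\right)^2=\Theta\left(n^{3\alpha-2}\right),

and the exponent 3\alpha-2 is positive precisely when \alpha>2/3. At \alpha=2/3 the contribution is \Theta(1), which is the same order as the largest possible loss of isolated vertices per time-step.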

This gives us an upper bound of n^{2/3} as the typical scale of the largest component. We can come up with a similar argument for the cycle deleting model. The most helpful thing to track there is the number of edges in the graph. Note that since the graph is at all times a forest on n vertices, the number of edges is equal to n minus the number of (tree) components. We use the fact that the typical fragmentation of a component of size k creates O(\sqrt{k}) new components. It is possible to argue via isolated vertices here too, but the estimates are harder, or at least less present in the literature.

Lower Bounds?

The problem with lower bounds is that it is entirely possible that the flux in the number of isolated vertices is not driven by typical behaviour. Suppose for example we had a different rule. We begin a random graph process, and the first time we see a cycle in a component with size larger than n^{2/3}, we delete all the edges in the whole graph. Then we will see a sequence of random graph processes starting with the empty graph and stopped at some point close to criticality (in fact, with high probability in the *critical window*), and these will all be glued together. So then, most of the time the process will look subcritical, but the gains in isolated vertices will occur only during the critical periods, which are only an asymptotically small proportion of the time.

At the moment, my approach to the lower bound is instead to prove that the upper bound is tight. I mean this in the following sense. Suppose we wanted to be sure that (*) was in fact equal to the average rate of gain of isolated vertices. We would have to check the following:

  • That the total contributions from all other components were similar or smaller than from the component(s) of size roughly n^{\alpha}.
  • That there were only a few components of size n^{\alpha}. In particular, the estimate would be wrong if there were n^\epsilon such components for any \epsilon>0.
  • That it cannot be the case that for example, some small proportion of the time there is a component of size roughly n^{\alpha+\epsilon}, and over a large enough time these make a greater contribution to the average gain in isolated vertices.

A nice way to re-interpret this is to consider some special vertex and track the size of its component in time. It will be involved in repeated fragmentations over the course of time, so it is meaningful to talk about the distribution of the size of the component containing the vertex when it is fragmented. Our aim is to show that this distribution is concentrated on the scaling O(n^\alpha).

So this has turned out to be fairly hard. Rather than try to explain some of the ideas I’ve employed in attempting to overcome this, I will finish by giving one reason why it is hard.

We have seen that the component sizes in random graphs evolve as the multiplicative coalescent, but at a fixed moment in time, we can derive good estimates from an analogy with branching processes. We might like to do that here. If we know what the system looks like most of the time, we might try to ‘grow’ a multiplicative coalescent, viewing it like a branching process, with distribution given by the typical distribution. The problem is that when I do this, I find that the expectation of the offspring distribution is \Theta(1). This looks fine, since 1 is the threshold for extinction with probability 1. However, throughout the analysis, I have only been paying attention to the exponent of n in all the time and size estimates. For example, I view n^\alpha and n^\alpha \log n as the same. This is a problem, as when I say the expectation is \Theta(1), I am really saying it is \sim n^0. This means it could be \frac{1}{\log n} or \log n. Of course, there is a massive difference between these, since the expected size of a branching process grows exponentially in the number of generations, at a rate set by the offspring mean!

So, this approach appears doomed in its current form. I have some other ideas, but a bit more background may be required before going into those. I’m going to be rather busy with teaching on my return to the office, so unfortunately it is possible that there may be many posts about second year probability and third year applied probability before anything more about CIFF.

Random Mappings for Cycle Deletion

In previous posts here and here, I’ve talked about attempts to describe a cycle deleting process. We amend the dynamics of the standard random graph process by demanding that whenever a cycle is formed in the graph we delete all the edges that lie on the cycle. The aim of this is to prevent the system growing giant components, and perhaps give a system that displays the characteristics of self-organised criticality. In the posts linked to, we discuss the difficulties caused by the fact that the tree structure of components in such a process is not necessarily uniform.

Today we look in the opposite direction. It gives a perfectly reasonable model to take a multiplicative coalescent with quadratic fragmentation (this corresponds to cycle deletion, since there are O(n^2) edges which would give a cycle if added to a tree on n vertices) and a fragmentation kernel corresponding to adding an extra edge to a uniform spanning tree on n vertices then deleting the edges of the unique cycle. For the rest of this post, we consider this fragmentation mechanism, in particular thinking about how we would sample from it most practically. Not least, without going through Prufer codes or some other clever machinery, it is not trivial to sample a uniform spanning tree.
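As a rough illustration of how one might sample this fragmentation mechanism in practice, here is a minimal Python sketch. It decodes a uniform Prufer sequence into a uniform labelled tree, adds one extra edge, and reads off the sizes of the trees left once the edges of the cycle are deleted. The function names are hypothetical, and choosing the extra edge uniformly among pairs not already joined in the tree is an assumption made for the sketch (it needs n at least 3).

```python
import heapq
import random


def uniform_labelled_tree(n, rng):
    """Edge list of a uniform labelled tree on {0,...,n-1}, via a uniform Prufer sequence."""
    if n == 1:
        return []
    if n == 2:
        return [(0, 1)]
    code = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for v in code:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in code:
        # Attach the smallest current leaf to the next code entry.
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two vertices remain; join them.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def fragment_sizes(n, rng):
    """Sizes of the trees left after adding one edge to a uniform tree and deleting the cycle."""
    edges = uniform_labelled_tree(n, rng)
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    tree_edges = {frozenset(e) for e in edges}
    # Assumption: the extra edge is uniform over pairs that are not tree edges.
    while True:
        a, b = rng.sample(range(n), 2)
        if frozenset((a, b)) not in tree_edges:
            break
    # The cycle is the tree path from a to b plus the new edge.
    parent = {a: None}
    stack = [a]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                stack.append(w)
    cycle = [b]
    while cycle[-1] != a:
        cycle.append(parent[cycle[-1]])
    on_cycle = set(cycle)
    # Each cycle vertex becomes the root of the tree hanging off it.
    sizes = []
    for root in cycle:
        size, seen, stack = 0, {root}, [root]
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen and w not in on_cycle:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

For instance, fragment_sizes(1000, random.Random(0)) returns a decreasing list of block sizes summing to 1000, with one block per vertex of the deleted cycle.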

First, we count the number of unicyclic graphs on n labelled vertices. If we know that the vertices on the cycle are v_1,\ldots,v_k, then the number of cycles with an identified edge is

u_1=1,\quad u_k=\frac{k!}{2},\, k\ge 2.

If we know that the tree coming off the cycle from vertex v_i has size m, say, then each of the possible rooted labelled trees with size m is equally likely. So taking w_j=j^{j-1}, the number of rooted trees on j labelled vertices, we get B_n(u_\bullet,w_\bullet) for the number of such unicyclic graphs on [n]. Recall B_n is the nth Bell polynomial, which gives the size of a compound combinatorial structure, where we have some structure on blocks and some other structure within blocks. Then the random partition of [n] given by the tree sizes has the distribution \text{Gibbs}_n(u_\bullet,w_\bullet).

Consider now a related object, the so-called random mapping digraph. What follows is taken from Chapter 9 of Combinatorial Stochastic Processes. We can view any mapping M_n:[n]\rightarrow[n] as a digraph where every vertex has out-degree 1. Each such digraph contains a collection of directed cycles, supported on those elements x for which M_n^k(x)=x for some k. Such an element x is called a cyclic point. Each cyclic point can be viewed as the root of a labelled tree.
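To make the digraph decomposition concrete, the following Python sketch samples a uniform mapping, finds its cyclic points, and assigns every element to the tree rooted at the first cyclic point it reaches. The function names are hypothetical and elements are labelled 0,...,n-1 rather than 1,...,n; this is an illustration only, not taken from the book.

```python
import random


def uniform_mapping(n, rng):
    """A uniform random mapping on {0,...,n-1}, stored as a list: x maps to mapping[x]."""
    return [rng.randrange(n) for _ in range(n)]


def cyclic_points(mapping):
    """The set of x with M^k(x) = x for some k >= 1."""
    n = len(mapping)
    state = [0] * n  # 0 = unvisited, 1 = on the current walk, 2 = finished
    cyclic = set()
    for start in range(n):
        if state[start]:
            continue
        path = []
        x = start
        while state[x] == 0:
            state[x] = 1
            path.append(x)
            x = mapping[x]
        if state[x] == 1:
            # The walk closed up on itself: everything from x onwards is a cycle.
            cyclic.update(path[path.index(x):])
        for v in path:
            state[v] = 2
    return cyclic


def tree_sizes(mapping):
    """Map each cyclic point to the size of the tree rooted at it."""
    root = {c: c for c in cyclic_points(mapping)}
    sizes = {}
    for x in range(len(mapping)):
        path = []
        y = x
        while y not in root:
            path.append(y)
            y = mapping[y]
        for v in path:
            root[v] = root[y]
        sizes[root[x]] = sizes.get(root[x], 0) + 1
    return sizes
```

For the mapping [1, 0, 0, 2, 4], the cyclic points are 0, 1 and 4, and the tree rooted at 0 also contains 2 and 3.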

In an identical manner to the unicyclic graph, the sizes of these directed trees in the digraph decomposition of a uniform random mapping are distributed as \text{Gibbs}_n(\bullet !,w_\bullet). So this is exactly the same as the cycle deletion kernel, apart from in the probability that the partition has precisely one block. In practice, for large n, the probability of this event is very small in both cases. And if we wanted to sample the cycle deletion kernel exactly, we could choose the trivial partition with some probability p, and otherwise sample from the random mapping kernel, where p is chosen such that

p+\frac{1-p}{B_n(\bullet !, w_\bullet)}=\frac{1}{B_n(u_\bullet,w_\bullet)}.

At least we know from the initial definition of a random mapping, that B_n(\bullet !,w_\bullet)=n^n. The number of unicyclic graphs with an identified edge is less clear. It turns out that the partition induced by the random mapping has a nice limit, after rescaling, as the lengths of excursions away from 0 in the standard Brownian bridge on [0,1].

The time for a fuller discussion of this sort of phenomenon is in the context of Poisson-Dirichlet distributions, as the above exchangeable partition turns out to be PD(1/2,1/2). However, for now we remark that the jumps of a subordinator give a partition after rescaling. The case of a stable subordinator is particularly convenient, as calculations are made easier by the Levy-Khintchine formula.

A notable example is the stable-1/2 subordinator, which can be realised as the inverse of the local time process at zero of a Brownian motion. The jumps of this process are then the excursion lengths of the original Brownian motion. A calculation involving the tail of the w_j’s indicates that 1/2 is the correct parameter for a subordinator to describe the random mappings. Note that the number of blocks in the partition corresponds to the local time at zero of the Brownian motion. (This is certainly not obvious, but it should at least be intuitively clear why a larger local time roughly indicates more excursions which indicates more blocks.)

So it turns out, after checking some of the technicalities, that it will suffice to show that the rescaled number of blocks in the random mapping partition \frac{|\Pi_n|}{\sqrt{n}} converges in distribution to the Rayleigh distribution, which is a size-biased Normal random variable (hence effectively first conditioned to be positive), and which also is the distribution of the local time of the standard Brownian bridge.

After that very approximate description, we conclude by showing that the distribution of the number of blocks does indeed converge as we require. Recall Cayley’s formula kn^{n-k-1} for the number of labelled forests on [n] with a specified set of k roots. We also need to know how many rooted labelled forests there are with any set of roots. Suppose we introduce an extra vertex, labelled 0, and connect it only to the roots of a rooted labelled forest on [n]. This gives a bijection between labelled trees on {0,1,…,n} and rooted labelled forests on [n]. So we can use Cayley’s original formula to conclude there are (n+1)^{n-1} such forests. We can do a quick sanity check that these are the same, by summing over the number of roots k and the \binom{n}{k} choices of root set. After dividing through by n^n, this is equivalent to showing

\sum_{k=1}^n k n^{-k-1}\binom{n}{k}=\frac{1}{n}(1+\frac{1}{n})^{n-1}.

This odd way of writing it is well-motivated. The form of the LHS is reminiscent of a generating function, and the additional k suggests taking a derivative. Indeed, since \frac{d}{dx}(1+x)^n=\sum_k k\binom{n}{k}x^{k-1}, the LHS is n^{-2} times the derivative

\frac{d}{dx}(1+x)^n=n(1+x)^{n-1},

evaluated at \frac{1}{n}. This is clearly the same as the RHS.
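The same sanity check can be run in exact integer arithmetic. The short Python sketch below sums kn^{n-k-1}\binom{n}{k} over k and compares with (n+1)^{n-1} for small n; the function name is hypothetical.

```python
from math import comb


def rooted_forests_by_summing(n):
    """Sum over k of (number of root sets of size k) * (forests with that root set)."""
    total = 0
    for k in range(1, n + 1):
        # k * n^(n-k-1), written as (k * n^(n-k)) // n so that k = n stays an integer.
        forests_with_given_roots = (k * n ** (n - k)) // n
        total += comb(n, k) * forests_with_given_roots
    return total


for n in range(1, 9):
    # Compare with Cayley's formula applied to trees on {0,1,...,n}.
    assert rooted_forests_by_summing(n) == (n + 1) ** (n - 1)
```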

That said, having established that the random mapping partition is essentially the same, it is computationally more convenient to consider that instead. By the digraph analogy, we again need to count forests with k roots on n vertices, and multiply by the number of permutations of the roots. This gives:

\mathbb{P}(|\Pi_n|=k)=\frac{kn^{n-k-1}\cdot k! \binom{n}{k}}{n^n}=\frac{k}{n}\prod_{i=1}^{k-1}\left(1-\frac{i}{n}\right).
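For small n this formula can be checked by brute force. The Python sketch below enumerates all n^n mappings, counts cyclic points for each (these are the blocks of the partition), and compares the exact frequencies with both forms of the formula. The function names are hypothetical, and the enumeration is only feasible for very small n.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def num_cyclic_points(mapping):
    """Number of cyclic points: the image of the n-fold iterate of the mapping."""
    image = set(range(len(mapping)))
    for _ in range(len(mapping)):
        image = {mapping[x] for x in image}
    return len(image)


def pmf_by_enumeration(n):
    """Exact law of the number of cyclic points of a uniform mapping on n points."""
    counts = [0] * (n + 1)
    for mapping in product(range(n), repeat=n):
        counts[num_cyclic_points(mapping)] += 1
    return [Fraction(c, n ** n) for c in counts]


def pmf_by_formula(n, k):
    """k n^(n-k-1) k! C(n,k) / n^n, with numerator and denominator both multiplied by n."""
    return Fraction(k * factorial(k) * comb(n, k) * n ** (n - k), n ** (n + 1))


def pmf_product_form(n, k):
    """(k/n) * prod_{i=1}^{k-1} (1 - i/n)."""
    value = Fraction(k, n)
    for i in range(1, k):
        value *= 1 - Fraction(i, n)
    return value
```

For n=4, for example, all three agree that the probability of a single cyclic point is 1/4.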

Now we can consider the limit. Being a bit casual with notation, we get:

\lim \mathbb{P}(\frac{|\Pi_n|}{\sqrt{n}}\in dl)\approx \sqrt{n}dl \mathbb{P}(|\Pi_n|=l\sqrt{n}).

Since the Rayleigh distribution has density l\exp(-\frac12 l^2)dl, it suffices for this informal verification to check that

\prod_{i=1}^{l\sqrt{n}}(1-\frac{i}{n})\approx \exp(-\frac12 l^2). (*)

We take logs, so the LHS becomes:

\log(1-\frac{1}{n})+\log(1-\frac{2}{n})+\ldots+\log(1-\frac{l\sqrt{n}}{n}).

If we view this as a function of l and differentiate, we get

d(LHS)=\sqrt{n}dl \log (1-\frac{l}{\sqrt{n}})\approx \sqrt{n}dl \left[-\frac{l}{\sqrt{n}}-\frac{l^2}{2n}\right]\approx -ldl.

When l is zero, the LHS should be zero, so we can obtain the desired result (*) by integrating then taking an exponential.
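A quick numerical comparison supports this. The Python sketch below evaluates \sqrt{n}\,\mathbb{P}(|\Pi_n|=l\sqrt{n}) using the product form, computed through a sum of logs, and sets it against the Rayleigh density. The function names, the rounding of l\sqrt{n} to an integer, and the choice n=10^6 are all assumptions made for the illustration.

```python
import math


def scaled_block_mass(n, l):
    """sqrt(n) * P(|Pi_n| = k) with k = round(l * sqrt(n)), using the product form."""
    k = int(round(l * math.sqrt(n)))
    # log of prod_{i=1}^{k-1} (1 - i/n), summed term by term
    log_product = sum(math.log1p(-i / n) for i in range(1, k))
    return math.sqrt(n) * (k / n) * math.exp(log_product)


def rayleigh_density(l):
    """The limiting density l * exp(-l^2 / 2)."""
    return l * math.exp(-0.5 * l * l)


n = 10 ** 6
for l in (0.5, 1.0, 2.0):
    approx = scaled_block_mass(n, l)
    limit = rayleigh_density(l)
    # The two should agree to within about one percent at this n.
    assert abs(approx - limit) < 1e-2 * limit
```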

Multiplicative Coalescence

I spent pretty much the entirety of April 2012 writing an essay with the title Multiplicative Coalescence, as part of my Part III exams. As the results have now been published, this is probably an acceptable time to publish it here, in case anyone is interested. I certainly enjoyed writing the review, and hopefully some people may find it an interesting introduction to the more analytical side of the topic.

Coalescence 2: Marcus-Lushnikov Processes and a neat equivalence

Last time, we discussed the Smoluchowski equations, which define an infinite-volume mean-field model for coalescence. Now we consider a stochastic coalescent model, the Marcus-Lushnikov process. Here, we start with a finite collection of finite mass particles, with total mass N. Then we define a process by demanding that any given pair of particles with masses x and y coalesce into a single particle with mass x + y at rate K(x,y)/N.

This can be formalised as a continuous-time Markov chain in different ways. The underlying state space consists of the set of unordered multisets of positive integers which sum to N. But rather than considering the configurations of masses themselves, it is sometimes more convenient to take the state space to be:

\{n=(n_1,\ldots,n_N): \sum xn_x=N\}.

Here n_x records the number of particles with mass x, and the constraint controls conservation of mass. Writing out a transition of this Markov chain is notationally annoying, but the transition rates are simple. The rate of transition from (n_1,\ldots,n_N) to the state where masses x and y (with x\ne y) have coalesced is given by

N^{-1}K(x,y)n_xn_y.

Therefore, the Marcus-Lushnikov process ML^{(N)}_t is the process of component sizes in the finite particle setting of coalescence. The existence and mode of convergence of these processes to the deterministic solutions to Smoluchowski’s equation are of particular interest in many models.
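As an illustration, here is a minimal Gillespie-style simulation of such a process in Python, started from N unit masses. It works at the level of individual particles, giving each unordered pair rate K(x,y)/N, which aggregates to the rates above. The function name and its arguments are hypothetical, and recomputing all pair rates at every step is a simplification that is only sensible for small N.

```python
import random


def marcus_lushnikov(N, kernel, t_max, rng):
    """Masses present at time t_max, starting from N particles of mass 1."""
    masses = [1] * N
    t = 0.0
    while len(masses) > 1:
        # Every unordered pair of particles coalesces at rate K(x, y) / N.
        pairs = [(i, j) for i in range(len(masses)) for j in range(i + 1, len(masses))]
        rates = [kernel(masses[i], masses[j]) / N for i, j in pairs]
        total = sum(rates)
        if total == 0:
            break
        # Waiting time until the next coalescence.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose which pair merges, with probability proportional to its rate.
        i, j = rng.choices(pairs, weights=rates)[0]
        masses[i] += masses[j]
        masses.pop(j)  # j > i, so position i is unaffected
    return sorted(masses, reverse=True)


# Multiplicative kernel, monodisperse start, run until time 1.
sizes = marcus_lushnikov(30, lambda x, y: x * y, 1.0, random.Random(1))
assert sum(sizes) == 30  # mass is conserved
```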

As discussed in the previous post, there is an obvious heuristic link between the multiplicative coalescent and random graph processes. An easy but interesting explicit equivalence can be drawn between the Marcus-Lushnikov process with kernel K(x,y) = xy and monodisperse (that is, starting with unit mass particles) initial conditions and a specific random graph structure.

Proposition: The process ML^{(N)}_t with the conditions above is equivalent to the process of component sizes in \mathcal{G}(N,1-e^{-t/N}).

Proof: First, observe that we can couple the latter process in the obvious way by associating a U[0,1] random variable U_e with each of the \binom{N}{2} edges. The edge e is included at time t iff U_e\leq 1-e^{-t/N}. In this process, the appearances of different edges are independent and

\mathbb{P}(\text{edge } e \text{ appears after time } t)=e^{-t/N}.

Therefore the waiting times for a given edge to appear are independent exponential random variables with rate 1/N. In particular, the edge process is memoryless, hence the component size process is Markov. Then it is genuinely obvious that the rate at which an edge joining given distinct components of sizes x and y appears is N^{-1}xy, since there are xy such potential edges. So the evolution is exactly the same as the Marcus-Lushnikov process, and the initial configuration is monodisperse.
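The coupling in the proof is easy to sketch in Python: draw one uniform U_e per edge, and at any time t read off the components of the graph whose edges satisfy U_e \leq 1-e^{-t/N}, here with a small union-find. The function names are hypothetical and vertices are labelled 0,...,N-1.

```python
import math
import random
from itertools import combinations


def sample_edge_uniforms(N, rng):
    """One U[0,1] variable per edge of the complete graph on N vertices."""
    return {edge: rng.random() for edge in combinations(range(N), 2)}


def component_sizes(N, uniforms, t):
    """Component sizes at time t of the graph process coupled through the uniforms."""
    p = 1.0 - math.exp(-t / N)
    parent = list(range(N))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), u in uniforms.items():
        if u <= p:  # edge e is present at time t
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    counts = {}
    for v in range(N):
        r = find(v)
        counts[r] = counts.get(r, 0) + 1
    return sorted(counts.values(), reverse=True)
```

Because the same uniforms are reused for every t, calling component_sizes at increasing times traces out a single trajectory of the process, in which components only ever merge.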

Coalescence 1: What is it, and why do we care?

As part of Part III, instead of sitting an extra exam paper I am writing an essay. I have chosen the topic of ‘Multiplicative Coalescence’. I want to avoid contravening plagiarism rules, which don’t allow you to quote your own words without a proper citation, which I figure is tricky on a blog, nor open publishing of anything you intend to submit. So just to be absolutely sure, I’m going to suppress this series of posts until after May 4th, when everything has to be handed in.

———–

Informal Description

Coalescence refers to a process in which particles join together over time. An example might be islands of foam on the surface of a cup of coffee. When two clumps meet, they join, and will never split. In this example, a model would need to take into account the shape of all the islands, their positions, their velocities, and boundary properties. To make things tractable, we need to distance ourselves from the idea that particles merge through collisions, which are highly physical and complicated, and instead just consider that they merge.

Description of the Model

When two particles coalesce, it is natural to assume that mass is conserved, as this will be necessary in any physical application. With this in mind, it makes sense to set up the entire model using only the masses of particles. Define the kernel K(x,y) which describes the relative rate or likelihood of the coalescence {x,y} -> x+y. This has a different precise meaning in different contexts. Effectively, we are making a mean-field assumption that all the complications of a physical model as described above can be absorbed into this coalescent kernel, either because the number of particles is large, or because the other effects are small.

When there is, initially, a finite number of particles, the process is stochastic. Coalescences almost surely happen one at a time, and so we can view the process as a continuous time Markov Chain with state space the set of relevant partitions of the total mass present. The transition rate p(A,B) is given by K(x,y) when the coalescence {x,y} -> x+y transforms partition A into B, and 0 otherwise. An observation is that the process recording the number of {x,y} -> x+y coalescences is an inhomogeneous Poisson process with local intensity n(x,t)n(y,t)K(x,y) where n(x,t) is the number of particles with mass x at time t.

This motivates the move to an infinite-volume setting. Suppose that there are infinitely many particles, so that coalescences are occurring continuously. The rate of {x,y} -> x+y coalescences is still n(x,t)n(y,t)K(x,y) but now n(x,t) specifies the density of particles with mass x at time t. Furthermore, because of the continuum framework, this rate is now deterministic rather than stochastic. This is extremely important, as by removing the probability from a probabilistic model, it can be treated as a large but simple ODE.

Two Remarks

1) Once this introduction is finished, we shall be bringing our focus onto multiplicative coalescence, where K(x,y) = xy. In particular, this is a homogeneous function, as are the other canonical kernels. This means that considering K(x,y) = cxy is the same as performing a constant factor time-change when K(x,y) = xy. Similarly, it is not important how the density n(x,t) is scaled as this can also be absorbed with a time-change. In some contexts, it will be natural and useful to demand that the total density be 1, but this will not always be possible. In general it is convenient to absorb as much as possible into the time parameter, particularly initial conditions, as will be discussed.

2) Working with an infinite volume of particles means that mass is no longer constrained to finitely many values. Generally, it is assumed that the masses are discrete, taking values in the positive integers, or continuous, taking values in the positive reals. In the continuous case, the rate of coalescences between particles with masses in (x, x+dx) and (y,y+dy) is n(x,t)n(y,t)K(x,y)dxdy. The main difference between these will arise when we try to view the process as limits of finite processes.

Missing in Action?

Not much maths has been appearing here in the past few weeks. But I have been working…

As part of Part III, instead of sitting an extra exam paper I am writing an essay. I have chosen the topic of ‘Multiplicative Coalescence’ so have been hard at work reading various papers and articles. I’ve been writing some posts about the topic as practice for writing up the essay – in fact, it’s entirely possible that large chunks will end up featuring verbatim. As a result, to ensure I stay firmly on the correct side of the rules about plagiarism, I’m going to wait until after exam results are announced on June 20th before making these posts visible.