# Kernels of critical graph components

This post is motivated by G(N,p), the classical Erdos-Renyi random graph, specifically its critical window, when $p=p(N)=\frac{1}{N}(1+\lambda N^{-1/3})$.

We start with the following observation, which makes no restriction on p. Suppose a component of G(N,p) is a tree. Then, the graph geometry of this component is that of a uniform random tree on the appropriate number of vertices. This is deliberately informal. To be formal, we’d have to say “condition on a particular subset of vertices forming a tree-component” and so on. But the formality is broadly irrelevant, because at the level of metric scaling limits, if we want to describe the structure of a tree component, it doesn’t matter whether it has $\log N$ or $\frac{1}{7}N$ vertices, because in both cases the tree structure is uniform. The only thing that changes is the scaling factor.

In general, when V vertices form a connected component of a graph with E edges, we define the excess to be E-V+1. So the excess is non-negative, and is zero precisely when the component is a tree. I’m reluctant to say that the excess counts the number of cycles in the component, but certainly it quantifies the amount of cyclic structure present. We will sometimes, in a mild abuse of notation, talk about excess edges. But note that for a connected component with positive excess, there is a priori no way to select which edges would be the excess edges. In a graph process, or when there is some underlying exploration of the component, there sometimes might be a canonical way to classify the excess edges, though it’s worth remarking that the risk of size-biasing errors is always extremely high in this sort of situation.

Returning to the random graph process, as so often there are big changes around criticality. In the subcritical regime, the components are small, and most of them, even the largest with high probability, are trees. In the supercritical regime, the giant component has excess $\Theta(N)$, which is qualitatively very different.

It feels like every talk I’ve ever given has begun with an exposition of Aldous’s seminal paper [Al97] giving a distributional scaling limit of the sizes of critical components in the critical window, and a relation between the process on this time-scale and the multiplicative coalescent. And it remains relevant here, because the breadth-first exploration process can also be used to track the number of excess edges.

In a breadth-first exploration, we have a stack of vertices we are waiting to explore. We pick one and look its neighbours restricted to the rest of the graph, that is without the vertices we have already fully explored, and also without the other vertices in the stack. That’s the easiest way to handle the total component size. But we can simultaneously track how many times we would have joined to a neighbour within the stack, which leads to an excess edge, and Aldous derives a joint distributional scaling limit for the sizes of the critical components and their excesses. (Note that in this case, there is a canonical notion of excess edge, but it depends not just on the graph structure, but also on the extra randomness of the ordering within the breadth-first search.)

Roughly speaking, we consider the reflected exploration process, and its scaling limit, which is a reflected parabolically-drifting Brownian motion (though the details of this are not important at this level of exposition, except that it’s a well-behaved non-negative process that hits zero often). The component sizes are given by the widths of the excursions above zero, scaled up in a factor $N^{1/3}$. Then conditional on the shape of the excursion, the excess is Poisson with parameter the area under the excursion, with no rescaling. That is, a critical component has $\Theta(1)$ excess.

So, with Aldous’s result in the background, when we ask about the metric structure of these critical components, we are really asking: “what does a uniformly-chosen connected component with fixed excess look like when the number of vertices grows?”

I’ll try to keep notation light, but let’s say T(n,k) is a uniform choice from connected graphs on n vertices with excess k.

[Note, the separation of N and n is deliberate, because in the critical window, the connected components have size $n = \Theta(N^{2/3})$, so I want to distinguish the two problems.]

In this post, we will mainly address the question: “what does the cycle structure of T(n,k) look like for large n?” When k=0, we have a uniform tree, and the convergence of this to the Brownian CRT is now well-known [CRT2, LeGall]. We hope for results with a similar flavour for positive excess k.

2-cores and kernels

First, we have to give a precise statement of what it means to study just the cycle structure of a connected component. From now on I will assume we are always working with a connected graph.

There are several equivalent definitions of the 2-core C(G) of a graph G:

• When the excess is positive, there are some cycles. The 2-core is the union of all edges which form part of some cycle, and any edges which lie on a path between two edges which both form part of some cycle.
• C(G) is the maximal induced subgraph where all degrees are at least two.
• If you remove all the leaves from the graph, then all the leaves from the remaining graph, and continue, the 2-core is the state you arrive at where there are no leaves.

It’s very helpful to think of the overall structure of the graph as consisting of the 2-core, with pendant trees ‘hanging off’ the 2-core. That is, we can view every vertex of the 2-core as the root of a (possibly size 1) tree. This is particular clear if we remove all the edges of the 2-core from the graph. What remains is a forest, with one tree for each vertex of the 2-core.

In general, the k-core is the maximal induced subgraph where all degrees are at least k. The core is generally taken to be something rather different. For this post (and any immediate sequels) I will never refer to the k-core for k>2, and certainly not to the traditional core. So I write ‘core’ for ‘2-core’.

As you can see in the diagram, the core consists of lots of paths, and topologically, the lengths of these paths are redundant. So we will often consider instead the kernel, K(G), which is constructed by taking the core and contracting all the paths between vertices of degree greater than 2. The resulting graph has minimal degree at least three. So far we’ve made no comment about the simplicity of the original graphs, but certainly the kernel need not be simple. It will regularly have loops and multiple edges. The kernel of the graph and core in the previous diagram is therefore this:

Kernels of critical components

To recap, we can deconstruct a connected graph as follows. It has a kernel, and each edge of the kernel is a path length of some length in the core. The rest of the graph consists of trees hanging off from the core vertices.

For now, we ask about the distribution of the kernel of a T(n,K). You might notice that the case k=1 is slightly awkward, as when the core consists of a single cycle, it’s somewhat ambiguous how to define the kernel. Everything we do is easily fixable for k=1, but rather than carry separate cases, we handle the case $k\ge 2$.

We first observe that fixing k doesn’t confirm the number of vertices or edges in the kernel. For example, both of the following pictures could correspond to k=3:

However, with high probability the kernel is 3-regular, which suddenly makes the previous post relevant. As I said earlier, it can introduce size-biasing errors to add the excess edges one-at-a-time, but these should be constant factor errors, not scaling errors. So imagine the core of a large graph with excess k=2. For the sake of argument, assume the kernel has the dumbbell / handcuffs shape. Now add an extra edge somewhere. It’s asymptotically very unlikely that this is incident to one of the two vertices with degree three in the core. Note it would need to be incident to both to generate the right-hand picture above. Instead, the core will gain two new vertices of degree three.

Roughly equivalently, once the size of the core is fixed (and large) we have to make a uniform choice from connected graphs of this size where almost every vertex has degree 2, and $\Theta(1)$ of the rest have degree 3 or higher. But the sum of the degrees is fixed, because the excess is fixed. If there are n vertices in the core, then there are $\Theta(n)$ more graphs where all the vertices have degree 2 or 3, than graphs where a vertex has degree at least 4. Let’s state this formally.

Proposition: The kernel of a uniform graph with n vertices and excess $k\ge 2$ is, with high probability as $n\rightarrow\infty$, 3-regular.

This proved rather more formally as part of Theorem 7 of [JKLP], essentially as a corollary after some very comprehensive generating function setup; and in [LPW] with a more direct computation.

In the previous post, we introduced the configuration model as a method for constructing regular graphs (or any graphs with fixed degree sequence). We observe that, conditional on the event that the resulting graph is simple, it is in fact uniformly-distributed among simple graphs. When the graph is allowed to be a multigraph, this is no longer true. However, in many circumstances, as remarked in (1.1) of [JKLP], for most applications the configuration model measure on multigraphs is the most natural.

Given a 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges, and K a uniform choice from the configuration model with these parameters, we have

$\mathbb{P}\left( K \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1},$

where t(H) is the number of loops in H, and mult(e) the multiplicity of an edge e. This might seem initially counter-intuitive, because it looks we are biasing against graphs with multiple edges, when perhaps our intuition is that because there are more ways to form a set of multiple edges we should bias in favour of it.

I think it’s most helpful to look at a diagram of a multigraph as shown, and ask how to assign stubs to edges. At a vertex with degree three, all stub assignments are different, that is 3!=6 possibilities. At the multiple edge, however, we care which stubs match with which stubs, but we don’t care about the order within the multi-edge. Alternatively, there are three choices of how to divide each vertex’s stubs into (2 for the multi-edge, 1 for the rest), and then two choices for how to match up the multi-edge stubs, ie 18 in total = 36/2, and a discount factor of 2.

We mention this because in fact K(T(n,k)) converges in distribution to this uniform configuration model. Once you know that K(T(n,k)) is with high probability 3-regular, then again it’s probably easiest to think about the core, indeed you might as well condition on its total size and number of degree 3 vertices. It’s then not hard to convince yourself that a uniform choice induces a uniform choice of kernel. Again, let’s state that as a proposition.

Proposition: For any H a 3-regular labelled multigraph H with 2(k-1) vertices and 3(k-1) edges as before,

$\lim_{n\rightarrow\infty}\mathbb{P}\left( K(T(n,k)) \equiv H \right) \propto \left(2^{t(H)} \prod_{e\in E(H)} \mathrm{mult}(e)! \right)^{-1}.$

As we said before, the kernel describes the topology of the core. To reconstruct the graph, we need to know the lengths in the core, and then how to glue pendant trees onto the core. But this final stage depends on k only through the total length of paths in the core. Given that information, it’s a combinatorial problem, and while I’m not claiming it’s easy, it’s essentially the same as for the case with k=1, and is worth treating separately.

It is worth clarifying a couple of things first though. Even the outline of methods above relies on the fact that the size of the core diverges as n grows. Again, the heuristic is that up to size-biasing errors, T(n,k) looks like a uniform tree with some uniformly-chosen extra edges. But distances in T(n,k) scale like $n^{1/2}$ (and thus in critical components of G(N,p) scale like $N^{1/3}$). And the core will be roughly the set of edges on paths between the uniformly-chosen pairs of vertices, and so will also have length $\Theta(n^{1/2})$.

Once you have conditioned on the kernel structure, and the (large) number of internal vertices on paths in the core (ie the length of the core), it is natural that the assignment of the degree-2 vertices to core paths / kernel edges is uniform. A consequence of this is that if you record $(Y_1,\ldots,Y_m)$ the lengths of paths in the core, where m=3(k-1), then

$\frac{(Y_1,\ldots,Y_m)}{\sum Y_i} \stackrel{d}\rightarrow \mathrm{Dirichlet}(1,1,\ldots,1).$

This is stated formally as Corollary 7 b) of [ABG09]. It’s worth noting that this confirms that the lengths of core paths are bounded in probability away from zero after the appropriate rescaling. In seeking a metric scaling limit, this is convenient as it means there’s so danger that two of the degree-3 vertices end up in ‘the same place’ in the scaling limit object.

To recap, the only missing ingredients now to give a complete limiting metric description of T(n,k) are 1) a distributional limit of the total core length; 2) some appropriate description of set of pendant trees conditional on the size of the pendant forest. [ABG09] show the first of these. As remarked before, all the content of the second of these is encoded in the unicyclic k=1 case, which I have written about before, albeit slightly sketchily, here. (Note that in that post we get around size-biasing by counting a slightly different object, namely unicyclic graphs with an identified cyclic edge.)

However, [ABG09] also propose an alternative construction, which you can think of as glueing CRTs directly onto the stubs of the kernel (with the same distribution as before). The proof that this construction works isn’t as painful as one might fear, and allows a lot of the other metric distributional results to be read off as corollaries.

References

[ABG09] – Addario-Berry, Broutin, Goldschmidt – Critical random graphs: limiting constructions and distributional properties

[CRT2] – Aldous – The continuum random tree: II

[Al97] – Aldous – Brownian excursions, critical random graphs and the multiplicative coalescent

[JKLP] – Janson, Knuth, Luczak, Pittel – The birth of the giant component

[LeGall] – Le Gall – Random trees and applications

[LPW] – Luczak, Pittel, Wierman – The structure of a random graph at the point of the phase transition

# Random 3-regular graphs

A graph is d-regular if every vertex has degree d. Probably the easiest examples of d-regular graphs are the complete graph on (d+1) vertices, and the infinite d-ary tree. A less trivial example is the Petersen graph, which is 3-regular. 3-regular graphs will be the main focus for some of this post, but initially we lose nothing by considering general d.

Throughout, a necessary condition for the existence of a d-regular graph with N vertices is that at least one of d and N is even, as the sum of the degrees of a graph must be even. We will always assume that this holds, so that when d=3, we are always taking N to be even.

A natural pair of questions for a probabilist is ‘can we sample a d-regular graph with N vertices uniformly at random?’ and ‘what does a typical large d-regular graph look like?’

In a rather old post, I addressed some aspects of the first question, but revisit it briefly here. A good idea, due to Bollobas [B80] is to assign to all the vertices d stubs (or half-edges), and choose a matching of the Nd stubs uniformly at random. This works as a method to generate a random graph with any fixed degree sequence.

If you want your graphs to be simple, this can go wrong, because there’s a chance you get loops (that is, an edge from a vertex v to itself) and multiple edges between the same pair of vertices. It would be nice the graph formed in this fashion was simple with high probability when $N\rightarrow\infty$. Unfortunately that’s not the case, however the probability that the graph is simple remains asymptotically bounded away from 0 and 1. Indeed, because the presence of a loop / multiple edge is asymptotically independent of the presence of a loop / multiple edge elsewhere, it’s unsurprising we have a Poisson limit for the number of such occurences. So from a sampling point of view, it’s reasonable to sample a graph in this way until you find a simple one. This takes O(1) steps, and it’s O(N) steps to check whether a given multigraph is simple.

It’s clear that conditional on the graph generated in this fashion being simple, its distribution is uniform on the set of simple graphs with the correct degree distribution. If you are happy for your graphs to have loops, then it’s a little bit more complicated, because if an edge has multiplicity k, these can appear in k! ways in the configuration construction.

Other asymptotic properties

Loops and multiple edges can be thought of as cycles of length 1 and 2 respectively if you want. We might ask about other small cycles. A calculation in expectation is relatively straightforward. Given three vertices, the probability they form a triangle (in at least one way) is $\Theta(N^{-3})$, and there are $\Theta(N^3)$ ways to choose three vertices. Thus the expected number of triangles is $\Theta(1)$. Finally, the edge structure induced on disjoint triples is asymptotically independent, and hence a Poisson limit. (See [J06] for details, including more detail on the general configuration construction.) The same result holds for the same reasons for cycles of any fixed finite length.

We might also ask about connectivity. At a heuristic level, there are two ways for the graph to be disconnected: it could have some small components; or it could have two components of size $\Theta(N)$. The smallest possible component is $K_4$, and an argument like for the cycles above shows that the number of copies of $K_4$ vanishes in expectation. Now, consider having two components of size roughly N/2. There are $\binom{N}{N/2} \sim 2^{2N}$ ways to make this choice. However, given such a choice, we can handle the probability that all the stubs from one class match within that class by going through the class one stub at a time:

$\frac{\frac{3N}{2}-1}{3N-1} \times \frac{\frac{3N}{2}-3}{3N-3} \times \cdots \times \frac{1}{\frac{3N}{2}+1}.$

We approximate this as

$\frac{\sqrt{(3N/2)!}}{\sqrt{ (3N)!}} \sim e^{3N/2} 2^{-3N/2} \left(3N\right)^{-3N/2},$

and this dominates the number of choices powerfully enough that we might believe it remains valid for a broader range of class sizes. In fact we have a much stronger statement, namely that G(N,3) is 3-connected with high probability. This means that the graph cannot be disconnected by removing two vertices, or equivalently that there are three vertex-disjoint paths between any pair of vertices in the graph, essentially one emerging from each stub. See this note by David Ellis for a quick proof. We might return to this later.

You might ask about planarity. It’s clear from degree consideration that there are no induced copies of $K_5$ in any random 3-regular graph, and since $K_{3,3}$ contains a cycle of length 4, and with high probability G(N,3) doesn’t, that takes care of that possibility too. However, there might be minors of this form. This seemed a good example of the Kuratowski criterion not actually being that useful, since I certainly don’t find the minors of the 3-regular graph an obvious structure to handle.

However, we can use Euler’s formula V – E + F = 2 for planar graphs. Here V = N, E = 3N/2. Faces are described by (a subset of the) cycles, and we there are asymptotically O(1) small cycles, so most faces include a large number of edges. But each edge corresponds to at most two faces. So we have $F \ll E$, and so with high probability Euler’s formula can’t hold in G(N,3) for large N.

We can also ask about the local limit of G(N,3). Since the vertices are exchangeable, we don’t need to worry about whether we choose the root uniformly at random (often referred to as the Benjamini-Schramm sense) or by some other method.

The root has up to three neighbours, and with high probability it has exactly three neighbours. These neighbours have at most two other neighbours themselves. However, we’ve already seen that there are asymptotically O(1) cycles, and so with high probability there are no small cycles near a fixed root vertex. So the six neighbours-of-neighbours are with high probability different to the root and the root’s neighbours and to each other. We can make this argument at arbitrary finite radius from the root, to conclude that the local limit of G(N,3) is the infinite 3-ary tree.

Spectral expansion

[Caveat – this is something I read about and wanted to mention, but I really don’t know much at all about any of this theory, and it’s definitely not certain that what follows wouldn’t be better replaced by a set of links.]

This straightforward local limit offers good heuristics on some of the more global properties. Almost by definition, the d-ary tree expands as rapidly as is possible away from the root among infinite d-regular graphs. There are a number of ways to measure the expansion of a graph, and some methods transfer better to the infinite setting than others. The adjacency matrix of an infinite graph can be defined similarly to that of a finite graph, and it remains possible to talk about eigenfunctions and spectrum. As for the finite setting, d is an eigenvalue because the tree is d-regular, and -d is an eigenvalue because it is also bipartite.

The next largest eigenvalue $\lambda_2$ governs the spectral gap $d-\lambda_2$ which is a measure of the expansion of a graph. A graph is a good (spectral) expander if all the non-trivial eigenvalues are close to zero. A priori, all we know is that $|\lambda_2|\le d$. For the infinite d-ary tree, we have $\lambda_2 = 2\sqrt{d-1}$. This blog post by Luca Trevisan gives a very readable proof.

A key result is that finite graphs can have $\lambda_2 \le 2\sqrt{d-1}$, but not asymptotically. That is, taking N to be the number of vertices:

$\lambda_2 \ge 2\sqrt{d-1} - o_N(1).$

This is the content of the Alon-Boppana theorem [Al86]. In fact the error can be quantified as $O(\frac{1}{\log N})$ – the diamater of the graph is relevant here. A finite d-regular graph for which $\lambda_2\le 2\sqrt{d-1}$ is called a Ramanujan graph. The existence of Ramanujan graphs has been much studied, and various constructions often rely on number theoretic properties of N, and lie at the interface of disparate branches of mathematics where my understanding is zero rather than epsilon.

Now return to our view of the d-ary tree as the local limit of a d-regular graph on N vertices for large N. We might expect from everything above that the uniform d-regular graph is a good expander. Bollobas shows that in the sense of edge-expansion, asymptotically almost all d-regular graphs have edge-expansion bounded away from zero. (See Section 2 of [Ell], including history of the d=3 case.) Friedman [Fri08] proves the conjecture of Alon that for every $\epsilon>0$, a.a.s. $\lambda_2$ for G(N,d) is at most $2\sqrt{d-1}+\epsilon$. In this sense, G(N,d) is asymptotically ‘almost Ramanujan’. (See also [Bor17] for another proof and an introduction including history, context and references.)

Some other links: The Wikipedia page on expanders, which includes a discussion of the different descriptions of expansion, and the Cheeger inequalities and other relations between them; slides for a talk by Spielman on spectra and Ramanujan graphs; a survey by Murty on Ramanujan graphs;.

What next?

This post took a slightly different direction from what I had intended, and rather than make a halting U-turn back to my planned finale, I’ll postpone this. However, a short overture is that I’m interested in the structure of critical components of random graphs during the critical window. This is the window during which the largest components first have cycles with probability $\Theta(1)$. Indeed, the critical components have size $\Theta(N^{2/3})$ and $\Theta(1)$ surplus edges. Conditional on their size, and number of surplus edges, the choice of the graph structure on the component is uniform among such (connected) graphs.

Addario-Berry, Broutin and Goldschmidt [ABG09] study scaling limits of such components. Central to this analysis is the 2-core of such components, which can be described in terms of 3-regular (multi)graphs. Various processes we are now interested in running on the critical components of critical RGs can then be studied in terms of related processes on random 3-regular graphs.

References

[ABG09] – Addario-Berry, Broutin, Goldschmidt – Critical random graphs: limiting constructions and distributional properties

[Al86] – Alon – Eigenvalues and expanders

[B80] – Bollobas – A probabilistic proof of an asymptotic formula for the number of labelled regular graphs

[B88] – Bollobas – The isoperimetric number of random regular graphs

[Bor17] – Bordenave – A new proof of Friedman’s second eigenvalue theorem and its extension to random lifts. Arxiv.

[Ell] – Ellis – The expansion of random regular graphs

[Fri08] – Friedman – A proof of Alon’s second eigenvalue conjecture and related problems

[J06] – Janson – The probability that a random multigraph is simple by

# Random transpositions

We study a procedure for generating a random sequence of permutations of [N]. We start with the identity permutation, and then in each step, we choose two elements uniformly at random, and swap them. We obtain a sequence of permutations, where each term is obtained from the previous one by multiplying by a uniformly-chosen transposition.

Some more formality and some technical remarks:

• This is a Markov chain, and as often with Markov chains, it would be better it was aperiodic. As described, the cycle will alternate between odd and even permutations. So we allow the two elements chosen to be the same. This laziness slows down the chain by a factor N-1/N, but removes periodicity. We will work over timescales where this adjustment makes no practical difference.
• Let $\tau_1,\tau_2,\ldots$ be the sequence of transpositions. We could define the sequence of permutations by $\pi_m= \tau_m\cdot\tau_{m-1}\cdot \ldots\cdot \tau_1$. I find it slightly more helpful to think of swapping the elements in places i and j, rather the elements i and j themselves, and so I’ll use this language, for which $\pi_m = \tau_1\cdot \tau_2\cdot\ldots \cdot \tau_m$ is the appropriate description. Of course, transpositions and the identity are self-inverse permutations, so it makes no difference to anything we might discuss.
• You can view this as lazy random walk on the Cayley graph of $S_N$ generated by the set of transpositions. That is, the vertices of the graph are elements of $S_N$, and two are connected by an edge if one can be obtained from the other by multiplying by a transposition. Note this relation is symmetric. Hence random transposition random walk.
• Almost everything under discussion would work in continuous time too.

At a very general level, this sort of model is interesting because sometimes the only practical way to introduce ‘global randomness’ is repeatedly to apply ‘local randomness’. This is not the case for permutations – it is not hard to sample uniformly from $S_N$. But it is a tractable model in which to study relevant questions about the generating randomness on a complicated set through iterated local operations.

Since it is a Markov chain with a straightforward invariant distribution, we can ask about the mixing time. That is, the correct scaling for the number of moves before the random permutation is close in distribution (say in the sense of total variation distance) to the equilibrium distribution. See this series of posts for an odd collection of background material on the topic. Diaconis and Shahshahani [DS81] give an analytic argument for mixing around $\frac{N\log N}{2}$ transpositions. Indeed they include a constant because it is a sharp cutoff, where the total variation distance drops from approximately 1 to approximately 0 in O(N) steps.

Comparison with Erdos-Renyi random graph process

In the previous result, one might observe that $m=\frac{N\log N}{2}$ is also the threshold number of edges to guarantee connectivity of the Erdos-Renyi random graph G(N,m) with high probability. [ER59] Indeed, there is also a sharp transition around this threshold in this setting too.

We explore this link further. We can construct a sequence of random graphs simultaneously with the random transposition random walk. When we multiply by transposition (i j), we add edge ij in the graph. Laziness of RTRW and the possibility of multiple edges mean this definition isn’t literally the same as the conventional definition of a discrete-time Erdos-Renyi random graph process, but again this is not a problem for any of the effects we seek to study.

The similarity between the constructions is clear. But what about the differences? For the RTRW, we need to track more information than the random graph. That is, we need to know what order the transpositions were added, rather than merely which edges were added. However, the trade-off is that a permutation is a simpler object than a graph in the following sense. A permutation can be a described as a union of disjoint cycles. In an exchangeable setting, all the information about a random permutation is encoded in the lengths of the these cycles. Whereas in a graph, geometry is important. It’s an elegant property of the Erdos-Renyi process that we can forget about the geometry and treat it as a process on component sizes (indeed, a multiplicative coalescent process), but there are other questions we might need to ask for which we do have to study the graph structure itself.

Within this analogy, unfortunately the word cycle means different things in the two different settings. In a permutation, a cycle is a directed orbit, while in a graph it has the usual definition. I’m going to write graph-cycle whenever relevant to avoid confusion.

A first observation is that, under this equivalence, the cycles of the permutation form a finer partition than the components of the graph. This is obvious. If we split the vertices into sets A and B, and there are no edges between them, then nothing in set A will ever get moved out of set A by a transposition. (Note that the slickness of this analogy is the advantage of viewing a transposition as swapping the elements in places i and j.)

However, we might then ask under what circumstances is a cycle of the permutation the same as a component of the graph (rather than a strict subset of it). A first answer is the following:

Lemma: [Den59] The permutation formed by a product of transpositions corresponding in any order to a tree in the graph has a single cycle.

We can treat this as a standalone problem and argue in the following predictable fashion. (Indeed, I was tempted to set this as a problem during selection for the UK team for IMO 2017 – it’s perfectly suitable in this context I think.) The first transposition corresponds to some edge say ab, and removing this edge divides the vertices into components $A \ni a, B\ni b$. Since no further transposition swaps between places in A and places in B, the final permutation maps a into B and b into A, and otherwise preserves A and B.

This argument extends to later transpositions too. Now, suppose there are multiple cycles. Colour one of them. So during the process, the coloured labels move around. At some point, we must swap a coloured label with an uncoloured label. Consider this edge, between places a and b as before, and indeed the same conclusion holds. WLOG we move the coloured label from a to b. But then at the end of the process (ie in the permutation) there are more coloured labels in B than initially. But the number of coloured labels should be the same, because they just cycle around in the final permutation.

We can learn a bit more by trying thinking about the action on cycles (in the permutation) of adding a transposition. In the following pair of diagrams, the black arrows represent the original permutation (note it’s not helpful to think of the directed edges as having anything to do with transpositions now), the dashed line represents a new transposition, and the new arrows describe the new permutation which results from this product.

It’s clear from this that adding a transposition between places corresponding to different cycles causes the cycles to merge, while adding a transposition between places already in the same cycle causes the cycle to split into two cycles. Furthermore the sizes of the two cycles formed is related to the distance in the cycle between the places defining the transposition.

This allows us to prove the lemma by adding the edges of the tree one-at-a-time and using induction. The inductive claim is that cycles of the permutation exactly correspond to components of the partially-built tree. Assuming this claim guarantees that the next step is definitely a merge, not a split (otherwise the edge corresponding to the next step would have to form a cycle). If all N-1 steps are merges, then the number of cycles is reduced by one on each step, and so the final permutation must be a single cycle.

Uniform split-merge

This gives another framework for thinking about the RTRW itself, entirely in terms of cycle lengths as a partition of [N]. That is, given a partition, we choose a pair of parts in a size-biased way. If they are different, we merge them; and if it is the same part, with size k, we split them into two parts, with sizes chosen uniformly from { (1,k-1), (2,k-2), …  (k-1,1) }.

What’s nice about this is that it’s easy to generalise to real-valued partitions, eg of [0,1]. Given a partition of [0,1], we sample two IID U[0,1] random variables $U_1,U_2$. If these correspond to different parts, we replace these parts by a single part with size given by the sum. If these correspond to the same part, with size $\alpha$, we split this part into two parts with sizes $|U_1-U_2|$ and $\alpha - |U_1-U_2|$. This is equivalent in a distributional sense to sampling another U[0,1] variable U and replacing $\alpha$ with $(\alpha U, \alpha(1-U))$. We probably want our partition to live in $\ell^1_\searrow$, so we might have to reorder the parts afterwards too.

These uniform split-merge dynamics have a (unique) stationary distribution, the canonical Poisson-Dirichlet random partition, hereafter PD(0,1). This was first shown in [DMZZ04], and then in a framework more relevant to this post by Schramm [Sch08].

Conveniently, PD(0,1) is also the scaling limit of the cycle lengths in a uniform random permutation (scaled by N). The best way to see this is to start with the observation that the length of the cycle containing 1 in a permutation chosen uniformly from $S_N$ has the uniform distribution on {1,…,N}. This matches up well with the uniform stick-breaking construction of PD(0,1), though other arguments are available too. Excellent background on Poisson-Dirichlet distributions and this construction and equivalence can be found in Chapter 3 of Pitman’s comprehensive St. Flour notes [CSP]. Also see this post, and the links within, with the caveat that my understanding of the topic was somewhat shaky then (as presently, for now).

However, Schramm says slightly more than this. As the Erdos-Renyi graph passes criticality, there is a well-defined (and whp unique) giant component including $\Theta(N)$ vertices. It’s not clear that the corresponding permutation should have giant cycles. Indeed, whp the giant component has $\Theta(N)$ surplus edges, so the process of cycle lengths will have undergone $O(N)$ splits. Schramm shows that most of the labels within the giant component are contained in giant cycles in the permutation. Furthermore, the distribution of cycle lengths within the giant component, rescaled by the size of the giant component, converges in distribution to PD(0,1) at any supercritical time $\frac{(1+\epsilon)N}{2}$

This is definitely surprising, since we already know that the whole permutation doesn’t look close to uniform until time $\frac{N\log N}{2}$. Essentially, even though the size of the giant component is non-constant (ie it’s gaining vertices), the uniform split-merge process is happening to the cycles within it at rate N. So heuristically, at the level of the largest cycles, at any supercritical time we have a non-trivial partition, so at any slightly later time (eg $\frac{(1+\epsilon/2)N}{2}$ and $\frac{(1+\epsilon)N}{2}$ ), mixing will have comfortably occurred, and so the distribution is close to PD(0,1).

This is explained very clearly in the introduction of [Ber10], in which the approach is extended to a random walk on $S_N$ driven by a uniform choice from any conjugacy class.

So this really does tell us how the global uniform randomness emerges. As the random graph process passes criticality, we have a positive mass of labels in a collection of giant cycles which are effectively a continuous-space uniform split-merge model near equilibrium (and thus with PD(0,1) marginals). The remaining cycles are small, corresponding to small trees which make up the remaining (subcritical by duality) components of the ER graph. These cycles slowly get absorbed into the giant cycles, but on a sufficiently slow timescale relevant to the split-merge dynamics that we do not need to think of a separate split-merge-with-immigration model. Total variation distance on permutations does feel the final few fixed points (corresponding to isolated vertices in the graph), hence the sharp cutoff corresponding to sharp transition in the number of isolated vertices.

References

[Ber10] – N. Berestycki – Emergence of giant cycles and slowdown transition in random transpositions and k-cycles. [arXiv version]

[CSP] – Pitman – Combinatorial stochastic processes. [pdf available]

[Den59] – Denes – the representation of a permutation as a product of a minimal number of transpositions, and its connection with the theory of graphs

[DS81] – Diaconis, Shahshahani – Generating a random permutation with random transpositions

[DMZZ04] – Diaconis, Mayer-Wolf, Zeitouni, Zerner – The Poisson-Dirichlet distribution is the unique invariant distribution for uniform split-merge transformations [link]

[ER59] – Erdos, Renyi – On random graphs I.

[Sch08] – Schramm – Compositions of random transpositions [book link]

# Generating uniform trees

A long time ago, I wrote quite a few a things about uniform trees. That is, a uniform choice from the $n^{n-2}$ unrooted trees with vertex set [n]. This enumeration, normally called Cayley’s formula, has several elegant arguments, including the classical Prufer bijection. But making a uniform choice from a large set is awkward, and so we seek more probabilistic methods to sample such a tree, which might also give insight into the structure of a ‘typical’ uniform tree.

In another historic post, I talked about the Aldous-Broder algorithm. Here’s a quick summary. We run a random walk on the complete graph $K_n$ started from a uniformly-chosen vertex. Every time we arrive at a vertex we haven’t visited before, we record the edge just traversed. Eventually we have visited all n vertices, so have recorded n-1 edges. It’s easy enough to convince yourself that these n-1 edges form a tree (how could there be a cycle?) and a bit more complicated to decide that the distribution of this tree is uniform.

It’s worth noting that this algorithm works to construct a uniform spanning tree on any connected base graph.

This post is about a few alternative constructions and interpretations of the uniform random tree. The first construction uses a Galton-Watson process. We take a Galton-Watson process where the offspring distribution is Poisson(1), and condition that the total population size is n. The resulting random tree has a root but no labels, however if we assign labels in [n] uniformly at random, the resulting rooted tree has the uniform distribution among rooted trees on [n].

Proof

This is all about moving from ordered trees to non-ordered trees. That is, when setting up a Galton-Watson tree, we distinguish between the following two trees, drawn extremely roughly in Paint:

That is, it matters which of the first-generation vertices have three children. Anyway, for such a (rooted) ordered tree T with n vertices, the probability that the Galton-Watson process ends up equal to T is

$\mathbb{P}(GW = T) = \prod_{v\in T} \frac{e^{-1}}{C(v)!} = e^{-n} \prod_{v\in T}\frac{1}{C(v)!},$

where $C(v)$ is the number of children of a vertex $v\in T$. Then, since $\mathbb{P}( |GW|=n )$ is a function of n, we find

$\mathbb{P}(GW=T \,\big|\, |GW|=n) = f(n)\prod_{v\in T} \frac{1}{C(v)!},$

where f(n) is a function of n alone (ie depends on T only through its size n).

But given an unordered rooted tree t, labelled by [n], there are $\prod_{v \in t} C(v)!$ ordered trees associated to t in the natural way. Furthermore, if we take the Poisson Galton-Watson tree conditioned to have total population size n, and label uniformly at random with [n], we obtain any one of these ordered trees with probability $\frac{f(n)}{n!} \prod_{v\in t} \frac{1}{C(v)!}$. So the probability that we have t after we forget about the ordering is $\frac{f(n)}{n!}$, which is a function of n alone, and so the distribution is uniform among the set of rooted unordered trees labelled by [n], exactly as required.

Heuristic for Poisson offspring distribution

In this proof, the fact that $\mathbb{P}(C(v)=k)\propto \frac{1}{k!}$ exactly balances the number of orderings of the k children explains why Poisson(1) works out. Indeed, you can see in the proof that Poisson(c) works equally well, though when $c\ne 1$, the event we are conditioning on (namely that the total population size is n) has probability decaying exponentially in n, whereas for c=1, the branching process is critical, and the probability decays polynomially.

We can provide independent motivation though, from the Aldous-Broder construction. Both the conditioned Galton-Watson construction and the A-B algorithm supply the tree with a root, so we’ll keep that, and look at the distribution of the degree of the root as constructed by A-B. Let $\rho=v_1,v_2,v_3,\ldots$ be the vertices [n], ordered by their discovery during the construction. Then $\rho$ is definitely connected by an edge to $v_2$, but thereafter it follows by an elementary check that the probability $\rho$ is connected to $v_m$ is $\frac{1}{n-1}$, independently across all m. In other words, the distribution of the degree of $\rho$ in the tree as constructed by A-B is

$1+ \mathrm{Bin}\left(n-2,\frac{1}{n-1}\right) \approx 1+\mathrm{Poisson}(1).$

Now, in the Galton-Watson process, conditioning the tree to have fixed, large size changes the offspring distribution of the root. Conveniently though, in a limiting sense it’s the same change as conditioning the tree to have size at least n. Since these events are monotone in n, it’s possible to take a limit of the conditioning events, and interpret the result as the Galton-Watson tree conditioned to survive. It’s a beautiful result that this interpretation can be formalised as a local limit. The limiting spine decomposition consists of an infinite spine, where the offspring distribution is a size-biased version of the original offspring distribution (and so in particular, always has at least one child) and where non-spine vertices have the original distribution.

In particular, the number of the offspring of the root is size-biased, and it is well-known and not hard to check that size-biasing Poisson(c) gives 1+Poisson(c) ! So in fact we have, in an appropriate limiting sense in both objects, a match between the degree distribution of the root in the uniform tree, and in the conditioned Galton-Watson tree.

This isn’t supposed to justify why a conditioned Galton-Watson tree is relevant a priori (especially the unconditional independence of degrees), but it does explain why Poisson offspring distributions are relevant.

Construction via G(N,p) and the random cluster model

The main reason uniform trees were important to my thesis was their appearance in the Erdos-Renyi random graph G(N,p). The probability that vertices {1, …, n} form a tree component in G(N,p) with some particular structure is

$p^{n-1} (1-p)^{\binom{n}{2}-(n-1)} \times (1-p)^{n(N-m)}.$

Here, the first two terms give the probability that the graph structure on {1, …, n} is correct, and the the final term gives the probability of the (independent) event that these vertices are not connected to anything else in the graph. In particular, this has no dependence on the tree structure chosen on [n] (for example, whether it should be a path or a star – both examples of trees). So the conditional distribution is uniform among all trees.

If we work in some limiting regime, where $pn\rightarrow 0$ (for example if n is fixed and $p=\frac{1}{N}\rightarrow 0$), then we can get away asymptotically with less strong conditioning. Suppose we condition instead just that [n] form a component. Now, there are more ways to form a connected graph with one cycle on [n] than there are trees on [n], but the former all require an extra edge, and so the probability that a given one such tree-with-extra-edge appears as the restriction to [n] in G(N,p) is asymptotically negligible compared to the probability that the restriction to [n] of G(N,p) is a tree. Naturally, the local limit of components in G(N,c/N) is a Poisson(c) Galton-Watson branching process, and so this is all consistent with the original construction.

One slightly unsatisfying aspect to this construction is that we have to embed the tree of size [n] within a much larger graph on [N] to see uniform trees. We can’t choose a scaling p=p(n) such that G(n,p) itself concentrates on trees. To guarantee connectivity with high probability, we need to take $p> \frac{\log n}{n}$, but by this threshold, the graph has (many) cycles with high probability.

At this PIMS summer school in Vancouver, one of the courses is focusing on lattice spin models, including the random cluster model, which we now briefly define. We start with some underlying graph G. From a physical motivation, we might take G to be $\mathbb{Z}^d$ or some finite subset of it, or a d-ary tree, or the complete graph $K_N$. As in classical bond percolation (note G(N,p) is bond percolation on $K_N$), a random subset of the edges of G are included, or declared open. The probability of a given configuration w, with e open edges is proportional to

$p^e (1-p)^{|E(G)| - e} q^{k(w)},$ (*)

where the edge-weight $p\in(0,1)$ as usual, and cluster weight $q\in (0,\infty)$, and $k(w)$ counts the number of connected components in configuration w. When q=1, we recover classical bond percolation (including G(N,p) ), while for q>1, this cluster-reweighting favours having more components, and q<1 favours fewer components. Note that in the case $q\ne 1$, the normalising constant (or partition function) of (*) is generally intractable to calculate explicitly.

As in the Erdos-Renyi graph, consider fixing the underlying graph G, and taking $p\rightarrow 0$, but also taking $\frac{q}{p}\rightarrow 0$. So the resulting graph asymptotically ‘wants to have as few edges as possible, but really wants to have as few components as possible’. In particular, 1) all spanning trees of G are equally likely; 2) any configuration with more than one component has asymptotically negligible probability relative to any tree; 3) any graph with a cycle has #components + #edges greater than that of a tree, and so is asymptotically negligible probability relative to any tree.

In other words, the limit of the distribution is the uniform spanning tree of G, and so this (like Aldous-Broder) is a substantial generalisation, which constructs the uniform random tree in the special case where $G=K_n$.

# Parking on a ring, linear hashing

I’ve spent most of my doctorate trying to analyse how adding destructive dynamics affects the behaviour of a particular random growth process, the classical random graph. In this post I’m going to talk about another random growth process, which is slightly less natural, but for which one can show some similar qualitative properties.

The model, and the additive coalescent

Consider m places arranged in a circle, and for consistency of analogy we think of these as parking spaces. Some number n of cars will arrive one at a time. Each car will arrive at a space chosen uniformly at random. If it is empty they will park in it, otherwise they will look clockwise until they find an empty space, and park there. For now we are only interested in growth, so we assume cars never leave. We are interested in the sizes of blocks of consecutively parked cars.

The reason to consider this slightly unnatural statement is its equivalence to the problem of hashing with linear probing, apparently a key topic in computer science, which I won’t pretend that I know anything about. In any case, it’s a nice model, and it seems reasonable that it would have a basis in more realistic search algorithms.

So, how does the sequence of sizes of blocks of consecutively parked cars grow? Well, given the sequence of block sizes, it is reasonably easy to convince yourself that the order of the blocks around the circle is uniformly random, and the number of empty spaces between adjacent blocks is also uniformly random.

Assume for now that there are at least three blocks. A block of size x can merge with a block of size y with the arrival of the next car only if the blocks are adjacent, with exactly one empty space between them. The chance of this is uniform among all pairs of blocks. Now suppose this is the case, and that the block of size y lies clockwise from the block of size x. Then they will merge precisely if the next car arrives at any of the x occupied spaces in that block, or at the empty space between the pair of blocks. This has probability $\frac{x+1}{m}$. There’s also the opposite ordering to consider, where the block of size x lies clockwise from the other. The total probability of this merge $\{x,y\}\mapsto \{x+y+1\}$ is therefore proportional to (x+y+2).

So the process of block sizes looks a bit like the additive coalescent, at least for large blocks. This is in contrast to the random graph process, where the sequence of component sizes behaves exactly like a multiplicative coalescent, where blocks merge at a rate proportional to the product of their sizes.

Asymptotics

As in the random graph process, it’s interesting to ask roughly how large the largest block will be in such a configuration. Pittel [3] considers the case where the number of empty places $\ell = m-n \approx \beta m$, for some $\beta\in (0,1)$.

A less interesting model would be to choose the positions of the n cars uniformly at random. But then the size of a block is roughly geometric with parameter $\beta$, and there are $\Theta(m)$ blocks with high probability. Relatively straightforward calculations in extreme value theory suggest that the largest block is likely to have size on the order of $\log m$ in this setting.

Of course, the actual model is slightly more complicated, because the size of a block is self-reinforcing, since larger blocks are more likely to grow than smaller blocks. However, we can still get somewhere with naïve estimates. Let’s label the places clockwise. Then in order for there to be a block starting at 0 and stretching beyond $\alpha \log m$, a necessary condition is that at least $\alpha \log m$ cars arrive at those places. The number of cars which arrive at those places is binomial, since there are n cars, and each arrives at a place chosen uniformly, and independently of the other cars. So this event corresponds to

$\mathrm{Bin}(n,\frac{\alpha \log m}{m}) \ge \alpha \log m.$

Then, since $n\approx (1-\beta)n$, this event corresponds approximately to

$\mathrm{Po}((1-\beta)\alpha \log m) \ge \alpha \log m.$

The probability that a Poisson RV is at least a constant multiple larger than its mean decays exponentially with the mean, hence in this case the probability is asymptotically some negative power of m, depending on the value of $\alpha$. But there are $O(m)$ possible places for such a block to start, so whether we can apply a union bound usefully or not depends on whether the power of m is strictly less than -1.

Since all of this depends on $\alpha$, it is reasonable that everything is fine, and the largest block does have size at least $\alpha \log m$ when $\alpha$ is small, and very unlikely when $\alpha$ is large. This heuristic argument fits with Pittel’s theorem. Indeed, his result shows much stronger concentration: that the fluctuations of the size of the largest block are O(1).

Critical regime and empirical processes

The following is a paraphrase of the introduction and some methods from [2].

Obviously, once m=m cars have arrived, there’s no room for manoeuvre and definitely all the places are taken in one giant block. But it’s not obvious in general what scaling for the number of gaps will give rise to giant blocks of $\Theta(m)$ cars.

As for the random graph, we can find a process similar to the exploration process of a (random) graph which encodes much of the information we care about. Let $Y_k$ be the number of cars which arrive at place k. So the sum of the $Y_k$s will be n, the total number of cars. Now consider the process

$C_0=0, \ldots, C_{k+1}=C_k + Y_{k+1}-1.$

A block has the property that the number of arrivals within that set of places is equal to the number of places. So every time this *empirical process* C drops below its previous running minimum, this indicates the end of a block. To make this equivalence precise, we need to be a bit careful about where we start counting. It works exactly if we start at the beginning of a block. If not, it might introduce some unwanted divisions within the first block.

What we have is a process that looks roughly like a random walk that is constrained to pass through the point (m,n-m), which is equal to (m,-l). Even if we aren’t totally precise about how this is like a random walk, we would expect to see Brownian fluctuations after rescaling. Indeed, we might expect to see a Brownian bridge added to a deterministic linear function with negative gradient. But this is only meaningful if the random part is at least as large as the deterministic part, and since the fluctuations have order $\sqrt{m}$, if l is much larger than this, the rescaled empirical process is essentially deterministic, so we won’t see any macroscopic excursions above the minimum.

If l is substantially smaller than $\sqrt{m}$, then there is no real difference between (m,-l) and (m,0), and what we see is just a Brownian bridge. At this point, where we choose to start the process is actually important. If we were to start it at the minimum of the Brownian bridge instead, we would have seen a Brownian excursion, which corresponds to one block occupying (almost) all of the places.

Unsurprisingly, the story is completed by considering $\ell=\Theta(\sqrt{m})$, where the rescaled empirical process looks like a slanted Brownian bridge, that is Brownian motion conditioned to pass through $(1,-\frac{\ell}{\sqrt{m})$. There isn’t an obvious fix to the question of where to start the process, but it turns out that the correct way is now adding a Brownian excursion onto the deterministic linear function with gradient $- \frac{\ell}{\sqrt{m}}$. It’s now reasonable that the excursions above the minimum should macroscopic.

This scaling limit works dynamically as well, where the same Brownian excursion is used for different gradients of the deterministic line, corresponding to $\ell$ moving through the critical window $m-\Theta(\sqrt{m})$. Finally, a direction to Bertoin’s recent paper [1] for the model with an additional destructive property. Analogous to the forest fire, blocks of cars are removed at a rate proportional to their size (as a result, naturally, of ‘Molotov cocktails’…). Similar effects of self-organised criticality are seen when the rate of bombs is scaled appropriately.

References

[1] – Bertoin – Burning cars in a parking lot (paper / slides)

[2] – Chassaing + Louchard – Phase transition for parking blocks, Brownian excursion and coalescence (arXiv)

[3] – Pittel – Linear probing: the probable largest search time grows logarithmically with the number of records

# Poisson Random Measures

[This is a companion to the previous post. They explore different aspects of the same problem which I have been thinking about from a research point of view. So that they can be read independently, there has inevitably been some overlap.]

As I explained in passing previously, Poisson Random Measures have come up in my current research project. Indeed, the context where they have appeared seems like a very good motivation for considering the construction and some properties of PRMs.

We begin not with a Poisson variable, but with a standard Erdos-Renyi random graph $G(n,\frac{c}{n})$. The local limit of a component in this random graph is given by a Galton-Watson branching process with Poisson(c) offspring distribution. Recall that a local limit is description of what the structure looks like near a given (or random) vertex. Since the vertices in G(n,p) are exchangeable, this rooting matters less. Anyway, the number of neighbours in the graph of our root is given by Bin(n-1,c/n). Suppose that the root v_0, has k neighbours. Then if we are just interested in determining the vertices in the component, we can ignore the possibility of further edges between these neighbours. So if we pick one of the neighbours of the root, say v_1, and count the number of neighbours of this vertex that we haven’t already considered, this is distributed as Bin(n-1-k,c/n), since we discount the root and the k neighbours of the root.

Then, as n grows large, Bin(n-1,c/n) converges in distribution to Po(c). Except on a very unlikely event whose probability we can control if we need, so does Bin(n-1-k,c/n). Indeed if we consider a set of K vertices which are already connected in some way, then the distribution of the number of neighbours of one of them which we haven’t already considered is still Po(c) in the limit.

Now we consider what happens if we declare the graph to be inhomogeneous. The simplest possible way to achieve this is to specify two types of vertices, say type A and type B. Then we specify the proportion of vertices of each type, and the probability that there is an edge between two vertices of given types. This is best given by a symmetric matrix. So for example, if we wanted a random bipartite graph, we could achieve this as described by setting all the diagonal entries of the matrix to be zero.

So does the local limit extend to this setting? Yes, unsurprisingly it does. To be concrete, let’s say that the proportion of types A and B are a and b respectively, and the probabilities of having edges between vertices of various types is given by $P=(p_{ij}/n)_{i,j\in\{A,B\}}$. So we can proceed exactly as before, only now we have to count how many type A neighbours and how many type B neighbours we see at all stages. We have to specify the type of our starting vertex. Suppose for now that it is type A. Then the number of type A neighbours is distributed as

$\text{Bin}(an,p_{AA}/n)\stackrel{d}{\rightarrow}\text{Po}(ap_{AA})$,

and similarly the limiting number of type B neighbours is $\sim \text{Po}(bp_{AB})$. Crucially, this is independent of the number of type A neighbours. The argument extends naturally to later generations, and the result is exactly a multitype Galton-Watson process as defined in the previous post.

My motivating model is the forest fire. Here, components get burned when they are large and reduced to singletons. It is therefore natural to talk about the ‘age’ of a vertex, that is, how long has elapsed since it was last burned. If we are interested in the forest fire process at some fixed time T>1, that is, once burning has started, then we can describe it as an inhomogeneous random graph, given that we know the ages of the vertices.

For, given two vertices with ages s and t, where WLOG s<t, we know that the older vertex could not have been joined to the other vertex between times T-t and T-s. Why? Well, if it had, then it too would have been burned at time T-s when the other vertex was burned. So the only possibility is that they might have been joined by an edge between times T-s and T. Since each edge arrives at rate 1/n, the probability that this happens is $1-e^{-s/n}\approx \frac{s}{n}$. Indeed, in general the probability that two vertices of ages s and t are joined at time T is $\frac{s\wedge t}{n}$.

Again at fixed time T>1, the sequence of ages of the vertices converges weakly to some fixed distribution (which depends on T) as the number of vertices grows to infinity. We can then recover the graph structure by assigning ages according to this distribution, then growing the inhomogeneous random graph with the kernel as described. The question is: when we look for a local limit, how to do we describe the offspring distribution?

Note that in the limit, components will be burned continuously, so the distribution of possible ages is continuous (with an atom at T for those vertices which have never been burned). So if we try to calculate the distribution of the number of neighbours of age s, we are going to be doomed, because with probability 1 then is no vertex of age s anywhere!

The answer is that the offspring distribution is given by a Poisson Random Measure. You can think of this as a Poisson Point Process where the intensity is non-constant. For example, let us consider how many neighbours we expect to have with ages [s,s+ds]. Let us suppose the age of our root is t>s+ds for now. Assuming the distribution of ages, $f(\cdot)$ is positive and continuous, the number of vertices with these ages in the system is roughly nf(s)ds, and so the number of neighbours with this property is roughly $\text{Bin}(nf(s)ds,\frac{s}{n})$. In particular, this does have a Poisson limit. We need to be careful about whether this Poisson limit is preserved by the approximation. In fact this is fine. Let’s assume WLOG that f is increasing at s. Then the number of age [s,s+ds] neighbours can be stochastically bounded between $\text{Bin}(nf(s)ds,\frac{s}{n})$ and $\text{Bin}(nf(s+ds)ds,\frac{s+ds}{n}$. As n grows, these converge in the distribution to two Poisson random variables, and then we can let ds go to zero. Note for full formalism, we may need to account for the large deviations event that the number of age s vertices in the system is noticeably different from its expectation. Whether this is necessary depends on whether the ages are assigning deterministically, or drawn IID-ly from f.

One important result to be drawn from this example is that the number of offspring from disjoint type sets, say $[s_1,s_2], [t_1,t_2]$ are independent, for the same reason as in the two-type setting, namely that the underlying binomial variables are independent. We are, after all, testing different sets of vertices! The other is that the number of neighbours with ages in some range is Poisson. Notice that these two results are consistent. The number of neighbours with ages in the set $[s_1,s_2]\cup [t_1,t_2]$ is given by the sum of two independent Poisson RVs, and hence is Poisson itself. The parameter of the sum RV is given by the sum of the original parameters.

These are essentially all the ingredients required for the definition of a Poisson Random Measure. Note that the set of offspring is a measure of the space of ages, or types. (Obviously, this isn’t a probability measure.) We take a general space E, with sigma algebra $\mathcal{E}$, and an underlying measure $\mu$ on E. We want a distribution $\nu$ for measures on E, such that for each Borel set $A\in\mathcal{E}$, $\nu(A)$, which is random because $\nu$ is, is distributed as $\text{Po}(\mu(A))$, and furthermore, for disjoint $A,B\in\mathcal{E}$, the random variables $\nu(A),\nu(B)$ are independent.

If $M=\mu(E)<\infty$, then constructing such a random measure is not too hard using a thinning property. We know that $\nu(E)\stackrel{d}{=}\text{Po}(M)$, and so if we sample a Poisson(M) number of RVs with distribution given by $\frac{\mu(\cdot)}{M}$, we get precisely the desired PRM. Proving this is the unique distribution with this property is best done using properties of the Laplace transform, which uniquely defines the law of a random measure in the same manner that the moment generating function defines the law of a random variable. Here the argument is a function, rather than a single variable for the MGF, reflecting the fact that the space of measures is a lot ‘bigger’ than the reals, where a random variable is supported. We can extend this construction for sigma-finite spaces, that is some countable union of finite spaces.

One nice result about Poisson random measures concerns the expectation of functions evaluated at such a random measure. Recall that some function f evaluated at the measure $\sum \delta_{x_i}$ is given by $\sum f(x_i)$. Then, subject to mild conditions on f, the expectation

$\mathbb{E}\nu (f)=\mu(f).$

Note that when $f=1_A$, this is precisely one of the definitions of the PRM. So by a monotone class result, it is not surprising that this holds more generally. Anyway, I’m currently trying to use results like these to get some control over what the structure of this branching processes look like, even when the type space is continuous as in the random graph with specified ages.

# Dispersion in Social Networks

This post is based on a paper that appeared a couple of weeks ago on the Computer Science section of arXiv. You can find it here. I’m going to write a few things about the ideas behind the paper, and avoid pretty much entirely the convincing data the authors present, as well as questions of implementing the algorithms discussed.

The setting is a social network, which we can describe as a graph. Nodes stand for people, and an edge represents that the two associated people have some social connection. This paper focuses on edges corresponding to friendship in the Facebook graph.

A key empirical feature of the graph topology of such social networks as compared to most mathematical models of random graphs is the prevalence of short cycles, and so-called clustering. Loosely speaking, in an Erdos-Renyi random graph, any potential edges appear in the graph independently of the rest of the configuration. But this does not accord well with our experience of our own Facebook friend circle. In general, knowledge that A is friends with both B and C increases the likelihood that B and C are themselves friends. This effect appears to be more present in other models, such as Preferential Attachment and the Configuration Model, but that is really more a consequence of the degree sequence being less concentrated.

The reason for this phenomenon appearing in social networks is clear. People meet other people generally by sharing common activities, whether that be choice of school, job or hobbies. The question of how readily people choose to add others on Facebook is a worthwhile one, but not something I have the time or the sociological credibility to consider! In any case, it is not a controversial idea that for some typical activity, it is entirely possible that almost all the participants will end up as friends, leading to a large (almost-) ‘clique’ in the graph. Recall a clique is a copy of a complete graph embedded in a larger graph – that is, a set of nodes all of which are pairwise connected.

We could think of much of the structure of this sort of network as being generated in the following way. Suppose we were able to perform the very unlikely-sounding task of listing every conceivable activity or shared attribute that might engender a friendship. Each activity corresponds to a set of people. We then construct a graph on the set of people by declaring that a pair of nodes are connected by an edge precisely if the people corresponding to these nodes can both be found in some activity set.

Another way of thinking about this setup is to consider a bipartite graph, with people as one class of vertices, and activities as the other. Predictably, we join a person to an activity if they engage in that activity. The edges within the class of people are then induced by the bipartite edges. Obviously, under this interpretation, we could equally well construct a graph on the set of activities. Here, two activities would be joined if there is a person who does them both. Graphs formed in this way can be called Intersection Graphs, and there is lots of interest in investigating various models of Random Intersection Graphs.

The question addressed by the authors of the paper can be summarised as follows. A social network graph tells us whether two people are ‘friends’, but it does not directly tell us how close their relationship is. It is certainly an interesting question to ask whether the (local) network topology can give us a more quantitative measure of the strength of a friendship.

As the authors explain, a first approach might be to consider how many mutual friends two people have. (We consider only pairs of people who are actually friends. It seems reasonable to take this as a pre-requisite for a strong relationship among people who do actually use Facebook.) However, this can fail because of the way these social networks organise themselves around shared attributes and activities. The size of one of these cliques (which are termed social foci in parts of the literature) is not especially likely to be well correlated to the strengths of the friendships within the clique. In particular, the clique corresponding to someone’s workplace is likely to grow in size over time, especially when people grow towards an age where, on average, they move job much less. So it seems likely that, according to a naive examination of the number of mutual friends, we would predict that a person’s strongest friend is likely to be someone they work with, who perhaps by chance also does some other activity with that person.

The authors phrase this problem slightly differently. They examine algorithms for establishing a person’s spouse or long-term partner with good accuracy from only the local network structure.

Heuristically we might expect that a husband knows many of his wife’s work colleagues, and vice versa. Not all of these ties might be so strong that they actually lead to friendship, in the Boolean sense of Facebook, but we might expect that some noticeable proportion have this property. Naturally, there will be cliques to which both belong. One or more of these might be the reason they met in the first place, and others (eg parents at children’s schools) might have developed over the course of their relationship. However, as we’ve explained, this doesn’t narrow things down much.

(We need not be constrained by this heteronormative scenario. However, as the authors point out in a footnote, there are challenges in collecting data because of the large number of ironic relationship listings on Facebook, mainly among the undergraduate and younger community. This problem is particularly obstructive in the case of same-sex marriage, owing to the smaller numbers of genuine pairings, and larger numbers of false listings for this setting.)

The crucial observation is that if we look at the couple’s mutual friends, we expect to see large parts of the most important cliques from both husband and wife’s lives. Among these mutual friends, there will be some overlap, that is cliques of which both are an integral member. But among the rest, there will be a natural partition into friends who really originate from the husband, and friends who were introduced via the wife. So the induced graph on these mutual friends is likely to split into three classes of vertices, with very poor connectivity between two of them.

This is, up to sorting out scaling and so on, precisely the definition of dispersion, introduced by the authors. The dispersion between two vertices is high if the induced graph on their mutual neighbourhood has poor connectivity. Modulo exact choice of definition, they then exhibit data showing that this is indeed a good metric for determining marriages from the network topology, with success rate of around 50% over a wide range of users.

# Critical Components in Erdos-Renyi

In various previous posts, I’ve talked about the phase transition in the Erdos-Renyi random graph process. Recall the definition of the process. Here we will use the Gilbert model G(n,p), where we have n vertices, and between any pair of vertices we add an edge, independently of other pairs with probability p. We are interested in the sparse scaling, where the typical vertex has degree O(1) in n, and so p=c/n for constant c>0, and we assume throughout that n is large. We could alternatively have considered the alternative Erdos-Renyi model where we choose uniformly at random from the set of graphs with n vertices and some fixed number of edges. Almost all the results present work equally well in this setting.

As proved by Erdos and Renyi, the typical component structure of such a graph changes noticeably around the threshold c=1. Below this, in the subcritical regime, all the components are small, meaning of size at most order O(log n). Above this, in the supercritical regime, there is a single giant component on some non-zero proportion of the vertices. The rest of the graph looks subcritical. The case c=1 exhibits a phase transition between these qualitatively different behaviours. They proved that here, the largest component is with high probability O(n^2/3). It seems that they thought this result held whenever c=1-o(1), but it turns out that this is not the case. In this post, I will discuss some aspects of behaviour around criticality, and the tools needed to treat them.

The first question to address is this: how many components of size n^{2/3} are there? It might be plausible that there is a single such component, like for the subsequent giant component. It might also be plausible that there are n^1/3 such components, so O(n) vertices are on such critical components. As then it is clear how we transition out of criticality into supercriticality – all the vertices on critical components coalesce to form the new giant component.

In fact neither of these are correct. The answer is that for all integers k>0, with high probability the k-th largest component is on a size scale of n^2/3. This is potentially a confusing statement. It looks like there are infinitely many such components, but of course for any particular value of n, this cannot be the case. We should think of there being w(1) components, but o(n^b) for any b>0.

The easiest way to see this is by a duality argument, as we have discussed previously for the supercritical phase. If we remove a component of size O(n^2/3), then what remains is a random graph with n-O(n^2/3) vertices, and edge probability the same as originally. It might make sense to rewrite this probability 1/n as

$\frac{1}{n-O(n^{2/3})}\cdot \frac{n-O(n^{2/3})}{n}=\frac{1-O(n^{-1/3})}{n-O(n^{2/3})}.$

The approximation in the final numerator is basically the same as

$1-o\left(n-O(n^{2/3})\right).$

Although we have no concrete reasoning, it seems at least plausible that this should look similar in structure to G(n,1/n). In particular, there should be another component of size

$O\left([n-O(n^{2/3})]^{2/3}\right)=O(n^{2/3})$.

In fact, the formal proof of this proceeds by an identical argument, only using the exploration process. Because I’ve described this several times before, I’ll be brief. We track how far we have gone through each component in a depth-first walk. In both the supercritical and subcritical cases, when we scale correctly we get a random path which is basically deterministic in the limit (in n). For exactly the same reasons as visible CLT fluctuations for partial sums of RVs with expectation zero, we start seeing interesting effects at criticality.

The important question is the order of rescaling to choose. At each stage of the exploration process, the number of vertices added to the stack is binomial. We want to distinguish between components of size $O(n^{2/3})$ so we should look at the exploration process at time $sn^{2/3}$. The drift of the exploration process is given by the expectation of a binomial random variable minus one (since we remove the current vertex from the stack as we finish exploring it). This is given by

$\mathbb{E}=\left[n-sn^{2/3}\right]\cdot \frac{1}{n}-1=-sn^{-1/3}.$

Note that this is the drift in one time-step. The drift in $n^{2/3}$ time-steps will accordingly by $sn^{1/3}$. So, if we rescale time by $n^{2/3}$ and space by $n^{1/3}$, we should get a nice stochastic process. Specifically, if Z is the exploration process, then we obtain:

$\frac{1}{n^{1/3}}Z^{(n)}_{sn^{2/3}} \rightarrow_d W_s,$

where W is a Brownian motion with inhomogeneous drift -s at time s. The net effect of such a drift at a fixed positive time is given by integrating up to that time, and hence we might say the process has quadratic drift, or is parabolic.

We should remark that our binomial expectation is not entirely correct. We have discounted those $sn^{2/3}$ vertices that have already been explored, but we have not accounted for the vertices currently in the stack. We should also be avoiding considering these. However, we now have a heuristic for the approximate number of these. The number of vertices in the stack should be $O(n^{1/3})$ at all times, and so in particular will always be an order of magnitude smaller than the number of vertices already considered. Therefore, they won’t affect this drift term, though this must be accounted for in any formal proof of convergence. On the subject of which, the mode of convergence is, unsurprisingly, weak convergence uniformly on compact sets. That is, for any fixed S, the convergence holds weakly on the random functions up to time $sn^{2/3}$.

Note that this process will tend to minus infinity almost surely. Component sizes are given by excursions above the running minimum. The process given by the height of the original process above the running minimum is called reflected. Essentially, we construct the reflected process by having the same generator when the current value is positive, and forcing the process up when it is at zero. There are various ways to construct this more formally, including as the scaling limit of some simple random walks conditioned never to stay non-negative.

The cute part of the result is that it holds equally well in a so-called critical window either side of the critical probability 1/n. When the probability is $\frac{1+tn^{-1/3}}{n}$, for any $t\in \mathbb{R}$, the same argument holds. Now the drift at time s is t-s, though everything else still holds.

This result was established by Aldous in [1], and gives a mechanism for calculating distributions of component sizes and so on through this critical window.

In particular, we are now in a position to answer the original question regarding how many such components there were. The key idea is that because whenever we exhaust a component in the exploration process, we choose a new vertex uniformly at random, we are effectively choosing a component according to the size-biased distribution. Roughly speaking, the largest components will show up near the beginning. Note that a critical $O(n^{2/3})$ component will not necessarily be exactly the first component in the exploration process, but the components that are explored before this will take up sufficiently few vertices that they won’t show up in the scaling of the limit.

In any case, the reflected Brownian motion ‘goes on forever’, and the drift is eventually very negative, so there cannot be infinitely wide excursions, hence there are infinitely many such critical components.

If we care about the number of cycles, we can treat this also via the exploration process. Note that in any depth-first search we are necessarily only interested in a spanning tree of the host graph. Anyway, when we are exploring a vertex, there could be extra edges to other vertices in the stack, but not to vertices we’ve already finished exploring (otherwise the edge would have been exposed then). So the expected number of excess edges into a vertex is proportional to the height of the exploration process at that vertex. So the overall expected number of excess edges, conditional on the exploration process is the area under the curve. This carries over perfectly well into the stochastic process limit. It is then a calculation to verify that the area under the curve is almost surely infinite, and thus that we expect there to be infinitely many cycles in a critical random graph.

REFERENCES

[1] Aldous D. – Brownian excursions, critical random graphs and the multiplicative coalescent

# Recent Research Activity

I’ve spent this week in Luminy, near Marseille, attending a summer school run by ALEA, the organisation of French probabilists. We’ve been staying in CIRM, a dedicated maths research conference centre at the edges of the calanques, the area of mountains and jagged coastal inlets between Marseille and Cassis. The walking possibilities have been excellent, as have the courses and lectures, on a range of topics in probability theory.

Anyway, the time here has been an excellent moment to reflect on my research progress, and try to come up with the sort of fresh ideas that are perhaps slightly inhibited by sitting at a desk with an endless supply of paper on which to try calculations. When I get back, I have to submit a first-year report, so at least for a little while I will have to suppress the desire to make further progress and instead diligently assemble the progress I have made.

The Model

I’ve defined some of these processes in past posts, but I see no harm in doing so again. We take the standard Erdos-Renyi random graph process, where edges are added one-at-a-time uniformly at random between n vertices, and amend it by adding a deletion mechanism. The aim is to arrive at a process which looks in equilibrium more like the critical random graph than either the subcritical or supercritical regimes, where the components are very small, and dominated by one giant component respectively. Rath, Toth and others have studied the process where each vertex is hit by lightning at uniform rate. When this happens, we delete all the edges in the component containing that vertex. Naturally, big components will be hit by lightning more often than small components, and so this acts as a mechanism to prevent the formation of giant components, if scaled correctly.

We take a different approach. We observe that criticality in the original random graph process is denoted by the first appearance of a giant component, but also by the first appearance of a) lots of cycles, and b) large cycles. In particular, it is very unlikely that a giant component could form without containing any cycles. We will therefore use the appearance of a cycle to trigger some form of deletion mechanism.

Our final goal is to treat the so-called ‘Cycle Deletion’ model. Here, whenever a cycle appears, we delete all the edges in that cycle immediately. There are several challenges in treating this model, because the rate at which cycles emerge in a tree is a function of the tree structure. The trees in this model will not be Uniform Spanning Trees (though it is very possible that they will be ‘almost USTs’ in some sense – we need to investigate this further) so it will be hard to make nice statements about the rates. For the standard random graph process, if we are only interested in the sizes of the components, we are actually allowed to ignore the graph structure entirely. The component sizes evolve as a discrete, stochastic version of the multiplicative coalescent (sometimes called a Marcus-Lushnikov process). We would like a deletion mechanism that has a nice interpretation as a fragmentation operation in the same sense. The rate at which a component fragments will be quadratic in the size of the component, since there are $O(k^2)$ possible edges between k vertices forming a component, and adding any of precisely these will create a cycle.

I’ve talked previously about how to overcome the problems with the tree structure in Cycle Deletion with the so-called Uniform Cycle Deleting model. In any case, as a starting point we might consider the Cycle-Induced Forest Fire model. Here, whenever a cycle appears, we delete all the edges, including the new one, in the whole component which contains the cycle.

We suspect this model may resemble the critical random graph at all times. The main characteristic of G(n,1/n) is that the largest component is of size O(n^2/3), and indeed there are arbitrarily many components of this size, with high probability in the limit. Since CIFF is recurrent for any fixed n, meaning that it will visit any state infinitely often (rather than tending to infinity or similar), we should ask what the largest component is typically in the equilibrium distribution. Our aim is to prove that it is O(n^2/3). We might suspect that the typical size of the largest component will be greater in the Cycle Deletion model, since each fragmentation event is less severe there, removing fewer edges.

An Upper Bound

The nice thing about Markov chains is that they have an ergodic property, which means that if you run them for long enough, the proportion of time spent in any state is given by the stationary probability of being in that state. It doesn’t matter whether or not you start in equilibrium, since it will converge anyway. Thus it is meaningful to talk about properties like the average number of isolated vertices as a time-average as well as an average with respect to some distribution.

This quantity is the key to an upper bound. We can equally talk about the average change in the number of isolated vertices in a time-step. This will increase when a component fragments, and will decrease when an isolated vertex coalesces with another component. In particular, the largest possible decrease in the number of isolated vertices in a single time-step is 2, corresponding to an edge appearing between two isolated vertices.

Suppose that with probability $\Theta(1)$ there is a component of size $n^\alpha$ for some $\alpha>2/3$. Then such a component makes a contribution to the expected change in the number of isolated vertices of

$\Theta(1) n^\alpha \left(\frac{n^\alpha}{n}\right)^2.$ (*)

Where does this come from? Well, we are tracking the contributions from the event that the largest component is of this size and that it fragments, giving $n^\alpha$ new isolated vertices. So the $\Theta(1)$ accounts for the probability that there is such a component to begin with. Then, conditional on that, the probability that it gets fragmented in the next time-step is the probability that both ends of the next edge added lie in that component. Since the edge is chosen uniformly at random, the probability of this is $n^\alpha/n$. Note that this is under a slightly odd definition of an edge, that allows loops. Basically, I don’t want to have lots of correction terms involving $\binom{n}{2}$ floating around. However, it would make no difference to the orders of magnitude if we to do it with these.

So, this is only one contribution to the typical rate of gain of isolated vertices. Now note that if $\alpha>2/3$, then this expression is >> 1. This is bad since the negative contributions to this expected flux in the number of isolated vertices is O(1). So this suggests that over time, the number of isolated vertices will keep growing. This is obviously ridiculous since a) we are in equilibrium, so the expected flux should be 0 and b) the number of isolated vertices cannot exceed n, for clear reasons.

This gives us an upper bound of n^2/3 as the typical scale of the largest component. We can come up with a similar argument for the cycle deleting model. The most helpful thing to track there is the number of edges in the graph. Note that since the graph is at all times a forest on n vertices, the number of edges is equal to n minus the number of (tree) components. We use the fact that the typical fragmentation of a component of size k creates $O(\sqrt{k})$ new components. It is possible to argue via isolated vertices here too, but the estimates are harder, or at least less present in the literature.

Lower Bounds?

The problem with lower bounds is that it is entirely possible that the flux in the number of isolated vertices is not driven by typical behaviour. Suppose for example we had a different rule. We begin a random graph process, and the first time we see a cycle in a component with size larger than n^2/3, we delete all the edges in the whole graph. Then we will see a sequence of random graph processes starting with the empty graph and stopped at some point close to criticality (in fact, with high probability in the *critical window*), and these will all be glued together. So then, most of the time the process will look subcritical, but the gains in isolated vertices will occur only during the critical periods, which are only an asymptotically small proportion of the time.

At the moment, my approach to the lower bound is instead to prove that the upper bound is tight. I mean this in the following sense. Suppose we wanted to be sure that (*) was in fact equal to the average rate of gain of isolated vertices. We would have to check the following:

• That the total contributions from all other components were similar or smaller than from the component(s) of size roughly $n^{\alpha}$.
• That there were only a few components of size $n^{\alpha}$. In particular, the estimate would be wrong if there were $n^\epsilon$ such components for any $\epsilon>0$.
• That it cannot be the case that for example, some small proportion of the time there is a component of size roughly $n^{\alpha+\epsilon}$, and over a large enough time these make a greater contribution to the average gain in isolated vertices.

A nice way to re-interpret this is to consider some special vertex and track the size of its component in time. It will be involved in repeated fragmentations over the course of time, so it is meaningful to talk about the distribution of the size of the component containing the vertex when it is fragmented. Our aim is to show that this distribution is concentrated on the scaling $O(n^\alpha)$.

So this has turned out to be fairly hard. Rather than try to explain some of the ideas I’ve employed in attempting to overcome this, I will finish by giving one reason why it is hard.

We have seen that the component sizes in random graphs evolve as the multiplicative coalescent, but at a fixed moment in time, we can derive good estimates from an analogy with branching processes. We might like to do that here. If we know what the system looks like most of the time, we might try to ‘grow’ a multiplicative coalescent, viewing it like a branching process, with distribution given by the typical distribution. The problem is that when I do this, I find that the expectation of the offspring distribution is $\Theta(1)$. This looks fine, since 1 is the threshold for extinction with probability 1. However, throughout the analysis, I have only been paying attention to the exponent of n in all the time and size estimates. For example, I view $n^\alpha$ and $n^\alpha \log n$ as the same. This is a problem, as when I say the expectation is $\Theta(1)$, I am really saying it is $\sim n^0$. This means it could be $\frac{1}{\log n}$ or $\log n$. Of course, there is a massive difference between these, since a branching process grows expectationally!

So, this approach appears doomed in its current form. I have some other ideas, but a bit more background may be required before going into those. I’m going to be rather busy with teaching on my return to the office, so unfortunately it is possible that there may be many posts about second year probability and third year applied probability before anything more about CIFF.

# The Configuration Model

In the past, I’ve talked about limitations of the Erdos-Renyi model of homogeneous random graphs for applications in real-world networks. In a previous post, I’ve discussed a dynamic model, the Preferential Attachment mechanism, that ‘grows’ a graph dynamically by adding edges from new vertices preferentially to existing vertices with high degree. The purpose of this adjustment is to ensure that the distribution of the degrees is not concentrated around some fixed value (which would be c in G(n,c/n) ) but rather exhibits a power-law tail such as observed in many genuine examples.

In this post, we introduce some aspects of the configuration model, which achieves this property more directly. This idea probably first arose in the guise of regular graphs. Recall a regular graph has all degrees equal. How would we construct a random d-regular graph on a large number of vertices?

What we probably want to do is to choose uniformly at random from the set of such graphs, but it is not clear even how large this set is, let alone how one would order its elements to make it possible to make this uniform choice. Instead, we try the following. Assign to each vertex d so-called stubs, which will end up being ‘half-edges’. We then choose two stubs uniformly at random, and glue them together. More formally, we construct an edge between the host vertices, and then delete the chosen stubs. We then continue.

The construction makes no reference to the distribution of stubs, so we are free to choose this as we please. We could for example specify some sequence of degrees which approximates a power-law, so we could sample a random sequence of degrees in some way. So long as we have a sequence of stub set sizes before we start building the edges of the graph we will be able to use the above algorithm.

So what might go wrong? There seem to me to be three potential problems that might arise with this construction.

Firstly, there might be a stub left over, if the sum of the stub set sizes is odd. Recall that in a graph the sum of the degrees is twice the sum of the number of edges, and so in particular the sum of the degrees should be even. But this is a small problem. When the degree sequence is deterministic we can demand that it have even sum, and if it is random, we will typically be working in a large N regime, and so deleting the solitary stub, if such a thing exists, will not affect the sort of properties of the graph we are likely to be interested in.

The second and third objections are perhaps more serious. If we glue together stubs naively, we might end up with loops, that is, edges that ‘begin’ and ‘end’ at the same vertex. These are not allowed in the standard definition of a graph. Alternatively, we might end up with more than one edge between the same pair of vertices.

Our overall aim is that this mechanism gives a convenient way of simulating the uniform distribution on simple graphs with a given degree sequence. At present we have the uniform distribution on potential multigraphs, with a weighting of 1/k! for every multi-edge with multiplicity k, and a weighting of 1/2 for every loop. The latter can be seen because there is an initial probability proportional to $d(v_i)d(v_j)$ that vertices v_i and v_j will be joined, whereas a probability proportional (with the same constant) to $d(v_i)^2$ that v_i will receive a loop. The multi-edge weighting justification is similar.

However, conditional on getting a simple graph, the distribution is uniform on the set of simple graphs with that degree sequence. So it remains to investigate the probability that a graph generated in this way is simple. So long as this probability does not tend to 0 as n grows, we will probably be happy.

The strongest results on this topic are due to Janson. First observe that if the sum of the degrees grows faster than the number of vertices n, we fail to get a graph without loops with high probability. Heuristically, note that on the first pass, we are taking two picks from the set of vertices, biased by the number of stubs. By Cauchy-Schwarz, Rearrangement Inequality or just intuition, the probability of getting the same vertex is greater than if we picked uniformly from the set of vertices without biasing. So the probability of getting no loop on the first pass is $\le (1-\frac{1}{n})$. Take some function a(n) that grows faster than n, but slower than the sum of the degrees. Then after a(n) passes, the degree distribution is still roughly the same. In particular, the sum of the degrees is still an order of magnitude greater than n. So we obtain:

$\mathbb{P}(\text{no loops})\leq (1-\frac{1}{n})^{a(n)}\approx e^{-\frac{a(n)}{n}}\rightarrow 0.$

So, since isolated vertices have no effect on the simplicity or otherwise, we assume the sum of the degrees is $\Theta(n)$. Then, Janson shows that the further condition

$\sum_{i=1}^n d_i^2=O(n),$

is essentially necessary and sufficient for simplicity. We can see why this might be true by looking at the probability that the first edge added is a loop, which is roughly

$\frac{d_1^2+d_2^2+\ldots+d_n^2}{2(\sum d_i)^2}.$

We have to consider $O(\sum d_i)$ edges, so if the above expression is much larger than this, we can perform a similar exponential estimate to show that the probability there are no loops is o(1). The technical part is showing that this probability doesn’t change dramatically as the first few stubs disappear.

Note that in both cases, considering only loops is sufficient for simplicity. Although it looks like loop appearance is weaker than multiplicity of edges, in fact they have the same threshold. It should also be pointed out that, like the uniform random forests, an alternative approach is simply to count the number of simple graphs and multigraphs with a given degree sequence. Good asymptotics can then be found for the probability of simplicity.

In the case of G(n,c/n), we were particularly interested in the emergence of the giant component at time c=1. While first-moment methods can be very effective in demonstrating such results, a branching process local limit representation is probably easiest heuristic for this phase transition.

So long as the degree sequences converge in a natural way, we can apply a similar approach to this configuration model. Concretely, we assume that the proportion of vertices with degree i is $\lambda_i$ in the limit. Although the algebra might push through, we should be aware that this means we are not explicitly specifying how many vertices have degree, eg $\Theta(n^{1/2})$. For now assume the $\lambda_i$s sum to 1, so specify a probability distribution for degree induced by choosing a vertex uniformly at random.

So we start at a vertex, and look at its neighbours. The expected number of neighbours of this root vertex is $\sum i\lambda i$. Thereafter, when we consider a child vertex, based on how the stubs are paired up (and in particular the fact that the order of the operations does not matter – the choice of partner of a given stub is chosen uniformly at random), we are really choosing a stub uniformly at random. This corresponds to choosing a vertex at random, biased by the number of stubs available. The quantity of interest is how many additional stubs (other than the one that led to the vertex) are attached to this vertex. We assume we don’t need to worry too much about repeating vertices, in a similar way to G(n,c/n). So the expected number of additional stubs is

$\frac{1}{\sum i\lambda_i}\sum i\lambda_i(i-1).$

For an infinite component, we required the expectation to be > 1, which is equivalent to

$\sum \lambda_i i(i-2)>0.$

This was proven by Molloy and Reed (95), then with fewer conditions by Janson (07). The latter also shows how to use this construction to derive the giant component for G(n,c/n) result.

REFERENCES

Janson – A New Approach to the Giant Component Problem

Molloy, Reed – A Critical Point for Random Graphs with a Given Degree Sequence

Janson – The Probability that  Random Multigraph is Simple