I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.
As we enter the final stages of the semester, I want to discuss some extensions to the standard Erdos-Renyi random graph which has been the focus of most of the course so far. Although we will not get far into the details during this course, the overall goal is to develop models which are close to Erdos-Renyi in terms of ease of analysis, while also allowing more of the features characteristic of networks observed in the real world.
One of the more obvious deficiencies of the sparse regime of Erdos-Renyi random graphs for modelling ‘real-world phenomena’ concerns the degree sequence. Indeed, the empirical degree distribution of G(n,c/n) converges to Poisson(c). By contrast, in real-world networks, a much wider range of degrees is typically observed, and in many cases it is felt that these should follow a power law, with a small number of very highly connected agents.
One way around this problem is to construct random graphs where we insist that the graph has a given sequence of degrees. The configuration model, which is the subject of this lecture and this post (and about which I’ve written before), offers one way to achieve this.
Definition and notes
Let $n\in\mathbb{N}$, and let $d=(d_1,\ldots,d_n)$ be a sequence of non-negative integers such that $\sum_{i=1}^n d_i$ is even. Then the configuration model with degree sequence d, denoted $\mathrm{CM}_n(d)$, is a random multigraph with vertex set [n], constructed as follows:
- To each vertex $i\in[n]$, assign $d_i$ half-edges;
- Then, take a uniform matching of these half-edges;
- Finally, for each pair of half-edges in the matching, replace the two half-edges with a genuine edge, to obtain the multigraph $\mathrm{CM}_n(d)$, in which, by construction, vertex i has degree $d_i$. (A short simulation sketch of this construction follows below.)
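To make the construction concrete, here is a minimal Python sketch of the three steps above; the function name `configuration_model` and the use of `random.shuffle` to produce a uniform matching are my own choices, not anything prescribed by the lecture.

```python
import random
from collections import Counter

def configuration_model(degrees):
    """Sample the configuration model for a given degree sequence.

    Returns the multigraph as a Counter of edges, where each edge is stored
    as a sorted vertex pair, so that (v, v) records a self-loop.
    """
    assert sum(degrees) % 2 == 0, "total degree must be even"

    # Step 1: assign d_i half-edges to vertex i.
    half_edges = [i for i, deg in enumerate(degrees) for _ in range(deg)]

    # Step 2: a uniform matching of the half-edges -- shuffle, then pair
    # off consecutive entries.
    random.shuffle(half_edges)

    # Step 3: replace each matched pair of half-edges by a genuine edge.
    edges = Counter()
    for a, b in zip(half_edges[::2], half_edges[1::2]):
        edges[tuple(sorted((a, b)))] += 1
    return edges

# Example: the n=2, d=(3,3) case discussed below.
print(configuration_model([3, 3]))
```

Shuffling and then pairing off consecutive entries does give a uniform matching, since a uniformly random permutation, read off in consecutive pairs, induces each perfect matching of the half-edges with equal probability.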
One should note immediately that although the matching is uniform, the resulting multigraph is not uniform amongst multigraphs with that degree sequence. Note also that the parity condition on the sum of the degrees is necessary for any multigraph with degree sequence d to exist, and in this construction it means that the number of half-edges is even, without which it would not be possible to construct a matching.
This non-uniformity is already manifest in the simplest possible example, when n=2 and d=(3,3). There are two possible multigraphs, up to isomorphism, which are shown below:
For obvious reasons, we might refer to these as the handcuffs (an edge between the two vertices, plus a self-loop at each) and the theta (three parallel edges), respectively. It’s helpful if we, temporarily, assume the half-edges are distinguishable at the moment we join them up in the configuration model construction: then there are 3×3=9 ways to join them up to form the handcuffs (think about which half-edge at each vertex ends up forming the edge between the two vertices), while there are 3!=6 ways to pair up the half-edges to form the theta, out of the $5!!=15$ matchings in total.
In general, for multigraphs H with the correct degree sequence, we have
$$\mathbb{P}\big(\mathrm{CM}_n(d)=H\big) \;\propto\; \frac{1}{2^{L(H)}\,\prod_{e\in E(H)} m_e!},$$
where L(H) is the number of self-loops of H, and $m_e$ is the multiplicity with which a given edge e appears in H.
Note: it might seem counterintuitive that this procedure is biased against multiple edges and self-loops, but it is really just saying that there are more ways to form two distinct edges than to form two equal edges (ie a multiedge pair) when we view the half-edges as distinguishable. (See this post for further discussion of this aspect in the 3-regular setting.)
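As a sanity check on this formula, and on the handcuffs/theta counting above, one can enumerate all 15 matchings of the six labelled half-edges directly. The rough sketch below does exactly that; the helper names are mine.

```python
def matchings(points):
    """Enumerate all perfect matchings of a list of labelled points."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        for m in matchings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + m

def owner(h):
    # Half-edges 0,1,2 belong to vertex 0; half-edges 3,4,5 belong to vertex 1.
    return 0 if h < 3 else 1

counts = {"theta": 0, "handcuffs": 0}
for m in matchings(list(range(6))):
    if all(owner(a) != owner(b) for a, b in m):
        counts["theta"] += 1       # three parallel edges
    else:
        counts["handcuffs"] += 1   # one edge plus a self-loop at each vertex
print(counts)  # expect 6 theta and 9 handcuffs, out of 15 matchings
```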
However, a consequence of the formula above is that if we condition on the event that $\mathrm{CM}_n(d)$ is simple, then the resulting random graph is uniform on the set of simple graphs with degree sequence d. Note that the same example as above shows that there’s no guarantee that there exists a simple graph whose degrees are a given sequence.
d-regular configuration model
In general, from a modelling point of view, we are particularly interested in simple, connected graphs, and so it is valuable to study whether large examples of the configuration model are likely to have these properties. In this lecture, I will mainly focus on the case where the multigraphs are d-regular, meaning that all the vertices have degree equal to d. For the purposes of this lecture, we denote by $G^{d\text{-reg}}_n$ the d-regular configuration model $\mathrm{CM}_n((d,d,\ldots,d))$.
- d=1: to satisfy the parity condition on the sum of the degrees, we must have n even. But then $G^{1\text{-reg}}_n$ will consist of n/2 disjoint edges.
- d=2: $G^{2\text{-reg}}_n$ will consist of some number of disjoint cycles, and it is a straightforward calculation to check that when n is large, with high probability the graph will be disconnected (see the sketch below).
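For the d=2 case, a quick simulation illustrates the claim about disconnection. The sampler and the union-find connectivity check below are my own illustrative code, and the graph sizes and trial counts are arbitrary choices.

```python
import random

def cm_edges(degrees):
    """Sample the configuration model; return the list of edges (with repeats)."""
    half_edges = [i for i, deg in enumerate(degrees) for _ in range(deg)]
    random.shuffle(half_edges)
    return list(zip(half_edges[::2], half_edges[1::2]))

def is_connected(n, edges):
    """Union-find check that the multigraph on [n] is connected."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(n)}) == 1

# Estimate the probability that the 2-regular model is connected, for a few sizes.
for n in [10, 100, 1000]:
    trials = 2000
    hits = sum(is_connected(n, cm_edges([2] * n)) for _ in range(trials))
    print(n, hits / trials)
```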
In particular, I will focus on the case d=3, which is the first interesting case. Since the parity condition forces the number of vertices to be even, we will write the number of vertices as 2n, and denote the model by $G^{3\text{-reg}}_{2n}$. Most of the results we prove here can be generalised (under various conditions) to more general examples of the configuration model. The main goal of the lecture is revision of some techniques of the course, plus one new one, in a fresh setting, and the strongest possible versions of many of these results can be found amongst the references listed at the end.
Connectedness
In the lecture, we showed that $G^{3\text{-reg}}_{2n}$ is connected with high probability. This is, in fact, a rather weak result, since $G^{d\text{-reg}}_n$ is d-connected with high probability for every fixed $d\ge 3$ [Bol81, Wor81]. Here, d-connected means that one must remove at least d vertices in order to disconnect the graph, or, equivalently, that there are d internally disjoint paths between any pair of vertices. Furthermore, Bollobas shows that for fixed $d\ge 3$, the graphs $G^{d\text{-reg}}_n$ form a (random) expander family [Bol88].
Anyway, for the purposes of this course, the main tool is direct enumeration. The number of matchings of 2m labelled half-edges satisfies
$$(2m-1)!! := (2m-1)(2m-3)\cdots 3\cdot 1 = \frac{(2m)!}{2^m\,m!},$$
and so Stirling’s approximation gives the asymptotics
$$(2m-1)!! \sim \sqrt{2}\left(\frac{2m}{e}\right)^m,$$
although it will be useful in some places to use true bounds instead, for example
$$\sqrt{2}\left(\frac{2m}{e}\right)^m e^{-1/(12m)} \;\le\; (2m-1)!! \;\le\; \sqrt{2}\left(\frac{2m}{e}\right)^m.$$
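These asymptotics are easy to check numerically; the short script below is my own, with arbitrary choices of m, and simply compares the exact double factorial with the approximation and the bounds quoted above.

```python
from math import factorial, sqrt, e, exp

def double_factorial(m):
    """(2m-1)!! = (2m)! / (2^m m!), the number of matchings of 2m objects."""
    return factorial(2 * m) // (2 ** m * factorial(m))

for m in [1, 5, 20, 100]:
    exact = double_factorial(m)
    approx = sqrt(2) * (2 * m / e) ** m
    lower = approx * exp(-1 / (12 * m))
    # Check the bounds hold, and watch the ratio tend to 1.
    print(m, lower <= exact <= approx, exact / approx)
```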
Anyway, in $G^{3\text{-reg}}_{2n}$ there are 6n half-edges in total, and so the probability that the graph may be split into two parts consisting of 2k and 2(n-k) vertices, with $1\le k\le n-1$, and with no edges between the classes is at most
$$\binom{2n}{2k}\,\frac{(6k-1)!!\,(6(n-k)-1)!!}{(6n-1)!!}.$$
Allowing constants to vary between lines, it’s not too hard to bound this above by
$$C\,\frac{k^k\,(n-k)^{n-k}}{n^n}.$$
Taking logs and using the convexity of the function $x\mapsto x\log x$, we see that this bound is maximised at the endpoints k=1 and k=n-1. This fits our intuition and reminds us of the corresponding situation when we handled the connectivity threshold for G(n,p): the easiest way to disconnect a graph is to have an isolated vertex. Of course, if we want $G^{3\text{-reg}}_{2n}$ to be 3-regular, we can’t have any isolated vertices, but we can have a component of size 2 (a copy of the handcuffs or the theta), corresponding to k=1.
In any case, the best strategy here is to separate the sum
$$\mathbb{P}\big(G^{3\text{-reg}}_{2n}\text{ is disconnected}\big) \;\le\; \sum_{k=1}^{n-1}\binom{2n}{2k}\,\frac{(6k-1)!!\,(6(n-k)-1)!!}{(6n-1)!!}$$
into the outer terms k=1, k=n-1 and the rest, and then use that, for identical reasons, the largest of the remaining n-3 summands are still the outer terms, now corresponding to k=2 and k=n-2. We obtain
$$\mathbb{P}\big(G^{3\text{-reg}}_{2n}\text{ is disconnected}\big) \;\le\; \frac{C}{n} + (n-3)\cdot\frac{C}{n^2} \;=\; O\!\left(\frac{1}{n}\right) \longrightarrow 0,$$
which completes the result.
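If one wants to see this in action, the union bound can be evaluated exactly for moderate n. The following sketch (function names mine) computes it and exhibits behaviour consistent with the O(1/n) decay above.

```python
from math import comb, factorial

def df(m):
    """(2m-1)!!, the number of matchings of 2m half-edges."""
    return factorial(2 * m) // (2 ** m * factorial(m))

def disconnection_union_bound(n):
    """Union bound on P(G^{3-reg}_{2n} is disconnected): sum over splits into
    2k and 2(n-k) vertices with no crossing edges."""
    total = df(3 * n)  # matchings of all 6n half-edges
    return sum(comb(2 * n, 2 * k) * df(3 * k) * df(3 * (n - k)) / total
               for k in range(1, n))

for n in [10, 50, 100]:
    bound = disconnection_union_bound(n)
    print(n, bound, n * bound)  # n * bound should stay bounded, consistent with O(1/n)
```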
Simplicity
As we’ve seen in the case n=2, d=(3,3), it is possible that none of the multigraphs satisfying a given degree sequence are simple. In general, though, we expect that many of the multigraphs will be simple. As we’ve seen, the configuration model is uniform amongst such graphs, when conditioned to be simple. So one way to sample a uniform simple graph with given degree sequence is to repeatedly sample the configuration model (which is good, because uniform matchings are very easy to simulate) until it generates a simple graph, which will then be uniform amongst the target set.
We will mostly be interested in large graphs, and so it’s important to know what the success probability is. In fact, we have the following result for the 3-regular configuration model (which generalises to other degree sequences)
Proposition: $\mathbb{P}\big(G^{3\text{-reg}}_{2n}\text{ is simple}\big) \longrightarrow e^{-2}$ as $n\to\infty$.
So in fact we can sample a uniform simple 3-regular graph in an expected O(1) number of attempts using $G^{3\text{-reg}}_{2n}$, even when n is large. In the interests of time, we will prove a simpler version of this result, addressing only the self-loops. A very readable account of a full version for d-regular graphs may be found here and in [Jan06] for the more general setting.
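As an illustration of the rejection-sampling idea, here is a hedged Python sketch (all function names are mine): it resamples the 3-regular configuration model until the result is simple, and also estimates the acceptance probability, which by the proposition should be close to $e^{-2}\approx 0.135$ when n is large.

```python
import random

def cm_3_regular(num_vertices):
    """One sample of the 3-regular configuration model, as a list of edges."""
    half_edges = [v for v in range(num_vertices) for _ in range(3)]
    random.shuffle(half_edges)
    return list(zip(half_edges[::2], half_edges[1::2]))

def is_simple(edges):
    """True if there are no self-loops and no repeated edges."""
    seen = set()
    for a, b in edges:
        if a == b or (min(a, b), max(a, b)) in seen:
            return False
        seen.add((min(a, b), max(a, b)))
    return True

def uniform_simple_3_regular(num_vertices):
    """Rejection sampling: resample until the multigraph is simple."""
    while True:
        edges = cm_3_regular(num_vertices)
        if is_simple(edges):
            return edges

# Empirical acceptance probability, to compare with e^{-2} ~ 0.135.
trials, n = 10000, 500   # here the graph has 2n = 1000 vertices
accepted = sum(is_simple(cm_3_regular(2 * n)) for _ in range(trials))
print(accepted / trials)
```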
Proposition: The number of self-loops of $G^{3\text{-reg}}_{2n}$ converges in distribution to a Poisson(1) random variable as $n\to\infty$.
Proof: As in previous sections of the course, we can compute the first moment of the number of self-loops by focusing on a single vertex v. In order for v to support a self-loop, either the first half-edge at v must be matched to one of the other two half-edges incident to v, or, failing that, the second half-edge must be matched to the third. (Note that here we are exploiting the fact that we can generate a uniform matching one pair at a time, in any order.)
So, if $X_n$ is the number of self-loops in $G^{3\text{-reg}}_{2n}$, then
$$\mathbb{E}[X_n] = 2n\left(\frac{2}{6n-1} + \frac{6n-3}{6n-1}\cdot\frac{1}{6n-3}\right) = \frac{6n}{6n-1} \longrightarrow 1.$$
We could at this stage follow a second-moment method similar to previous calculations we’ve performed, but we will use a stronger method, which will tell us precisely what the limiting distribution of the number of self-loops is.
The motivation is that the event that one vertex supports a self-loop is almost independent of the event that a second vertex supports a self-loop. Thus the total number of self-loops should be almost binomial, and so we expect its limit to be Poisson. There are many ways to establish convergence in distribution, but the following Poisson approximation method is particularly suitable in this sort of setting, especially with a combinatorial interpretation, as we now outline.
Given a real number x and a positive integer r, we define the r-th falling factorial of x as $(x)_r := x(x-1)\cdots(x-r+1)$, and the r-th factorial moment of a random variable X as $\mathbb{E}[(X)_r]$. The key fact is that if $Z\sim\mathrm{Poisson}(\lambda)$, then
$$\mathbb{E}\big[(Z)_r\big] = \lambda^r \quad\text{for every } r\ge 1.$$
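This key fact is easy to verify numerically. In the sketch below (my own, with an arbitrary rate of 1.7 and an arbitrary truncation of the Poisson sum at k=60), the factorial moments line up with the powers of the rate.

```python
from math import exp, factorial

def falling_factorial(x, r):
    """x(x-1)...(x-r+1)."""
    out = 1
    for j in range(r):
        out *= x - j
    return out

lam = 1.7   # an arbitrary illustrative rate
for r in range(1, 5):
    # E[(Z)_r] for Z ~ Poisson(lam), computed by truncating the pmf sum at k=60
    # (the tail beyond this is negligible for such a small rate).
    moment = sum(falling_factorial(k, r) * exp(-lam) * lam ** k / factorial(k)
                 for k in range(60))
    print(r, moment, lam ** r)
```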
Then, if $(X_n)_{n\ge 1}$ is a sequence of {0,1,2,…}-valued random variables:

Lemma (Poisson approximation): If $\mathbb{E}\big[(X_n)_r\big] \to \lambda^r$ as $n\to\infty$ for every fixed $r\ge 1$, then
$$X_n \stackrel{d}{\longrightarrow} \mathrm{Poisson}(\lambda).$$
This lemma is really just a special case of the principle that convergence of moments implies convergence in distribution, provided the limiting distribution is determined by its moments, which the Poisson distribution is. (Think about Levy’s continuity theorem.) And clearly convergence of all r-th factorial moments implies convergence of all r-th moments (in the regular sense of moment), simply because the falling factorials $(x)_1,(x)_2,\ldots$ form a basis of the space of polynomials. Anyway, the reason to state it separately is that it’s often useful in these cases, because the factorial moments have a combinatorial interpretation that makes the expectation easy to compute in many circumstances.
Returning to the case of self-loops in $G^{3\text{-reg}}_{2n}$, note that $(X_n)_r$ counts the number of ordered r-tuples of (distinct) vertices all supporting self-loops. By symmetry, it suffices to handle a single ordered r-tuple, for example the first r vertices. That is,
$$\mathbb{E}\big[(X_n)_r\big] = (2n)(2n-1)\cdots(2n-r+1)\;\mathbb{P}\big(\text{vertices }1,\ldots,r\text{ all support self-loops}\big).$$
For each of these vertices, there are three ways to form a self-loop out of the three half-edges, leaving one half-edge left to interact with the rest of the half-edges in the graph. We obtain
$$\mathbb{E}\big[(X_n)_r\big] = (2n)(2n-1)\cdots(2n-r+1)\cdot\frac{3^r\,(6n-2r-1)!!}{(6n-1)!!},$$
which we can approximate using Stirling as
$$\mathbb{E}\big[(X_n)_r\big] \approx (2n)^r\cdot 3^r\cdot\frac{\big(\tfrac{6n-2r}{e}\big)^{3n-r}}{\big(\tfrac{6n}{e}\big)^{3n}} = (2n)^r\cdot 3^r\cdot\frac{e^r}{(6n)^r}\left(1-\frac{r}{3n}\right)^{3n-r},$$
which, recalling that r is fixed, we may approximate further as
$$\mathbb{E}\big[(X_n)_r\big] \approx \frac{(6n)^r}{(6n)^r}\cdot e^r\cdot e^{-r} = 1.$$
That is, $\mathbb{E}\big[(X_n)_r\big]\to 1$ as $n\to\infty$ for every fixed r, and so we may use the Poisson approximation lemma to deduce $X_n \stackrel{d}{\longrightarrow} \mathrm{Poisson}(1)$.
In particular, we may read off that $\mathbb{P}(X_n = 0) \longrightarrow e^{-1}$.
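To see the proposition empirically, one can compare the simulated distribution of the number of self-loops with the Poisson(1) mass function. The following sketch does this; the code and the choice of parameters are mine.

```python
import random
from collections import Counter
from math import exp, factorial

def self_loop_count(n):
    """Number of self-loops in one sample of the 3-regular configuration model
    on 2n vertices."""
    half_edges = [v for v in range(2 * n) for _ in range(3)]
    random.shuffle(half_edges)
    return sum(a == b for a, b in zip(half_edges[::2], half_edges[1::2]))

n, trials = 500, 20000
counts = Counter(self_loop_count(n) for _ in range(trials))
for k in range(6):
    empirical = counts[k] / trials
    poisson = exp(-1) / factorial(k)   # Poisson(1) probability of k
    print(k, round(empirical, 4), round(poisson, 4))
```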
References
[Bol81] – Bollobas – Random Graphs (in Combinatorics), 1981.
[Bol88] – Bollobas – The isoperimetric number of random regular graphs, 1988.
[Jan06] – Janson – The probability that a random multigraph is simple, 2006.
[Wor81] – Wormald – The asymptotic connectivity of labelled regular graphs, 1981.