Continuing the LMS-EPSRC summer school on Random Graphs, Geometry and Asymptotic Structure, we’ve now had three of the five lectures by Mathew Penrose on Geometric Random Graphs.
The basic idea is that instead of viewing a graph entirely abstractly, we now place the vertices in the plane, or some other real space. In many network situations, we would expect connectivity to depend somehow on distance. Agents or sites which are close together might be considered more likely to have the sort of relationship indicated by being connected with an edge. In the model discussed in this course, this dependence is deterministic. We have some parameter r, and once we have chosen the location of all the vertices, we connect a pair of vertices if the distance between them is less than r.
For the purposes of this, we work in a compact space [0,1]^d, and we are interested in the limit as the number of vertices n grows to infinity. To avoid the graph getting too connected, as in the standard random graph model, we take r to be a decreasing function of n. Anyway, we place the n points into the unit hypercube uniformly at random, and then the edges are specified by the adjacency rule above. In general, because r_n will be o(1), we won’t have to worry too much above boundary effects. The number of vertices within r_n of the boundary of the cube will be o(1). For some results, this is a genuine problem, when it may be easier to work on the torus.
In G(n,p), the order of np in the limit determines the qualitative structure of the graph. This is the expected degree of a given fixed vertex. In this geometric model, the relevant parameter is , where d is the dimension of the hypercube. If this parameter tends to 0, we say the graph is sparse, and dense if it tends to infinity. The intermediate case is called a thermodynamic limit. Note that the definition of sparse here is slightly different from G(n,p).
Much of the content of the first three lectures has been verifying that the distributions of various quantities in the graph, for example the total number of edges, are asymptotically Poisson. Although sometimes arguments are applicable over a broad spectrum, we also sometimes have to use different calculations for different scaling windows. For example, it is possible to show convergence to a Poisson distribution for the number of edges in the sparse case, from which we get an asymptotic normal approximation almost for free. In the denser regimes, the argument is somewhat more technical, with some substantial moment calculations.
A useful tool in these calculations are some bounds derived via Stein’s method for sums of ‘almost independent’ random variables. For example, the presence or non-presence of an edge between two pairs of vertices are independent in this setting if the pairs are disjoint, and the dependence is still only mild if they share a vertex. An effective description is via a so-called dependency graph, where we view the random variables as the vertices of a graph, with an edge between them if there is some dependence. This description doesn’t have any power in itself, but it does provide a concise notation for what would otherwise be very complicated, and we are able to show versions of (Binomials converge to Poisson) and CLT via these that are exactly as required for this purpose.
In particular, we are able to show that if is the total number of edges, under a broad set of scaling regimes, if
is the expected total number of edges, then
, as n grows. This convergence in total variation distance is as strong a result as one could hope for, and when the sequence of
is O(1), we can derive a normal approximation as well.
At this point it is worth discussing an alternative specification of the model. Recall that for a standard homogenous random graph, we have the choice of G(n,m) and G(n,p) as definitions. G(n,m) is the finer measure, and G(n,p) can be viewed as a weighted mix of G(n,m). We can’t replicate this directly in the geometric setting because the edges and non-edges are a deterministic function of the vertex locations. What we can randomise is the number of vertices. Since we are placing the vertices uniformly at random, it makes sense to consider as an alternative a Poisson Point Process with intensity n. The number of vertices we get overall will be distributed as Po(n), which is concentrated near n, in the same manner as G(n,c/n).
As in G(n,p), this is a less basic model because it is a mixture of the fixed-vertex models. Let’s see if how we would go about extending the total variation convergence result to this slightly different setting without requiring a more general version of the Poisson Approximation Lemma. To avoid having to define everything again, we add a ‘ to indicate that we are talking about the Poisson Point Process case. Writing d(.,.) for total variation distance, the result we have is:
We want to show that
which we can decompose in terms of expectations in the original model by conditioning on
where the outer expectation is over N. The observation here, is that the number of points given by the Poisson process induces a measure on distributions, the overwhelming majority of which look quite like Poisson distributions with parameter n. The reason we have a less than sign is that we are applying the triangle inequality in the sum giving total variation distance:
From this, we use the triangle inequality again:
Then, by a large deviations argument, we have that for any ,
exponentially in n. Also, total variation distance is, by definition, bounded above by 1. In the first term, the inner conditioning on N_n is irrelevant, and we have that
converges to the Poisson distribution for any fixed
. Furthermore, we showed in the proof of the non-PPP result that this convergence is uniform in this interval. (This is not surprising – the upper bound is some well-behaved polynomial in 1/n.) So with probability
N_n is in the region where this convergence happens, and elsewhere, the expected TV distance is bounded below 1, so the overall expectation tends to 0. With a similar LD argument, for the second term it suffices to prove that when
, we must have
. This is ‘obviously’ true. Formally, it is probably easiest to couple the distributions
in the obvious way, and carry the convergence of TV distance as the parameter varies through the convergence in n.
That all sounded a little bit painful, but is really just the obvious thing to do with each term – it’s only the language that’s long-winded!
Anyway, I’m looking forward to seeing how the course develops. In particular, when you split the space into small blocks, the connectivity properties resemble those of (site) percolation, so I wonder whether there will be concrete parallels. Also, after reading about some recent results concerning the metric structure of the critical components in the standard random graph process, it will be interesting to see how these compare to the limit of a random graph process which comes equipped with metric structure for free!