I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.
As we enter the final stages of the semester, I want to discuss some extensions to the standard Erdős–Rényi random graph, which has been the focus of most of the course so far. In doing so, we can revisit material that we have already covered, and discover how easily one can extend this directly to more exotic settings.
The focus of this lecture was the model of inhomogeneous random graphs (IRGs) introduced by Söderberg [Sod02] and first studied rigorously by Bollobás, Janson and Riordan [BJR07]. Söderberg and this blog post address the case where vertices have a type drawn from a finite set. BJR address the setting with more general type spaces, in particular a continuum of types. This generalisation is essential if one wants to use IRGs to model effects more sophisticated than those of the classical Erdős–Rényi model G(n,c/n), but most of the methodology is already present in the finite-type setting, which also avoids the operator theory language that is perhaps intimidating for a first-time reader.
Inhomogeneous random graphs
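Throughout, the number of types $k$ is fixed. A graph with k types is a graph G=(V,E) together with a type function $\mathrm{type}: V\to[k]$. We will refer to a symmetric $k\times k$ matrix $\kappa$ with non-negative entries as a kernel.
Given $n\in\mathbb{N}$ and a vector $p=(p_1,\ldots,p_k)$ of non-negative integers satisfying $\sum_{i=1}^k p_i=n$, and a kernel $\kappa$, we define the inhomogeneous random graph $G^n(p,\kappa)$ with k types as:
- the vertex set is [n],
- types are assigned uniformly at random to the vertices such that exactly $p_i$ vertices have type i.
- Conditional on these types, each edge $v\leftrightarrow w$ (for $v\ne w\in[n]$) is present, independently, with probability $1-\exp\left(-\kappa_{\mathrm{type}(v),\mathrm{type}(w)}/n\right)$.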
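The three steps of the definition can be sketched directly in code. This is a minimal illustration rather than anything from the lecture: the function name `sample_irg`, the 0-indexed vertex and type labels, and the example kernel and type counts are all assumptions made for the sake of the example.

```python
import math
import random


def sample_irg(p, kappa, rng=None):
    """Sample a finite-type inhomogeneous random graph.

    p[i] is the number of vertices of type i, and kappa is a symmetric
    k x k kernel with non-negative entries. Vertices are labelled
    0..n-1 and types 0..k-1 (an indexing choice made for this sketch).
    Returns (types, edges), where edges is a list of pairs (v, w), v < w.
    """
    if rng is None:
        rng = random.Random()
    n = sum(p)
    # Exactly p[i] vertices receive type i, in uniformly random positions.
    types = [i for i, count in enumerate(p) for _ in range(count)]
    rng.shuffle(types)
    edges = []
    for v in range(n):
        for w in range(v + 1, n):
            # Conditional on the types, edges are independent.
            prob = 1.0 - math.exp(-kappa[types[v]][types[w]] / n)
            if rng.random() < prob:
                edges.append((v, w))
    return types, edges


# Hypothetical two-type example: 60 vertices of type 0, 40 of type 1.
example_kappa = [[3.0, 1.0], [1.0, 2.0]]
types, edges = sample_irg([60, 40], example_kappa, rng=random.Random(0))
print(len(edges), "edges among", len(types), "vertices")
```

The double loop costs order n^2 time, which is fine for small illustrations; a faster sampler for the sparse regime would need a different approach.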
Notes on the definition:
- Alternatively, we could assign the types so that vertices $\{1,\ldots,p_1\}$ have type 1, vertices $\{p_1+1,\ldots,p_1+p_2\}$ have type 2, and so on. This makes no difference except in terms of the notation we have to use if we want to use exchangeability arguments later.
- An alternative model considers some distribution $\pi$ on [k], and assigns the types of the vertices of [n] in an IID fashion according to $\pi$. Essentially all the same results hold for these two models. (For example, this model with ‘random types’ can be studied by quenching the number of each type!) Often one works with whichever model seems easier for a given proof.
- Note that the edge probability given is $1-\exp(-\kappa_{i,j}/n)\approx \kappa_{i,j}/n$. The exponential form has a more natural interpretation if we ever need to turn the IRGs into a process. Additionally, it avoids the requirement to treat small values of n (for which, a priori, $\kappa_{i,j}/n$ might be greater than 1) separately.
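To spell out the last note, here is the standard expansion of the exponential edge probability, writing $\kappa_{i,j}$ for the kernel entry between types i and j. The final line is one possible reading of the "process" remark, offered as an assumption about what is intended rather than as the lecture's own construction.

```latex
% Expansion for fixed \kappa_{i,j} \ge 0 as n \to \infty:
1 - \exp\left(-\frac{\kappa_{i,j}}{n}\right)
  = \frac{\kappa_{i,j}}{n} - \frac{\kappa_{i,j}^2}{2n^2} + O\left(n^{-3}\right).

% The left-hand side lies in [0,1) for every n \ge 1, whereas
% \kappa_{i,j}/n > 1 whenever n < \kappa_{i,j}.

% Possible process reading (assumption): if the edge between v and w
% arrives at an independent exponential time T_{vw} with rate \kappa_{i,j}/n, then
\mathbb{P}\left(T_{vw} \le t\right) = 1 - \exp\left(-\frac{t\,\kappa_{i,j}}{n}\right),
% which at time t = 1 is exactly the edge probability in the definition.
```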
In the pictured example, one can see that, roughly speaking, red vertices are more likely to be connected to each other than blue vertices are. However, a vertex of either colour is more likely to be connected to a given vertex of the same colour than to a vertex of the opposite colour. This might, for example, correspond to the kernel .
The definition given above corresponds to a sparse setting, where the typical vertex degrees are $\Theta(1)$. Obviously, one can set up an inhomogeneous random graph in a dense regime by an identical construction.
From an applications point of view, it’s not hard to imagine that an IRG of some flavour might be a good model for many phenomena observed in reality, especially when a mean-field assumption is somewhat appropriate. The friendships of boys and girls in primary school seem a particularly resonant example, though doubtless there are many others.
One particular application is to recover the types of the vertices from the topology of the graph. That is, if you see the above picture without the colours, can you work out which vertices are red, and which are blue? (Assuming you know the kernel.) This is clearly impossible to do with anything like certainty in the sparse setting – how does one decide about isolated vertices, for example? The probabilities that a red vertex is isolated and that a blue vertex is isolated differ by a constant factor in the limit. But in the dense setting, one can achieve this with high confidence. When studying such statistical questions, these IRGs are often referred to as stochastic block models, and the recent survey of Abbe [Abbe] gives a very rich history of this type of problem in this setting.
Poisson multitype branching processes
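As in the case of the classical random graph G(n,c/n), we learn a lot about the IRG by studying its local structure. Let’s assume from now on that we are given a sequence of IRGs $G^n(p^n,\kappa)$ for which $p^n/n\to\pi$, where $\pi=(\pi_1,\ldots,\pi_k)\in[0,1]^k$ satisfies $\sum_{i=1}^k \pi_i=1$.
Now, let $v^n$ be a uniformly-chosen vertex in [n]. Clearly $\mathrm{type}(v^n)\stackrel{d}{\to}\pi$, with the immediate mild notation abuse of viewing $\pi$ as a probability distribution on [k].
Then, conditional on $\mathrm{type}(v^n)=i$:
- when $j\ne i$, the number of type j neighbours of $v^n$ is distributed as $\mathrm{Bin}\left(p^n_j,\,1-\exp(-\kappa_{i,j}/n)\right)$.
- the number of type i neighbours of $v^n$ is distributed as $\mathrm{Bin}\left(p^n_i-1,\,1-\exp(-\kappa_{i,i}/n)\right)$.
Note that $p^n_j\left(1-\exp(-\kappa_{i,j}/n)\right)\to\kappa_{i,j}\pi_j$, and similarly in the case j=i, so in both cases, the number of neighbours of type j is distributed approximately as $\mathrm{Poisson}(\kappa_{i,j}\pi_j)$.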
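As a numerical sanity check on this Poisson approximation, the sketch below computes the total variation distance between the binomial law of the number of type j neighbours and its Poisson limit, for increasing n. The function names, the choice of taking the type j count to be `round(pi_j * n)`, and the parameter values are all assumptions made for illustration.

```python
import math


def tv_binomial_poisson(m, q, lam):
    """Total variation distance between Bin(m, q) and Poisson(lam)."""
    b = (1.0 - q) ** m   # P(Bin(m, q) = 0)
    p = math.exp(-lam)   # P(Poisson(lam) = 0)
    abs_diff = 0.0
    poisson_mass = 0.0
    for x in range(m + 1):
        abs_diff += abs(b - p)
        poisson_mass += p
        # Step both probability mass functions from x to x + 1.
        b *= (m - x) / (x + 1) * q / (1.0 - q)
        p *= lam / (x + 1)
    # The binomial puts no mass above m, so the Poisson tail counts in full.
    return 0.5 * (abs_diff + max(0.0, 1.0 - poisson_mass))


def neighbour_tv(n, pi_j, kappa_ij):
    """Distance between the type-j neighbour count (j != i) and its limit."""
    m = round(pi_j * n)  # assumption: the type-j count is round(pi_j * n)
    q = 1.0 - math.exp(-kappa_ij / n)
    return tv_binomial_poisson(m, q, kappa_ij * pi_j)


# Hypothetical parameters: 40% of vertices have type j, kernel entry 2.
for n in (100, 1000, 10000):
    print(n, neighbour_tv(n, pi_j=0.4, kappa_ij=2.0))
```

The printed distances should shrink as n grows, consistent with the convergence of the binomial means described above.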
This motivates the following definition of a branching process tree, whose vertices have k types.