Random Graphs – Lecture 1

My plan is to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics about the course can be found here.

In the first lecture, we revised some basic definitions about graphs, focusing on those which are most relevant to a first study of the Erdos-Renyi random graph G(n,p) which will be the focus of the lecture course. We discussed in abstract why the independence of the (potential) edges makes the model easier to analyse, but reduces its suitability as a direct model for lots of networks one might see in the real world, where knowledge that A is directly connected to both B and C affects the probability that B is directly connected to C, in either direction. Thinking about the Facebook friendship graph is one of the best examples, where in this case, we expect this extra information to increase the probability that B and C are connected. Even as the world moves away from heteronormativity, it realistically remains the case that in a graph of the dating history amongst a well-defined community we would likely observe the opposite effect.

All of these more complicated phenomena can be captured by various random graphs, but G(n,p) remains the corner stone, evinced by the >10^5 citations towards one of Erdos and Renyi’s original papers on the topic.

Somewhat paraphrasing, one of their (well, mostly Renyi’s) original questions was: when n is large, what should p be so that there’s a good chance that G(n,p) is connected?

The answer to this question lies in Lecture 2, but to cement understanding of the model, and explore some key methods for proofs in discrete probability (as well as play around with the big-O and little-o notation), we investigated the following two situations, which are very far from interesting as far as connectivity of G(n,p) is concerned.

Dense regime

When p is fixed, there are many interesting questions one could ask about the asymptotic properties of G(n,p), but connectivity is not one of them. In particular, for p\in(0,1) we claim:

Proposition: \mathrm{diam}(G(n,p)) \stackrel{\mathbb{P}}\rightarrow 2 as n\rightarrow\infty.

Note that if \mathrm{diam}(G(n,p))=1, then G(n,p)\simeq K_n, the complete graph on n vertices. In other words, every possible edge is actually present. But the probability of this event is p^{\binom{n}{2}}\rightarrow 0, so long as p<1.

It then suffices to prove that \mathbb{P}(G(n,p)>2) \rightarrow 0. We use a union bound, where we study the probability that the graph distance d_{G(n,p}(v,w)>2 for two fixed vertices v\ne w first, and then sum over all such pairs. Of course, there is a probability p that the two vertices are directly connected by an edge. Then, there are (n-2) other vertices with the potential to be a common neighbour of v and w, which would ensure that the graph distance between them is at most two. So

\mathbb{P}(d_{G(n,p)}(v,w)>2)=(1-p)[1-p^2]^{n-2} .

Note that we are using independence throughout this calculation. Then comes the union bound:

\mathbb{P}(\mathrm{diam}(G(n,p)) >2) \le \sum_{v\ne w \in[n]} \mathbb{P}(d_{G(n,P)}(v,w)>2)

\le \binom{n}{2} (1-p)[1-p^2]^{n-2} \rightarrow 0,

since exponential decay ‘kills’ polynomial growth.

Ultra-sparse regime

In general, we work in the setting where p=p(n) depends on n. If p(n) decays fast enough (see Exercise 2), then with high probability G(n,p) has no edges at all. However, when p(n)=o(n^{-3/2}) we have

Proposition: \mathbb{P}(\text{edges of }G(n,p)\text{ form a matching}) \rightarrow 1 as n\rightarrow\infty.

A matching is a collection of edges with no vertices in common. So if the edge set of the graph is a matching, we have essentially no interesting connectivity structure at all. The longest path has length one, for example.

To prove this, note that the edge set of the graph fails to be a matching precisely if one of the vertices has degree at least two. But since a vertex v is connected to each of the (n-1) other vertices in the graph independently with probability p, we have

\mathrm{deg}_{G(n,p)}(v) \sim \mathrm{Bin}(n-1,p),

and so we can directly make the crude approximation

\mathbb{P}(\mathrm{deg}_{G(n,p)}(v) =k) = \binom{n-1}{k}p^k(1-p)^{n-1-k}\le n^k p^k.

We’ve made this very weak bound to make life easier when we sum:

\mathbb{P}(\mathrm{deg}_{G(n,p)}(v) \ge 2) \le \sum_{k\ge 2}(np)^k = \frac{(np)^2}{1-np}.

Since p=o(n^{-3/2}), we have \frac{1}{1-np}=\frac{1}{1-o(1)}=1+o(1), and overall we obtain

\mathbb{P}(\mathrm{deg}_{G(n,p)}(v) \ge 2) = o(\frac{1}{n}).

Again, we finish with a union bound, considering this event across all vertices v\in[n].

\mathbb{P}(E(G(n,p))\text{ not a matching}) \le \sum_{v\in[n]} \mathbb{P}(\mathrm{deg}_{G(n,p)}(v)\ge 2)

=n\mathbb{P}(\mathrm{deg}_{G(n,p)}(1)\ge 2) = o(1),

as required.

Next time

In the next lecture, we’ll study the regime p(n)\sim \frac{\log n}{n}, where G(n,p) experiences a phase transition from probably not connected to probably connected. Part of this involves making the notion probably connected precise, which will be useful throughout the rest of the course, as well as establishing the language for comparing G(n,p) and G(n,q).

The proof itself requires some more sophisticated versions of calculations from Lecture 1, and more sophisticated probabilistic tools (first- and second-moment methods) to convert them into statements about convergence in probability. This will be an advertisement for the more classical enumerative methods that underpinned much of the early work on random graphs.

The rest of the course will exploit much more some comparisons and embeddings involving branching processes and exploration processes, so don’t worry – it won’t be 26 hours of counting trees!

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s