From G(n,p) to G(n,m)

In the previous posts about random graphs, I was focusing on the model G(n,p). Here, we have n vertices, and we insert an edge between any pair of vertices independently with probability p. In particular, the number of edge which appear in a realisation of G(n,p) is a random variable, distributed as \text{Bin}(\frac{n(n-1)}{2},p).

The original model examined by Erdos and Renyi, after whom the random graph described above was named, was slightly different. Still with n vertices, they specified how many edges m they wanted in the graph, and chose uniformly at random from the set of graphs with this number of edges. This model is usually denoted G(n,m). Normally we can tell them apart by context. Obviously, p is a probability so lies in [0,1], whereas m is a positive integer, so there isn’t much room for ambiguity.

The key observation is that, if H is some graph with n vertices and m edges, then the probability that this is appears as G(n,p) is

p^{E(H)}(1-p)^{n-E(H)}.

This is constant if we vary H while fixing m. In other words, G(n,p) conditioned to have m edges is G(n,m). So, via some sort of law of total probability, we can construct G(n,p) by taking m to be distributed as \text{Bin}(\frac{n(n-1)}{2},p), and conditional on that, sampling from G(n,m). (*)

We can couple G(n,p) for all p, by assigning iid uniform [0,1] random variables to each pair of vertices, then including the edge in G(n,p) only if the value of the RV is greater than 1-p. Similarly, it is often helpful to think of G(n,m) as m varies as a random process, where edges are added one at a time, and at each stage the next edge is chosen uniformly at random from those not currently present. Perhaps because of this, it is sometimes easier to prove results for G(n,p) than for G(n,m) so we want to develop a framework to move between the two.

The decomposition (*) gives a relatively straightforward way to move from a result in G(n,m) to a result in G(n,p). By the Central Limit Theorem, the number of edges in G(n,p) is \binom{n}{2}p+O(n) in the limit, and so if a result with high probability in G(n,m) for all m in some interval, say [\binom{n}{2}p-Kn,\binom{n}{2}p+Kn] for some large K, then the law of total probability shows that the property holds with high probability in G(n,p).

In general, we get more interesting properties when p is a function of n. As discussed in previous posts, the scaling p=\frac{\lambda}{n} is particularly worth studying. CLT now shows that G(n,\frac{\lambda}{n}) has \frac{\lambda n}{2}+O(\sqrt{n}) edges in the limit. If you are confused why you can’t just substitute this value for p into the previous expression, note that p(1-p) does appear in the general asymptotic variance, but this gets absorbed into the “big O” notation when p is constant.

More importantly, many properties that we might want to consider are not in general affected in the limit by adding or removing O(\sqrt{n}) edges. For example, with high probability, G(n,m) has largest component of size (\zeta_\lambda+o(1)) n whenever \lambda>1 and m\in [\frac{\lambda n}{2}-O(\sqrt{n}),\frac{\lambda n}{2}+O(\sqrt{n})]. Some of this notation would need to be made a bit more precise in a formal argument, but for now, let’s take that as given. This then implies that with high probability, G(n,\frac{\lambda}{n}) has largest component of size (\zeta_\lambda+o(1))n also.

Of course, from the logical structure of this blog, this deduction is a bit bogus, because we have only just introduced G(n,m), and have no idea about the properties of its giant components yet. We seek instead an argument to deduce facts about G(n,m) from facts about G(n,p). Because G(n,m) cannot obviously be written as some conditioned combination of G(n,p)s, this instinctively seems harder. Bollobas gives various general conditions to carry results over between the two regimes in his Part III course notes, but I feel that an examples would be the easiest way to explain the ideas.

The size of the largest component is such an important quantity, we might as well consider that, in the subcritical case. We work with G(n,\frac{\lambda}{n}), for which we have the result:

\mathbb{P}_{\lambda+o(1)}(C_1\geq a\log n)=O(n^{-\delta}),

for some \delta>0, whenever a>I_\lambda^{-1}, the rate function at 1 of the total population size of a Poisson \lambda branching process. For now, that doesn’t matter too much, except that it is continuous as a function of \lambda. We want to show that G(n,\frac{\lambda n}{2}) has the same property.

The trick is to consider G(n,\frac{\lambda+\epsilon}{n}) instead. Let E_a be the event described above. By the law of total probability and the decomposition mentioned above, we have:

\mathbb{P}_{\lambda+\epsilon}(E_a)=\sum_{m=0}^n\mathbb{P}(\text{Bin}(\frac{n(n-1)}{2},\frac{\lambda+\epsilon}{n})=m)\mathbb{P}_{n,m}(E_a).

We are going to split this sum into

\sum_{m=0}^{\frac{(\lambda+\epsilon )n}{2}-n^{3/4}}+\sum_{m=\frac{(\lambda+\epsilon)n}{2}-n^{3/4}}^n.

On the first of these sums, we bound using the fact that probabilities are less than 1, and on the second, we use that \mathbb{P}_{n,m}(E_a) is an increasing function of m. This property is special to the event we are considering – in general one might have to be a bit more clever, perhaps using continuity of P_{n,m}(E), interpreting continuity in the limit with n. Anyway, this enables us to bound:

\mathbb{P}_{\lambda+\epsilon}(E_a)\geq \mathbb{P}(\text{Bin}(\frac{n(n-1)}{2},\frac{\lambda+\epsilon}{n})\leq \frac{(\lambda+\epsilon)n}{2}-n^{3/4})+\mathbb{P}_{\frac{(\lambda+\epsilon)n}{2}-n^{3/4}}(E_a)\mathbb{P}(\text{Bin}(\frac{n(n-1)}{2},\frac{\lambda+\epsilon}{n})\geq\frac{(\lambda+\epsilon)n}{2}-n^{3/4}).

By the Central Limit Theorem, this first probability tends to 0, while the final term tends to 1. We therefore have:

\mathbb{P}_{\lambda+\epsilon}(E_a)\geq \mathbb{P}_{\frac{(\lambda+\epsilon)n}{2}-n^{3/4}}(E_a).

We demanded that a>I_\lambda^{-1}, and mentioned that this function was continuous, so since we have total freedom over \epsilon, in particular, we can choose \epsilon>0 such that a>I_{\lambda+\epsilon}^{-1}. By the work on G(n,p), we have \mathbb{P}_{\lambda+\epsilon}(E_a)=O(n^{-\delta}), and for large enough n, we have \frac{(\lambda+\epsilon)n}{2}-n^{3/4}>>\frac{\lambda n}{2}, and so the result \mathbb{P}_{\frac{\lambda n}{2}}(E_a)=O(n^{-\delta}) follows.

Advertisement

3 thoughts on “From G(n,p) to G(n,m)

  1. Pingback: Poisson Tails | Eventually Almost Everywhere

  2. Pingback: Azuma-Hoeffding Inequality | Eventually Almost Everywhere

  3. Pingback: Kernels of critical graph components | Eventually Almost Everywhere

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s