In the previous posts about random graphs, I was focusing on the model G(n,p). Here, we have n vertices, and we insert an edge between any pair of vertices independently with probability p. In particular, the number of edges which appear in a realisation of G(n,p) is a random variable, distributed as $\mathrm{Bin}\left(\binom{n}{2}, p\right)$.
The original model examined by Erdos and Renyi, after whom the random graph described above was named, was slightly different. Still with n vertices, they specified how many edges m they wanted in the graph, and chose uniformly at random from the set of graphs with this number of edges. This model is usually denoted G(n,m). Normally we can tell them apart by context. Obviously, p is a probability so lies in [0,1], whereas m is a positive integer, so there isn’t much room for ambiguity.
The key observation is that, if H is some graph with n vertices and m edges, then the probability that H appears as G(n,p) is
$$\mathbb{P}\left(G(n,p) = H\right) = p^m (1-p)^{\binom{n}{2} - m}.$$
This is constant if we vary H while fixing m. In other words, G(n,p) conditioned to have m edges is G(n,m). So, via some sort of law of total probability, we can construct G(n,p) by taking m to be distributed as $\mathrm{Bin}\left(\binom{n}{2}, p\right)$, and conditional on that, sampling from G(n,m). (*)
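The decomposition (*) is easy to try out numerically. The following is a minimal Python sketch, not taken from any library: the function names are hypothetical, graphs are represented as sets of vertex pairs, and the binomial edge count is drawn naively as a sum of Bernoulli trials.

```python
import itertools
import random


def sample_gnm(n, m, rng):
    """Uniform graph on vertices 0..n-1 with exactly m edges, as a set of pairs."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return set(rng.sample(all_pairs, m))


def sample_gnp_two_stage(n, p, rng):
    """Sample G(n,p) via (*): draw the edge count first, then sample G(n,m)."""
    num_pairs = n * (n - 1) // 2
    # Edge count M ~ Bin(C(n,2), p), drawn as a sum of Bernoulli(p) trials.
    m = sum(rng.random() < p for _ in range(num_pairs))
    return sample_gnm(n, m, rng)


def sample_gnp_direct(n, p, rng):
    """Sample G(n,p) directly: keep each pair independently with probability p."""
    return {e for e in itertools.combinations(range(n), 2) if rng.random() < p}


if __name__ == "__main__":
    rng = random.Random(2024)
    trials = 20000
    # Under either sampler, a fixed edge such as (0, 1) should appear
    # with frequency close to p.
    for sampler in (sample_gnp_two_stage, sample_gnp_direct):
        hits = sum((0, 1) in sampler(5, 0.3, rng) for _ in range(trials))
        print(sampler.__name__, hits / trials)
```

Both samplers should give empirical frequencies close to p for any fixed edge, which is a (weak) check that the two-stage construction really has the law of G(n,p).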
We can couple G(n,p) for all p, by assigning iid uniform [0,1] random variables to each pair of vertices, then including the edge in G(n,p) only if the value of the RV is greater than 1-p. Similarly, it is often helpful to think of G(n,m) as m varies as a random process, where edges are added one at a time, and at each stage the next edge is chosen uniformly at random from those not currently present. Perhaps because of this, it is sometimes easier to prove results for G(n,p) than for G(n,m) so we want to develop a framework to move between the two.
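Here is a small Python sketch of that coupling, under my own choice of representation: a dictionary of iid uniform labels, one per pair of vertices. The function names are hypothetical. Running the G(n,m) process off the same labels (take the m pairs with the largest labels) is an extra assumption of this sketch; it works because the ranking of iid continuous labels is a uniformly random ordering of the pairs.

```python
import itertools
import random


def uniform_labels(n, rng):
    """Attach an independent Uniform[0,1] label to each pair of vertices."""
    return {e: rng.random() for e in itertools.combinations(range(n), 2)}


def gnp_from_labels(labels, p):
    """G(n,p) under the coupling: keep a pair when its label exceeds 1 - p."""
    return {e for e, u in labels.items() if u > 1 - p}


def gnm_from_labels(labels, m):
    """G(n,m) as a process: the m pairs with the largest labels."""
    ranked = sorted(labels, key=labels.get, reverse=True)
    return set(ranked[:m])


if __name__ == "__main__":
    rng = random.Random(1)
    labels = uniform_labels(6, rng)
    small, large = gnp_from_labels(labels, 0.2), gnp_from_labels(labels, 0.6)
    # Monotone coupling: the graph only gains edges as p increases.
    print(small <= large)
    # Conditioning on the edge count recovers the G(n,m) process.
    print(large == gnm_from_labels(labels, len(large)))
```

With this coupling, G(n,p) is exactly the state of the G(n,m) process at the (random) time m equal to the number of labels exceeding 1-p.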
The decomposition (*) gives a relatively straightforward way to move from a result in G(n,m) to a result in G(n,p). By the Central Limit Theorem, the number of edges in G(n,p) is $\frac{1}{2}pn^2 + O(n)$ in the limit, and so if a result holds with high probability in G(n,m) for all m in some interval, say $m \in \left[\frac{1}{2}pn^2 - Kn, \frac{1}{2}pn^2 + Kn\right]$ for every fixed large K, then the law of total probability shows that the property holds with high probability in G(n,p).
In general, we get more interesting properties when p is a function of n. As discussed in previous posts, the scaling $p = \frac{\lambda}{n}$ is particularly worth studying. The CLT now shows that $G\left(n, \frac{\lambda}{n}\right)$ has $\frac{1}{2}\lambda n + O(\sqrt{n})$ edges in the limit. If you are confused why you can’t just substitute this value for p into the previous expression, note that p(1-p) does appear in the general asymptotic variance, but this gets absorbed into the “big O” notation when p is constant.
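To see the two scalings side by side, one can simply compute the mean and standard deviation of the binomial edge count. This is a minimal Python sketch with a hypothetical helper name; it uses only the exact binomial mean and variance, so the standard deviation should be of order n for constant p, and of order $\sqrt{n}$ when p is a constant multiple of 1/n.

```python
import math


def edge_count_mean_sd(n, p):
    """Exact mean and standard deviation of the edge count Bin(C(n,2), p)."""
    num_pairs = n * (n - 1) // 2
    mean = num_pairs * p
    sd = math.sqrt(num_pairs * p * (1 - p))
    return mean, sd


if __name__ == "__main__":
    lam = 2.0  # illustrative value only
    for n in (10**3, 10**4, 10**5):
        _, sd_dense = edge_count_mean_sd(n, 0.5)
        mean_sparse, sd_sparse = edge_count_mean_sd(n, lam / n)
        # sd / n stabilises for constant p; sd / sqrt(n) stabilises for p = lam / n.
        print(n, sd_dense / n, mean_sparse - lam * n / 2, sd_sparse / math.sqrt(n))
```

The printed ratios settle down as n grows, which is the p(1-p) factor in the variance changing the order of the fluctuations once p depends on n.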
More importantly, many properties that we might want to consider are not in general affected in the limit by adding or removing $O(\sqrt{n})$ edges. For example, with high probability, G(n,m) has largest component of size $\sim \zeta n$ whenever $m = \frac{1}{2}\lambda n + O(\sqrt{n})$ and $\lambda > 1$, for a constant $\zeta = \zeta(\lambda) \in (0,1)$. Some of this notation would need to be made a bit more precise in a formal argument, but for now, let’s take that as given. This then implies that with high probability, $G\left(n, \frac{\lambda}{n}\right)$ has largest component of size $\sim \zeta n$ also.
Of course, from the logical structure of this blog, this deduction is a bit bogus, because we have only just introduced G(n,m), and have no idea about the properties of its giant components yet. We seek instead an argument to deduce facts about G(n,m) from facts about G(n,p). Because G(n,m) cannot obviously be written as some conditioned combination of G(n,p)s, this instinctively seems harder. Bollobas gives various general conditions to carry results over between the two regimes in his Part III course notes, but I feel that an example would be the easiest way to explain the ideas.
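The size of the largest component is such an important quantity, we might as well consider that, in the subcritical case. Write $L_1(G)$ for the size of the largest component of a graph G. We work with $p = \frac{\lambda}{n}$, where $\lambda < 1$, for which we have the result:
$$\mathbb{P}\left(L_1\left(G\left(n, \tfrac{\lambda}{n}\right)\right) \geq \alpha \log n\right) = O(n^{-\delta})$$
for some $\delta > 0$, whenever $\alpha > \frac{1}{I_\lambda}$, with $I_\lambda$ the rate function at 1 of the total population size of a Poisson branching process. For now, that doesn’t matter too much, except that it is continuous as a function of $\lambda$. We want to show that $G(n,m)$ with $m = \lfloor \frac{1}{2}\lambda n \rfloor$ has the same property.
The trick is to consider $G\left(n, \frac{\lambda+\epsilon}{n}\right)$ instead, for some small $\epsilon > 0$. Let $A$ be the event described above, namely that the largest component has size at least $\alpha \log n$, and let $M \sim \mathrm{Bin}\left(\binom{n}{2}, \frac{\lambda+\epsilon}{n}\right)$ be the number of edges of this graph. By the law of total probability and the decomposition (*) mentioned above, we have:
$$\mathbb{P}\left(G\left(n, \tfrac{\lambda+\epsilon}{n}\right) \in A\right) = \sum_{k=0}^{\binom{n}{2}} \mathbb{P}(M = k)\, \mathbb{P}\left(G(n,k) \in A\right).$$
We are going to split this sum into
$$\sum_{k < m} \mathbb{P}(M = k)\, \mathbb{P}\left(G(n,k) \in A\right) + \sum_{k \geq m} \mathbb{P}(M = k)\, \mathbb{P}\left(G(n,k) \in A\right).$$
On the first of these sums, we use only that probabilities are non-negative, so it can be discarded in a lower bound, and on the second, we use that $\mathbb{P}\left(G(n,k) \in A\right)$ is an increasing function of k, since adding an edge cannot shrink the largest component. This property is special to the event we are considering. In general one might have to be a bit more clever, perhaps using continuity of $\mathbb{P}\left(G(n,k) \in A\right)$ in k, interpreting continuity in the limit with n. Anyway, this enables us to bound:
$$\mathbb{P}\left(G\left(n, \tfrac{\lambda+\epsilon}{n}\right) \in A\right) \geq \mathbb{P}(M \geq m)\, \mathbb{P}\left(G(n,m) \in A\right).$$
The mean of M exceeds m by roughly $\frac{1}{2}\epsilon n$, while its standard deviation is $O(\sqrt{n})$. So by the Central Limit Theorem, the probability $\mathbb{P}(M < m)$ attached to the first sum tends to 0, while the final term $\mathbb{P}(M \geq m)$ tends to 1. We therefore have:
$$\mathbb{P}\left(G(n,m) \in A\right) \leq (1 + o(1))\, \mathbb{P}\left(G\left(n, \tfrac{\lambda+\epsilon}{n}\right) \in A\right).$$
We demanded that $\alpha > \frac{1}{I_\lambda}$, and mentioned that this function was continuous, so since we have total freedom over $\epsilon$, in particular, we can choose $\epsilon$ such that $\lambda + \epsilon < 1$ and $\alpha > \frac{1}{I_{\lambda+\epsilon}}$. By the work on G(n,p), we have $\mathbb{P}\left(G\left(n, \tfrac{\lambda+\epsilon}{n}\right) \in A\right) = O(n^{-\delta})$ for some $\delta > 0$, and for large enough n, we have $\mathbb{P}(M \geq m) \geq \frac{1}{2}$, and so the result follows.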
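The two quantitative ingredients in this last step can be checked numerically. The sketch below assumes the standard closed form $I_\lambda = \lambda - 1 - \log\lambda$ for the rate function, which is not stated in this post, and writes $\epsilon$ for the small amount added to $\lambda$. It picks $\epsilon$ by repeated halving until $\lambda + \epsilon < 1$ and $\alpha I_{\lambda+\epsilon} > 1$, then uses a normal approximation (in the spirit of the CLT step, not an exact binomial tail) for the probability that a graph with edge probability $\frac{\lambda+\epsilon}{n}$ has at least $\lfloor \frac{1}{2}\lambda n \rfloor$ edges. All function names are hypothetical.

```python
import math


def rate(lam):
    """Assumed closed form of the rate function: I_lam = lam - 1 - log(lam)."""
    return lam - 1 - math.log(lam)


def choose_epsilon(lam, alpha):
    """Halve eps until lam + eps < 1 and alpha > 1 / I_{lam + eps}."""
    if not (0 < lam < 1 and alpha * rate(lam) > 1):
        raise ValueError("need 0 < lam < 1 and alpha > 1 / I_lam")
    eps = (1 - lam) / 2  # guarantees lam + eps < 1 from the start
    # Terminates because rate() is continuous and alpha * rate(lam) > 1 strictly.
    while alpha * rate(lam + eps) <= 1:
        eps /= 2
    return eps


def prob_enough_edges(n, lam, eps):
    """Normal approximation to P(M >= floor(lam * n / 2)),
    where M ~ Bin(C(n,2), (lam + eps) / n)."""
    num_pairs = n * (n - 1) // 2
    p = (lam + eps) / n
    mean = num_pairs * p
    sd = math.sqrt(num_pairs * p * (1 - p))
    m = math.floor(lam * n / 2)
    z = (m - mean) / sd
    # P(Z >= z) for a standard normal Z.
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    lam, alpha = 0.5, 6.0  # illustrative values only
    eps = choose_epsilon(lam, alpha)
    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, eps, prob_enough_edges(n, lam, eps))
```

The closer $\alpha$ is to $\frac{1}{I_\lambda}$, the smaller $\epsilon$ has to be, and the larger n must be before the edge-count probability is comfortably above $\frac{1}{2}$.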