I’ve spent this week in Luminy, near Marseille, attending a summer school run by ALEA, the organisation of French probabilists. We’ve been staying in CIRM, a dedicated maths research conference centre at the edges of the calanques, the area of mountains and jagged coastal inlets between Marseille and Cassis. The walking possibilities have been excellent, as have the courses and lectures, on a range of topics in probability theory.
Anyway, the time here has been an excellent moment to reflect on my research progress, and try to come up with the sort of fresh ideas that are perhaps slightly inhibited by sitting at a desk with an endless supply of paper on which to try calculations. When I get back, I have to submit a first-year report, so at least for a little while I will have to suppress the desire to make further progress and instead diligently assemble the progress I have made.
I’ve defined some of these processes in past posts, but I see no harm in doing so again. We take the standard Erdos-Renyi random graph process, where edges are added one-at-a-time uniformly at random between n vertices, and amend it by adding a deletion mechanism. The aim is to arrive at a process which looks in equilibrium more like the critical random graph than either the subcritical or supercritical regimes, where the components are very small, and dominated by one giant component respectively. Rath, Toth and others have studied the process where each vertex is hit by lightning at uniform rate. When this happens, we delete all the edges in the component containing that vertex. Naturally, big components will be hit by lightning more often than small components, and so this acts as a mechanism to prevent the formation of giant components, if scaled correctly.
We take a different approach. We observe that criticality in the original random graph process is denoted by the first appearance of a giant component, but also by the first appearance of a) lots of cycles, and b) large cycles. In particular, it is very unlikely that a giant component could form without containing any cycles. We will therefore use the appearance of a cycle to trigger some form of deletion mechanism.
Our final goal is to treat the so-called ‘Cycle Deletion’ model. Here, whenever a cycle appears, we delete all the edges in that cycle immediately. There are several challenges in treating this model, because the rate at which cycles emerge in a tree is a function of the tree structure. The trees in this model will not be Uniform Spanning Trees (though it is very possible that they will be ‘almost USTs’ in some sense – we need to investigate this further) so it will be hard to make nice statements about the rates. For the standard random graph process, if we are only interested in the sizes of the components, we are actually allowed to ignore the graph structure entirely. The component sizes evolve as a discrete, stochastic version of the multiplicative coalescent (sometimes called a Marcus-Lushnikov process). We would like a deletion mechanism that has a nice interpretation as a fragmentation operation in the same sense. The rate at which a component fragments will be quadratic in the size of the component, since there are possible edges between k vertices forming a component, and adding any of precisely these will create a cycle.
I’ve talked previously about how to overcome the problems with the tree structure in Cycle Deletion with the so-called Uniform Cycle Deleting model. In any case, as a starting point we might consider the Cycle-Induced Forest Fire model. Here, whenever a cycle appears, we delete all the edges, including the new one, in the whole component which contains the cycle.
We suspect this model may resemble the critical random graph at all times. The main characteristic of G(n,1/n) is that the largest component is of size O(n^2/3), and indeed there are arbitrarily many components of this size, with high probability in the limit. Since CIFF is recurrent for any fixed n, meaning that it will visit any state infinitely often (rather than tending to infinity or similar), we should ask what the largest component is typically in the equilibrium distribution. Our aim is to prove that it is O(n^2/3). We might suspect that the typical size of the largest component will be greater in the Cycle Deletion model, since each fragmentation event is less severe there, removing fewer edges.
An Upper Bound
The nice thing about Markov chains is that they have an ergodic property, which means that if you run them for long enough, the proportion of time spent in any state is given by the stationary probability of being in that state. It doesn’t matter whether or not you start in equilibrium, since it will converge anyway. Thus it is meaningful to talk about properties like the average number of isolated vertices as a time-average as well as an average with respect to some distribution.
This quantity is the key to an upper bound. We can equally talk about the average change in the number of isolated vertices in a time-step. This will increase when a component fragments, and will decrease when an isolated vertex coalesces with another component. In particular, the largest possible decrease in the number of isolated vertices in a single time-step is 2, corresponding to an edge appearing between two isolated vertices.
Suppose that with probability there is a component of size for some . Then such a component makes a contribution to the expected change in the number of isolated vertices of
Where does this come from? Well, we are tracking the contributions from the event that the largest component is of this size and that it fragments, giving new isolated vertices. So the accounts for the probability that there is such a component to begin with. Then, conditional on that, the probability that it gets fragmented in the next time-step is the probability that both ends of the next edge added lie in that component. Since the edge is chosen uniformly at random, the probability of this is . Note that this is under a slightly odd definition of an edge, that allows loops. Basically, I don’t want to have lots of correction terms involving floating around. However, it would make no difference to the orders of magnitude if we to do it with these.
So, this is only one contribution to the typical rate of gain of isolated vertices. Now note that if , then this expression is >> 1. This is bad since the negative contributions to this expected flux in the number of isolated vertices is O(1). So this suggests that over time, the number of isolated vertices will keep growing. This is obviously ridiculous since a) we are in equilibrium, so the expected flux should be 0 and b) the number of isolated vertices cannot exceed n, for clear reasons.
This gives us an upper bound of n^2/3 as the typical scale of the largest component. We can come up with a similar argument for the cycle deleting model. The most helpful thing to track there is the number of edges in the graph. Note that since the graph is at all times a forest on n vertices, the number of edges is equal to n minus the number of (tree) components. We use the fact that the typical fragmentation of a component of size k creates new components. It is possible to argue via isolated vertices here too, but the estimates are harder, or at least less present in the literature.
The problem with lower bounds is that it is entirely possible that the flux in the number of isolated vertices is not driven by typical behaviour. Suppose for example we had a different rule. We begin a random graph process, and the first time we see a cycle in a component with size larger than n^2/3, we delete all the edges in the whole graph. Then we will see a sequence of random graph processes starting with the empty graph and stopped at some point close to criticality (in fact, with high probability in the *critical window*), and these will all be glued together. So then, most of the time the process will look subcritical, but the gains in isolated vertices will occur only during the critical periods, which are only an asymptotically small proportion of the time.
At the moment, my approach to the lower bound is instead to prove that the upper bound is tight. I mean this in the following sense. Suppose we wanted to be sure that (*) was in fact equal to the average rate of gain of isolated vertices. We would have to check the following:
- That the total contributions from all other components were similar or smaller than from the component(s) of size roughly .
- That there were only a few components of size . In particular, the estimate would be wrong if there were such components for any .
- That it cannot be the case that for example, some small proportion of the time there is a component of size roughly , and over a large enough time these make a greater contribution to the average gain in isolated vertices.
A nice way to re-interpret this is to consider some special vertex and track the size of its component in time. It will be involved in repeated fragmentations over the course of time, so it is meaningful to talk about the distribution of the size of the component containing the vertex when it is fragmented. Our aim is to show that this distribution is concentrated on the scaling .
So this has turned out to be fairly hard. Rather than try to explain some of the ideas I’ve employed in attempting to overcome this, I will finish by giving one reason why it is hard.
We have seen that the component sizes in random graphs evolve as the multiplicative coalescent, but at a fixed moment in time, we can derive good estimates from an analogy with branching processes. We might like to do that here. If we know what the system looks like most of the time, we might try to ‘grow’ a multiplicative coalescent, viewing it like a branching process, with distribution given by the typical distribution. The problem is that when I do this, I find that the expectation of the offspring distribution is . This looks fine, since 1 is the threshold for extinction with probability 1. However, throughout the analysis, I have only been paying attention to the exponent of n in all the time and size estimates. For example, I view and as the same. This is a problem, as when I say the expectation is , I am really saying it is . This means it could be or . Of course, there is a massive difference between these, since a branching process grows expectationally!
So, this approach appears doomed in its current form. I have some other ideas, but a bit more background may be required before going into those. I’m going to be rather busy with teaching on my return to the office, so unfortunately it is possible that there may be many posts about second year probability and third year applied probability before anything more about CIFF.