Random Mappings for Cycle Deletion

In previous posts here and here, I’ve talked about attempts to describe a cycle deleting process. We amend the dynamics of the standard random graph process by demanding that whenever a cycle is formed in the graph we delete all the edges that lie on the cycle. The aim of this is to prevent the system growing giant components, and perhaps give a system that displays the characteristics of self-organised criticality. In the posts linked to, we discuss the difficulties caused by the fact that the tree structure of components in such a process is not necessarily uniform.

Today we look in the opposite direction. It gives a perfectly reasonable model to take a multiplicative coalescent with quadratic fragmentation (this corresponds to cycle deletion, since there are O(n^2) edges which would give a cycle if added to a tree on n vertices) and a fragmentation kernel corresponding to adding an extra edge to a uniform spanning tree on n vertices then deleting the edges of the unique cycle. The focus of the rest of this post, we consider this fragmentation mechanism, in particular thinking about how we would sample from it most practically. Not least, without going through Prufer codes or some other clever machinery, it is not trivial to sample a uniform spanning tree.

First, we count the number of unicyclic graphs on n labelled vertices. If we know that the vertices on the cycle are v_1,\ldots,v_k, then the number of cycles with an identified edge is

u_1=1,\quad u_k=\frac{k!}{2},\, k\ge 2.

If we know that the tree coming off the cycle from vertex v_i has size m, say, then each of the possible rooted labelled trees with size m is equally likely. So taking w_j=j^{j-1}, the number of rooted trees on j labelled vertices, we get B_n(u_\bullet,w_\bullet) for the number of such unicyclic graphs on [n]. Recall B_n is the nth Bell polynomial, which gives the size of a compound combinatorial structure, where we have some structure on blocks and some other structure within blocks. Then the random partition of [n] given by the tree sizes has the distribution \text{Gibbs}_n(u_\bullet,w_\bullet).

Consider now a related object, the so-called random mapping digraph. What follows is taken from Chapter 9 of Combinatorial Stochastic Processes. We can view any mapping M_n:[n]\rightarrow[n] as a digraph where every vertex has out-degree 1. Each such digraph contains a collection of directed cycles, supported on those elements x for which M_n^k(x)=x for some k. Such an element x is called a cyclic point. Each cyclic point can be viewed as the root of a labelled tree.

In an identical manner to the unicyclic graph, the sizes of these directed trees in the digraph decomposition of a uniform random mapping is distributed as \text{Gibbs}_n(\bullet !,w_\bullet). So this is exactly the same as the cycle deletion kernel, apart from in the probability that the partition has precisely one block. In practice, for large n, the probability of this event is very small in both cases. And if we wanted to sample the cycle deletion kernel exactly, we could choose the trivial partition with some probability p, and otherwise sample from the random mapping kernel, where p is chosen such that

p+\frac{1-p}{B_n(\bullet !, w_\bullet)}=\frac{1}{B_n(u_\bullet,w_\bullet)}.

At least we know from the initial definition of a random mapping, that B_n(\bullet !,w_\bullet)=n^n. The number of unicyclic graphs with an identified edge is less clear. It turns out that the partition induced by the random mapping has a nice limit, after rescaling, as the lengths of excursions away from 0 in the standard Brownian bridge on [0,1].

The time for a fuller discussion of this sort of phenomenon is in the context of Poisson-Dirichlet distributions, as the above exchangeable partition turns out to be PD(1/2,1/2). However, for now we remark that the jumps of a subordinator give a partition after rescaling. The case of a stable subordinator is particularly convenient, as calculations are made easier by the Levy-Khintchine formula.

A notable example is the stable-1/2 subordinator, which can be realised as the inverse of the local time process at zero of a Brownian motion. The jumps of this process are then the excursion lengths of the original Brownian motion. A calculation involving the tail of the w_j’s indicates that 1/2 is the correct parameter for a subordinator to describe the random mappings. Note that the number of blocks in the partition corresponds to the local time at zero of the Brownian motion. (This is certainly not obvious, but it should at least be intuitively clear why a larger local time roughly indicates more excursions which indicates more blocks.)

So it turns out, after checking some of the technicalities, that it will suffice to show that the rescaled number of blocks in the random mapping partition \frac{|\Pi_n|}{\sqrt{n}} converges to the Raleigh density, which is a size-biased Normal random variable (hence effectively first conditioned to be positive), and which also is the distribution of the local time of the standard Brownian bridge.

After that very approximate description, we conclude by showing that the distribution of the number of blocks does indeed converge as we require. Recall Cayley’s formula kn^{n-k-1} for the number of labelled forests on [n] with a specified set of k roots. We also need to know how many labelled forests there are with any set of roots. Suppose we introduce an extra vertex, labelled 0, and connect it only to the roots of a rooted labelled forest on [n]. This gives a bijection between unlabelled trees on {0,1,…,n} and labelled forests with a specified set of roots on [n]. So we can use Cayley’s original formula to conclude there are (n+1)^{n-1} such forests. We can do a quick sanity check that these are the same, which is equivalent to showing

\sum_{k=1}^n k n^{-k-1}\binom{n}{k}=\frac{1}{n}(1+\frac{1}{n})^{n-1}.

This odd way of writing it is well-motivated. The form of the LHS is reminiscent of a generating function, and the additional k suggests taking a derivative. Indeed, the LHS is the derivative

\frac{d}{dx}(1+x)^n,

evaluated at \frac{1}{n}. This is clearly the same as the RHS.

That said, having established that the random mapping partition is essentially the same, it is computationally more convenient to consider that instead. By the digraph analogy, we again need to count forests with k roots on n vertices, and multiply by the number of permutations of the roots. This gives:

\mathbb{P}(|\Pi_n|=k)=\frac{kn^{n-k-1}\cdot k! \binom{n}{k}}{n^n}=\frac{k}{n}\prod_{i=1}^{k-1}\left(1-\frac{i}{n}\right).

Now we can consider the limit. Being a bit casual with notation, we get:

\lim \mathbb{P}(\frac{|\Pi_n|}{\sqrt{n}}\in dl)\approx \sqrt{n}dl \mathbb{P}(|\Pi_n|=l\sqrt{n}).

Since the Raleigh distribution has density l\exp(-\frac12 l^2)dl, it suffices for this informal verification to check that

\prod_{i=1}^{l\sqrt{n}}(1-\frac{i}{n})\approx \exp(-\frac12 l^2). (*)

We take logs, so the LHS becomes:

\log(1-\frac{1}{n})+\log(1-\frac{2}{n})+\ldots+\log(1-\frac{l\sqrt{n}}{n}).

If we view this as a function of l and differentiate, we get

d(LHS)=\sqrt{n}dl \log (1-\frac{l}{\sqrt{n}})\approx \sqrt{n}dl \left[-\frac{l}{\sqrt{n}}-\frac{l^2}{2n}\right]\approx -ldl.

When l is zero, the LHS should be zero, so we can obtain the desired result (*) by integrating then taking an exponential.