Persistent Hubs

This post is based on the paper “Existence of a persistent hub in the convex preferential attachment model” which appeared on arXiv last week. It can be found here. My aim is to explain (again) the model; the application-based motivation for the result; and a heuristic for the result in a special case. In particular, I want to stress the relationship between PA models and urns.

The preferential attachment model attempts to describe the formation of large complex networks. It is constructed dynamically: vertices are introduced one at a time. For each new vertex, we have to choose which existing vertices to join to. This choice is random and reinforced. That is, the new vertex is more likely to join to an existing vertex with high degree than to an existing vertex with degree 1. It’s clear why this might correspond well to the evolution of, say, the world wide web. New webpages are much more likely to link to an established site, eg Wikipedia or Google, than to a uniformly randomly chosen page.

The model is motivated also by a desire to fit a common property of real-world networks that is not exhibited by, among others, the Erdos-Renyi random graph model. In such a network, we expect a few nodes to have much greater connectivity than average. In a sense these so-called hubs drive connectivity of the system. This makes sense in practice. If you are travelling by train around the South-East of England, it is very likely you will pass through at least one of Reading, East Croydon, or about five major terminus in London. It would be absurd for every station to be of equal significance to the network. By contrast, the typical vertex degree in the sparse Erdos-Renyi model is O(1), and has a limiting Poisson distribution, with a super-exponential tail.

So, this paper addresses the following question. We know that the PA model, when set up right, has power-law tails for the degree distribution, and so has a largest degree that is an order of magnitude larger than the average degree. Let’s call this the ‘hub’ for now. But the model is dynamic, so we should ask how this hub changes in time as we add extra vertices. In particular, is it the case that one vertex should grow so large that it remains as the dominant hub forever? This paper answers this question in the affirmative, for a certain class of preferential attachment schemes.

We assign a weighting system to possible degrees, that is a function from N to R+. In the case of proportional PA, this function could be f(n)=n. In general, we assume it is convex. Note that the more convex this weight function is, the stronger the preference a new vertex feels towards existing dominant vertices. Part of the author’s proof is a formalisation of this heuristic, which provides some machinery allowing us to treat only really the case f(n)=n. I will discuss only this case from now on.

I want to focus on the fact that we have another model which describes aspects of the degree evolution rather well. We consider some finite fixed collection of vertices at some time, and consider the evolution of their degrees. We will be interested in limiting properties, so the exact time doesn’t matter too much. We look instead at the jump chain, ie those times when one of the degrees changes. This happens when a new vertex joins to one of the chosen vertices. Given that the new vertex has joined one of the chosen vertices, the choice of which of the chosen vertices is still size-biased proportional to the current degrees. In other words, the jump chain of this degree sequence is precisely Polya’s Urn.

This is a powerful observation, as it allows us to make comments about the limiting behaviour of finite quantities almost instantly. In particular, we know that from any starting arrangement, Polya’s Urn converges almost surely. This is useful to the question of persistence for the following reason.

Recall that in the case of two colours, starting with one of each, we converge to the uniform distribution. We should view this as a special case of the Dirichlet distribution, which is supported on partitions into k intervals of [0,1]. In particular, for any fixed k, the probability that two of the intervals have the same size is 0, as the distribution is continuous. So, since the convergence of the proportions in Polya’s Urn is almost sure, with probability one all of the proportions are with \epsilon>0 of their limit, and so taking epsilon small enough, given the limit, which we are allowed to do, we can show that the colour which is largest in the limit is eventually the largest at finite times.

Unfortunately, we can’t mesh these together these finite-dimensional observations particularly nicely. What we require instead is a result showing that if a vertex has large enough degree, then it can never be overtaken by any new vertex. This proved via a direct calculation of the probability that a new vertex ‘catches up’ with a pre-existing vertex of some specified size.

That calculation is nice and not too complicated, but has slightly too many stages and factorial approximations to consider reproducing or summarising here. Instead, I offer the following heuristic for a bound on the probability that a new vertex will catch up with a pre-existing vertex of degree k. Let’s root ourselves in the urn interpretation for convenience.

If the initial configuration is (k,1), corresponding to k red balls and 1 blue, we should consider instead the proportion of red balls, which is k/k+1 obviously. Crucially (for proving convergence results if nothing else), this is a martingale, which is clearly bounded within [0,1]. So the expectation of the limiting proportion is also k/k+1. Let us consider the stopping time T at which the number of red balls is equal to the number of blue balls. We decompose the expectation by conditioning on whether T is finite.

\mathbb{E}X_\infty=\mathbb{E}[X_\infty|T<\infty]\mathbb{P}(T<\infty)+\mathbb{E}[X_\infty|T=\infty]\mathbb{P}(T=\infty)

\leq \mathbb{E}[X_\infty | X_T,T<\infty]\mathbb{P}(T<\infty)+(1-\mathbb{P}(T=\infty))

using that X_\infty\leq 1, regardless of the conditioning,

= \frac12 \mathbb{P}(T<\infty) + (1-\mathbb{P}(T<\infty))

\mathbb{P}(T<\infty) \leq \frac{2}{k+1}.

We really want this to be finite when we sum over k so we can use some kind of Borel-Cantelli argument. Indeed, Galashin gets a bound of O(k^{-3/2}) for this quantity. We should stress where we have lost information. We have made the estimate \mathbb{E}[X_\infty|T=\infty]=1 which is very weak. This is unsurprising. After all, the probability of this event is large, and shouldn’t really affect the limit that much when it does not happen. The conditioned process is repelled from 1/2, but that is of little relevance when starting from k/k+1. It seems likely this expectation is in fact \frac{k}{k+1}+O(k^{-3/2}), from which the result will follow.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s