I’m back in Rio, this time for the Brazilian Probability School, which this year is being held in parallel with the Brazilian Mathematical Colloquium, so there are lots of possible lectures to attend across a wide range of topics. I’ve been paying particular attention to a course by Veronique Gayrard concerning the phenomenon of aging, as seen in various spin-glass and trap models. [Lecture notes exist, but haven’t yet been put online.]
I want to write something about the setup for one of these models. It took me quite a long time to settle on a title for this post, and as you can see I’ve hedged. At least in this post, I’m not so interested in the model (and don’t want to try and offer a physical motivation at this point) but rather in talking about the natural model-independent problem it reduces to.
Motivation
Let $X_1, X_2, \ldots$ be IID random variables which take some fixed value $K>0$ with probability $1/K$, and otherwise take the value zero. The law of large numbers says that for large $m$, the rescaled partial sum process concentrates around the mean of the underlying distribution, that is

$\frac{1}{m}\left(X_1+\ldots+X_m\right)\approx \mathbb{E}[X_1]=1.$

The weak LLN makes this precise in the sense of convergence in distribution, and the strong LLN gives almost sure convergence.
But the speed of convergence is obviously not uniform over all distributions of the underlying IID random variables. This is particularly clear in the setup I’ve outlined, in the regime where $K\gg m$. Certainly if $m=o(K)$, then we have $\frac{1}{m}\left(X_1+\ldots+X_m\right)=0$ with probability $\left(1-\frac{1}{K}\right)^m\approx 1-\frac{m}{K}$, and otherwise $\frac{1}{m}\left(X_1+\ldots+X_m\right)\geq \frac{K}{m}\gg 1$. So if we let $K$ and $m$ diverge together with scaling as given, the only version of a LLN we can write down is

$\frac{1}{m}\left(X_1+\ldots+X_m\right)\stackrel{d}{\longrightarrow} 0,$

which is obviously different to the original version for fixed $K$ and diverging $m$.
If we take $K=\Theta(m)$, then the rescaled partial sum process converges in distribution to a scaled Poisson process. Of course, the Poisson process obeys its own law of large numbers (or law of large times), but on this scale the first-order behaviour is random.
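To see these two regimes side by side, here’s a minimal simulation sketch (my own illustration, not from the lectures; the function name and the specific choice $K=m$ are just for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

def rescaled_sum(m, K):
    """S_m / m, where the X_i take the value K with probability 1/K and 0 otherwise."""
    X = K * (rng.random(m) < 1.0 / K)
    return X.sum() / m

# Fixed K, large m: close to E[X_1] = 1, as the LLN predicts.
print(rescaled_sum(m=10**6, K=10))

# K = m diverging together: the value is (K/m) * (#nonzero summands),
# i.e. approximately Poisson(1) -- random even at first order.
print([rescaled_sum(m=10**4, K=10**4) for _ in range(8)])
```

With $K$ fixed the first print hugs 1, while with $K=m$ the second print is a list of small non-negative integers with Poisson(1)-type fluctuations, matching the heuristic above.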
At a more general level, what we are doing in the previous examples is looking at a process which converges to equilibrium, but studying it on a faster timescale than the timescale of this convergence. The REM-like trap model, which will be the eventual focus of this post, does exactly this to a continuous-time Markov chain, with the additional feature that the holding times are random and heavy-tailed.
The mean-field REM-like trap model
This REM-like trap model is defined as follows. We have $N$ sites, and for these sites we sample an IID collection of depths $\tau_1,\ldots,\tau_N$ according to some distribution. We then choose a sequence of IID uniform samples from $\{1,\ldots,N\}$, labelled $Y_1,Y_2,\ldots$. We think of this as recording an itinerary of visits to the sites, where the $j$th site we visit is $Y_j$. (Though notice that under this definition, it’s possible that the $j$th site we visit and the $(j+1)$st site we visit are the same.) We wait at each site for an exponential holding time, with parameter $\tau_j^{-1}$ (and so mean $\tau_j$) if we are at site $j$, and these holding times are independent of the other holding times, and independent of the trajectory, all conditional on $\tau=(\tau_1,\ldots,\tau_N)$.

You can think of this as a continuous-time RW on the complete graph (with self-loops), where the jump chain is uniform, and the holding rates are given by $\tau_1^{-1},\ldots,\tau_N^{-1}$. This explains the notation, and how you’d construct a similar model on a different underlying graph.
The general idea of a trap model is a random walk with very inhomogeneous speed, for example because some holding times have very large expectation. In a setting with more inbuilt geometry, for example on a lattice, we can imagine the RW getting trapped in regions associated with atypically low speeds. We might therefore think of a site with very long holding times as being deep, in the sense that the chain might get stuck there.
This will be most interesting if we allow an extreme range of values taken by $\tau$, and so the best choice is a distribution in the domain of attraction of an $\alpha$-stable law with parameter $\alpha\in(0,1)$. That is,

$\mathbb{P}(\tau_1\geq u)=u^{-\alpha}L(u),$

where $L$ is a slowly-varying function at $\infty$.
This distribution has infinite mean, and so we couldn’t apply either LLN to a sequence of IID copies of $\tau_1$. However, obviously the sequence $(\tau_1,\ldots,\tau_N)$ almost surely does have finite mean, since each entry is finite! So for each $N$, the trap model will have a LLN on large timescales, but we will investigate at faster timescales.
The clock process
At least for the purpose of this post, we will focus on the clock process, which records the (continuous) time which elapses before we arrive at the *k*th state of the jump chain.
That is,

$S_N(k)=\sum_{j=1}^{k-1}T_j,\qquad T_j\sim\mathrm{Exp}\left(\tau_{Y_j}^{-1}\right),$

where the exponential random variables $T_j$ are independent except through their parameters. This can be made even more clear if we take advantage of the standard method of writing a general exponential distribution as a multiple of an exponential distribution with parameter 1. Let $e_1,e_2,\ldots$ be IID $\mathrm{Exp}(1)$ random variables, independent of $\tau=(\tau_1,\ldots,\tau_N)$ and the jump chain. Then

$S_N(k)=\sum_{j=1}^{k-1}\tau_{Y_j}e_j.$
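Since everything above is just sampling and summing, here is a minimal simulation sketch of the clock process (my own illustration, assuming pure Pareto depths $\mathbb{P}(\tau_1\geq u)=u^{-\alpha}$ for $u\geq 1$, a convenient member of the domain of attraction described earlier):

```python
import numpy as np

rng = np.random.default_rng(1)

def clock_process(N, k, alpha=0.5):
    """Partial sums sum_{j<=i} tau_{Y_j} e_j for i = 1..k, i.e. the elapsed time after
    each completed holding time, for one realisation of the REM-like trap model."""
    tau = rng.random(N) ** (-1.0 / alpha)    # environment: IID Pareto(alpha) depths
    Y = rng.integers(0, N, size=k)           # jump chain: IID uniform sites (self-loops allowed)
    e = rng.exponential(1.0, size=k)         # IID Exp(1) multipliers
    return np.cumsum(tau[Y] * e)

S = clock_process(N=10**5, k=10**3)
print(S[-1], np.max(np.diff(S, prepend=0.0)))   # total elapsed time vs largest single holding time
```

Even for modest $N$ and $k$, the largest single holding time typically accounts for a sizeable fraction of the total, which is exactly the heavy-tailed behaviour the next section quantifies.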
Let’s briefly pause to apply the LLN to $S_N(k)$ for fixed $N$. It matters whether we consider the quenched or annealed settings here. As usual, quenched means we fix a realisation of the random environment, and draw all conclusions in terms of that environment (think of conditional expectations). And annealed means that we also include the randomness of the environment. This is notationally annoying, so as a shorthand we write $\mathbb{E}_\tau$ for quenched expectations, that is expectations conditional on the environment $\tau$, and $\mathbb{E}$ for an expectation over all randomness.
Then the quenched rate of growth of $S_N(k)$ is given by

$\frac{1}{k}\,\mathbb{E}_\tau\left[S_N(k)\right]\approx \frac{1}{N}\sum_{i=1}^N \tau_i,$

and so the annealed rate is

$\frac{1}{k}\,\mathbb{E}\left[S_N(k)\right]=\infty,$

since $\mathbb{E}[\tau_1]=\infty$. But as in the introduction, these rates are only relevant to laws of large numbers when $k$ grows on a large enough timescale, and we will consider smaller scales of $k$.
Timescales of the clock process
We’re going to look for scaling limits of the clock process. The increments are ‘sort of IID’ and ‘sort of heavy-tailed’ (we’ll clarify these sort ofs when we need to) so it wouldn’t be surprising if the scaling limits are Levy processes. The clock process is increasing, so in fact the scaling limits should be subordinators, and it wouldn’t be surprising if under some circumstances they turned out to be stable subordinators.
There is flexibility about how to do the rescaling. From now on, we are working in a regime where $N\to\infty$. Let’s assume we look at $r_N$ steps of the jump chain, where $(r_N)$ is some divergent sequence. A property of large sums of IID random variables in the domain of attraction of a stable law with parameter $\alpha\in(0,1)$ is that the scaling of the value of the sum is comparable to the scale of the largest summand. That is, the partial sum is dominated by its largest summands. Compare with the standard case of non-negative RVs with finite mean, where for $k$ summands, the sum is $\Theta(k)$, while the largest summand is $o(k)$.
So to identify the scale of the clock process after $r_N$ steps of the jump chain, it’s sufficient to identify the scale of its expected largest holding time. All of this is vague at the level of constants, so we choose a divergent sequence $(g_N)$ for which

$r_N\,\mathbb{P}(\tau_1\geq g_N)\longrightarrow 1.$

Note 1: this means that the number of holding times among the first $r_N$ which are at least $g_N$ is binomial with $\Theta(1)$ expectation. The fact that such a binomial distribution is well-approximated by a Poisson distribution will be relevant shortly.

Note 2: because we already insisted that $\tau_1$ had a slowly-varying tail, this gives control of the number of holding times which are at least $2g_N$, $3g_N$ etc as well.
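To make the choice of $g_N$ concrete, here is a small check (again my own sketch, with pure Pareto depths, for which $r_N\,\mathbb{P}(\tau_1\geq g_N)=1$ is solved exactly by $g_N=r_N^{1/\alpha}$):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, r_N = 0.5, 10**4
g_N = r_N ** (1.0 / alpha)        # solves r_N * P(tau >= g_N) = 1 for pure Pareto(alpha)

# On intermediate timescales sites are rarely revisited, so the depths seen
# during the first r_N steps behave like r_N fresh IID samples.
tau_seen = rng.random(r_N) ** (-1.0 / alpha)

print(np.sum(tau_seen >= g_N))          # number of 'deep' holding times: ~ Poisson(1)
print(tau_seen.max() / tau_seen.sum())  # the largest summand is a macroscopic fraction of the sum
print(tau_seen.sum() / g_N)             # and the whole sum lives on the scale g_N
```

All three printed quantities stay of order one (though random) as $r_N$ grows, in line with Note 1 and with the largest-summand heuristic above.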
We expect that $S_N(r_N)=\Theta(g_N)$, and so we consider scaling limits of the process

$\tilde S_N(t):=\frac{S_N\left(\lfloor t r_N\rfloor\right)}{g_N},\qquad t\geq 0,$

as usual. [Note I am using the opposite convention to VG’s notes, where ~ denotes the unrescaled clock process.]
Scaling limits
We identify two types of scaling limit, depending on whether $r_N=o(N)$ or $N=O(r_N)$. The former is called an intermediate timescale, while the latter is an extreme timescale. After this long motivation and notational preliminary section, my goal is to explain (partly to myself) why these scaling limits are different.
First, we state the result for intermediate timescales. Let $(V_\alpha(t),\,t\geq 0)$ be the stable subordinator with parameter $\alpha$, that is with Levy measure $\frac{\alpha\,\mathrm{d}x}{x^{1+\alpha}}$ on $(0,\infty)$. Then

$\tilde S_N\stackrel{d}{\longrightarrow} V_\alpha,$

in the Skorohod topology. We need to be clear about the sense of convergence, and the role of the random environment. It turns out that if in addition
$N/r_N$ grows fast enough (polynomially in $N$, say), then this convergence holds for almost all realisations of the random environment. That is, the laws of the processes (with respect to the randomness of the jump chain / holding times etc) converge. When $r_N$ is only $o(N)$, then the convergence holds in probability with respect to the environment. It took me a while to parse what this means. It means that for large $N$, the probability that the random environment induces a law of $\tilde S_N$ which is far from the law of $V_\alpha$ tends to zero.
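Here is a rough numerical illustration of this statement (my own sketch, not from the notes: pure Pareto depths, and the concrete choice $r_N=\lfloor\sqrt N\rfloor$, which sits comfortably inside the intermediate regime):

```python
import numpy as np

rng = np.random.default_rng(5)
alpha = 0.5

def rescaled_clock_at_1(N):
    """One sample of tilde-S_N(1) = S_N(r_N) / g_N, with r_N = sqrt(N) and Pareto(alpha) depths."""
    r_N = int(N ** 0.5)
    g_N = r_N ** (1.0 / alpha)
    tau = rng.random(N) ** (-1.0 / alpha)   # environment
    Y = rng.integers(0, N, size=r_N)        # jump chain
    e = rng.exponential(1.0, size=r_N)      # Exp(1) multipliers
    return (tau[Y] * e).sum() / g_N

# The empirical median of tilde-S_N(1) should settle down as N grows
# (medians, not means: the limit V_alpha(1) has infinite mean).
for N in [10**4, 10**5, 10**6]:
    print(N, np.median([rescaled_clock_at_1(N) for _ in range(100)]))
```

Of course this only probes the one-dimensional marginal at $t=1$; the theorem is a statement about the whole path in the Skorohod topology.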
The exact Levy triple of the limit process is not the important message here, and if that’s unfamiliar, then it isn’t a problem. The point is that you would also get this limiting Levy process if you took the sum process of genuinely IID random variables with the same $\alpha$-tail. And this is not surprising. Recall that on an intermediate timescale $r_N=o(N)$, so during the first $r_N$ steps of the jump chain, we do not typically visit many sites more than once. Indeed, if $r_N=o(\sqrt N)$, then this is the birthday problem, and we typically visit no site more than once. However, even in the weaker setting $r_N=o(N)$, look at the deepest 1000 sites we visit during the first $r_N$ steps. We can compute that, in expectation, we visit essentially zero of these more than once. But these 1000 sites dominate the clock process on this timescale. So from the point of view of the clock process, since we hardly ever visit relevant sites twice, the depths $\tau_{Y_1},\ldots,\tau_{Y_{r_N}}$ are essentially independent, and so it’s unsurprising that we get the scaling limit corresponding to IID partial sums.
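The ‘essentially zero repeat visits’ claim is easy to sanity-check numerically (a sketch under the same illustrative assumptions as before):

```python
import numpy as np

rng = np.random.default_rng(3)

def repeat_visits(N, r_N):
    """Number of sites visited more than once in r_N IID uniform steps on N sites."""
    Y = rng.integers(0, N, size=r_N)
    _, counts = np.unique(Y, return_counts=True)
    return np.sum(counts >= 2)

print(repeat_visits(N=10**6, r_N=10**2))   # r_N = o(sqrt(N)): birthday problem, essentially 0
print(repeat_visits(N=10**6, r_N=10**4))   # r_N = o(N): about r_N^2 / (2N) = 50, a tiny
                                           # fraction of the ~r_N distinct sites visited
```

So even when collisions do start to appear, the chance that any particular deep site is among them vanishes as $r_N/N\to 0$, which is all the argument above needs.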
For extreme timescales, by contrast, this fails. If we take $r_N=1000N$, we expect to visit each site roughly 1000 times, indeed the number of visits to a given site will be approximately $\mathrm{Poisson}(1000)$. But it’s still the case that the scaling limit will be dominated by the deepest sites. In particular, at some point on this timescale we will visit the deepest site, and indeed we will visit it multiple times if we look at $\tilde S_N(t)$ for large $t$. So the jumps of any scaling limit are not independent any more, unless we condition on all the depths $(\tau_1,\ldots,\tau_N)$.
However, all is not lost, since we can show that the point process of rescaled depths converges to a Poisson random measure on $(0,\infty)$, with intensity proportional to $\frac{\mathrm{d}x}{x^{1+\alpha}}$. The candidate for the scaling limit of the clock process is then the subordinator whose Levy measure is this Poisson random measure. This isn’t itself a Levy process, but it is a mixture of Levy processes, reflecting that on extreme timescales the quenched and annealed viewpoints are different since there is enough time to visit the whole landscape.
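A sketch of this picture (same illustrative Pareto assumption, and taking $r_N=N$ so that $g_N=N^{1/\alpha}$): the rescaled landscape $\{\tau_i/g_N\}$ is the random collection of atoms that plays the role of the Levy measure.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, N = 0.5, 10**6
g_N = N ** (1.0 / alpha)                   # extreme timescale: take r_N = N

tau = rng.random(N) ** (-1.0 / alpha)      # one realisation of the environment
atoms = np.sort(tau / g_N)[::-1]           # rescaled depths, largest first

# #{i : tau_i / g_N >= x} is Binomial(N, (x g_N)^{-alpha}) ~ Poisson(x^{-alpha}),
# so the top atoms form a genuinely random point process:
print(atoms[:5])
```

Rerunning with a different seed gives a visibly different list of top atoms: unlike on intermediate timescales, the limit object remembers the environment.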
Heuristically, the extreme timescale is the entry point for convergence to equilibrium. Indeed, taking $t\to\infty$ on this timescale, the number of visits to each of the 1000 top sites converges to its expectation, corresponding to convergence of the clock process to equilibrium, since these holding times continue to dominate the sum. The clock process therefore starts to feel the finiteness of the state space, which introduces dependence between the most relevant holding times, which was not the case on intermediate timescales.
In the next post, I’m going to try and summarise VG’s descriptions of taking this model beyond the mean-field setting, where the range of possibilities becomes much, much richer. I’m also going to try and say something about glassy dynamics and ageing, and why the physical motivation justifies considering these particular models and scalings.