Eventually Almost Everywhere

A blog about probability and olympiads by Dominic Yeo


Trap models and laws of not-so-large numbers

Posted on August 4, 2017 by dominicyeo

I’m back in Rio, this time for the Brazilian Probability School, which this year is being held in parallel with the Brazilian Mathematical Colloquium, so there are a lot of possible lectures to attend across a wide range of topics. I’ve been paying particular attention to a course by Veronique Gayrard concerning the phenomenon of aging, as seen in various spin-glass and trap models. [Lecture notes exist, but haven’t yet been put online.]

I want to write something about the setup for one of these models. It took me quite a long time to settle on a title for this post, and as you can see I’ve hedged. At least in this post, I’m not so interested in the model (and don’t want to try and offer a physical motivation at this point) but rather in talking about the natural model-independent problem it reduces to.

Motivation

Let X_1,X_2,X_3,\ldots be IID random variables which take some fixed value K>0 with probability 1/K, and otherwise take the value zero, so that \mathbb{E}[X_1]=1. The law of large numbers says that for large m, the rescaled partial sum process \frac{1}{m}(X_1+\ldots+X_m) \approx 1. The weak LLN makes this precise in the sense of convergence in probability, and the strong LLN gives almost sure convergence.
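
As a quick illustration (my own sketch, not from the course), here is a simulation of this LLN for the two-point distribution above, with K fixed and m growing; the sample mean settles near 1 once m is much larger than K.

import numpy as np

rng = np.random.default_rng(0)
K = 100
for m in [100, 10_000, 1_000_000]:
    X = K * (rng.random(m) < 1.0 / K)   # IID copies of X: value K with probability 1/K, else 0
    print(m, X.mean())                  # approaches E[X] = 1 once m >> K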

But the speed of convergence is obviously not uniform over all distributions of the underlying IID random variables. This is particularly clear in the setup I’ve outlined, in the regime where K\rightarrow\infty. Certainly if 1\ll m\ll K, then we have \frac{1}{m}(X_1+\ldots+X_m) \ge \frac{K}{m} with probability \approx \frac{m}{K} and otherwise \frac{1}{m}(X_1+\ldots+X_m) = 0. So if we let K and m diverge together with scaling as given, the only version of a LLN we can write down is

\frac{1}{m}(X_1+\ldots+X_m)\stackrel{d}\rightarrow 0,

which is obviously different to the original version for fixed K and diverging m.

If we take m=\Theta(K), then the rescaled partial sum process converges in distribution to a scaled Poisson process. Of course, the Poisson process obeys its own law of large numbers (or law of large times), but on this scale the first-order behaviour is random.
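
Here is a sketch of the m=\Theta(K) regime (again my own illustration, with m=3K chosen arbitrarily): the rescaled sum equals (K/m) times the number of non-zero summands, and that count is Binomial(m,1/K), which is close to Poisson(3).

import math
import numpy as np

rng = np.random.default_rng(1)
K, m, reps = 1_000, 3_000, 50_000
counts = rng.binomial(m, 1.0 / K, size=reps)   # number of non-zero summands in each run
lam = m / K
for n in range(8):
    empirical = (counts == n).mean()
    poisson_pmf = math.exp(-lam) * lam**n / math.factorial(n)
    print(n, round(empirical, 4), round(poisson_pmf, 4))   # the two columns agree closely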

At a more general level, what we are doing in the previous examples is looking at a process which converges to equilibrium, but studying it on a faster timescale than the timescale of this convergence. The REM-like trap model, which will be the eventual focus of this post, does exactly this to a continuous-time Markov chain, with the additional feature that the mean holding times are random and heavy-tailed.

The mean-field REM-like trap model

This REM-like trap model is defined as follows. We have N sites, and for these sites we sample an IID collection of mean holding times, or depths, (\tau_N(1),\ldots,\tau_N(N)) according to some distribution. We then choose a sequence of IID uniform samples from {1,…,N}, labelled (J_N(1),J_N(2),\ldots). We think of this as recording an itinerary of visits to the sites, where the jth site we visit is J_N(j). (Though notice that under this definition, it’s possible that the jth site we visit and the (j+1)st site we visit are the same.) We wait at each site for an exponential holding time, with mean \tau_N(x) (that is, with rate 1/\tau_N(x)) when we are at site x, and these holding times are independent of one another and of the trajectory, all conditional on (\tau_N(1),\ldots,\tau_N(N)).

You can think of this as a continuous-time RW on the complete graph K_N (with self-loops), where the jump chain is uniform, and the mean holding times are given by (\tau_N(1),\ldots,\tau_N(N)). This explains the notation, and how you’d construct a similar model on a different underlying graph.

The general idea of a trap model is a random walk with very inhomogeneous speed, for example because some holding times have very large expectation. In a setting with more inbuilt geometry, for example on a lattice, we can imagine the RW getting trapped in regions associated with atypically low speeds. We might therefore think of a site with very long holding times as being deep, in the sense that the chain might get stuck there.

This will be most interesting if we allow an extreme range of values taken by \tau_N, and so the best choice is a distribution in the domain of attraction of an \alpha-stable law with parameter \alpha\in(0,1). That is \mathbb{P}(\tau \ge u) = u^{-\alpha}L(u), where L is a slowly-varying function at +\infty.
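
In the code sketches below I will take the simplest representative of this class, namely L \equiv 1 and \mathbb{P}(\tau \ge u) = u^{-\alpha} for u\ge 1 (a pure Pareto tail), which can be sampled by inverse transform as \tau = U^{-1/\alpha} for U uniform on (0,1]. This choice, and the code itself, is mine, purely for illustration.

import numpy as np

def sample_depths(N, alpha, rng):
    """IID depths with P(tau >= u) = u**(-alpha) for u >= 1, via inverse transform."""
    return (1.0 - rng.random(N)) ** (-1.0 / alpha)

rng = np.random.default_rng(2)
tau = sample_depths(1_000_000, alpha=0.5, rng=rng)
print(tau.mean(), tau.max())   # the sample mean is finite but wildly unstable; the maximum dominates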

This distribution has infinite mean, and so we couldn’t apply either LLN to a sequence of copies of \tau. However, obviously the sample (\tau_N(1),\ldots,\tau_N(N)) almost surely has finite empirical mean, since each entry is finite! So for each N, the trap model will have a LLN on large timescales, but we will investigate faster timescales.

The clock process

At least for the purpose of this post, we will focus on the clock process, which records the (continuous) time which elapses before the jump chain leaves its kth state.

That is,

S_N(0)=0, \quad S_N(k) = \sum_{i=1}^k \mathrm{Exp}\left(1/\tau_N(J_N(i))\right),

where the exponential random variables are independent except through their parameters. This can be made even clearer if we use the fact that a general exponential random variable can be written as a multiple of an exponential random variable with parameter 1. Let e_1,e_2,\ldots be IID rate-1 exponential RVs, independent of (\tau_N(1),\ldots,\tau_N(N)) and of the jump chain. Then

S_N(k)=\sum_{i=1}^k \tau_N(J_N(i)) e_i.
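
Here is a sketch of this construction in code (my own, using the Pareto depths from above, with N, k and \alpha chosen arbitrarily); the final print hints at a point we return to below, that a single large holding time can carry a non-trivial share of S_N(k).

import numpy as np

rng = np.random.default_rng(3)
N, k, alpha = 100_000, 2_000, 0.5
tau = (1.0 - rng.random(N)) ** (-1.0 / alpha)   # depths tau_N(1),...,tau_N(N)
J = rng.integers(0, N, size=k)                  # uniform jump chain J_N(1),...,J_N(k)
e = rng.exponential(1.0, size=k)                # IID rate-1 exponentials e_1,...,e_k
increments = tau[J] * e                         # tau_N(J_N(i)) * e_i
S = np.cumsum(increments)                       # clock process S_N(1),...,S_N(k)
print(S[-1], increments.max() / S[-1])          # largest single holding time vs the whole sum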

Let’s briefly pause to apply the LLN to S_N for fixed N. It matters whether we consider the quenched or annealed settings here. As usual, quenched means we fix a realisation of the random environment, and draw all conclusions in terms of that environment (think of conditional expectations). And annealed means that we also include the randomness of the environment. This is notationally annoying, so as a shorthand we write \mathbb{E}_{\tau_N} for quenched expectations \mathbb{E}[\cdot \,|\, \tau_N(1),\ldots,\tau_N(N)], and \mathbb{E} for an expectation over all randomness.

Then the quenched rate of growth of S_N is given by

\mathbb{E}_{\tau_N}\left[ \frac{S_N(k)}{k}\right] = \frac{\tau_N(1)+\ldots +\tau_N(N)}{N},

and so the annealed rate

\mathbb{E}\left[\frac{S_N(k)}{k}\right] = \infty,

since \mathbb{E}[\tau_N(1)]=\infty. But as in the introduction, these rates are only relevant to laws of large numbers when k grows on a large enough timescale, and we will consider smaller scales of k.
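
For completeness, here is the one-line computation behind the quenched rate (using only that each J_N(i) is uniform on \{1,\ldots,N\} and independent of the environment, and that \mathbb{E}[e_i]=1):

\mathbb{E}_{\tau_N}\left[\frac{S_N(k)}{k}\right] = \frac{1}{k}\sum_{i=1}^k \mathbb{E}_{\tau_N}\left[\tau_N(J_N(i))\right]\mathbb{E}[e_i] = \frac{1}{k}\sum_{i=1}^k \frac{\tau_N(1)+\ldots+\tau_N(N)}{N} = \frac{\tau_N(1)+\ldots+\tau_N(N)}{N}.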

Timescales of the clock process

We’re going to look for scaling limits of the clock process. The increments are ‘sort of IID’ and ‘sort of heavy-tailed’ (we’ll clarify these sort ofs when we need to) so it wouldn’t be surprising if the scaling limits are Levy processes. The clock process is increasing, so in fact the scaling limits should be subordinators, and it wouldn’t be surprising if under some circumstances they turned out to be stable subordinators.

There is flexibility about how to do the rescaling. From now on, we are working in an N\rightarrow\infty regime. Let’s assume we look at t a_N steps of the jump chain, where (a_N) is some divergent sequence. A property of large sums of IID random variables with such heavy tails, \alpha\in(0,1), is that the scale of the sum is comparable to the scale of the largest summand. That is, the partial sum is dominated by its largest summands. Compare with a light-tailed case, say IID exponential summands, where the sum of k terms is \Theta(k), while the largest summand is only \Theta(\log k).

So to identify the scale of the clock process after a_N steps of the jump chain, it’s sufficient to identify the scale of its expected largest holding time. All of this is vague at the level of constants, so we choose a divergent sequence (c_N) for which

\mathbb{P}(\tau_N(1) \ge c_N) = \Theta\left(\frac{1}{a_N}\right).

Note 1: this means that the number of holding times among the first a_N which are at least c_N is binomial with \Theta(1) expectation. The fact that this is well approximated by a Poisson distribution will be relevant shortly.

Note 2: because we already insisted that the tail of \tau is regularly varying (that is, u^{-\alpha} times a slowly-varying function), this gives control of \mathbb{P}(\tau_N(1)\ge 5c_N) etc. as well.
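
In the simplest case L\equiv 1 used in the code sketches above, this condition pins down the scale explicitly, up to constants:

\mathbb{P}(\tau_N(1)\ge c_N) = c_N^{-\alpha} = \Theta\left(\frac{1}{a_N}\right) \quad\Longleftrightarrow\quad c_N = \Theta\left(a_N^{1/\alpha}\right).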

We expect that S_N(t a_N) = \Theta(c_N), and so we consider scaling limits of the process

\tilde S_N(t):= \frac{1}{c_N} S_N(\lfloor t a_N \rfloor),

as usual. [Note I am using the opposite convention to VG’s notes, where ~ denotes the unrescaled clock process.]
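
Continuing the code sketch (with the Pareto tail, so we may take c_N = a_N^{1/\alpha}, and with parameters chosen only for illustration), the rescaled clock process can be simulated as follows.

import numpy as np

def rescaled_clock(N, a_N, alpha, t_max, rng):
    """Values of tilde S_N(i / a_N) for i = 1, ..., floor(t_max * a_N)."""
    c_N = a_N ** (1.0 / alpha)                        # matches P(tau >= c_N) = 1 / a_N
    k = int(t_max * a_N)
    tau = (1.0 - rng.random(N)) ** (-1.0 / alpha)     # depths
    J = rng.integers(0, N, size=k)                    # uniform jump chain
    e = rng.exponential(1.0, size=k)                  # rate-1 exponentials
    return np.cumsum(tau[J] * e) / c_N

rng = np.random.default_rng(4)
print(rescaled_clock(N=1_000_000, a_N=1_000, alpha=0.5, t_max=1.0, rng=rng)[-1])   # O(1) values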

Scaling limits

We identify two types of scaling limit, depending on whether a_N\ll N or a_N = \Theta(N). The former is called an intermediate timescale, while the latter is an extreme timescale. After this long section of motivation and notational preliminaries, my goal is to explain (partly to myself) why these scaling limits are different.

First, we state the result for intermediate timescales. Let S^{\mathrm{int}} be the stable subordinator with parameter \alpha, that is, the subordinator whose Levy measure has tail \nu\left((u,\infty)\right)=\alpha \Gamma(\alpha)u^{-\alpha}. Then \tilde S_N \Rightarrow S^{\mathrm{int}} in the Skorohod topology. We need to be clear about the sense of convergence, and the role of the random environment. It turns out that if in addition a_N\ll \frac{N}{\log N}, then this convergence holds for almost all realisations of the random environment. That is, for almost every environment, the laws of the processes (with respect to the randomness of the jump chain, holding times etc) converge. When a_N is only \ll N, the convergence holds in probability with respect to the environment. It took me a while to parse what this means. It means that for large N, the probability that the random environment induces a law of \tilde S_N which is far from the law of S^{\mathrm{int}} tends to zero.

The exact Levy triple of the limit process is not the important message here, and if that’s unfamiliar, then it isn’t a problem. The point is that you would also get this limiting Levy process if you took the sum process of genuinely IID random variables with the same \alpha-tail. And this is not surprising. Recall that on an intermediate timescale a_N\ll N, so during the first t a_N steps of the jump chain we do not typically visit many sites more than once. Indeed, if a_N\ll \sqrt{N}, then this is the birthday problem, and we typically visit no site more than once. However, even in the weaker setting a_N\ll N, look at the deepest 1000 sites we visit during the first ta_N steps. We can compute that, in expectation, we visit essentially none of these more than once. But these 1000 sites dominate the clock process at ta_N. So from the point of view of the clock process, since we hardly ever visit relevant sites twice, the depths \tau_N(J_N(1)),\tau_N(J_N(2)),\ldots are essentially independent, and so it’s unsurprising that we get the scaling limit corresponding to IID partial sums.
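
A back-of-envelope version of this claim (my own rough bound): given that a particular site has been visited at some step, each of the at most ta_N other steps hits it with probability 1/N, so the expected number of the 1000 deepest visited sites which receive a second visit is at most

1000 \cdot \frac{t a_N}{N} \longrightarrow 0, \qquad \text{since } a_N \ll N.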

For extreme timescales, by contrast, this fails. If we take a_N=1000 N, we expect to visit each site roughly 1000 times; indeed the number of visits to a given site will be approximately \mathrm{Poisson}(1000). But it’s still the case that the scaling limit will be dominated by the deepest sites. In particular, at some point on this timescale we will visit the deepest site, and indeed we will visit it multiple times if we look at ta_N for large t. So the jumps of any scaling limit are not independent any more unless we condition on all the depths \tau_N.

However, all is not lost, since we can show that the point process of rescaled depths \sum \delta_{\tau_N(i)/c_N} converges to a Poisson random measure on [0,\infty). The candidate for the scaling limit of the clock process is then the subordinator whose Levy measure is this Poisson random measure. This isn’t itself a Levy process, but it is a mixture of Levy processes, reflecting that on extreme timescales the quenched and annealed viewpoints are different since there is enough time to visit the whole landscape.
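
Here is a sketch of the point-process statement (my own numerical check, with the Pareto tail and c_N = N^{1/\alpha}, which is the extreme-timescale choice when a_N = \Theta(N)): the number of rescaled depths exceeding a level u should be approximately Poisson with mean u^{-\alpha}, so its empirical mean and variance should both be close to u^{-\alpha}.

import numpy as np

rng = np.random.default_rng(5)
N, alpha, reps, u = 10_000, 0.5, 2_000, 1.0
c_N = N ** (1.0 / alpha)
counts = np.array([
    ((1.0 - rng.random(N)) ** (-1.0 / alpha) > u * c_N).sum()   # points of the rescaled depth process above u
    for _ in range(reps)
])
print(counts.mean(), counts.var(), u ** (-alpha))   # mean and variance both close to u^{-alpha}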

Heuristically, the extreme timescale is the entry point for convergence to equilibrium. Indeed, taking t\rightarrow\infty, the number of visits to each of the 1000 deepest sites converges to its expectation, corresponding to convergence of the clock process to equilibrium, since these holding times continue to dominate the sum. The clock process therefore starts to feel the finiteness of the state space, which introduces dependence between the most relevant holding times, which was not the case on intermediate timescales.

In the next post, I’m going to try and summarise VG’s descriptions of taking this model beyond the mean-field setting, where the range of possibilities becomes much, much richer. I’m also going to try and say something about glassy dynamics and ageing, and why the physical motivation justifies considering these particular models and scalings.

