Lamperti Walks

The theory of simple random walks on the integer lattice is a classical topic in probability theory. Pólya proved in the 1920s that an SRW on \mathbb{Z}^d is recurrent only for d=1 or 2. The argument is essentially combinatorial: we count the number of possible paths from 0 back to itself, and show that this grows fast enough that, even though any particular long path is very unlikely, returns to the origin still happen infinitely often. In higher dimensions there is essentially ‘more space’ at large distances, at least comparatively, so a typical walk is more likely to escape into this space.

As Kakutani (of the product martingale theorem) said, in a remark subsequently quoted as the dedication of seemingly every undergraduate set of notes on random walks: “A drunk man will find his way home, whereas a drunk bird may get lost forever.”

But transience is, in some sense, a long-distance property. We can fiddle with the transition rates near zero and, so long as we don’t make anything deterministic, this shouldn’t affect transience properties. Obviously, if we have a (space-)homogeneous nearest-neighbour random walk on the integers with non-zero drift, the process will be transient: it drifts towards positive infinity if the drift is positive. But can we have a random walk with non-zero drift, where the drift tends to zero at large distances fast enough, and the process is still recurrent? What is the correct scaling for the decay of the drift to see interesting effects?

The answers to these questions are found in the so-called Lamperti random walks, which were a recurring theme of the meeting on Aspects of Random Walks held in Durham this week. Thanks to the organisers for putting on such an excellent meeting. I hadn’t known much about this topic before, so thought it might be worth writing a short note.

As explained above, we consider time-homogeneous random walks. It will turn out that the exact distribution of the increments is not hugely important: most of the properties we might care about will be determined only by the first two moments, which we define as:

\mu_1(x)=\mathbb{E}[X_{t+1}-X_t | X_t=x],

\mu_2(x)=\mathbb{E}[(X_{t+1}-X_t)^2 | X_t=x].

Note that because the drift will be asymptotically zero, the second moment is asymptotically equal to the variance of the increments. It will also turn out that the correct scaling for \mu_1 to see a phase transition is \mu_1(x)\sim \frac{c}{x}.

We begin by seeing how this works in the simplest possible example, from Harris (1952). Let’s restrict attention to a random walk on the non-negative integers, and impose the further condition that increments are +1 or -1. In the notation of a birth-and-death process from a first course on Markov chains, we can set:

p_j:=\mathbb{P}(X_{t+1}=j+1| X_t=j), \quad q_j=1-p_j.

We will set p_j=\frac12 + \frac{c}{2j}, which for j\geq 1 and |c|\leq 1 is a genuine probability. Then the chain is transient if and only if

1+\frac{q_1}{p_1}+\frac{q_1q_2}{p_1p_2}+\ldots <\infty.

In our special case:

\frac{q_1\ldots q_r}{p_1\ldots p_r}\approx\frac{(r-2c)(r-1-2c)(r-2-2c)\ldots}{r!}\approx \frac{1}{r^{2c}}.
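In slightly more detail, \frac{q_j}{p_j}=\frac{j-c}{j+c}=1-\frac{2c}{j}+O(j^{-2}), so

\log \frac{q_1\ldots q_r}{p_1\ldots p_r}=\sum_{j=1}^r \log\left(1-\frac{2c}{j}+O(j^{-2})\right)=-2c\log r+O(1),

which is where the r^{-2c} comes from.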

So we can deduce that this sum converges if c>\frac12, giving transience. A similar, but slightly more complicated, calculation specifies the two regimes of recurrence: if -\frac12\leq c\leq \frac12 then the chain is null-recurrent, meaning that the expected time to return to any given state is infinite, while if c<-\frac12 it is positive-recurrent.
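To see the phase transition empirically, here is a minimal simulation sketch. The reflection rule at 0, the horizon and the values of c are all my own choices for illustration: for c=1 (transient) the returns to the origin eventually dry up, for c=-1 (positive recurrent) they happen at a steady rate, and c=0 (null recurrent) sits in between.

```python
import random

def count_returns(c, steps=100_000, seed=42):
    """Count returns to 0 for the chain with p_j = 1/2 + c/(2j)."""
    rng = random.Random(seed)
    x, returns = 1, 0
    for _ in range(steps):
        if x == 0:
            x, returns = 1, returns + 1  # reflect away from the origin
        else:
            p = 0.5 + c / (2 * x)        # up-probability; lies in [0,1] for |c| <= 1
            x += 1 if rng.random() < p else -1
    return returns

for c in (-1.0, 0.0, 1.0):
    print(c, count_returns(c))
```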

In general, we assume \mu_1(x)\sim \frac{c}{x} and \mu_2(x)\approx s^2. In the case above, obviously s^2=1. The general result is that, under mild assumptions on the increment distributions (for instance a finite (2+\epsilon)-moment), if we define r=-\frac{2c}{s^2}, then the RW is transient if r<-1, positive-recurrent if r>1, and null-recurrent otherwise. This is the main result of Lamperti.

To explain why we have parameterised in exactly this way, it makes sense to talk about the more general proof methods, as obviously the direct Markov chain calculation won’t work in general. The motivating idea is that we can deal well with the situation where the drift is zero, so let’s transform the random walk so that the drift becomes zero. A function of a Markov chain that is more stable (in some sense) than the original MC, for analysis at least, is sometimes called a Lyapunov function. Here, the sensible thing is to consider Y_t=X_t^\gamma, for some exponent \gamma>0.

So long as our distributions are fairly well-behaved (eg a finite (2+\epsilon)-moment), we can calculate the drift of Y as

\mathbb{E}[Y_{t+1}-Y_t| X_t=x]=\frac{\gamma}{2}x^{\gamma-2}(2c+(\gamma-1)s^2) +o(x^{\gamma-2}).
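To see where this comes from: a second-order Taylor expansion of x\mapsto x^\gamma gives, sweeping all the error terms (which the moment assumptions exist to control) into the o(x^{\gamma-2}),

\mathbb{E}[Y_{t+1}-Y_t| X_t=x]\approx \gamma x^{\gamma-1}\mu_1(x)+\frac{\gamma(\gamma-1)}{2}x^{\gamma-2}\mu_2(x)\approx \gamma c\,x^{\gamma-2}+\frac{\gamma(\gamma-1)}{2}x^{\gamma-2}s^2,

and collecting the two terms gives the expression above.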

In particular, taking \gamma=1+r results in a random walk that is ‘almost’ a martingale. Note that the original RW was almost a martingale, in the sense that the drift is asymptotically zero, but now it is zero to second order as well.

To draw any rigorous conclusions, we need to be careful about exactly how precise this approximation is, but we won’t worry about that now. In particular, we need to know that the approximation survives an application of the optional stopping theorem, as this is what allows us to say:

\mathbb{P}(X\text{ hits }x\text{ before 0})=\mathbb{P}(Y\text{ hits }x^\gamma\text{ before 0})\sim x^{-\gamma}.

This is particularly useful for working out the expected excursion time away from 0, which precisely leads to the condition for null-recurrence.
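To sketch the optional stopping step: if Y really were a martingale started from Y_0=x_0^\gamma, and \tau were the first time it hits 0 or reaches x^\gamma, then optional stopping would give

x_0^\gamma=\mathbb{E}[Y_\tau]\approx x^\gamma\,\mathbb{P}(Y\text{ hits }x^\gamma\text{ before }0),

so the hitting probability is approximately x_0^\gamma x^{-\gamma}. The real work lies in showing that the errors from the approximate martingale property do not accumulate.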

In his talk, Ostap Hryniv showed that this Lyapunov function analysis can be taken much further, to derive much more precise results about excursions, maxima and ergodicity. Results of Menshikov and Popov from the 90s further specify the asymptotics for the invariant distribution, if it exists, in terms of r.

One cautionary remark I should make is that earlier I implied that once we know the drift of such a random walk is zero, we have recurrence. This is true on \mathbb{Z} with very mild restrictions, but is not necessarily true in higher dimensions. For example, consider the random walk on \mathbb{R}^2 where, conditional on X_t, the increment X_{t+1}-X_t has length 1 and is perpendicular to the vector X_t, with the two possible directions equally likely. The drift is therefore 0 everywhere, and the second moment is also well-behaved, but by Pythagoras ||X_{t+1}||^2=||X_t||^2+1, so ||X_t||^2=||X_0||^2+t grows deterministically. So in higher dimensions, we have to be a bit more careful, and put restrictions on the covariance structure of the increment distributions.
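Here is a quick simulation sketch of this example; the starting point and horizon are my own choices, and any non-zero starting point works:

```python
import numpy as np

# Zero drift but transient: each increment has length 1, is perpendicular
# to the current position, and the two perpendicular directions are
# equally likely.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])  # any non-zero starting point
for t in range(10_000):
    perp = np.array([-x[1], x[0]]) / np.linalg.norm(x)  # unit vector perpendicular to x
    x = x + rng.choice([-1.0, 1.0]) * perp
print(np.dot(x, x))  # ||x||^2 grows by exactly 1 each step: 1 + 10000
```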

As a final comment, note that from Lamperti’s result we can re-derive Pólya’s result about SRW in higher dimensions. If X_t is an SRW on \mathbb{Z}^d, then consider Y_t=||X_t||. By considering a couple of examples in two dimensions, it is clear that this is not Markov. But the methods we considered above for the Lamperti walks were really martingale methods rather than Markov chain methods. And indeed this process Y has asymptotically zero drift with the right scaling. Here,

c=\frac{1}{2}(1-\frac{1}{d}),\quad s^2=\frac{1}{d},

and so r=-(d-1), leading to exactly the result we know to be true, that the SRW is transient precisely in three dimensions and higher.
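For completeness, here is a sketch of where these values come from. The increment \xi of the SRW is uniform on \{\pm e_1,\ldots,\pm e_d\}, so \mathbb{E}\xi=0 and \mathbb{E}[\xi\xi^T]=\frac1d I. Writing \hat{x}=x/||x||, a second-order expansion of x\mapsto ||x|| gives

\mathbb{E}[\,||x+\xi||-||x||\,]\approx \frac12 \mathrm{tr}\left(\frac{I-\hat{x}\hat{x}^T}{||x||}\cdot \frac1d I\right)=\frac{d-1}{2d\,||x||},

giving c=\frac12(1-\frac1d), while the radial component \xi\cdot\hat{x} has second moment \hat{x}^T(\frac1d I)\hat{x}=\frac1d, giving s^2=\frac1d.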

REFERENCES

T.E. Harris – First Passage and Recurrence Distributions, Transactions of the American Mathematical Society 73 (1952)

The slides from Ostap Hryniv’s talk, on which this was based, can be found here.


Elastic Networks and TCP

Now, suppose each route is given a rate x_r, and there is a utility function U_r(x_r). Links have capacities as before, so we seek to solve the optimisation problem

\max\quad \sum_r U_r(x_r)\quad\text{s.t.}\quad Ax\leq c,\ x\geq 0

This is called the SYSTEM problem. However, the system might not know the utility functions U, or the incidence matrix A. We can alternatively consider the USER problem, where for a given route the cost of the rate is assumed to be linear, and we maximise (utility – cost). That is:

\max\quad U_r(\frac{w_r}{\lambda_r})-w_r,\quad w_r\geq 0

where the constant of linearity has been absorbed into the utility function. For the NETWORK problem, we essentially repeat the system problem, only under the assumption that utility functions are w_r\log x_r for fixed w. That is

\max\quad \sum_r w_r\log x_r,\quad\text{s.t.}\quad Ax\leq c

These optimisation problems are linked formally by the following theorem:

Theorem: We can find \lambda,w,x that solve USER and NETWORK, for which w_r=\lambda_rx_r. Furthermore, x then solves SYSTEM.

To prove this result, write down SYSTEM’s Lagrangian L(x,z,\mu)=\sum_r U_r(x_r)+\mu^T(c-Ax-z), where z\geq 0 is a vector of slack variables. The Lagrangian of NETWORK is similar, and at the optimum of USER these Lagrangians are in fact the same, with \mu_j and \lambda_r shadow prices for resources and routes respectively.
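As a sketch of the first-order conditions linking the three problems (assuming each U_r is increasing, strictly concave and differentiable, and ignoring complementary slackness details): stationarity of SYSTEM’s Lagrangian in x_r gives

U_r'(x_r)=\sum_{j\in r}\mu_j=:\lambda_r,

while USER’s first-order condition is \frac{1}{\lambda_r}U_r'(\frac{w_r}{\lambda_r})=1; these are consistent when x_r=\frac{w_r}{\lambda_r}. For NETWORK, where U_r(x)=w_r\log x, the first condition reads \frac{w_r}{x_r}=\lambda_r, which is exactly the relation w_r=\lambda_r x_r in the theorem.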

We discuss two notions of fairness for allocations:

1) (x_r) is max-min fair if it is a feasible flow, and the smallest flow component is as large as possible. That is, if y is another feasible flow and y_r>x_r for some r, then \exists s: y_s<x_s\leq x_r, ie you can’t increase a flow component without taking away from some other component which is smaller to begin with. Such a vector exists if the feasible region is convex, which in all practical examples it ought to be.

2) (x_r) is proportionally fair if it is feasible, and if for any other feasible vector y, the aggregate of the proportional changes is non-positive, that is:

\sum_r \frac{y_r-x_r}{x_r}\leq 0

Alternatively, we can think of this as saying that the sum of the percentage changes in each component is non-positive. We say the distribution is weighted proportionally fair with respect to w if

\sum_r w_r\frac{y_r-x_r}{x_r}\leq 0

Then it can be seen, by considering behaviour in a neighbourhood of the optimum of the Lagrangian, that x solves NETWORK if and only if it is weighted proportionally fair.
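As a concrete sanity check, here is a minimal numerical sketch of NETWORK. Everything here (the two-link, three-route topology, the unit weights and capacities, and the choice of solver) is my own invention for illustration:

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 1.0, 0.0],   # link 1 carries routes 0 and 1
              [1.0, 0.0, 1.0]])  # link 2 carries routes 0 and 2
c = np.array([1.0, 1.0])         # link capacities
w = np.ones(3)                   # weights w_r

res = minimize(
    lambda x: -np.sum(w * np.log(x)),                            # maximise sum_r w_r log x_r
    x0=np.full(3, 0.1),
    bounds=[(1e-9, None)] * 3,                                   # keep x_r > 0
    constraints=[{"type": "ineq", "fun": lambda x: c - A @ x}],  # Ax <= c
)
print(res.x)  # approximately (1/3, 2/3, 2/3)
```

Route 0 uses both links, so the proportionally fair allocation gives it the smaller rate 1/3, while routes 1 and 2 each receive 2/3.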

TCP and Lyapunov functions

Assume rates are controlled by the ODE

\frac{d}{dt}x_r(t)=k_r(w_r-x_r(t)\sum_{j\in r}\mu_j(t))

where \mu_j(t)=p_j(\sum_{s\ni j}x_s(t)). The motivation is that the increase in flow is linear (we increase it whenever it appears possible) and the decrease is multiplicative (that is, after negative feedback – a sign of failure to transmit a packet – the flow is halved), mirroring TCP’s additive-increase, multiplicative-decrease rule. We seek a so-called Lyapunov function, that is, a function of the system state that is monotone along trajectories. In this case, a suitable example is

U(x)=\sum_r w_r\log x_r -\sum_j \int_0^{\sum_{s\ni j}x_s}p_j(y)dy

Such a function is differentiable and strictly concave (given increasing price functions p_j), so has a unique maximum. One can check that \frac{d}{dt}U(x(t))\geq 0 along trajectories, with equality only at this maximum, which ensures that trajectories actually converge to it.
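To make the dynamics concrete, here is a sketch that Euler-integrates the ODE on the same invented topology as before, with an invented price function p_j; none of the parameters are canonical, and note that capacity is only enforced softly, through the price:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])  # link-route incidence, as before
w, k = np.ones(3), np.ones(3)

def p(y):
    # illustrative price: steep penalty once a link's load exceeds capacity 1
    return 5.0 * np.maximum(y - 1.0, 0.0)

def U(x):
    # the Lyapunov function: sum_r w_r log x_r - sum_j int_0^{y_j} p_j(y) dy
    y = A @ x
    return np.sum(w * np.log(x)) - np.sum(2.5 * np.maximum(y - 1.0, 0.0) ** 2)

x, dt = np.full(3, 0.1), 0.01
for _ in range(20_000):
    mu = p(A @ x)                         # per-link prices mu_j(t)
    x = x + dt * k * (w - x * (A.T @ mu))
print(x, U(x))  # rates settle near the maximiser of U, and U increases en route
```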