BMO1 2018

The first round of the British Mathematical Olympiad was sat yesterday. The paper can be found here, and video solutions here. Copyright for the questions is held by BMOS. They are reproduced here with permission.

I hope any students who sat the paper enjoyed at least some of the questions, and found it challenging! The following commentaries on the problems are not official solutions, and are not always full solutions at all, but contain significant steps of solutions, so would be best saved until after you have attempted the problems, if you are planning to do so. I’ve written quite a lot about Q5 because I found it hard (or at least time-consuming) and somewhat atypical, and I’ve written a lot about Q6 because there was a lot to say. I hope at least some of this is interesting to some readers of all levels of olympiad experience.

Question 1

A list of five two-digit positive integers is written in increasing order on a blackboard. Each of the five integers is a multiple of 3, and each digit {0,1,…,9} appears exactly once on the blackboard. In how many ways can this be done? (Note that a two-digit number cannot begin with zero.)

It’s a trope of BMO1 that the first question must be doable by some sort of exhaustive calculation or listing exercise. Of course, that is rarely the most efficient solution.

However, there is normally a trade-off between eliminating all listing, and reducing to a manageable task.

The key observation here is that writing the integers in increasing order is really just a way to indicate that the order of the choices doesn’t matter, even if that seems counter-intuitive. The question wants to know how many ways there are to choose these five numbers. The order of choice doesn’t matter since we’re going to put them in ascending order on the blackboard anyway.

You want to make your choices with as much independence as possible. So it would, for example, be a bad idea to choose the smallest number first. How many possibilities are there where the smallest number is 24? What about 42? What about 69? These are all different, and some are zero, so will make the computation very taxing.

However, you might notice that the digits {0,3,6,9} have to go together to form two numbers, and the rest have to pair up with one digit from {1,4,7} and one from {2,5,8}. You might know that an integer is divisible by 3 precisely if its digit sum is divisible by 3, but in this context you wouldn’t lose too much time by simply listing everything! These tasks are now completely separate, so you can take the number of ways to pair up {0,3,6,9} and multiply by the number of ways to pair up {1,4,7} and {2,5,8}. You need to take care over the ordering. It does (obviously) matter which is the first digit and which is the second digit in a number!
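As a sanity check on any final answer, the trope about exhaustive listing can be taken literally with a computer. The following is a brute-force sketch rather than an intended solution. The function name is my own, and it uses the observation above that a list in increasing order is the same thing as an unordered choice of five numbers.

```python
from itertools import combinations


def count_blackboards():
    """Count sets of five two-digit multiples of 3 using each digit 0-9 once."""
    # All two-digit multiples of 3: 12, 15, ..., 99 (none begins with zero).
    multiples = list(range(12, 100, 3))
    count = 0
    # combinations() yields each set once, already in increasing order.
    for combo in combinations(multiples, 5):
        digits = "".join(str(n) for n in combo)
        # Ten characters in total, so ten distinct characters means
        # every digit appears exactly once.
        if len(set(digits)) == 10:
            count += 1
    return count


print(count_blackboards())  # prints 288
```

This agrees with the product of the two separate counts: 6 ways to deal with {0,3,6,9} (three ways to split into pairs, and in each case only the pair without the zero can be written in either order), multiplied by 3! x 2^3 = 48 ways to pair up {1,4,7} with {2,5,8}.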


Lamperti Walks


The theory of simple random walks on the integer lattice is a classical topic in probability theory. Polya proved in the 1920s that such a SRW on \mathbb{Z}^d is recurrent only for d=1 or 2. The argument is essentially combinatorial. We count the number of possible paths from 0 back to itself and show that this grows fast enough that even with the probabilistic penalty of having a particular long path we will still repeatedly see this event happening. In larger dimensions there is essentially ‘more space’ at large distances, at least comparatively, so a typical walk is more likely to escape into this space.

As Kakutani (of the product martingale theorem) said, and was subsequently quoted as the dedication on every undergraduate pdf about random walks: “A drunk man will find his way home, whereas a drunk bird may get lost forever.”

But transience is in some sense a long-distance property. We can fiddle with the transition rates near zero and, so long as we don’t make anything deterministic, this shouldn’t affect transience properties. Obviously if we have a (space-)homogeneous nearest-neighbour random walk on the integers with non-zero drift the process will be transient: it drifts towards positive infinity if the drift is positive. But can we have a random walk with non-zero drift, but where the drift tends to zero at large distances fast enough, and the process is still recurrent? What is the correct scaling for the decay of the drift to see interesting effects?

The answers to these questions are seen in the so-called Lamperti random walks, which were a recurring theme of the meeting on Aspects of Random Walks held in Durham this week. Thanks to the organisers for putting on such an excellent meeting. I hadn’t known much about this topic before, so thought it might be worth writing a short note.

As explained above, we consider time-homogeneous random walks. It will turn out that the exact distributions of the increments are not hugely important. Most of the properties we might care about will be determined only by the first two moments, which we define as:

\mu_1(x)=\mathbb{E}[X_{t+1}-X_t | X_t=x],

\mu_2(x)=\mathbb{E}[(X_{t+1}-X_t)^2 | X_t=x].

Note that because the drift will be asymptotically zero, the second term is asymptotically equal to the variance of the increment. It will also turn out that the correct scaling for \mu_1 to see a phase transition is \mu_1(x)\sim \frac{c}{x}.

We begin by seeing how this works in the simplest possible example, from Harris (1952). Let’s restrict attention to a random walk on the non-negative integers, and impose the further condition that increments are +1 or -1. In the notation of a birth-and-death process from a first course on Markov chains, we can set:

p_j:=\mathbb{P}(X_{t+1}=j+1| X_t=j), \quad q_j=1-p_j.

We will set p_j=\frac12 + \frac{c}{2j}. Then a condition for transience is that

1+\frac{q_1}{p_1}+\frac{q_1q_2}{p_1p_2}+\ldots <\infty.

In our special case:

\frac{q_1\ldots q_r}{p_1\ldots p_r}\approx\frac{(r-2c)(r-1-2c)(r-2-2c)\ldots}{r!}\approx \frac{1}{r^{2c}}.

So we can deduce that this sum converges if c>1/2, giving transience. A similar, but slightly more complicated calculation specifies the two regimes of recurrence. If -1/2\le c\le 1/2 then the chain is null-recurrent, meaning that the expected time to return to any given state is infinite. If c<-1/2, then it is positive recurrent.
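The polynomial decay of these ratio products is easy to check numerically. Here is a minimal sketch, assuming p_j=\frac12+\frac{c}{2j} as above, so that q_j/p_j=(j-c)/(j+c). The function names and the choice of the two values of r are my own, and we need |c|<1 so that p_1 is a genuine probability strictly between 0 and 1.

```python
import math


def log_ratio_product(c, r):
    """log of (q_1...q_r)/(p_1...p_r) when p_j = 1/2 + c/(2j), with |c| < 1."""
    # q_j / p_j = (j - c) / (j + c); sum logs to avoid underflow.
    return sum(math.log((j - c) / (j + c)) for j in range(1, r + 1))


def estimated_exponent(c, r_small=1_000, r_large=100_000):
    """Slope of the log-product against log r; should be close to -2c."""
    rise = log_ratio_product(c, r_large) - log_ratio_product(c, r_small)
    run = math.log(r_large) - math.log(r_small)
    return rise / run


for c in (0.75, 0.25, -0.75):
    print(c, estimated_exponent(c))
```

The estimated slope is close to -2c in each case, so the terms of the sum behave like r^{-2c}, and the sum is finite exactly when 2c>1.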

In general, we assume \mu_1(x)\sim \frac{c}{x} and \mu_2(x)\approx s^2. In the case above, obviously s^2=1. The general result is that under mild assumptions on the increment distributions, for instance a (2+\epsilon)-moment, if we define r=-\frac{2c}{s^2}, then the RW is transient if r<-1, positive-recurrent if r>1, and null-recurrent otherwise. This is the main result of Lamperti.

To explain why we have parameterised exactly like this, it makes sense to talk about the more general proof methods, as obviously the direct Markov chain calculation won’t work in general. The motivating idea is that we can deal well with the situation where the drift is zero, so let’s transform the random walk so that the drift becomes zero. A function of a Markov chain that is more stable (in some sense) than the original MC, for analysis at least, is sometimes called a Lyapunov function. Here, the sensible thing is to consider Y_t=X_t^\gamma, for some exponent \gamma>0.

So long as our distributions are fairly well-behaved (eg a finite 2+\epsilon-moment), we can calculate the drift of Y as

\mathbb{E}[Y_{t+1}-Y_t| X_t=x]=\frac{\gamma}{2}x^{\gamma-2}(2c+(\gamma-1)s^2) +o(x^{\gamma-2}).

In particular, taking \gamma=1+r results in a random walk that is ‘almost’ a martingale. Note that the original RW was almost a martingale, in the sense that the drift is asymptotically zero, but now it is zero to second order as well.
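For the nearest-neighbour walk from Harris's example, the drift of Y can be computed exactly and compared with its second-order Taylor approximation \gamma x^{\gamma-1}\mu_1(x)+\frac{\gamma(\gamma-1)}{2}x^{\gamma-2}\mu_2(x). The following is a rough sketch under that assumption (increments of +1 or -1, with p_x=\frac12+\frac{c}{2x}, so s^2=1). The function names are mine.

```python
def exact_drift(x, c, gamma):
    """E[X_{t+1}^gamma - X_t^gamma | X_t = x] for the +/-1 walk with p_x = 1/2 + c/(2x)."""
    p = 0.5 + c / (2 * x)
    return p * (x + 1) ** gamma + (1 - p) * (x - 1) ** gamma - x ** gamma


def leading_term(x, c, gamma, s2=1.0):
    """Second-order Taylor approximation to the drift of X^gamma."""
    return 0.5 * gamma * x ** (gamma - 2) * (2 * c + (gamma - 1) * s2)


c = 0.3
# Generic exponent: the exact drift matches the leading term for large x.
print(exact_drift(1000, c, 1.5) / leading_term(1000, c, 1.5))
# Critical exponent gamma = 1 + r = 1 - 2c: the x^(gamma-2) term cancels.
gamma_star = 1 - 2 * c
print(exact_drift(1000, c, gamma_star) * 1000 ** (2 - gamma_star))
```

The first ratio is very close to 1, while the second quantity is very close to 0, which is the sense in which X^{1+r} is 'almost' a martingale.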

To draw any rigorous conclusions, we need to be careful about exactly how precise this approximation is, but we won’t worry about that now. In particular, we need to know whether we can take this approximation over the optional stopping theorem, as this allows us to say:

\mathbb{P}(X\text{ hits }x\text{ before 0})=\mathbb{P}(Y\text{ hits }x^\gamma\text{ before 0})\sim x^{-\gamma}.

This is particularly useful for working out the expected excursion time away from 0, which precisely leads to the condition for null-recurrence.

In his talk, Ostap Hryniv showed that this Lyapunov function analysis can be taken much further, to derive much more precise results about excursions, maxima and ergodicity. Results of Menshikov and Popov from the 90s further specify the asymptotics for the invariant distribution, if it exists, in terms of r.

One cautionary remark I should make is that earlier I implied that once we know the drift of such a random walk is zero, we have recurrence. This is true on \mathbb{Z} with very mild restrictions, but is not necessarily true in higher dimensions. For example, consider the random walk on \mathbb{R}^2, where conditional on X_t, the increment X_{t+1}-X_t is of length 1 and perpendicular to the vector X_t. The two possible directions are equally likely. The drift is therefore 0 everywhere, and the second moment is also well-behaved, but note that ||X_t||^2=||X_0||^2+t, just by considering Pythagoras, so the walk escapes to infinity deterministically. So in higher dimensions, we have to be a bit more careful, and put restrictions on the covariance structure of the increment distributions.
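This example is simple enough to simulate directly. A minimal sketch, in which the function name, the starting point and the seed are all arbitrary choices of mine (the walk must start away from the origin for 'perpendicular' to make sense):

```python
import math
import random


def perpendicular_walk(start, n_steps, seed=0):
    """Walk in R^2 whose unit increments are perpendicular to the current position."""
    rng = random.Random(seed)
    x, y = start
    for _ in range(n_steps):
        norm = math.hypot(x, y)
        # Unit vector perpendicular to (x, y).
        ux, uy = -y / norm, x / norm
        # The two possible directions are equally likely, so the drift is zero.
        sign = rng.choice((-1, 1))
        x, y = x + sign * ux, y + sign * uy
    return x, y


x, y = perpendicular_walk((1.0, 0.0), 1000)
print(x * x + y * y)  # squared norm grows by exactly 1 per step, up to rounding
```

Whatever the random signs, the squared distance from the origin increases by exactly one at each step, so the randomness only affects the angle.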

As a final comment, note that from Lamperti’s result, we can re-derive Polya’s result about SRW in higher dimensions. If we have X_t an SRW on \mathbb{Z}^d, then consider Y_t=||X_t||. By considering a couple of examples in two dimensions, it is clear that this is not Markov. But the methods we considered above for the Lamperti walks were really martingale methods rather than Markov chain methods. And indeed this process Y has asymptotically zero drift with the right scaling. Here,

c=\frac{1}{2}(1-\frac{1}{d}),\quad s^2=\frac{1}{d},

and so r=-\frac{2c}{s^2}=1-d. Since the walk is transient exactly when r<-1, this leads to exactly the result we know to be true, that the SRW is transient precisely in three dimensions and higher.
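The values of c and s^2 can be checked by computing the first two moments of the increment of ||X_t|| exactly at a lattice point far from the origin, and then forming r=-\frac{2c}{s^2} as in Lamperti's result. This is only a sketch: the function names and the test points are my own choices.

```python
import math


def norm_increment_moments(point):
    """Norm of a lattice point, and the mean and second moment of the
    change in norm after one SRW step on Z^d from that point."""
    d = len(point)
    norm = math.sqrt(sum(v * v for v in point))
    first = second = 0.0
    for i in range(d):
        for step in (-1, 1):
            moved = list(point)
            moved[i] += step
            delta = math.sqrt(sum(v * v for v in moved)) - norm
            # Each of the 2d neighbours is chosen with probability 1/(2d).
            first += delta / (2 * d)
            second += delta * delta / (2 * d)
    return norm, first, second


def lamperti_r(point):
    """Estimate r = -2c/s^2, reading off c = x * mu_1(x) and s^2 = mu_2(x)."""
    norm, mu1, mu2 = norm_increment_moments(point)
    return -2 * (norm * mu1) / mu2


print(lamperti_r((3000, 4000)))      # d = 2
print(lamperti_r((300, 400, 1200)))  # d = 3
```

The estimates are close to -1 for d=2 and to -2 for d=3, on either side of the threshold r=-1 for transience.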

REFERENCES

Harris – First Passage and Recurrence Distributions (1952)

The slides from Ostap Hryniv’s talk, on which this was based, can be found here.


Branching Random Walk and Amenability

This post is about some of the things I learned in an interesting talk given by Elisabetta Candellero in Oxford last week, based on joint work with Matt Roberts. The paper on which this is based can be found here. The main things I want to talk about are some properties of graphs which were mentioned near the beginning and which I hadn’t heard about before.

Branching Random Walk (hereafter BRW) is a model to which much attention has been paid, because of its natural applications in a range of physical and genetic settings. As with many of the best models, the definition is pretty much in the title. We take the ingredients for a random walk on a graph, which are a graph G and a transition matrix P on that graph. For most of the time we will consider simple random walk, so the graph G exactly specifies P. This requires the additional condition that the graph G is locally finite. We will introduce a branching mechanism, so at discrete times {0,1,2,…} we will track both the number of particles, and their current locations. We start at time 0 with a single particle at some vertex. Then at each time-step, all the particles present die, and each gives birth independently to some number of offspring according to a fixed probability distribution \mu. These offspring then perform one move according to transition matrix P. Note that if you want the system to carry the appearance of having no death, then taking the support of the offspring distribution to be {1,2,3,…} achieves precisely this. The properties we consider will not be very interesting unless G is infinite, so assume that from now on.

There are almost limitless ways we could think of to generalise these dynamics. The offspring distribution could be allowed to depend on the vertex the particle is occupying. The joint transition probabilities of the offspring at a vertex could be biased in favour or against the offspring moving to the same site next. The environment could be chosen in advance before the process starts, but random.

The classical question about BRW is that of recurrence and transience. The definition extends naturally from that of a Markov chain (which any non-branching random walk on a graph is). As in that setting, we say a BRW is recurrent if every vertex of the graph is almost surely visited infinitely often by particles.

Heuristically, we should observe that in some sense, it is quite difficult for simple random walk on an infinite graph to be recurrent. We have examples in \mathbb{Z},\mathbb{Z}^2, but these are about as ‘small’ as an infinite graph can be. An idea might be that if the number of sites some distance away from where we start grows rapidly as the distance grows, then there isn’t enough ‘pull’ back to visit the sites near where we start infinitely often. Extending this argument, it is easier for a BRW to be recurrent, as we have the option to make the branching rate large, which means that there are lots of particles at large times, hence more possibility for visiting everywhere. Note that if the offspring distribution is subcritical, we don’t stand a chance of having interesting properties. If we ignore the random walk part, we just have a subcritical Galton-Watson process, which dies out almost surely.

We need a measure of the concept discussed in the heuristic for how fast the number of vertices in the graph grows as we consider bands of vertices further and further away from the starting vertex. The standard measure for this is the spectral radius, which is defined not in terms of number of vertices, but through the limiting probability of returning to a fixed vertex at large time n. Precisely

\rho:= \limsup \mathbb{P}_i(X_n=i)^{1/n},

so in some approximation sense

\mathbb{P}_i(X_n=i)\sim \rho^{n},

which explains why \rho\le 1. Note that by considering the sum of such terms, if simple random walk on G is recurrent, then \rho=1, but the converse does not hold. (Consider SRW on \mathbb{Z}^3 for example.)
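To get a feel for \rho, here is a sketch which estimates it for SRW on a regular tree, where the distance from the root is itself a birth-and-death chain, so return probabilities can be computed exactly by dynamic programming. The function names are my own. The comparison value 2\sqrt{d-1}/d for the d-regular tree is a standard fact that I am quoting rather than deriving, and the case d=2 is just SRW on \mathbb{Z}.

```python
import math


def tree_return_probabilities(degree, n_steps):
    """Return [p_00^(0), ..., p_00^(n_steps)] for SRW on the degree-regular tree."""
    up = (degree - 1) / degree   # probability of moving away from the root
    down = 1 / degree            # probability of moving towards the root
    dist = [0.0] * (n_steps + 2)  # dist[k] = P(distance from root is k)
    dist[0] = 1.0
    returns = [1.0]
    for _ in range(n_steps):
        new = [0.0] * (n_steps + 2)
        new[1] += dist[0]  # from the root, every step moves away
        for k in range(1, n_steps + 1):
            if dist[k]:
                new[k + 1] += up * dist[k]
                new[k - 1] += down * dist[k]
        dist = new
        returns.append(dist[0])
    return returns


def spectral_radius_estimate(degree, n=400):
    """Estimate rho from the ratio of successive even-time return probabilities."""
    p = tree_return_probabilities(degree, n)
    return math.sqrt(p[n] / p[n - 2])


print(spectral_radius_estimate(2))                      # SRW on Z
print(spectral_radius_estimate(3), 2 * math.sqrt(2) / 3)  # 3-regular tree
```

I have used the ratio p_{00}^{(n)}/p_{00}^{(n-2)} rather than the n-th root, because polynomial corrections to the exponential decay make the n-th root converge rather slowly.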

It’s also worth remarking that \rho is a class property. In particular, for a connected graph, the value of \rho is independent of i. This is not surprising, as if d is the graph distance between vertices i and j, then

p_{ii}^{(n)}\ge p_{ij}^{(d)}p_{jj}^{(n-2d)}p_{ji}^{(d)},

and vice versa, which enables us to sandwich usefully for the limits.

Really, \rho is a function of the transition matrix P. In fact, we can be more specific, by considering diagonalising P. The only case we care about is when P is infinite, so this is not especially nice, but it makes it clear why p_{ii}^{(n)} decays like |\rho|^n where \rho is the largest eigenvalue of P. Indeed this is an alternative definition of the spectral radius. Note that Perron-Frobenius theory (which seems to keep coming up on the blog this week…) says that since |\rho|\le 1, then if |\rho|=1, we must have \rho=1. So the spectral radius being 1 is precisely equivalent to having an invariant measure. We don’t know whether we can normalise it, but P-F guarantees the relevant left-eigenvector is non-negative, and hence a measure.

Next we give this situation a name. Say that a random walk is amenable if \rho(P)=1. We can extend this property to say that a graph is amenable if SRW on it is amenable.

This is not the standard definition of amenability. This property is originally defined (by von Neumann) in the context of groups. A group G is said to be amenable if there exists a (finitely additive) left-invariant probability measure on G, ie \mu such that

\forall A\subset G, \forall g\in G, \mu(gA)=\mu(A).

The uniform distribution shows that any finite group is amenable.

It turns out that in general there are several conditions for a group which are equivalent to amenability. One is that, given G finitely generated by B, the Cayley graph for G with edges given by elements of B does not satisfy a strong isoperimetric inequality. Such an inequality is an alternative way of saying that the graph grows rapidly. It says that the size of the boundary of a subset of the vertices is uniformly large relative to the size of the set. Precisely, there exists a constant c>0 such that whenever U is a finite subset of the vertices, we have |\partial U|\ge c|U|. (Note that finiteness of U is important – we would not expect results like this to hold for very large subsets.)

Kesten proved that it is further equivalent to the statement that simple random walk on Cay(G,B) is amenable in our original sense. This technical and important result links the two definitions.

We finish by stating the main classical result in BRW, which is a precise condition for transience. As motivated earlier, the rate of branching and the spectral radius have opposing effects on whether the system is recurrent or transient. Note that at some large time n, the expected number of particles which have returned to the starting vertex is given by the expected number of particles in the system multiplied by the probability that any one of them is back at its origin, ie \sim \mu^n\rho^n (abusing notation to write \mu for the mean of the offspring distribution as well). So the probability that there is a particle back at the origin at this time is (crudely transferring from expectation to probability) 1\wedge (\mu \rho)^n. We can conclude that the BRW is recurrent if \mu > \rho^{-1} and transient if \mu<\rho^{-1}. This result is due to Benjamini and Peres.

The remaining case, when \mu=\rho^{-1}, is called, unsurprisingly, critical BRW. It was proved in ’06 by Gantert and Müller that, in fact, all critical BRWs are transient too. This must exclude the amenable case, as we could think of SRW on \mathbb{Z} as a critical BRW by taking the branching distribution to be identically one, as the spectral radius is also 1.

In the end, the material in this post is rather preliminary to the work presented in EC’s talk, which concerned the trace of BRW, and whether there are infinitely many essentially different paths to infinity taken by the particles of the BRW. They show that this holds in a broad class of graphs with symmetric properties.


Random Interlacements

In this post, I want to talk about another recently-introduced model that’s generating a lot of interest in probability theory, Sznitman’s model of random interlacements. We also want to see, at least heuristically, how this relates to more familiar models.

We fix our attention on a lattice, which we assume to be \mathbb Z ^d. We are interested in the union of an infinite collection of simple random walks on the lattice. The most sensible thing to consider is not a collection of random walks started from a random set of starting points, but rather a family of trajectories, that is, doubly-infinite random walks defined on times (-\infty,\infty). We will want this family to have some obvious properties, such as translation invariance, in order to make analysis possible and ideally obtain some 0-1 laws. The natural thing to do is then to choose the trajectories through a Poisson Point Process. The tricky part will be finding an intensity measure that has all the properties we want, and gives trajectories that genuinely do look like SRWs, and, most importantly, have a union that is neither too sparse nor too dense. For example, it wouldn’t be very interesting if with high probability every point appeared in the union…

For reasons we will mention shortly, we are interested in the complement of the union of the trajectories. We call this the vacant set. We will find an intensity which we can freely scale by some parameter u\in\mathbb{R}^+, which will give us a threshold for the complement to contain an infinite component. This is in the same sense as the phase transition for Bernoulli percolation. That is, there is a critical value u^* say, such that for u<u^* the vacant set contains an infinite component (or percolates) almost surely, and almost surely it does not when u>u^*. A later result of Teixeira shows that, as in percolation, this infinite component is unique.

Let us first recall why it is not interesting to consider this process for d=1 or 2. On \mathbb{Z}, a single SRW almost surely hits every integer point, trivially, since it visits arbitrarily large positive and negative integers. For d=2, the SRW is recurrent, and so consists of a countably infinite sequence of excursions from (0,0). Note that the probability that an excursion from 0 hits some point (x,y) is non-zero, as it is at least 2^{-2(|x|+|y|)} for example. Therefore, almost surely the SRW hits (x,y), and so almost surely it hits every point.

Therefore it is only for d\ge 3 that we start seeing interesting effects. It is worth mentioning at this point some of the problems that motivated considering this model. First is the disconnection time of a discrete cylinder by a simple walk. For example, Sznitman considers the random walk on \mathbb{Z}\times (\mathbb{Z}/N\mathbb{Z})^d. Obviously, it is more interesting to consider how long it takes a (1-dimensional in the natural sense) path to disconnect a d\ge 3 dimensional set than a 2-dimensional one, as the latter is given just by the first time the path self-intersects.

More generally, we might be interested in random walks up to some time an order of magnitude smaller than the cover time. Recall the cover time is the time to hit each point of the set. For example, for the random walk on the d-dimensional torus (\mathbb{Z}/N\mathbb{Z})^d the cover time (as discussed in Markov Chains and Mixing Times posts) is N^d \log N, but the log N represents in some sense only the ‘final few’ vertices. So we should ask what the set of unhit vertices looks like at time N^d. And it turns out that for large N, the structure of this vacant set is related to the vacant set in the random interlacement model, in a local sense.

Anyway, the main question to ask is: what should the intensity measure be?

We patch it together locally. Start with the observation that transience of the random walk means almost surely a trajectory spends only finitely many steps in a fixed finite set K. So we index all the trajectories which hit K by the first time they hit K. Given that a trajectory hits K, it is clear what the conditional distribution of this hitting point should be. Recall that SRW on \mathbb{Z}^d is reversible, so we consider the SRW backwards from this hitting time. Then the probability that the hitting point is x (on the boundary of K) is proportional to the probability that a SRW started from x goes to infinity without hitting K again. So once we’ve settled on the distribution of the hitting point x, it is clear how to construct all the trajectories through K. We pick x on the boundary of K according to this distribution, and take the union of an SRW starting from x conditioned not to hit K again, and an SRW starting from x with no conditioning. These correspond to the trajectory before and after the hitting time, respectively.

In fact, it turns out that this is enough. Suppose we demand that the measure of the trajectories whose hitting point is x is equal to the probability that a SRW started from x goes to infinity without hitting K again (rather than merely proportional to it). Sznitman proves that there is a unique measure on the set of trajectories that restricts to this measure for every choice of K. Furthermore, the Poisson Point Process with the globally-defined intensity unsurprisingly restricts to a PPP with the intensity specific to K.

We have not so far said anything about trajectories which miss this set K. Note that under any sensible intensity with the translation-invariance property, the intensity measure of the trajectories which hit K must be positive, since we can cover \mathbb{Z}^d with countably many copies of K. So the number of trajectories hitting K is a Poisson random variable.

Recall how we defined the probability that the hitting point of K was some point x on the boundary. The sum of these probabilities is called the capacity of K. It follows that, up to the scaling factor u, this is the parameter of the Poisson random variable. Ie, the probability that no trajectory passes through K is:

\exp(-u\mathrm{cap}(K)),

recalling that u is the free parameter in the intensity. This is the most convenient framework through which to start analysing the probability that there is an infinite connected set which is hit by no trajectory.

We conclude by summarising Sznitman’s Remark 1.2, explaining why it is preferable to work with the space of trajectories rather than the space of paths. Note that if we are working with paths, and we want translation invariance, then this restricts to translation invariance of the distribution of starting points as well, so it is in fact a stronger condition. Note then that either the intensity of starting at 0 is zero, in which case there are no trajectories at all, or it is positive, in which case the set of starting points looks like Bernoulli site percolation.

However, the results about capacity would still hold if there were a measure that restricted satisfactorily. And so the capacity of K would still be the measure of paths hitting K, which would be at least the probability that the path was started in K. But by translation invariance, this grows linearly with |K|. But capacity grows at most as fast as the size of the set of boundary points of K, which will be an order of magnitude smaller when K is, for example, a large ball.

REFERENCES

This was mainly based on

Sznitman – Vacant Set of Random Interlacements and Percolation (0704.2560)

Also

Sznitman – Random Walks on Discrete Cylinders and Random Interlacements (0805.4516)

Teixeira – On the Uniqueness of the Infinite Cluster of the Vacant Set of Random Interlacements (0805.4106)

and some useful slides by the same author (teixeira.pdf)

Invariant Distributions of Markov Chains

My lecture course in Linyi was all about Markov chains, and we spent much of the final two sessions discussing the properties of invariant distributions. I was not surprised, however, that none of the class chose this topic as the subject for a presentation to give after the end of the teaching week. One of the main problems is that so many rather similar properties are introduced roughly simultaneously. As we did in the class, I thought it was worth making some sort of executive summary, as a mixture of revision and interest.

Definition: \pi is an invariant measure if \pi P=\pi. If in addition \sum_{i\in I}\pi_i=1, then we say it is an invariant distribution. Of course, if I is finite, then any invariant measure can be normalised to give an invariant distribution.

The key initial questions are about existence and uniqueness. First, if there are multiple communicating classes, then an invariant measure (resp. distribution) is a linear (resp. affine) combination of the invariant measures / distributions on each (closed) class. So we restrict attention to irreducible Markov chains.

In the finite case, P is a stochastic matrix so has a column eigenvector with eigenvalue 1, namely the vector with all entries equal to 1. Thus, by reference to general theory in linear algebra, P has a row eigenvector \pi with eigenvalue 1. To paraphrase a remark made by one of my students, what is not clear is that this should be a measure. Demonstrating that this is true is rather non-trivial I think, normally done by reference to the rather more general Perron-Frobenius theorem, though on the flight home I came up with a short argument using Lagrangian duality. For now, we accept existence in the finite case, and note that we typically show existence by showing that the vector of expected time spent in each state between successive visits to a fixed reference state satisfies the properties of an invariant measure.

This is a good moment to note that recurrence is not a necessary condition for the existence of an invariant measure. For example, the random walk on \mathbb{Z}^3 is transient, but the uniform measure is invariant. However, it is not a sufficient condition for the existence of an invariant distribution either. (Of course, an irreducible finite chain is always recurrent, and always has an invariant distribution, so now we are considering only the infinite state space case.) The random walk on \mathbb{Z}^2 is recurrent, but the invariant measure is not normalisable.

The property we in fact need is positive recurrence. This says that the expected return time to each point is finite. Again, this is a class property. This is a common requirement in probabilistic arguments: almost surely finite is often not strong enough to show results if the expectation is infinite (see for example the various requirements for the optional stopping theorem). If this holds, then \pi_i=\frac{1}{\mathbb{E}_i T_i}, where T_i is the return time to i, and the chain is started from i\in I.
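For a small finite chain, both sides of this identity can be computed directly. The sketch below uses a three-state example of my own choosing (a lazy walk on a path), finds \pi by repeatedly multiplying by P, and finds the expected return times by iterating the usual first-step equations for hitting times. The function names and iteration counts are arbitrary, and the power iteration relies on this particular chain being aperiodic.

```python
EXAMPLE_P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]


def invariant_distribution(P, n_iter=200):
    """Approximate pi with pi P = pi by power iteration from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def expected_return_time(P, i, n_iter=2000):
    """Approximate E_i[T_i] via the first-step equations for hitting i."""
    n = len(P)
    h = [0.0] * n  # h[j] approximates the expected time to hit i from j
    for _ in range(n_iter):
        h = [
            0.0 if j == i
            else 1.0 + sum(P[j][k] * h[k] for k in range(n) if k != i)
            for j in range(n)
        ]
    # Take one step from i, then wait to hit i from wherever we land.
    return 1.0 + sum(P[i][k] * h[k] for k in range(n) if k != i)


pi = invariant_distribution(EXAMPLE_P)
for i in range(3):
    print(i, pi[i], 1 / expected_return_time(EXAMPLE_P, i))
```

For this example \pi=(1/4,1/2,1/4), and the expected return times are 4, 2 and 4 respectively.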

The final question is ‘Why are we interested?’ One of the best answers is to look at convergence properties. A simple suggestion is this: if we start in equilibrium, then X_0,X_1,X_2,\ldots are all equal in distribution. Note that the dependence structure remains complicated, and much much more interesting than the individual distributions. Next, we observe that a calculation of n-step transition probabilities for a finite chain will typically involve a linear combination of nth powers of eigenvalues. One of the eigenvalues is 1, and the others typically have modulus strictly less than 1. We observe in examples that the constant coefficient in p_{ij}^{(n)} is generally a function of j alone, and so p_{ij}^{(n)}\rightarrow\lambda_j, some distribution on I. By considering P^{n+1}=P\cdot P^n, it is easy to see that if this converges, (\lambda_j)_{j\in I} is an invariant distribution. The classic examples which do not work are

P=\begin{pmatrix}0&1\\1&0\end{pmatrix} and P=\begin{pmatrix}0&1&0\\ 0&0&1\\1&0&0\end{pmatrix},

as then the distribution of X_n is a function of the remainder of n modulo 2 or 3, respectively, alone. With a little thought, we can give a precise classification of such chains which force you to be in particular proper subsets of the state space at regular times n. Chains without this property are called aperiodic, and we can show that distributions for such chains converge to the equilibrium distribution as n\rightarrow\infty.
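The failure of convergence for these two matrices is easy to see by taking powers. A minimal sketch in plain Python, with helper names of my own, and integer entries so that the arithmetic is exact:

```python
CYCLE2 = [[0, 1], [1, 0]]
CYCLE3 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, n):
    """P to the power n, for an integer n >= 0."""
    size = len(P)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


for n in range(1, 7):
    # First rows of P^n: the distribution of X_n started from state 0.
    print(n, mat_pow(CYCLE2, n)[0], mat_pow(CYCLE3, n)[0])
```

The rows cycle with period 2 and 3 respectively, so p_{ij}^{(n)} has no limit, even though the uniform distribution is invariant for both chains.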

Recurrence and Transience of BM

In this post, we consider Brownian motion as a Markov process, and consider the recurrence and transience properties in several dimensions. As motivation, observe from Question 5 of this exam paper that it is very much a non-trivial operation to show that the path of Brownian motion in two dimensions almost surely has zero Lebesgue measure. We would expect this to be true by default, as we visualise BM as a curve. So it is interesting to see how much we can deduce without significant technical analysis. We will make use of Ito’s formula. This material comes from the Part III course Advanced Probability, which doesn’t explicitly mention Ito’s result, and instead proves the result required separately, making use of the nature of solutions to the diffusion equation. In this context we assume that for f\in C_b^{1,2}:

M_t:=f(t,B_t)-f(0,B_0)-\int_0^t(\frac{\partial}{\partial t}+\frac12\Delta)f(s,B_s)ds

is a martingale. Of course, precisely, from Ito’s formula, this can be expressed as the stochastic integral of a bounded function with respect to Brownian motion, which is therefore a (continuous local, but bounded) martingale.

d=1: In one dimension, BM is point-recurrent. This means that almost surely, BM returns to zero infinitely many times. This is most easily shown by using the time-reversal equivalence to deduce that \limsup_{t\to\infty} B_t=-\liminf_{t\to\infty} B_t=\infty.

d=2: BM in two dimensions is point-transient. That means that the probability of returning to a given point is <1. In fact it is 0, as one might suspect from the fact that BM is space-invariant and, intuitively at least, has measure 0. However, it is neighbourhood-recurrent, meaning that it almost surely returns to a ball around a given point infinitely often. We discuss small balls around 0, but obviously the conclusions apply equally well elsewhere.

The aim is to choose a function so that the expression in Ito’s formula as above is as simple as possible. Taking f a function of space alone and harmonic causes the integral term to vanish. In this case, f(y)=\log|y| will suffice. Obviously we have to restrict attention to \epsilon\leq |y|\leq R. We stop M at T_\epsilon\wedge T_R, that is the first time that the BM hits the boundary of the annulus on which f is defined, and apply OST, since \log|B_t| is bounded here and the stopping time is a.s. finite. We obviously have to assume the BM starts in this annulus, but then we obtain:

\mathbb{E}_x\log|B_{T_\epsilon\wedge T_R}|=\log|x|

and so we can consider the two possibilities for B_{T_\epsilon\wedge T_R} to deduce:

\mathbb{P}_x(T_\epsilon<T_R)=\frac{\log R-\log|x|}{\log R-\log\epsilon}

Now let \epsilon\downarrow 0 to see that \mathbb{P}_x(B_t=0,\text{ some }t>0)=0. Now apply the (weak) Markov property at a small fixed time a, to deduce, with a mild abuse of notation:

\mathbb{P}_0(B_t=0,\text{ some }t>a)=\int \mathbb{P}_x(B_t=0,\text{ some }t>0)\mathbb{P}_0(B_a\in dx)=0

as the first term in the integral we have shown to be 0 B_a-a.e. Then let a\downarrow 0 to obtain the result about point-transience.

For neighbourhood recurrence, instead let R\uparrow\infty, so \mathbb{P}_x(T_\epsilon<\infty)=1. As before, we can integrate over the law of B_n to obtain

\mathbb{P}_0(|B_t|\leq \epsilon,\text{ some }t\geq n)=1

which is precisely what we require for neighbourhood-recurrence.
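It is worth plugging some numbers into the two-dimensional hitting probability, because both limits are only approached logarithmically slowly. A small sketch of the formula, with a function name and arguments of my own choosing:

```python
import math


def hit_inner_first_2d(x_norm, eps, R):
    """P_x(T_eps < T_R) for planar BM started at distance x_norm, eps <= x_norm <= R."""
    return (math.log(R) - math.log(x_norm)) / (math.log(R) - math.log(eps))


# Point-transience: with R fixed, the probability tends to 0 as eps -> 0.
for eps in (1e-3, 1e-30, 1e-300):
    print(eps, hit_inner_first_2d(1.0, eps, 10.0))

# Neighbourhood-recurrence: with eps fixed, it tends to 1 as R -> infinity.
for R in (1e3, 1e30, 1e300):
    print(R, hit_inner_first_2d(1.0, 0.1, R))
```

Even for an outer radius of 10^{300}, there is still a chance of about a third of one percent of leaving the large ball before reaching the ball of radius 0.1, starting from distance 1.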

d\ge 3: BM is transient. That is, |B_t|\rightarrow\infty a.s. Note that for d>3, the first three components have the same distribution as BM in three dimensions, and so it suffices to consider the case d=3.

Here, the correct choice of harmonic function is f(y)=\frac{1}{|y|}, so conclude as before that

\mathbb{P}_x(T_\epsilon<T_R)=\frac{|x|^{-1}-R^{-1}}{\epsilon^{-1}-R^{-1}}

From this, we can take a limit to see that

\mathbb{P}_x(T_\epsilon<\infty)\leq \frac{\epsilon}{|x|}
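The choices f(y)=\log|y| in two dimensions and f(y)=\frac{1}{|y|} in three dimensions were both made so that \Delta f=0 away from the origin. This can be checked numerically with a central finite-difference Laplacian. The following is only a sketch: the step size h, the test points and the function names are arbitrary choices of mine.

```python
import math


def laplacian(f, y, h=1e-3):
    """Central finite-difference approximation to the Laplacian of f at the point y."""
    total = 0.0
    for i in range(len(y)):
        plus = list(y)
        minus = list(y)
        plus[i] += h
        minus[i] -= h
        total += (f(plus) + f(minus) - 2 * f(y)) / (h * h)
    return total


def norm(y):
    return math.sqrt(sum(v * v for v in y))


def log_norm(y):
    return math.log(norm(y))


def inverse_norm(y):
    return 1 / norm(y)


print(laplacian(log_norm, [3.0, 4.0]))           # d = 2: approximately 0
print(laplacian(inverse_norm, [1.0, 2.0, 2.0]))  # d = 3: approximately 0
print(laplacian(inverse_norm, [3.0, 4.0]))       # d = 2: not harmonic
```

The last line shows that the three-dimensional choice does not work in the plane: there the Laplacian of \frac{1}{|y|} is |y|^{-3}, which is 1/125 at the point (3,4).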

We deploy a neat trick to lift this result to a global statement about transience. Define the events that the modulus never returns to n after hitting n^3

A_n:=\{|B_t|>n\quad \forall t\geq T_{n^3}\}

Calculate

\mathbb{P}_0(A_n^c)\stackrel{\text{SMP}}{=}\mathbb{E}_0[\mathbb{P}_{B_{T_{n^3}}}(T_n<\infty)]=\mathbb{E}_0[\frac{1}{n^2}]=\frac{1}{n^2}

Applying Borel-Cantelli 1, A_n eventually holds almost surely, which certainly implies the desired result.