An obvious remark


An obvious remark:

If a sequence of independent random variables X_n converges almost surely to some limit X, this limit must be a constant (almost surely).

I’ve been thinking about the Central Limit Theorem and related Large Deviations results this afternoon, and wasted almost an hour worrying about situations which were effectively well-disguised special cases of the above.

Why is it true? Well, take \mathcal{F}_n=\sigma(X_1,\ldots,X_n). The limit variable X is determined by (X_m)_{m>n}, so by independence it is independent of \mathcal{F}_n for all n. The union \cup_n\mathcal{F}_n is closed under finite intersections, so X is independent of \mathcal{F}_\infty:=\sigma(\cup_n \mathcal{F}_n)\supset \sigma(X). If X is independent of itself, it must be almost surely constant.

Recurrence and Transience of BM

In this post, we consider Brownian motion as a Markov process, and consider the recurrence and transience properties in several dimensions. As motivation, observe from Question 5 of this exam paper that it is a very much non-trivial operation to show that Brownian motion in two-dimensions almost surely has zero Lebesgue measure. We would expect this to be true by default, as we visualise BM as a curve. So it is interesting to see how much we can deduce without significant technical analysis. We will make use of Ito’s formula. This material comes from the Part III course Advanced Probability, which doesn’t explicitly mention Ito’s result, and instead proves the result required separately, making use of the nature of solutions to the diffusion equation. In this context we assume that for f\in C_b^{1,2}:

M_t:=f(t,B_t)-f(0,B_0)-\int_0^t(\frac{\partial}{\partial t}+\frac12\Delta)f(s,B_s)ds

is a martingale. Of course, precisely, from Ito’s formula, this can be expressed as the stochastic integral of a bounded function with respect to Brownian motion, which is therefore a (continuous local, but bounded) martingale.

d=1: In one dimension, BM is point-recurrent. This means that almost surely, BM returns to zero infinitely many times. This is most easily shown by using the time-reversal equivalence to deduce that \limsup B_t=-\liminf B_t=\infty.

d=2: BM in two dimensions is point-transient. That means that the probability of returning to a given point is <1. In fact it is 0, as one might suspect from the fact that BM is space-invariant and, intuitively at least, has measure 0. However, it is neighbourhood-recurrent, meaning that it almost surely returns to a ball around a given point infinitely often. We discuss small balls around 0, but obviously the conclusions apply equally well elsewhere.

The aim is to choose a function so that the expression in Ito’s formula as above is as simple as possible. Taking f a function of space alone and harmonic causes the integral term to vanish. In this case, f(y)=\log|y| will suffice. Obviously we have to restrict attention to \epsilon\leq |y|\leq R. We stop M at T_\epsilon\wedge T_R, that is the first time that the BM hits the boundary of the annulus on which f is defined, and apply OST, since \log|B_t| is bounded here and the stopping time is a.s. finite. We obviously have to assume the BM starts in this annulus, but then we obtain:

\mathbb{E}_x\log|B_{T_\epsilon\wedge T_R}|=\log|x|

and so we can consider the two possibilities for B_{T_\epsilon\wedge T_R} to deduce:

\mathbb{P}_x(T_\epsilon<T_R)=\frac{\log R-\log|x|}{\log R-\log\epsilon}

Now let \epsilon\downarrow 0 to see that \mathbb{P}_x(T_0<T_R)=0 for every R, and then let R\uparrow\infty to see that \mathbb{P}_x(B_t=0,\text{ some }t>0)=0 for x\neq 0. Now apply the (weak) Markov property at a small fixed time a, to deduce, with a mild abuse of notation:

\mathbb{P}_0(B_t=0,\text{ some }t>a)=\int \mathbb{P}_x(B_t=0,t>0)\mathbb{P}_0(B_a=dx)=0

as the first term in the integral we have shown to be 0 B_a-a.e. Then let a\downarrow 0 to obtain the result about point-transience.

For neighbourhood recurrence, instead let R\uparrow\infty, so \mathbb{P}_x(T_\epsilon<\infty)=1. As before, we can integrate over the law of B_n to obtain

\mathbb{P}_0(|B_t|\leq \epsilon,\text{ some }t\geq n)=1

which is precisely what we require for neighbourhood recurrence.
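The two limits used above can be tabulated directly from the formula for \mathbb{P}_x(T_\epsilon<T_R). This is only a numerical sketch: the function name prob_hit_inner_first and the sample radii are illustrative choices, not part of the original argument.

```python
import math


def prob_hit_inner_first(x_norm, eps, R):
    """P_x(T_eps < T_R) for planar BM started at radius x_norm in [eps, R]."""
    if not (0 < eps <= x_norm <= R) or eps == R:
        raise ValueError("need 0 < eps <= |x| <= R with eps < R")
    return (math.log(R) - math.log(x_norm)) / (math.log(R) - math.log(eps))


# Shrink the inner ball with R fixed: the probability decays to 0,
# which is the point-transience statement.
for eps in (1e-2, 1e-8, 1e-32):
    print("eps =", eps, "->", prob_hit_inner_first(1.0, eps, 10.0))

# Grow the outer ball with eps fixed: the probability climbs to 1,
# which is the neighbourhood-recurrence statement.
for R in (1e2, 1e8, 1e32):
    print("R =", R, "->", prob_hit_inner_first(1.0, 0.1, R))
```

Note how slowly both limits are approached: the convergence is only logarithmic in \epsilon and R.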

d\geq 3: BM is transient. That is, |B_t|\rightarrow\infty a.s. Note that for d>3, the first three components have the same distribution as BM in three dimensions, and so it suffices to consider the case d=3.

Here, the correct choice of harmonic function is f(y)=\frac{1}{|y|}, so we conclude as before that

\mathbb{P}_x(T_\epsilon<T_R)=\frac{|x|^{-1}-R^{-1}}{\epsilon^{-1}-R^{-1}}

From this, we can take a limit to see that

\mathbb{P}_x(T_\epsilon<\infty)\leq \frac{\epsilon}{|x|}

We deploy a neat trick to lift this result to a global statement about transience. Define the events that the modulus never returns to n after hitting n^3

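A_n:=\{|B_t|>n\quad \forall t\geq T_{n^3}\}

By the strong Markov property at T_{n^3}, together with the bound above applied with \epsilon=n and |x|=n^3, we obtain a bound which is summable in n:

\mathbb{P}(A_n^c)\leq \frac{n}{n^3}=\frac{1}{n^2}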

Applying Borel-Cantelli 1, A_n eventually holds almost surely, which certainly implies the desired result.
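To see why Borel-Cantelli applies, here is a small numerical sketch. It assumes that the bound \epsilon/|x| is applied at the hitting time of radius n^3 with target radius n (via the strong Markov property); the function names are illustrative.

```python
import math


def hit_bound(eps, x_norm):
    """Upper bound eps/|x| on P_x(T_eps < infinity) for BM in d = 3, |x| >= eps."""
    if not (0 < eps <= x_norm):
        raise ValueError("need 0 < eps <= |x|")
    return eps / x_norm


def complement_bound(n):
    """Bound on P(A_n^c): restart from radius n^3 and ask to return to radius n."""
    return hit_bound(n, n ** 3)


# The bounds 1/n^2 are summable, which is what Borel-Cantelli 1 needs.
partial_sum = sum(complement_bound(n) for n in range(1, 10001))
print(partial_sum, "<", math.pi ** 2 / 6)
```

The choice of n^3 is what makes the series converge; restarting from radius n^2 would only give the bound 1/n, which is not summable.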

Martingale Convergence

I continue the theme of explaining important bookwork from the Part III course, Advanced Probability, as succinctly as possible. In this post, we consider the convergence properties of discrete time martingales.

1) Theorem: Assume X is a martingale bounded in L^1. Then

X_n\rightarrow X_\infty\in L^1(\mathcal{F}_\infty) a.s.

Remark: The theorem and proof work equally well for X a supermartingale.

Proof: Essentially, we want to reduce convergence of the random variables which make up the martingale to a countable collection of events. We do this by counting upcrossings, that is, the number of times the process passes from below a given interval to above it. The formal definition will be too wide for this format, so we summarise as

N_n([a,b],X)= the number of disjoint time intervals up to time n in which X goes from [-\infty,a) to (b,\infty]. Define N([a,b],X) to be the limit as n increases to infinity.
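For a finite path, the count N_n([a,b],X) can be computed with a single pass. The following is a minimal sketch of that definition; the function name count_upcrossings and the sample paths are illustrative.

```python
def count_upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] by the finite sequence `path`.

    An upcrossing is completed when the path, having been strictly below a,
    next becomes strictly greater than b.
    """
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False  # True once the path has dropped below a since the last upcrossing
    for x in path:
        if not below and x < a:
            below = True
        elif below and x > b:
            count += 1
            below = False
    return count


print(count_upcrossings([0, 2, 0, 2, 0], 0.5, 1.5))
print(count_upcrossings([2, 0, 1, 0, 2], 0.5, 1.5))
```

In the first path the sequence drops below a three times but only completes two upcrossings, since the final excursion below a never climbs back above b.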

It is a genuinely easy check that a sequence converges (possibly to \infty) iff this number of upcrossings of any interval with rational bounds is finite. We will show that the martingale almost surely has this property. The key lemma is a bound due to Doob:

Lemma: (b-a)\mathbb{E}[N_n([a,b],X)]\leq \mathbb{E}[(X_n-a)^-]

 Proof: Say S_1<T_1<S_2<T_2,\ldots are the successive hitting times of [-\infty,a),(b,\infty] respectively. So N_n=\sup\{k:T_k\leq n\}. We decompose, abbreviating the number of upcrossings as N.

\sum_{k=1}^n(X_{T_k\wedge n}-X_{S_k\wedge n})=\sum_{k=1}^N(X_{T_k}-X_{S_k})+(X_n-X_{S_{N+1}})1_{\{S_{N+1}\leq n\}}

Now take an expectation of both sides, applying the Optional Stopping Theorem to the bounded stopping times on the LHS. (If we are working with a supermartingale, formally we need to take \mathbb{E}[\cdot|\mathcal{F}_{S_k}] of each summand on LHS to show that they are non-positive, and so taking a further expectation over \mathcal{F}_{S_k} gives the required result.) Since each completed upcrossing contributes at least b-a to the first sum on the RHS, we obtain:

0\geq (b-a)\mathbb{E}N+\mathbb{E}(X_n-X_{S_{N+1}})1_{\{S_{N+1}\leq n\}}

If S_{N+1}>n then both 1_{\{S_{N+1}\leq n\}}=(X_n-a)^-=0. Otherwise (X_n-a)^-\geq X_{S_{N+1}}-X_n. This completes the proof of the lemma.

Since \mathbb{E}[(X_n-a)^-]\leq \mathbb{E}|X_n|+|a|<\infty, where this last bound is uniform in n by assumption, applying monotone convergence, we get that N([a,b],X) is almost surely finite for every pair a<b\in\mathbb{Q}. Because this set is countable, we can deduce that this holds almost surely for every pair simultaneously. We therefore define X_\infty(w)=\lim X_n(w) when this limit exists, and 0 otherwise. With probability one the limit exists. Fatou’s lemma confirms that X_\infty\in L^1(\mathcal{F}_\infty).

2) We often want to have convergence in L^1 as well. Recall from Part II Probability and Measure (or elsewhere) that

UI + Convergence in probability is necessary and sufficient for convergence in L^1. In particular, UI + Convergence almost surely is sufficient.

This applies equally well to this situation. Note that for a martingale, this condition is often convenient, because, for example, we know that the set \{\mathbb{E}[X_\infty|\mathcal{F}_n],n\} is UI for any integrable X_\infty.

3) Convergence in L^p is easier to guarantee.

Theorem: Let p>1. i) X a martingale bounded in L^p iff ii) X_n\rightarrow X_\infty in L^p and almost surely iff iii) \exists Z\in L^p s.t. X_n=\mathbb{E}[Z|\mathcal{F}_n] a.s.

Remark: As part of the proof, we will show, as expected, that X_\infty,Z are the same.

Proof: i)->ii) Almost sure convergence follows from the above result applied to the p-th power process. We apply Doob’s inequality about running maxima in a martingale process:

||X_n^*||_p:=||\sup_{m\leq n}|X_m|\,||_p\leq \frac{p}{p-1}||X_n||_p

Now X_n^*\uparrow X_\infty^*:=\sup_k|X_k|, so by monotone convergence and the uniform bound on ||X_n||_p, we see that X_\infty^*\in L^p. Now consider |X_n-X_\infty|\leq 2X_\infty^*\in L^p and use Dominated Convergence to confirm convergence in L^p.

Note that Doob’s L^p inequality can be proven using the same author’s Maximal inequality and Holder.
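Doob’s L^p inequality can be sanity-checked exactly on a small example. The sketch below takes the simple symmetric random walk as an illustrative martingale (an assumption made here, not an example from the course) and enumerates all 2^n equally likely paths, with the running maximum taken of |X_m|.

```python
from itertools import product


def doob_lp_sides(n, p):
    """Return (||X_n^*||_p, p/(p-1) * ||X_n||_p) for simple symmetric random walk.

    X_n^* is the running maximum of |X_m| over m <= n. Both norms are computed
    exactly by averaging over all 2^n equally likely paths.
    """
    if p <= 1:
        raise ValueError("Doob's L^p inequality needs p > 1")
    total_max = 0.0
    total_end = 0.0
    for steps in product((-1, 1), repeat=n):
        position = 0
        running_max = 0
        for step in steps:
            position += step
            running_max = max(running_max, abs(position))
        total_max += running_max ** p
        total_end += abs(position) ** p
    paths = 2 ** n
    lhs = (total_max / paths) ** (1.0 / p)
    rhs = p / (p - 1) * (total_end / paths) ** (1.0 / p)
    return lhs, rhs


lhs, rhs = doob_lp_sides(10, 2)
print(lhs, "<=", rhs)
```

For p=2 and n=10 the right-hand side is 2\sqrt{10}, since \mathbb{E}X_n^2=n for this walk.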

ii)->iii) As we suspected, we show Z=X_\infty is suitable.

||X_n-\mathbb{E}[X_\infty|\mathcal{F}_n]||_p\stackrel{\text{large }m}{=}||\mathbb{E}[X_m-X_\infty|\mathcal{F}_n]||_p\stackrel{\text{Jensen}}{\leq}||X_m-X_\infty||_p\rightarrow 0

iii)->i) is easy. Z bounded in L^p implies X bounded by a simple application of the triangle inequality in the definition of conditional expectation.

Remarkable fact about Brownian Motion #2: Blumenthal’s 0-1 Law and its Consequences

Brownian motion has independent increments, so it is known that it has the Markov property. That is, (B_{t+s}-B_s,t\geq 0) is a Brownian motion independent of \mathcal{F}_s. But in fact we can show that this is independent of \mathcal{F}_s^+=\cap_{t>s}\mathcal{F}_t, the sigma algebra that (informally) deals with events which are determined by the process up to time s and in the infinitesimal period after time s.

Is this surprising? Perhaps. This larger sigma algebra contains, for example, the existence and value of the right-derivative of the process at time s. But the events determined by the local behaviour after time s are clearly also determined by the whole process after time s, so the independence property looks likely to give a 0-1 type law, like in the proof of Kolmogorov’s 0-1 law.

Theorem: (B_{t+s}-B_s,t\geq 0) is independent of \mathcal{F}_s^+.

Proof: Take a sequence of times 0<t_1<\ldots<t_k and A\in\mathcal{F}_s^+. It will suffice to show that the joint law of the new process at these times is independent of the event A. So take F a bounded continuous function. The plan is to approximate s from above by times s_n: since A\in\mathcal{F}_{s_n}, the increments after time s_n are definitely independent of A, and we hope that we have enough machinery to carry the statements through the limit down to s. So take s_n\downarrow s, and then: \mathbb{E}[F(B_{t_1+s}-B_s,\ldots,B_{t_k+s}-B_s)1_A]=\lim_n \mathbb{E}[F(B_{t_1+s_n}-B_{s_n},\ldots,B_{t_k+s_n}-B_{s_n})1_A] as continuity gives a.s. pointwise convergence, and we can lift to expectations by Dominated Convergence. The expectation inside the limit on the right separates by independence as \mathbb{E}[F(B_{t_1+s_n}-B_{s_n},\ldots,B_{t_k+s_n}-B_{s_n})]\mathbb{P}(A). Applying the previous argument in reverse gives that the limit of this is \mathbb{E}[F(B_{t_1+s}-B_s,\ldots,B_{t_k+s}-B_s)]\mathbb{P}(A) as desired.

In particular, this gives Blumenthal’s 0-1 Law, which states that \mathcal{F}_0^+ is trivial. This is apparent by setting s=0 in the above result, because then the process under discussion is the original process, and so \mathcal{F}_0^+ is independent of \mathcal{F}\supset \mathcal{F}_0^+.

A consequence is the following. For a BM in one dimension, let \tau=\inf\{t>0:B_t>0\}, \sigma=\inf\{t>0:B_t<0\}. By the fact that BM is almost surely non-constant on every interval [0,\epsilon], for almost every sample path, at least one of these is 0, and by symmetry \mathbb{P}(\tau=0)=\mathbb{P}(\sigma=0) so these are greater than or equal to 1/2. But it is easy to see that the event \{\tau=0\}\in\mathcal{F}_0^+, and so by the triviality of the sigma-field, this probability must be 1. With continuity, this means that every interval (0,\epsilon) contains a zero of the Brownian motion almost surely. Patching together on rational intervals (so we can use countable additivity) gives that BM is almost surely monotonic on no interval. A similar argument can be used to show that BM is almost surely not differentiable at t=0. For example, the existence and (conditional on existence) value of the derivative at t=0 are determined by \mathcal{F}_0^+ and hence trivial, so by symmetry, either the derivative is a.s. =0, or a.s. doesn’t exist. Ruling out the former option can be done in a few ways. Predictably, in fact it is almost surely differentiable nowhere, but that is probably something to save for another post.

Remarkable fact about Brownian Motion #1: It exists.

A Brownian Motion in one dimension is a stochastic process B_t adapted to (\mathcal{F}_t), defined on some probability space, which is almost surely continuous, and has the properties that B_0=0 a.s. and for every 0\leq s\leq t, B_t-B_s\sim N(0,t-s) and is independent of \mathcal{F}_s. So it has independent increments.

It can be shown that Brownian motion is also almost surely differentiable nowhere, and is invariant under suitable time-space rescaling. It is therefore not obvious that a probability space with rich enough structure exists to construct such a process.

Theorem (Wiener): There exists a Brownian motion on some probability space.

Proof: We will construct Brownian motion on D, the dyadic rationals in [0,1], then develop the machinery which will enable us to conclude that taking a limit onto the reals retains all the properties we need. The Strong Markov property allows us to extend this to the real line by taking countably many copies, and d independent copies will give BM on \mathbb{R}^d.

We make the following observation. Given independent X_1,X_2\sim N(0,1), it is a simple check that (X_1|X_1+X_2=a)\sim N(\frac{a}{2},\frac{1}{2}). From this, it is clear how our construction on D will proceed. Take a family of independent N(0,1) RVs, one for each dyadic rational. Rescale these appropriately to construct the BM on D_{n+1} from the values on D_n. We can check the covariance of the new finer increments, and it is clear that these are independent, provided the original increments were independent.
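The refinement step from D_n to D_{n+1} can be sketched as follows. This is a minimal illustration using only the standard library; the function name levy_dyadic_bm and the seeding convention are choices made here. The scaling comes from the observation above: two independent increments of variance h/2 whose sum is known have conditional variance h/4, so the midpoint is the average of its neighbours plus an independent N(0,h/4) correction.

```python
import random


def levy_dyadic_bm(levels, seed=0):
    """Sample BM on the grid {k 2^-levels : 0 <= k <= 2^levels} in [0, 1].

    Starts from B_0 = 0 and B_1 ~ N(0, 1), then repeatedly fills in midpoints.
    """
    rng = random.Random(seed)
    values = [0.0, rng.gauss(0.0, 1.0)]
    for n in range(levels):
        h = 2.0 ** (-n)  # spacing of the current grid D_n
        refined = []
        for left, right in zip(values, values[1:]):
            # Conditional law of the midpoint: mean (left+right)/2, std sqrt(h)/2.
            mid = 0.5 * (left + right) + 0.5 * h ** 0.5 * rng.gauss(0.0, 1.0)
            refined.extend([left, mid])
        refined.append(values[-1])
        values = refined
    return values


path = levy_dyadic_bm(5, seed=1)
print(len(path), path[:4])
```

Because each refinement keeps the existing values untouched, the sample on D_{n+1} restricted to D_n is exactly the sample on D_n, which is what makes the limiting object well defined on all of D.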

We need continuity. The precise result needed will be dealt with at the end, but it is easy to check that B_d has dyadic increments bounded as \mathbb{E}|B_t-B_s|^p=|t-s|^{p/2}\mathbb{E}|N|^p. The Kolmogorov criterion thus gives that D\ni t\mapsto B_t(\omega) is Holder continuous for any exponent in (0,1/2). Then define: B_t=\lim_{D\ni s\downarrow t} B_s, t\in[0,1], which will also have the a.s. Holder property.

We need to check the increments property. Given some times t_0<t_1<\ldots<t_k, approximate from above by dyadic t_i^n\rightarrow t_i. Then (B_{t_0^n},\ldots, B_{t_k^n})\stackrel{\text{a.s.}}{\rightarrow} (B_{t_0},\ldots, B_{t_k}), so the joint distributions converge also. Using Levy then gives both the Gaussian property and the independence in one go.

Theorem (Kolmogorov’s Criterion): If there exist p,\epsilon>0 such that: \mathbb{E}|X_t-X_s|^p\leq C|t-s|^{1+\epsilon}\; \forall s,t\in D then for every \alpha\in (0,\frac{\epsilon}{p}), X is \alpha-Holder continuous almost surely.

Proof: By Markov and BC, we can deduce from the condition the existence of a RV M such that almost surely: \sup_{n}\max_k 2^{n\alpha}|X_{k2^{-n}}-X_{(k+1)2^{-n}}|\leq M<\infty. Now given dyadic s,t, there is a unique dyadic rational with smallest denominator, say 2^{r}, between them. We can then express the difference t-s as a sum of dyadic reciprocals with denominators at least 2^{r+1}, where each denominator occurs at most twice. So we can write: |X_t-X_s|\leq 2\sum_{n\geq r+1}M2^{-n\alpha}=2M\cdot\frac{2^{-(r+1)\alpha}}{1-2^{-\alpha}}. Then M<\infty a.s. gives the Holder criterion.
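The decomposition of [s,t] into dyadic pieces used in this proof can be made concrete. The sketch below uses a greedy left-to-right splitting, which is one way (assumed here, not taken from the original proof) to realise a decomposition in which each level 2^{-n} is used at most twice; the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def _is_dyadic(q):
    """True if the denominator of the fraction q is a power of two."""
    d = q.denominator
    return d & (d - 1) == 0


def dyadic_pieces(s, t):
    """Split [s, t] into grid cells [k 2^-n, (k+1) 2^-n], greedily from the left.

    Returns a list of triples (n, left endpoint, right endpoint).
    """
    s, t = Fraction(s), Fraction(t)
    if not (_is_dyadic(s) and _is_dyadic(t) and s < t):
        raise ValueError("need dyadic rationals s < t")
    pieces = []
    while s < t:
        n = 0
        # Find the largest cell that starts exactly at s and fits inside [s, t].
        while (s * 2 ** n).denominator != 1 or s + Fraction(1, 2 ** n) > t:
            n += 1
        step = Fraction(1, 2 ** n)
        pieces.append((n, s, s + step))
        s += step
    return pieces


pieces = dyadic_pieces(Fraction(1, 8), Fraction(7, 8))
levels = Counter(n for n, _, _ in pieces)
print(pieces)
print(dict(levels))
```

For s=1/8 and t=7/8 the dyadic rational with smallest denominator between them is 1/2, so r=1, and the pieces indeed use only levels n\geq 2, each at most twice.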

Convergence of Random Variables

The relationship between the different modes of convergence of random variables is one of the more important topics in any introduction to probability theory. For some reason, many of the textbooks leave the proofs as exercises, so it seems worthwhile to present a sketched but comprehensive summary.

Almost sure convergence: X_n\rightarrow X\;\mathbb{P}-a.s. if \mathbb{P}(X_n\rightarrow X)=1.

Convergence in Probability: X_n\rightarrow X in \mathbb{P}-probability if \mathbb{P}(|X_n-X|>\epsilon)\rightarrow 0 for any \epsilon>0.

Convergence in Distribution: X_n\stackrel{d}{\rightarrow} X if \mathbb{E}f(X_n)\rightarrow \mathbb{E}f(X) for any bounded, continuous function f. Note that this definition is valid for RVs defined on any metric space. When they are real-valued, this is equivalent to the condition that F_{X_n}(x)\rightarrow F_X(x) for every point x\in \mathbb{R} where F_X is continuous. It is further equivalent (by Levy’s Convergence Theorem) to its own special case, convergence of characteristic functions: \phi_{X_n}(u)\rightarrow \phi_X(u) for all u\in\mathbb{R}.

Note: In contrast to the other conditions for convergence, convergence in distribution (also known as weak convergence) doesn’t require the RVs to be defined on the same probability space. This thought can be useful when constructing counterexamples.

L^p-convergence: X_n\rightarrow X in L^p if ||X_n-X||_p\rightarrow 0; that is, \mathbb{E}|X_n-X|^p\rightarrow 0.

Uniform Integrability: Informally, a set of RVs is UI if the integrals over small sets tend to zero uniformly. Formally: (X_n) is UI if \sup_{n,A\in\mathcal{F}}\{\mathbb{E}[|X_n|1(A)]|\mathbb{P}(A)\leq \delta\}\rightarrow 0 as \delta\rightarrow 0.

Note: In particular, a single integrable RV is UI, as is any collection of identically distributed integrable RVs. If X~U[0,1] and X_n=n1(X\leq \frac{1}{n}), then the collection is not UI.
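The counterexample can be checked with exact arithmetic. In this sketch the set A_n=\{X\leq \frac{1}{n}\} is the natural choice of small set (an illustrative choice made here), and the function name is hypothetical.

```python
from fractions import Fraction


def mass_on_small_set(n):
    """For X ~ U[0,1] and X_n = n 1(X <= 1/n), with A_n = {X <= 1/n}.

    Returns the pair (P(A_n), E[|X_n| 1(A_n)]) as exact fractions.
    """
    prob = Fraction(1, n)   # P(X <= 1/n) for a uniform variable on [0, 1]
    expectation = n * prob  # X_n equals n on A_n and 0 elsewhere
    return prob, expectation


for n in (10, 1000, 10 ** 6):
    print(n, mass_on_small_set(n))
```

The probability of A_n tends to 0 while the integral of |X_n| over A_n stays equal to 1, so the supremum in the definition cannot tend to 0 as \delta\rightarrow 0.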
