Doob inequalities and Doob-Meyer decomposition

The first post I wrote on this blog was about martingales, way back in 2012, at a time when I had known what a martingale was for about a month. I no longer have this excuse. So I'm going to write about a couple of properties of (discrete-time) martingales that came up while adjusting a proof which my thesis examiners suggested, as part of their corrections, could be made much shorter.

Doob’s submartingale inequality

When we prove that some sequence of processes converges to some other process, we typically want to show that this holds in some sense uniformly over a time-interval, rather than just at some fixed time. We don’t lose much at this level of vagueness by taking the limit process to be identically zero. Then, if the convergent processes are martingales or closely similar, we want to be able to bound \sup_{k\le n} |Z_k| in some sense.

Doob’s submartingale inequality allows us to do this. Recall that a submartingale has almost-surely non-negative conditional increments. You might think of it heuristically as ‘more increasing than a martingale’. If Z_n is a martingale, then |Z_n| is a submartingale. This will be useful almost immediately.

The statement is that for (Z_n) a non-negative submartingale,

\mathbb{P}\left( \sup_{k\le n} Z_k \ge \lambda\right) \le \frac{\mathbb{E}\left[Z_n\right]}{\lambda}.

The similarity of the statement to the statement of Markov’s inequality is no accident. Indeed the proof is very similar. We consider whether the event in question happens, and find lower bounds on the expectation of Z_n under both possibilities.

Formally, for ease of notation, let Z_n^* be the running maximum \sup_{k\le n}Z_k. Then, we let T:= n\wedge \inf\{k: Z_k\ge \lambda\} and apply the optional stopping theorem for submartingales at T, which is by construction at most n. That is,

\mathbb{E}[Z_n]\ge \mathbb{E}[Z_T]=\mathbb{E}\left[Z_T\mathbf{1}_{Z_n^*<\lambda}\right] + \mathbb{E}\left[Z_T \mathbf{1}_{Z_n^*\ge \lambda}\right].

The first of these summands is non-negative, since Z is, and the second is at least \lambda \mathbb{P}\left( Z_n^* \ge \lambda \right), from which the result follows.
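To see the inequality in action, here is a minimal Monte Carlo sketch in Python (the walk, horizon and threshold are arbitrary illustrative choices), using Z_k=|S_k| for S a simple symmetric random walk:

```python
import numpy as np

# Monte Carlo check of Doob's submartingale inequality for Z_k = |S_k|,
# where S is a simple symmetric random walk (so Z is a non-negative
# submartingale). All parameters are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n_steps, n_paths, lam = 100, 100_000, 15.0

steps = rng.choice([-1, 1], size=(n_paths, n_steps))
Z = np.abs(np.cumsum(steps, axis=1))      # Z_1, ..., Z_n along each path

lhs = np.mean(Z.max(axis=1) >= lam)       # estimates P(sup_{k<=n} Z_k >= lambda)
rhs = np.mean(Z[:, -1]) / lam             # estimates E[Z_n] / lambda
print(f"P(sup Z >= {lam}) ~ {lhs:.3f} <= E[Z_n]/lambda ~ {rhs:.3f}")
```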

We’ve already said that for any martingale Z_n, |Z_n| is a submartingale, but in fact f(Z_n) is a submartingale whenever f is convex and \mathbb{E}|f(Z_n)|<\infty for each n. This continues to hold when Z_n is itself a submartingale, provided f is additionally non-decreasing.

[Note that Z_n^* is also a submartingale, but this probably isn’t as interesting.]

A particularly relevant such function f is f(x)=x^p, for p>1. If we take Z_n a non-negative submartingale which is uniformly bounded in L^p, then by applying Hölder's inequality and this submartingale inequality, we obtain

\mathbb{E}\left[ \sup_{k\le n}Z_k^p \right] \le \left(\frac{p}{p-1}\right)^p \mathbb{E}\left[ Z_n^p \right].

Since Z_n^p is a submartingale, the RHS is monotone in n, and certainly the LHS is monotone in n, so we can extend to

\mathbb{E}\left[ \sup_{n} Z_n^p \right] \le \left(\frac{p}{p-1}\right)^p \mathbb{E}\left[ Z_\infty^p \right].

Initially, we have to define \mathbb{E}\left[ Z_\infty^p \right] through this limit; the limit Z_\infty:= \lim Z_n itself exists almost surely by the martingale convergence theorem, since Z_n is bounded in L^1. The extended inequality is Doob's L^p inequality.

Naturally, we will often apply this in the case p=2, and in the third of these three sections, we will see why it might be particularly straightforward to calculate \mathbb{E}\left[Z_\infty^2\right].
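Purely as another illustrative sketch (the same arbitrary walk as above), we can check the case p=2, where the constant is (p/(p-1))^p=4:

```python
import numpy as np

# Empirical check of Doob's L^p inequality with p = 2, for Z_k = |S_k|
# with S a simple symmetric random walk; here E[Z_n^2] = n exactly.
rng = np.random.default_rng(1)
n_steps, n_paths, p = 100, 100_000, 2

Z = np.abs(np.cumsum(rng.choice([-1, 1], size=(n_paths, n_steps)), axis=1))

lhs = np.mean(Z.max(axis=1) ** p)                  # E[(sup_{k<=n} Z_k)^p]
rhs = (p / (p - 1)) ** p * np.mean(Z[:, -1] ** p)  # (p/(p-1))^p E[Z_n^p]
print(f"E[(sup Z)^2] ~ {lhs:.1f} <= 4 E[Z_n^2] ~ {rhs:.1f}")
```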

Remark: as in the case of Markov's inequality, it's hard to say much if the submartingale is not taken to be non-negative. Indeed, this effect can be seen even if the process is only defined for a single time step, in which case the statement is precisely Markov's inequality.

Doob-Meyer decomposition

Unfortunately, most processes are not martingales. Given a discrete-time process X_n adapted to \mathcal{F}=(\mathcal{F}_n), it is a martingale if the conditional expectations of the increments are all almost surely zero. But given a general adapted process X_n which is integrable (so the increments have well-defined finite expectation), we can iteratively construct a new process M_n whose increments are centred versions of X_n's increments. That is,

M_{n+1}-M_n:= X_{n+1}-X_n - \mathbb{E}\left[ X_{n+1}-X_n \,\big|\, \mathcal{F}_n\right] = X_{n+1}-\mathbb{E}\left[X_{n+1} \,\big|\, \mathcal{F}_n\right]. (*)

Then it’s immediately clear from the definition that M_n is a martingale.

There's a temptation to tie oneself up in knots over the dependence structure. The increments of the original process X_n may depend on the current value of the process, and it may not be obvious that we can recover the current value of X_n from the current value of M_n. This is why we demand that everything be adapted, rather than just Markov: M_n need not be Markov, but it clearly is adapted.

Now we look at the middle expression in (*), and in particular the term we are subtracting, namely the conditional expectation. If we define, in the standard terminology, A_0=0 and

A_{n+1}-A_n:= \mathbb{E}\left[ X_{n+1}-X_n \,\big|\, \mathcal{F}_n\right],

then we have decomposed the original process X_n as the sum of a martingale M_n, and this new process A_n. In particular, note that the increment A_{n+1}-A_n given above is measurable with respect to \mathcal{F}_n, which is a stronger condition than the \mathcal{F}_{n+1}-measurability we would expect a priori. This property of the process (A_n) is called predictability (or previsibility).

This decomposition X_n=X_0+M_n+A_n as just defined (with M_0=A_0=0) is called the Doob-Meyer decomposition, and there is a unique such decomposition where M_n is a martingale and A_n is predictable. The proof of uniqueness is very straightforward: we treat the equalities given above as definitions of M_n, A_n, then work in the opposite direction to show that they must hold whenever such a decomposition exists.
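As a concrete sketch of the construction, take X_n=S_n^2 for S a simple symmetric random walk. Then \mathbb{E}[X_{n+1}-X_n|\mathcal{F}_n]=1 exactly, so A_n=n (deterministic, hence trivially predictable) and M_n=S_n^2-n. The simulation below (arbitrary parameters) checks empirically that the increments of M are centred:

```python
import numpy as np

# Doob decomposition sketch for X_n = S_n^2, S a simple symmetric random
# walk: E[X_{n+1} - X_n | F_n] = 1, so A_n = n and M_n = S_n^2 - n.
rng = np.random.default_rng(2)
n_steps, n_paths = 50, 100_000

steps = rng.choice([-1, 1], size=(n_paths, n_steps))
S = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)
X = S ** 2
A = np.arange(n_steps + 1)   # predictable part A_n = n, broadcast over paths
M = X - A                    # candidate martingale part

# Sanity check: each martingale increment should have mean ~ 0.
print("largest |mean M-increment|:", np.abs(np.diff(M, axis=1).mean(axis=0)).max())
```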

I feel a final heuristic is worthwhile, using the term drift, more normally encountered in the continuous-time setting to describe infinitesimal expected increments. The increments of A_n represent the drift of X_n, and the increments of M_n are what remains of X_n after subtracting the drift. In general, the process subtracted to turn a non-martingale into a martingale is called a compensator, and the existence or otherwise of such processes is important but challenging for some classes of continuous-time processes.

In particular, note that when X_n is itself a martingale, then A_n\equiv 0. However, probably the most useful case is when X_n is a submartingale, as then the drift is always non-negative, and so A_n is almost surely increasing. The converse holds too.

This is relevant because the Doob-Meyer decomposition is only a useful tool for treating X_n if we can handle the two processes M_n, A_n easily. We have tools to bound the martingale term, but the previsible term might in general be tricky. The case where X_n is a submartingale is therefore good: increasing processes are much easier to handle than general ones, since in many contexts bounding the whole process reduces to bounding the final term.

Predictable quadratic variation

A particularly relevant example is the square of a martingale, that is X_n=M_n^2, where M_n is a martingale. By the convexity condition discussed earlier, X_n is a submartingale (provided it is integrable, i.e. M_n is square-integrable), and so the process A_n in its Doob-Meyer decomposition is increasing. This is often called the (predictable) quadratic variation of (M_n).

This predictable quadratic variation is sometimes denoted \langle M\rangle_n. This differs from the (regular) quadratic variation, which is defined as the sum of the squares of the increments, that is [M]_n:= \sum_{k=0}^{n-1} (M_{k+1}-M_k)^2. Note that this is adapted, but obviously not previsible. The distinction between these two processes is more important in continuous time. There, they are almost surely equal for a continuous local martingale, but not, for example, for the compensated Poisson process. (There, the PQV is deterministic, indeed linear, while the (R)QV is almost surely equal to the Poisson process itself.) In the discrete-time setting, the regular quadratic variation is not relevant very often, while the predictable quadratic variation is useful, precisely because of this decomposition.

Whenever we have random variables which we then centre, there is a standard trick to apply when treating their variance. That is

A_{n+1}-A_n= \mathbb{E}\left[ M^2_{n+1}-M^2_n \,\big|\, \mathcal{F}_n\right]
= \mathbb{E}\left[ M^2_{n+1}\,\big|\, \mathcal{F}_n\right] - 2M_n^2 +M_n^2
= \mathbb{E}\left[ M^2_{n+1}\,\big|\, \mathcal{F}_n\right] - 2M_n \mathbb{E}\left[ M_{n+1}\,\big|\, \mathcal{F}_n\right] + M_n^2
= \mathbb{E}\left[ \left(M_{n+1}-M_n\right)^2\,\big|\, \mathcal{F}_n\right].

One consequence is seen by taking an ‘overall’ expectation. Because M_n^2-A_n is a martingale,

\mathbb{E}\left[M_n^2\right] = \mathbb{E}\left[M_0^2\right] + \mathbb{E}\left[A_n\right] = \mathbb{E}\left[M_0^2\right] + \sum_{k=0}^{n-1} \mathbb{E}\left[A_{k+1}-A_k\right]
= \mathbb{E}\left[ M_0^2\right] + \sum_{k=0}^{n-1}\mathbb{E}\left[ \left(M_{k+1}-M_k\right)^2 \right]. (**)

This additive (Pythagorean) property of the square of a martingale is useful in applications where there is reasonably good control on each increment separately.
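Here is a Monte Carlo sketch of (**) for a martingale whose increments genuinely depend on the past: the product M_n=Y_1\cdots Y_n of i.i.d. mean-one variables, with Y uniform on [0.5,1.5] an arbitrary choice. The two sides should agree up to Monte Carlo error:

```python
import numpy as np

# Check of the 'Pythagorean' identity (**) for the product martingale
# M_n = Y_1 ... Y_n, with i.i.d. Y_i uniform on [0.5, 1.5] (mean 1), M_0 = 1.
rng = np.random.default_rng(3)
n_steps, n_paths = 20, 200_000

Y = rng.uniform(0.5, 1.5, size=(n_paths, n_steps))
M = np.concatenate([np.ones((n_paths, 1)), np.cumprod(Y, axis=1)], axis=1)

lhs = np.mean(M[:, -1] ** 2)                                       # E[M_n^2]
rhs = np.mean(M[:, 0] ** 2) \
      + np.sum(np.mean(np.diff(M, axis=1) ** 2, axis=0))           # E[M_0^2] + sum E[(dM_k)^2]
print(f"E[M_n^2] ~ {lhs:.3f}   E[M_0^2] + sum E[(M_k+1 - M_k)^2] ~ {rhs:.3f}")
```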

We can also see this final property without the Doob-Meyer decomposition. For a martingale it is not the case that the increments on disjoint intervals are independent. However, following Williams 12.1 [1], increments over disjoint intervals are orthogonal, in the sense that

\mathbb{E}\left[(M_t-M_s)(M_v-M_u)\right]=0,

whenever s\le t\le u\le v. Then, when we square the expression M_n=M_0+\sum_{k=0}^{n-1}(M_{k+1}-M_k) and take expectations, all the cross terms vanish, leaving precisely (**).
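This orthogonality is easy to check numerically even when the increments are far from independent; the sketch below uses the martingale M_n=S_n^2-n, and the times s\le t\le u\le v are arbitrary:

```python
import numpy as np

# Orthogonality of martingale increments over disjoint intervals, for
# M_n = S_n^2 - n (increments NOT independent); s, t, u, v are arbitrary.
rng = np.random.default_rng(4)
n_paths, n_steps = 200_000, 100
s, t, u, v = 10, 30, 50, 90

S = np.cumsum(rng.choice([-1, 1], size=(n_paths, n_steps)), axis=1)
M = S ** 2 - np.arange(1, n_steps + 1)   # column k-1 holds M_k

inc1 = M[:, t - 1] - M[:, s - 1]
inc2 = M[:, v - 1] - M[:, u - 1]
corr = np.mean(inc1 * inc2) / np.sqrt(np.mean(inc1**2) * np.mean(inc2**2))
print("normalised E[(M_t-M_s)(M_v-M_u)] ~", corr)   # should be near 0
```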

References

[1] D. Williams, Probability with Martingales, Cambridge University Press, 1991.

I also followed the notes I made in 2011/12 while attending Perla Sousi’s course on Advanced Probability, and Arnab Sen’s subsequent course on Stochastic Calculus, though I can’t find any evidence online for the latter now.

Working towards the Ito Isometry

We have defined the stochastic integral (H\cdot A)_t for A a continuous adapted process of finite variation, and H a previsible process. We want to extend this to integrals with respect to continuous local martingales. Analogously to defining the Lebesgue integral by extending linearly from indicator functions, we define integrals of simple processes with respect to martingales. A simple process is a linear combination, over disjoint time intervals, of canonical previsible processes of the form Z1_{(s,t]}, with Z bounded and \mathcal{F}_s-measurable. It will be convenient to demand that the martingales we integrate against lie in \mathcal{M}^2. Then the space \mathcal{M}^2 is preserved under this integral operation, that is H\cdot M\in\mathcal{M}^2, and in particular

\mathbb{E}(H\cdot M)_\infty^2\leq ||H||_\infty^2\,\mathbb{E}(M_\infty-M_0)^2.

By Doob's L^2 inequality, X\in\mathcal{M}^2\Rightarrow ||\sup_t|X_t|\,||_2<\infty, so it makes sense to define \mathcal{C}^2 as the set of cadlag adapted processes such that |||X|||:=||\sup_t|X_t|\,||_2<\infty. Of particular interest is that \mathcal{C}^2 is complete: one finds limits along subsequences, and lifts to the whole sequence by Fatou's lemma.

We aim to construct integrals over \mathcal{M}_{c,loc} by taking limits of integrals over \mathcal{M}_c^2. For this space, we have that M^2-[M] is not just \in\mathcal{M}_{c,loc} but in fact a UI true martingale. Then, as a natural extension of the orthogonality trick (remembering that M^2-[M]\in\mathcal{M}_{c,loc}), we obtain

\mathbb{E}[M]_\infty=\mathbb{E}(M_\infty-M_0)^2.

We then define, for M\in\mathcal{M}_c^2, the norm ||\cdot||_M by ||H||_M^2:=\mathbb{E}\left(\int_0^\infty H_s^2\,d[M]_s\right), with L^2(M) the space of H such that this is finite. Then for H\in \mathcal{S}, we have the Ito isometry:

I: (L^2(M),||\cdot||_M)\rightarrow(\mathcal{M}_c^2,|||\cdot|||),\qquad I(H)=H\cdot M.

Now, we also have \mathcal{S} dense in L^2(M). Why? We know linear combinations of 1_A, A\in\mathcal{P}, are dense, so it suffices to prove that A\in\mathcal{P}\Rightarrow 1_A\in\bar{\mathcal{S}}. Use a typical Dynkin's lemma argument on \mathcal{A}=\{A\in\mathcal{P}:1_A\in \bar{\mathcal{S}}\}, which contains a \pi-system generating \mathcal{P}. So the isometry extends to L^2(M) generally. We recover the original motivating result that [H\cdot M]=H^2\cdot[M]: take a sequence of stopping times reducing both H and M, to force boundedness of the integrand and M\in\mathcal{M}_c^2. Stopping the integral is equivalent to stopping the integrand, and checking limits in the stopping times allows us to lift the Ito isometry to this result about quadratic variations.
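As a sanity check on the isometry for a single canonical simple process, the following sketch (the times s<t, and the choice Z=\mathrm{sgn}(B_s) as a bounded \mathcal{F}_s-measurable variable, are arbitrary) compares both sides by Monte Carlo, using (H\cdot B)_\infty=Z(B_t-B_s) and [B]_t=t for Brownian motion B:

```python
import numpy as np

# Ito isometry sketch for H = Z 1_{(s,t]} against Brownian motion B, with
# Z = sign(B_s). The isometry predicts E[(H.B)^2] = E[Z^2] (t - s) = t - s.
rng = np.random.default_rng(5)
n_paths, s, t = 1_000_000, 1.0, 3.0

B_s = rng.normal(0.0, np.sqrt(s), size=n_paths)             # B_s ~ N(0, s)
B_t = B_s + rng.normal(0.0, np.sqrt(t - s), size=n_paths)   # independent increment
Z = np.sign(B_s)                                            # fixed on (s, t]

integral = Z * (B_t - B_s)                                  # (H . B)_infty
print("E[(H.B)^2] ~", np.mean(integral ** 2))               # should be ~ t - s = 2
print("||H||_M^2  =", np.mean(Z ** 2) * (t - s))            # exactly 2 here
```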

Martingale Convergence

I continue the theme of explaining important bookwork from the Part III course, Advanced Probability, as succinctly as possible. In this post, we consider the convergence properties of discrete time martingales.

1) Theorem: Assume X is a martingale bounded in L^1. Then

X_n\rightarrow X_\infty\in L^1(\mathcal{F}_\infty) a.s.

Remark: The theorem and proof work equally well for X a supermartingale.

Proof: Essentially, we want to reduce convergence of the random variables which make up the martingale to a countable collection of events. We do this by considering upcrossings, which count the number of times the process passes from below a given interval to above it. The formal definition is too cumbersome for this format, so we summarise as:

N_n([a,b],X)= the number of disjoint time intervals up to time n in which X goes from [-\infty,a) to (b,\infty]. Define N([a,b],X) to be the limit as n increases to infinity.

It is a genuinely easy check that a sequence converges (possibly to \pm\infty) iff its number of upcrossings of every interval with rational endpoints is finite. We will show that the martingale almost surely has this property. The key lemma is a bound due to Doob:

Lemma: (b-a)\mathbb{E}[N_n([a,b],X)]\leq \mathbb{E}[(X_n-a)^-]

Proof: Say S_1<T_1<S_2<T_2<\ldots are the successive hitting times of [-\infty,a) and (b,\infty] respectively, so that N_n=\sup\{k:T_k\leq n\}. We decompose, abbreviating the number of upcrossings N_n as N:

\sum_{k=1}^n(X_{T_k\wedge n}-X_{S_k\wedge n})=\sum_{k=1}^N(X_{T_k}-X_{S_k})+(X_n-X_{S_{N+1}})1_{\{S_{N+1}\leq n\}}

Now take an expectation of both sides, applying the Optional Stopping Theorem to the bounded stopping times on the LHS. (If we are working with a supermartingale, formally we need to take \mathbb{E}[\cdot|\mathcal{F}_{S_k}] of each summand on the LHS to show that each has non-positive conditional expectation, and then take a further expectation.) We obtain:

0\geq (b-a)\mathbb{E}[N]-\mathbb{E}\left[(X_{S_{N+1}}-X_n)1_{\{S_{N+1}\leq n\}}\right]

If S_{N+1}>n then both 1_{\{S_{N+1}\leq n\}} and (X_n-a)^- are zero. Otherwise (X_n-a)^-\geq X_{S_{N+1}}-X_n, since X_{S_{N+1}}<a. Rearranging, (b-a)\mathbb{E}[N]\leq \mathbb{E}[(X_n-a)^-], which completes the proof of the lemma.

Since \mathbb{E}[(X_n-a)^-]\leq \mathbb{E}|X_n|+|a|<\infty, where this last bound is uniform in n by assumption, applying monotone convergence, we get that N([a,b],X) is almost surely finite for every pair a<b\in\mathbb{Q}. Because this set of pairs is countable, we can deduce that this holds almost surely for every pair simultaneously. We therefore define X_\infty(\omega)=\lim X_n(\omega) when this limit exists, and 0 otherwise; with probability one the limit exists. Fatou's lemma confirms that X_\infty\in L^1(\mathcal{F}_\infty).
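To make the upcrossing counting concrete, here is a sketch which counts upcrossings of a fixed interval along a simulated path. The Pólya urn proportion is an arbitrary choice of bounded (hence L^1-bounded) martingale, so the count should be small and the path should settle down:

```python
import numpy as np

# Count upcrossings of [a, b] along a path: the S_k/T_k alternation of the
# lemma, i.e. completed passages from below a to above b.
def count_upcrossings(path, a, b):
    count, below = 0, False
    for x in path:
        if x < a:
            below = True            # hit [-infty, a): an upcrossing may start
        elif x > b and below:
            count += 1              # crossing completed into (b, infty]
            below = False
    return count

# One run of a Polya urn: draw a ball, return it with another of its colour.
# X_n = proportion of red balls is a bounded martingale, so converges a.s.
rng = np.random.default_rng(6)
red, total, path = 1, 2, []
for _ in range(10_000):
    red += rng.random() < red / total
    total += 1
    path.append(red / total)

print("upcrossings of [0.4, 0.6]:", count_upcrossings(path, 0.4, 0.6))
print("X_n at n = 10000:", path[-1])
```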

2) We often want to have convergence in L^1 as well. Recall from Part II Probability and Measure (or elsewhere) that:

for a sequence which converges almost surely, UI is necessary and sufficient for convergence in L^1.

This applies equally well to this situation. Note that for a martingale this condition is often convenient because, for example, we know that the set \{\mathbb{E}[X_\infty|\mathcal{F}_n]: n\geq 0\} is UI for any integrable X_\infty.

3) Convergence in L^p, for p>1, is easier to guarantee.

Theorem: For p>1, the following are equivalent: i) X is a martingale bounded in L^p; ii) X_n\rightarrow X_\infty in L^p and almost surely; iii) \exists Z\in L^p s.t. X_n=\mathbb{E}[Z|\mathcal{F}_n] a.s.

Remark: As part of the proof, we will show, as expected, that X_\infty,Z are the same.

Proof: i)->ii) Almost sure convergence follows from the above result, since boundedness in L^p implies boundedness in L^1. We apply Doob's inequality about running maxima in a martingale process:

||X_n^*||_p:=||\sup_{m\leq n}|X_m|\,||_p\leq \frac{p}{p-1}||X_n||_p

Using this and monotone convergence, we see that X_n^*\uparrow X_\infty^*:=\sup_k|X_k|, with X_\infty^*\in L^p. Now consider |X_n-X_\infty|\leq 2X_\infty^*\in L^p and use Dominated Convergence to confirm convergence in L^p.

Note that Doob's L^p inequality can be proven using the same author's maximal inequality and Hölder's inequality.

ii)->iii) As we suspected, we show Z=X_\infty is suitable.

||X_n-\mathbb{E}[X_\infty|\mathcal{F}_n]||_p\stackrel{\text{large }m}{=}||\mathbb{E}[X_m-X_\infty|\mathcal{F}_n]||_p\stackrel{\text{Jensen}}{\leq}||X_m-X_\infty||_p\rightarrow 0

iii)->i) is easy: ||X_n||_p=||\mathbb{E}[Z|\mathcal{F}_n]||_p\leq ||Z||_p by conditional Jensen, so Z\in L^p implies X is bounded in L^p.
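As a final sketch of iii) => ii), take Z=\sum_k \epsilon_k 2^{-k} for i.i.d. signs \epsilon_k, with \mathcal{F}_n generated by the first n signs. Then X_n=\mathbb{E}[Z|\mathcal{F}_n] is simply the n-th partial sum, and the L^p distance to Z decays like 2^{-n}:

```python
import numpy as np

# X_n = E[Z | F_n] for Z = sum_k eps_k 2^{-k}: since the tail has conditional
# mean zero, X_n is the n-th partial sum, converging a.s. and in L^p.
rng = np.random.default_rng(7)
n_terms, n_paths, p = 40, 100_000, 2

eps = rng.choice([-1.0, 1.0], size=(n_paths, n_terms))
weights = 0.5 ** np.arange(1, n_terms + 1)
X = np.cumsum(eps * weights, axis=1)   # X_n in column n-1
Z = X[:, -1]                           # Z, up to machine precision at n = 40

for n in (5, 10, 20):
    dist = np.mean(np.abs(X[:, n - 1] - Z) ** p) ** (1 / p)
    print(f"||X_{n} - Z||_{p} ~ {dist:.2e}")   # theory: 2^{-n}/sqrt(3) for p = 2
```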