# Doob inequalities and Doob-Meyer decomposition

The first post I wrote on this blog was about martingales, way back in 2012 at a time when I had known what a martingale was for about a month. I now don’t have this excuse. So I’m going to write about a couple of properties of (discrete-time) martingales that came up while adjusting a proof which my thesis examiners suggested could be made much shorter as part of their corrections.

Doob’s submartingale inequality

When we prove that some sequence of processes converges to some other process, we typically want to show that this holds in some sense uniformly over a time-interval, rather than just at some fixed time. We don’t lose much at this level of vagueness by taking the limit process to be identically zero. Then, if the convergent processes are martingales or closely similar, we want to be able to bound $\sup_{k\le n} |Z_k|$ in some sense.

Doob’s submartingale inequality allows us to do this. Recall that a submartingale has almost-surely non-negative conditional increments. You might think of it heuristically as ‘more increasing than a martingale’. If $Z_n$ is a martingale, then $|Z_n|$ is a submartingale. This will be useful almost immediately.

The statement is that for $(Z_n)$ a non-negative submartingale,

$\mathbb{P}\left( \sup_{k\le n} Z_k \ge \lambda\right) \le \frac{\mathbb{E}\left[Z_n\right]}{\lambda}.$

The similarity of the statement to the statement of Markov’s inequality is no accident. Indeed the proof is very similar. We consider whether the event in question happens, and find lower bounds on the expectation of $Z_n$ under both possibilities.

Formally, for ease of notation, let $Z_n^*$ be the running maximum $\sup_{k\le n}Z_k$. Then, we let $T:= n\wedge \inf\{k\le n, M_j\ge \lambda\}$ and apply the optional stopping theorem for submartingales at T, which is by construction at most n. That is

$\mathbb{E}[Z_n]\ge \mathbb{E}[Z_T]=\mathbb{E}\left[Z_T\mathbf{1}_{Z_n^*<\lambda}\right] + \mathbb{E}\left[Z_T \mathbf{1}_{Z_n^*\ge \lambda}\right].$

The first of these summands is positive, and the second is at least $\lambda \mathbb{P}\left( Z_N^* \ge \lambda \right)$, from which the result follows.

We’ve already said that for any martingale $Z_n$, $|Z_n|$ is a submartingale, but in fact $f(Z_n)$ is a submartingale whenever f is convex, and $\mathbb{E}|f(Z_n)|<\infty$ for each n. Naturally, this continues to hold when $Z_n$ is itself a submartingale.

[Note that $Z_n^*$ is also a submartingale, but this probably isn’t as interesting.]

A particularly relevant such function f is $f(x)=x^p$, for p>1. If we take $Z_n$ a non-negative submartingale which is uniformly bounded in $L^p$, then by applying Holder’s inequality and this submartingale inequality, we obtain

$\mathbb{E}\left( \sup_{k\le n}Z_n^p \right) \le \left(\frac{p}{p-1}\right)^p \mathbb{E}\left[ Z_n^p \right].$

Since $Z_n^p$ is a submartingale, then a limit in n on the RHS is monotone, and certainly a limit in n on the LHS is monotone, so we can extend to

$\mathbb{E}\left( \sup_{k\le n}Z_\infty^p \right) \le \left(\frac{p}{1-p}\right)^p \mathbb{E}\left[ Z_\infty^p \right].$

Initially, we have to define $\mathbb{E}\left[ Z_\infty^p \right]$ through this limit, but in fact this result, Doob’s $L^p$ inequality, shows that $Z_\infty:= \lim Z_n$ exists almost surely as well.

Naturally, we will often apply this in the case p=2, and in the third of these three sections, we will see why it might be particularly straightforward to calculate $\mathbb{E}\left[Z_\infty^2\right].$

Remark: as in the case of Markov’s inequality, it’s hard to say much if the submartingale is not taken to be non-negative. Indeed, this effect can be seen even if the process is only defined for a single time step, for which the statement really is then Markov’s inequality.

Doob-Meyer decomposition

Unfortunately, most processes are not martingales. Given an discrete-time process $X_n$ adapted to $\mathcal{F}=(\mathcal{F}_n)$, it is a martingale if the conditional expectations of the increments are all almost surely zero. But given a general adapted process $X_n$ which is integrable (so the increments have well-defined finite expectation), we can iteratively construct a new process $M_n$, where the increments are centred versions of $X_n$‘s increments. That is,

$M_{n+1}-M_n:= X_{n+1}-X_n - \mathbb{E}\left[ X_{n+1}-X_n \,\big|\, \mathcal{F}_n\right] = X_{n+1}-\mathbb{E}\left[X_{n+1} \,\big|\, \mathcal{F}_n\right].$ (*)

Then it’s immediately clear from the definition that $M_n$ is a martingale.

There’s a temptation to tie oneself up in knots with the dependence. We might have that increments of the original process $X_n$ depend on the current value of the process. And is it necessarily clear that we can recover the current value of the original process from the current value of $M_n$? Well, this is why we demand that everything be adapted, rather than just Markov. It’s not the case that $M_n$ should be Markov, but it clearly is adapted.

Now we look at the middle expression in (*), and in particular the term we are subtracting, namely the conditional expectation. If we define, in the standard terminology, $A_0=0$ and

$A_{n+1}-A_n:= \mathbb{E}\left[ X_{n+1}-X_n \,\big|\, \mathcal{F}_n\right],$

then we have decomposed the original process $X_n$ as the sum of a martingale $M_n$, and this new process $A_n$. In particular, note that the increment $A_{n+1}-A_n$ given above is adapted to $\mathcal{F}_n$, which is a stronger condition than being adapted to $\mathcal{F}_{n+1}$ as we would expect a priori. This property of the process $(A_n)$ is called predictability (or possibly previsibility).

This decomposition $X_n=X_0+M_n+A_n$ as just defined is called the Doob-Meyer decomposition, and there is a unique such decomposition where $M_n$ is a martingale, and $A_n$ is predictable. The proof of uniqueness is very straightforward. We look at the equalities given above as definitions of $M_n,A_n$, but then work in the opposite direction to show that they must hold if the decomposition holds.

I feel a final heuristic is worthwhile, using the term drift, more normally encountered in the continuous-time setting to describe infinitissimal expected increments. The increments of $A_n$ represent the drift of $X_n$, and the increments of $M_n$ are what remains from $X_n$ after subtracting the drift. In general, the process to be subtracted to turn a non-martingale into a martingale is called a compensator, and the existence or otherwise of such processes is important but challenging for some classes of continuous-time processes.

In particular, note that when $X_n$ is itself a martingale, then $A_n\equiv 0$. However, probably the most useful case is when $X_n$ is a submartingale, as then the drift is always non-negative, and so $A_n$ is almost surely increasing. The converse holds too.

This is relevant because this Doob-Meyer decomposition is obviously only a useful tool for treating $X_n$ if we can handle the two processes $M_n,A_n$ easily. We have tools to bound the martingale term, but this previsible term might in general be tricky, and so the case where $X_n$ is a submartingale is good, as increasing processes are much easier than general processes, since bounding the whole process might involve only bounding the final term in many contexts.

A particularly relevant example is the square of a martingale, that is $X_n=M_n^2$, where $M_n$ is a martingale. By the convexity condition discussed earlier, $X_n$ is a submartingale (provided it is integrable, ie $M_n$ is square-integrable), and so the process $A_n$ in its Doob-Meyer decomposition is increasing. This is often called the (predictable) quadratic variation of $(X_n)$.

This predictable quadratic variation is sometimes denoted $\langle X_n\rangle$. This differs from the (regular) quadratic variation which is defined as the sum of the squares of the increments, that is $[X_n]:= \sum_{k=0}^{n-1} (X_{k+1}-X_k)^2$. Note that this is adapted, but obviously not previsible. The distinction between these two processes is more important in continuous time. There, they are almost surely equal for a continuous local martingale, but not for eg a Poisson process. (For a Poisson process, the PQV is deterministic, indeed linear, while the (R)QV is almost surely equal to the Poisson process itself.) In the discrete time setting, the regular quadratic variation is not relevant very often, while the predictable quadratic variation is useful, precisely because of this decomposition.

Whenever we have random variables which we then centre, there is a standard trick to apply when treating their variance. That is

$A_{n+1}-A_n= \mathbb{E}\left[ M^2_{n+1}-M^2_n \,\big|\, \mathcal{F}_n\right]$
$= \mathbb{E}\left[ M^2_{n+1}\,\big|\, \mathcal{F}_n\right] - 2M_n^2 +M_n^2$
$= \mathbb{E}\left[ M^2_{n+1}\,\big|\, \mathcal{F}_n\right] - 2M_n \mathbb{E}\left[ M_{n+1}\,\big|\, \mathcal{F}_n\right] + M_n^2$
$= \mathbb{E}\left[ \left(M_{n+1}-M_n\right)^2\,\big|\, \mathcal{F}_n\right].$

One consequence is seen by taking an ‘overall’ expectation. Because $M_n^2-A_n$ is a martingale,

$\mathbb{E}\left[M_n^2\right] = \mathbb{E}\left[A_n\right] = \mathbb{E}\left[M_0^2\right] + \sum_{k=0}^{n-1} \mathbb{E}\left[A_{k+1}-A_k\right]$
$= \mathbb{E}\left[ M_0^2\right] + \sum_{k=0}^{n-1}\mathbb{E}\left[ \left(M_{k+1}-M_k\right)^2 \right].$ (**)

This additive (Pythagorean) property of the square of a martingale is useful in applications where there is reasonably good control on each increment separately.

We can also see this final property without the Doob-Meyer decomposition. For a martingale it is not the case that the increments on disjoint intervals are independent. However, following Williams 12.1 [1], disjoint intervals are orthogonal, in the sense that

$\mathbb{E}\left[(M_t-M_s)(M_v-M_u)\right]=0,$

whenever $s\le t\le u\le v$. Then, when we square the expression $M_n=M_0+\sum M_{k+1}-M_k$, and take expectations, all the cross terms vanish, leaving precisely (*).

References

[1] Williams – Probability with Martingales

I also followed the notes I made in 2011/12 while attending Perla Sousi’s course on Advanced Probability, and Arnab Sen’s subsequent course on Stochastic Calculus, though I can’t find any evidence online for the latter now.

# Working towards the Ito Isometry

We have defined the stochastic integral $(H\cdot A)_t$ for a finite variation, continuous adapted process A, and H a previsible process. We want to extend this to integrals with respect to continuous local martingales. Analogous to defining the Lebesgue integral by extending linearly from indicator functions, we define integrals of simple processes with respect to martingales. A simple process is a disjointly (time-)supported linear combination of the canonical previsible process $Z_s1_{(s,t]}$. It will be convenient to demand that the variables of integration are in $\mathcal{M}^2$. Then the space $\mathcal{M}^2$ is preserved under this integral operation, that is $H\cdot M\in\mathcal{M}^2$, and in particular

$\mathbb{E}(H\cdot M)_\infty^2\leq ||H||_\infty^2\mathbb{E}(M_\infty-M_0)^2$

By Doob’s $L^2$ inequality, $X\in\mathcal{M}^2\Rightarrow ||\sup_t|X_t|||_2<\infty$, so it makes sense to define $\mathcal{C}^2$ as the set of cadlag adapted processes s.t. $|||X|||:=||\sup_t|X_t|||<\infty$. Of particular interest is that $\mathcal{C}^2$ is complete, by finding limits for subsequences and lifting to the whole sequence by Fatou.

We aim to construct integrals over $\mathcal{M}_{c,loc}$ by taking limits of integrals over $\mathcal{M}_c^2$. For this space, we have that $M^2-[M]$ is not just $\in\mathcal{M}_{c,loc}$ but in fact a UI true martingale. Then, as a natural extension of the orthogonality trick (remembering that $M^2-[M]\in\mathcal{M}_{c,loc}$), we obtain

$\mathbb{E}[M]_\infty=\mathbb{E}(M_\infty-M_0)^2$

We then define for $M\in\mathcal{M}_c^2$ the norm $||\cdot||_M$ by $||H||_M^2:=\mathbb{E}\left(\int_0^tH_s^2d[M]_s\right)$ with $L^2(M)$ the space of H s.t. that this is finite. Then for $H\in \mathcal{S}$, we have the Ito isometry:

$I: (L^2(M),||\cdot||_M)\rightarrow(M_c^2,||\cdot||)$ with $I(H)=H\cdot M$

Now, we also have S dense in $L^2(M)$. Why? We know linear combinations of $1_A, A\in\mathcal{P}$ are dense, so it STP $A\in\mathcal{P}\Rightarrow A\in\bar{S}$. Use a typical Dynkin’s lemma argument on $\mathcal{A}=\{A\in\mathcal{P}:1_A\in \bar{S}\}$, which contains $\pi$-system generating $\mathcal{P}$. So extend to $L^2(M)$ generally. We recover the original motivating result that $[H\cdot M]=H^2\cdot[M]$. Take a sequence of stopping times reducing both H and M to force boundedness of integrand, and $M\in\mathcal{M}_c^2$. Stopping the integral is equivalent to stopping the integrand, and checking limits in stopping times allows us to lift the Ito Isometry result to this one about quadratic variations.

# Martingale Convergence

I continue the theme of explaining important bookwork from the Part III course, Advanced Probability, as succinctly as possible. In this post, we consider the convergence properties of discrete time martingales.

1) Theorem: Assume $X$ is a martingale bounded in $L^1$. Then

$X_n\rightarrow X_\infty\in L^1(\mathcal{F}_\infty)$ a.s.

Remark: The theorem and proof works equally well for $X$ a supermartingale.

Proof: Essentially, we want to reduce convergence of the random variables which make up the martingale to a countable collection of events. We do this by considering upcrossings, which counts the number of times the process alternates from less than a given interval to greater than a given interval. The formal definition will be too wide for this format, so we summarise as

$N_n([a,b],X)=$ the number of disjoint time intervals up to time n in which X goes from $[-\infty a)$ to $(b,\infty]$. Define $N([a,b],X)$ to be the limit as n increases to infinity.

It is a genuinely easy check that a sequence converges (possibly to $\infty$) iff this number of upcrossings of any interval with rational bounds is finite. We will show that the martingale almost surely has this property. The key lemma is a bound due to Doob:

Lemma: $(b-a)\mathbb{E}[N_n([a,b],X)]\leq \mathbb{E}[(X_n-a)^-]$

Proof: Say $S_1 are the successive hitting times of $[-\infty,a),(b,\infty]$ respectively. So $N_n=\inf\{k:T_k\leq n\}$. We decompose, abbreviating the number of upcrossings as N.

$\sum_{k=1}^n(X_{T_k\wedge n}-X_{S_k\wedge n})=\sum_{k=1}^N(X_{T_k}-X_{S_k})+(X_n-X_{S_{N+1}})1_{\{S_{N+1}\leq n\}}$

Now take an expectation of both sides, applying the Optional Stopping Theorem to the bounded stopping times on the LHS. (If we are working with a supermartingale, formally we need to take $\mathbb{E}[\cdot|\mathcal{F}_{S_k}]$ of each summand on LHS to show that they are non-negative, and so taking a further expectation over $\mathcal{F}_{S_k}$ gives the required result.) We obtain:

$0\geq (b-a)\mathbb{E}N-\mathbb{E}(X_n-X_{S_{N+1}})1_{\{S_{N+1}\leq n\}}$

If $S_{N+1}>n$ then both $1_{\{S_{N+1}\leq n\}}=(X_n-a)^-=0$. Otherwise $(X_n-a)^-\geq X_{S_{N+1}}-X_n$. This complete the proof of the lemma.

Since $\mathbb{E}[(X_n-a)^-]\leq \mathbb{E}|X_n|+a<\infty$, where this last bound is uniform in by assumption, applying monotone convergence, we get that $N([a,b],X)$ is almost surely finite for every pair $a. Because this set is countable, we can deduce that this holds almost surely for every pair simultaneously. We therefore define $X_\infty(w)=\lim X_n(w)$ when this limit exists, and 0 otherwise. With probability one the limit exists. Fatou’s lemma confirms that $X_\infty\in L^1(\mathcal{F}_\infty)$.

2) We often want to have convergence in $L^1$ as well. Recall for Part II Probability and Measure (or elsewhere) that

UI + Convergence almost surely is necessary and sufficient for convergence in $L^1$.

This applies equally well to this situation. Note that for a martingale, this condition is often convenient, because, for example, we know that the set $\{\mathbb{E}[X_\infty|\mathcal{F}_n],n\}$ is UI for any integrable $X_\infty$.

3) Convergence in $L^p$ is easier to guarantee.

Theorem: i) X a martingale bounded in $L^p$ iff ii) $X_n\rightarrow X_\infty$ in $L^p$ and almost surely iff iii) $\exists Z\in L^p$ s.t. $X_n=\mathbb{E}[Z|\mathcal{F}_n]$ a.s.

Remark: As part of the proof, we will show, as expected, that $X_\infty,Z$ are the same.

Proof: i)->ii) Almost sure convergence follows from the above result applied to the p-th power process. We apply Doob’s inequality about running maxima in a martingale process:

$||X_n^*||_p:=||\sup_{m\leq n}X_m||_p\leq \frac{p}{p-1}||X_n||_p$

Using this, we see that $X_n^*\uparrow X_\infty^*:=\sup|X_k|$. Now consider $|X_n-X_\infty|\leq 2X_\infty^*\in L^p$ and use Dominated Convergence to confirm convergence in $L^p$.

Note that Doob’s $L^p$ inequality can be proven using the same author’s Maximal inequality and Holder.

ii)->iii) As we suspected, we show $Z=X_\infty$ is suitable.

$||X_n-\mathbb{E}[X_\infty|\mathcal{F}_n]||_p\stackrel{\text{large }m}{=}||\mathbb{E}[X_m-X_\infty|\mathcal{F}_n]||_p\stackrel{\text{Jensen}}{\leq}||X_m-X_\infty||_p\rightarrow 0$

iii)->i) is easy. Z bounded in $L^p$ implies X bounded by a simple application of the triangle inequality in the definition of conditional expectation.