# Doob inequalities and Doob-Meyer decomposition

The first post I wrote on this blog was about martingales, way back in 2012 at a time when I had known what a martingale was for about a month. I now don’t have this excuse. So I’m going to write about a couple of properties of (discrete-time) martingales that came up while adjusting a proof which my thesis examiners suggested could be made much shorter as part of their corrections.

Doob’s submartingale inequality

When we prove that some sequence of processes converges to some other process, we typically want to show that this holds in some sense uniformly over a time-interval, rather than just at some fixed time. We don’t lose much at this level of vagueness by taking the limit process to be identically zero. Then, if the convergent processes are martingales or closely similar, we want to be able to bound $\sup_{k\le n} |Z_k|$ in some sense.

Doob’s submartingale inequality allows us to do this. Recall that a submartingale has almost-surely non-negative conditional increments. You might think of it heuristically as ‘more increasing than a martingale’. If $Z_n$ is a martingale, then $|Z_n|$ is a submartingale. This will be useful almost immediately.

The statement is that for $(Z_n)$ a non-negative submartingale,

$\mathbb{P}\left( \sup_{k\le n} Z_k \ge \lambda\right) \le \frac{\mathbb{E}\left[Z_n\right]}{\lambda}.$

The similarity of the statement to the statement of Markov’s inequality is no accident. Indeed the proof is very similar. We consider whether the event in question happens, and find lower bounds on the expectation of $Z_n$ under both possibilities.

Formally, for ease of notation, let $Z_n^*$ be the running maximum $\sup_{k\le n}Z_k$. Then, we let $T:= n\wedge \inf\{k\le n, M_j\ge \lambda\}$ and apply the optional stopping theorem for submartingales at T, which is by construction at most n. That is

$\mathbb{E}[Z_n]\ge \mathbb{E}[Z_T]=\mathbb{E}\left[Z_T\mathbf{1}_{Z_n^*<\lambda}\right] + \mathbb{E}\left[Z_T \mathbf{1}_{Z_n^*\ge \lambda}\right].$

The first of these summands is positive, and the second is at least $\lambda \mathbb{P}\left( Z_N^* \ge \lambda \right)$, from which the result follows.

We’ve already said that for any martingale $Z_n$, $|Z_n|$ is a submartingale, but in fact $f(Z_n)$ is a submartingale whenever f is convex, and $\mathbb{E}|f(Z_n)|<\infty$ for each n. Naturally, this continues to hold when $Z_n$ is itself a submartingale.

[Note that $Z_n^*$ is also a submartingale, but this probably isn’t as interesting.]

A particularly relevant such function f is $f(x)=x^p$, for p>1. If we take $Z_n$ a non-negative submartingale which is uniformly bounded in $L^p$, then by applying Holder’s inequality and this submartingale inequality, we obtain

$\mathbb{E}\left( \sup_{k\le n}Z_n^p \right) \le \left(\frac{p}{p-1}\right)^p \mathbb{E}\left[ Z_n^p \right].$

Since $Z_n^p$ is a submartingale, then a limit in n on the RHS is monotone, and certainly a limit in n on the LHS is monotone, so we can extend to

$mathbb{E}\left( \sup_{k\le n}Z_\infty^p \right) \le \left(\frac{p}{1-p}\right)^p \mathbb{E}\left[ Z_\infty^p \right].$

Initially, we have to define $\mathbb{E}\left[ Z_\infty^p \right]$ through this limit, but in fact this result, Doob’s $L^p$ inequality, shows that $Z_\infty:= \lim Z_n$ exists almost surely as well.

Naturally, we will often apply this in the case p=2, and in the third of these three sections, we will see why it might be particularly straightforward to calculate $\mathbb{E}\left[Z_\infty^2\right].$

Remark: as in the case of Markov’s inequality, it’s hard to say much if the submartingale is not taken to be non-negative. Indeed, this effect can be seen even if the process is only defined for a single time step, for which the statement really is then Markov’s inequality.

Doob-Meyer decomposition

Unfortunately, most processes are not martingales. Given an discrete-time process $X_n$ adapted to $\mathcal{F}=(\mathcal{F}_n)$, it is a martingale if the conditional expectations of the increments are all almost surely zero. But given a general adapted process $X_n$ which is integrable (so the increments have well-defined finite expectation), we can iteratively construct a new process $M_n$, where the increments are centred versions of $X_n$‘s increments. That is,

$M_{n+1}-M_n:= X_{n+1}-X_n - \mathbb{E}\left[ X_{n+1}-X_n \,\big|\, \mathcal{F}_n\right] = X_{n+1}-\mathbb{E}\left[X_{n+1} \,\big|\, \mathcal{F}_n\right].$ (*)

Then it’s immediately clear from the definition that $M_n$ is a martingale.

There’s a temptation to tie oneself up in knots with the dependence. We might have that increments of the original process $X_n$ depend on the current value of the process. And is it necessarily clear that we can recover the current value of the original process from the current value of $M_n$? Well, this is why we demand that everything be adapted, rather than just Markov. It’s not the case that $M_n$ should be Markov, but it clearly is adapted.

Now we look at the middle expression in (*), and in particular the term we are subtracting, namely the conditional expectation. If we define, in the standard terminology, $A_0=0$ and

$A_{n+1}-A_n:= \mathbb{E}\left[ X_{n+1}-X_n \,\big|\, \mathcal{F}_n\right],$

then we have decomposed the original process $X_n$ as the sum of a martingale $M_n$, and this new process $A_n$. In particular, note that the increment $A_{n+1}-A_n$ given above is adapted to $\mathcal{F}_n$, which is a stronger condition than being adapted to $\mathcal{F}_{n+1}$ as we would expect a priori. This property of the process $(A_n)$ is called predictability (or possibly previsibility).

This decomposition $X_n=X_0+M_n+A_n$ as just defined is called the Doob-Meyer decomposition, and there is a unique such decomposition where $M_n$ is a martingale, and $A_n$ is predictable. The proof of uniqueness is very straightforward. We look at the equalities given above as definitions of $M_n,A_n$, but then work in the opposite direction to show that they must hold if the decomposition holds.

I feel a final heuristic is worthwhile, using the term drift, more normally encountered in the continuous-time setting to describe infinitissimal expected increments. The increments of $A_n$ represent the drift of $X_n$, and the increments of $M_n$ are what remains from $X_n$ after subtracting the drift. In general, the process to be subtracted to turn a non-martingale into a martingale is called a compensator, and the existence or otherwise of such processes is important but challenging for some classes of continuous-time processes.

In particular, note that when $X_n$ is itself a martingale, then $A_n\equiv 0$. However, probably the most useful case is when $X_n$ is a submartingale, as then the drift is always non-negative, and so $A_n$ is almost surely increasing. The converse holds too.

This is relevant because this Doob-Meyer decomposition is obviously only a useful tool for treating $X_n$ if we can handle the two processes $M_n,A_n$ easily. We have tools to bound the martingale term, but this previsible term might in general be tricky, and so the case where $X_n$ is a submartingale is good, as increasing processes are much easier than general processes, since bounding the whole process might involve only bounding the final term in many contexts.

A particularly relevant example is the square of a martingale, that is $X_n=M_n^2$, where $M_n$ is a martingale. By the convexity condition discussed earlier, $X_n$ is a submartingale (provided it is integrable, ie $M_n$ is square-integrable), and so the process $A_n$ in its Doob-Meyer decomposition is increasing. This is often called the (predictable) quadratic variation of $(X_n)$.

This predictable quadratic variation is sometimes denoted $\langle X_n\rangle$. This differs from the (regular) quadratic variation which is defined as the sum of the squares of the increments, that is $[X_n]:= \sum_{k=0}^{n-1} (X_{k+1}-X_k)^2$. Note that this is adapted, but obviously not previsible. The distinction between these two processes is more important in continuous time. There, they are almost surely equal for a continuous local martingale, but not for eg a Poisson process. (For a Poisson process, the PQV is deterministic, indeed linear, while the (R)QV is almost surely equal to the Poisson process itself.) In the discrete time setting, the regular quadratic variation is not relevant very often, while the predictable quadratic variation is useful, precisely because of this decomposition.

Whenever we have random variables which we then centre, there is a standard trick to apply when treating their variance. That is

$A_{n+1}-A_n= \mathbb{E}\left[ M^2_{n+1}-M^2_n \,\big|\, \mathcal{F}_n\right]$
$= \mathbb{E}\left[ M^2_{n+1}\,\big|\, \mathcal{F}_n\right] - 2M_n^2 +M_n^2$
$= \mathbb{E}\left[ M^2_{n+1}\,\big|\, \mathcal{F}_n\right] - 2M_n \mathbb{E}\left[ M_{n+1}\,\big|\, \mathcal{F}_n\right] + M_n^2$
$= \mathbb{E}\left[ \left(M_{n+1}-M_n\right)^2\,\big|\, \mathcal{F}_n\right].$

One consequence is seen by taking an ‘overall’ expectation. Because $M_n^2-A_n$ is a martingale,

$\mathbb{E}\left[M_n^2\right] = \mathbb{E}\left[A_n\right] = \mathbb{E}\left[M_0^2\right] + \sum_{k=0}^{n-1} \mathbb{E}\left[A_{k+1}-A_k\right]$
$= \mathbb{E}\left[ M_0^2\right] + \sum_{k=0}^{n-1}\mathbb{E}\left[ \left(M_{k+1}-M_k\right)^2 \right].$ (**)

This additive (Pythagorean) property of the square of a martingale is useful in applications where there is reasonably good control on each increment separately.

We can also see this final property without the Doob-Meyer decomposition. For a martingale it is not the case that the increments on disjoint intervals are independent. However, following Williams 12.1 [1], disjoint intervals are orthogonal, in the sense that

$\mathbb{E}\left[(M_t-M_s)(M_v-M_u)\right]=0,$

whenever $s\le t\le u\le v$. Then, when we square the expression $M_n=M_0+\sum M_{k+1}-M_k$, and take expectations, all the cross terms vanish, leaving precisely (*).

References

[1] Williams – Probability with Martingales

I also followed the notes I made in 2011/12 while attending Perla Sousi’s course on Advanced Probability, and Arnab Sen’s subsequent course on Stochastic Calculus, though I can’t find any evidence online for the latter now.