# Doob inequalities and Doob-Meyer decomposition

The first post I wrote on this blog was about martingales, way back in 2012 at a time when I had known what a martingale was for about a month. I now don’t have this excuse. So I’m going to write about a couple of properties of (discrete-time) martingales that came up while adjusting a proof which my thesis examiners suggested could be made much shorter as part of their corrections.

Doob’s submartingale inequality

When we prove that some sequence of processes converges to some other process, we typically want to show that this holds in some sense uniformly over a time-interval, rather than just at some fixed time. We don’t lose much at this level of vagueness by taking the limit process to be identically zero. Then, if the convergent processes are martingales or closely similar, we want to be able to bound $\sup_{k\le n} |Z_k|$ in some sense.

Doob’s submartingale inequality allows us to do this. Recall that a submartingale has almost-surely non-negative conditional increments. You might think of it heuristically as ‘more increasing than a martingale’. If $Z_n$ is a martingale, then $|Z_n|$ is a submartingale. This will be useful almost immediately.

The statement is that for $(Z_n)$ a non-negative submartingale,

$\mathbb{P}\left( \sup_{k\le n} Z_k \ge \lambda\right) \le \frac{\mathbb{E}\left[Z_n\right]}{\lambda}.$

The similarity of the statement to the statement of Markov’s inequality is no accident. Indeed the proof is very similar. We consider whether the event in question happens, and find lower bounds on the expectation of $Z_n$ under both possibilities.

Formally, for ease of notation, let $Z_n^*$ be the running maximum $\sup_{k\le n}Z_k$. Then, we let $T:= n\wedge \inf\{k\le n, M_j\ge \lambda\}$ and apply the optional stopping theorem for submartingales at T, which is by construction at most n. That is

$\mathbb{E}[Z_n]\ge \mathbb{E}[Z_T]=\mathbb{E}\left[Z_T\mathbf{1}_{Z_n^*<\lambda}\right] + \mathbb{E}\left[Z_T \mathbf{1}_{Z_n^*\ge \lambda}\right].$

The first of these summands is positive, and the second is at least $\lambda \mathbb{P}\left( Z_N^* \ge \lambda \right)$, from which the result follows.

We’ve already said that for any martingale $Z_n$, $|Z_n|$ is a submartingale, but in fact $f(Z_n)$ is a submartingale whenever f is convex, and $\mathbb{E}|f(Z_n)|<\infty$ for each n. Naturally, this continues to hold when $Z_n$ is itself a submartingale.

[Note that $Z_n^*$ is also a submartingale, but this probably isn’t as interesting.]

A particularly relevant such function f is $f(x)=x^p$, for p>1. If we take $Z_n$ a non-negative submartingale which is uniformly bounded in $L^p$, then by applying Holder’s inequality and this submartingale inequality, we obtain

$\mathbb{E}\left( \sup_{k\le n}Z_n^p \right) \le \left(\frac{p}{p-1}\right)^p \mathbb{E}\left[ Z_n^p \right].$

Since $Z_n^p$ is a submartingale, then a limit in n on the RHS is monotone, and certainly a limit in n on the LHS is monotone, so we can extend to

$\mathbb{E}\left( \sup_{k\le n}Z_\infty^p \right) \le \left(\frac{p}{1-p}\right)^p \mathbb{E}\left[ Z_\infty^p \right].$

Initially, we have to define $\mathbb{E}\left[ Z_\infty^p \right]$ through this limit, but in fact this result, Doob’s $L^p$ inequality, shows that $Z_\infty:= \lim Z_n$ exists almost surely as well.

Naturally, we will often apply this in the case p=2, and in the third of these three sections, we will see why it might be particularly straightforward to calculate $\mathbb{E}\left[Z_\infty^2\right].$

Remark: as in the case of Markov’s inequality, it’s hard to say much if the submartingale is not taken to be non-negative. Indeed, this effect can be seen even if the process is only defined for a single time step, for which the statement really is then Markov’s inequality.

Doob-Meyer decomposition

Unfortunately, most processes are not martingales. Given an discrete-time process $X_n$ adapted to $\mathcal{F}=(\mathcal{F}_n)$, it is a martingale if the conditional expectations of the increments are all almost surely zero. But given a general adapted process $X_n$ which is integrable (so the increments have well-defined finite expectation), we can iteratively construct a new process $M_n$, where the increments are centred versions of $X_n$‘s increments. That is,

$M_{n+1}-M_n:= X_{n+1}-X_n - \mathbb{E}\left[ X_{n+1}-X_n \,\big|\, \mathcal{F}_n\right] = X_{n+1}-\mathbb{E}\left[X_{n+1} \,\big|\, \mathcal{F}_n\right].$ (*)

Then it’s immediately clear from the definition that $M_n$ is a martingale.

There’s a temptation to tie oneself up in knots with the dependence. We might have that increments of the original process $X_n$ depend on the current value of the process. And is it necessarily clear that we can recover the current value of the original process from the current value of $M_n$? Well, this is why we demand that everything be adapted, rather than just Markov. It’s not the case that $M_n$ should be Markov, but it clearly is adapted.

Now we look at the middle expression in (*), and in particular the term we are subtracting, namely the conditional expectation. If we define, in the standard terminology, $A_0=0$ and

$A_{n+1}-A_n:= \mathbb{E}\left[ X_{n+1}-X_n \,\big|\, \mathcal{F}_n\right],$

then we have decomposed the original process $X_n$ as the sum of a martingale $M_n$, and this new process $A_n$. In particular, note that the increment $A_{n+1}-A_n$ given above is adapted to $\mathcal{F}_n$, which is a stronger condition than being adapted to $\mathcal{F}_{n+1}$ as we would expect a priori. This property of the process $(A_n)$ is called predictability (or possibly previsibility).

This decomposition $X_n=X_0+M_n+A_n$ as just defined is called the Doob-Meyer decomposition, and there is a unique such decomposition where $M_n$ is a martingale, and $A_n$ is predictable. The proof of uniqueness is very straightforward. We look at the equalities given above as definitions of $M_n,A_n$, but then work in the opposite direction to show that they must hold if the decomposition holds.

I feel a final heuristic is worthwhile, using the term drift, more normally encountered in the continuous-time setting to describe infinitissimal expected increments. The increments of $A_n$ represent the drift of $X_n$, and the increments of $M_n$ are what remains from $X_n$ after subtracting the drift. In general, the process to be subtracted to turn a non-martingale into a martingale is called a compensator, and the existence or otherwise of such processes is important but challenging for some classes of continuous-time processes.

In particular, note that when $X_n$ is itself a martingale, then $A_n\equiv 0$. However, probably the most useful case is when $X_n$ is a submartingale, as then the drift is always non-negative, and so $A_n$ is almost surely increasing. The converse holds too.

This is relevant because this Doob-Meyer decomposition is obviously only a useful tool for treating $X_n$ if we can handle the two processes $M_n,A_n$ easily. We have tools to bound the martingale term, but this previsible term might in general be tricky, and so the case where $X_n$ is a submartingale is good, as increasing processes are much easier than general processes, since bounding the whole process might involve only bounding the final term in many contexts.

A particularly relevant example is the square of a martingale, that is $X_n=M_n^2$, where $M_n$ is a martingale. By the convexity condition discussed earlier, $X_n$ is a submartingale (provided it is integrable, ie $M_n$ is square-integrable), and so the process $A_n$ in its Doob-Meyer decomposition is increasing. This is often called the (predictable) quadratic variation of $(X_n)$.

This predictable quadratic variation is sometimes denoted $\langle X_n\rangle$. This differs from the (regular) quadratic variation which is defined as the sum of the squares of the increments, that is $[X_n]:= \sum_{k=0}^{n-1} (X_{k+1}-X_k)^2$. Note that this is adapted, but obviously not previsible. The distinction between these two processes is more important in continuous time. There, they are almost surely equal for a continuous local martingale, but not for eg a Poisson process. (For a Poisson process, the PQV is deterministic, indeed linear, while the (R)QV is almost surely equal to the Poisson process itself.) In the discrete time setting, the regular quadratic variation is not relevant very often, while the predictable quadratic variation is useful, precisely because of this decomposition.

Whenever we have random variables which we then centre, there is a standard trick to apply when treating their variance. That is

$A_{n+1}-A_n= \mathbb{E}\left[ M^2_{n+1}-M^2_n \,\big|\, \mathcal{F}_n\right]$
$= \mathbb{E}\left[ M^2_{n+1}\,\big|\, \mathcal{F}_n\right] - 2M_n^2 +M_n^2$
$= \mathbb{E}\left[ M^2_{n+1}\,\big|\, \mathcal{F}_n\right] - 2M_n \mathbb{E}\left[ M_{n+1}\,\big|\, \mathcal{F}_n\right] + M_n^2$
$= \mathbb{E}\left[ \left(M_{n+1}-M_n\right)^2\,\big|\, \mathcal{F}_n\right].$

One consequence is seen by taking an ‘overall’ expectation. Because $M_n^2-A_n$ is a martingale,

$\mathbb{E}\left[M_n^2\right] = \mathbb{E}\left[A_n\right] = \mathbb{E}\left[M_0^2\right] + \sum_{k=0}^{n-1} \mathbb{E}\left[A_{k+1}-A_k\right]$
$= \mathbb{E}\left[ M_0^2\right] + \sum_{k=0}^{n-1}\mathbb{E}\left[ \left(M_{k+1}-M_k\right)^2 \right].$ (**)

This additive (Pythagorean) property of the square of a martingale is useful in applications where there is reasonably good control on each increment separately.

We can also see this final property without the Doob-Meyer decomposition. For a martingale it is not the case that the increments on disjoint intervals are independent. However, following Williams 12.1 [1], disjoint intervals are orthogonal, in the sense that

$\mathbb{E}\left[(M_t-M_s)(M_v-M_u)\right]=0,$

whenever $s\le t\le u\le v$. Then, when we square the expression $M_n=M_0+\sum M_{k+1}-M_k$, and take expectations, all the cross terms vanish, leaving precisely (*).

References

[1] Williams – Probability with Martingales

I also followed the notes I made in 2011/12 while attending Perla Sousi’s course on Advanced Probability, and Arnab Sen’s subsequent course on Stochastic Calculus, though I can’t find any evidence online for the latter now.

# Increments of Random Partitions

The following is problem 2.1.4. from Combinatorial Stochastic Processes:

Let $X_i$ be the indicator of the event that i the least element of some block of an exchangeable random partition $\Pi_n$ of [n]. Show that the joint law of the $(X_i,1\leq i\leq n)$ determines the law of $\Pi_n$.

As Pitman says, this is a result by Serban Nacu, the paper for which can be found here. In this post I’m going to explain what an exchangeable random partition is, how to prove the result, and a couple of consequences.

The starting point is the question ‘what is an exchangeable random partition?’ The most confusing aspect is that there are multiple definitions depending on whether the blocks of the partition are sets or just integers corresponding to a size. Eg, {1,2,4} u {3} is a partition of [4], corresponding to the partition 3+1 of 4. Obviously one induces the other, and in an exchangeable setting the laws of one may determine the laws of the other.

In the second case, we assume 3+1 is the same partition as 1+3. If order does matter then we call it a composition instead. This gets a bit annoying for set partitions, as we don’t want these to be ordered either. But if we want actually to talk about the sets in question we have to give them labels, which becomes an ordering, so we need some canonical way to assign these labels. Typically we will say $\Pi_n=\{A_1,\ldots,A_k\}$, where the curly brackets indicate that we don’t care about order, and we choose the labels by order of appearance, so by increasing order of least elements.

We say that a random partition $\Pi_n$ of [n] is exchangeable if its distribution is invariant the action on partitions induced by the symmetric group. That is, relabelling doesn’t change probabilities. We can express this functionally by saying

$\mathbb{P}(\Pi_n=\{A_1,\ldots,A_k\})=p(|A_1|,\ldots,|A_k|),$

for p a symmetric function. This function is then called the exchangeable partition probability function (EPPF) by Pitman.

Consider a partition of 4 into sets of sizes 3 and 1. There is a danger that this definition looks like it might be saying that the probability that A_1 is the set of size 3 is the same as the probability that A_1 is the set of size 1. This would be a problem because we expect to see some size-biasing to the labelling. Larger sets are more likely to contain small elements, merely because they contain more elements. Fortunately the definition is not broken after all. The statement above makes no reference to the probabilities of seeing various sizes for A_1 etc. For that, we would have to sum over all partitions with that property. It merely says that the partitions:

$\{1,2,3\}\cup\{4\},\quad \{1,2,4\}\cup\{3\},\quad\{1,3,4\}\cup\{2\},\quad \{2,3,4\}\cup\{1\}$

have respective probabilities:

$p(3,1),\quad p(3,1),\quad p(3,1),\quad p(1,3),$

and furthermore these are equal.

Anyway, now let’s turn to the problem. The key idea is that we want to be looking at strings of 0s and 1s that can only arise in one way. For example, the string 10…01 can only arise corresponding to the partitions {1,2,…,n-1} u {n} and {1,2,…,n-2,n} u {n-1}. So now we know p(n-1,1) and so also p(1,n-1). Furthermore, note that 10…0 and 11…1 give the probabilities of 1 block of size n and n blocks of size 1 respectively at once.

So then the string 10…010 can only arise from partitions {1,2,…,n-2,n} u {n-1} or {1,2,…,n-2} u {n-1,n}. We can calculate the probability that it came from the former using the previously found value of p(n-1,1) and a combinatorial weighting, so the remaining probability is given by p(2,n-2). Keep going. It is clear what ‘keep going’ means in the case of p(a,b) but for partitions with more than two blocks it seems a bit more complicated.

Let’s fix k the number of blocks in partitions under consideration, and start talking about compositions, that is $a_1+\ldots+a_k=n$. The problem we might face in trying to generalise the previous argument is that potentially lots of compositions might generate the same sequence of 0s and 1s, so the ‘first time’ we consider a composition might be the same for more than one composition. Trying it out in the case k=3 makes it clear that this is not going to happen, but we need some partial ordering structure to explain why this is the case.

Recall that a composition with k blocks is a sequence $a=(a_1,\ldots,a_k)$ which sums to n. Let’s say a majorizes b if all its partial sums are at least as large. That is $a_1+\ldots+a_l\geq b_1+\ldots+b_l$ for all $1\leq l \leq k$. We say this is strict if at least one of the inequalities is strict. It is not hard to see that if a majorizes b then this is strict unless a = b.

Since we don’t care about ordering, we assume for now that all compositions are arranged in non-increasing order. So we find a partition corresponding to some such composition $a_1,\ldots,a_k$. The partition is:

$\{1,\ldots,a_1\}\cup\{a_1+1,\ldots,a_1+a_2\}\cup\{a_1+a_2+1,\ldots,a_1+a_2+a_3\}\cup\ldots\cup\{n-a_k,\ldots,n\}.$

This generates a sequence of 0s and 1s as describe above, with $a_i-1$ 0s between the i’th 1 and the (i+1)th 1. The claim is that given some composition which admits a partition with this same corresponding sequence, that composition must majorize a. Proof by induction on l. So in fact we can prove Nacu’s result inductively down the partial ordering described. We know the probability of the sequence of 0s and 1s corresponding to the partition of [n] described by assumption. We know the probability of any partition corresponding to a composition which majorizes a by induction, and we know how many partitions with this sequence each such composition generates. Combining all of this, we can find the probability corresponding to a.

Actually I’m not going to say much about consequences of this except to paraphrase very briefly what Nacu says in the paper. One of the neat consequences of this result is that it allows us to prove in a fairly straightforward way that the only infinite family of exchangeable random partitions with independent increments is the so-called Chinese Restaurant process.

Instead of attempting to prove this, I will explain what all the bits mean. First, the Chinese Restaurant process is the main topic of the next chapter of the book, so I won’t say any more about it right now, except that its definition is almost exact what is required to make this particular result true.

We can’t extend the definition of exchangeable to infinite partitions immediately, because considering invariance under the symmetric group on the integers is not very nice, in particular because there’s a danger all the probabilities will end up being zero. Instead, we consider restrictions of the partition to $[n]\subset\mathbb{N}$, and demand that these nest appropriately, and are exchangeable.

Independent increments is a meaningful thing to consider since one way to construct a partition, infinite or otherwise, is to consider elements one at a time in the standard ordering, either adding the new element to an already present block, or starting block. Since 0 or 1 in the increment sequence corresponds precisely to these events, it is meaningful to talk about independent increments.

# Subordinators and the Arcsine rule

After the general discussion of Levy processes in the previous post, we now discuss a particular class of such processes. The majority of content and notation below is taken from chapters 1-3 of Jean Bertoin’s Saint-Flour notes.

We say $X_t$ is a subordinator if:

• It is a right-continuous adapted stochastic process, started from 0.
• It has stationary, independent increments.
• It is increasing.

Note that the first two conditions are precisely those required for a Levy process. We could also allow the process to take the value $\infty$, where the hitting time of infinity represents ‘killing’ the subordinator in some sense. If this hitting time is almost surely infinite, we say it is a strict subordinator. There is little to be gained right now from considering anything other than strict subordinators.

Examples

• A compound Poisson process, with finite jump measure supported on $[0,\infty)$. Hereafter we exclude this case, as it is better dealt with in other languages.
• A so-called stable Levy process, where $\Phi(\lambda)=\lambda^\alpha$, for some $\alpha\in(0,1)$. (I’ll define $\Phi$ very soon.) Note that checking that the sample paths are increasing requires only that $X_1\geq 0$ almost surely.
• The hitting time process for Brownian Motion. Note that this does indeed have jumps as we would need. (This has $\Phi(\lambda)=\sqrt{2\lambda}$.)

Properties

• In general, we describe Levy processes by their characteristic exponent. As a subordinator takes values in $[0,\infty)$, we can use the Laplace exponent instead:

$\mathbb{E}\exp(-\lambda X_t)=:\exp(-t\Phi(\lambda)).$

• We can refine the Levy-Khintchine formula;

$\Phi(\lambda)=k+d\lambda+\int_{[0,\infty)}(1-e^{-\lambda x})\Pi(dx),$

• where k is the kill rate (in the non-strict case). Because the process is increasing, it must have bounded variation, and so the quadratic part vanishes, and we have a stronger condition on the Levy measure: $\int(1\wedge x)\Pi(dx)<\infty$.
• The expression $\bar{\Pi}(x):=k+\Pi((x,\infty))$ for the tail of the Levy measure is often more useful in this setting.
• We can think of this decomposition as the sum of a drift, and a PPP with characteristic measure $\Pi+k\delta_\infty$. As we said above, we do not want to consider the case that X is a step process, so either d>0 or $\Pi((0,\infty))=\infty$ is enough to ensure this.

Analytic Methods

We give a snapshot of a couple of observations which make these nice to work with. Define the renewal measure U(dx) by:

$\int_{[0,\infty)}f(x)U(dx)=\mathbb{E}\left(\int_0^\infty f(X_t)dt\right).$

If we want to know the distribution function of this U, it will suffice to consider the indicator function $f(x)=1_{X_t\leq x}$ in the above.

The reason to exclude step processes specifically is to ensure that X has a continuous inverse:

$L_x=\sup\{t\geq 0:X_t\leq x\}$ so $U(x)=\mathbb{E}L_x$ is continuous.

In fact, this renewal measure characterises the subordinator uniquely, as we see by taking the Laplace transform:

$\mathcal{L}U(\lambda)=\int_{[0,\infty)}e^{-\lambda x}U(dx)=\mathbb{E}\int e^{-\lambda X_t}dt$

$=\int \mathbb{E}e^{-\lambda X_t}dt=\int\exp(-t\Phi(\lambda))dt=\frac{1}{\Phi(\lambda)}.$

The Arcsine Law

X is Markov, which induces a so-called regenerative property on the range of X, $\mathcal{R}$. Formally, given s, we do not always have $s\in\mathcal{R}$ (as the process might jump over s), but we can define $D_s=\inf\{t>s:t\in\mathcal{R}\}$. Then

$\{v\geq 0:v+D_s\in\mathcal{R}\}\stackrel{d}{=}\mathcal{R}.$

In fact, the converse holds as well. Any random set with this regenerative property is the range of some subordinator. Note that $D_s$ is some kind of dual to X, since it is increasing, and the regenerative property induces some Markovian properties.

In particular, we consider the last passage time $g_t=\sup\{s, in the case of a stable subordinator with $\Phi(\lambda)=\lambda^\alpha$. Here, $\mathcal{R}$ is self-similar with scaling exponent $\alpha$. The distribution of $\frac{g_t}{t}$ is thus independent of t. In this situation, we can derive the generalised arcsine rule for the distribution of $g_1$:

$\mathbb{R}(g_1\in ds)=\frac{\sin \alpha\pi}{\pi}s^{\alpha-1}(1-s)^{-\alpha}ds.$

The most natural application of this is to the hitting time process of Brownian Motion, which is stable with $\alpha=\frac12$. Then $g_1=S_1-B_1$, in the usual notation for the supremum process. Furthermore, we have equality in distribution of the processes (see previous posts on excursion theory and the short aside which follows):

$(S_t-B_t)_{t\geq 0}\stackrel{d}{=}(|B_t|)_{t\geq 0}.$

So $g_1$ gives the time of the last zero of BM before time 1, and the arcsine law shows that its distribution is given by:

$\mathbb{P}(g_1\leq t)=\frac{2}{\pi}\text{arcsin}\sqrt{t}.$

# The Levy-Khintchine Formula

Because of a string of coincidences involving my choice of courses for Part III and various lecturers’ choices about course content, I didn’t learn what a Levy process until a few weeks’ ago. Trying to get my head around the Levy-Khintchine formula took a little while, so the following is what I would have liked to have been able to find back then.

A Levy process is an adapted stochastic process started from 0 at time zero, and with stationary, independent increments. This is reminiscent, indeed a generalisation, of the definition of Brownian motion. In that case, we were able to give a concrete description of the distribution of $X_1$. For a general Levy process, we have

$X_1=X_{1/n}+(X_{2/n}-X_{1/n})+\ldots+(X_1-X_{1-1/n}).$

So the distribution of $X_1$ is infinitely divisible, that is, can be expressed as the distribution of the sum n iid random variables for all n. Viewing this definition in terms of convolutions of distributions may be more helpful, especially as we will subsequently consider characteristic functions. If this is the first time you have seen this property, note that it is not a universal property. For example, it is not clear how to write a U[0,1] random variable as a convolution of two iid RVs. Note that exactly the same argument suffices to show that the distribution of $X_t$ is infinitely divisible.

It will be most convenient to work with the characteristic functions

$\mathbb{E}\exp(i\langle \lambda,X_t\rangle).$

By stationarity of increments, we can show that this is equal to

$\exp(-\Psi(\lambda)t)\quad\text{where}\quad \mathbb{E}\exp(i\langle \lambda,X_1\rangle)=:\exp(-\Psi(\lambda)).$

This function $\Psi(\lambda)$ is called the characteristic exponent. The argument resembles that used for Cauchy’s functional equations, by dealing first with the rationals using stationarity of increments, then lifting to the reals by the (right-)continuity of

$t\mapsto \mathbb{E}\exp(i\langle \lambda,X_t\rangle).$

As ever, $\Psi(\lambda)$ uniquely determines the distribution of $X_1$, and so it also uniquely determines the distribution of Levy process. The only condition on $\Psi$ is that it be the characteristic function of an infinitely divisible distribution. This condition is given explicitly by the Levy-Khintchine formula.

Levy-Khintchine

$\Psi(\lambda)$ is the characteristic function of an infinitely divisible distribution iff

$\Psi(\lambda)=i\langle a,\lambda\rangle +\frac12 Q(\lambda)+\int_{\mathbb{R}^d}(1-e^{i\langle \lambda,x\rangle}+i\langle \lambda,x\rangle 1_{|x|<1})\Pi(dx).$

for $a\in\mathbb{R}^d$, Q a quadratic form on $\mathbb{R}^d$, and $\Pi$ a so-called Levy measure satisfying $\int (1\wedge |x|^2)\Pi(dx)<\infty$.

This looks a bit arbitrary, so first let’s explain what each of these terms ‘means’.

• $i\langle a,\lambda\rangle$ comes from a drift of $-a$. Note that a deterministic linear function is a (not especially interesting) Levy process.
• $\frac12Q(\lambda)$ comes from a Brownian part $\sqrt{Q}B_t$.

The rest corresponds to the jump part of the process. Note that a Poisson process is an example of a Levy process, hence why we might consider thinking about jumps in the first place. The reason why there is an indicator function floating around is that we have to think about two regimes separately, namely large and small jumps. Jumps of size bounded below cannot happen too often as otherwise the process might explode off to infinity in finite time with positive probability. On the other hand, infinitesimally small jumps can happen very often (say on a dense set) so long as everything is controlled to prevent an explosion on the macroscopic scale.

There is no canonical choice for where the divide between these regimes happens, but conventionally this is taken to be at $|x|=1$. The restriction on the Levy measure near 0 ensures that the sum of the squares all jumps up some finite time converges absolutely.

• $\Pi\cdot 1_{|x|\geq 1}$ gives the intensity of a standard compound Poisson process. The jumps are well-spaced, and so it is a relatively simple calculation to see that the characteristic function is

$\int_{\mathbb{R}^d}(1-e^{i\langle \lambda,x\rangle})1_{|x|\geq 1}\Pi(dx).$

The intensity $\Pi\cdot 1_{|x|<1}$ gives infinitely many hits in finite time, so if the expectation of this measure is not 0, we explode immediately. We compensate by drifting away from this at rate

$\int_{\mathbb{R}^d}x1_{|x|<1}\Pi(dx).$

To make this more rigorous, we should really consider $1_{\epsilon<|x|<1}$ then take a limit, but this at least explains where all the terms come from. Linearity allows us to interchange integrals and inner products, to get the term

$\int_{\mathbb{R}^d}(1-e^{-i\langle \lambda,x\rangle}+i\langle\lambda,x\rangle 1_{|x|<1})\Pi(dx).$

If the process has bounded variation, then we must have Q=0, and also

$\int (1\wedge |x|)\Pi(dx)<\infty,$

that is, not too many jumps on an |x| scale. In this case, then this drift component is well-defined and linear $\lambda$, so can be incorporated with the drift term at the beginning of the Levy-Khintchine expression. If not, then there are some $\lambda$ for which it does not exist.

There are some other things to be said about Levy processes, including

• Stable Levy processes, where $\Psi(k\lambda)=k^\alpha \Psi(\lambda)$, which induces the rescaling-invariance property: $k^{-1/\alpha}X_{kt}\stackrel{d}{=}X$. The distribution of each $X_t$ is then also a stable distribution.
• Resolvents, where instead of working with the process itself, we work with the distribution of the process at a random exponential time.

# The Poisson Process – A Third Characteristion

There remains the matter of the distribution of the number of people to arrive in a fixed non-infinitissimal time interval. Consider the time interval [0,1], which we divide into n smaller intervals of equal width. As n grows large enough that we know the probability that two arrivals occur in the same interval tends to zero (as this is $\leq no(\frac{1}{n})$), we can consider this as a sequence of iid Bernoulli random variables as before. So

$\mathbb{P}(N_1=k)=\binom{n}{k}(\frac{\lambda}{n})^k(1-\frac{\lambda}{n})^{n-k}$
$\approx \frac{n^k}{k!} \frac{\lambda^k}{n^k}(1-\frac{\lambda}{n})^n\approx \frac{\lambda^k}{k!}e^{-\lambda}.$

We recognise this as belonging to a Poisson (hence the name of the process!) random variable. We can repeat this for a general time interval and obtain $N_t\sim \text{Po}(\lambda t)$.

Note that we implicitly assumed that, in the infinitissimal case at least, behaviour in disjoint intervals was independent. We would hope that this would lift immediately to the large intervals, but it is not immediately obvious how to make this work. This property of independent increments is one of the key definitions of a Levy Process, of which the Poisson process is one of the two canonical examples (the other is Brownian Motion).

As before, if we can show that the implication goes both ways (and for this case it is not hard – letting $t\rightarrow 0$ clearly gives the infinitissimal construction), we can prove results about Poisson random variables with ease, for example $\text{Po}(\lambda)+\text{Po}(\mu)=\text{Po}(\lambda+\mu)$.

This pretty much concludes the construction of the Poisson process. We have three characterisations:
1) $X_n\sim\text{Exp}(\lambda)$ all iid.
2) The infinitissimal construction as before, with independence.
3) The number of arrivals in a time interval of width t $\sim \text{Po}(\lambda t)$. (This is sometimes called a stationary increments property.) Furthermore, we have independent increments.

A formal derivation of the equivalence of these forms is important but technical, and so not really worth attempting here. See James Norris’s book for example for a fuller exposition.

The final remark is that the Poisson Process has the Markov property. Recall that this says that conditional on the present, future behaviour is independent of the past. Without getting into too much detail, we might like to prove this by using the independent increments property. But remember that for a continuous process, it is too much information to keep track of all the distributions at once. It is sufficient to track only the finite marginal distributions, provided the process is cadlag, which the Poisson process is, assuming we deal with the discontinuities in the right way. Alternatively, the exponential random variable is memoryless, a property that can be lifted, albeit with some technical difficulties, to show the Markov property.