Subordinators and the Arcsine rule

After the general discussion of Levy processes in the previous post, we now discuss a particular class of such processes. The majority of content and notation below is taken from chapters 1-3 of Jean Bertoin’s Saint-Flour notes.

We say X_t is a subordinator if:

  • It is a right-continuous adapted stochastic process, started from 0.
  • It has stationary, independent increments.
  • It is increasing.

Note that the first two conditions are precisely those required for a Levy process. We could also allow the process to take the value \infty, where the hitting time of infinity represents ‘killing’ the subordinator in some sense. If this hitting time is almost surely infinite, we say it is a strict subordinator. There is little to be gained right now from considering anything other than strict subordinators.


  • A compound Poisson process, with finite jump measure supported on [0,\infty). Hereafter we exclude this case, as it is better dealt with in other languages.
  • A so-called stable Levy process, where \Phi(\lambda)=\lambda^\alpha, for some \alpha\in(0,1). (I’ll define \Phi very soon.) Note that checking that the sample paths are increasing requires only that X_1\geq 0 almost surely.
  • The hitting time process for Brownian Motion. Note that this does indeed have jumps as we would need. (This has \Phi(\lambda)=\sqrt{2\lambda}.)


  • In general, we describe Levy processes by their characteristic exponent. As a subordinator takes values in [0,\infty), we can use the Laplace exponent instead:

\mathbb{E}\exp(-\lambda X_t)=:\exp(-t\Phi(\lambda)).

  • We can refine the Levy-Khintchine formula;

\Phi(\lambda)=k+d\lambda+\int_{[0,\infty)}(1-e^{-\lambda x})\Pi(dx),

  • where k is the kill rate (in the non-strict case). Because the process is increasing, it must have bounded variation, and so the quadratic part vanishes, and we have a stronger condition on the Levy measure: \int(1\wedge x)\Pi(dx)<\infty.
  • The expression \bar{\Pi}(x):=k+\Pi((x,\infty)) for the tail of the Levy measure is often more useful in this setting.
  • We can think of this decomposition as the sum of a drift, and a PPP with characteristic measure \Pi+k\delta_\infty. As we said above, we do not want to consider the case that X is a step process, so either d>0 or \Pi((0,\infty))=\infty is enough to ensure this.

Analytic Methods

We give a snapshot of a couple of observations which make these nice to work with. Define the renewal measure U(dx) by:

\int_{[0,\infty)}f(x)U(dx)=\mathbb{E}\left(\int_0^\infty f(X_t)dt\right).

If we want to know the distribution function of this U, it will suffice to consider the indicator function f(x)=1_{X_t\leq x} in the above.

The reason to exclude step processes specifically is to ensure that X has a continuous inverse:

L_x=\sup\{t\geq 0:X_t\leq x\} so U(x)=\mathbb{E}L_x is continuous.

In fact, this renewal measure characterises the subordinator uniquely, as we see by taking the Laplace transform:

\mathcal{L}U(\lambda)=\int_{[0,\infty)}e^{-\lambda x}U(dx)=\mathbb{E}\int e^{-\lambda X_t}dt

=\int \mathbb{E}e^{-\lambda X_t}dt=\int\exp(-t\Phi(\lambda))dt=\frac{1}{\Phi(\lambda)}.

The Arcsine Law

X is Markov, which induces a so-called regenerative property on the range of X, \mathcal{R}. Formally, given s, we do not always have s\in\mathcal{R} (as the process might jump over s), but we can define D_s=\inf\{t>s:t\in\mathcal{R}\}. Then

\{v\geq 0:v+D_s\in\mathcal{R}\}\stackrel{d}{=}\mathcal{R}.

In fact, the converse holds as well. Any random set with this regenerative property is the range of some subordinator. Note that D_s is some kind of dual to X, since it is increasing, and the regenerative property induces some Markovian properties.

In particular, we consider the last passage time g_t=\sup\{s<t:s\in\mathcal{R}\}, in the case of a stable subordinator with \Phi(\lambda)=\lambda^\alpha. Here, \mathcal{R} is self-similar with scaling exponent \alpha. The distribution of \frac{g_t}{t} is thus independent of t. In this situation, we can derive the generalised arcsine rule for the distribution of g_1:

\mathbb{R}(g_1\in ds)=\frac{\sin \alpha\pi}{\pi}s^{\alpha-1}(1-s)^{-\alpha}ds.

The most natural application of this is to the hitting time process of Brownian Motion, which is stable with \alpha=\frac12. Then g_1=S_1-B_1, in the usual notation for the supremum process. Furthermore, we have equality in distribution of the processes (see previous posts on excursion theory and the short aside which follows):

(S_t-B_t)_{t\geq 0}\stackrel{d}{=}(|B_t|)_{t\geq 0}.

So g_1 gives the time of the last zero of BM before time 1, and the arcsine law shows that its distribution is given by:

\mathbb{P}(g_1\leq t)=\frac{2}{\pi}\text{arcsin}\sqrt{t}.

The Levy-Khintchine Formula

Because of a string of coincidences involving my choice of courses for Part III and various lecturers’ choices about course content, I didn’t learn what a Levy process until a few weeks’ ago. Trying to get my head around the Levy-Khintchine formula took a little while, so the following is what I would have liked to have been able to find back then.

A Levy process is an adapted stochastic process started from 0 at time zero, and with stationary, independent increments. This is reminiscent, indeed a generalisation, of the definition of Brownian motion. In that case, we were able to give a concrete description of the distribution of X_1. For a general Levy process, we have


So the distribution of X_1 is infinitely divisible, that is, can be expressed as the distribution of the sum n iid random variables for all n. Viewing this definition in terms of convolutions of distributions may be more helpful, especially as we will subsequently consider characteristic functions. If this is the first time you have seen this property, note that it is not a universal property. For example, it is not clear how to write a U[0,1] random variable as a convolution of two iid RVs. Note that exactly the same argument suffices to show that the distribution of X_t is infinitely divisible.

It will be most convenient to work with the characteristic functions

\mathbb{E}\exp(i\langle \lambda,X_t\rangle).

By stationarity of increments, we can show that this is equal to

\exp(-\Psi(\lambda)t)\quad\text{where}\quad \mathbb{E}\exp(i\langle \lambda,X_1\rangle)=:\exp(-\Psi(\lambda)).

This function \Psi(\lambda) is called the characteristic exponent. The argument resembles that used for Cauchy’s functional equations, by dealing first with the rationals using stationarity of increments, then lifting to the reals by the (right-)continuity of

t\mapsto \mathbb{E}\exp(i\langle \lambda,X_t\rangle).

As ever, \Psi(\lambda) uniquely determines the distribution of X_1, and so it also uniquely determines the distribution of Levy process. The only condition on \Psi is that it be the characteristic function of an infinitely divisible distribution. This condition is given explicitly by the Levy-Khintchine formula.


\Psi(\lambda) is the characteristic function of an infinitely divisible distribution iff

\Psi(\lambda)=i\langle a,\lambda\rangle +\frac12 Q(\lambda)+\int_{\mathbb{R}^d}(1-e^{i\langle \lambda,x\rangle}+i\langle \lambda,x\rangle 1_{|x|<1})\Pi(dx).

for a\in\mathbb{R}^d, Q a quadratic form on \mathbb{R}^d, and \Pi a so-called Levy measure satisfying \int (1\wedge |x|^2)\Pi(dx)<\infty.

This looks a bit arbitrary, so first let’s explain what each of these terms ‘means’.

  • i\langle a,\lambda\rangle comes from a drift of -a. Note that a deterministic linear function is a (not especially interesting) Levy process.
  • \frac12Q(\lambda) comes from a Brownian part \sqrt{Q}B_t.

The rest corresponds to the jump part of the process. Note that a Poisson process is an example of a Levy process, hence why we might consider thinking about jumps in the first place. The reason why there is an indicator function floating around is that we have to think about two regimes separately, namely large and small jumps. Jumps of size bounded below cannot happen too often as otherwise the process might explode off to infinity in finite time with positive probability. On the other hand, infinitesimally small jumps can happen very often (say on a dense set) so long as everything is controlled to prevent an explosion on the macroscopic scale.

There is no canonical choice for where the divide between these regimes happens, but conventionally this is taken to be at |x|=1. The restriction on the Levy measure near 0 ensures that the sum of the squares all jumps up some finite time converges absolutely.

  • \Pi\cdot 1_{|x|\geq 1} gives the intensity of a standard compound Poisson process. The jumps are well-spaced, and so it is a relatively simple calculation to see that the characteristic function is

\int_{\mathbb{R}^d}(1-e^{i\langle \lambda,x\rangle})1_{|x|\geq 1}\Pi(dx).

The intensity \Pi\cdot 1_{|x|<1} gives infinitely many hits in finite time, so if the expectation of this measure is not 0, we explode immediately. We compensate by drifting away from this at rate


To make this more rigorous, we should really consider 1_{\epsilon<|x|<1} then take a limit, but this at least explains where all the terms come from. Linearity allows us to interchange integrals and inner products, to get the term

\int_{\mathbb{R}^d}(1-e^{-i\langle \lambda,x\rangle}+i\langle\lambda,x\rangle 1_{|x|<1})\Pi(dx).

If the process has bounded variation, then we must have Q=0, and also

\int (1\wedge |x|)\Pi(dx)<\infty,

that is, not too many jumps on an |x| scale. In this case, then this drift component is well-defined and linear \lambda, so can be incorporated with the drift term at the beginning of the Levy-Khintchine expression. If not, then there are some \lambda for which it does not exist.

There are some other things to be said about Levy processes, including

  • Stable Levy processes, where \Psi(k\lambda)=k^\alpha \Psi(\lambda), which induces the rescaling-invariance property: k^{-1/\alpha}X_{kt}\stackrel{d}{=}X. The distribution of each X_t is then also a stable distribution.
  • Resolvents, where instead of working with the process itself, we work with the distribution of the process at a random exponential time.

The Poisson Process – A Third Characteristion

There remains the matter of the distribution of the number of people to arrive in a fixed non-infinitissimal time interval. Consider the time interval [0,1], which we divide into n smaller intervals of equal width. As n grows large enough that we know the probability that two arrivals occur in the same interval tends to zero (as this is \leq no(\frac{1}{n})), we can consider this as a sequence of iid Bernoulli random variables as before. So

\approx \frac{n^k}{k!} \frac{\lambda^k}{n^k}(1-\frac{\lambda}{n})^n\approx \frac{\lambda^k}{k!}e^{-\lambda}.

We recognise this as belonging to a Poisson (hence the name of the process!) random variable. We can repeat this for a general time interval and obtain N_t\sim \text{Po}(\lambda t).

Note that we implicitly assumed that, in the infinitissimal case at least, behaviour in disjoint intervals was independent. We would hope that this would lift immediately to the large intervals, but it is not immediately obvious how to make this work. This property of independent increments is one of the key definitions of a Levy Process, of which the Poisson process is one of the two canonical examples (the other is Brownian Motion).

As before, if we can show that the implication goes both ways (and for this case it is not hard – letting t\rightarrow 0 clearly gives the infinitissimal construction), we can prove results about Poisson random variables with ease, for example \text{Po}(\lambda)+\text{Po}(\mu)=\text{Po}(\lambda+\mu).

This pretty much concludes the construction of the Poisson process. We have three characterisations:
1) X_n\sim\text{Exp}(\lambda) all iid.
2) The infinitissimal construction as before, with independence.
3) The number of arrivals in a time interval of width t \sim \text{Po}(\lambda t). (This is sometimes called a stationary increments property.) Furthermore, we have independent increments.

A formal derivation of the equivalence of these forms is important but technical, and so not really worth attempting here. See James Norris’s book for example for a fuller exposition.

The final remark is that the Poisson Process has the Markov property. Recall that this says that conditional on the present, future behaviour is independent of the past. Without getting into too much detail, we might like to prove this by using the independent increments property. But remember that for a continuous process, it is too much information to keep track of all the distributions at once. It is sufficient to track only the finite marginal distributions, provided the process is cadlag, which the Poisson process is, assuming we deal with the discontinuities in the right way. Alternatively, the exponential random variable is memoryless, a property that can be lifted, albeit with some technical difficulties, to show the Markov property.

Strong Markov Property for BM

The Strong Markov Property is the most important result to demonstrate for any Markov process, such as Brownian Motion. It is also probably the most widely requested item of bookwork on the Part III Advanced Probability exam. I feel it is therefore worth practising writing as quickly as possible.

Theorem (SMP): Take (B_t) a standard (\mathcal{F}_t)-BM, and T an a.s. finite stopping time. Then (B_{T+t}-B_T,t\geq 0) is a standard BM independent of \mathcal{F}_T.

Proof: We write B_t^{(T)}=B_{T+t}-B_T for ease of notation. We will show that for any A\in\mathcal{F}_T and F bounded, measurable:


This will suffice to establish independence, and taking A=\Omega\in\mathcal{F}_t shows that B_t^T is a standard BM since (Levy), BM is uniquely characterised by its finite joint distributions.

To prove the result, we approximate discretely, and apply the Markov property.

\mathbb{E}[1_AF(B_{t_1}^{(T)},\ldots)]=\lim_{m\rightarrow\infty}\sum_{k=1}^\infty \mathbb{E}[1_{A\cap\{T\in((k-1)2^{-m},k2^{-m}]\}}F(B_{t_1}^{(k2^{-m})},\ldots)]

by bounded convergence, using continuity of F, right-continuity of B, and that T<\infty a.s. (so that 1_A=\sum 1_{A\cap \{T\in(-,-]\}})

\stackrel{\text{WMP}}{=}\lim_{m\rightarrow\infty}\sum_{k=1}^\infty \mathbb{P}[A\cap\{T\in((k-1)2^{-m},k2^{-m}]\}]\mathbb{E}F(B_{t_1},\ldots,B_{t_n})


which is exactly what we required.

Remarks: 1) We only used right-continuity of the process, and characterisation by joint marginals, so the proof works equally well for Levy processes.

2) We can in fact show that it is independent of \mathcal{F}_T^+, by considering T+\frac{1}{n} which is still a stopping time, then taking a limit in this as well in the above proof. For details of a similar result, see my post on Blumenthal’s 0-1 Law.

Motivating Ito’s Formula

Ito’s formula, which characterises the stochastic differential, has been mentioned by various textbooks and courses, but now for the first time (after James Norris’s first lecture for the Stochastic Calculus course) I think I finally have a reasonable idea of what’s going on. The reasons I was initially confused help to explain what the motivation is:

  • What processes can we consider? Well, initially continuous time, time-homogeneous Markov processes in \mathbb{R}^d with continuous paths. It could be space-homogeneous as well if desired. By the theory of decomposition of Levy processes (ie what we are considering), the continuous paths property gives that such a process must be a Brownian motion with drift. This has the property that X_{t+dt}-X_t\sim N(b(X_t)dt,a(X_t)dt) where a(X_t) is the diffusivity, that is, the intensity of the Brownian component, and b(X_t) is the drift.
  • What is the stochastic differential? Well, for a process as above, we define: dX_t:= X_{t+dt}-X_t-N(b(X_t)dt,a(X_t)dt). This is non-deterministic: that’s reasonable since X is a stochastic process. And, a normal differential is meaningful only when you integrate, so similarly the stochastic differential is only meaningful when you take an expectation.
  • Write N_t for the Brownian noise. Then \mathbb{E}[d(f(X_t))|\mathcal{F}_t]=\mathbb{E}[f(X_{t+dt})-f(X_t)|\mathcal{F}_t], so by Taylor: =\mathbb{E}[f'(X_t)(b(X_t)dt+N_t)+\frac12 f''(X_t)N_t^2 +o(dt)], remembering that N_t=O(\sqrt{dt}).
  • This is generally written as \mathbb{E}[d(f(X_t))|\mathcal{F}_t]=Lf(X_t)dt where Lf(x)=b(x)f'(x)+\frac12 a(x)f''(x). Now note that \mathbb{E}[dX_t|\mathcal{F}_t]=b(X_t)dt and \mathbb{E}[dX_tdX_t|\mathcal{F}_t]=a(X_t)dt, so it is reasonable that we might ‘cancel the expectations’ to get: d(f(X_t))=f'(X_t)dX_t+\frac12 f''(X_t)dX_tdX_t.
  • Use a suitable tensor product or dX_tdX_t^T when d>1.
  • This is (a version of) Ito’s Lemma.