Skorohod embedding

Background

Suppose we are given a standard Brownian motion (B_t), and a stopping time T. Then, so long as T satisfies one of the regularity conditions under which the Optional Stopping Theorem applies, we know that \mathbb{E}[B_T]=0. (See here for a less formal introduction to OST.) Furthermore, since B_t^2-t is a martingale, \mathbb{E}[B_T^2]=\mathbb{E}[T], so if the latter is finite, so is the former.

Now, using the strong Markov property of Brownian motion, we can come up with a sequence of stopping times 0=T_0, T_1, T_2,\ldots such that the increments T_k-T_{k-1} are IID with the same distribution as T. Then 0,B_{T_1},B_{T_2},\ldots is a centered random walk. By taking T to be the hitting time of \{-1,+1\}, it is easy to see that we can embed simple random walk in a Brownian motion using this approach.

p1020956_compressedEmbedding simple random walk in Brownian motion.

The Skorohod embedding question asks: can all centered random walks be constructed in this fashion, by stopping Brownian motion at a sequence of stopping time? With the strong Markov property, it immediately reduces the question of whether all centered finite-variance distributions X can be expressed as B_T for some integrable stopping time T.

The answer to this question is yes, and much of what follows is drawn from, or at least prompted by Obloj’s survey paper which details the problem and rich history of the many approaches to its solution over the past seventy years.

Applications and related things

The relationship between random walks and Brownian motion is a rich one. Donsker’s invariance principle asserts that Brownian motion appears as the scaling limit of a random walk. Indeed, one can construct Brownian motion itself as the limit of a sequence of consistent random walks with normal increments on an increasingly dense set of times. Furthermore, random walks are martingales, and we know that continuous, local martingales can be expressed as a (stochastically) time-changed Brownian motion, from the Dubins-Schwarz theorem.

The Skorohod embedding theorem can be used to prove results about random walks with general distribution by proving the corresponding result for Brownian motion, and checking that the construction of the sequence of stopping times has the right properties to allow the result to be carried back to the original setting. It obviously also gives a coupling between a individual random walk and a Brownian motion which may be useful in some contexts, as well as a coupling between any pair of random walks. This is useful in proving results for random walks which are much easier for special cases of the distribution. For example, when the increments are Gaussian, or when there are combinatorial approaches to a problem about simple random walk. At the moment no aspect of this blog schedule is guaranteed, but I plan to talk about the law of the iterated logarithm shortly, whose proof is approachable in both of these settings, as well as for Brownian motion, and Skorohod embedding provides the route to the general proof.

At the end, we will briefly compare some other ways to couple a random walk and a Brownian motion.

Adding extra randomness

One thing we could do is sample a copy of X independently from the Brownian motion, then declare T= \tau_{X}:= \inf\{t\ge 0: B_t=X\}, the hitting time of (random value) X. But recall that unfortunately \tau_x has infinite expectation for all non-zero x, so this doesn’t fit the conditions required to use OST.

Skorohod’s original method is described in Section 3.1 of Obloj’s notes linked above. The method is roughly to pair up positive values taken by X appropriately with negative values taken by X in a clever way. If we have a positive value b and a negative value a, then \tau_{a,b}, the first hitting time of \mathbb{R}\backslash (a,b) is integrable. Then we choose one of these positive-negative pairs according to the projection of the distribution of X onto the pairings, and let T be the hitting time of this pair of values. The probability of hitting b conditional on hitting {a,b} is easy to compute (it’s \frac{-a}{b-a}) so we need to have chosen our pairs so that the ‘probability’ of hitting b (ie the density) comes out right. In particular, this method has to start from continuous distributions X, and treat atoms in the distribution of X separately.

The case where the distribution X is symmetric (that is X\stackrel{d}=-X) is particularly clear, as then the pairs should be (-x,x).

However, it feels like there is enough randomness in Brownian motion already, and subsequent authors showed that indeed it wasn’t necessary to introduce extra randomness to provide a solution.

One might ask whether it’s possible to generate the distribution on the set of pairs (as above) out of the Brownian motion itself, but independently from all the hitting times. It feels like it might be possible to make the distribution on the pairs measurable with respect to

\mathcal{F}_{0+} = \bigcap\limits_{t>0} \mathcal{F}_t,

the sigma-algebra of events determined by limiting behaviour as t\rightarrow 0 (which is independent of hitting times). But of course, unfortunately \mathcal{F}_{0+} has a zero-one law, so it’s not possible to embed non-trivial distributions there.

Dubins solution

The exemplar for solutions without extra randomness is due to Dubins, shortly after Skorohod’s original argument. The idea is to express the distribution X as the almost sure limit of a martingale. We first use the hitting time of a pair of points to ‘decide’ whether we will end up positive or negative, and then given this information look at the hitting time (after this first time) of two subsequent points to ‘decide’ which of four regions of the real interval we end up in.

I’m going to use different notation to Obloj, corresponding more closely with how I ended up thinking about this method. We let

a_+:= \mathbb{E}[X \,|\, X>0], \quad a_- := \mathbb{E}[X\,|\, X<0], (*)

and take T_1 = \tau_{\{a_-,a_+\}}. We need to check that

\mathbb{P}\left( B_{T_1}=a_+\right) = \mathbb{P}\left(X>0\right),

for this to have a chance of working. But we know that

\mathbb{P}\left( B_{T_1}=a_+\right) = \frac{a_+}{a_+-a_-},

and we can also attack the other side using (*) and the fact that \mathbb{E}[X]=0, using the law of total expectation:

0=\mathbb{E}[X]=\mathbb{E}[X\,|\, X>0] \mathbb{P}(X>0) + \mathbb{E}[X\,|\,X<0]\mathbb{P}(X<0) = a_+ \mathbb{P}(X>0) + a_- \left(1-\mathbb{P}(X>0) \right),

\Rightarrow\quad \mathbb{P}(X>0)=\frac{a_+}{a_+-a_-}.

Now we define

a_{++}=\mathbb{E}[X \,|\, X>a_+],\quad a_{+-}=\mathbb{E}[X\,|\, 0<X<a_+],

and similarly a_{-+},a_{--}. So then, conditional on B_{T_1}=a_+, we take

T_2:= \inf_{t\ge T_1}\left\{ B_t\not\in (a_{+-},a_{++})  \right\},

and similarly conditional on B_{T_1}=a_-. By an identical argument to the one we have just deployed, we have \mathbb{E}\left[B_{T_2} \,|\,\mathcal{F}_{T_1} \right] = B_{T_1} almost surely. So, although the a_{+-+} notation now starts to get very unwieldy, it’s clear we can keep going in this way to get a sequence of stopping times 0=T_0,T_1,T_2,\ldots where B_{T_n} determines which of the 2^n regions of the real line any limit \lim_{m\rightarrow\infty} B_{T_m} should lie in.

A bit of work is required to check that the almost sure limit T_n\rightarrow T is almost surely finite, but once we have this, it is clear that B_{T_n}\rightarrow B_T almost surely, and B_T has the distribution required.

Komlos, Major, Tusnady coupling

We want to know how close we can make this coupling between a centered random walk with variance 1, and a standard Brownian motion. Here, ‘close’ means uniformly close in probability. For large times, the typical difference between one of the stopping times 0,T_1,T_2,\ldots in the Skorohod embedding and its expectation (recall \mathbb{E}[T_k]=k) is \sqrt{n}. So, constructing the random walk S_0,S_1,S_2,\ldots from the Brownian motion via Skorohod embedding leads to

\left |S_k - B_k \right| = \omega(n^{1/4}),

for most values of k\le n. Strassen (1966) shows that the true scale of the maximum

\max_{k\le n} \left| S_k - B_k \right|

is slightly larger than this, with some extra powers of \log n and \log\log n as one would expect.

The Komlos-Major-Tusnady coupling is a way to do a lot better than this, in the setting where the distribution of the increments has a finite MGF near 0. Then, there exists a coupling of the random walk and the Brownian motion such that

\max_{k\le n}\left|S_k- B_k\right| = O(\log n).

That is, there exists C such that

\left[\max_{k\le n} \left |S_k-B_k\right| - C\log n\right] \vee 0

is a tight family of distributions, indeed with uniform exponential tail. To avoid digressing infinitely far from my original plan to discuss the proof of the law of iterated logarithm for general distributions, I’ll stop here. I found it hard to find much coverage of the KMT result apart from the challenging original paper, and many versions expressed in the language of empirical processes, which are similar to random walks in many ways relevant to convergence and this coupling, but not for Skorohod embedding. So, here is a link to some slides from a talk by Chatterjee which I found helpful in getting a sense of the history, and some of the modern approaches to this type of normal approximation problem.

Advertisements

100k Views

When I started this blog, I implicitly made a promise to myself that I would be aiming for quality of posts rather than quantity of posts. Evaluating quality is hard, whereas one always feels some vague sense of purpose by seeing some measure of output increase. Nonetheless, I feel I have mostly kept this promise, and haven’t written that much solely for the sake of getting out another post. This post is something of an exception, since I noticed in the office on Friday that this blog was closing in on 100,000 page views. Some of these were not actually just me pressing F5 repeatedly. Obviously this is no more relevant an integer to be a threshold as any other, and one shouldn’t feel constrained by base 10, but it still feels like a good moment for a quick review.

Here are some interesting statistics over the (3+\epsilon) years of this website’s existence.

  • 175 posts, not including this one, or the three which are currently in draft status. This works out at about 4.7 posts a month. By some margin the most prolific period was May 2012, when I was revising for my Part III exams in Cambridge, and a series of posts about the fiddliest parts of stochastic calculus and network theory seemed a good way to consolidate this work. I’ve learned recently that PhDs are hard, and in fact it’s been a year since I last produced at least five posts in a month, if you discount the series of olympiad reports, which though enjoyable, don’t exactly require a huge amount of mathematical engagement.
  • By at least one order of magnitude, the most viewed day was 17th August 2014, when several sources simultaneously linked to the third part of my report on IMO 2014 in Cape Town. An interesting point to note is that WordPress counts image views separately to page views, so the rare posts which have a gallery attached count well in a high risk / high return sense. In any case, the analytics consider that this day resulted in 2,366 views by 345 viewers. During a typical period, the number of views per visitor fluctuates between roughly 1.5 and 1.8, so clearly uploading as many photos of maths competitions as possible is the cheap route to lots of hits, at least by the WordPress metric.

EAE stats views

  • One might well expect the distributions involved in such a setup to follow a power-law. It’s not particularly clear from the above data about views per month since late 2012 whether this holds. One anomalously large data point (corresponding to the interest in IMO 2014) does not indicate a priori a power law tail… In addition, there is a general upward trend. Since a substantial proportion of traffic arrives from Google, one might naively assume that traffic rate might be proportion to amount of content, which naturally will grow in time, though it seems impractical to test this. One might also expect more recent posts to be more popular, though in practice this seems not to have been the case.
  • The variance in popularity of the various posts has been surprisingly large. At some level, I guess I thought there would be more viewers who browse through lots of things, but such people would probably do so from the home page, so it doesn’t show up as viewing lots of different articles. There is some internal linking between some pages, but not enough to be a major effect.
  • At either end of the spectrum, a post about coupling and the FKG inequality has received only 16 views in 2.5 years, while a guide to understanding the Levy-Khintchine formula has, in slightly less time, racked up 2,182 hits. There are direct reasons for this. Try googling Levy-Khintchine formula and see what comes up. In a sense, this is not enough, since you also need people to be googling the term in question, and picking topics that are hard but not super-hard at a masters level is probably maximising interest. But I don’t have a good underlying reason for why some posts should end up being more Google-friendly than others.
  • In particular, quality of article seems at best loosely correlated with number of views. This is perhaps worrying, both for my reputation, and for the future of written media, but we will see. Indeed, two articles on the Dubins-Schwarz theorem and a short crib sheet for convergence of random variables, both get a regular readership, despite seeming to have been written (in as much as a blog post can be) on the back of an envelope. I also find it amusing that the Dubins-Schwarz theorem is always viewed at the same time of the year, roughly mid-February, as presumably it comes up in the second term of masters courses, just like it did in my own.
  • By contrast, I remain quite pleased with the Levy-Khintchine article. It’s the sort of topic which is perfectly suited to this medium, since most books on Levy processes seem to assume implicit that their readers will already be familiar with this statement. So it seemed like a worthwhile enterprise to summarise this derivation, and it’s nice to see that others clearly feel the same, and indeed I still find some posts of this flavour useful as revision for myself.

Blog log plot

  • This seemed like a particularly good data set in which to go hunting for power-laws. I appreciate that taking a print-screen of an Excel chart will horrify many viewers, but never mind. The above plot shows the log of page view values for those mathematical articles with at least 200 hits. You can see the Levy-Khintchine as a mild anomaly at the far left. While I haven’t done any serious analysis, this looks fairly convincing.
  • I haven’t engaged particularly seriously in interaction with other blogs and other websites. Perhaps I should have done? I enjoy reading such things, but networking in this fashion seems like a zero-sum game overall except for a few particularly engaged people, even if one gets a pleasing spike in views from a reciprocal tweet somewhere. As a result, the numbers of comments and out-going clicks here is essentially negligible.
  • Views from the UK narrowly outnumber views from the US, but at present rates this will be reversed very soon. I imagine if I discounted the olympiad posts, which are sometimes linked from UK social media, this would have happened already.
  • From purely book-keeping curiosity, WordPress currently thinks the following countries (and territories – I’m not sure how the division occurs…) have recorded exactly one view: Aaland Islands, Afghanistan, Belize, Cuba, Djibouti, El Salvador, Fiji, Guatemala, Guernsey, Laos, Martinique, Mozambique, New Caledonia, Tajikistan, US Virgin Islands, and Zambia. Visiting all of those would be a fun post-viva trip…

Conclusion

As I said, we all know that 100,000 is just a number, but taking half an hour to write this has been a good chance to reflect on what I’ve written here in the past three years. People often ask whether I would recommend that they start something similar. My answer is ‘probably yes’, so long as the writer is getting something out of most posts they produce in real time. When writing about anything hard and technical, you have to accept that until you become very famous, interest in what you produce is always going to be quite low, so the satisfaction has to be gained from the process of understanding and digesting the mathematics itself. None of us will be selling the musical rights any time soon.

I have two pieces of advice to anyone in such a position. 1) Wait until you’ve written five posts before publishing any of them. This is a good test of whether you actually want to do it, and you’ll feel much more plausible linking to a website with more than two articles on it. 2) Don’t set monthly post count targets. Tempting though it is to try this to prevent your blog dying, it doesn’t achieve anything in the long term. If you have lots to say, say lots; if you don’t, occasionally saying something worthwhile feels a lot better when you look back on it than producing your target number of articles which later feel underwhelming.

I don’t know whether this will make it to 6+2\epsilon years, but for now, I’m still enjoying the journey through mathematics.

SLE Revision 4: The Gaussian Free Field and SLE4

I couldn’t resist breaking the order of my revision notes in order that the title might be self-referential. Anyway, it’s the night before my exam on Conformal Invariance and Randomness, and I’m practising writing this in case of an essay question about the Gaussian Free Field and its relation to the SLE objects discussed in the course.

What is a Gaussian Free Field?

The most natural definition is too technical for this context. Instead, recall that we could very informally consider a Poisson random measure to have the form of a series of Poisson random variables placed at each point in the domain, weighted infinitissimely so that the integrals over an area give a Poisson random variable with mean proportional to the measure of the area, and so that different areas are independent. Here we do a similar thing only for infinitesimal centred Gaussians. We have to specify the covariance structure.

We define the Green’s function on a domain D, which has a resonance with PDE theory, by:

G_D(x,y)=\lim_{\epsilon\rightarrow 0}\mathbb{E}[\text{time spent in }B(y,\epsilon)\text{ by BM started at }x\text{ stopped at }T_D]

We want the covariance structure of the hypothetical infinitesimal Gaussians to be given by \mathbb{E}(g(x)g(y))=G_D(x,y). So formally, we define (\Gamma(A),A\subset D) for open A, by (\Gamma(A_1),\ldots,\Gamma(A_n)) a centred Gaussian RV with covariance \mathbb{E}(\Gamma(A_1)\Gamma(A_2))=\int_{A_1\times A_2}dxdyG_D(x,y).

The good news is that we have a nice expression G_U(0,x)=\log\frac{1}{|x|}, and the Green’s functions are conformally invariant in the sense that G_{\phi(D)}(\phi(x),\phi(y))=G_D(x,y), following directly for conformality of Brownian Motion.

The bad news is that the existence is not clear. The motivation for this is the following though. We have a so-called excursion measure for BMs in a domain D. There isn’t time to discuss this now: it is infinite, and invariant under translations of the boundary (assuming the boundary is \mathbb{R}\subset \bar{\mathbb{H}}, which is fine after taking a conformal map). Then take a Poisson Point Process on the set of Brownian excursions with this measure. Now define a function f on the boundary of the domain dD, and define \Gamma_f(A) to be the sum of the values of f at the starting point of BMs in the PPP passing through A, weighted by the time spent in A. We have a universality relation given by the central limit theorem: if we define h to be (in a point limit) the expected value of this variable, and we take n independent copies, we have:

\frac{1}{\sqrt{n}}\left['\Gamma_f^1(A)+\ldots+\Gamma_f^n(A)-n\int_Ah(x)dx\right]\rightarrow\Gamma(A)

where this limiting random variable is Gaussian.

For now though, we assume existence without full proof.

SLE_4

We consider chordal SLE_k, which has the form of a curve \gamma[0,\infty] from 0 to \infty in H. The g_t the regularising function as normal, consider \tilde{X}_t=X_t-W_t:=g_t(x)-\sqrt{\kappa}\beta_t for some fixed x. We are interested in the  evolution of the function arg x. Note that conditional on the (almost sure for K<=4) event that x does not lie on the curve, arg x will converge either to 0 or pi almost surely, depending on whether the curve passes to the left or the right (respectively) of x.

By Loewner’s DE for the upper half-plane and Ito’s formula:

d\bar{X}_t=\sqrt{\kappa}d\beta_t,\quad d\log\bar{X}_t=(2-\frac{\kappa}{2})\frac{dt}{\bar{X}_t^2}+\frac{\sqrt{\kappa}}{\bar{X}_t}d\beta_t

So, when K=4, the dt terms vanish, which gives that log X is a local martingale, and so

d\theta_t=\Im(\frac{2}{\bar{X}_t}d\beta_t

is a true martingale since it is bounded. Note that

\theta_t=\mathbb{E}[\pi1(x\text{ on right of }\gamma)|\mathcal{F}_t]

Note that also:

\mathbb{P}(\text{BM started at }x\text{ hits }\gamma[0,t]\cup\mathbb{R}\text{ to the right of }\gamma(t)|\gamma[0,t])=\frac{\theta_t}{\pi} also.

SLE_4 and the Gaussian Free Field on H

We will show that this chordal SLE_4 induces a conformal Markov type of property in Gaussian Free Fields constructed on the slit-domain. Precisely, we will show that if \Gamma_T is a GFF on H_T=\mathbb{H}\backslash\gamma[0,T], then \Gamma_T+ch_T(\cdot)=\Gamma_0+ch_0(\cdot), where c is a constant to be determined, and h_t(x)=\theta_t(x) in keeping with the lecturer’s notation!

It will suffice to check that for all fixed p with compact support \Gamma_T(p)+c(h_T(p)-h_0(p)) is a centred Gaussian with variance \int dxdyG_H(x,y)p(x)p(y).

First, applying Ito and conformal invariance of the Green’s functions under the maps g_t,

dG_{H_t}(x,y)=cd[h(x),h(y)]_t

The details are not particularly illuminating, but exploit the fact that Green’s function on H has a reasonable nice form \log\left|\frac{x-\bar{y}}{x-y}\right|. We are also being extremely lax with constants, but we have plenty of freedom there.

After applying Ito and some (for now unjustified) Fubini:

dh_t(p)=\left(\int c.p(x)\Im(\frac{1}{\bar{X}_t})dx\right)d\beta_t

and so as we would have suspected (since h(x) was), this is a local martingale. We now deploy Dubins-Schwarz:

h_T(p)-h_T(0)\stackrel{d}{=}B_{\sigma(T)} for B an independent BM and

\sigma(T)=\int_0^Tdt(\int c.p(x)\Im(\frac{1}{\tilde{X}_t})dx)^2

So conditional on (h_T(p),t\in[0,T]), we want to make up the difference to \Gamma_0. Add to h_T(p)-h_0(p) an independent random variable distribution as N(0,s-\sigma(T)), where

s=\int dxdyp(x)p(y)G(x,y)\quad =:\Gamma_0(p)

Then

s-\sigma(T)=\int p(x)p(y)[G(x,y)-c\int_0^Tdt\Im(\frac{1}{X_t})\Im(\frac{1}{Y_t})]dxdy=\int p(x)p(y)G_t(x,y)dxdy as desired.

Why is this important?

This is important, or at least interesting, because we can use it to reverse engineer the SLE. Informally, we let T\rightarrow\infty in the previous result. This states that taking a GFF in the domain left by removing the whole of the SLE curve (whatever that means) then adding \pi at points on the left of the curve, which is the limit \lim_T h_T is the same as a normal GFF on the upper half plane added to the argument function. It is reasonable to conjecture that a GFF in a non-connected domain has the same structure as taking independent GFFs in each component, and this gives an interesting invariance condition on GFFs. It can also be observed (Schramm-Sheffield) that SLE_4 arises by reversing the argument – take an appropriate conditioned GFF on H and look for the interface between it being ‘large’ and ‘small’ (Obviously this is a ludicrous simplification). This interface is then, under a suitable limit, SLE_4.

Dubins-Schwarz Theorem

In developing the stochastic integral, much of our motivation has come from considering integrals with respect to Brownian Motion. In this section, we develop some results which justify that Brownian Motion is the canonical stochastic process with non-zero quadratic variation (which is related, but not directly equivalent to the property of infinite total variation). In particular, we shall observe the Dubins-Schwarz theorem, which shows that martingales with unbounded (as time \rightarrow\infty) quadratic variation ARE Brownian Motion, up to a (stochastic) time change.

Recall Levy’s characterisation of a d-dimensional BM, which allows us to avoid considering independent normal increments. Given X^1,\ldots,X^d\in\mathcal{M}_{c,loc}:

X=(X^1,\ldots,X^d) a BM iff [X^i,X^j]_t=\delta_{ij}t

Obviously, one direction has been shown as part of the construction and properties of quadratic variation. For the other direction,, because laws are precisely defined by characteristic functions, it suffices to show that

\mathbb{E}\left[\exp(i\langle \theta,X_t-X_s\rangle)|\mathcal{F}_s\right]=\exp(-\frac12||\theta||^2(t-s))

We set Y_t:=\langle \theta,X_t\rangle, and deduce [Y]=t||\theta||^2 and Z=\mathcal{E}(iY)=\exp(iY_t+\frac12[Y]_t)\in\mathcal{M}_{c,loc}, and furthermore is bounded on compact [0,t], hence is a true martingale. So \mathbb{E}\left(\frac{Z_t}{Z_s}|\mathcal{F}_s\right)=1 which is pretty much what was required.

Now, Dubins-Schwarz states

Theorem: Given M\in\mathcal{M}_{c,loc}, M_0=0, [M]_\infty=\infty almost surely, if we set \tau_s:=\inf\{t:[M]_t>s\}, then B_s:=M_{\tau_s} is a (\mathcal{F}_{\tau_s})-BM, with M_t=B_{[M]_t}.

This final result is clear if [M]_t is almost surely strictly increasing in t: just take s=[M]_t in the definition.

We know B is cadlag: we first show B as defined is almost surely continuous. It remains to show B_{s-}=B_s\,\forall s>0\iff M_{\tau_{s-}}=M_{\tau_s}, noting that \tau_{s-}=\inf\{t\geq 0:[M]_t=s\} (by continuity) is a stopping time also.

The only interesting case is if \tau_{s-}<\tau_s, for which need to show [M] is constant. This is intuitively obvious, but formally, we must appeal to (M^2-[M])^{\tau_s} which is UI, since \mathbb{E}[M^{\tau_s}]_\infty<\infty. Now may apply OST to obtain \mathbb{E}[M_{\tau_s}^2-M_{\tau_{s-}}^2|\mathcal{F}_{\tau_{s-}}]=\mathbb{E}[(M_{\tau_s}-M_{\tau_{s-}})^2|\mathcal{F}_{\tau_{s-}}]=0 which implies M is almost surely constant on [\tau_{s-},\tau_s]. We need to lift this to the case where it holds for all s simultaneously almost surely. Note that cadlag almost surely plus almost surely continuous at each point does not implies almost surely continuous everywhere (eg consider H(U(0,1)) with H the Heaviside function and U a uniform distribution). Instead, we record intervals of constancy of both M_t,[M]_t. That is, we set

T_r=\inf\{t>r:M_t\neq M_r\},\quad S_r=\inf\{t>r:[M]_t\neq [M]_r\}

Then these are cadlag, and by above T_r=S_r\,\forall r\in\mathbb{Q}^+ a.s. therefore T_r=S_r\,\forall r almost surely. Thus M, [M] are constant on the same intervals.

We also check B is adapted to \mathcal{G}_t=\mathcal{F}_{\tau_t}. STP X_T1_{\{T<\infty\}} is \mathcal{F}_T-measurable for X cadlag adapted. Approximating T discretely from above gives the result, exploiting that the result is clear if T has countable support. Now, obtain M^{\tau_s}\in\mathcal{M}_c^2, so M_{t\wedge \tau_s} UI by Doob, so by OST, get \mathbb{E}[M_{\tau_s}|\mathcal{F}_{\tau_s}]=M_{\tau_r}, to get B a martingale. The finally:

\mathbb{E}[B_s^2-s|\mathcal{G}_r]=\mathbb{E}[(M^2-[M])_{\tau_s}|\mathcal{F}_{\tau_s}]=M_{\tau_r}^2-[M]_{\tau_r}=B_r^2-r

And so we can apply Levy’s characterisation to finish the result.