Skorohod embedding

Background

Suppose we are given a standard Brownian motion (B_t), and a stopping time T. Then, so long as T satisfies one of the regularity conditions under which the Optional Stopping Theorem applies, we know that \mathbb{E}[B_T]=0. (See here for a less formal introduction to OST.) Furthermore, since B_t^2-t is a martingale, \mathbb{E}[B_T^2]=\mathbb{E}[T], so if the latter is finite, so is the former.

Now, using the strong Markov property of Brownian motion, we can come up with a sequence of stopping times 0=T_0, T_1, T_2,\ldots such that the increments T_k-T_{k-1} are IID with the same distribution as T. Then 0,B_{T_1},B_{T_2},\ldots is a centered random walk. By taking T to be the hitting time of \{-1,+1\}, it is easy to see that we can embed simple random walk in a Brownian motion using this approach.

Figure: Embedding simple random walk in Brownian motion.
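As a rough illustration of this embedding, here is a minimal sketch using only the standard library. It assumes a stand-in for Brownian motion: a fine lattice walk taking steps of size 1/m every 1/m^2 units of time (a Donsker-type approximation, not an exact simulation). The names embed_simple_walk and m are illustrative choices, not anything from the text above.

```python
import random

def embed_simple_walk(n_steps, m=8, seed=0):
    """Embed n_steps of simple random walk in a fine lattice walk.

    The fine walk moves +-1/m every 1/m**2 units of time and stands in
    for Brownian motion (an approximation, not an exact simulation).
    Returns the embedded values B_{T_k} and the stopping times T_k.
    """
    rng = random.Random(seed)
    position, ticks = 0, 0      # position in units of 1/m, time in units of 1/m**2
    values, times = [0], [0.0]
    for _ in range(n_steps):
        start = position
        # Run until the fine walk is at distance 1 (m lattice steps) from `start`.
        while abs(position - start) < m:
            position += 1 if rng.random() < 0.5 else -1
            ticks += 1
        values.append(position // m)
        times.append(ticks / m**2)
    return values, times

values, times = embed_simple_walk(2000)
increments = [b - a for a, b in zip(values, values[1:])]
print(set(increments), times[-1] / 2000)
```

The embedded increments are all +1 or -1, and the average gap between consecutive stopping times should be close to 1, matching \mathbb{E}[T]=\mathbb{E}[B_T^2]=1.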

The Skorohod embedding question asks: can all centered random walks be constructed in this fashion, by stopping Brownian motion at a sequence of stopping times? With the strong Markov property, this immediately reduces to the question of whether all centered finite-variance distributions X can be expressed as B_T for some integrable stopping time T.

The answer to this question is yes, and much of what follows is drawn from, or at least prompted by, Obloj’s survey paper, which details the problem and the rich history of the many approaches to its solution over the past seventy years.

Applications and related things

The relationship between random walks and Brownian motion is a rich one. Donsker’s invariance principle asserts that Brownian motion appears as the scaling limit of a random walk. Indeed, one can construct Brownian motion itself as the limit of a sequence of consistent random walks with normal increments on an increasingly dense set of times. Furthermore, random walks are martingales, and we know that continuous, local martingales can be expressed as a (stochastically) time-changed Brownian motion, from the Dubins-Schwarz theorem.

The Skorohod embedding theorem can be used to prove results about random walks with general distribution by proving the corresponding result for Brownian motion, and checking that the construction of the sequence of stopping times has the right properties to allow the result to be carried back to the original setting. It obviously also gives a coupling between an individual random walk and a Brownian motion which may be useful in some contexts, as well as a coupling between any pair of random walks. This is useful in proving results for random walks which are much easier for special cases of the distribution, for example when the increments are Gaussian, or when there are combinatorial approaches to a problem about simple random walk. At the moment no aspect of this blog schedule is guaranteed, but I plan to talk about the law of the iterated logarithm shortly, whose proof is approachable in both of these settings, as well as for Brownian motion, and Skorohod embedding provides the route to the general proof.

At the end, we will briefly compare some other ways to couple a random walk and a Brownian motion.

Adding extra randomness

One thing we could do is sample a copy of X independently from the Brownian motion, then declare T= \tau_{X}:= \inf\{t\ge 0: B_t=X\}, the hitting time of (random value) X. But recall that unfortunately \tau_x has infinite expectation for all non-zero x, so this doesn’t fit the conditions required to use OST.

Skorohod’s original method is described in Section 3.1 of Obloj’s notes linked above. The method is roughly to pair up positive values taken by X appropriately with negative values taken by X in a clever way. If we have a positive value b and a negative value a, then \tau_{a,b}, the first hitting time of \mathbb{R}\backslash (a,b) is integrable. Then we choose one of these positive-negative pairs according to the projection of the distribution of X onto the pairings, and let T be the hitting time of this pair of values. The probability of hitting b conditional on hitting {a,b} is easy to compute (it’s \frac{-a}{b-a}) so we need to have chosen our pairs so that the ‘probability’ of hitting b (ie the density) comes out right. In particular, this method has to start from continuous distributions X, and treat atoms in the distribution of X separately.

The case where the distribution X is symmetric (that is X\stackrel{d}=-X) is particularly clear, as then the pairs should be (-x,x).
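To make the symmetric case concrete, here is a small sketch of the randomised construction. It assumes X takes integer values, and uses a simple random walk on the integers as a stand-in for Brownian motion (its exit probabilities from an interval with integer endpoints agree with the Brownian ones). The function name and the example law are illustrative.

```python
import random
from collections import Counter

def randomized_symmetric_embedding(abs_law, rng):
    """One sample of B_T for symmetric X, using the pairs (-x, x).

    abs_law maps each positive integer x to P(|X| = x). A simple random
    walk on the integers stands in for Brownian motion here.
    """
    levels = list(abs_law)
    # The extra randomness: choose the pair (-x, x) independently of the walk.
    x = rng.choices(levels, weights=[abs_law[v] for v in levels])[0]
    position = 0
    while abs(position) < x:
        position += 1 if rng.random() < 0.5 else -1
    return position

rng = random.Random(1)
abs_law = {1: 0.5, 3: 0.5}      # X uniform on {-3, -1, 1, 3}
n_samples = 8000
samples = Counter(randomized_symmetric_embedding(abs_law, rng) for _ in range(n_samples))
frequencies = {v: samples[v] / n_samples for v in sorted(samples)}
print(frequencies)
```

Each of the four values should appear with empirical frequency close to 1/4.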

However, it feels like there is enough randomness in Brownian motion already, and subsequent authors showed that indeed it wasn’t necessary to introduce extra randomness to provide a solution.

One might ask whether it’s possible to generate the distribution on the set of pairs (as above) out of the Brownian motion itself, but independently from all the hitting times. It feels like it might be possible to make the distribution on the pairs measurable with respect to

\mathcal{F}_{0+} = \bigcap\limits_{t>0} \mathcal{F}_t,

the sigma-algebra of events determined by limiting behaviour as t\rightarrow 0 (which is independent of hitting times). But of course, unfortunately \mathcal{F}_{0+} has a zero-one law, so it’s not possible to embed non-trivial distributions there.

Dubins’ solution

The exemplar for solutions without extra randomness is due to Dubins, shortly after Skorohod’s original argument. The idea is to express the distribution X as the almost sure limit of a martingale. We first use the hitting time of a pair of points to ‘decide’ whether we will end up positive or negative, and then given this information look at the hitting time (after this first time) of two subsequent points to ‘decide’ which of four regions of the real interval we end up in.

I’m going to use different notation to Obloj, corresponding more closely with how I ended up thinking about this method. We let

a_+:= \mathbb{E}[X \,|\, X>0], \quad a_- := \mathbb{E}[X\,|\, X<0], (*)

and take T_1 = \tau_{\{a_-,a_+\}}. We need to check that

\mathbb{P}\left( B_{T_1}=a_+\right) = \mathbb{P}\left(X>0\right),

for this to have a chance of working. But we know that

\mathbb{P}\left( B_{T_1}=a_+\right) = \frac{-a_-}{a_+-a_-},

and we can also attack the other side using (*) and the fact that \mathbb{E}[X]=0, using the law of total expectation:

0=\mathbb{E}[X]=\mathbb{E}[X\,|\, X>0] \mathbb{P}(X>0) + \mathbb{E}[X\,|\,X<0]\mathbb{P}(X<0) = a_+ \mathbb{P}(X>0) + a_- \left(1-\mathbb{P}(X>0) \right),

\Rightarrow\quad \mathbb{P}(X>0)=\frac{-a_-}{a_+-a_-}.
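As a sanity check on this first step, the following sketch computes a_-, a_+ and the exit probability exactly for a small discrete law. It assumes a finitely supported centered law with no atom at 0, and uses the standard fact that Brownian motion started from 0 exits (a,b) at b with probability -a/(b-a). The function name first_step and the example law are illustrative.

```python
from fractions import Fraction

def first_step(law):
    """Return (a_minus, a_plus, P(hit a_plus first), P(X > 0)).

    law maps values to probabilities; it is assumed to be centered
    and to have no atom at 0.
    """
    p_pos = sum(p for x, p in law.items() if x > 0)
    p_neg = sum(p for x, p in law.items() if x < 0)
    a_plus = sum(x * p for x, p in law.items() if x > 0) / p_pos
    a_minus = sum(x * p for x, p in law.items() if x < 0) / p_neg
    # Exit probability at the upper end of (a_minus, a_plus), started from 0.
    hit_plus = -a_minus / (a_plus - a_minus)
    return a_minus, a_plus, hit_plus, p_pos

law = {-2: Fraction(1, 5), -1: Fraction(2, 5), 2: Fraction(2, 5)}
a_minus, a_plus, hit_plus, p_pos = first_step(law)
print(a_minus, a_plus, hit_plus, p_pos)
```

For this law the last two returned values coincide, as the calculation requires.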

Now we define

a_{++}=\mathbb{E}[X \,|\, X>a_+],\quad a_{+-}=\mathbb{E}[X\,|\, 0<X<a_+],

and similarly a_{-+},a_{--}. So then, conditional on B_{T_1}=a_+, we take

T_2:= \inf\left\{ t\ge T_1 \,:\, B_t\not\in (a_{+-},a_{++})  \right\},

and similarly conditional on B_{T_1}=a_-. By an identical argument to the one we have just deployed, we have \mathbb{E}\left[B_{T_2} \,|\,\mathcal{F}_{T_1} \right] = B_{T_1} almost surely. So, although the a_{+-+} notation now starts to get very unwieldy, it’s clear we can keep going in this way to get a sequence of stopping times 0=T_0,T_1,T_2,\ldots where B_{T_n} determines which of the 2^n regions of the real line any limit \lim_{m\rightarrow\infty} B_{T_m} should lie in.

A bit of work is required to check that the almost sure limit T_n\rightarrow T is almost surely finite, but once we have this, it is clear that B_{T_n}\rightarrow B_T almost surely, and B_T has the distribution required.
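The iteration can be sketched exactly for a finitely supported law. The code below computes the law of B_{T_n} level by level, splitting each region at its conditional mean. One assumption to flag: atoms sitting exactly at a conditional mean are sent to the upper region, a convention chosen here for definiteness rather than taken from the text. The names dubins_levels and cells are illustrative.

```python
from fractions import Fraction

def dubins_levels(law, depth):
    """Laws of B_{T_0}, ..., B_{T_depth} for a finitely supported centered law.

    law maps values to probabilities. Each cell is a region of the support;
    the embedded value for a cell is the conditional mean of X on it.
    """
    cells = [dict(law)]
    levels = []
    for _ in range(depth + 1):
        level, next_cells = {}, []
        for cell in cells:
            mass = sum(cell.values())
            mean = sum(x * p for x, p in cell.items()) / mass
            level[mean] = level.get(mean, 0) + mass
            lower = {x: p for x, p in cell.items() if x < mean}
            upper = {x: p for x, p in cell.items() if x >= mean}
            if lower:
                next_cells += [lower, upper]   # split at the conditional mean
            else:
                next_cells.append(cell)        # a single atom: nothing left to split
        levels.append(level)
        cells = next_cells
    return levels

law = {-3: Fraction(1, 10), -1: Fraction(2, 5), 1: Fraction(3, 10), 2: Fraction(1, 5)}
levels = dubins_levels(law, 2)
for n, level in enumerate(levels):
    print(n, sorted(level.items()))
```

Every level has mean zero, as a martingale started from 0 should, and for this four-point law the second level already reproduces the law of X.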

Komlos, Major, Tusnady coupling

We want to know how close we can make this coupling between a centered random walk with variance 1, and a standard Brownian motion. Here, ‘close’ means uniformly close in probability. For large times, the typical difference between one of the stopping times 0,T_1,T_2,\ldots,T_n in the Skorohod embedding and its expectation (recall \mathbb{E}[T_k]=k) is of order \sqrt{n}. Since Brownian motion typically moves a distance of order \sqrt{s} in time s, constructing the random walk S_0,S_1,S_2,\ldots from the Brownian motion via Skorohod embedding leads to

\left |S_k - B_k \right| = \Theta(n^{1/4}),

for most values of k\le n. Strassen (1966) shows that the true scale of the maximum

\max_{k\le n} \left| S_k - B_k \right|

is slightly larger than this, with some extra powers of \log n and \log\log n as one would expect.

The Komlos-Major-Tusnady coupling is a way to do a lot better than this, in the setting where the distribution of the increments has a finite MGF near 0. Then, there exists a coupling of the random walk and the Brownian motion such that

\max_{k\le n}\left|S_k- B_k\right| = O(\log n).

That is, there exists C such that

\left[\max_{k\le n} \left |S_k-B_k\right| - C\log n\right] \vee 0

is a tight family of distributions, indeed with uniform exponential tail. To avoid digressing infinitely far from my original plan to discuss the proof of the law of iterated logarithm for general distributions, I’ll stop here. I found it hard to find much coverage of the KMT result apart from the challenging original paper, and many versions expressed in the language of empirical processes, which are similar to random walks in many ways relevant to convergence and this coupling, but not for Skorohod embedding. So, here is a link to some slides from a talk by Chatterjee which I found helpful in getting a sense of the history, and some of the modern approaches to this type of normal approximation problem.


Mixing Times 1 – Reversing Markov Chains

A small group of us have started meeting to discuss Levin, Peres and Wilmer’s book on Markov Chains and Mixing Times. (The plan is to cover a couple of chapters every week, then discuss points of interest and some of the exercises – if anyone is reading this and fancies joining, let me know!) Anyway, this post is motivated by something we discussed in our first session.

Here are two interesting facts about Markov Chains. 1) The Markov property can be defined in terms of products of transition probabilities giving the probability of a particular initial sequence. However, a more elegant and general formulation is to say that, conditional on the present, the past and the future are independent. 2) All transition matrices on a finite state space have at least one equilibrium distribution. In fact, irreducible Markov Chains have precisely one equilibrium distribution. Then, provided the chain is also aperiodic, if we start with any distribution, the distribution of the chain at time t converges to the equilibrium distribution.

But hang on. This might be a fairly serious problem. On the one hand we have given a definition of the Markov property that is symmetric in time, in the sense that it remains true whether we are working forwards or backwards. On the other hand, the convergence to equilibrium is very much not time-symmetric: we move from disorder to order as time advances. What has gone wrong here?

We examine each of the properties in turn, then consider how to make them fit together in a non-contradictory way.

Markov Property

As many of the students in the Applied Probability course learned the hard way, there are many ways to define the Markov property depending on context, and some are much easier to work with than others. For a Markov chain, you can find a way to say that the transition probability \mathbb{P}(X_{n+1}=x_{n+1}\,|\,X_n=x_n,\ldots,X_0=x_0) is independent of x_0,\ldots,x_{n-1}. Alternatively, you can use this to give an inductive specification for the probability of the first n values of X being some sequence.
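The inductive specification can be written out in a few lines. This is only a sketch: the two-state transition matrix and initial distribution are made-up illustrative values, and path_probability is a hypothetical helper name.

```python
from fractions import Fraction

def path_probability(initial, P, path):
    """P(X_0 = path[0], ..., X_n = path[n]) for a chain with transition matrix P.

    Built up inductively: the initial probability, then one transition
    probability for each consecutive pair of states.
    """
    prob = initial[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= P[x][y]
    return prob

P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
initial = [Fraction(1, 3), Fraction(2, 3)]
example = path_probability(initial, P, [0, 1, 1, 0])
print(example)
```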

It requires a moment’s checking to see that the earlier definition of past/future independence is consistent with this. Let’s first check that we haven’t messed up a definition somewhere, and that the time-reversal of a general Markov chain does have the Markov property, even as defined in the context of a Markov chain.

For clarity, consider X_0,X_1,\ldots, X_N a Markov chain on some finite state space, with N some fixed finite end time. We aren’t losing anything by reversing over a finite time interval – after all, we need to know how to do it over a finite time interval before it could possibly make sense to do it over (-\infty,\infty). We examine (Y_n)_{n=0}^N defined by Y_n:= X_{N-n}.

\mathbb{P}(X_n=x_n|X_{n+1}=x_{n+1},\ldots,X_N=x_N)=\mathbb{P}(X_n=x_n|X_{n+1}=x_{n+1})

is the statement of the Markov property for (Y_n). We rearrange the left hand side to obtain:

=\frac{\mathbb{P}(X_n=x_n,X_{n+1}=x_{n+1},\ldots,X_N=x_N)}{\mathbb{P}(X_{n+1}=x_{n+1},\ldots,X_N=x_N)}

=\frac{\mathbb{P}(X_N=x_N|X_n=x_n,\ldots,X_{N-1}=x_{N-1})\mathbb{P}(X_n=x_n,\ldots,X_{N-1}=x_{N-1})}{\mathbb{P}(X_N=x_N|X_{n+1}=x_{n+1},\ldots,X_{N-1}=x_{N-1})\mathbb{P}(X_{n+1}=x_{n+1},\ldots,X_{N-1}=x_{N-1})}.

Now, by the standard Markov property on the original chain (X_n), the first probability in each of the numerator and denominator are equal. This leaves us with exactly the same form of expression as before, but with one fewer term in the probability. So we can iterate until we end up with

\frac{\mathbb{P}(X_n=x_n,X_{n+1}=x_{n+1})}{\mathbb{P}(X_{n+1}=x_{n+1})}=\mathbb{P}(X_n=x_n|X_{n+1}=x_{n+1}),

as required.

So there’s nothing wrong with the definition. The reversed chain Y genuinely does have this property, regardless of the initial distribution of X.

In particular, if our original Markov chain starts at a particular state with probability 1, and we run it up to time N, then saying that the time-reversal is a Markov chain too is making a claim that we have a non-trivial chain that converges from some general distribution at time 0 to a distribution concentrated at a single point by time N. This seems to contradict everything we know about these chains.

Convergence to Equilibrium – Markov Property vs Markov Chains

It took us a while to come up with a reasonable explanation for this apparent discrepancy. In the end, we came to the conclusion that Markov chains are a strict subset of stochastic processes with the Markov property.

The key thing to notice is that a Markov chain has even more regularity than the definition above implies. The usual description via a transition matrix says that the probability of moving to state y at time t+1 given that you are at state x at time t is some function of x and y. The Markov property says that this probability is independent of the behaviour up until time t. But we also have that the probability is independent of t. The transition matrix P has no dependence on time t – for example in a random walk we do not have to specify the time to know what happens next. This is the property that fails for the non-stationary time-reversal.

In the most extreme example, we say X_0=x_0 with probability 1. So in the time reversal, \mathbb{P}(Y_N=x_0|Y_{N-1}=y_{N-1})=1 for all y_{N-1}. But it will obviously not be the case in general that \mathbb{P}(Y_n=x_0|Y_{n-1}=y_{n-1})=1 for all y_{n-1}, as this would mean the chain Y would be absorbed after one step at state x_0, which is obviously not how the reversal of X should behave.
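This can be checked by hand on a two-state example. The sketch below computes the one-step kernel of the reversed process via Bayes' rule, \mathbb{P}(X_n=x|X_{n+1}=y)=\mu_n(x)P(x,y)/\mu_{n+1}(y), where \mu_n is the law of X_n. The transition matrix is an arbitrary illustrative choice, and the function names are hypothetical.

```python
from fractions import Fraction

def marginals(P, mu0, N):
    """Laws of X_0, ..., X_N for the chain with transition matrix P started from mu0."""
    dists = [list(mu0)]
    size = len(P)
    for _ in range(N):
        prev = dists[-1]
        dists.append([sum(prev[i] * P[i][j] for i in range(size)) for j in range(size)])
    return dists

def reversed_kernel(P, mu0, N, n):
    """Matrix K with K[y][x] = P(X_n = x | X_{n+1} = y), assuming P(X_{n+1} = y) > 0."""
    mu = marginals(P, mu0, N)
    size = len(P)
    return [[mu[n][x] * P[x][y] / mu[n + 1][y] for x in range(size)] for y in range(size)]

P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
point_mass = [Fraction(1), Fraction(0)]
stationary = [Fraction(1, 3), Fraction(2, 3)]

from_point_0 = reversed_kernel(P, point_mass, 3, 0)
from_point_1 = reversed_kernel(P, point_mass, 3, 1)
from_stat_0 = reversed_kernel(P, stationary, 3, 0)
from_stat_2 = reversed_kernel(P, stationary, 3, 2)
print(from_point_0, from_point_1)
print(from_stat_0, from_stat_2)
```

Started from a point mass, the reversed kernel changes with n, so the reversal has the Markov property but no single transition matrix. Started from the stationary distribution, the kernel is the same at every time.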

Perhaps the best way to reconcile this difference is to consider this example where you definitely start from x_0. Then, a Markov chain in general can be thought of as a measure on paths, that is \Omega^N, with non-trivial but regular correlations between adjacent components. (In the case of stationarity, all the marginals are equal to the stationary distribution, a good example of identically distributed but not independent RVs.) This is indexed by the transition matrix and the initial distribution. If the initial distribution is a single point mass, then this can be viewed as a restriction to a smaller set of possible paths, with measures rescaled appropriately.

What have we learned?

Well, mainly to be careful about assuming extra structure with the Markov property. Markov Chains are nice because there is a transition matrix which is constant in time. Other processes, such as Brownian motion, are space-homogeneous, in that the transitions, or increments in this context, are independent of time and space. However, neither of these properties is true for a general process with the Markov property. Indeed, we have seen in a post from a long time ago that there are Markov processes which do not have the Strong Markov Property, which seems unthinkable if we limit our attention to chain-like processes.

Most importantly, we have clarified the essential point that reversing a Markov Chain only makes sense in equilibrium. It is perfectly possible to define the reversal of a chain not started at a stationary distribution, but lots of unwelcome information from the forward chain ends up in the reversed chain. In particular, the theory of Markov Chains is not broken, which is good.

Supremum of Brownian Motion

We define the supremum process of Brownian Motion by:

S_t:=\sup_{0\leq s\leq t}B_s.

Here are two facts about Brownian Motion. Firstly, the Reflection Principle:

\mathbb{P}(S_t\geq b,B_t\leq a)=\mathbb{P}(B_t\geq 2b-a),

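which we motivate by ‘stopping’ at the time the supremum S_t is attained and reflecting, even though this isn’t a stopping time. (The rigorous argument applies the SMP for Brownian Motion at the first hitting time of b, which is a stopping time.) By setting a=b, we get: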

\mathbb{P}(S_t\geq b)=\mathbb{P}(S_t\geq b,B_t\leq b)+\mathbb{P}(B_t\geq b)=2\mathbb{P}(B_t\geq b)=\mathbb{P}(|B_t|\geq b),

and conclude that

S_t\stackrel{d}{=}|B_t|\quad\text{for each }t\geq 0.

The second fact comes from the decomposition of BM into local times and excursions:

(S_t,S_t-B_t)_{t\geq 0}\stackrel{d}{=}(L_t,|B_t|)_{t\geq 0},

where L is the local time process at 0, and this equality in distribution holds for the processes. See the previous post on excursion theory for explanation of what local times mean.

In particular, combining these two facts gives:

S_t\stackrel{d}{=}S_t-B_t\quad\text{for every }t\geq 0.

I thought that was rather surprising, and wanted to think of a straightforward reason why this should be true. I think the following works:

Brownian motion is time-reversible. In particular, as processes, we have

(B_s)_{0\leq s\leq t}\stackrel{d}{=}(B_{t-s}-B_t)_{0\leq s\leq t}

\Rightarrow \sup_{0\leq r\leq t}B_r\stackrel{d}{=}\sup_{0\leq r\leq t}(B_{t-r}-B_t)

\Rightarrow S_t\stackrel{d}{=}S_t-B_t.
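The same time-reversal argument works for simple random walk, where it can be verified by brute force. The sketch below is a discrete analogue, not a statement about Brownian motion itself: it enumerates all 2^n paths of length n and compares the law of the running maximum with the law of the maximum minus the endpoint. The function name is illustrative.

```python
from collections import Counter
from itertools import product

def max_and_gap_laws(n):
    """Counts, over all 2**n simple random walk paths, of M_n and of M_n - S_n.

    M_n is the running maximum (including time 0) and S_n the final position.
    """
    max_law, gap_law = Counter(), Counter()
    for steps in product((-1, 1), repeat=n):
        position, running_max = 0, 0
        for step in steps:
            position += step
            running_max = max(running_max, position)
        max_law[running_max] += 1
        gap_law[running_max - position] += 1
    return max_law, gap_law

max_law, gap_law = max_and_gap_laws(10)
print(sorted(max_law.items()))
print(sorted(gap_law.items()))
```

The two lines printed are identical, which is the random walk version of S_t\stackrel{d}{=}S_t-B_t.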

Strong Markov Property for BM

The Strong Markov Property is the most important result to demonstrate for any Markov process, such as Brownian Motion. It is also probably the most widely requested item of bookwork on the Part III Advanced Probability exam. I feel it is therefore worth practising writing as quickly as possible.

Theorem (SMP): Take (B_t) a standard (\mathcal{F}_t)-BM, and T an a.s. finite stopping time. Then (B_{T+t}-B_T,t\geq 0) is a standard BM independent of \mathcal{F}_T.

Proof: We write B_t^{(T)}=B_{T+t}-B_T for ease of notation. We will show that for any A\in\mathcal{F}_T and F bounded, continuous:

\mathbb{E}[1_AF(B_{T+t_1}-B_T,\ldots,B_{T+t_n}-B_T)]=\mathbb{P}(A)\mathbb{E}F(B_{t_1},\ldots,B_{t_n})

This will suffice to establish independence, and taking A=\Omega\in\mathcal{F}_T shows that B^{(T)} is a standard BM since (Levy), BM is uniquely characterised by its finite-dimensional distributions, together with continuity of paths.

To prove the result, we approximate discretely, and apply the Markov property.

\mathbb{E}[1_AF(B_{t_1}^{(T)},\ldots)]=\lim_{m\rightarrow\infty}\sum_{k=1}^\infty \mathbb{E}[1_{A\cap\{T\in((k-1)2^{-m},k2^{-m}]\}}F(B_{t_1}^{(k2^{-m})},\ldots)]

by bounded convergence, using continuity of F, right-continuity of B, and that T<\infty a.s. (so that 1_A=\sum 1_{A\cap \{T\in(-,-]\}})

\stackrel{\text{WMP}}{=}\lim_{m\rightarrow\infty}\sum_{k=1}^\infty \mathbb{P}[A\cap\{T\in((k-1)2^{-m},k2^{-m}]\}]\mathbb{E}F(B_{t_1},\ldots,B_{t_n})

\stackrel{\text{DOM}}{=}\mathbb{P}(A)\mathbb{E}F(B_{t_1},\ldots,B_{t_n})

which is exactly what we required.
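The discrete approximation used in the proof replaces T by the right endpoint of the dyadic interval ((k-1)2^{-m},k2^{-m}] containing it. A minimal sketch of that rounding, with dyadic_upper as an illustrative name and an arbitrary sample value standing in for T:

```python
import math

def dyadic_upper(t, m):
    """Smallest multiple of 2**-m that is >= t, i.e. k * 2**-m when t lies in ((k-1) * 2**-m, k * 2**-m]."""
    return math.ceil(t * 2**m) / 2**m

t = 0.3
approximations = [dyadic_upper(t, m) for m in range(1, 11)]
print(approximations)
```

The approximations are never below t, are within 2^{-m} of it, and decrease towards it as m grows, which is what lets right-continuity of B pass to the limit.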

Remarks: 1) We only used right-continuity of the process, and characterisation by joint marginals, so the proof works equally well for Levy processes.

2) We can in fact show that it is independent of \mathcal{F}_T^+, by considering T+\frac{1}{n} which is still a stopping time, then taking a limit in this as well in the above proof. For details of a similar result, see my post on Blumenthal’s 0-1 Law.

Feller Processes and the Strong Markov Property

Markov Property

We go way back to the Part 1B short course Markov Chains. In the first lecture of this course, we met discrete time Markov Chains. A definition was given, in terms of conditional single period transition probabilities, and it was immediately proved that general transition probabilities are specified by sums of products of entries in a so-called transition matrix. This proves very useful for performing calculations. But the question will inevitably be asked: “Prove that this is a Markov process.” And the answer “because it obviously is” isn’t good enough.

The point is that all of the above is relevant to the setting, but the ideal definition of a Markov process is something like the very general statement:

Conditional on the present, the past and the future are independent.

This opens up the possibility of a large class of processes being Markov processes. A technical definition would be that for s<t and A a measurable subset of the state space:

\mathbb{P}(X_t\in A|\mathcal{F}_s)=\mathbb{P}(X_t\in A|\sigma(X_s)).

It is easy to check that this is equivalent to the original definition in that context.

Strong Markov Property

SMP states that given a stopping time T, conditional on the event \{T<\infty\}:

(X_{T+t}-X_T, t\geq 0)\stackrel{d}{=}(X_t^0,t\geq 0),

that is, the process started at time T has the same distribution as the original process started from 0 (in space as well as time). Furthermore, it is independent of \mathcal{F}_T, which requires technical definition, but which informally is the sigma field of events defined by what happens up to time T.

For a discrete time Markov chain, prove SMP by pre-multiplying by the indicator function 1(T=n), which reduces SMP to the normal Markov property. Then take the countable sum over n (which is permissible) to get SMP. For Brownian Motion in one dimension, make dyadic approximations to the stopping time from above. SMP applies to these approximations, and measure theoretic machinery and the (almost sure) continuity of paths allows the equivalence of distributions to hold in the limit. Independence follows by expressing \mathcal{F}_T=\cap \mathcal{F}_{T_n} as the intersection of sigma fields corresponding to the approximations.

In both of these cases, an implicit countable substructure (discrete time and continuity respectively) has been required to deduce SMP. This suggests that there are Markov processes which do not have SMP.

Motivating Counterexample

Take B to be a Brownian Motion in one dimension, with B_0 a RV which contains 0 in its support. Now define the process:

X_t=B_t1_{B_0\neq 0}.

Then X is certainly not Strong Markov, by considering the hitting time of 0. Then the process started there is either identically 0, or a standard BM, but which is determined by time 0 properties rather than time T properties.

But X is Markov. Take s<t and A Borel. Then:

\mathbb{P}(X_t\in A|\mathcal{F}_s)=\mathbb{E}[1(X_t\in A, X_0\neq 0)+1(X_t\in A, X_0=0)|\mathcal{F}_s]

=1(X_0\neq 0)\int_A \frac{1}{\sqrt{2\pi(t-s)}}\exp\left(-\frac{(X_s-y)^2}{2(t-s)}\right)dy+1(X_0=0)1(0\in A)

=1(X_s\neq 0)\int_A(\ldots)dy + 1(X_s=0,X_0\neq 0)\int_A(\ldots)dy + 1(0\in A)[1(X_s=0)-1(X_s=0,X_0\neq 0)]

Now 1(X_s=0, X_0\neq 0)=0 a.s. so

= \mathbb{E}[1(X_t\in A)|X_s], which is precisely the Markov property.

Feller Property

In general, it is hard to verify the Strong Markov Property for a given stochastic process. Instead, we consider a property which is stronger, but easier to check.

Remarkable fact about Brownian Motion #4: The Dirichlet Problem

So this property of Brownian Motion is so elegant, in my opinion, that when I was recently asked what my ‘favourite theorem’ was, I suggested this. With this result, we can use this probabilistic structure to specify solutions to an important PDE, with boundary conditions, over a large class of domains.

Given a domain D, Laplace’s equation is: \Delta u=0 on D, and u=f on the boundary dD, where f is any continuous function defined there. This PDE arises wherever the notion of potentials is defined, for example electromagnetism, fluids and thermodynamics.

Theorem: Given suitable regularity conditions on D to be discussed later, Laplace’s equation has a unique solution, given by:

u(x)=\mathbb{E}_x[f(B_{T_D})]

Notation: First, what does this mean? Define T_D:=\inf\{t:B_t\not\in D\}, to be the time at which a Brownian Motion leaves the domain D. This is a stopping time, and so will be suitable for application of the Strong Markov Property. \mathbb{E}_x means that we are taking expectation with respect to a BM started at x. So informally, we are defining u(x) as: start a BM at x; see where it hits the boundary of D; record the value of f at that point. Then set u(x) to be the expected value of this process.
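The informal recipe translates directly into a Monte Carlo estimate. The sketch below makes a simplifying assumption: it uses a simple random walk on a square grid in place of Brownian motion, so it solves the discrete analogue of the problem. The boundary function x^2-y^2 is chosen because it is harmonic (for the lattice walk too), so the answer is known in advance. All names are illustrative.

```python
import random

def dirichlet_estimate(start, boundary_value, size, n_walks, seed=0):
    """Estimate u(start) = E[f(W_T)] for a lattice walk W on {0, ..., size}^2.

    T is the first time W reaches the edge of the square. The lattice walk
    is a stand-in for Brownian motion, so this is the discrete problem.
    """
    rng = random.Random(seed)
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))
    total = 0.0
    for _ in range(n_walks):
        x, y = start
        while 0 < x < size and 0 < y < size:
            dx, dy = moves[rng.randrange(4)]
            x, y = x + dx, y + dy
        total += boundary_value(x, y)   # record f where the walk exits
    return total / n_walks

def f(x, y):
    return x * x - y * y                # harmonic, so u should agree with f inside

estimate = dirichlet_estimate((3, 4), f, size=10, n_walks=20000)
print(estimate, f(3, 4))
```

The estimate should land near f(3,4)=-7, up to Monte Carlo error.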

Existence: First, we are going to check that the solution conjectured is a solution. We will need a lemma:

Lemma: A locally-bounded function u satisfies \Delta u=0 on a domain D if and only if it has the property that for every closed ball \overline{B(x,r)}\subset D we have:

u(x)=\frac{1}{\sigma_{x,r}(S(x,r))}\int_{S(x,r)}u(z)d\sigma_{x,r}(z)

where \sigma_{x,r} is the surface area measure on the boundary S(x,r) of the ball radius r centred on x. Essentially, this says that u(x) is equal to the average value of u on a sphere around x.

Proof of Theorem: First, existence. Set u as specified in the statement of the theorem. Given a Brownian Motion started at x, we have stopping times T_r<T_D corresponding to the hitting times of the sphere S(x,r) of radius r around x and the boundary dD. The domination condition holds by continuity provided B(x,r) is contained within D. So we may apply the Strong Markov Property:

\mathbb{E}_x[f(B_{T_D})]=\mathbb{E}_x[\mathbb{E}_x[f(B_{T_D})|\mathcal{F}_{T_r}]]=\mathbb{E}_x[\mathbb{E}_{B_{T_r}}[f(B_{T_D})]]

By definition, the left hand expression is u(x). But also, because the distribution of B_{T_r} is uniform on S(x,r), the right hand side is equal to:

\frac{1}{\sigma_{x,r}(S(x,r))}\int_{S(x,r)}u(z)d\sigma_{x,r}(z)

and so by the lemma, this guarantees that the function u is harmonic on the interior of D.

The lemma can also be used to show uniqueness.