We continue directly from the end of the last post, where I was talking about how to play sequences of fair games, and whether by playing cunningly (including choosing when to stop playing cunningly) you can end up with an ‘unfair’ game overall. (Ie where you make a profit or a loss on average.) We gave two examples. First, the martingale strategy, where on a sequence of fair games you double your stake each time you lose. The result is that you win back your original stake at some point with probability one, but possibly accumulate huge temporary losses along the way. In the second game, you follow the path of a simple random walk from zero until it hits one, and then cash in. Here we observe that the time until this happens is almost surely finite, but has infinite expectation.
There’s another possible problem. It seems ridiculous, but suppose we could look into the future. Then our strategy for the random walk might be something like: check in advance what will happen in the first ten steps, and stop playing whenever we get to the moment which we know is the maximum value the walk will attain. Well then, sometimes the walk will never go above zero, in which case we will stop playing at the very start, and sometimes the walk will go above zero, in which case we make a positive amount. So overall, our mean return must be positive. Obviously if we have the option to adjust our stakes, this is completely ridiculous, because we would bet high (infinitely high?) if we knew we were about to win, and nothing if we were about to lose. So, obvious though it seems, we should emphasise that we mustn’t be allowed to look into the future!
The optional stopping theorem says that looking into the future, and these two problems already mentioned are essentially all that can go wrong. To say anything more interesting, at this point we really do need a little bit of notation.
In general, a sequence of fair games of this kind is called a martingale. The origin of the word is fairly unclear – see this unexpectedly comprehensive article. The martingale will be something like , representing the wealth (or whatever) at discrete time-steps. The key property is the fair game property, which says that whatever has happened up to time k, the next game is fair. That is:
Note that in any of the situations we are describing, X should describe our wealth, rather than the underlying process. In the random walk example, these are the same, but in the martingale strategy suggestion, should be our wealth after the kth game, rather than anything directly recording the outcomes of the coin tosses.
If we allow to be random (and of course, being always equal to zero is a special case of being random…) we can then add up an initial sequence of such equations to obtain
That is, if we play any sequence of fair games a fixed finite number of times, we have overall a fair game. (In the original strategy, we have a martingale, irrespective of the precise rule we use to choose how much we invest on each coin toss.) But what if we stop the process at a time determined by the current knowledge? (ie without looking into the future.)
Let’s call an example of such a random time T, and this property that we aren’t allowed to look into the future is described technically as the stopping time property. A proper setup would use more notation and fewer words at (*), but even without sigma-algebras, we can say that T is a stopping time if deciding whether T=k depends only on , and not on later actions.
To show the optional stopping theorem, the key idea is that if you want to stop at time T, one option is to keep playing beyond time T with zero stakes. Thus we have a fair game at all times, even after T. We write this as , where means ‘minimum’, so that if k>T, the process stays constant.
Since is a martingale, we can invoke (**),
Now what happens if we take k to be very large? How well does this truncated average approximate itself?
This is where we want to return to our assumptions about what might make this go wrong. Let’s say that T has finite expectation, and that there is some absolute bound on how large an increment can be, say C. Then, whenever , we have . And when T>k, we have
where we take the final expectation only across T-k when this quantity is positive, since this is the only case which contributes to the left hand side.
Now we need to show that by choosing k large enough, we can make the RHS very small. Obviously we don’t have a chance of doing this if C is not finite! With a bit of thought, we can see that , and so we also don’t have a chance of doing this if . But if , then , and so
and so certainly
But (***) holds for all values of k, and so the only consistent option is that
There are a couple more combinations of conditions (mostly involving relaxing one of these slightly, and substantially strengthening the other) which also work, but this seems like the more natural form. For a full formal statement, there are many resources available, and the Wikipedia page, for example, isn’t too bad. In the mists of history, I wrote about some of these topics more formally, but maybe less helpfully, since I’d known the theory myself for about a week.