Parking on a ring, linear hashing

I’ve spent most of my doctorate trying to analyse how adding destructive dynamics affects the behaviour of a particular random growth process, the classical random graph. In this post I’m going to talk about another random growth process, which is slightly less natural, but for which one can show some similar qualitative properties.

The model, and the additive coalescent

Consider m places arranged in a circle, and for consistency of analogy we think of these as parking spaces. Some number n of cars will arrive one at a time. Each car will arrive at a space chosen uniformly at random. If it is empty they will park in it, otherwise they will look clockwise until they find an empty space, and park there. For now we are only interested in growth, so we assume cars never leave. We are interested in the sizes of blocks of consecutively parked cars.
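
Before any analysis, it’s easy to watch this happen. Here’s a minimal simulation sketch in Python (the function names are my own, purely illustrative):

```python
import random

def park(m, n, rng=random):
    """n cars arrive at uniform spaces on a ring of m spaces; each car
    probes clockwise from its arrival space until it finds an empty one."""
    occupied = [False] * m
    for _ in range(n):
        k = rng.randrange(m)
        while occupied[k]:
            k = (k + 1) % m
        occupied[k] = True
    return occupied

def block_sizes(occupied):
    """Sizes of the maximal blocks of consecutively parked cars (cyclic)."""
    m = len(occupied)
    if all(occupied):
        return [m]
    start = occupied.index(False)          # rotate to begin at a gap
    rotated = occupied[start:] + occupied[:start]
    sizes, run = [], 0
    for filled in rotated:
        if filled:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes

print(sorted(block_sizes(park(1000, 900)), reverse=True)[:5])
```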

The reason to consider this slightly unnatural statement is its equivalence to the problem of hashing with linear probing, apparently a key topic in computer science, which I won’t pretend that I know anything about. In any case, it’s a nice model, and it seems reasonable that it would have a basis in more realistic search algorithms.

So, how does the sequence of sizes of blocks of consecutively parked cars grow? Well, given the sequence of block sizes, it is reasonably easy to convince yourself that the order of the blocks around the circle is uniformly random, and the number of empty spaces between adjacent blocks is also uniformly random.

Assume for now that there are at least three blocks. A block of size x can merge with a block of size y with the arrival of the next car only if the blocks are adjacent, with exactly one empty space between them. The chance of this is uniform among all pairs of blocks. Now suppose this is the case, and that the block of size y lies clockwise from the block of size x. Then they will merge precisely if the next car arrives at any of the x occupied spaces in that block, or at the empty space between the pair of blocks. This has probability \frac{x+1}{m}. There’s also the opposite ordering to consider, where the block of size x lies clockwise from the other. The total probability of this merge \{x,y\}\mapsto \{x+y+1\} is therefore proportional to (x+y+2).
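
This little calculation is easy to sanity-check by Monte Carlo. The sketch below fixes one ordering (a block of size x, then a single empty space, then a block of size y clockwise of it) and drops a single car; under the derivation above, the gap should be filled with probability \frac{x+1}{m}:

```python
import random

def merge_prob(x, y, m, trials=100_000, rng=random):
    """Empirical probability that a single arriving car fills the one
    empty space between a block of size x and a block of size y
    sitting clockwise of it."""
    occ = [False] * m
    for i in range(x + y + 1):
        occ[i] = (i != x)        # blocks at 0..x-1 and x+1..x+y, gap at x
    hits = 0
    for _ in range(trials):
        k = rng.randrange(m)     # uniform arrival space
        while occ[k]:            # clockwise probing
            k = (k + 1) % m
        hits += (k == x)         # the gap is filled: the two blocks merge
    return hits / trials

x, y, m = 3, 5, 50
print(merge_prob(x, y, m), (x + 1) / m)   # both should be ~0.08; the
# reversed ordering contributes (y+1)/m, giving (x+y+2)/m in total
```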

So the process of block sizes looks a bit like the additive coalescent, at least for large blocks. This is in contrast to the random graph process, where the sequence of component sizes behaves exactly like a multiplicative coalescent, in which components merge at a rate proportional to the product of their sizes.

Asymptotics

As in the random graph process, it’s interesting to ask roughly how large the largest block will be in such a configuration. Pittel [3] considers the case where the number of empty places \ell = m-n \approx \beta m, for some \beta\in (0,1).

A less interesting model would be to choose the positions of the n cars uniformly at random. But then the size of a block is roughly geometric with parameter \beta, and there are \Theta(m) blocks with high probability. Relatively straightforward calculations in extreme value theory suggest that the largest block is likely to have size on the order of \log m in this setting.
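
For the sceptical, here’s a rough numerical check of that extreme value heuristic, with the normalising constant \log\frac{1}{1-\beta} suggested by the geometric tail (all names here are my own):

```python
import math
import random

def largest_run_iid(m, beta, rng=random):
    """Largest cyclic run of occupied spaces when n = (1-beta)m of the m
    spaces are chosen uniformly at random, with no probing at all."""
    n = int((1 - beta) * m)
    occ = [False] * m
    for k in rng.sample(range(m), n):
        occ[k] = True
    start = occ.index(False)
    rotated = occ[start:] + occ[:start]
    best = run = 0
    for filled in rotated:
        run = run + 1 if filled else 0
        best = max(best, run)
    return best

beta, m = 0.25, 100_000
# runs are roughly geometric(beta), so the maximum of Theta(m) of them
# should concentrate around log(m) / log(1/(1-beta)), up to O(1) terms
print(largest_run_iid(m, beta), math.log(m) / math.log(1 / (1 - beta)))
```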

Of course, the actual model is slightly more complicated, because the size of a block is self-reinforcing, since larger blocks are more likely to grow than smaller blocks. However, we can still get somewhere with naïve estimates. Let’s label the places clockwise. Then in order for there to be a block starting at 0 and stretching beyond \alpha \log m, a necessary condition is that at least \alpha \log m cars arrive at those places. The number of cars which arrive at those places is binomial, since there are n cars, and each arrives at a place chosen uniformly, and independently of the other cars. So this event corresponds to

\mathrm{Bin}(n,\frac{\alpha \log m}{m}) \ge \alpha \log m.

Then, since n\approx (1-\beta)m, this event corresponds approximately to

\mathrm{Po}((1-\beta)\alpha \log m) \ge \alpha \log m.

The probability that a Poisson RV is at least a constant multiple larger than its mean decays exponentially with the mean, hence in this case the probability is asymptotically some negative power of m, depending on the value of \alpha. But there are O(m) possible places for such a block to start, so whether we can apply a union bound usefully or not depends on whether the power of m is strictly less than -1.

Since all of this depends on \alpha, it is reasonable to expect a threshold: the largest block does have size at least \alpha \log m when \alpha is small, while this is very unlikely when \alpha is large. This heuristic argument fits with Pittel’s theorem. Indeed, his result shows much stronger concentration: that the fluctuations of the size of the largest block are O(1).
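
If you’d rather see the union bound exponent numerically than trust the Poisson approximation, the binomial tail can be evaluated directly. A sketch (assuming scipy is available, and taking \beta=1/2 so that both regimes show up at small \alpha):

```python
import math
from scipy.stats import binom     # scipy assumed available

def tail_exponent(m, beta, alpha):
    """log_m of P(Bin(n, alpha*log(m)/m) >= alpha*log(m)), n = (1-beta)m."""
    n = int((1 - beta) * m)
    a = alpha * math.log(m)
    prob = binom.sf(math.ceil(a) - 1, n, a / m)    # P(X >= a)
    return math.log(prob) / math.log(m)

beta = 0.5
for alpha in (2, 10):
    print(alpha, [round(tail_exponent(m, beta, alpha), 2)
                  for m in (10**4, 10**5, 10**6)])
# exponents above -1 (alpha = 2) leave the union bound over the m starting
# places inconclusive; exponents below -1 (alpha = 10) rule such blocks out
```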

Critical regime and empirical processes

The following is a paraphrase of the introduction and some methods from [2].

Obviously, once n=m cars have arrived, there’s no room for manoeuvre and definitely all the places are taken in one giant block. But it’s not obvious in general what scaling for the number of gaps will give rise to giant blocks of \Theta(m) cars.

As for the random graph, we can find a process similar to the exploration process of a (random) graph which encodes much of the information we care about. Let Y_k be the number of cars which arrive at place k. So the sum of the Y_ks will be n, the total number of cars. Now consider the process

C_0=0,\qquad C_{k+1}=C_k + Y_{k+1}-1.

A block has the property that the number of arrivals within that set of places is equal to the number of places. So every time this *empirical process* C drops below its previous running minimum, this indicates the end of a block. To make this equivalence precise, we need to be a bit careful about where we start counting. It works exactly if we start at the beginning of a block. If not, it might introduce some unwanted divisions within the first block.
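
Here’s a sketch of this decomposition in code. To handle the where-to-start-counting issue, it begins just after the place where the (cyclic) walk attains its global minimum; a cycle-lemma-style argument, which I won’t spell out, says that place ends up empty, so we are starting at a block boundary:

```python
def parking_blocks_via_walk(Y):
    """Recover the block sizes from the arrival counts Y[k] alone, using
    the walk C (0-indexed here). We start counting just after the first
    place where the walk attains its global minimum."""
    m, n = len(Y), sum(Y)
    assert n < m, "need at least one empty space"
    C, walk = 0, []
    for y in Y:
        C += y - 1
        walk.append(C)
    start = (walk.index(min(walk)) + 1) % m
    blocks, run, C, minimum = [], 0, 0, 0
    for i in range(m):
        C += Y[(start + i) % m] - 1
        if C < minimum:            # new running minimum: this place is empty
            minimum = C
            if run:
                blocks.append(run) # ...so the current block has just ended
            run = 0
        else:
            run += 1               # place occupied, block continues
    if run:
        blocks.append(run)
    return sorted(blocks)

# 8 places, 6 cars: blocks {5,6,7,0,1} and {3}, empty places 2 and 4
print(parking_blocks_via_walk([2, 0, 0, 1, 0, 3, 0, 0]))   # [1, 5]
```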

What we have is a process that looks roughly like a random walk that is constrained to pass through the point (m,n-m), which is equal to (m,-\ell). Even if we aren’t totally precise about how this is like a random walk, we would expect to see Brownian fluctuations after rescaling. Indeed, we might expect to see a Brownian bridge added to a deterministic linear function with negative gradient. But this is only meaningful if the random part is at least as large as the deterministic part, and since the fluctuations have order \sqrt{m}, if \ell is much larger than this, the rescaled empirical process is essentially deterministic, so we won’t see any macroscopic excursions above the minimum.

If \ell is substantially smaller than \sqrt{m}, then there is no real difference between (m,-\ell) and (m,0), and what we see is just a Brownian bridge. At this point, where we choose to start the process is actually important. If we were to start it at the minimum of the Brownian bridge instead, we would see a Brownian excursion, which corresponds to one block occupying (almost) all of the places.

Unsurprisingly, the story is completed by considering \ell=\Theta(\sqrt{m}), where the rescaled empirical process looks like a slanted Brownian bridge, that is, Brownian motion conditioned to pass through (1,-\frac{\ell}{\sqrt{m}}). There isn’t an obvious fix to the question of where to start the process, but it turns out that the correct way is now adding a Brownian excursion onto the deterministic linear function with gradient -\frac{\ell}{\sqrt{m}}. It’s now reasonable that the excursions above the minimum should be macroscopic.

This scaling limit works dynamically as well, where the same Brownian excursion is used for different gradients of the deterministic line, corresponding to n moving through the critical window m-\Theta(\sqrt{m}). Finally, a pointer to Bertoin’s recent paper [1] for the model with an additional destructive property. Analogously to the forest fire model, blocks of cars are removed at a rate proportional to their size (as a result, naturally, of ‘Molotov cocktails’…). Similar effects of self-organised criticality are seen when the rate of bombs is scaled appropriately.

References

[1] – Bertoin – Burning cars in a parking lot (paper / slides)

[2] – Chassaing + Louchard – Phase transition for parking blocks, Brownian excursion and coalescence (arXiv)

[3] – Pittel – Linear probing: the probable largest search time grows logarithmically with the number of records


Fair games and the martingale strategy III

Gambler’s Ruin

Continuing directly from the previous post, the nicest application of the optional stopping theorem we developed there is to examine a simple random walk constrained between two values, say 0 and N. This represents an idealised gambling situation, where the gambler stops playing either when they reach some pre-agreed profit, or when they go bankrupt. We assume that we start at level k, for k = 1,2,…,N-1.

Naturally, we want to know the probabilities of winning (ie getting to N) and losing (ie going bankrupt). We could set this up by conditioning on the first step. Let p_k be the probability of winning starting from level k. Then we must have

p_k= \frac12 p_{k+1}+\frac12 p_{k-1},\quad k=1,\ldots,N-1, (*)

with the obvious boundary conditions p_0=0, p_N=1. In an ideal world, we just know how to solve second order difference equations like (*). Well, actually it isn’t too hard, because we can see from (*) directly that

p_{k+1}-p_k = p_k-p_{k-1},

and so p_k is a linear function of k, and the boundary conditions then give p_k = k/N pretty much immediately.

But, we can also use OST profitably. Let T be the time at which we first hit 0 or N. It’s intuitively clear that this should have finite expectation, since the problems you might encounter with just the hitting time of a single level shouldn’t apply. Or you can consider the expected number of steps before you see N ups or downs in a row, which certainly provides an upper bound on T. This random number of steps is sort of geometric (at least, it can be upper bounded by a geometric RV) and so has finite expectation. So we can apply OST to X at T, and we have

\mathbb{E}[X_T] = N\cdot \mathbb{P}(X_T=N) + 0 \cdot \mathbb{P}(X_T=0) = \mathbb{E}[X_0]=k,

from which we also derive p_k=k/N.
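
Both derivations are easy to corroborate by simulation; a minimal sketch (names my own):

```python
import random

def win_probability(k, N, trials=100_000, rng=random):
    """Empirical P(hit N before 0) for a simple random walk started at k."""
    wins = 0
    for _ in range(trials):
        x = k
        while 0 < x < N:
            x += rng.choice((-1, 1))
        wins += (x == N)
    return wins / trials

k, N = 3, 10
print(win_probability(k, N), k / N)   # both should be ~0.3
```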

The reason we talk about gambler’s ruin is by considering the limit N\rightarrow\infty with k fixed. After a moment’s thought, it’s clear we can’t really talk about stopping the process when we hit infinity, since that won’t happen at any finite time. But we can ask what’s the probability that we eventually hit zero. Then, if we imagine a barrier at level N, the probability that we hit 0 at some point is bounded below by the probability that we hit 0 before we hit level N (given that we know we hit either zero or level N with probability one), and this is \frac{N-k}{N}, and by choosing N large enough, we can make this as close to 1 as we want. So the only consistent option is that the probability of hitting 0 at some point is one. Hence gambler’s ruin. With probability one, ruin will occur. There’s probably a moral lesson hiding there not especially subtly.

A problem about pricing options

So the deal here seems to be that if you just care about your average, it doesn’t matter how you choose to play a sequence of fair games. But what if you care about something other than your average? In any real setting, we maybe care about slightly more than this. Suppose I offer you a bet on a coin toss: you get £3 if it comes up heads, and I get £1 if it comes up tails. Sounds like a good bet, since on average you gain a pound. But what about if you get £10,003 if it comes up heads and I get £10,001 if it comes up tails? I’m guessing you’re probably not quite so keen now.

But if you were an international bank, you might have fewer reservations about the second option. My intention is not to discuss whether our valuation of money is linear here, but merely to offer motivation for the financial option I’m about to propose. The point is that we are generally risk-averse (well, most of us, most of the time) and so we are scared of possible large losses, even when there is the possibility of large profits to balance it out.

Let’s assume we have our simple random walk, and for definiteness let’s say it starts at £1. Suppose (eg as a very niche birthday present) we have the following opportunity: at any point between now and time t=5, we have the right to buy one unit of the stock for £2.

We want to work out how much this opportunity, which from now on I’m going to call an option, is worth on average. Note that now it does seem that when we choose to cash in the option will have an effect on our return, and so we will have to include this in the analysis.

Note that, once we’ve bought a unit of the stock, we have an asset which is following a simple random walk (ie sequential fair games) and so from this point on its expected value remains unchanged. So in terms of expectation, we might as well sell the stock at the same moment we buy it. So if we cash in the option when the stock is currently worth £X, we will on average have a return of £(X-2). This means that we’ll only ever consider exercising our option if the current value of the stock is greater than £2. This narrows down our strategy slightly.

This sort of option minimises the risk of a large loss, since the worst thing that happens is that you never choose to exercise your option. So if you actually paid for the right to have this option, that cost is the largest amount you can lose. In the trading world, this type of opportunity is called an American option.

The trick here is to work backwards in time, thinking about strategies. If at time t=4, the stock is worth £1, then the best that can happen is that it’s worth £2 at time t=5, and this still gains you no wealth overall. Similarly if it’s worth £0 at time t=3. So we’ve identified a region where, if the stock value enters this region, we might as well rip up our contract, because we definitely aren’t going to gain anything. Remember now that we’ve also said you won’t ever cash in if the stock’s value is at most £2, because you don’t gain anything on average.

Now suppose that the stock has value £3 at time t=4. There’s no danger of it ever getting back below £2 during the lifetime of the option, so from now on your potential return is following the trajectory of a simple random walk, ie a fair game. So on average, it makes no difference whether you cash in now, or wait until t=5, or some combination of the two. The same argument holds if the stock has value £4 at time t=3 or time t=4, and so we can identify a region where you might as well cash in.

[Figure: American Option 1]

What about the final region? If the stock value is greater than £2, but not yet in the definitely-cash-in area, what should you do? Well, if you think about it, the value of the stock is a fair game. But your return should be better than that, because the stock price doesn’t take account of the fact that you wouldn’t buy in (and make a loss overall) if the value drops below £2. So at this stage, your future options are better than playing a fair game, and so it doesn’t make sense (in terms of maximising your *average*) to cash in.

Now we can actually work backwards in time to establish how much any starting value is worth under this optimal strategy. We can fill in the values in the ‘doomed’ area (ie all zeros) and on the ‘cash in now’ area (ie current value minus 2), and construct backwards using the fact that we have a random walk.

[Figure: American Option 2]

The final answer ends up being 7/16 if the stock had value £1 at time 0. Note that the main point here is that working out the qualitative form of the strategy was the non-trivial part. Once we’d done that, everything was fairly straightforward. I claim that this was a reasonably fun adjustment to the original problem, but have minimal idea whether pricing options is in general an interesting thing to do.
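
If you’d rather not fill in the grid by hand, the backward construction is a few lines of code. This is just the working-backwards argument above, with exact rational arithmetic, and with the walk allowed to go negative as in the idealised setting:

```python
from fractions import Fraction

def american_value(s0=1, strike=2, T=5):
    """Price the option by backward induction on the walk's lattice:
    at expiry the option is worth max(s - strike, 0); earlier, the holder
    takes the better of exercising now or waiting, and waiting is worth
    the average of the two children, since the walk is a fair game."""
    V = {s: Fraction(max(s - strike, 0)) for s in range(s0 - T, s0 + T + 1)}
    for t in range(T - 1, -1, -1):
        V = {s: max(Fraction(max(s - strike, 0)),       # exercise now
                    (V[s + 1] + V[s - 1]) / 2)          # or keep waiting
             for s in range(s0 - t, s0 + t + 1)}
    return V[s0]

print(american_value())   # 7/16
```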

Anyway, I hope that provided an interesting overview to some of the topics of interest within the question of how to choose strategies for games based on random processes.

Fair games and the martingale strategy II

Optional Stopping

We continue directly from the end of the last post, where I was talking about how to play sequences of fair games, and whether by playing cunningly (including choosing when to stop playing cunningly) you can end up with an ‘unfair’ game overall. (Ie where you make a profit or a loss on average.) We gave two examples. First, the martingale strategy, where on a sequence of fair games you double your stake each time you lose. The result is that you win back your original stake at some point with probability one, but possibly accumulate huge temporary losses along the way. In the second game, you follow the path of a simple random walk from zero until it hits one, and then cash in. Here we observe that the time until this happens is almost surely finite, but has infinite expectation.

There’s another possible problem. It seems ridiculous, but suppose we could look into the future. Then our strategy for the random walk might be something like: check in advance what will happen in the first ten steps, and stop playing whenever we get to the moment which we know is the maximum value the walk will attain. Well then, sometimes the walk will never go above zero, in which case we will stop playing at the very start, and sometimes the walk will go above zero, in which case we make a positive amount. So overall, our mean return must be positive. Obviously if we have the option to adjust our stakes, this is completely ridiculous, because we would bet high (infinitely high?) if we knew we were about to win, and nothing if we were about to lose. So, obvious though it seems, we should emphasise that we mustn’t be allowed to look into the future!

The optional stopping theorem says that looking into the future, together with the two problems already mentioned, covers essentially everything that can go wrong. To say anything more interesting, at this point we really do need a little bit of notation.

In general, a sequence of fair games of this kind is called a martingale. The origin of the word is fairly unclear – see this unexpectedly comprehensive article. The martingale will be something like X_0,X_1,X_2,\ldots, representing the wealth (or whatever) at discrete time-steps. The key property is the fair game property, which says that whatever has happened up to time k, the next game is fair. That is:

\mathbb{E}[X_{k+1}-X_k \,|\,\text{any event involving }X_0,\ldots,X_k] = 0. (*)

Note that in any of the situations we are describing, X should describe our wealth, rather than the underlying process. In the random walk example, these are the same, but in the martingale strategy suggestion, X_k should be our wealth after the kth game, rather than anything directly recording the outcomes of the coin tosses.

If we allow X_0 to be random (and of course, being always equal to zero is a special case of being random…) we can then add up an initial sequence of such equations to obtain

\mathbb{E}[X_k]=\mathbb{E}[X_k-X_{k-1}] + \ldots + \mathbb{E}[X_1-X_0] + \mathbb{E}[X_0]=\mathbb{E}[X_0]. (**)

That is, if we play any sequence of fair games a fixed finite number of times, we have overall a fair game. (In the original strategy, we have a martingale, irrespective of the precise rule we use to choose how much we invest on each coin toss.) But what if we stop the process at a time determined by the current knowledge? (ie without looking into the future.)

Let’s call an example of such a random time T, and this property that we aren’t allowed to look into the future is described technically as the stopping time property. A proper setup would use more notation and fewer words at (*), but even without sigma-algebras, we can say that T is a stopping time if deciding whether T=k depends only on X_0,X_1,\ldots,X_k, and not on later actions.

Informal Proof

To show the optional stopping theorem, the key idea is that if you want to stop at time T, one option is to keep playing beyond time T with zero stakes. Thus we have a fair game at all times, even after T. We write this as X_{T\wedge k}, where \wedge means ‘minimum’, so that if k>T, the process stays constant.

Since X_{T\wedge k} is a martingale, we can invoke (**),

\mathbb{E}[X_{T\wedge k}] = \mathbb{E}[X_0].

Now what happens if we take k to be very large? How well does this truncated average approximate \mathbb{E}[X_T] itself?
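
Before stating conditions, it’s instructive to see how badly the approximation can fail. In the sketch below, T is the hitting time of 1 from the previous post, which has infinite expectation: the truncated average stays at zero for every k, even though X_T=1 with probability one:

```python
import random

def truncated_mean(k, trials=20_000, rng=random):
    """Empirical E[X_{T ^ k}], where X is a simple random walk from 0
    and T is the first hitting time of 1 (which has infinite mean)."""
    total = 0
    for _ in range(trials):
        x = 0
        for _ in range(k):
            if x == 1:           # stopped: stake zero from now on
                break
            x += rng.choice((-1, 1))
        total += x               # this is X_{T ^ k}
    return total / trials

for k in (10, 100, 400):
    print(k, truncated_mean(k))  # ~0 for every k, yet X_T = 1 a.s.
```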

This is where we want to return to our assumptions about what might make this go wrong. Let’s say that T has finite expectation, and that there is some absolute bound on how large an increment can be, say C. Then, whenever T\le k, we have X_T=X_{T\wedge k}. And when T>k, we have

|X_T - X_{T\wedge k}| = |X_T-X_k| \le C(T-k).

Therefore

|\mathbb{E}[X_T]-\mathbb{E}[X_0]|= |\mathbb{E}[X_T] - \mathbb{E}[X_{T\wedge k}] | \le C \mathbb{E}[(T-k)\vee 0], (***)

where we take the final expectation only across T-k when this quantity is positive, since this is the only case which contributes to the left hand side.

Now we need to show that by choosing k large enough, we can make the RHS very small. Obviously we don’t have a chance of doing this if C is not finite! With a bit of thought, we can see that \mathbb{E}[(T-k)\vee 0]\ge \mathbb{E}[T] - k, and so we also don’t have a chance of doing this if \mathbb{E}[T]=\infty. But if \mathbb{E}[T]<\infty, then \sum_{\ell\ge 1} \ell \mathbb{P}(T=\ell) <\infty, and so

\sum_{\ell \ge k} \ell \mathbb{P}(T=\ell)\rightarrow 0,\quad \text{as }k\rightarrow\infty,

and so certainly

\mathbb{E}[(T-k)\vee 0] = \sum_{\ell \ge k}(\ell -k)\mathbb{P}(T=\ell) \rightarrow 0.

But (***) holds for all values of k, and so the only consistent option is that

\mathbb{E}[X_T]=\mathbb{E}[X_0].

There are a couple more combinations of conditions (mostly involving relaxing one of these slightly, and substantially strengthening the other) which also work, but this seems like the more natural form. For a full formal statement, there are many resources available, and the Wikipedia page, for example, isn’t too bad. In the mists of history, I wrote about some of these topics more formally, but maybe less helpfully, since I’d known the theory myself for about a week.

Fair games and the martingale strategy I

I went back to my school a couple of weeks ago and gave a talk. I felt I’d given various incarnations of a talk on card-shuffling too many times, so it was time for a new topic. The following post (and time allowing, one or two more) is pretty much what I said.

The Martingale Strategy

Suppose we bet repeatedly on the outcome of tossing a fair coin. Since it’s November, my heart is set on buying an ice cream that costs £1, so my aim is to win this amount from our game. My strategy is this:

First, I bet £1. If I win, then that’s great, because I now have made exactly enough profit to buy the ice cream. If I lose, then I play again, and this time I bet £2. Again, if I win, then my total profit is £2-£1 = £1, so I stop playing and buy the ice cream. If I lose, then I play a third time, again doubling my stake. So if I win for the first time on the seventh go, my overall profit will be

£64 – (£1+£2+£4+£8+£16+£32) = £1,

and it’s clear that this can be continued and I will eventually win a round, and at this point my total profit will be £1. So I will always eventually be able to buy my ice cream.

But, there’s nothing special about the value £1, so I could replace the words ‘ice cream’ with ‘private tropical island’, so why am I still here in the UK on a wet Monday when I could be on my beach lounger?

There are some fairly obvious reasons why the strategy I’ve described is not actually a fail-safe way to make a profit. For a start, although with probability one a head will come up eventually, there is a small positive chance that the first 200 tosses will all be tails. At this point, I would have accrued a debt of roughly 2^{200} pounds, which is vastly more than the number of atoms in the Earth. All this for an ice cream?
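
In fact the strategy is easy to simulate, and a simulation shows both halves of the story at once: the final profit really is always £1, but the interim debt is occasionally enormous. A minimal sketch (function name my own):

```python
import random

def martingale_strategy(rng=random):
    """Play the doubling strategy once: double the stake after each loss,
    stop at the first win. Returns (profit, total debt along the way)."""
    stake, debt = 1, 0
    while rng.random() < 0.5:      # this toss is a loss
        debt += stake
        stake *= 2
    return stake - debt, debt      # the first win pays the current stake

results = [martingale_strategy() for _ in range(100_000)]
print({p for p, _ in results})     # {1}: the profit is always exactly 1
print(max(d for _, d in results))  # but the interim debt can be enormous
```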

So there are major problems carrying out this strategy in a finite world. And of course, it’s no good if we stop after a very large but finite number of turns, because then there’s always this very small chance that we’ve made a very large loss, which is bad, partly because we can’t have the ice cream, but also because on average it exactly cancels out the chance of making our £1 profit, and so our overall average profit is exactly zero.

Though I’ve set this up in an intentionally glib fashion, as so often is the case, we might have stumbled across an interesting mathematical idea. That is, if we play a fair game a finite number of times, we have a fair game overall, meaning our overall average profit is zero. But if we are allowed to play a potentially infinite number of times, then it’s not clear how to define our overall ‘average’ profit, since we feel it ought to be zero, as an extension of the finite case, but also might be positive, because it ends up being £1 with probability one.

It’s tempting at this stage to start writing statements like

1 \times 1 + (-\infty) \times 0=0 ,

to justify why this might have come about, where we consider the infinitely unlikely event that is infinitely costly. But this is only convincing at the most superficial level, and so it makes more sense to think a bit more carefully about under exactly what circumstances we can extend our observation about the overall fairness of a finite sequence of individual fair games.

A second example

The previous example was based upon a series of coin tosses, and we can use exactly the same source of randomness to produce a simple random walk. This is a process that goes up or down by 1 in each time step, where each option happens with probability ½, independently of the history.

We could avoid the requirement to deal with very large bets by always staking £1, and then cashing in the first time we have a profit of £1. Then, if we start the random walk at zero, it models our profit, and we stop the first time it gets to 1. It’s not obvious whether we hit 1 with probability one. Let’s show this.

In order to hit some positive value k, the random walk must pass through 1, 2, and so on, up to (k-1) and then finally k. So \mathbb{P}(\text{hit }k) = [\mathbb{P}(\text{hit }1)]^k. And similarly for negative values. Also, the probability that we return to zero is the same as the probability that we ever hit 1, since after one time-step they are literally the same problem (after symmetry). So, if the probability of hitting 1 is p<1, then the number of visits to zero is geometric (supported on 1,2,3,…) with parameter 1-p, and so

\mathbb{E}[\text{visits to }k] = \mathbb{E}[\text{visits to zero}] \times \mathbb{P}(\text{hit }k)=\frac{1}{1-p} \times p^{|k|}.

Thus, when we sum over all values of k, we are summing a pair of geometric series with common ratio p<1, and so we get a finite answer. But if the expected number of visits to anywhere (ie the sum across all places) is finite, this is clearly ridiculous, since we are running the process for an infinite time, and at each time-step we must be somewhere! So we must in fact have p=1, and thus we have another potential counter-example to the claim that a sequence of fair games remains fair overall.

We might have exactly the same set of practical objections, such as this method requiring arbitrarily large liquidity (even though it doesn’t grow exponentially fast so doesn’t seem so bad).

What will actually turn out to be useful is that although the bets are now small, the average time until we hit 1 is actually infinite. Remember that, even though most things we see in real life don’t have this property, it is completely possible for a random variable to take finite values yet have infinite expectation.
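
This is easy to see empirically. In the sketch below (the cap on the walk length is purely so the simulation terminates), the median hitting time is tiny, but the largest values in even a modest sample are huge, reflecting the fact that \mathbb{P}(T>t) decays like t^{-1/2}:

```python
import random

def hitting_time_of_one(cap=10**6, rng=random):
    """Steps for a simple random walk from 0 to first hit 1,
    capped at `cap` steps purely so the simulation terminates."""
    x, t = 0, 0
    while x != 1 and t < cap:
        x += rng.choice((-1, 1))
        t += 1
    return t

samples = sorted(hitting_time_of_one() for _ in range(1000))
print(samples[500], samples[950], samples[-1])
# typically: a median of 1, a 95th percentile in the hundreds, and a
# maximum in the hundreds of thousands, so the sample mean never settles
```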

Notes on the Martingale Strategy

There’s no reason why the originally proposed strategy had to be based upon fair coin tosses. This strategy might work in a more general setting, where the chance of winning on a given turn is not ½, or is not even constant. So long as at each stage you bet exactly enough that, if you win, you recoup all your losses so far, and one extra pound, this has the same overall effect.

Of course, we need to check that we do eventually win a round, which is not guaranteed if the probability of winning (conditional on not having yet won) decays sufficiently fast. If we let p_k be the probability of winning on turn k, given that we haven’t previously won, then we require that the probability of never winning \prod_{k\ge 1}(1-p_k)=0. By taking logs and taking care of the approximations, it can be seen that the divergence or otherwise of \sum p_k determines which way this falls.
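
As a numerical illustration of this dichotomy (the two sequences p_k below are just examples, one with divergent sum and one with convergent sum):

```python
import math

def never_win_prob(p, terms=10**5):
    """Approximate prod_{k=1}^{terms} (1 - p(k)), the chance that the
    first `terms` rounds are all losses."""
    return math.exp(sum(math.log(1 - p(k)) for k in range(1, terms + 1)))

print(never_win_prob(lambda k: 1 / (k + 1)))        # sum diverges: -> 0
print(never_win_prob(lambda k: 1 / (k + 1) ** 2))   # sum converges: -> 1/2
```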

In the next post, we’ll talk about how the two problems encountered here, namely allowing large increments, and considering a stopping time with infinite expectation are exactly the two cases where something can go wrong. We’ll also talk about a slightly different setting, where the choice of when to stop playing becomes a bit more dynamic and complicated.

Pencils, Simson’s Line and BMO1 2015 Q5

When on olympiad duty, I normally allow myself to be drawn away from Euclidean geometry in favour of the other areas, which I feel are closer to home in terms of the type of structures and arguments I am required to deal with in research. For various reasons, I nonetheless ended up choosing to present the solution to the harder geometry problem on the first round of this year’s British Mathematical Olympiad a couple of weeks ago. The paper was taken a week ago, so I’m now allowed to write about it, and Oxford term finished yesterday so I now have time to write up the notes I made about it during a quick trip to Spain. Here are three gratuitous photos to remind us all what a blue sky looks like:

And here’s the statement of the problem:

[Figure: BMO1 Q5 a]

and you can find the video of the solution I presented here (at least for now). Thanks to the AV unit at the University of Bath, not just as a formality, but because they are excellent – I had no right to end up looking even remotely polished.

As so often with geometry problems, the key here is to find an entry point into the problem. There are a lot of points and a lot of information (and we could add extra points if we wanted to), but we don’t expect that we’ll need to use absolutely all the information simultaneously. The main reason I’m going to the trouble to write this blog post is that I found an unusually large number of such entry points for this problem. I think finding the entry points is what students usually find hardest, and while I don’t have a definitive way to teach people how to find these, perhaps seeing a few, with a bit of reverse reconstruction of my thought process, might be helpful or interesting?

If you haven’t looked at the problem before, you will lose this chance if you read what follows. Nonetheless, some of you might want to anyway, and some of you might have looked at the problem but forgotten it, or not have a diagram to hand, so here’s my whiteboard diagram:

[Figure: BMO1 Q5 b]

Splitting into stages

A natural first question is: “how am I supposed to show that four points are collinear?” Typically it’s interesting enough to show that three points are collinear. So maybe our strategy will be to pick three of the points, show they are collinear, then show some other three points are collinear, and then patch together. In my ‘official solution’ I made the visual observation that it looks like the four points P,Q,R,S are not just collinear, but lie on a line parallel to FE. This is good, because it suggests an alternative, namely to split the points P,Q,R,S into three segments, and show each of them is parallel to FE. We can reduce our argument by 1/3 since PQ and RS are symmetric in terms of the statement.

[Figure: BMO1 Q5 c]

So in our reduced diagram for RS, we need an entry point. It doesn’t look like A is important at all. What can we say about the remaining seven points? Well, it looks like we’ve got a pencil of three lines through C, and two triangles, each constructed by taking one point on each of these lines. Furthermore, two pairs of sides of the triangles are parallel. Is this enough to prove that the third side is parallel?

Well, yes it is. I claim that this is the natural way to think about this section of the diagram. The reason I avoided it in the solution is that it requires a few more lines of written deduction than we might have expected. The key point is that saying BF parallel to DR is the same as saying BFC and DRC are similar. And the same applies to BE parallel to DS being the same as saying BEC similar to DSC.

We now have control of a lot of angles in the diagram, and by being careful we could do an angle chase to show that <FEB = <RSD or similar, but this is annoying to write down on a whiteboard. We also know that similarity gives rise to constant ratios of lengths. And this is (at least in terms of total equation length) probably the easiest way to proceed. FC/RC = BC/DC by the first similarity relation, and EC/SC=BC/DC by the second similarity relation, so FC/RC = EC/SC and we can reverse the argument to conclude FE || RS.

So, while I’m happy with the cyclic quadrilaterals argument in the video (and it works in an almost identical fashion for the middle section QR too), spotting this pencil of lines configuration was key. Why did I spot it? I mean, once A is eliminated, there were only the seven points in the pencil left, but we had to (actively) make the observation that it was a pencil. Well, this is where it becomes hard to say. Perhaps it was the fact that I was working out of a tiny notebook so felt inclined to think about it abstractly before writing down any angle relations (obviously there are lots)? Perhaps it was because I just knew that pencils of lines and sets of parallel lines go together nicely?

While I have said I am not a geometry expert, I am aware of Desargues’ Theorem, of which this analysis is a special case, or at least of its ingredients. This is not an exercise in showing off that I know heavy projective machinery to throw at non-technical problems, but rather that knowing the ingredients of a theorem is enough to remind you that there are relations to be found, which is certainly a meta-analytic property that exists much more widely in mathematics and beyond.

Direct enlargement

If I’d drawn my board diagram even more carefully, it might have looked like FE was in fact the enlargement of the line P,Q,R,S from D by a factor of 2. This is the sort of thing that might have been just an accidental consequence of the diagram, but it’s still worth a try. In particular, we only really need four points in our reduced diagram here, eg D,E,F,R, though we keep in mind that we may need to recall some property of the line FR, which is really the line FC.

Let’s define R’ to be the enlargement of R from D by a factor 2. That is, we look along the ray DR, and place the point R’ twice as far from D as R. We want to show that R’ lies on FE. This would mean that FR is the perpendicular bisector of DR’ in the triangle FDR’, and would further require that FR is the angle bisector of <DFR’, which we note is <DFE. At this stage our diagram is small enough that I can literally draw it convincingly on a post-it note, even including P and P’ for good measure:

[Figure: BMO1 Q5 d]

So all we have to do is check that FC (which is the same as FR) is actually the angle bisector of DFE, and for this we should go back to a more classical diagram (maybe without P,Q,R,S) and argue by angle-chasing. Then, we can reverse the argument described in the previous paragraph. Q also fits this analysis, but P and S are a little different, since these lie on the external angle bisectors. This isn’t qualitatively harder to deal with, but it’s worth emphasising that this might be harder to see!

I’ve described coming at this approach from the observation of the enlargement with a factor of 2. But it’s plausible that one might have seen the original diagram and said “R is the foot of the perpendicular from D onto the angle bisector of DFE”, and then come up with everything useful from there. I’m not claiming that this observation is either especially natural or especially difficult, but it’s the right way to think about point R for this argument.

Simson Lines

The result about the Simson Line says that whenever P is a point on the circumcircle of a triangle ABC, the feet of the perpendiculars from P to the sides of the triangle (some of which will need to be extended) are collinear. This line is called the Simson line. The converse is also true, and it is little extra effort to show that the reflections of P in the sides are collinear too (ie the Simson line enlarged from P by a factor of 2), and that this new line passes through the orthocentre H of ABC.
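
Since this is a clean metric statement, it can be checked numerically, say with complex numbers as plane coordinates and the unit circle playing the role of the circumcircle. A sketch:

```python
import cmath
import random

def foot(p, a, b):
    """Foot of the perpendicular from p onto the line through a and b
    (points as complex numbers)."""
    d = (b - a) / abs(b - a)            # unit vector along the line
    return a + d * ((p - a) / d).real   # project onto the line

for _ in range(5):
    # A, B, C and P all on the unit circle, which is then the circumcircle
    a, b, c, p = (cmath.exp(1j * random.uniform(0, 2 * cmath.pi))
                  for _ in range(4))
    f1, f2, f3 = foot(p, b, c), foot(p, c, a), foot(p, a, b)
    # collinearity test: the 'cross product' of f2-f1 and f3-f1 vanishes
    print(abs(((f2 - f1).conjugate() * (f3 - f1)).imag))   # ~1e-16
```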

It turns out that this can be used to solve the problem quite easily. I don’t want to emphasise how to do this. I want to emphasise again that the similarity of the statement of the theorem to the statement of this particular problem is the important bit. Both involve dropping perpendiculars from a single point onto other lines. So even if it hadn’t worked easily in this case, it would still have been a sensible thing to try if one knew (and, crucially, remembered) the Simson line result.

I was working on this script during an evening in Barcelona, and tapas culture lends itself very well to brief solutions. Whether it was exactly between the arrival of cerveza and the arrival of morcilla or otherwise, this was the extent of my notes on this approach to the problem:

[Figure: BMO1 Q5 e]

And this makes sense. No computation or technical wizardry is required. Once you’ve identified the relevant reference triangle (here HEC), and have an argument to check that the point playing the role of P (here D) is indeed on the circumcircle (it’s very clear here), you are done. But it’s worth ending by reinforcing the point I was trying to make, that considering the Simson line is an excellent entry point to this problem because of the qualitative similarities in the statements. Dealing with the details is sometimes hard and sometimes not, and in this case it wasn’t, but that isn’t normally the main challenge.