# EGMO 2015

It’s been a while since I last wrote anything substantial. There have been some posts in the pipeline, but mainly I’ve been working hard on technical things that don’t translate very well into blog posts, and when I haven’t been working on those, have felt like doing non-maths.

Anyway, among other things which happened recently were the UK’s IMO selection camp in Cambridge during the last week of March, and the fourth European Girls’ Mathematical Olympiad in Belarus this past weekend. At the former, I was quite busy organising various things, and so gave a session based on some of my favourite problems about points and lines that I’ve used in the past. My hope was that with each problem in turn the students would actively invest time in thinking about whether the ideas each other were having seemed likely to be profitable. The aim was that being critical about your approach is a necessary skill once you start having lots of ideas for approaches.

This is hard to practise at school just by reading around, whether regarding competition material or generally. In the competition world, official solutions often contain unmotivated magic. This is reasonable, since they are supposed to be a minimal elementary demonstration of the problem’s validity. Motivation is a luxury which space and time sometimes doesn’t permit. And the solutions you find on, for example, Mathlinks often give the impression that the solvers know how to do 25,000 specific types of problem, and the sole task is to identify which type the latest one belongs to. Relating problems to ones you’ve seen before is important, but can hide, or at least dilute the actual ideas in some cases. Knowing that a specific inequality is a special case of a big machine allows you to claim a ‘solution’ but doesn’t help you identify the relevant ideas.

Later in the camp, a conversation arose concerning to what extent the younger staff found these elementary methods and problems easier now that they had experienced various levels of higher education in mathematics than when they were at school. It’s a complicated question, and I don’t think there’s a simple answer. I think the students might suspect that doing a university degree teaches you ‘advanced techniques’ which immediately simplify some of these problems. In rare examples this can be the case, but the majority of the time, I personally feel the only advantage I have is perhaps better instincts for whether something is or isn’t going to work.

Probably a good test would be Euclidean geometry. Adult olympiad staff typically come in two flavours: those who used to be good at geometry and feel out of practice; and those who weren’t good at geometry and certainly had no inclination to remedy this after they left school. I’m probably closer to the first category and I definitely feel out of practice, but also have minimal inclination to remedy this. Nonetheless, on the rare occasions I try these questions (and it’s not going to be for 4.5 hours at this stage…) my progress rate is probably comparable to when I was in sixth form. I’ve no idea how to turn this into a more concrete assessment, but there must be something about doing abstract maths all the time that prevents you forgetting how to do this, so presumably it should be at least as helpful in the types of problem with non-zero overlap with things I actually do. I won’t discuss geometry in the rest of this post, but I did also enjoy the geometry questions – it’s simply that I feel anything I have to say about them would be less useful than saying nothing.

In any case, I enjoyed trying the problems from Belarus in between bouts of typing, and thought I would write some brief comments about how I decided whether my ideas were good or not. To achieve this, I’ve looked at my rough, and will tell you the ideas which didn’t work, as well as some of the ones which did. I’ve paraphrased the statements slightly to avoid having too many LaTeX tags.

WARNING: what follows will spoil questions {2,4,5} if you haven’t already looked at them, but would like to.

Question 2 – A domino is a 2 × 1 or 1 × 2 tile. Determine in how many ways exactly n^2 dominoes can be placed without overlapping on a 2n × 2n chessboard so that every 2 × 2 square contains at least two uncovered unit squares which lie in the same row or column.

The wording of the question prompted me to consider splitting the board naturally into n^2 2 x 2 squares. I then thought about this ‘at least’ in the question, and concluded that for these 2 x 2 squares, it should be ‘exactly’.

I tried doing an unusual colouring, when I coloured half the black squares green, and half blue and tried to show that either only greens or only blues were covered, but this was clearly not true, or fixable. I then tried to do something similar for the other set of 2 x 2 squares (those whose vertices have (odd, odd) coordinates). Roughly speaking, if too few of the outer cells on the original board are covered, you can’t achieve the bounds on these odd inner squares. But this didn’t really give any useful information.

However, it did get me thinking about how what happens in the far top-left affects the rest of the board, and from there most of the ideas came quickly. I described a 2 x 2 square as N, E, S, W depending on which ‘half’ of the square was covered. Then if a square is N, all the squares directly above it must be also be N (*).

I then made two mistakes, and if we’re looking for reasons where experience is helpful, it was probably here, as I spotted them fairly quickly, rather than wasting ages and ages.

First, I decided that either all squares were {N,E} or all were {S,W} which seemed plausible when I had been focusing on the top-left. This gave an answer of $2 \binom{2n}{n}$, but after a bit more thought is clearly not symmetric enough.

Second, I thought condition (*) might be the only constraint, along with similar ones for E, S, W naturally too. I tried to count these using a similar method of enumerating lines between these regions, and I realised I didn’t know how to do these sensibly, for example if it looked like this:

This led to another bit of meta-maths that I probably wouldn’t have considered if I was 16. Namely, that the idea of counting these monotone lines across the 2n x 2n grid was too nice not to be useful. Also, if I couldn’t see a way to adapt it for this more complicated setup, my setup was probably wrong. This thought was useful, and then by thinking about the interface between {N,E} and {S,W}, then the other way round, it made sense that the configuration could be parameterised by two monotone lines between different pairs of corners, giving an answer of $\binom{2n}{n}^2$.

So, if it’s possible to give a reason why I could do this, it’s probably because I had an arsenal of ways to check an answer for plausibility, which saved wasting time on something wrong, and also because I trusted that the answer would be nice, which saved wasting time on something which was also wrong, and would probably have been very complicated to resolve.

Question 4 – Determine whether there exists an infinite sequence $a_1,a_2,\ldots$ of positive integers satisfying $a_{n+2}=a_{n+1}+\sqrt{a_{n+1}+a_n}$.

So, my first reaction was ‘no way’. Why? Because everything’s determined by the first two terms, and I couldn’t think of any reason why a cool choice of the first two terms would force all of the sums $a_{n+1}+a_n$ to be perfect squares. It seemed just about possible that we could arbitrarily large finite sequences with this property. (Though this also turns out to be impossible.)

I imagine many contestants might have felt similarly and spent a while playing round with quadratic residues to get a sense for exactly how hard it is to make this work for the initial terms. But I was suspicious of the form of the recurrence. We know that if it had been defined linearly, then the terms would grow exponentially, but it’s natural to ask roughly how fast they grow in this example, even relaxing the restriction that the terms be integers.

The first observation was that they are indeed (strictly) growing! But how fast? Are there actually enough squares that every $a_{n+1}+a_n$ can be a different one? Note that the squares themselves satisfy a similar recurrence relation $(N+1)^2 = N^2 + 2\sqrt{N^2} + 1 > N^2 + \sqrt{ N^2 + N^2}$. So this seemed like a very good idea, and my instinct was that this should work, and I felt glad that I hadn’t pursued the quadratic residues approach.

From here we were basically home. I asked whether the sequence grows faster than the sequence $(\frac{n^2}{2})$, and the answer was no as

$a_{n+2}\le a_{n+1}+ \sqrt{2 a_{n+1}} \le (\sqrt{a_{n=1}} + \frac{1}{\sqrt 2})^2,$

so if (after translating indices) $a_n = \frac{x^2}{2}$, we have $a_{n+1}\le \frac{(x+1)^2}{2}$. This is clearly a problem or at best a very tight constraint if all the $\{a_n+a_{n+1}\}$ have to be perfect squares, so even though we aren’t completely finished, I am confident I can be in one or two lines, with a bit of care. Fiddling with small values of n looked like it would work, or showing that looking at a large enough initial subsequence that the effect of the first two terms dissipated, we must by the pigeonhole principle have $a_{n+2}+a_{n+1}=(k+1)^2,\, a_{n+1}+a_n=k^2$, which is enough by a parity argument, using the original statement.

This final bit looks and feels a bit messy, but by this stage it really is just finding any argument to justify why a sequence which grows at most as fast as $n^2$ can’t actually be $n^2$ eventually.

Probably the reason I did this quickly was because I trusted my instinct for ‘no’, and also because there seemed a quick way to qualify *roughly* how this sequence grew. Sensibly approximating things, and deciding whether it’s appropriate to approximate them is definitely something I feel I’ve got better at during my PhD, so I guess this was helpful, then just try and throw back the important condition that the elements were integers at the very end.

Question 5 – Anastasia partitions the integers [1,2m] into pairs. Boris tries to choose one integer from each pair so that the sum is n. Given n and m, prove Anastasia can select pairs so that Boris can’t succeed.

So I wasted some thought time by imagining that n was fixed and trying to come up with ideas for the choice of pairs which might avoid n. It struck me that there was no reason why a a typical (uniformly chosen) pairing should avoid any particular n unless this value was particularly big or small.

How big or how small? Well Boris can always choose the bigger element of a pair, so the way to make the minimum maximum is to pair as (1,2), (3,4), … , (2m-1,2m). Conveniently, this also achieves the maximum minimum. These can be calculated as $m^2,m(m+1)$ respectively. Suddenly this seems great, because we’ve actually solved the problem for a huge range of n, ie everything not within between these extrema.

The key step here was to start considering special pairings, chosen to give a reduced set of possible sums. Once we’ve had this idea, it makes sense to consider other different special pairings. The reason we got a small set of possible sums is that there’s lots of overlap. We can achieve lots of overlap by looking at the difference between elements in a pair, and making as many of these the same as possible. For, consider pairs (a,a+d), (b,b+d). Then it’s the same to take a and b+d as to take a+d and b, so we get the same sums in lots of different ways.

The other way to have as many differences the same as possible is to go for (1,m+1), (2,m+2), … , (m,2m). Conveniently, we can parameterise the sums now because at each choice, we decide whether to add an extra m or not, so the sum must be 1+2+…+m, plus some multiple of m. So we can defeat Boris, except when n is $\binom{m}{2}+km$.

This is a good point to stop because what followed was basically book-keeping. We only have to consider a couple of specific cases when m is odd, and one when m is even, that happen not to fall into either of the categories we can now deal with. It wasn’t too hard to corrupt the examples we already have to deal with these.

The key here was responding to initial failure by trying to find any control at all over n. Perhaps given enough time I would have started trying special pairings? Equally, I might have tried small examples, which would not really have been hugely helpful for this question. In any case, trying to eliminate very small or very large n luckily worked well, as a) I didn’t need to use the word ‘very’ twice in the end; and b) the idea of looking at choices of pairings to minimise the number of sums quickly gave other useful deductions.

I also really enjoyed Question 3, though was suspicious that I’d used a bound a great deal weaker than one in the question. Along the way, I was trying something confusing and not especially useful that led me into some interesting theory about Latin squares, which I might write something about later in the week.

# Hitting Probabilities for Markov Chains

This continues my previous post on popular questions in second year exams. In the interest of keeping it under 2,500 words I’m starting a new article.

In a previous post I’ve spoken about the two types of Markov chain convergence, in particular, considering when they apply. Normally the ergodic theorem can be used to treat the case where the chain is periodic, so the transition probabilities do not converge to a stationary distribution, but do have limit points – one at zero corresponding to the off-period transitions, and one non-zero. With equal care, the case where the chain is not irreducible can also be treated.

A favourite question for examiners concerns hitting probabilities and expected hitting times of a set A. Note these are unlikely to come up simultaneously. Unless the hitting probability is 1, the expected hitting time is infinite! In both cases, we use the law of total probability to derive a family of equations satisfied by the probabilities/times. The only difference is that for hitting times, we add +1 on the right hand side, as we advance one time-step to use the law of total probability.

The case of hitting probabilities is perhaps more interesting. We have:

$h_i^A = 1,\; i\in A, \quad h_i^A=\sum_{j\in S}p_{ij}h_j^A,\; i\not\in A.$

There are two main cases of interest: where the chain is finite but has multiple closed communicating classes, and where the chain is infinite, so even though it is irreducible, a trajectory might diverge before hitting 0.

For the case of a finite non-irreducible Markov chain, this is fairly manageable, by solving backwards from states where we know the values. Although of course you could ask about the hitting probability of an open state, the most natural question is to consider the probability of ending up in a particular closed class. Then we know that the hitting probability starting from site in the closed class A is 1, and the probability starting from any site in a different closed class is 0. To find the remaining values, we can work backwards one step at a time if the set of possible transitions is sparse enough, or just solve the simultaneous equations for $\{h_i^A: i\text{ open}\}$.

We therefore care mainly about an infinite state-space that might be transient. Typically this might be some sort of birth-and-death chain on the positive integers. In many cases, the hitting probability equations can be reduced to a quadratic recurrence relation which can be solved, normally ending up with the form

$h_i=A+B\lambda^i$,

where $\lambda$ might well be q/p or similar if the chain is symmetric. If the chain is bounded, typically you might know $h_0=1, h_N=0$ or similar, and so you can solve two simultaneous equations to find A and B. For the unbounded case you might often only have one condition, so you have to rely instead on the result that the hitting probabilities are the minimal solution to the family of equations. Note that you will always have $h^i_i=1$, but with no conditions, $h^i_j\equiv 1$ is always a family of solutions.

It is not clear a priori what it means to be a minimal solution. Certainly it is not clear why one solution might be pointwise smaller than another, but in the case given above, it makes sense. Supposing that $\lambda<1$, and A+B=1 say, then as we vary the parameters, the resulting set of ‘probabilities’ does indeed vary monotically pointwise.

Why is this true? Why should the minimum solution give the true hitting probability values? To see this, take the equations, and every time an $h_i^A$ appears on the right-hand side, substitute in using the equations. So we obtain, for $i\not\in A$,

$h_i^A=\sum_{j\in A}p_{ij}+\sum_{j\not\in A} p_{ij}h_j^A,$

and after a further iteration

$h_i^A=\sum_{j_1\in A}p_{ij_1}+\sum_{j_1\not\in A, j_2\in A}p_{ij_1}p_{j_1j_2}+\sum_{j_1,j_2\not\in A}p_{ij_1}p_{j_1j_2}h_{j_2}^A.$

So we see on the RHS the probability of getting from i to A in one step, and in two steps, and if keep iterating, we will get a large sum corresponding to the probability of getting from i to A in 1 or 2 or … or N steps, plus an extra term. Note that the extra term does not have to correspond to the probability of not hitting A by time N. After all, we do not yet know that $(h_{i}^A)$ as defined by the equations gives the hitting probabilities. However, we know that the probability of hitting A within N steps converges to the probability of hitting A at all, since the sequence is increasing and bounded, so if we take a limit of both sides, we get $h_i^A$ on the left, and something at least as large as the hitting probability starting from i on the right, because of the extra positive term. The result therefore follows.

It is worth looking out for related problems that look like a hitting probability calculation. There was a nice example on one of the past papers. Consider a simple symmetric random walk on the integers modulo n, arranged clockwise in a circle. Given that you start at state 0, what is the probability that your first return to state 0 involves a clockwise journey round the circle?

Because the system is finite and irreducible, it is not particularly interesting to consider the actual hitting probabilities. Also, note that if it is convenient to do so, we can immediately reduce the problem when n is even. In two steps, the chain moves from j to j+2 and j-2 with probability ¼ each, and stays at j with probability ½. So the two step chain is exactly equivalent to the lazy version of the same dynamics on n/2.

Anyway, even though the structure is different, our approach should be the same as for the hitting probability question, which is to look one step into the future. For example, to stand a chance of working, our first two moves must both be clockwise. Thereafter, we are allowed to move anticlockwise. There is nothing special about starting at 0 in defining the original probability. We could equally well ask for the probability that starting from j, the first time we hit 0 we have moved clockwise round the circle.

The only thing that is now not obvious is how to define moving clockwise round the circle, since it is not the case that all the moves have to be clockwise to have experienced a generally clockwise journey round the circle, but we definitely don’t want to get into anything complicated like winding numbers! In fact, the easiest way to make the definition is that given the hitting time of 0 is T, we demand that the chain was at state n at time T-1.

For convenience (ie to make the equations consistent) we take $h_0=0, h_n=1$ in an obvious abuse of notation, and then

$h_j=\frac12h_{j-1}+\frac12 h_{j+1},$

from which we get

$h_j=a+bj \Rightarrow h_j=\frac{j}{n}.$

Of course, once we have this in mind, we realise that we could have cut the circle at 0 (also known as n) and unfolded it to reduce the problem precisely to symmetric gambler’s ruin. In particular, the answer to the original problem is 1/2n, which is perhaps just a little surprising – maybe by thinking about the BM approximation to simple random walk, and that BM started from zero almost certainly crosses zero infinitely many times near we might have expected this probability to decay faster. But once it is unfolded into gambler’s ruin, we have the optimal stopping martingale motivation to reassure us that this indeed looks correct.

# Teaching Probability in China

As I’ve mentioned in passing in a few recent posts, which have tended to focus on Markov chains more than might perhaps be expected, I’ve just finished a two-week trip teaching at Linyi University in Shandong province, about 600km south of Beijing. I’ve been describing some of my touristic and social observations of China on my personal blog, but it has been mainly a mathematical adventure, so I thought I would describe some of the main cultural differences in the approach to science here. I was surprised to find these a far larger obstacle than the more obvious language barrier in the classroom.

1) I think many of us were surprised by how different the gender divide was at Linyi to what we are accustomed to in Cambridge. Although of course it varies from year to year, from course to course, and in select cases from lecturer to lecturer, in general there is a significant male majority in lecture halls. In China, amongst my students (who had just finished their third year), the ratio seemed to have been reversed and then further extremised. I had about twelve students in attendance during the middle of the course, and only one was a boy.

2) Perhaps the most clear reason for this reversal of the Western trend became apparent when I asked some of my group what they were planning to do after leaving university. As most of them have just a year left, they all seemed to have a fairly concrete idea of what their plans were, and the answers were perhaps unexpected. One of the most able girls said she wanted to continue with postgraduate mathematics, but all of the rest wanted to become teachers. And not just high school either. In fact, all bar two or three planned to teach middle or even elementary school. I don’t want to get burned by talking about things I don’t much understand, but I understand this is very different to the UK. Here, I do not believe many maths graduates from top universities (in a national context) have ambitions in primary education. I do not know which way round the causality works. Are they encouraged towards a pure science degree as a route into schools, or are teaching jobs viewed the natural job to follow a hard quantitative course? In either case, it sounds like an excellent situation to be in from the point of mathematical education.

3) The role of the lecturer in China evidently carries a great deal more gravitas than comparable positions in England. Clearly his task is to stand at the board and write down the notes, which the audiences copies politely. So far, so like Cambridge Part III. But then it seems that the notion of interacting with the class doesn’t really apply here. Even an utterly uncontentious question such as ‘what is your name, and which year are you in?’ produced a flurry of nervousness, and then each student stood up stiffly in turn to answer. This continued even as the class size reduced, and my manner became more informal, for example starting to explain ideas while perching on a desk in the front row. I guess old habits die hard. It’s a bit of a shame because I feel the purpose of a lecture is not just to transcribe course material, but also to give motivation and a flavour of the mathematical landscape, as well as to inject some enthusiasm into an appreciation of the topic. And it’s very hard to start such ideas in a culture where it is totally unexpected.

4) This was even more evident in some of the research talks. Each member of the Cambridge group and various Linyi staff, postgraduates and masters students gave a talk outlining some aspect of their research interest. The motivation had been to try and recreate a seminar series, at least in environment if not in totality given the expansive range of areas of study represented. I was struck by the fact that what seemed like the entire maths undergraduate community had been forced to attend, but many of the talks were completely inaccessible to a non-specialist audience. A paper on a technical property of spectral theory read out at speaking pace from a powerpoint is not going to get younger students excited about research maths.

5) As suggested by their enthusiasm for PDEs and complicated graph theory construction problems in the research talks, the students seemed to relish calculation over theory. I found it very difficult to get across the idea that it was sometimes better to use the reuslt we had just shown rather than immediately attempting to diagonalise all visible matrices. I think some of the undergraduate teaching mirrors sections of Maths A level in this country, where the emphasis is on deploying solution techniques rather than ideas. For example, the class looked absolutely lost when I tried to use recurrence relations, to the extent that I decided just to start my explanation from the basics. It turned out that I had notated the general solution as $h_i=\alpha \lambda^i+\beta \mu^i$ whereas their teacher had used $h_i=\alpha \lambda^{i-1}+\beta \mu^{i-1}$, and this had caused the major confusion. The problem is compounded by the fact that despite constant encouragement, they feel mortified at the thought of asking a question or seeking a clarification. Again, it seems to be a matter of respect, but of course this spirit can remain in a more enquiring environment.

Overall, it was a very interesting insight into a completely alien mathematical culture. I know which one I prefer, but I think my perspective is richer for having experienced something of mathematics in China. And hopefully any teaching I do in the near future will seem straightforward by comparison, if only for the property of sharing a language with the students…!