EGMO 2018

Last week the UK held its annual training and selection camp in Cambridge. This week, four of the students have been attending the European Girls’ Mathematical Olympiad. 2018 sees the seventh edition of this prestigious competition, which is being held in Florence.

You can find very many details about the competition, and observe the UK’s excellent performance (with particular congratulations to Emily, who obtained a perfect score) at the competition website. A short article about the team in the TES can be found here.

In this post, I’m going to muse on some of the problems. You can find the two papers here and here.

Problem Two

Part a) would have been more immediate if the set A had been defined as

A:= \left\{\frac{k+1}{k} \,:\, k=1,2,3,\ldots\right\},

as this is instantly suggestive of a telescoping product such as

7 = \frac{7}{6}\cdot \frac{6}{5}\cdot\ldots \cdot \frac{2}{1}.

I found part b) to be one of the most difficult sections of the paper. It builds on the idea that given an expression for x as a product of elements of A, and an expression for y as a product of elements of A, we can combine these (ie take the product of the products!) to get an expression for xy as a product of elements of A. This yields f(xy)\le f(x)+f(y), and so the task is to show that sometimes this isn’t the most efficient way to proceed.
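Part a)’s telescoping construction is also easy to sanity-check by machine. A minimal Python sketch (the exact rational arithmetic is the point):

```python
from fractions import Fraction
from math import prod

# part a): m as a telescoping product of m - 1 elements of A
for m in range(2, 10):
    assert prod(Fraction(k + 1, k) for k in range(1, m)) == m
```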

I tried a couple of things, like trying to bound f(p) below when p is a prime. This wasn’t ludicrous, as one would certainly need to use a term \frac{kp}{kp-1} somewhere in the product, so that the numerator is divisible by p. However, this didn’t go anywhere, and nor did investigating f(n!). Perhaps others had more success down these avenues.

But as a general rule, if an abstractly-defined function is typically hard to calculate, then classes where you can calculate it are likely to be extra valuable. Here, powers of two make life particularly easy. We have 2\in A, and so 2^n=2\times 2\times\ldots\times 2 is a valid product. And we can’t possibly achieve 2^n as a product of fewer terms than this, because 2 is the largest element of A. So f(2^n)=n. Note that this is already way better than the bound we would have achieved from our argument in part a), which yields f(m)\le m-1.

My next observation was that a similar argument and a natural construction gives f(2^n+1)=n+1. But this can be extended so that when 2^n+1\le m\le 2^{n+1}, we have f(m)\ge n+1 and in fact there is equality exactly for

m= 2^n+1, 2^n+2, 2^n+4,\ldots, 2^n+2^{n-1},2^{n+1}. (*)

In particular, note that all of these are even except 2^n+1. It may well be the case that we don’t need this extension, but before you’ve solved the problem you don’t know how much you’ll have to extend each idea!
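If you want computational reassurance for these claims, a brute-force search works for small cases. A hedged sketch: the cap on k is artificial, so in general the search only certifies upper bounds for f, but for these small targets it reproduces the values claimed above:

```python
from fractions import Fraction
from collections import deque

def f_min(target, K=13, max_depth=6):
    """Fewest factors (k+1)/k, k <= K, multiplying to target, found by BFS.
    The cap on k is artificial, so in general this is only an upper bound."""
    target, start = Fraction(target), Fraction(1)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        v, d = queue.popleft()
        if v == target:
            return d
        if d < max_depth:
            for k in range(1, K + 1):
                w = v * Fraction(k + 1, k)
                if w <= target and w not in seen:  # every factor exceeds 1
                    seen.add(w)
                    queue.append((w, d + 1))
    return None

print([f_min(2**n) for n in range(1, 5)])      # [1, 2, 3, 4]: f(2^n) = n
print([f_min(2**n + 1) for n in range(1, 4)])  # [2, 3, 4]: f(2^n + 1) = n + 1
print([f_min(m) for m in range(9, 17)])        # [4, 4, 5, 4, 5, 5, 5, 4]:
# equality with n + 1 = 4 exactly at m = 9, 10, 12, 16, as at (*)
```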

I had a feeling that this meant f(2^n+1) was a strong candidate to satisfy the requirements, and that almost any factorisation would work. I suspect many students at this point did some refinement of choice of n, but I want to stay abstract, and use the extended observation (*). Since 2^n+1 is certainly not always prime, let’s focus on the infinitely many values n where it has a factorisation as 2^n+1 = ab, and consider whether a or b can achieve equality at (*). We’d better introduce the notation

2^\alpha<a<2^{\alpha+1},\quad 2^\beta<b<2^{\beta+1}.

Since a and b are odd integers, a\ge 2^\alpha+1 and b\ge 2^\beta+1, so ab\ge 2^{\alpha+\beta}+2^\alpha+2^\beta+1, and so \alpha+\beta<n. But similarly, ab< 2^{\alpha+1}2^{\beta+1}, so \alpha+\beta>n-2. We obtain

\alpha+\beta+1=n,

which is kind of what I’d been hoping for when I started this analysis. Now, we have

f(a)\ge \alpha+1,\quad f(b)\ge \beta+1,

\Rightarrow\quad f(a)+f(b)\ge \alpha+\beta+2 = n+1, (**)

with equality precisely if a,b both satisfy the equality conditions at (*). But a,b are odd, and so we have equality at (**) precisely if a=2^\alpha+1,b=2^\beta+1. So we have a resolution to the problem whenever 2^n+1 can be non-trivially factorised in any way other than 2^n+1=(2^\alpha+1)(2^\beta+1), so we have a rich (and certainly infinite) class of suitable (x,y). (For a concrete instance: 2^6+1=65=5\times 13, and 13\ne 2^3+1, so f(65)=7<f(5)+f(13).)

Problem Three

An obvious remark. The jury will never choose contestant i if she has fewer than i contestants in front of her, unless they are forced to. They are only forced to if everyone has this property. So we ignore the second dashed bullet point, as it just tells us when the process ends. And with a little further thought, the process ends exactly when the contestants are in correct order.

I suspect part a) of this may end up featuring on future examples of our interactive write-up clinic, where students are challenged to produce technically-correct arguments for obvious but awkward mini-problems. The location of contestant C_n is probably a good starting point.

For part b), you have to find an optimal construction, and prove that it’s optimal. At national and junior olympiad level, students often forget that they have to supply both of these components. At international level, the challenge is to work out which one gives the entry into the problem. I would say that about two-thirds of the time, either the optimal construction is very obvious, or is best attacked after you’ve had some insight into the bound. For this question (and again it’s just my opinion), I felt it was all about the construction. I made absolutely no progress by aiming for bounds. Whereas the construction offers plenty of insight into how to prove the bounds, and so once I had it, I found it quick.

The usual rules apply. An optimal construction is going to have to be neat to describe. It’s very unlikely to have millions of cases. Intuitively, it seems reasonable that starting the contestants in reverse order gives the jury the greatest possible ‘elbow room’ to squeeze moves into the procedure. Perhaps you tried to prove this directly, by coupling a procedure starting from arbitrary order with a corresponding procedure starting from reverse order? Well, I found that very hard, and perhaps you did too.

However, that doesn’t mean it’s the wrong construction! The key question is, what to do about contestant C_n? Well, essentially nothing. This contestant can never be moved. So when do we start allowing other contestants to pass her? It seemed natural to let the other contestants C_1,\ldots,C_{n-1} do as much as possible among themselves first. That is

\mathbf{C_n},C_{n-1},\ldots,C_2,C_1 \quad \Rightarrow\Rightarrow\Rightarrow \quad \mathbf{C_n}, C_1,C_2,\ldots,C_{n-1},

where \Rightarrow\Rightarrow\Rightarrow denotes lots of moves. At this point, what to do next stood out for me, namely that one could use \mathbf{C_n} at the start to put all the others back into reverse order, while moving \mathbf{C_n} to the back. That is

\mathbf{C_n},C_1,C_2,\ldots,C_{n-1}\quad\Rightarrow \quad C_1,\mathbf{C_n},C_2,\ldots,C_{n-1} \quad\Rightarrow\quad C_2,C_1,\mathbf{C_n},C_3,\ldots,C_{n-1}

\Rightarrow\Rightarrow \quad C_{n-1},C_{n-2},\ldots,C_2,C_1,\mathbf{C_n}.

You might have tried other things first, but once you notice this, you just know it has to be right. It’s just too elegant a construction, and it looks like the sort of thing one can prove to be optimal, because the overall process

\mathbf{C_n},C_{n-1},\ldots,C_1\quad \Rightarrow\Rightarrow\Rightarrow \quad \mathbf{C_n},C_1,C_2,\ldots,C_{n-1}

\Rightarrow\Rightarrow\quad C_{n-1},\ldots,C_2,C_1,\mathbf{C_n}\quad\Rightarrow\Rightarrow\Rightarrow\quad C_1,C_2,\ldots,C_{n-1},\mathbf{C_n},

incorporates the corresponding process for n-1 (twice, in fact), and so induction is very accessible for calculating the total number of moves. We conjecture that this is indeed the optimal process, and under this assumption, with f(n) the number of moves, we would have f(n) = f(n-1) + (n-1) + f(n-1), from the three stages of the process, from which, after looking at small values,

f(n)=2^n - (n+1).
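In code, the conjectured recurrence and this closed form agree; a quick check (the function name is mine):

```python
# f(n) = 2 f(n-1) + (n-1) from the three-stage process; f(1) = 0
def moves(n):
    return 0 if n == 1 else 2 * moves(n - 1) + (n - 1)

for n in range(1, 11):
    assert moves(n) == 2**n - (n + 1)
```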

I started by saying that the construction was the hard part of this problem. Well, that doesn’t mean the bound is easy. But at least with a construction in hand, you can make observations that might inform a bounding argument:

  • observation 1: contestant C_n never jumps;
  • observation 2: in the optimal construction, by induction C_{n-1} doesn’t jump in the outer phases, so in fact jumps only once, ie during the middle phase;
  • observation 3: contestant C_{n-2} doesn’t jump very often, and in fact can only jump after at least one of C_{n-1} and C_n has ended up in front of her. Since we’ve established that C_{n-1},C_n don’t jump very often themselves, this gives a bound on the number of times C_{n-2} can jump.

There is still work to do here, and errors with \pm 1 could easily creep in. But I still hold fast to my original claim that the construction was the hard part here. Or at least, the rough form of the construction. I guess it’s possible that one would have started making observations like the three above without a construction in mind, but I think it’s unlikely. Anyway, I leave for you the final details of the bounding argument, which involves formally transcribing observation 3, proving it, then generalising it to jumps of C_{n-3} and so on.

Problem Four

One of the exercises I have been setting to UK students recently is to produce short solution digests, which manifest any key ideas of the solution abstractly and briefly enough to resonate in the future. I’m a little tired of domino tiling problems, so I’ll do one of these here. This will be slightly longer than if I were not writing for a (small) audience.

Double-counting the total value by rows/columns and by dominos shows there are \frac{2kn}{3} dominos in a balanced configuration. When n=3, we can achieve k=1, and by tiling copies of this down the main diagonal, can extend to 3\,|\,n. For 3\not|\,n, we must have 3\,|\,k ie k\ge 3, but in fact k=3 is achievable, by tiling the main diagonal with copies of small boards for which k=3 can be constructed with a bit of trial-and-error.

The double-counting idea at the start is the nice part of the problem. The construction is a bit annoying, but saving ourselves work by building up large examples from copies of small examples is a useful motif to have in mind.

Problem Six

This question has lots of clues in the statement. It would, for example, be pretty surprising if the answer were ‘no’ to part b) given the setup in part a).

My confession is that I wasted lots of time on part a) not using the option m=0, which was foolish given that it’s clued from part b) that one needs to use the option m=0. My thought had been to consider some integer y, and ask which integers x were banned (if we were aiming for contradiction in part a)). For part a), it gets harder if t is smaller, so I found it helpful to think of t as \epsilon\ll 1. Anyway, if you struggled on part a), maybe worth reviewing whether you were definitely trying to solve part a), and not accidentally using the setup that really addressed part b)!

Some people have shown me solutions to part a) that carry an air of magic, by placing all the key steps (such as (*) to follow) in the language of the original setup. Let’s try to be cleaner. The key is to consider m=0. Since m=0 is included, we know that whenever x<y, we must have

\epsilon y \le x \le (1-\epsilon)y. (*)

Maybe you just have a gut feeling that this can’t be possible if you have enough xs and ys? But either way, choosing to focus on (*) is the key step, because once you know you have to prove the result based on this, it’s not too hard. I prefer addition to multiplication, so one might as well take logs (it doesn’t really look like we’re going to make heavy use of the integer property now) to obtain

\alpha\le |z_i - z_j|\le \beta,

for all z_i,z_j in some large finite collection, where 0<\alpha<\beta. You should now have a strong gut feeling that this is impossible. You have an arbitrarily large collection of real numbers which have to be close to each other pairwise, but never too close pairwise. How to finish the argument is a matter of taste. (For example: sort the values, so that consecutive gaps are at least \alpha, and a collection of N of them has spread at least (N-1)\alpha; but the pairwise condition caps the spread at \beta, which is a contradiction once N>\beta/\alpha+1.)

For part b), assuming we’re aiming for the answer ‘yes’, we probably want to construct the set one step at a time, and to think of t\approx \frac12 to make life as hard as possible.

Now, suppose you have x_1,x_2,\ldots,x_n so far. What next? Well if we could have

x_{n+1} \equiv \frac{x_i}{2}\,\mod x_i,

for all i=1,\ldots,n, that would be perfect. We can often solve sets of coupled residue equations like this using the Chinese Remainder Theorem. (Recall of course that the solutions themselves are rarely important – the fact that they exist is enough!) A couple of things go wrong with trying this crudely:

  • If x_i is odd, then \frac{x_i}{2} is not an integer…
  • If we correct this by asking for x_{n+1}\equiv \lfloor\frac{x_i}{2}\rfloor\,\mod x_i, then there’s a chance we might still be within the t-window around a multiple of x_i.
  • Unless we are going to make complicated demands on the residues, to use CRT it would be easier if the x_is were pairwise coprime.

One option is to give up. But actually all these objections can be handled with fairly small alterations. Can you see how the second objection can be overcome by an appropriate choice of x_1? Remember that t is fixed right at the start, and cannot be equal to 1/2. Is the third objection actually an objection any more? If it is, can we fix it?
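To see the CRT machinery in action, here is a toy sketch. The moduli and the choice of residue \lfloor x_i/2\rfloor are purely illustrative; the objections above still have to be handled before this becomes a solution:

```python
from math import prod

def crt(residues, moduli):
    """Least non-negative x with x = r_i (mod m_i); moduli pairwise coprime."""
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(residues, moduli)) % M

xs = [3, 5, 7]                           # hypothetical pairwise coprime values
x_next = crt([x // 2 for x in xs], xs)   # aim: x_next = floor(x_i / 2) mod x_i
print(x_next, [x_next % x for x in xs])  # 52 [1, 2, 3]
```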

Anyway, I guess P6 was my favourite non-geometry question on the paper, though that’s far from relevant. P5 was pretty neat too, but who knows whether a follow-up geometry post will materialise soon.


BMO1 2016 – the non-geometry

Here’s a link to yesterday’s BMO1 paper, and the video solutions for all the problems. I gave the video solution to the geometric Q5, and discuss aspects of this at some length in the previous post.

In these videos, for obvious educational reasons, there’s a requirement to avoid referencing theory and ideas that aren’t standard on the school curriculum or relatively obvious directly from first principles. Here, I’ve written down some of my own thoughts on the other problems in a way that might add further value for those students who already have some experience at olympiads and these types of problems. In particular, on problems you can do, it’s worth asking what you can learn from how you did them that might be applicable generally, and obviously for some of the harder problems, it’s worth knowing about solutions that do use a little bit of theory. Anyway, I hope it’s of interest to someone.

Question 1

Obviously we aren’t going to write out the whole list, but there’s a trade-off in time between coming up with neat ideas involving symmetry, and just listing and counting things. Any idea is going to formalise somehow the intuitive statement ‘roughly half the digits are odd’. The neat ideas involve formalising the statement ‘if we add leading zeros, then roughly half the digits are odd’. The level of roughness required is less in the first statement than in the second.

Then there’s the trade-off. Trying to come up with the perfect general statement that is useful and true might lead to something like the following:

‘If we write the numbers from 0000 to N, with leading zeros, and all digits of N+1 are even, then half the total digits, ie 2N of them, are odd.’

This is false, and maybe the first three such things you try along these lines are also false. What you really want to do is control the numbers from 0000 to 1999, for which an argument by matching is clear, and gives you 2000 x 4 / 2 = 4000 odd digits. You can exploit the symmetry by matching k with 1999-k, or do it directly first with the units, then with the tens and so on.

The rest (that is, 2000 to 2016) can be treated by listing and counting. Of course, the question wants an actual answer, so we should be wary of getting it wrong by plus or minus one in some step. A classic error of this kind is that the number of integers between 2000 and 2016 inclusive is 17, not 16. I don’t know why the memory is so vivid, but I recall being upset in Year 2 about erring on a problem of this kind involving fences and fenceposts.
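If I’m stating the count correctly (odd digits used in writing the numbers from 1 to 2016), the direct check is a one-liner, and agrees with 4000 from 0000–1999 plus 15 from the tail:

```python
# leading zeros are even, so padding 1-1999 to four digits changes nothing
print(sum(int(d) % 2 for n in range(1, 2017) for d in str(n)))  # 4015
```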

Question 2

As with so many new types of equation, the recipe is to reduce to a type of equation you already know how to solve. Here, because \{y\} has a different form on different ranges, it makes sense to consider the three ranges

y\in[0,1/25],\, y\in[1/25,1/8],\, y\in [1/8,\infty),

as for each of these ranges, we can rewrite 5y\{8y\}\{25y\} in terms of standard functions without this bracket notation. On each range we can solve the corresponding equation. We then have to check that each solution does actually lie in the appropriate range, and in two cases it does, and in one case it doesn’t.

Question 3

Adding an appropriately-chosen value to each side allows you to factorise the quadratics. This might be very useful. But is it an invitation to do number theory and look at coprime factors and so on, or is a softer approach more helpful?

The general idea is that the set of values taken by any quadratic sequence with integer coefficients and leading coefficient one looks from a distance like the set of squares, or the set \{m(m+1), \,m\in\mathbb{N}\}, which you might think of as ‘half-squares’ or ‘double triangle numbers’ as you wish. And by ‘from a distance’, I mean ‘up to an additive constant’. If you care about limiting behaviour, then of course this additive constant might well not matter, but if you care about all solutions, you probably do care. To see why this holds, note that

n^2+2n = (n+1)^2 - 1,

so indeed up to an additive constant, the quadratic on the LHS gives the squares, and similarly

n^2 - 7n = (n-4)(n-3)-12,

and so on. To solve the equation n^2=m^2+6, over the integers, one can factorise, but another approach is to argue that the distance between adjacent squares is much more than 6 in the majority of cases, which leaves only a handful of candidates for n and m to check.
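And indeed, a window search on the toy equation finds nothing, in line with the gap argument:

```python
# n^2 = m^2 + 6 has no integer solutions at all: adjacent squares differ
# by 2m + 1 > 6 once m >= 3, and the small cases fail by hand
print([(n, m) for n in range(-20, 21) for m in range(-20, 21)
       if n * n == m * m + 6])  # []
```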

The same applies to this question. Adding on 9 gives

n^2-6n+9 = m^2 + m -1,

which is of course the same as

(n-3)^2 = m(m+1)-1.

Now, since we know that adjacent squares and ‘half-squares’ are more than one apart in all but a couple of cases, we know why there should only be a small number of solutions. I would call a method of this kind square-sandwiching, but I don’t see much evidence from Google that this term is generally used, except on this blog.

Of course, we have to be formal in an actual solution, and the easiest way to achieve this is to sandwich m(m+1)-1 between adjacent squares m^2 and (m+1)^2, since it is very much clear-cut that the only squares which differ by one are zero and one itself.
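A brute-force search over a window reassures us that nothing was missed; the sandwiching is exactly the reason widening the window cannot produce anything new:

```python
# integer points with (n - 3)^2 = m(m + 1) - 1 in a small box
print([(n, m) for n in range(-50, 51) for m in range(-50, 51)
       if (n - 3)**2 == m * (m + 1) - 1])
# [(2, -2), (2, 1), (4, -2), (4, 1)]; the original problem will only
# want whichever of these meet its positivity constraints
```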

Question 4

I really don’t have much to say about this. It’s not on the school curriculum so the official solutions are not allowed to say this, but you have to use that all integers except those which are 2 modulo 4 can be written as a difference of two squares. The easiest way to show this is by explicitly writing down the appropriate squares, treating the cases of odds and multiples of four separately.
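For concreteness, here is a sketch of the explicit witnesses (the function and its name are mine):

```python
def diff_of_squares(n):
    """Witnesses for: every integer not 2 mod 4 is a difference of two squares."""
    if n % 2 == 1:            # odd: n = ((n+1)/2)^2 - ((n-1)/2)^2
        a, b = (n + 1) // 2, (n - 1) // 2
    elif n % 4 == 0:          # multiple of four: 4k = (k+1)^2 - (k-1)^2
        a, b = n // 4 + 1, n // 4 - 1
    else:
        raise ValueError("2 mod 4 has no representation")
    assert a * a - b * b == n
    return a, b
```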

So you lose if after your turn the running total is 2 modulo 4. At this point, the combinatorics isn’t too hard, though as in Q1 one has to be mindful that making an odd number of small mistakes will lead to the wrong answer! As in all such problems, it’s best to try and give a concrete strategy for Naomi. And it’s best if there’s something inherent in the strategy which makes it clear that it’s actually possible to implement. (Eg, if you claim she should choose a particular number, ideally it’s obvious that number is available to choose.)

One strategy might be: Naomi starts by choosing a multiple of four. Then an even number of multiples of four remain, so Naomi’s strategy is:

  • whenever Tom chooses a multiple of four, Naomi may choose another multiple of four;
  • whenever Tom chooses a number which is one (respectively three) modulo 4, Naomi may choose another which is three (respectively one) modulo 4.

Note that Naomi may always choose another multiple of four precisely because we’ve also specified the second condition. If sometimes Tom chooses an odd number and Naomi responds with a multiple of four out of an idle and illogical sense of caprice, then the first bullet point would not be true. One can avoid this problem by being more specific about exactly what the algorithm is, though there’s a danger that statements like ‘whenever Tom chooses k, Naomi should choose 100-k’ can introduce problems about avoiding the case k=50.

Question 6

I started this at the train station in Balatonfured with no paper and so I decided to focus on the case of just m, m+1 and n, n+2. This wasn’t a good idea in my opinion because it was awkward but guessable, and so didn’t give too much insight into actual methods. Also, it didn’t feel like inducting on the size of the sequences in question was likely to be successful.

If we know about the Chinese Remainder Theorem, we should know that we definitely want to use it here in some form. Here are some clearly-written notes about CRT with exercises and hard problems which a) I think are good; b) cite this blog in the abstract. (I make no comment on correlation or causality between a) and b)…)

CRT is about solutions to sets of congruence equations modulo various bases. There are two aspects to this, and it feels to me like a theorem where students often remember one aspect, and forget the other one, in some order. Firstly, the theorem says that subject to conditions on the values modulo any non-coprime bases, there exist solutions. In many constructive problems, especially when the congruences are not explicit, this is useful enough by itself.

But secondly, the theorem tells us what all the solutions are. There are two stages to this: finding the smallest solution, then finding all the solutions. Three comments: 1) the second of these is easy – we just add on all multiples of the LCM of the bases; 2) we don’t need to find the smallest solution – any solution will do; 3) if you understand CRT, you might well comment that the previous two comments are essentially the same. Anyway, finding the smallest solution, or any solution is often hard. When you give students an exercise sheet on CRT, finding an integer which is 3 mod 5, 1 mod 7 and 12 mod 13 is the hard part. Even if you’re given the recipe for the algorithm, it’s the kind of computation that’s more appealing if you are an actual computer.
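Indeed, at exercise scale the actual computer finds it instantly; for the example just given:

```python
# the smallest non-negative solution lies below 5 * 7 * 13 = 455
print(next(x for x in range(5 * 7 * 13)
           if x % 5 == 3 and x % 7 == 1 and x % 13 == 12))  # 428
```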

Ok, so returning to this problem, the key step is to phrase everything in a way which makes the application of CRT easy. We observe that taking n=2m satisfies the statement – the only problem of course is that 2m is not odd. But CRT then tells us what all solutions for n are, and it’s clear that 2m is the smallest, so we only need to add on the LCM (which is odd) to obtain the smallest odd solution.

ISL14 N6 – Sums of Squares of Intervals

I wasn’t able to post this at the time because of the embargo on discussing this particular question until the end of IMO 2015.

I’m currently enjoying the slow descent into summer in the delightful surroundings of Tonbridge School where we are holding our final camp to select the UK team for this year’s IMO. The mechanism for this selection is a series of four exams following the format of the IMO itself. We take many of the questions for these tests from the shortlist of questions proposed but not used at the previous year’s competition. In this post, I’m going to talk about one of these questions. Obviously, since lots of other countries do the same thing, there is an embargo on publication or public-facing discussion of these problems until after the next competition, so this won’t appear for a couple of months.

The statement of the problem is the following:

IMO Shortlist 2014, N6: Let a_1<a_2<\ldots<a_n be pairwise coprime positive integers with a_1 a prime at least n+2. On the segment I=[0,a_1a_2\ldots a_n] of the real line, mark all integers that are divisible by at least one of the numbers a_1,\ldots,a_n. These points split I into a number of smaller segments. Prove that the sum of the squares of the lengths of these segments is divisible by a_1.

I marked our students’ attempts at this problem and spoke to them about it afterwards. In particular, the official solution we were provided with, and which we photocopied and gave to our students contains some good theory and some magic. On closer inspection, the magic turns out to be unnecessary, and actually distracts a little from the theory. Anyway, this is my guided tour of the problem.
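Before the tour, the statement itself is cheap to sanity-check numerically. The triple (5,6,7) is my arbitrary choice of an admissible instance, with a_1=5 prime and 5\ge n+2 for n=3:

```python
from math import prod

def sum_of_squared_gaps(a):
    """Mark all multiples of each a_i in [0, a_1 ... a_n]; sum the squared gaps."""
    N = prod(a)
    marks = sorted({m for x in a for m in range(0, N + 1, x)})
    return sum((v - u) ** 2 for u, v in zip(marks, marks[1:]))

print(sum_of_squared_gaps([5, 6, 7]) % 5)  # 0, as the problem asserts
```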

We begin with two items from the rough work that some students submitted. Someone treated the case n=2, which is not quite trivial, but not hard at all, and remarked, sensibly, that n=3 seemed to have become hard. Packaged with this was the implicit observation that for the purpose of the question posed, a_1 plays a completely different role to the other quantities. While this seems obvious when presented in isolation, it’s a good thing to say explicitly so that we avoid wasting time imagining symmetries which are definitely not present later. We’ll sometimes refer to a_1 as p, whenever we are using its primality rather than its other roles.

The second idea coming out of rough work was to consider what happens when an interval gets broken into two by a ‘new point’, whatever that means. In particular, what happens to the sum of the squares of the interval lengths modulo p? The way the student had set up this procedure suggested that the calculations would be easy in the case where the point being added was a multiple of p=a_1. This ended up being a red herring, because there’s no real reason why we would want to add the multiples of a_1 at the end.

It does, however, suggest an inductive argument, if instead you add the points which are multiples of a_n, but not of any of \{a_1,\ldots,a_{n-1}\}, because the latter would already be present. It might seem like induction is illegal, since there is the condition a_1\ge n+2 already present in the statement. However, once a_1 is fixed, if the condition holds for a particular n, then clearly it holds for all smaller n (!), and so we can use induction to build up to the value of n required in the statement.

Now we are happy this might work in principle, let’s do it in practice. We are adding all these multiples of a_n so we want to have some way to index them. One choice is to use their actual values, or the quotient by a_n, ie all the integers up to \prod_{i=1}^{n-1}a_i which are not divisible by any of a_1,\ldots,a_{n-1}. But maybe after thinking about it, there will be a more helpful way to index them.

We are only interested in non-trivial splittings of intervals. We might initially be concerned that an existing interval might be split into more than two sub-intervals by the addition of the multiples of a_n, but the ordering suggested in the statement means this can’t happen. In general, let the initial size of the interval be K, and the sizes of the two sub-intervals be x and y for reasons which will become clear. Then the change in contribution to the total sum of interval lengths squared is given by

K^2 \quad \mapsto \quad x^2+y^2= K^2 - 2xy,

using that K=x+y. So the total change when we add all the multiples of a_{n} will be the sum of 2xy over all the intervals which get divided into two.

Now we have the challenge of evaluating this sum, or at least showing that it is divisible by p. Given a new multiple a of a_n, what will x be?

Well, on the left of a, the distance to the nearest multiple of a_k is the remainder r_k of a on division by a_k, essentially by definition. So x will be the minimal such remainder as k varies. Similarly, y will be \min_k (a_k-r_k). We forget about the minus sign and the factor of 2 forever, since it doesn’t make any difference unless p=2, in which case we are already done, and so we now have our sum:

\sum_{a\text{ a new multiple of }a_n} \left( \min_i(r_i)\right) \left[ \min_j(a_j-r_j) \right].

Now we return to the question of how to index the new multiples of a_n. But the question’s easier now as we have some clues as to what might be useful. Each of the summands involves the remainders modulo the smaller a_is, so can we use these as part of the indexing?

Yes, of course we can. There’s a bijection between the possible sequences of remainders, and the new multiples of a_n. This is basically the statement of the Chinese Remainder Theorem. Our index set should then be

(r_1,\ldots,r_{n-1}) \in [1,a_1-1]\times\ldots\times [1,a_{n-1}-1].

Remember that the fact that they are not allowed to be zero is because we only care now about new points being added.

Even now though, it’s not obvious how to compute this sum, so the natural thing to try is to invert our view of what we are summing over. That is, to consider how many of the index sequences result in particular values of x and y. Given x and y at least one, each remainder r_i must lie in [x,a_i-y], ie there are a_i-(x+y)+1 values it is allowed to take. So our guess for the number of such indices might be \prod_{i=1}^{n-1} (a_i-(x+y)+1), which is indeed what I wrote up on the board before Neel pointed out that some of the sequences I’ve allowed do not actually attain this minimum.

Fortunately this is easily fixed. If the minimum for x is not attained, we know that r_k\in[x+1,a_k-y], and similarly for y. So we can apply the inclusion-exclusion principle to depth two to say that the number we actually want is

\prod_{i=1}^{n-1} (a_i-(x+y)+1) - 2\prod_{i=1}^{n-1} (a_i-(x+y)) + \prod_{i=1}^{n-1} (a_i-(x+y)-1).

This is a polynomial in (x+y), so we’ll call it P(x+y). We’ll come back to this expression. We still need to tidy up our overall sum, in particular by deciding what bounds we want on x and y. Thinking about the intervals (each has width at most p, since every multiple of a_1 is marked), it is clear that the bounds should be 1\le x,y and x+y\le p. So we currently have our sum as

\sum_{1\le x,y,\,x+y\le p} xy P(x+y).

If you’ve got to this stage, I hope you would feel that this seems really promising. You’ve got to show that some sum of polynomial-like things over a reasonably well-behaved index set is divisible by p. If you don’t know general theorems that might confirm that this is true, once you’ve got statements like this, you might well guess then verify their existence.

First though, we need to split up this xy term, since our sum is really now all about x+y, which we’ll call z.

\sum_{2\le z\le p} P(z) \sum_{x+y=z,x,y\ge 1} xy.

Once we’ve had our thought that we mainly care about polynomials mod p at this stage, we might even save ourselves some work and just observe that \sum_{x+y=z}xy is a cubic in z. This is definitely not hard to check, using the formulae for triangle numbers and sums of squares. It’s also reasonable to conclude that the value of this is 0 for z=1, so we can change the indices on the sum so it doesn’t look like we’re missing a term. So we have

\sum_{z=1}^p P(z) Q(z),

where Q is some cubic, which we could work out if we really wanted to. But what’s the degree of P? P is made of a sum of 3 polynomials whose degree is clear because they are a product of linear factors. So each of these has degree (n-1), but in fact we can clearly see that the leading term cancels, and also with a tiny bit more care that the second order terms cancel too, so P has degree at most n-3.

[Tangent: students might well ask why it is convention to take the degree of the identically zero polynomial to be -\infty rather than zero. The fact that the above statement is true even when n=2 is a good justification for this.]

[Tangent 2: In my initially erroneous evaluation of the number of remainder sequences, without the inclusion-exclusion argument, I indeed ended up with P having degree n-1. I also worked out Q as being quadratic though, so my overall necessary condition was only one off from what was required, showing that two wrongs are not as good as two rights, but marginally better than only one wrong.]

Anyway, we are taking a sum of polynomials of degree at most n \le p-2. This is the point where one might get confused if you have been specifying everything in too much detail. The fact that I’ve said nothing about the polynomials apart from their degree is predicated on the fact that I now know their degrees are all I need for what I want, but even if you had tracked everything very explicitly, you might still guess what to do at this stage.

It’s presented as a lemma in the official solution that if you have a polynomial of degree at most p-2, then \sum_{z=1}^p P(z)\equiv 0 modulo p. There are several ways to see this, of which a proof by induction is easiest to believe, but probably least intuitive. It’s clear what goes wrong for degree p-1 by FLT, so you probably want to look at the set \{z^k: z\in[1,p-1]\} modulo p. It won’t necessarily be all the residues modulo p, but thinking about primitive roots, or the roots of unity makes it fairly clear why this would be true. [Essentially any argument that turns multiplication into addition in a well-behaved way clears this up.]
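By linearity, it is enough to check the lemma on monomials, which is quick by machine, say for p=13:

```python
p = 13
# sum over a complete residue system of z^k vanishes mod p for 0 <= k <= p - 2 ...
print(all(sum(pow(z, k, p) for z in range(1, p + 1)) % p == 0
          for k in range(p - 1)))                          # True
# ... and fails at k = p - 1, where FLT makes every nonzero term equal 1
print(sum(pow(z, p - 1, p) for z in range(1, p + 1)) % p)  # 12, ie -1
```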

The most important point is that this lemma is a) easy-ish and b) a natural thing to try given where you might have got in the question at this stage. From here you’re now basically done.

I don’t think this is an easy question at all, since even with lots of guiding through the steps there are a lot of tangents one might get drawn down, especially investigating exact forms of polynomials which it turns out we only need vague information about. Nonetheless, there are lots of lessons to be learned from the argument, which definitely makes it a good question at this level in my opinion.

How to Prove Fermat’s Little Theorem

The following article was prompted by a question from one of my mentees on the Senior Mentoring Scheme. A pdf version is also available.

Background Ramble

When students first meet problems in number theory, it often seems rather different from other topics encountered at a similar time. For example, in Euclidean geometry, we immediately meet the criteria for triangle similarity or congruence, and various circle theorems. Similarly, in any introduction to inequalities, you will see AM-GM, Cauchy-Schwarz, and after a quick online search it becomes apparent that these are merely the tip of the iceberg for the bewildering array of useful results that a student could add to their toolkit.

Initially, number theory lacks such milestones. In this respect, it is rather like combinatorics. However, bar one or two hugely general ideas, a student gets better at olympiad combinatorics questions by trying lots of olympiad combinatorics questions.

I don’t think this is quite the case for the fledgling number theorist. For them, a key transition is to become comfortable with some ideas and notation, particularly modular arithmetic, which make it possible to express natural properties rather neatly. The fact that multiplication is well-defined modulo n is important, but not terribly surprising. The Chinese Remainder Theorem is a `theorem’ only in that it is useful and requires proof. When you ask a capable 15-year-old why an arithmetic progression with common difference 7 must contain multiples of 3, they will often say exactly the right thing. Many will even give an explanation for the regularity in occurrence of these which is precisely the content of the theorem. The key to improving number theory problem solving skills is to take these ideas, which are probably obvious, but sitting passively at the back of your mind, and actively think to deploy them in questions.

Fermat’s Little Theorem

It can therefore come as a bit of a shock to meet your first non-obvious (by which I mean, the first result which seems surprising, even after you’ve thought about it for a while) theorem, which will typically be Fermat’s Little Theorem. This states that:

\text{For a prime }p,\text{ and }a\text{ any integer:}\quad a^p\equiv a\mod p. (1)

Remarks

  • Students are typically prompted to check this result for the small cases p=3, 5 and 7. Trying p=9 confirms that we do really need the condition that p be prime. This appears on the 2012 November Senior Mentoring problem sheet and is a very worthwhile exercise in recently acquired ideas, so I will say no more about it here.
  • Note that the statement of FLT is completely obvious when a is a multiple of p. The rest of the time, a is coprime to p, so we can divide by a to get the equivalent statement:

\text{For a prime }p,\text{ and }a\text{ any integer coprime to }p:\quad a^{p-1}\equiv 1\mod p. (2)

  • Sometimes it will be easier to prove (2) than (1). More importantly, (2) is sometimes easier to use in problems. For example, to show a^{p^2}\equiv a \mod p, it suffices to write:

a^{p^2}\equiv a^{(p-1)(p+1)+1}\equiv (a^{p-1})^{p+1}\times a\equiv 1^{p+1}\times a \equiv a.

  • A word of warning. FLT is one of those theorems which it is tempting to use on every problem you meet, once you know the statement. Try to resist this temptation! Also check the statement with small numbers (eg p=3, a=2) the first few times you use it, as with any new theorem. You might be surprised how often solutions contain assertions along the lines of

a^p\equiv p \mod (a-1).
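For what it’s worth, checking (1) honestly by machine, including the instructive failure at the non-prime p=9, looks like this:

```python
for p in [3, 5, 7, 9]:
    print(p, all(pow(a, p, p) == a % p for a in range(p)))
# 3, 5, 7 pass; 9 fails (eg 2^9 = 512 = 8 mod 9)
```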

Proofs

I must have used FLT dozens of times (or at least tried to use it – see the previous remark), before I really got to grips with a proof. I think I was daunted by the fact that the best method for, say, p=7, a careful systematic check, would clearly not work in the general case. FLT has several nice proofs, and is well worth thinking about for a while before reading what follows. However, I hope these hints provide a useful prompt towards discovering some of the more interesting arguments.

Induction on a to prove (1)

  • Suppose a^p\equiv a\mod p. Now consider (a+1)^p modulo p.
  • What happens to each of the (p+1) terms in the expansion?
  • If necessary, look at the expansion in the special case p=5 or 7, formulate a conjecture, then prove it for general p.
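If you want numerical evidence for the conjecture before proving it, the binomial coefficients are easy to inspect, say for p=7:

```python
from math import comb

p = 7
print([comb(p, j) % p for j in range(p + 1)])  # [1, 0, 0, 0, 0, 0, 0, 1]
```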

Congruence classes modulo p to prove (2)

  • Consider the set \{a,2a,3a,\ldots,(p-1)a\} modulo p.
  • What is this set? If the answer seems obvious, think about what you would have to check for a formal proof.
  • What could you do now to learn something about a^{p-1}?
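Small cases of the first bullet are cheap to check by machine, say p=7 and a=3:

```python
p, a = 7, 3
print(sorted(a * k % p for k in range(1, p)))  # [1, 2, 3, 4, 5, 6]
```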

Combinatorics to prove (1)

  • Suppose I want a necklace with p beads, and I have a colours for these beads. We count how many arrangements are possible.
  • Initially, I have the string in a line, so there are p labelled places for beads. How many arrangements?
  • Join the two ends. It is now a circle, so we don’t mind where the labelling starts: Red-Green-Blue is the same as Green-Blue-Red.
  • So, we’ve counted some arrangements more than once. How many have we counted exactly once?
  • How many have we counted exactly p times? Have we counted any arrangements some other number of times?
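If the counting feels slippery, a small enumeration (p=5 beads, a=3 colours) confirms it: the a constant arrangements are counted once each, and everything else exactly p times:

```python
from itertools import product

p, a = 5, 3
strings = list(product(range(a), repeat=p))  # the a^p arrangements in a line
necklaces = {min(s[i:] + s[:i] for i in range(p)) for s in strings}
print(len(strings), len(necklaces), (len(strings) - a) // p + a)  # 243 51 51
```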

Group Theory to prove (2)

This is mainly for the interest of students who have seen some of the material for FP3, or some other introduction to groups.

  • Can we view multiplication modulo p as a group? Which elements might we have to ignore to ensure that we have inverses?
  • What is \{1,a,a^2,a^3,\ldots\} in this context? Which axiom is hardest to check?
  • How is the size of the set of powers of a modulo p related to the size of the whole group of congruences?
  • Which of the previous three proofs is this argument most similar to?
  • Can you extend this to show the Fermat-Euler Theorem:

\text{For any integer }n,\text{ and }a\text{ coprime to }n:\quad a^{\phi(n)}\equiv 1 \mod n,

where \phi(n) counts how many integers between 1 and n are coprime to n.
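As ever, it’s worth checking numerically before attempting the extension; a short sketch with a naive \phi:

```python
from math import gcd

def phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

for n in range(2, 30):
    assert all(pow(a, phi(n), n) == 1 for a in range(1, n) if gcd(a, n) == 1)
```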