Balkan MO 2017 – Qs 1, 3 and 4

The UK is normally invited to participate as a guest team at the Balkan Mathematical Olympiad, an annual competition between eleven countries from South-Eastern Europe. I got to take part in Rhodes almost exactly ten years ago, and this year the competition was held in Ohrid, in Macedonia. There’s one paper, comprising four questions, normally one from each of the agreed olympiad topic areas, with 4.5 hours for students to address them. The contest was sat this morning, and I’m going to say quite a bit about the geometric Q2, and a little bit about Qs 1 and 3 also. In all cases, this discussion will include most of a solution, with some commentary, so don’t read these if you are planning to try the problems yourself.

I’m not saying anything about Q4, because I haven’t solved it. (Edit: I have solved it now, so will postpone Q2 until later today.)

Question One

Find all ordered pairs of positive integers (x,y) such that

x^3+y^3=x^2+42xy+y^2.

The first thought is that if either of x or y is ‘large’, then the LHS is bigger than the RHS, and so equality can’t hold. That is, there are only finitely many solutions. The smallest possible value of y is, naturally, 1, and substituting y=1 is convenient as then y^2=y^3, and it’s straightforward to derive x=7 as a solution.

Regarding the non-existence of large solutions, you can make this precise by factorising the LHS as

(x+y)(x^2-xy+y^2) = x^2+42xy+y^2.

There are 44 terms of degree two (counting coefficients) on the RHS, while the second bracket on the LHS is also a degree-two expression. With a bit of AM-GM you can make this precise: since 43(x-y)^2\ge 0, we have x^2+42xy+y^2 \le 44(x^2-xy+y^2), and so if x+y>44, the LHS is strictly greater than the RHS, a contradiction. But that's still a lot of possibilities to check.

It struck me that I could find ways to reduce the burden by reducing modulo various primes. 2, 3 and 7 all divide 42, and furthermore cubes are nice modulo 7 and squares are nice modulo 3, so maybe that would bring the number of possibilities down. But my instinct was that this wasn’t the right way to use the fact that we were solving over positive integers.

The second bracket in the factorisation looks enough like the RHS that it's worth exploring. If we move x^2-xy+y^2 from the right to the left, we get

(x+y-1)(x^2-xy+y^2) = 43xy. (1.1)

Now it suddenly does look useful that we are solving over positive integers, because 43 is a prime, so has to appear as a factor somewhere on the LHS. But it’s generally quite restrictive that x^2-xy+y^2 | 43xy. This definitely looks like something that won’t hold often. If x and y are coprime, then certainly x^2-xy+y^2 and y are coprime also. But actually if x and y have a non-trivial common factor d, we can divide both sides by d^2, and it still holds. Let’s write

x=dm,\quad y=dn,\quad\text{where }d=\mathrm{gcd}(x,y).

Then m^2-mn+n^2 really does divide 43, since it is coprime to both m and n. This is now very restrictive indeed, since it requires that m^2-mn+n^2 be equal to 1 or 43. A square-sandwiching argument gives m^2-mn+n^2=1 iff m=n=1. The value 43 requires a little bit more work, with (at least as I did it) a few cases to check by hand; up to swapping m and n, the solutions are (m,n)=(7,1) and (m,n)=(6,7).

We now need to add the common divisor d back into the mix. In the case m=n=1, (1.1) reduces to (2d-1)=43, which gives (x,y)=(22,22). In the case (m,n)=(7,1), after cancelling a couple of factors, (1.1) reduces to (8d-1)=7, from which (x,y)=(7,1),(1,7) emerges. In the case (m,n)=(6,7), (1.1) reduces to (13d-1)=42, which has no solution in integers. So these must be all the solutions.
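
Since the size bound pins everything into a small box, a few lines of Python make a reassuring sanity check (my addition, not part of the contest solution):

```python
# Brute force over the region x + y <= 44 allowed by the AM-GM bound.
sols = [(x, y) for x in range(1, 44) for y in range(1, 44)
        if x**3 + y**3 == x**2 + 42 * x * y + y**2]
print(sols)  # [(1, 7), (7, 1), (22, 22)]
```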

The moral here seemed to be that divisibility was a stronger tool than case-reduction. But that was just this question. There are other examples where case-reduction is probably more useful than chasing divisibility.

Question Three

Find all functions f:\mathbb{N}\rightarrow\mathbb{N} such that

n+f(m) \,\big|\, f(n)+nf(m)

for all m,n\in\mathbb{N}.

What would be useful here? There are two variables, and a function. It would be useful if we could reduce the number of variables, or the number of occurrences of f. We can reduce the number of variables by taking m=n, to get

n+f(n) \,\big|\, f(n) [1+n]. (3.1)

From this, we might observe that f(n)\equiv 1 is a solution. Of course we could analyse this much more, but this doesn’t look like a 10/10 insight, so I tried other things first.

In general, the statement that a\,|\,b also tells us that a\,|\, b-ka. That is, we can subtract arbitrary multiples of the divisor, and the divisibility still holds. A recurring trope is that the original b is elegant, but an adjusted b-ka is useful. I don't think we can do much with (3.1) this way, but by subtracting n(n+f(m))=n^2+nf(m) from the RHS of the problem statement, we get

n+f(m) \,\big|\, n^2-f(n). (3.2)

There’s now no m on the RHS, but this relation has to hold for all m. One option is that f(n)=n^2 everywhere; then what we’ve deduced always holds, since the RHS is zero. But if there’s a value of n for which f(n)\ne n^2, then (3.2) is a very useful statement, and from now on we assume such an n exists. Fixing this n and varying m, we need n+f(m) to remain a divisor of the RHS, which is fixed and non-zero, and so has only finitely many divisors. So f(m) takes only finitely many values, and in particular is bounded.

This ties in with the observation that f\equiv 1 is a solution, which we made around (3.1), so let’s revisit that. (Note: there might be more elegant ways to finish from here, but this is what I did. Also, n is no longer fixed as in the previous paragraph.)

n+f(n) \,\big|\, f(n) [1+n]. (3.1)

Just to avoid confusion between the function itself, and one of the finite collection of values it might take, let’s say b is a value taken by f. So there are values of n for which

n+b \,\big|\, b(1+n).

By thinking about linear equations, you might be able to convince yourself that there are only finitely many solutions (in n) to this relation. There are certainly only finitely many solutions where LHS=RHS (well, at most one), and only finitely many where 2\times LHS=RHS, and so on. But why do something complicated, when we can actually repeat the trick from the beginning, and subtract b(n+b), to obtain

n+b \,\big|\, b^2-b.

For similar reasons to before, this is a great deduction, because it means if b\ne 1, then the RHS is positive, which means only finitely many n can satisfy this relation. Remember we’re trying to show that no n can satisfy this relation if b\ne 1, so this is definitely massive progress!

If any of what’s already happened looked like magic, I hope we can buy into the idea that subtracting multiples of the divisor from the RHS is the only tool we used, and that making the RHS fixed gives a lot of information about the LHS as the free variable varies. The final step is not magic either. We know that f is eventually 1; if you prefer, “for large enough n, f(n)=1”, since all other values appear only finitely often. I could write this with quantifiers, but I don’t want to, because that makes it seem more complicated than it is. We genuinely don’t care when the last non-1 value appears.

Anyway, since we’ve deduced this, we absolutely have to substitute this into something we already have. Why not the original problem statement? Fix m, then for all large enough n

n+f(m) \,\big|\, 1+nf(m). (3.3)

To emphasise, (3.3) has to hold for all large enough n. Is it possible that f(m)=2? Again, it’s easy to convince yourself not. But, yet again, why not use the approach we’ve used so profitably before to clear the RHS? In fact, we already did this, and called it (3.2), and we can make that work [3.4], but in this setting, because f(m) is fixed and we’re working with variable large n, it’s better to eliminate n, to get

n+f(m)\,\big|\, f(m)^2-1,

again for all large enough n. By the same size argument as before, this is totally impossible unless f(m)=1. Which means that in fact f(m)=1 for all m. Remember ages ago we assumed that f(n) was not n^2 everywhere, so this gives our two solutions: f(n)=1,\, f(n)=n^2.
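
As a sanity check (my addition), both claimed solutions do satisfy the original divisibility on a grid of small values:

```python
# Check n + f(m) | f(n) + n*f(m) for f(n) = 1 and f(n) = n^2.
for f in (lambda k: 1, lambda k: k * k):
    assert all((f(n) + n * f(m)) % (n + f(m)) == 0
               for m in range(1, 60) for n in range(1, 60))
print("f(n) = 1 and f(n) = n^2 both pass")
```

For f(n)=n^2 this is of course clear algebraically, since f(n)+nf(m)=n^2+nm^2=n(n+m^2).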

Moral: choosing carefully which expression to work with can make life much more interesting later. Eliminating as many variables or difficult things from one side is a good choice. Playing with small values can help you understand the problem, but here you need to think about soft properties of the expression, in particular what happens when you take one variable large while holding another fixed.

[3.4] – if you do use the original approach, you get n^2-1 on the RHS. There’s then the temptation to kill the divisibility by taking n to be the integer in the middle of a large twin prime pair. Unfortunately, the existence of infinitely many such n is still just a conjecture.

Question Four

(Statement copied from Art of Problem Solving. I’m unsure whether this is the exact wording given to the students in the contest.)

On a circular table sit n>2 students. First, each student has just one candy. At each step, each student chooses one of the following actions:

(A) Gives a candy to the student sitting on his left or to the student sitting on his right.

(B) Separates all its candies in two, possibly empty, sets and gives one set to the student sitting on his left and the other to the student sitting on his right.

At each step, students perform the actions they have chosen at the same time. A distribution of candy is called legitimate if it can occur after a finite number of steps.
Find the number of legitimate distributions.

My moral for this question is this: I’m glad I thought about this on the bus first. What I found hardest here was getting the right answer. My initial thoughts:

  • Do I know how to calculate the total number of possibilities, irrespective of the algorithm? Fortunately yes I do. Marbles-in-urns = barriers between marbles on a line (maybe add one extra marble per urn first). [4.1]
  • What happens if you just use action (A)? Well, first you can get into trouble, because what happens if you have zero sweets? But fine, let’s temporarily say you can have a negative number of sweets. If n is even, then there’s a clear parity situation developing: if you colour the children red and blue alternately, at every stage you have n/2 sweets moving from red children to blue and vice versa, so the total number of sweets among the red children is constant through the process.
  • What happens if you just use action (B)? This felt much more promising.
  • Can you get all the sweets to one child? I considered looking at the child directly opposite (or almost-directly opposite) and ‘sweeping’ all the sweets away from them. It felt like this would work, except if for some parity reason we couldn’t prevent the final child having one sweet (or more, but probably exactly one) at the crucial moment when all the other sweets got passed to him.

Then I got home, and with some paper, I felt I could do all possibilities with n=5, and all but a few when n=6. My conjecture was that all are possible with n odd, and all are possible with n even, except those where none of the red kids or none of the blue kids get a sweet. I tried n=8, and there were a few more that I couldn’t construct, but this felt like my failure to be a computer rather than a big problem. Again there’s a trade-off between confirming your answer, and trying to prove it.

Claim: If n is even, you can’t achieve the configurations where either the red children or the blue children have no sweets.

Proof: Suppose you can. That means there’s a first time that all the sweets were on one colour. Call this time T. Without loss of generality, all the sweets are on red at T. Where could the sweets have been at time T-1? I claim they must all have been on blue, which contradicts minimality. Why? Because if at least one red child had at least one sweet, they must have passed at least one sweet to a blue neighbour.

Now it remains to give a construction for all other cases. In the end, my proof has two stages:

Step One: Given a configuration, in two steps, you can move a candy two places to the right, leaving everything else unchanged.

This is enough to settle the n odd case. For the even case, we need an extra step, which really corresponds to an initial phase of the construction.

Step Two: We can make some version of the ‘sweeping’ move precise, to end up in a configuration where the red children hold any total number of sweets except 0 or n.

Step one is not so hard. Realising that step one would be a useful tool to have was probably the one moment where I shifted from feeling like I hadn’t got into the problem to feeling that I’d mostly finished it. As ever in constructions, working out how to do a small local adjustment, which you plan to do lots of times to get a global effect, is great. (Think of how you solve a Rubik’s cube for example.)
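
The post above doesn’t spell the two moves out, so here is one possible realisation (my guess, verified by simulation): on the first step every student gives all their candies to the right, using (B) with an empty left set; on the second step everyone returns everything to the left, except that the student currently holding the designated candy sends that single candy right and the rest left. Students with no candies can play (B) with two empty sets, so every student has a legal action at both steps.

```python
import random

def move_candy_two_right(counts, i):
    """Two steps that move one candy from seat i to seat i+2 (mod n),
    leaving every other count unchanged; only action (B) is used."""
    n = len(counts)
    assert counts[i] > 0
    after1 = [counts[(j - 1) % n] for j in range(n)]  # step 1: all pass right
    after2 = [after1[(j + 1) % n] for j in range(n)]  # step 2: all pass left...
    after2[i] -= 1            # ...but seat i+1 sends one candy fewer left
    after2[(i + 2) % n] += 1  # and sends the designated candy right instead
    return after2

random.seed(0)
for _ in range(200):
    n = random.randint(3, 9)
    counts = [random.randint(0, 3) for _ in range(n)]
    i = random.randrange(n)
    if counts[i] > 0:
        expected = counts[:]
        expected[i] -= 1
        expected[(i + 2) % n] += 1
        assert move_candy_two_right(counts, i) == expected
print("Step One simulation passed")
```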

Step two is notationally fiddly, and I would think very carefully before writing it up. In the end I didn’t use the sweeping move. Instead, with the observation that you can take an adjacent pair and continually swap their sweets, it’s possible to set up an induction.

Actual morals: Observing the possibility to make a small change in a couple of moves (Step one above) was crucial. My original moral does still hold slightly. Writing lots of things down didn’t make life easier, and in the end the ideas on the bus were pretty much everything I needed.

[4.1] – one session with a group of 15-year-olds is enough to teach you that the canon is always ‘marbles in urns’, never ‘balls’ nor ‘bags’, let alone both.

EGMO 2016 Paper II

Continuing from yesterday’s account of Paper I, this is a discussion of my thoughts about Paper II of EGMO 2016, happening at the moment in Busteni, Romania. This is not an attempt to describe official solutions, but rather to describe the thought process (well, a thought process) of someone tackling each question. I hope it might be interesting or useful, but for students, it will probably be more useful after at least some engagement with the problems. These are excellent problems, and reading any summary of solutions means you miss the chance to hunt for them yourself.

In actual news, you can follow the scoreboard as it is updated from Romania here. Well done to the UK team on an excellent performance, and hope everyone has enjoyed all aspects of the competition!

Question 4

Circles \omega_1,\omega_2 with the same radius meet at two points X_1,X_2. Circle \omega is externally tangent to \omega_1 at T_1, and internally tangent to \omega_2 at T_2. Prove that lines X_1T_1,X_2T_2 meet on \omega.

Thought 1: I’m not the biggest fan of geometry ever, but I thought this looked like a nice problem, because it’s only really about circles, so I figured it probably wouldn’t require anything too exotic.

Thought 2: I bet lots of people try inversion. But the equal radius condition means I’m probably happy with the circles as they are. I hope lots of people don’t try to place the diagram in some co-ordinate system, even if it is possible to do it sensibly (eg by making \omega the reference circle).

Thought 3: The labelling of X_1,X_2 is unrelated to the rest of the indexing. So the intersection of X_1T_2,X_2T_1 should also lie on \omega, and possibly has some relationship (antipodal?) to the point I actually need to find out. But I can’t think of any reason why it’s easier to prove two points lie on a circle than just one, so let’s leave this as a thought rather than an idea.

Idea 1: I drew a terrible diagram on the back of a draft of my abstract, and for once, this was actually kind of helpful. Forget about radii being equal, one of them wasn’t even a circle. Anyway, while drawing in the later points, I was struggling to make it look convincingly like all the lengths which were supposed to be equal were in fact equal. So the idea was: almost all the segments in the diagram (once I’ve defined the circle centres O_{\omega_1} etc) have one of two lengths (the radii of \omega_1,\omega – red and green-ish in the diagram below), and with this in mind I can forget about the circles. We’ve got a rhombus, which is even better than a parallelogram, which is itself a really useful thing to have in a configuration. Another consequence of the proliferation of equal lengths is that almost all triangles are isosceles, and we know that similarity of isosceles triangles is particularly easy, because you only have to match up one angle.

Idea 2: How to prove it? We have to prove that two lines and a circle concur. This is where I actually need to stop and think for a moment: I could define the point where the lines meet and try to show it’s on the circle, or intersect one line with the circle, and show it’s on the other line. Idea 1 basically says I’m doing the problem using lengths, so I should choose the way that fits best with lengths.

[diagram]

If I define the point P where X_2T_2 meets the circle (this was easier to draw in my diagram), then I know PO_\omega=T_2 O_\omega and so on. Then there were loads of isosceles triangles, and some of them were similar, which led to more parallel lines, and from this you could reverse the construction in the other direction to show that P also lay on the other line.
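
A numerical check of the statement supports all of this (my construction, with arbitrary radii r=1, R=3 and random directions for the centres):

```python
import numpy as np

rng = np.random.default_rng(1)
r, R = 1.0, 3.0
O = np.zeros(2)
th1, th2 = rng.uniform(0.0, 2 * np.pi, size=2)
O1 = (r + R) * np.array([np.cos(th1), np.sin(th1)])  # omega_1, external tangency
O2 = (R - r) * np.array([np.cos(th2), np.sin(th2)])  # omega_2, internal tangency
T1 = r * (O1 - O) / np.linalg.norm(O1 - O)           # tangency point with omega_1
T2 = r * (O - O2) / np.linalg.norm(O - O2)           # tangency point with omega_2

# X_1, X_2: the two intersections of the equal circles (O1, R) and (O2, R).
d = np.linalg.norm(O2 - O1)
assert 0 < d < 2 * R
mid, h = (O1 + O2) / 2, np.sqrt(R**2 - (d / 2)**2)
perp = np.array([-(O2 - O1)[1], (O2 - O1)[0]]) / d
X1, X2 = mid + h * perp, mid - h * perp

def line_meet(P1, P2, Q1, Q2):
    """Intersection point of lines P1P2 and Q1Q2."""
    A = np.column_stack([P2 - P1, Q1 - Q2])
    t = np.linalg.solve(A, Q1 - P1)[0]
    return P1 + t * (P2 - P1)

# The labels are arbitrary (Thought 3), so both pairings should land on omega.
for Xa, Xb in [(X1, X2), (X2, X1)]:
    P = line_meet(Xa, T1, Xb, T2)
    print(np.linalg.norm(P - O) - r)  # ~0 both times
```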

Question 5

Let k, n be integers such that k\ge 2,\, k\le n\le 2k-1. Place rectangular k\times 1 or 1\times k tiles on an n\times n chessboard in the natural way with no overlap until no further tile can be placed. Determine the minimum number of tiles that such an arrangement may contain.

Idea 1: It took me a while to parse the question. Minimum over what? I rephrased it in my head as: “to show the answer is eg n+5, I need to show that whenever you place n+4 tiles legally, you can’t add another. I also need to show that you can place n+5 such that you can’t add another.” This made life a lot easier.

Thought 1: What goes wrong if you take n=2k and beyond? Well, you can have two horizontal tiles on a given row. I’m not really sure how this affects the answer, but the fact that there is still a space constraint for n<2k is something I should definitely use.

Diversion: I spent a while thinking that the answer was 4 and it was a very easy question. I spent a bit more time thinking that the answer was n, and it was a quite easy question, then realised that neither my construction nor my argument worked.

Thought 2: Can I do the cases n=k, 2k-1 or k+1? The answers were yes, unsure, and yes. The answer for k+1, which I now felt confident was actually four, was helpful, as it gave me a construction for k+2, …, 2k-1 that seemed good, even though it was clearly not optimal for 2k-1. Therefore, currently my potential answer has three regimes, which seemed unlikely, but this seemed a good moment to start trying to prove it was optimal. From now on, I’m assuming I have a configuration to which you can’t add another block.

[diagram]

Idea 2: About this diagram, note that once I’ve filled out the top-left (k+1)\times(k+1) sub-board in this way, there are still lots of ways to complete it, but I do have to have (n-k-1) horizontal and (n-k-1) vertical tiles roughly where I’ve put them. Why? Because I can’t ‘squeeze in’ a vertical tile underneath the blue bit, and I can’t squeeze in a horizontal tile to the right of the blue bit. Indeed, whenever I have a vertical block, there must be vertical blocks either to the left or to the right (*) (or possibly both if we’re near the middle). We need to make this precise, but before doing that, I looked back at where the vertical blocks were in the proposed optimum, and it turns out that all but (k-1) columns include a vertical block, and these (k-1) columns are next to each other.

This feels like a great idea for two reasons: 1) we’ve used the fact that n<2k at (*); 2) this feels very pigeonhole principle-ish. If we had fewer tiles, we’d probably have either at least k columns or at least k rows without a vertical (or, respectively, horizontal) tile. Say k columns don’t include a vertical tile – so long as they are next to each other (which I think I know), we can probably include a horizontal tile somewhere in there.

So what’s left to do? Check the previous sentence actually works (maybe it’s full of horizontal tiles already?), and check the numerics of the pigeonhole bound. Also work out how the case n=2k-1 fits, but it seems like I’ve had some (/most) of the good ideas, so I stopped here.

Question 6

[EGMO 2016 Q6 – the statement appeared here as an image. In brief: it concerns when n^4 has a divisor of the form n^2+m with 1\le m\le 2n, and how this depends on n modulo 7.]

I don’t actually want to say very much about this, because I didn’t finish off all the details. I want to talk briefly in quite vague terms about what to do if you think this problem looks scary. I thought it looked a bit scary because it looked similar to two number theoretic things I remember: 1) primes in arithmetic progressions. This is very technical in general, but I can remember how to do 3 mod 4 fairly easily, and 1 mod 4 with one extra idea; 2) if a square-free number can be written as a sum of two squares, this controls its factors modulo 4.

Vague Ideas: It seemed unlikely that this would involve copying a technical argument, so I thought about why I shouldn’t be scared. I think I shouldn’t be scared of the non-existence part. Often when I want to show there are no integer solutions to an equation, I consider showing there are no solutions modulo some base, and maybe this will be exactly what I do here. I’ll need to convert this statement about divisibility into an equation (hopefully) and check that n\equiv 3,4 modulo 7 doesn’t work.

For the existence of infinitely many solutions, maybe I’d use Chinese Remainder Theorem [1], or I’ll reduce it to something that I know has lots of solutions (eg Pythagoras), or maybe I can describe some explicit solutions?

Actual Idea 1: We’ve got n^2+m \,|\, n^4, but this is a very inefficient statement, since the RHS is a lot larger than the LHS, so to be useful I should subtract off a large multiple of the LHS. Difference of two squares is always a good thing to try, or I could do it manually. Either way, I get n^2+m \,|\, m^2, which is genuinely useful, given I know m=1,2,\ldots,2n, because the RHS is now comparable in size to the LHS, so I’ve narrowed it down from roughly n^2 possibilities to just three:

n^2+m=m^2,\quad 2(n^2+m)=m^2,\quad 3(n^2+m)=m^2. (*)
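
A quick numerical check of this reduction (my addition):

```python
# Whenever n^2 + m divides n^4 with 1 <= m <= 2n, the quotient
# m^2 / (n^2 + m) should be one of 1, 2, 3.
ks = set()
for n in range(1, 500):
    for m in range(1, 2 * n + 1):
        if n**4 % (n**2 + m) == 0:
            assert (m * m) % (n**2 + m) == 0
            ks.add(m * m // (n**2 + m))
print(ks)  # a subset of {1, 2, 3}
```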

I’m going to stop now, because we’ve turned it into a completely different problem, which might be hard, but at least in principle this is solvable. I hope we aren’t actually scared of (*), since it looks like some problems we have solved before. I could handle one of these in a couple of lines, then struggled a bit more with the other pair. I dealt with one by recourse to some theory, and the final one by recourse to some theory after a lot of rearranging which I almost certainly got wrong, but I think I made an even number of mistakes rather than an odd number because I got the correct solution set modulo 7. Anyway, getting to (*) felt like the majority of the ideas, and certainly removed the fear factor of the Q6 label, so to fit the purpose of this discussion I’ll stop here.

[1] During one lunch in Lancaster, we were discussing why Chinese Remainder Theorem is called this. The claim was that an ancient Chinese general wanted to know the size of their army but it was too big to count, so had them arrange themselves into columns of various sizes, and counted the remainders. The general’s views on the efficiency of this algorithm remain lost in the mists of time.

ISL14 N6 – Sums of Squares of Intervals

I wasn’t able to post this at the time because of the embargo on discussing this particular question until the end of IMO 2015.

I’m currently enjoying the slow descent into summer in the delightful surroundings of Tonbridge School where we are holding our final camp to select the UK team for this year’s IMO. The mechanism for this selection is a series of four exams following the format of the IMO itself. We take many of the questions for these tests from the shortlist of questions proposed but not used at the previous year’s competition. In this post, I’m going to talk about one of these questions. Obviously, since lots of other countries do the same thing, there is an embargo on publication or public-facing discussion of these problems until after the next competition, so this won’t appear for a couple of months.

The statement of the problem is the following:

IMO Shortlist 2014, N6: Let a_1<a_2<\ldots<a_n be pairwise coprime positive integers with a_1 a prime at least n+2. On the segment I=[0,a_1a_2\ldots a_n] of the real line, mark all integers that are divisible by at least one of the numbers a_1,\ldots,a_n. These points split I into a number of smaller segments. Prove that the sum of the squares of the lengths of these segments is divisible by a_1.
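
Before any theory, a small numerical instance of the statement (my addition): a=(5,6,7) is pairwise coprime, a_1=5 is prime, and 5\ge n+2 with n=3.

```python
from math import prod

a = (5, 6, 7)
N = prod(a)  # the right-hand endpoint of I
marks = [x for x in range(N + 1) if any(x % ai == 0 for ai in a)]
total = sum((v - u)**2 for u, v in zip(marks, marks[1:]))
print(total % a[0])  # 0, as the statement predicts
```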

I marked our students’ attempts at this problem and spoke to them about it afterwards. In particular, the official solution we were provided with, and which we photocopied and gave to our students, contains some good theory and some magic. On closer inspection, the magic turns out to be unnecessary, and actually distracts a little from the theory. Anyway, this is my guided tour of the problem.

We begin with two items from the rough work that some students submitted. Someone treated the case n=2, which is not quite trivial, but not hard at all, and remarked, sensibly, that n=3 seemed to have become hard. Packaged with this was the implicit observation that for the purpose of the question posed, a_1 plays a completely different role to the other quantities. While this seems obvious when presented in isolation, it’s a good thing to say explicitly so that we avoid wasting time imagining symmetries which are definitely not present later. We’ll sometimes refer to a_1 as p, whenever we are using its primality rather than its other roles.

The second idea coming out of rough work was to consider what happens when an interval gets broken into two by a ‘new point’, whatever that means. In particular, what happens to the sum of the squares of the interval lengths modulo p? The way the student had set up this procedure suggested that the calculations would be easy in the case where the point being added was a multiple of p=a_1. This ended up being a red herring, because there’s no real reason why we would want to add the multiples of a_1 at the end.

It does, however, suggest an inductive argument, if instead you add the points which are multiples of a_n, but not of any of \{a_1,\ldots,a_{n-1}\}, because the latter would already be present. It might seem like induction is illegal, since there is the condition a_1\ge n+2 already present in the statement. However, once a_1 is fixed, if the condition holds for a particular n, then clearly it holds for all smaller n (!), and so we can use induction to build up to the value of n required in the statement.

Now we are happy this might work in principle, let’s do it in practice. We are adding all these multiples of a_n, so we want to have some way to index them. One choice is to use their actual values, or their quotients by a_n, ie all the integers t up to \prod_{i=1}^{n-1}a_i for which ta_n is not divisible by any of a_1,\ldots,a_{n-1} (equivalently, t is not divisible by any of them, since each a_i is coprime to a_n). But maybe after thinking about it, there will be a more helpful way to index them.

We are only interested in non-trivial splittings of intervals. We might initially be concerned that an existing interval might be split into more than two sub-intervals by the addition of the multiples of a_n, but the ordering suggested in the statement means this can’t happen. In general, let the initial size of the interval be K, and the sizes of the two sub-intervals be x and y for reasons which will become clear. Then the change in contribution to the total sum of interval lengths squared is given by

K^2 \quad \mapsto \quad x^2+y^2= K^2 - 2xy,

using that K=x+y. So the total change when we add all the multiples of a_{n} will be the sum of 2xy over all the intervals which get divided into two.

Now we have the challenge of evaluating this sum, or at least showing that it is divisible by p. Given a new multiple a of a_n, what will x be?

Well, on the left of a, the distance to the nearest multiple of a_k is the remainder r_k of a on division by a_k, essentially by definition. So x will be the minimal such remainder as k varies. Similarly, y will be \min_k (a_k-r_k). We forget about the minus sign and the factor of 2 forever, since it doesn’t make any difference unless p=2, in which case we are already done, and so we now have our sum:

\sum_{a\text{ a new multiple of }a_n} \left( \min_i(r_i)\right) \left[ \min_j(a_j-r_j) \right].

Now we return to the question of how to index the new multiples of a_n. But the question’s easier now as we have some clues as to what might be useful. Each of the summands involves the remainders modulo the smaller a_is, so can we use these as part of the indexing?

Yes, of course we can. There’s a bijection between the possible sequences of remainders, and the new multiples of a_n. This is basically the statement of the Chinese Remainder Theorem. Our index set should then be

(r_1,\ldots,r_{n-1}) \in [1,a_1-1]\times\ldots\times [1,a_{n-1}-1].

Remember that the fact that they are not allowed to be zero is because we only care now about new points being added.

Even now though, it’s not obvious how to compute this sum, so the natural thing to try is to invert our view of what we are summing over. That is, to consider how many of the index sequences result in particular values of x and y. Given x and y at least one, each remainder r_k must lie in [x,a_k-y], ie there are a_k-(x+y)+1 values it is allowed to take. So our guess for the number of such indices might be \prod_{i=1}^{n-1} (a_i-(x+y)+1), which is indeed what I wrote up on the board before Neel pointed out that some of the sequences I’ve allowed do not actually attain this minimum.

Fortunately this is easily fixed. If the minimum for x is not attained, we know that r_k\in[x+1,a_k-y], and similarly for y. So we can apply the inclusion-exclusion principle to depth two to say that the number we actually want is

\prod_{i=1}^{n-1} (a_i-(x+y)+1) - 2\prod_{i=1}^{n-1} (a_i-(x+y)) + \prod_{i=1}^{n-1} (a_i-(x+y)-1).

This is a polynomial in (x+y), so we’ll call it P(x+y). We’ll come back to this expression. We still need to tidy up our overall sum, in particular by deciding what bounds we want on x and y. Thinking about the intervals of width p, it is clear that the bounds should be 1\le x,y and x+y\le p. So we currently have our sum as

\sum_{1\le x,y,\,x+y\le p} xy P(x+y).

If you’ve got to this stage, I hope you would feel that this seems really promising. You’ve got to show that some sum of polynomial-like things over a reasonably well-behaved index set is divisible by p. If you don’t know general theorems that might confirm that this is true, once you’ve got statements like this, you might well guess then verify their existence.

First though, we need to split up this xy term, since our sum is really now all about x+y, which we’ll call z.

\sum_{2\le z\le p} P(z) \sum_{x+y=z,x,y\ge 1} xy.

Once we’ve had our thought that we mainly care about polynomials mod p at this stage, we might even save ourselves some work and just observe that \sum_{x+y=z}xy is a cubic in z. This is definitely not hard to check, using the formulae for triangle numbers and sums of squares. It’s also reasonable to conclude that the value of this is 0 for z=1, so we can change the indices on the sum so it doesn’t look like we’re missing a term. So we have

\sum_{z=1}^p P(z) Q(z),

where Q is some cubic, which we could work out if we really wanted to. But what’s the degree of P? P is a sum of three polynomials whose degrees are clear, because each is a product of linear factors. So each of these has degree (n-1), but in fact we can clearly see that the leading terms cancel, and with a tiny bit more care that the second-order terms cancel too, so P has degree at most n-3.

[Tangent: students might well ask why it is convention to take the degree of the identically zero polynomial to be -\infty rather than zero. The fact that the above statement is true even when n=2 is a good justification for this.]

[Tangent 2: In my initially erroneous evaluation of the number of remainder sequences, without the inclusion-exclusion argument, I indeed ended up with P having degree n-1. I also worked out Q as being quadratic though, so my overall necessary condition was only one off from what was required, showing that two wrongs are not as good as two rights, but marginally better than only one wrong.]
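
For the sceptical, the cancellation is easy to check by machine (my addition, with some arbitrary a_i standing in for a_1,\ldots,a_{n-1}):

```python
import numpy as np

a = [5.0, 6.0, 7.0, 11.0]  # playing a_1, ..., a_{n-1}, so n = 5 here

def prod_poly(shift):
    """Coefficients of prod_i (a_i - z + shift), as a polynomial in z."""
    p = np.poly1d([1.0])
    for ai in a:
        p *= np.poly1d([-1.0, ai + shift])
    return p

P = prod_poly(1) - 2 * prod_poly(0) + prod_poly(-1)
print(P.order)  # 2, i.e. n - 3, two below the naive degree n - 1
```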

Anyway, we are taking a sum of polynomials of degree at most n \le p-2. This is the point where you might get confused if you have been specifying everything in too much detail. The fact that I’ve said nothing about the polynomials apart from their degrees is predicated on the fact that I now know their degrees are all I need for what I want, but even if you had tracked everything very explicitly, you might still guess what to do at this stage.

It’s presented as a lemma in the official solution that if you have a polynomial of degree at most p-2, then \sum_{z=1}^p P(z)\equiv 0 modulo p. There are several ways to see this, of which a proof by induction is easiest to believe, but probably least intuitive. It’s clear what goes wrong for degree p-1 by FLT, so you probably want to look at the set \{z^k: z\in[1,p-1]\} modulo p. It won’t necessarily be all the residues modulo p, but thinking about primitive roots, or the roots of unity makes it fairly clear why this would be true. [Essentially any argument that turns multiplication into addition in a well-behaved way clears this up.]
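
The lemma is also easy to believe numerically (my addition): the power sums \sum_{z=1}^p z^k vanish mod p for all 0\le k\le p-2, and any P of degree at most p-2 is a linear combination of these.

```python
# Power sums over a complete residue system, reduced mod p.
for p in (5, 7, 11, 13):
    assert all(sum(pow(z, k, p) for z in range(1, p + 1)) % p == 0
               for k in range(p - 1))  # k = 0, 1, ..., p-2
print("power-sum lemma holds for these primes")
```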

The most important point is that this lemma is a) easy-ish and b) a natural thing to try given where you might have got in the question at this stage. From here you’re now basically done.

I don’t think this is an easy question at all, since even with lots of guiding through the steps there are a lot of tangents one might get drawn down, especially investigating exact forms of polynomials which it turns out we only need vague information about. Nonetheless, there are lots of lessons to be learned from the argument, which definitely makes it a good question at this level in my opinion.

How to Prove Fermat’s Little Theorem

The following article was prompted by a question from one of my mentees on the Senior Mentoring Scheme. A pdf version is also available.

Background Ramble

When students first meet problems in number theory, it often seems rather different from other topics encountered at a similar time. For example, in Euclidean geometry, we immediately meet the criteria for triangle similarity or congruence, and various circle theorems. Similarly, in any introduction to inequalities, you will see AM-GM, Cauchy-Schwarz, and after a quick online search it becomes apparent that these are merely the tip of the iceberg for the bewildering array of useful results that a student could add to their toolkit.

Initially, number theory lacks such milestones. In this respect, it is rather like combinatorics. However, bar one or two hugely general ideas, a student gets better at olympiad combinatorics questions by trying lots of olympiad combinatorics questions.

I don’t think this is quite the case for the fledgling number theorist. For them, a key transition is to become comfortable with some ideas and notation, particularly modular arithmetic, which make it possible to express natural properties rather neatly. The fact that multiplication is well-defined modulo n is important, but not terribly surprising. The Chinese Remainder Theorem is a `theorem’ only in that it is useful and requires proof. When you ask a capable 15-year-old why an arithmetic progression with common difference 7 must contain multiples of 3, they will often say exactly the right thing. Many will even give an explanation for the regularity in occurrence of these which is precisely the content of the theorem. The key to improving number theory problem solving skills is to take these ideas, which are probably obvious, but sitting passively at the back of your mind, and actively think to deploy them in questions.

Fermat’s Little Theorem

It can therefore come as a bit of a shock to meet your first non-obvious (by which I mean, the first result which seems surprising, even after you’ve thought about it for a while) theorem, which will typically be Fermat’s Little Theorem. This states that:

\text{For a prime }p,\text{ and }a\text{ any integer:}\quad a^p\equiv a\mod p. (1)

Remarks

  • Students are typically prompted to check this result for the small cases p=3, 5 and 7. Trying p=9 confirms that we do really need the condition that p be prime. This appears on the 2012 November Senior Mentoring problem sheet and is a very worthwhile exercise in recently acquired ideas, so I will say no more about it here.
  • Note that the statement of FLT is completely obvious when a is a multiple of p. The rest of the time, a is coprime to p, so we can divide by a to get the equivalent statement:

\text{For a prime }p,\text{ and }a\text{ any integer coprime to }p:\quad a^{p-1}\equiv 1\mod p. (2)

  • Sometimes it will be easier to prove (2) than (1). More importantly, (2) is sometimes easier to use in problems. For example, to show a^{p^2}\equiv a \mod p, it suffices to write:

a^{p^2}\equiv a^{(p-1)(p+1)+1}\equiv (a^{p-1})^{p+1}\times a\equiv 1^{p+1}\times a \equiv a.

  • A word of warning. FLT is one of those theorems which it is tempting to use on every problem you meet, once you know the statement. Try to resist this temptation! Also check the statement with small numbers (eg p=3, a=2) the first few times you use it, as with any new theorem. You might be surprised how often solutions contain assertions along the lines of

a^p\equiv p \mod (a-1).

Proofs

I must have used FLT dozens of times (or at least tried to use it – see the previous remark) before I really got to grips with a proof. I think I was daunted by the fact that the best method for, say, p=7 (a careful systematic check) would clearly not work in the general case. FLT has several nice proofs, and is well worth thinking about for a while before reading what follows. However, I hope these hints provide a useful prompt towards discovering some of the more interesting arguments.

Induction on a to prove (1)

  • Suppose a^p\equiv a\mod p. Now consider (a+1)^p modulo p.
  • What happens to each of the (p+1) terms in the expansion?
  • If necessary, look at the expansion in the special case p=5 or 7, formulate a conjecture, then prove it for general p. (A quick check of the relevant conjecture is sketched below.)
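
The conjecture the bullets are pointing towards (a quick check, my addition) is that all the middle binomial coefficients \binom{p}{k}, 0<k<p, are divisible by p:

```python
from math import comb

for p in (3, 5, 7, 11):
    print(p, [comb(p, k) % p for k in range(p + 1)])  # 1, 0, ..., 0, 1
```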

Congruence classes modulo p to prove (2)

  • Consider the set \{a,2a,3a,\ldots,(p-1)a\} modulo p.
  • What is this set? If the answer seems obvious, think about what you would have to check for a formal proof.
  • What could you do now to learn something about a^{p-1}?

Combinatorics to prove (1)

  • Suppose I want a necklace with p beads, and I have a colours for these beads. We count how many arrangements are possible.
  • Initially, I have the string in a line, so there are p labelled places for beads. How many arrangements?
  • Join the two ends. It is now a circle, so we don’t mind where the labelling starts: Red-Green-Blue is the same as Green-Blue-Red.
  • So, we’ve counted some arrangements more than once. How many have we counted exactly once?
  • How many have we counted exactly p times? Have we counted any arrangements some other number of times? (A brute-force check is sketched below.)
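
A brute-force version of the counting (my addition): when p is prime, every non-constant string has rotation orbit of size exactly p, so the necklace count is a + (a^p-a)/p, and in particular p divides a^p-a.

```python
from itertools import product

def orbit_size(s):
    return len({s[i:] + s[:i] for i in range(len(s))})

p, a = 5, 3
sizes = [orbit_size(s) for s in product(range(a), repeat=p)]
assert sizes.count(1) == a and all(sz in (1, p) for sz in sizes)
print((a**p - a) % p)  # 0
```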

Group Theory to prove (2)

This is mainly for the interest of students who have seen some of the material for FP3, or some other introduction to groups.

  • Can we view multiplication modulo p as a group? Which elements might we have to ignore to ensure that we have inverses?
  • What is \{1,a,a^2,a^3,\ldots\} in this context? Which axiom is hardest to check?
  • How is the size of the set of powers of a modulo p related to the size of the whole group of congruences?
  • Which of the previous three proofs is this argument most similar to?
  • Can you extend this to show the Fermat-Euler Theorem:

\text{For any integer }n,\text{ and }a\text{ coprime to }n:\quad a^{\phi(n)}\equiv 1 \mod n,

where \phi(n) counts how many integers between 1 and n are coprime to n.
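
A brute-force check of Fermat-Euler on small moduli (my addition):

```python
from math import gcd

def phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

assert all(pow(a, phi(n), n) == 1
           for n in range(2, 60) for a in range(1, n) if gcd(a, n) == 1)
print("a^phi(n) = 1 mod n holds for all n < 60")
```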

Modular arithmetic – Beyond the Definitions

Modular arithmetic is a relatively simple idea to define. The natural motivation is to consider a clock. The display of a standard analogue clock makes no distinction between 4am, 4pm, and 4pm next Thursday. This is a direct visualisation of the integers modulo 12. Instead of counting in the usual way, where each successive integer is different to all those considered previously, here, every time we get to a multiple of 12, we reset our count back to zero. As a result, this procedure is often referred to as `clock arithmetic’.

A common problem for good students, for example those starting the UKMT’s Senior Mentoring Scheme, is that the idea of modular arithmetic seems very simple, but it’s hard to work out how it might be useful in application to problems. My claim is that the language of modular arithmetic is often the best way to discuss divisibility properties in problems about whole numbers. In particular, the behaviour of powers (ie m^n and so forth) is nice in this context, and the notation of modular arithmetic is the only sensible way to approach it. Anyway, let’s begin with a quick review of the definitions.

Definitions

We are interested in divisibility by some fixed integer n\geq 2, and the remainders given after we divide by n. Given an integer M, we can write this as:

M=kn+a,\quad\text{ where }a=0,1,\ldots,n-1,

and this decomposition is unique. We then say that M is congruent to a modulo n. Note that working modulo n means that we are interested in remainders after division by n (rather than a or k or M or anything else). This has the feeling of a function or algorithm applied to M. We get told what M is, then work out the remainder after division by n, and say that this is `M mod n‘.

This is fine, but it is very much worth bearing in mind a slightly different interpretation. Working modulo n is a way of saying that we aren’t interested in the exact value of an integer, only where it lies on the n-clock. In particular, this means we are viewing lots of integers as the same. The `sameness’ is actually more important in lots of arguments than the position on the n-clock.

More formally, we say that a\equiv b or a is congruent to b modulo n, if they have the same remainder after division by n. Another way of writing this is that

a\equiv b\quad \iff \quad n|a-b.

Sets of integers which are equivalent are called congruence classes. For example \{\ldots,-4,-1,2,5,8,\ldots\} is a congruence class modulo 3. Note that under the first definition, we consider all elements here to be congruent to 2, but in a particular question it may be more useful to consider elements congruent to -1, or some combination.

These definitions are equivalent, but it can be more useful to apply this second definition for proving things, rather than writing out b=kn+a or whatever all the time.

Addition and Multiplication

Everything has been set up in terms of addition, so it is easy to see that addition works well on congruence classes. That is:

\text{If }a\equiv b,c\equiv d,\quad\text{then }a+c\equiv b+d.

We could argue via a clock argument, but the second definition works very well here:

\text{We have }n|a-b,n|c-d,\quad\text{and so }n|(a+c)-(b+d),\text{ exactly as required.}

We would like a similar result for multiplication, and indeed it holds:

\text{If }a\equiv b,c\equiv d,\quad\text{then }n|c(b-a)\text{ and }n|b(c-d).

\Rightarrow n|ac-bd,\text{ that is }ac\equiv bd.

Let’s just recap what this means. If we want to know what 157\times 32 is modulo 9, it suffices to say that 157\equiv 4 and 32\equiv 5, and so 157\times 32\equiv 4\times 5\equiv 2. In a more abstract setting, hopefully the following makes sense:

\text{ODD}\times\text{ODD}=\text{ODD};\quad \text{EVEN}\times\text{EVEN}=\text{EVEN};\quad \text{ODD}\times\text{EVEN}=\text{EVEN}.

This is exactly the statement of the result in the case n=2, where the congruence classes are the odd integers and the even integers.
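
The 157\times 32 example from above, done both ways (my addition):

```python
print((157 * 32) % 9, ((157 % 9) * (32 % 9)) % 9)  # 2 2
```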

Powers

What happens if we try to extend to powers? Is it the case that

\text{if }a\equiv b,c\equiv d,\quad\text{then }a^c\equiv b^d?