# The Top-to-Random Shuffle

This article is based on a talk I gave to the Maths Society at St Paul’s School on Monday. It may turn into a short series if I have time before I go to ALEA in Luminy near Marseille on Saturday.

My original plan had been to talk about riffle-shuffling, and some of the interesting mixing time themed results one can obtain. As a motivating example, I began by discussing the simpler top-to-random shuffle, and this proved sufficiently interesting to occupy the time I had been allowed (and mea culpa a bit more). It therefore seems worth writing a hopefully moderately accessible blog post on the subject. The aim of this post at least is to discuss the idea that repeatedly shuffling brings a pack of cards close to randomness. We have to settle on a definition of ‘close to randomness’, and find some ways to calculate this.

Suppose we are playing some bizarre card game where it is necessary that three cards labelled, uncontroversially, 1, 2 and 3 need to be placed in a random order. If we are organised, we can write down all the ways to do this in a list:

123, 132, 213, 231, 312, 321.

We want to select each of these with equal probability. We could for example use a dice. Most relevantly, even a computer as ancient as my laptop is very happy simulating a random choice from this set. (Now is not the time to talk about exactly how pseudo-random or otherwise this choice would be.)

Of course, when we play a sensible card game we have not three cards, but fifty-two. So the approach described above still works in theory, but no longer in practice, as the list of possible arrangements now has size 52!. Recall this is defined to be

$52!=1\times 2 \times\ldots \times 52.$

The reason we get this particular expression is that when we are choosing the first card, we have 52 possible choices. Then, regardless of what this first card actually is, there are precisely 51 cards left from which to choose the second card. So there are 52×51 ways to pick the first two cards in the arrangement, and so on, giving the answer. We can approximate how large 52! is by counting powers of ten rather crudely. It seems reasonable that it should be about $10^{65}$. Note that the number of atoms in the universe is *only* about $10^{80}$, so if we are going to write down this list, we better have very compact handwriting! But being serious, this number is way too large to realistically compute with, so we have to come up with some cleverer methods.

One way is to spread the cards out on a table then pick them up one at a time, ensuring at all times that the choice of card is uniform among those currently present, and not related to any of the past choices. This is relatively easy for a computer, but hard for a human, and certainly deeply tedious for anyone waiting to receive their hand!

So we seek a different approach, namely an algorithm for shuffling. Our aim is to introduce overall randomness by repeatedly applying some simple but random process. Note we have to be careful about our definition of ‘random’ here. The permutation 123456 is just as ‘random’ as the permutation 361524. That is, if they are fixed, then they are not random at all. Just because it is easier to decribe one of them verbally does not mean it is less random. For example, if I am trying to cheat at poker, then I might be able to if I knew the exact order of the cards in the pack before the dealer dealt. It wouldn’t matter what that order was. I would have to adjust my strategy based on the order, but it wouldn’t affect the fact that I had a massive advantage!

The shuffling algorithm to be discussed here is the top-to-random shuffle. Like all the best things in life, this does exactly what it says on the tin. At a given time, we remove the top card from the deck at present, and insert it at a randomly chosen point in the deck. This could be on the bottom, and it could also be back on the top. It feels like this possibility to remain constant can’t possibly help us, but later we will discuss why we need this.

In any case, it feels natural that if we keep applying this procedure, the arrangement of the deck should start to get more and more random, in the sense that knowing the original arrangement will tell us successively little about the current arrangement as time progresses. But we need to find a way to quantify this if we are to do any mathematics.

When we are talking about real numbers, it is fairly clear what it means if I say that the numbers 2, 1.1, 1.01, 1.001 and so on are getting closer and closer to 1. Indeed we can measure the distance along the number line between each term and 1, using the absolute difference. It is not so clear how to compute the distance between two probability distributions. Bearing in mind the fact that a distribution on the set of permutations of cards is defined to be a set of 52! probabilities that sum to 1, there will be a 52!-1 dimensional space (eg the plane is two-dimensional, the world is three-dimensional, *and so on* – whatever that means) where we have a nice distance formula already.

But this is not what we will choose to use. Rather we return to the cheating-at-poker analogy. Suppose I am playing some sort of game involving the pack of cards with my enemy. He or she thinks the deck is perfectly random, but I know the actual distribution. How big a profit can I make by exploiting this knowledge? This will be our measure of how far a distribution is from uniform. It turns out that this will coincide precisely with the formal definition of total variation distance, but that language belongs to a different level of rigour and is not relevant here.

What is relevant is an explanatory example. Suppose we start with the arrangement 12345678. We are now going to perform one iteration of the top-to-random shuffle. The outcome might, for example, be 23456178, if we insert the 1 between the 6 and the 7. Note there were 8 places for the card to go, so the probability of this particular outcome is 1/8. Now let’s see how I might use my knowledge of the distribution to my advantage. Suppose I suggest the bet that the bottom card is an 8. My enemy thinks the stack is uniformly randomly arranged, so the probability of this is 1/8. On the other hand, I know that the only way the 8 might disappear from the bottom is if I place the 1 under it, which happens with probability 1/8. So in fact, I know the probability of this event is 7/8, which gives me an advantage of 3/4. In fact, I could come up with bets that do even better than this, but they are less simple to describe verbally.

At what point do I lose this advantage? Well, we said that the probability that the 8 leaves the bottom of the stack is 1/8. And it will continue to be 1/8 on every turn where it is at the bottom. Recalling that the outcomes of successive shuffles are independent, note this is reminiscent of rolling a dice until a six comes up. The number of rolls required to get the six is an example of a geometric random variable. I don’t want to spoil S1 (or whichever module) by going into too much detail, but it turns out that if the probability of an event happening on a single go is p, then the average time we have to wait is 1/p. So 1/(1/8)=8 of course, and this is how long we typically have to wait before the bet I placed before becomes much less effective.

Now seems like a good time to stop talking about 8 cards and start talking about n cards. Obviously, in practice, we will want n to be 52. Anyway, by the same argument as before, it takes on average n iterations before the bottom card leaves the bottom. This is important, because after then, my bet that the bottom card is n is no longer so effective. However, I could equally place a bet that one of the bottom *two* cards is n.

So we consider how long it takes before n is no longer one of the bottom two cards. Well certainly we need to wait until it is no long *the* bottom card, which takes time n on average. Then, once it is second bottom, there is now a 2/n chance that we move the previously top card below it, so by the same argument as before, the time for this to happen is n/2 on average. If we want this effect to disappear, we have to wait until the original bottom card is in fact at the top of the pile for the first time, and by extending our previous argument, the average time for this is

$n+\frac{n}{2}+\frac{n}{3}+\ldots+\frac{n}{n-1}.$

Fortunately, we have tools for approximating this sort of sum, in particular integration, which is the practice of finding the area under certain curves. It turns out that the answer is roughly n log n. You can think of as log n a measure of the number of digits required to write out n. (This is not the exact definition but it will do for now. In any case, log n gets larger as n gets larger, but not very fast.) There’s a lot more about this in my previous post on the coupon collector problem, from a more technical point of view.

The next question will be to prove that it is actually quite well shuffled by this time, but that’s for another post. The other question to ask is whether this is satisfactory overall? For n=52, the number of operations we have to perform is about 230, which is fine for a computer, but deeply tedious for anyone sitting at a casino table waiting for the next hand. So next time we’ll talk about the riffle shuffle, which seems to introduce a lot of randomness in each go, but we’ll also see that we have to be careful, because the randomness may not be as great as our intuition suggests.