Guess 2/3 of the average

The following problem was doing the rounds on some email lists in Oxford this week:

“You and a group of colleagues are all invited to pick an integer between 0 and 100. The average x is calculated, and the person whose guess is unique and comes closest to 2x/3 (among the unique guesses) is declared the winner.”

Although it wasn’t specified in the setting, one assumes that if two unique guesses were equally close to 2/3 of the average, then the two participants shared the prize.

In any case, it is interesting to ask the question, what is the optimal strategy?

I don’t know very much about game theory, so I’m going to say some things that I hope are true. I don’t claim that this is the best or even a good way of thinking about the situation. If we break the problem down into two parts, we have two of the more famous problems in game theory. To consider the first part, we remove the requirement that the winner’s guess be unique, and allow the participants to pick any real number between 0 and 100.

It seems to me there are three natural ways to award the prize in this setting. Let’s say we have K possible winners who guessed closest to 2x/3. We could give each of them the fixed prize, say £1; we could give each of them £1/K; or we could choose one of them at random and give them £1. I claim that the latter two methods are essentially the same, and so will ignore the third possibility.

Note that 2x/3 will definitely be at most 200/3. Suppose I pick a real value y that is greater than 200/3. What would happen if instead I picked 200/3? Well the value of x would decrease slightly. How much it decreases would depend on the number of people playing the game. It is not hard to check that we end up closer to 2x/3 if we pick 200/3 instead of y. Let’s call \alpha=\frac{200}{3}.

We might say that the strategy “pick \alpha” dominates the strategy “pick y”, because we always do at least as well with the first strategy as the second, regardless of what the other players do, and indeed in some cases do strictly better. Now comes the potentially philosophical bit. We have concluded that it would be irrational for us to pick any number greater than \alpha. We also know that the email was sent to mathematicians in Oxford, whom we might hope we could safely assume were equally rational. So now we are playing the same game as before, only with the bounds that we should pick a number between 0 and \alpha. We can then apply exactly the same argument to show that it would be irrational to pick any number greater than \frac{2\alpha}{3}. This equally applies to all the other agents, and so we continue, showing that for any k, it is irrational to pick any number greater than (\frac23)^k\alpha. In conclusion, the only rational strategy is to pick 0, and hence everyone will do this.

For the prize structure suggested, this is a Nash equilibrium, since no-one can improve their winnings by changing strategy (with everyone else’s kept the same). This principle is in general referred to as iterated elimination of dominated strategies. The assumption that not only are players rational, but that all players know that all other players are rational, and that all players know that all players know that the others are rational, and so on, is called Common Knowledge of Rationality. This doesn’t seem hugely controversial in this artificial setting, but it is not hard to think of examples where it might not hold. For example, some players might be facing the same situation but have different utility functions; especially if there is some randomness in the dynamics, people will then have different ‘rational’ opinions concerning strategies, and unless you know their utility functions, it might seem like the other players are not acting rationally.

This argument works equally well with integers. We first eliminate all possibilities greater than 67, and then keep multiplying the upper bound by 2/3, taking the ceiling to get the new integer bound each time. As a result, after repeated application of this idea, we conclude that the only rational strategies are to pick 0 or 1.
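
For anyone who wants to watch the bound shrink, here is a minimal sketch in Python (my own, not part of the original argument) iterating the map b\mapsto\lceil 2b/3\rceil:

```python
from math import ceil

bound = 100
bounds = [bound]
while True:
    new_bound = ceil(2 * bound / 3)   # no guess above ceil(2*bound/3) can be closest to 2x/3
    if new_bound == bound:
        break
    bound = new_bound
    bounds.append(bound)

print(bounds)   # [100, 67, 45, 30, 20, 14, 10, 7, 5, 4, 3, 2]
# The crude bound stalls at 2, but one direct comparison finishes the job:
# for any average x <= 2 we have |1 - 2x/3| < |2 - 2x/3|, so 2 is dominated
# by 1 as well, leaving 0 and 1 as the surviving guesses.
```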

It can be seen that neither strategy dominates the other. If more than ¾ of the agents choose 1, then choosing 1 is the optimal strategy; if less than ¾ choose 1, then choosing 0 is better. If exactly ¾ choose 1 then everyone wins. There are, however, only two Nash equilibria, given by everyone choosing 1, and everyone choosing 0. It is tempting to argue that we might expect roughly ½ the agents to choose 1, and so our better option is to choose 0. But we are assuming the other agents are rational, and there is no a priori reason why picking 0 or 1 with probability ½ each should be the rational thing to do.

However, choosing randomly is a possible strategy. In this setting, it becomes harder to talk about Nash equilibria. When there are just two agents, even the set of randomised strategies for each agent is relatively manageable, but when there is some arbitrary number of players, the set of possible strategies might be very complicated indeed.

The final stage of this integer problem is morally equivalent to playing a game where everyone has to pick 0 or 1, and you win if you are in the majority. For this problem, it is clear that the only Nash equilibria are for everyone to pick either 0 or 1. If the winners share the prize-money equally, then this game has the interesting property that every configuration is Pareto optimal. This means that no-one can improve their winnings without causing someone else’s situation to decrease. When it comes to deciding what to do, with no knowledge of other people’s strategies, we have made little progress. At least here the symmetry between 0 and 1 makes it clear how little progress we can make. The only thing we can say is that if the utility is concave (a generally reasonable assumption), then by conditioning on what everyone else does, an optimal strategy is to pick 0 with probability 1/2.

The problem with saying anything sensible in these sorts of problem with a zero-sum condition (as we effectively have if the winnings are shared) is the symmetry between all the players. Given any strategy, if everyone used that strategy, then everyone would win 1/N in expectation. This applies equally well if we add the condition that the winner has to pick something unique.

One question we can perhaps say something about arises if we declare that no-one wins the original game when there are no unique integer guesses. Then it is worth asking which strategy everyone could adopt so as to minimise the probability of this happening. For the reasons given above, we should indeed force everyone to take the same strategy, which we can interpret as a distribution on the integers. So which distribution on the integers between 1 and k maximises the probability of having at least one unique value?

We also need to know how many people are playing, so let’s say this is N. I don’t know how to solve the general problem, but I can say something if we fix either k or N and then let the other one go to infinity.

If we fix the number of people and let k go to infinity, then we can get arbitrarily close to probability one by using the uniform distribution. Indeed, any sequence of distributions where the probabilities converge uniformly to zero (i.e. \max_{x\in[k]} \mathbb{P}(x) \rightarrow 0) has this property.

The case where k is fixed and the number of players goes to infinity is a bit trickier. Essentially, if we choose any fixed distribution, the event we seek becomes less and less likely as the number of agents grows. It is equivalent to demanding that if we roll a die a million times, we see precisely one six. If we replace a million by larger numbers, this probability decays exponentially.

If we want to maximise the probability of getting precisely one agent choosing value 1, we observe that the number of agents choosing that value is Binomial(N,p), where p is the probability of that value. If p=o(1/N) then asymptotically, with high probability the number of 1s is 0. If p=\omega(1/N) then asymptotically, with high probability, the number of 1s is \sim pN. Taking p=\frac{\alpha}{N}, the asymptotic distribution of the number of 1s is Poisson with parameter \alpha. We can solve to see which value of \alpha maximises the probability of getting a single 1. It turns out, unsurprisingly after considering the expectation of the corresponding pre-limit binomial distribution, that this maximum is achieved at \alpha=1.
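
As a quick numerical sanity check (my own sketch, using numpy): \mathbb{P}(\mathrm{Poisson}(\alpha)=1)=\alpha e^{-\alpha}, and this is indeed maximised at \alpha=1.

```python
import numpy as np

alphas = np.linspace(0.01, 5, 500)
p_one = alphas * np.exp(-alphas)      # P(Poisson(alpha) = 1)
print(alphas[np.argmax(p_one)])       # approximately 1.0
print(p_one.max())                    # approximately 1/e = 0.3679
```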

So note now that if we take the probabilities of picking 1 and 2 both to be 1/N, we get two asymptotically Poisson random variables. By an argument similar to the construction of the Poisson process from independent infinitesimal increments, the covariance of these tends to 0. So I conjecture, and I reckon it is probably not too hard to come up with a formal proof, that for large N the optimal distribution looks like:

(\frac{1}{N},\ldots,\frac{1}{N},1-\frac{k-1}{N}),

or some permutation of that.
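
A crude Monte Carlo check (a sketch rather than a proof; the parameter values are my own) suggests the conjectured distribution does far better than, say, the uniform one when N is much larger than k:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, trials = 1000, 5, 5000

def p_at_least_one_unique(probs):
    wins = 0
    for _ in range(trials):
        picks = rng.choice(k, size=N, p=probs)
        if np.any(np.bincount(picks, minlength=k) == 1):
            wins += 1
    return wins / trials

uniform = np.full(k, 1 / k)
conjectured = np.array([1 / N] * (k - 1) + [1 - (k - 1) / N])
print(p_at_least_one_unique(uniform))      # essentially 0 for these parameters
print(p_at_least_one_unique(conjectured))  # roughly 1 - (1 - 1/e)^(k-1), about 0.84 here
```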


Responding to the Monty Hall Problem

It’s been a while since I got round to writing anything, so I’m easing myself back in by avoiding anything too mathematically strenuous. Shortly after my last post, there was a nice clip on Horizon featuring Marcus du Sautoy explaining the famous Monty Hall Problem to the ever-incredulous Alan Davies. The BBC article here shows the clip, and then an excellent account by probabilist John Moriarty explaining the supposed paradox.

Recall the situation. A game show is hosted by Monty Hall. At some point, to decide the grand prize, or whatever, a contestant plays the following lottery. There are three doors, and three prizes, one behind each. One of the prizes is worthwhile, normally a car, while the other two are rubbish. For reasons I don’t understand, it is normally said that behind the other two doors are goats. Anyway, MH knows where the car is from the beginning. Then the contestant picks a door, but doesn’t open it yet. MH then dramatically opens a different door, revealing a goat. The contestant then has the option to stick with his original choice, or switch to the third door. What should he or she do?

So the natural error to make is to argue that, since one possibility has been eliminated and the remaining two were a priori equally likely, they should still be equally likely, and hence each occur with probability 1/2. So it makes no difference whether you stay or switch. But, of course, as well as removing information (ruling out one door), we have also gained information. The fact that MH chose to reveal that door, rather than the potential third door, is important.

In my opinion, the best way to think about this is to imagine you are the host. The nice thing about being in this position is that then you can imagine you know everything about the configuration in advance, so are less likely to fall foul of counter-intuitive conditional probabilities. Anyway, there’s loads of symmetry, so from my point of view as the host, I only care whether the contestant initially selects the right door or the wrong door. If he selects the wrong door, then I need to open the ONLY OTHER wrong door. In other words, I have no choice in what I do next. I open the other wrong door. Therefore in this case the contestant should switch. If the contestant in fact selects the right door initially, I then have a choice about which door to open. However, it doesn’t really matter which I pick, by symmetry, and in either case the contestant would do better to stay with his original choice. Note that we have defined the outcome entirely by the initial choice. And as the host, I knew where the car was at the start, so from my point of view, unless he had somehow cheated, the contestant always had a 1/3 chance of guessing correctly. So switching wins 2/3 of the time.
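
For the unconvinced, here is a minimal simulation (my own sketch, not from the clip or the article) of the standard game, where the host knows where the car is and always opens a goat door:

```python
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        choice = random.randrange(3)
        # the host opens a door that is neither the contestant's choice nor the car
        opened = random.choice([d for d in (0, 1, 2) if d != choice and d != car])
        if switch:
            choice = next(d for d in (0, 1, 2) if d != choice and d != opened)
        wins += (choice == car)
    return wins / trials

print(play(switch=False), play(switch=True))   # roughly 1/3 and 2/3
```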

The BBC’s weekly collation feature, The Loop, printed a couple of responses to the article. This is one:

“I believe that the problem is a game of two halves. In the first instance you have a 1:3 chance of getting the prize. Once you’ve made your initial choice, even though the ‘box’ contents remain the same, one ‘false’ choice is removed and you are then offered the chance to choose again. There is a play on words here, perhaps, with the option to ‘switch’. Effectively you are being asked to make a choice of 1:2. Now, if this is always going to happen then the odds of you winning the prize in the first instance is 1:2, as it doesn’t really matter whether or not you choose correctly the first time – one ‘false’ is going to be removed and you’ll be presented with just two options. It’s mind trickery. The first choice is totally irrelevant because you will always end up having to pick one option from two in order to attain the prize. In my humble opinion.”

I highlight this not to inspire contempt, since it is an easy mistake to make, even after reading as nice an explanation as Moriarty’s. Rather it is an excellent example of how using too many words can convince you an argument is complete before you’ve really made any concrete statement at all. Note that the sentences beginning ‘Effectively…’, ‘Now, if this…’ and ‘The first choice…’ all say exactly the same thing, which is, to use inappropriately formal terminology, exactly what the writer is trying to prove. The padding of ‘mind trickery’ and ‘humble opinions’ is as close as a maths argument can ever really come to an ad hominem.

So I was trying to come to a clean conclusion about exactly why the Monty Hall problem does such a good job at generating confusion. I think the following aspects of the set-up are important:

1) In general, people often get confused about the meaning of ‘random’. In some sense, the sequence 416235 is more random than the sequence 123456. However, typically, we use ‘random’ to mean ‘not determined in advance’. The key is that the bit where MH reveals a goat is not a random action. Whatever you do initially, you know that this will happen. So the fact that it does actually happen should not add extra information.

2) This fact is compounded by the time-steps at which decisions have to be made. It feels like the arc of the narrative is directing everything towards this final choice, which feels very separate from the initial choice. You can imagine the dramatic music that undoubtedly plays as MH opens the appropriate door, and the camera pans back to the contestant making their decision. All of which is an excellent way to suggest that something complicated has happened, when in fact in terms of what we are interested in overall, nothing has happened.

A point I hadn’t thought about before was that it is important that MH knows what happens in advance. Let’s develop this point and consider what happens if MH in fact has no knowledge of where the car is, and chooses a door at random. We will still assume that he must choose a different door to the one that the contestant originally chose.

Then, 1/3 of the time the contestant picks a goat, and MH picks the car, so he gets a goat whether he stays or switches. 1/3 of the time, the contestant picks a goat, and MH picks the other goat, so he should switch. And 1/3 of the time, the contestant originally picks the car, and then by default MH picks a goat, so he should stay. So, overall, it is equally preferable to stay as to switch.
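
The same sketch adapts easily to this variant (again my own code): the host opens one of the other two doors uniformly at random, and we condition on the lucky event that a goat was revealed.

```python
import random

def play_ignorant_host(switch, trials=200_000):
    wins = valid = 0
    for _ in range(trials):
        car = random.randrange(3)
        choice = random.randrange(3)
        opened = random.choice([d for d in (0, 1, 2) if d != choice])
        if opened == car:
            continue                 # the host accidentally revealed the car: discard this round
        valid += 1
        if switch:
            choice = next(d for d in (0, 1, 2) if d != choice and d != opened)
        wins += (choice == car)
    return wins / valid

print(play_ignorant_host(False), play_ignorant_host(True))   # both roughly 1/2
```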

I guess the morals of this story are: “be careful with conditional probabilities; sometimes apparently new information doesn’t change anything; and that it’s probably not a good idea to get into a discussion about the Monty Hall problem with a skeptic unless you have a piece of paper to hand on which to draw a probability table.”

PS. A final moral is that, based on the links WordPress suggested, at least 10% of the internet is devoted to explanations of the Monty Hall problem…

How to take a Passport Photograph – Part 1

As seems to be the case whenever you start somewhere new, I’ve needed an almost infinite supply of passport-sized photographs recently. The university, my college, my department and of course the Chinese immigration authorities all wanted a record of my beautiful features. Anyway, as a result of all of this interest, I was in the do-it-yourself photo booth in WHSmiths in Wandsworth getting some more the other day. The first attempt looked fine, but the machine offered me the possibility of trying again, up to twice if I wanted. This seemed like a win-win situation, so I said yes, not realising that the one I already had would not be kept ‘in the bag’. The second attempt looked somewhat startled, a pose that runs in my family, but not wanting to risk the possibility of a disastrous third attempt (and the financial penalty of having to do the whole operation again) I confirmed that I was happy and made do with the result. Naturally, the question that struck me was: what is the optimal strategy for such a situation? (Assuming that, unlike me, you know the rules from the beginning.)

Mathematical model and choices

Let’s formulate this mathematically. Suppose there are n possible trials, corresponding to iid random variables X_1,\ldots,X_n. (Note that this assumes that your ‘performance’ does not improve or otherwise change during the process. Perhaps not a reasonable assumption in some contexts?) After trials X_1,\ldots,X_k have been observed, you have to choose whether to accept the value X_k as your ‘final answer’, or whether to continue.

The first key decision is: what distribution should the X_is have? Since in the original problem there isn’t a natural metric for quality, let’s assume that the X_is represent some well-defined quantitative utility, distributed as a uniform [0,1] random variable. Perhaps a normal random variable might be a more realistic model, but I can solve it in this case, so let’s stick to this for now. In addition, for the sake of making the eventual answer more simple, let’s say that 0 is the best quality and 1 is the worst. That is, we are looking for a strategy that stops the process at a time T so as to minimise \mathbb{E}X_T.

Finding an optimal strategy

The key observation is the following. In words, if we reject X_1, we can forget about its value as that is independent of \{X_2,\ldots,X_n\} which is now all that remains to base future judgments on. We return to the original problem with one fewer trial. In more mathematical notation, conditional on \{T\neq 1\}, T is independent of X_1.

The following argument assumes that an optimal strategy exists, which is not ideal, but can easily be justified. For now though, we proceed relatively informally by induction on n.

Let S be the stopping time for the optimal strategy on X_2,\ldots,X_n which we assume exists by induction. It is ‘obvious’ that the optimal strategy T for X_1,\ldots,X_n should be the following:

  • T=1 iff X_1<a(n), where a(n) is a deterministic quantity depending only on n.
  • Conditional on T\neq 1, take T=S.

From this alone, we can calculate \mathbb{E}X_T.

\mathbb{E}X_T=\mathbb{E}X_S1_{T=S}+\mathbb{E}X_T1_{T\neq S}

= (1-\mathbb{P}(T=1))\mathbb{E}X_S+\mathbb{E}X_1 1_{X_1<a(n)}

= (1-a(n))\mathbb{E}X_S +\frac12 a(n)^2.

This is minimised precisely when a(n)=\mathbb{E}X_S. We conclude that the optimal strategy, as of course we might well expect, is to take T=1 precisely if X_1 is less than the expected result of applying the optimal strategy to the remaining random variables.

By extension, we have a(n+1)=\mathbb{E}X_T, and so

a(n+1)=a(n)\left[1-\frac{a(n)}{2}\right]. (*)

The first few values are:

a(1)=1,\quad a(2)=\frac12,\quad a(3)=\frac38,\quad a(4)=\frac{39}{128},\quad\ldots
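
These values are easy to check exactly; here is a quick sanity check (my own, using Python’s Fraction type) of the recursion (*):

```python
from fractions import Fraction

a = Fraction(1)
values = [a]
for _ in range(3):
    a = a * (1 - a / 2)       # the recursion (*)
    values.append(a)

print([str(v) for v in values])   # ['1', '1/2', '3/8', '39/128']
```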

Behaviour of a(n)

The first question is: as n grows large, does a(n)\rightarrow 0? Well this isn’t too hard: the recursive definition (*) confirms that the sequence a(1),a(2),\ldots is (strictly) decreasing and bounded below by 0, so it has a limit, which must be a fixed point of the equation (*). The only such fixed point is 0.

The second question is: what is the asymptotic behaviour of a(n) for large n? A quick run on MATLAB, or examination of the equation (*) suggests that

a(n)\approx \frac{2}{n}

should describe the behaviour well for large n. My basic attempts to verify this were initially unsuccessful, but I felt fairly sure that this should be true in some metric sense because of the following highly non-rigorous but nonetheless convincing idea.

Claim: a(n)=\frac{2}{n}+O(n^{-3}) satisfies (*).

Why? Well, then:

\frac{2}{n+1}-\left[(\frac{2}{n}+O(n^{-3}))-\frac12(\frac{2}{n}+O(n^{-3}))^2\right]

=\frac{2}{n+1}-\frac{2}{n}+\frac{2}{n^2}+O(n^{-3})

=\frac{2}{n^2(n+1)}+O(n^{-3})=O(n^{-3}).

This proves the claim, but none of the = signs are really especially meaningful here. Perhaps there is a really slick way to tie this up that I’ve missed? In any case, I will save my own slightly involved method for a new post.
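
And for what it’s worth, a quick numerical check (a few lines of Python in place of the MATLAB run mentioned above) agrees with the 2/n asymptotic, though the convergence is visibly slow:

```python
a = 1.0
for n in range(1, 100_001):
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(n, n * a)       # n * a(n) creeps up towards 2
    a = a * (1 - a / 2)       # the recursion (*)
```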

Infinite Games

The standard example of a two-person game is the Prisoners’ Dilemma. There are various ways of setting up the story for the model. Essentially there are two prisoners being questioned simultaneously but separately by detectives following some crime which they committed together. They have to make a choice about whether to confess the details or not. If they both keep quiet, they will likely receive short sentences because of a lack of evidence. If they both confess, then obviously they each receive a full sentence. If only one prisoner confesses and provides evidence against the other, then the other receives an even harsher sentence (because he proclaimed innocence) while the confessor gets let off with only a caution as a reward. Generally we call keeping quiet ‘cooperating’ and informing on the other prisoner ‘defection’. Some potential numerical outcomes of the various pairs of strategies can be summarised as:

CC – (2,2), CD – (0,3)

DC – (3,0), DD – (1,1)

From now on, we forget all about the underlying story and just think about this as a standard two-person game, as it provides plenty of interest by itself.

In words, if both agents cooperate, they get a better result than if they both defect. However, both cooperating is not a Nash equilibrium, because each agent would do better if they unilaterally chose to defect. Indeed, the strategy of ‘cooperate’ is strongly dominated by the strategy of ‘defect’. That is, if you cooperate you will do worse than if you defect, irrespective of how the other agent acts. A key notion of elementary game theory is that we should (iteratively if necessary) reject weakly dominated strategies, as suggested by common sense. Why would you choose to do something if there is an alternative which is always at least as good and sometimes strictly better? So the only rational outcome for the Prisoners’ Dilemma is for both agents to defect.

This is an example where the Nash equilibrium is not Pareto optimal. A pair of pure strategies is Pareto optimal unless it is possible to find a different pair where at least one agent does better and neither does worse. For example, both cooperating is strictly better for each agent than both defecting in this case. The question we might want to ask is: will repeating the game encourage mutual cooperation?

This idea is tempting. Perhaps cooperating in early rounds will encourage the opponent to cooperate in future rounds, and thus a scheme of mutual cooperation will develop. Note that each agent’s aim is only to maximise his own return, rather than his return relative to the other agent. He doesn’t mind if his opponent does better so long as he does well.

Suppose that the game is played twice. The key observation is that we should consider the second game first. Irrespective of what has happened in the first game, it is still a Nash equilibrium for both agents to defect. Recall that defection strongly dominated cooperation, so what has happened before is of no interest to either agent. Whatever happened in the past, and whatever the other agent does next, it is better for him to defect. Now we’ve established that they will both always defect in the second round, it is clear that their behaviour in the first round can have no consequences apart from the immediate one, and so again they should both defect in the first round too. This procedure is called backwards induction, and applies equally well to situations where the outcome of the first game might affect the range of options or outcomes available in the second game.

In particular, it is clear that however many times we play the Prisoners’ Dilemma, so long as the number of repeats is finite, the optimal strategy will always be to defect every time. But what if we repeat an infinite number of times?

The first thing to be sorted out is how to define the total reward for a strategy. If we just add everything up, all strategies give infinite reward, which is not very useful for comparison. One option might be to take the limit of the mean reward, i.e.

\lim_{n\rightarrow\infty}\frac{\sum_{m=1}^n R_m}{n}.

The problem with this is that its value is only dependent on eventual behaviour, in the technical sense of tail events and so forth. More simply, the outcome of the first game has no effect on the value of this expression. Nor the second, nor the third and so on. It feels mathematically awkward, and indeed pointless, to choose a strategy so as to optimise a function which is unaffected by the choices made in any finite number of rounds.

Instead we introduce a discount factor \delta<1, and instead seek to maximise

\sum_{m=1}^\infty \delta^m R_m.

This feels unsatisfactory at some level, as we might want all of the games to be of equal numerical significance. However, we could partially resolve our fears by taking a \delta\uparrow 1 limit at the end of the analysis. More directly, in any genuine application, a situation cannot be repeated indefinitely, and so it makes sense to allow the process to terminate after each go with probability 1-\delta, with \delta chosen suitably small. Now, since the payoffs for each individual round are bounded, the total payoff is also bounded.

As with other time-homogeneous infinite horizon problems, the great advantage of this formulation is that the same game remains after one round has taken place as at the beginning. This motivates considering stationary strategies, where the rule for choosing an action is essentially the same for every round. That is, it is dependent on previous outcomes, but in a reasonably straightforward way. This definition is quite informal, but it seems that little is to be gained from making it too much tighter right now.

Some notable stationary strategies are: C, cooperate every time; D, defect every time; G, for grim, cooperate until the other agent defects, after which you defect forever; T, for tit-for-tat, initially cooperate, then do whatever the other agent did in the previous round. There is a vast body of literature, both in formal papers and in the so-called blogosphere (the reasons for this might well be an interesting subject for a separate post), concerning the relative merits of these strategies. Tit-for-tat in particular attracts much attention as a highly successful strategy in a vast range of contexts. Among the features that can easily be incorporated into the model are population evolution, where more successful strategies are more likely to propagate into subsequent generations, and noise, where with some small probability, on each turn an agent does the opposite of what they planned to do. In particular, computer simulation is easily applicable to all of these strategies, and there have even been competitions where teams submit strategies to ‘compete’ over 100,000 iterations of the Prisoners’ Dilemma.
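
Here is a small sketch of such a simulation (my own code, using the payoff table above, and modelling the discount factor as a probability 1-\delta of the game terminating after each round, as suggested earlier):

```python
import itertools, random

PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
          ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}

def cooperate(my_hist, their_hist): return 'C'
def defect(my_hist, their_hist): return 'D'
def grim(my_hist, their_hist): return 'D' if 'D' in their_hist else 'C'
def tit_for_tat(my_hist, their_hist): return their_hist[-1] if their_hist else 'C'

STRATEGIES = {'C': cooperate, 'D': defect, 'G': grim, 'T': tit_for_tat}

def match(s1, s2, delta=0.95, rng=random.Random(0)):
    h1, h2, total1, total2 = [], [], 0, 0
    while True:
        a1, a2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(a1, a2)]
        total1, total2 = total1 + p1, total2 + p2
        h1.append(a1); h2.append(a2)
        if rng.random() > delta:      # the game ends with probability 1 - delta
            return total1, total2

for n1, n2 in itertools.product(STRATEGIES, repeat=2):
    print(n1, 'vs', n2, match(STRATEGIES[n1], STRATEGIES[n2]))
```

A single randomly-terminated match per pair is of course noisy; averaging over many matches would estimate the expected payoffs.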

In general, we want to find Nash equilibria for the infinite game. In particular, as usual we want to reduce the number of candidates by restricting attention to subgame perfect Nash equilibria. (There are several criticisms to be made of this approach, but this is certainly a matter to be discussed elsewhere.) We say that a pair of strategies satisfy the one-stage deviation condition if neither player can improve their payoff by deviating from their strategy unilaterally for one round, then returning to that strategy. Perhaps the most interesting theorem of the book is the following: subgame perfect Nash equilibria and strategies satisfying the one-stage deviation condition are equivalent. The proof is not substantially more than definition chasing.

This enables us to show, for example, that the adjusted grim strategy where each player starts by always cooperating, until either player defects, at which point both defect forever, is a subgame perfect Nash equilibrium. (The apparently inconsequential change in definition from that of a grim strategy is required for subgame perfection.) We conclude that there is a natural framework in which mutual cooperation can exist as a stable solution to the infinite Prisoners’ Dilemma, as we had initially said we would like.

Philosophy of Utility Maximisation

Over the weekend, I skimmed through the Springer Undergraduate Mathematics Series book ‘Game Theory’ by James Webb. It’s a book that has been sitting on my bookcase for a while, and a topic which sits at only one step removed from what I am particularly interested in. But somehow, possibly because of its almost complete absence from the Cambridge maths course, I had never got round to reading the book or otherwise finding out much about the field. This has now changed, so I am writing three short remarks on some aspects I thought were of particular interest.

——–

Consider the following bet. I toss a fair coin: if it comes up heads, you have to pay me £1, but if it comes up tails, I will give you £3. Would you take the bet? Obviously you are the only person who can actually answer that, but I’d expect the only obstacle to you taking the bet might be indifference. Why? Well, your expected profit from this bet is £1, which is positive, and so it certainly seems that you are getting a good deal.

Now let’s up the stakes. If the coin comes up heads, you have to pay me £1,000,000, but if it comes up tails, you get £3,000,000. Do you still take the bet? Well unless you are in a very fortunate financial state, you shouldn’t be taking this bet, because you are not in a position to honour your commitment in the event (with probability 50%) that you lose.

But what about an intermediate bet? Suppose we change the stakes to £100 and £300. Or to £1,000 and £3,000. Would you take the bet? Without getting too bogged down in what the answer is, and what personal circumstances might lead to each possibility, suffice it to say this: each of the adjusted bets still offers a substantial positive expected profit, but the real risk of losing a large sum of money makes one less inclined to take up the offer.

Traditionally, we can describe this apparent logical problem like this. How much we value sums of money is not always proportional to the sum itself. Although some mathematicians might disagree, we don’t value numbers intrinsically, instead we value the effects that sums of money can have on our lives. More concretely, winning the lottery is awesome, and winning the lottery twice is even more awesome. However, it probably isn’t twice as good as winning it once, since many of the best things about winning the lottery, like retiring and paying off your mortgage or your student loan, can only be done once.

Mathematically, this is best represented by a utility function. This is a map from the set of outcomes of the bet or experiment to the reals, specifying the relative value the agent assigns to different outcomes (monetary or otherwise). We need the utility function to satisfy a few obvious rules. If it is a function of the profit or return, we expect it to be strictly increasing. That is, everyone places a higher utility on a larger sum of money. If we are considering utility as a function of (possibly non-numerical) outcomes, e.g. elements in a general probability space, then we need the preference order it induces to be transitive. That is, it describes a partial ordering on the set of outcomes. We would also expect that all outcomes can be compared, so it is in fact a total ordering. The agent’s aim is then instead to maximise their expected utility \mathbb{E}U(X).
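
To make the earlier example concrete (a toy illustration of my own: the log utility and the £1,200 of starting wealth are assumptions, not anything from the book), a concave utility already explains why the £1 / £3 bet is attractive while the scaled-up £1,000 / £3,000 version is not, even though both have positive expected profit:

```python
from math import log

def expected_utility_gain(wealth, lose, win, utility=log):
    # change in expected utility from accepting the 50/50 bet
    with_bet = 0.5 * utility(wealth - lose) + 0.5 * utility(wealth + win)
    return with_bet - utility(wealth)

print(expected_utility_gain(1200, 1, 3))        # > 0: take the small bet
print(expected_utility_gain(1200, 1000, 3000))  # < 0: decline the scaled-up bet
```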

There are a couple of questions that are natural to ask:

1) Can all factors influencing a decision really be expressed numerically?

2) Is maximising some utility function actually the right thing to do?

We ask question 1) because we might think that some factors that influence our decisions such as fear, superstition or some code of morality cannot be quantified explicitly. We ask question 2) because at first it doesn’t seem as if maximising expected utility is any different to maximising expected monetary value.

This needs some clarification. The best way to do this is through the notion of risk aversion. This is an alternative way to think about the original problem with the £1,000 vs £3,000 bet. The risk or uncertainty of a random variable corresponding to a financial return is described by the variance. In general, we assume that most investors are risk-averse, that is, they prefer a guaranteed return of £10 rather than a non-trivial random variable with expectation equal to £10. So instead of maximising expectation, we maximise

\mathbb{E}X-\alpha \mathrm{Var}(X),

where \alpha is some positive constant. Of course, if an agent for some reason prefers risky processes, then choose \alpha<0.
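
As a tiny illustration of this criterion (the numbers are mine), compare a guaranteed £10 against a 50/50 gamble on £0 or £20:

```python
def mean_variance_score(outcomes, probs, alpha=0.05):
    mean = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    return mean - alpha * var

print(mean_variance_score([10], [1.0]))          # 10.0  (riskless)
print(mean_variance_score([0, 20], [0.5, 0.5]))  # 5.0   (same mean, penalised for variance)
```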

So the question we are asking is: might this be happening for utility maximisation as well? More precisely, perhaps we need to consider risk-aversion even for utilities?

The way to resolve the problem is to reverse how we are looking at the situation. Tempting though it is to define the utility as a relative value assigned to outcomes, this forces us into a position where we have to come up with some method for assigning these in specific cases, most of which suggest problems of some kind. Although in practice the previous definition is fine and easy to explain, what we actually want is the following. When we make a choice, we must be maximising something. Define the utility to be the function such that maximising the expectation of this function corresponds to the agent’s decision process.

This is much more satisfactory, provided such a function is actually well-defined. Von Neumann and Morgenstern showed that it is, provided the agent’s preferences have a little bit more structure than discussed above. The further conditions required are continuity and independence, and are completely natural, but I don’t want to define them now because I haven’t set up the necessary notation here.

In many ways, this is much more satisfying. We don’t always want precise enumerations of all contributory factors, but it is reassuring to know that subject to some entirely reasonable conditions, there is some structure under which we are acting optimally whenever we make a decision.

Braess’s Paradox and Traffic Networks

For a traffic network, we have routes and links as before, only here we specify the delay on a link as a function of the flow along that link D_j(y_j), where y_j=\sum_r A_{jr}x_r as before. Then a natural notion of an equilibrium is a distribution of flows such that no customer stands to gain by changing route. Formally, take S the set of source-destination pairs, H_{sr} the incidence matrix of pairs s and routes r, and S(r) the routes between the same source-destination pair as r.

Then define a Wardrop equilibrium to be a flow vector (x_r,r\in R) such that for any x_r>0,

\sum_j D_j(y_j)A_{jr}=\min_{r'\in S(r)}\sum_j D_j(y_j)A_{jr'}

that is, the delay on this route is less than or equal to the delay on any route between that pair of vertices.

A natural assumption would be that D_j(y_j) is increasing in y_j for each j, and if we further assume that each D_j is continuously differentiable, then we can prove that a Wardrop Equilibrium exists.

Why? Consider the optimisation problem

\min\quad\sum_j \int_0^{y_j}D_j(u)du,\quad x\geq 0,Hx=f,Ax=y

where f is the flow requirement. The feasible region is convex and the objective function is convex differentiable so the optimum exists. The Lagrangian is

L(x,y,\lambda,\mu)=\sum_j \int_0^{y_j}D_j(u)du+\lambda\cdot (f-Hx)-\mu\cdot(y-Ax)

and at the minimum

0=D_j(y_j)-\mu_j,\quad 0=-\lambda_{S(r)}+\sum_j \mu_jA_{jr}

So \mu_j is the delay on link j, and then

\lambda_{S(r)}\left\{\begin{array}{l l}=\sum_j\mu_jA_{jr}& \quad x_r>0\\ \leq \sum_j\mu_jA_{jr} & \quad x_r=0\\ \end{array} \right.

So we can interpret \lambda_{S(r)} as the minimum delay over routes between that source-destination pair, and so a solution to this optimisation problem is a Wardrop Equilibrium. If the D_j are strictly increasing, then we have uniqueness in (y_j), from the optimisation problem, but not in (x_r). For example, you could have two essentially identical routes from A to B, and from B to C, and it then doesn’t matter how you concatenate them to form a flow from A to C.

Note the counterintuitive Braess’s Paradox, where adding a link increases the delay at the Wardrop Equilibrium. The canonical example can be seen here, where the addition of the road across causes the delay to increase from 83 to 92. And indeed such effects have been observed in ‘real life’ as well.
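
The effect is easy to reproduce by hand in the usual four-node textbook version (the numbers below are the standard ones, not those of the example linked above): 4000 drivers travel from A to D, the links A-B and C-D have delay x/100 where x is the flow on that link, and the links A-C and B-D have constant delay 45.

```python
n = 4000   # drivers from A to D

# Without the shortcut there are two routes, A-B-D and A-C-D, each with delay
# x/100 + 45; by symmetry the equilibrium splits the traffic evenly.
delay_without = (n / 2) / 100 + 45
print(delay_without)      # 65.0

# Add a zero-delay shortcut B-C. The route A-B-C-D costs x_AB/100 + x_CD/100,
# at most 40 + 40, which is strictly less than either alternative route
# (each uses a 45-minute link), so at equilibrium everyone takes the shortcut.
delay_with = n / 100 + n / 100
print(delay_with)         # 80.0 -- the new link makes everyone worse off
```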

Note that one explanation for why this seems counter-intuitive is that the objective function isn’t necessarily the most natural one. For a Wardrop equilibrium, the agents are acting entirely selfishly given present information. From the point of view of society, it might make more sense to minimise the total delay, \min \sum_jy_jD_j(y_j) (equivalently the average delay, since the total flow is fixed). In this case the Lagrangian gives

0=D_j(y_j)+y_jD_j'(y_j)-\mu_j,\quad 0=-\lambda_{S(r)}+\sum_j\mu_jA_{jr}

So if we add a congestion charge of y_jD_j'(y_j) to the delay function, the Wardrop Equilibrium becomes equal to the societal optimum. Note that this is NOT the societal optimum for (delay + charge). However, we have seen that a toll will enable selfish behaviour to produce a societal optimum. This is an example of mechanism design, which can be considered a branch of game theory, where the aim is to design systems so that agents work in a societally optimal manner.