The Prisoner’s Dilemma
Suppose, in your desperation to get rich as quickly as possible, you consider the various alternatives, infer their likely consequences and decide that the best alternative is to rob the local bank. You recruit your best friend, Keith, well known for his meticulous attention to detail, to help you plan and carry out the crime. Thanks to your joint efforts, you succeed in breaking into the bank in the middle of the night, opening the safe, and making your get-away with a cool million pounds (approximately 1.65 million dollars at the time of writing) in the boot (trunk) of your car.
Unfortunately, years of poverty and neglect have left your car in a state of general disrepair, and you are stopped by the police for driving at night with only one headlight. In the course of a routine investigation, they discover the suitcase with the cool million pounds in the boot. You plead ignorance of any wrongdoing, but they arrest you both anyway on suspicion of robbery.
Without witnesses and without a confession, the police can convict you and your friend only of the lesser offence of possessing stolen property, which carries a penalty of one year in jail. However, if one of you turns witness against the other, and the other does not, then the first will be released free of charge, and the second will take all of the blame and be sentenced to six years in jail. If both of you turn witness, then the two of you will share the blame and each be sentenced to three years in jail.
This is an example of the classical Prisoner’s Dilemma. In Game Theory, the problem of deciding between alternative actions is often represented as a table, in which the rows and columns represent the actions of the players and the entries represent the resulting outcomes. In this case, the table looks like this:
|                              | You turn witness.                                    | You do not turn witness.                             |
| Keith turns witness.         | You get 3 years in jail. Keith gets 3 years in jail. | You get 6 years in jail. Keith gets 0 years in jail. |
| Keith does not turn witness. | You get 0 years in jail. Keith gets 6 years in jail. | You get 1 year in jail. Keith gets 1 year in jail.   |
If the two prisoners are able to consult with one another, then they will soon realise that the best option for both of them is not to turn witness against the other. To prevent this, the police separate them before they have a chance to consult. Thus each prisoner has to decide what to do without knowing what the other prisoner will do.
The Logic of the Prisoner’s Dilemma
The Prisoner’s Dilemma has a natural representation in terms of the prisoners’ goals and beliefs:
Goal: If I am arrested, then I turn witness or I do not turn witness.
Beliefs: I am arrested.
A prisoner gets 0 years in jail
if the prisoner turns witness
and the other prisoner does not turn witness.
A prisoner gets 6 years in jail
if the prisoner does not turn witness
and the other prisoner turns witness.
A prisoner gets 3 years in jail
if the prisoner turns witness
and the other prisoner turns witness.
A prisoner gets 1 year in jail
if the prisoner does not turn witness
and the other prisoner does not turn witness.
This assumes, of course, that the prisoners believe what they are told by the police. It also assumes that both prisoners know that the same deal has been offered to the other prisoner. However, the analysis at the end of this chapter can easily be modified to deal with other cases.
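As a rough illustration, the four beliefs above can also be read as a simple input-output function. The following Python sketch is my own illustration, not part of the original representation; the function and argument names are hypothetical:

```python
# Illustrative only: the four beliefs above as a simple payoff function.
# years_in_jail(True, False) corresponds to "the prisoner turns witness
# and the other prisoner does not turn witness", and so on.
def years_in_jail(i_turn_witness, other_turns_witness):
    if i_turn_witness and not other_turns_witness:
        return 0
    if not i_turn_witness and other_turns_witness:
        return 6
    if i_turn_witness and other_turns_witness:
        return 3
    return 1  # neither prisoner turns witness

assert years_in_jail(True, True) == 3
assert years_in_jail(False, False) == 1
```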
The Logic of Games
In general, any two-person game represented as a table can also be represented as goals and beliefs. For example, the table:
|                              | First player does action A.                                  | First player does action B.                                  |
| Second player does action C. | First player gets outcome AC. Second player gets outcome CA. | First player gets outcome BC. Second player gets outcome CB. |
| Second player does action D. | First player gets outcome AD. Second player gets outcome DA. | First player gets outcome BD. Second player gets outcome DB. |
can be represented by goals and beliefs, which in the case of the first player are:
Goal: First player does action A or First player does action B.
Beliefs: First player gets outcome AC
if First player does action A
and Second player does action C.
First player gets outcome BC
if First player does action B
and Second player does action C.
First player gets outcome AD
if First player does action A
and Second player does action D.
First player gets outcome BD
if First player does action B
and Second player does action D.
Depending on the circumstances, a player may or may not know the outcomes for the other player.
Should you carry an umbrella?
Before discussing how to solve the prisoner’s dilemma, it is useful to compare it with the seemingly unrelated problem of deciding whether or not to take an umbrella when you leave home in the morning.
We can represent the umbrella problem as a game against nature:
|                   | I take an umbrella.               | I do not take an umbrella. |
| It will rain.     | I stay dry. I carry the umbrella. | I get wet.                 |
| It will not rain. | I stay dry. I carry the umbrella. | I stay dry.                |
We can represent the agent’s side of the game in terms of the agent’s goals and beliefs[1]:
Goal: If I go outside, then I take an umbrella or I do not take an umbrella.
Beliefs: I go outside.
I carry an umbrella
if I take the umbrella.
I stay dry
if I take the umbrella.
I stay dry
if it will not rain.
I get wet
if I do not take an umbrella
and it will rain.
You can control whether or not you take an umbrella, but you can not control whether or not it will rain. At best, you can only try to estimate the probability of rain.
This should sound familiar. In chapter 5, when considering whether or not to rob a bank, I wrote:
“You can control whether or not you try to rob a bank. But you can’t control whether you will be caught or be convicted. Not only are these possibilities beyond your control, but you can not even predict their occurrence with any certainty. At best, you can only try to estimate their probability.”
It’s the same old story. To decide between different actions, you should infer their consequences, judge the utility and probability of those consequences, and choose the action with highest overall expected utility.
Suppose you judge that the benefit of staying dry, if it rains, is significantly greater than the cost in inconvenience of taking an umbrella, whether or not it rains.[2] Then you should decide to take the umbrella, if you estimate that the probability of rain is relatively high. But, you should decide not to take the umbrella, if you estimate that the probability of rain is relatively low.
Applying Decision Theory to Taking an Umbrella
This kind of “thinking”[3], which combines judgements of utility with estimates of probability, is formalised in the field of Decision Theory. According to the norms of Decision Theory, you should weight the utility of each alternative outcome of an action by its probability, and then sum all of the alternative weighted utilities to measure the overall expected utility of the action. You should then choose the action with highest expected utility[4].
In the case of deciding whether or not to take an umbrella, suppose you judge that
The benefit of staying dry is D.
The cost of carrying an umbrella is C.
The cost of getting wet is W.
The probability that it will rain is P,
and therefore that it will not rain is (1 – P).
Then the expected utility of taking the umbrella is
the benefit of staying dry
minus the cost of carrying the umbrella
= D – C.
The expected utility of not taking the umbrella is
the benefit of staying dry if it doesn’t rain
minus the cost of getting wet if it does rain
= (1 – P)·D – P·W.
So, for example, if the benefit of staying dry is worth 1 candy bar, the cost of carrying an umbrella is worth 2 candy bars, and the cost of getting wet is worth 9 candy bars, then
D = 1
C = 2
W = 9.
The expected utility of taking the umbrella = –1.
The expected utility of not taking the umbrella = 1 – 10·P.
Therefore, if the probability of rain is greater than .2, then you should take an umbrella; and if it is less than .2, then you shouldn’t take an umbrella. If it is exactly .2, then it makes no difference, measured in candy bars, whether you take an umbrella or not.
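For readers who prefer to see the arithmetic spelled out, here is a minimal Python sketch of the same candy-bar calculation; the variable and function names are illustrative only:

```python
# Illustrative sketch of the candy-bar calculation above.
D, C, W = 1, 2, 9   # benefit of staying dry, cost of carrying, cost of getting wet

def expected_utility_take(p_rain):
    # You stay dry and carry the umbrella whether or not it rains.
    return D - C

def expected_utility_leave(p_rain):
    # You stay dry only if it does not rain, and get wet if it does.
    return (1 - p_rain) * D - p_rain * W

for p in (0.1, 0.2, 0.3):
    print(p, expected_utility_take(p), expected_utility_leave(p))
# Taking the umbrella is always worth -1; leaving it is worth roughly
# 0, -1 and -2 for probabilities .1, .2 and .3 respectively.
```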
The use of Decision Theory is normative, in the sense that its estimations and computations are an ideal, which we only approximate in reality. In Real Life, we tend to compile routine decisions into simpler rules, represented by means of goals and beliefs. For example:
Goals: If I go outside
and it looks likely to rain,
then I take an umbrella.
If I go outside
and it looks unlikely to rain,
then I do not take an umbrella.
Beliefs: It looks likely to rain
if there are dark clouds in the sky.
It looks likely to rain
if it is forecast to rain.
It looks unlikely to rain
if there are no clouds in the sky.
It looks unlikely to rain
if it is forecast not to rain.
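A minimal Python sketch of such compiled rules might look as follows; the names are illustrative, and the sketch collapses “it looks unlikely to rain” into “it does not look likely to rain” for brevity:

```python
# Illustrative sketch of the compiled umbrella rules.
def looks_likely_to_rain(dark_clouds_in_sky, rain_forecast):
    return dark_clouds_in_sky or rain_forecast

def take_umbrella(going_outside, dark_clouds_in_sky, rain_forecast):
    # Goal: if I go outside and it looks likely to rain, then I take an umbrella.
    return going_outside and looks_likely_to_rain(dark_clouds_in_sky, rain_forecast)

assert take_umbrella(True, dark_clouds_in_sky=True, rain_forecast=False)
assert not take_umbrella(True, dark_clouds_in_sky=False, rain_forecast=False)
```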
Solving the Prisoner’s Dilemma
Just as in the case of deciding whether to take an umbrella when you go outside, you can control your own actions in the Prisoner’s Dilemma, but you can not control the world around you. In this case you can not control the actions of the other prisoner. However, you can try to predict them as best as possible.
Suppose you take a Decision Theoretic approach and judge:
The utility of your getting N years in jail is –N.
The probability that Keith turns witness is P,
and therefore that Keith does not turn witness is (1 – P).
Then the expected utility of your turning witness
is –3 if Keith turns witness,
and 0 if he does not
= –3·P + 0·(1 – P)
= –3·P.
The expected utility of your not turning witness
is –6 if Keith turns witness,
and –1 if he does not
= –6·P – 1·(1 – P)
= –1 – 5·P.
But –3·P > –1 – 5·P, for all values of P. Therefore, no matter what the probability P that Keith turns witness, you are always better off by turning witness yourself.
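The same dominance argument can be checked mechanically. The following Python sketch, with illustrative names of my own, computes the two expected utilities and confirms that turning witness comes out ahead for every probability P under this selfish utility:

```python
# Illustrative check of the dominance argument, with utility -N for N years in jail.
def eu_turn_witness(p):          # p = probability that Keith turns witness
    return -3 * p + 0 * (1 - p)  # = -3·P

def eu_stay_silent(p):
    return -6 * p - 1 * (1 - p)  # = -1 - 5·P

# Turning witness is better for every probability p between 0 and 1:
assert all(eu_turn_witness(k / 100) > eu_stay_silent(k / 100) for k in range(101))
```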
Unfortunately, if Keith has the same beliefs, goals and utilities as you, then he will similarly decide to turn witness against you, in which case both of you will get a certain 3 years in jail. You would have been better off if you had forgotten about Decision Theory, taken a chance, and both of you had refused to turn witness against the other, in which case you would both have gotten only 1 year in jail.
But there is a different moral that you could draw from the story – that the fault lies, not with Decision Theory, but with your own selfish judgement of utility.
Suppose, instead of caring only about yourself, you care about both yourself and Keith equally, and you judge:
The utility of your getting N years in jail and of
Keith getting M years in jail is – (N + M).
Then the expected utility of your turning witness
is –6 if Keith turns witness, and
is –6 if he does not
= –6·P – 6·(1 – P)
= –6.
The expected utility of your not turning witness
is –6 if Keith turns witness, and
is –2 if he does not
= –6·P – 2·(1 – P)
= –2 – 4·P.
But –2 – 4·P ≥ –6, for all values of P. Therefore, no matter what the probability P that Keith turns witness, there is never any advantage in your turning witness yourself.
Now, if Keith has the same beliefs, goals and utilities as you, then he will similarly decide not to turn witness against you, in which case both of you will get a certain one year in jail.
But caring equally about both yourself and Keith is probably unrealistic. To be more realistic, suppose instead that you care about Keith only half as much as you do about yourself:
The utility of your getting N years in jail and of
Keith getting M years in jail is – (N + 1/2·M).
Then the expected utility of your turning witness
is –4.5 if Keith turns witness, and
is –3 if he does not
= –4.5·P – 3·(1 – P)
= –3 –1.5·P.
The expected utility of your not turning witness
is –6 if Keith turns witness, and
is –1.5 if he does not
= –6·P – 1.5·(1 – P)
= –1.5 – 4.5·P.
But –3 – 1.5·P = –1.5 – 4.5·P when P = .5. Therefore, if you judge that the probability of Keith turning witness is less than .5, then you should not turn witness. But if you judge that the probability is greater than .5, then you should turn witness – tit for tat.
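The three utility functions considered in this section can be seen as instances of a single scheme in which Keith’s jail time is weighted by a care factor. The following Python sketch uses a hypothetical weight parameter w of my own to reproduce the break-even point at P = .5 for w = 1/2:

```python
# Illustrative generalisation: weight Keith's jail time by a care factor w
# (w = 0 selfish, w = 1 equal concern, w = 0.5 as in the text).
def eu_turn_witness(p, w):
    return p * -(3 + 3 * w) + (1 - p) * -(0 + 6 * w)

def eu_stay_silent(p, w):
    return p * -(6 + 0 * w) + (1 - p) * -(1 + 1 * w)

# With w = 0.5 the two options break even at P = .5, as calculated above:
assert abs(eu_turn_witness(0.5, 0.5) - eu_stay_silent(0.5, 0.5)) < 1e-9
# With w = 1, staying silent is never worse; with w = 0, turning witness always wins.
```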
Just as in the case of deciding whether to take an umbrella when you go outside, these calculations are a normative ideal, which we tend only to approximate in practice. In Real Life, we tend to compile our decisions into rules of behaviour, represented by goals and beliefs. For example:
Goals: If I am offered a deal
and the deal benefits me
and the deal harms another person more than it benefits me
and the person is my friend
then I reject the deal.
If I am offered a deal
and the deal benefits me
and the deal harms another person
and the person is not my friend
then I accept the deal.
These rules are not very subtle, but it should be clear that they can be refined, both to deal with other cases and to distinguish more subtly other characteristics of the deal under consideration.
Conclusions
There are three conclusions. The first concerns the Prisoner’s Dilemma itself – that it pays to co-operate with other agents, and not to try only to optimise our own narrow self-interests. This conclusion is, of course, well known in the literature about the Prisoner’s Dilemma. What may be less well known is the extent to which the benefits of co-operation can often be obtained simply by incorporating concern for the well-being of others in the utility function.
The second conclusion is much more general – that to decide between different courses of action we need, not only to judge the costs and benefits of our actions, but also to estimate the probability of circumstances outside our control. We have seen this before, but it needs to be emphasised again, not only because it is so important, but also because it has been largely ignored in traditional logic. The approach taken in this chapter shows one way in which logic and probability can usefully be combined.
The third conclusion is more subtle. It is that the computations of Decision Theory are a normative ideal, which we often approximate in Real Life by using simpler rules represented by goals and beliefs. This relationship between “higher-level” Decision Theory and “lower-level” decision rules is like the relationship between higher-level logical representations and lower-level input-output associations.
As we have seen, we can compile logical representations of goals and beliefs into input-output associations, and sometimes decompile associations into logical representations. Moreover, it seems that in human thinking, the two levels of thought can operate in tandem. The input-output associations efficiently propose candidate outputs in response to inputs, while reasoning about goals and beliefs monitors the quality of those responses.
There seems to be a similar relationship between Decision Theory and decision rules. Rules can be executed efficiently, but Decision Theory gives better quality results. As in the case of higher and lower level representations, Decision Theory can be used to monitor the application of rules and propose modifications of the rules when they need to be changed, either because they are faulty or because the environment itself has changed.
In his book, Thinking and Deciding, Baron discusses similar relationships between normative, prescriptive and descriptive approaches to decision making in detail.
[1] Notice that the representation in terms of beliefs is more informative than the game representation, because it indicates more precisely than the game representation the conditions upon which the outcome of an action depends. For example, the representation in terms of beliefs indicates that staying dry depends only on taking an umbrella and not on whether or not it rains.
[2] In general, assuming we can quantify benefits and costs in the same units, then utility = benefits – costs.
[3] According to Baron’s “Thinking and Deciding”, this is not “thinking” at all, but “deciding” between different options. It is an interesting question to what extent “deciding” might also involve “thinking” at a different, perhaps meta-level. More about this later.
[4] In mathematical terms, if an action has n alternative outcomes with utilities u1, u2, ..., un and respective probabilities p1, p2, ..., pn, then the expected utility of the action is p1·u1 + p2·u2 + ... + pn·un.