The prisoner’s dilemma is a famous problem in game theory. The situation is as follows: you and an accomplice have been arrested on suspicion of a serious crime. The prosecutors have sufficient evidence to convict both of you on a lesser charge but offer both of you a bargain in the hopes of a conviction on the serious charge. If you betray your accomplice and testify that they committed the crime then you will get off with a lesser sentence. You must make your decisions in isolation without communicating, but you are aware that your accomplice has been offered the same bargain. If only you take the bargain then you will serve no time in prison while your accomplice serves 3 years. If both of you stay silent then you will both serve 1 year on the lesser charge, and if you both testify against each other then you will both serve 2 years. What do you do?

The context and exact numbers in this formulation are unimportant – the key features are that mutual co-operation is better than mutual betrayal, while the best and worst outcomes come on either side of a unilateral betrayal. You could reformulate the problem in many contexts – for example, two rival companies might have to decide between spending either a high or low amount on advertising, given that they will only get a greater market share if they spend more than their competitor. While the collective “best” option might be to both spend a small amount and share the demand equally, each company might be tempted to spend a higher amount in the hopes of dominating the market and pocketing a greater profit. If both do so, then they will each have a similar number of customers as if they had both spent the smaller amount, but will be out the extra sum of advertising money.

All formulations of the dilemma have one thing in common: no matter what your fellow “player” chooses, you are always better off betraying them than co-operating – if they choose co-operation, you walk away with the best possible outcome if you opt to betray, and if they choose to betray then you can avoid the worst by betraying them in return. Unfortunately, the other player can follow the same line of reasoning themselves, so that if both of you act “rationally” you will both choose to betray the other despite the fact that mutual co-operation is better for both of you. The story doesn’t end here, however – the situation becomes much more interesting when you might have the opportunity to play again against the same person. Does having the chance to punish your competitor for breaking co-operation change the situation? This blog post will show that in the case that the game is repeated indefinitely, the answer is **yes**.

### Nash Equilibrium

It turns out that when the prisoner’s dilemma is repeated indefinitely, there is no longer a clear strategy that dominates any other. To compare different strategies it will be useful to consider the game theoretic concept of a **Nash equilibrium**.

In a 2-player game, player 1 can select any action A^1 in an action set \mathcal{A}^1. Similarly, player 2 can choose an action A^2 from set \mathcal{A}^2. In the prisoner’s dilemma both players have the same action set, \mathcal{A} = \{co-operate, betray\}. Once all players have made their decision, players receive rewards R^1(A^1, A^2) and R^2(A^1, A^2) depending on the actions chosen. The key objective in solving such games is to identify **Nash equilibrium** policies, defined as pairs of strategies where neither player can guarantee a better expected outcome by unilaterally switching to a different strategy.

Nash equilibria always exist for matrix games (there may also be more than one), but unless the game is **zero-sum** (R^1 = -R^2), players may have different payoffs in different Nash equilibria. Note that Nash equilibrium strategies are often **mixed**, meaning that they can define a distribution over possible actions rather than identifying one action as optimal. For example, consider the two-player zero-sum game with the following payoffs for player 1:

Player 1 chooses A1 | Player 1 chooses A2 | |

Player 2 chooses A2 | 1 | -1 |

Player 2 chooses A2 | -1 | 1 |

Since the game is zero-sum, the payoffs for player 2 are the above multiplied by -1.

Here, if player 1 chooses action 1 (or 2), then if player 2 can guess their strategy they would be able to guarantee a payoff of -1. However, if player 1 randomly chooses either option with probability 1/2 then their expected payoff is 0 regardless of player 2’s strategy. By using this strategy, player 1 can guard against any potential extra knowledge or scheming on player 2’s part. In fact, if both players choose this strategy then we have a Nash equilibrium.

In the prisoner’s dilemma, your personal aim is to minimise the time you spend in prison, and you suppose that the same is true of your accomplice. We can specify the “payoffs” of the prisoner’s dilemma by the following table:

You choose co-operate | You choose betray | |

Opponent chooses co-operate | -1 | 0 |

Opponent chooses betray | -3 | -2 |

Here, the optimal strategies (betray, betray) form the only Nash equilibrium point in the game.

### The Iterated Prisoner’s Dilemma

If we repeat the prisoner’s dilemma game a fixed number of times, then mutual betrayal remains the only rational choice and hence the only equilibrium strategy. We can see this by considering what happens in the last time period and working backwards – the last time you play the game, the situation is exactly the same as when you play only once, since there is no future strategy to consider and your opponent will never get to retaliate against a betrayal. Given this, in the penultimate time you play, there is also no incentive to co-operate since you know that a rational opponent will betray you next even if you co-operate now. By backwards induction it remains rational to betray in every round of the game.

The key issue here is that there is a fixed termination time, a point beyond which there are no consequences to consider beyond the immediate payoffs awarded by the game. This disappears if the game will be played infinitely many times, or if neither player knows which round will be the last. As long as both consider there is a sufficiently large chance of playing again, the potential future rewards will matter more than the outcome of any single game. It remains true that mutual betrayal is the only **stationary** Nash equilibrium strategy (a stationary strategy is one that doesn’t depend on previous outcomes). However, if both players remember past events then there is incentive to co-operate, and it turns out that there are many possible Nash equilibrium strategies.

Take the so-called **grim trigger** strategy, for example – in this strategy, you choose to co-operate in the first game and continue to do so until your opponent chooses to betray you, after which you never co-operate again. If both players choose this strategy, then we have a Nash equilibrium: clearly, if you know that your opponent will choose this strategy, then you will not benefit from choosing to betray them unprompted as you will collect a good reward once and then be stuck in mutual betrayal forevermore. Your best bet is any strategy (including grim trigger) which co-operates in response to a co-operative opponent, as then you will consistently get the better mutual co-operation reward.

Another Nash equilibrium strategy that aims for mutual co-operation is the **tit-for-tat** strategy – here, you co-operate in the first game and then always choose the action your opponent played last. This would probably serve better than grim trigger in practice against an opponent who doesn’t know exactly which strategy you are playing, since it offers the possibility of reconciliation without being too forgiving. It is a little harder to see that this is a Nash equilibrium strategy – notice that if your opponent chooses betrayal unprompted then they can only leave the resulting cycle of mutual betrayal by co-operating once, despite knowing they will be betrayed. As long as the reward for betraying a co-operative opponent and then co-operating knowing you will be betrayed is less than that for mutually co-operating both times instead (as is the case in most formulations of the problem), then there is nothing to be gained by betraying a tit-for-tat player.

So, what is the “optimal” way to play in the indefinitely iterated prisoner’s dilemma? The answer to this is not actually known, and may well depend on what you know about your opponent. Clearly the two strategies suggested in the previous section are good options, and have the potential to get you a much better payoff than the betrayal strategy, despite the fact that this is also a potential Nash equilibrium strategy. If you aren’t sure of your opponent’s strategy, then tit-for-tat might be the better option since you’d rather end up in mutual co-operation than betrayal even if you are betrayed at some point early on. Indeed, this strategy generally does well in iterated prisoner’s dilemma competitions. If there is a chance of miscommunication then you might choose to play tit-for-tat but with a chance of co-operation even if you heard that your opponent has just betrayed you, so as to have the possibility of avoiding becoming stuck in mutual betrayal or alternating betrayal and co-operation.

### Further Reading

- PRISONER’S DILEMMA HAS UNEXPECTED APPLICATIONS – James Gleick
- Nice Strategies Finish First: A Review of “The Evolution of Cooperation” – Nicholas R. Miller