«Contents. 1 Nash Equilibrium in Extensive Form Games 2 1.1 Selten’s Game......... 1.2 The Little Horsey....... 1.3 Giving Gifts.. ...»
Perfect Equilibria in Extensive Form Games
Branislav L. Slantchev
Department of Political Science, University of California – San Diego
May 15, 2008
1 Nash Equilibrium in Extensive Form Games 2
1.1 Selten’s Game............
1.2 The Little Horsey..........
1.3 Giving Gifts.............
2 Perfect Bayesian Equilibrium 6
2.1 Conditional Beliefs....................................... 6
2.2 Sequential Rationality..................................... 7
2.3 Consistent Beliefs....................................... 10
2.4 Beliefs After Zero-Probability Events............................ 12
2.5 Perfect Bayesian Equilibrium................................ 14 3 Backward Induction 17
3.1 Rollback Equilibrium..................................... 18
3.2 Subgame-Perfect Equilibrium................................ 20 4 Practice with Subgame Perfection 25
4.1 Burning a Bridge........................................ 25
4.2 The Dollar Auction...................................... 26
4.3 Multi-Stage Games with Observed Actions.....................
We already know how to solve strategic form games and now we also know how to convert extensive form to strategic form as well. The solution concept we now deﬁne ignores the sequential nature of the extensive form and treats strategies as choices to be made by players before all play begins (i.e. just like in strategic games).
Definition 1. A Nash equilibrium of a ﬁnite extensive-form game Γ is a Nash equilibrium of the reduced normal form game G derived from Γ.
Recall that the mixed strategy Nash equilibrium of this game is:
[Rr ], [F r ], [m], [p].
If we want to express this in terms of behavior strategies, we would need to specify the probability distributions for the information sets. Player 1 has two information sets, b following the black card, and c following the red card. The probability distributions are ( 2/3[F ], 1/3[R]) at information set b, and (0[f ], 1[r ]) at information set c. In other words, if player 1 sees the black (losing) card, he folds with probability 2/3. If he sees the red (winning) card, he always raises. Player 2’s behavior strategy is speciﬁed above (she has only one information set).
Because in games of perfect recall mixed and behavior strategies are equivalent (Kuhn’s Theorem), we can conclude that a Nash equilibrium in behavior strategies must always exist in these games. This follows directly from Nash’s Theorem. Hence, we have the following
Theorem 1. For any extensive-form game Γ with perfect recall, a Nash equilibrium in behavior strategies exists.
Generally, the ﬁrst step to solving an extensive-form game is to ﬁnd all of its Nash equilibria. The theorem tells us at least one such equilibrium will exist. We furthermore know that if we ﬁnd the Nash equilibria of the reduced normal form representation, we would ﬁnd all equilibria for the extensive form. Hence, the usual procedure is to convert the extensive-form game to strategic form, and ﬁnd its equilibria.
1.1 Selten’s Game
However, some of these equilibria would have important drawbacks because they ignore the dynamic nature of the extensive-form. This should not be surprising: after all, we obtained the strategic form representation by removing the element of timing of moves completely.
Reinchard Selten was the ﬁrst to argue that some Nash equilibria are “more reasonable” than others in his 1965 article. He used the example in Fig. 2 (p. 3) to motivate the discussion, and so will we.
The strategic form representation has two pure-strategy Nash equilibria, D, L and U, R.1 Look closely at the Nash equilibrium (U, R) and what it implies for the extensive form. In the proﬁle (U, R), player 2’s information set is never reached, and she loses nothing by playing R there. But there is something “wrong” with this equilibrium: if player 2’s information set is ever reached, then she would be strictly better oﬀ by choosing L instead of R. In eﬀect, player 2 is threatening player 1 with an action that would not be in her own interest to carry out. Now player 2 does this in order to induce player 1 to choose U at the initial node thereby yielding her the highest payoﬀ of 2. But this threat is not credible because given the chance, player 2 will always play L, and therefore this is how player 1 would expect her to play if he chooses D. Consequently, player 1 would choose D and player 2 would choose L, which of course is the other Nash equilibrium D, L.
The Nash equilibrium U, R is not plausible because it relies on an incredible threat (that is, it relies on an action which would not be in the interest of the player to carry out). In fact, none of the MSNE will be plausible for that very reason either. According to our motivation for studying extensive form games, we are interested in sequencing of moves presumably because players get to reassess their plans of actions in light of past moves by other players What about mixed strategies? Suppose player 1 randomizes, in which case player 2’s best response is L.
But if this is the case, player 1 would be unwilling to randomize and would choose D instead. So it cannot be the case that player 1 mixes in equilibrium. What if player 2 mixes? Let q denote the probability of choosing L. Player 1’s expected payoﬀ from U is then 2q + 2(1 − q) = 2, and his expected payoﬀ from D is 3q. He would choose U if 2 ≥ 3q, or 2/3 ≥ q, otherwise he would choose D. Player 2 cannot mix with 1 q 2/3 in equilibrium because she has a unique best response to D. Therefore, she must be mixing with 0 ≤ q ≤ 2/3. For any such q, player 1 would play U. So, there is a continuum of mixed-strategy Nash equilibria, where player 1 chooses U, and player 2 mixes with probability q ≤ 2/3. These have the same problem as U, R.
(and themselves). That is, nonterminal histories represent points at which such reassessment may occur. The only acceptable solution should be the PSNE D, L.
The following deﬁnition is very important for the discussion that follows. It helps distinguish between actions that would be taken if the equilibrium strategies are implemented and those that should not.
Definition 2. Given any behavior strategy proﬁle σ, and information set is said to be on the path of play if, and only if, the information set is reached with positive probability according to σ. If σ is an equilibrium strategy proﬁle, then we refer to the equilibrium path of play.
To anticipate a bit of what follows, the problem with the U, R solution is that it speciﬁes the incredible action at an information set that is oﬀ the equilibrium path of play. Player 2’s information set is never reached if player 1 chooses U. Consequently, Nash equilibrium cannot pin down the optimality of the action at that information set. The problem will not extend to strategy proﬁles which visit all information sets with positive probability. The reason for this is that if the Nash equilibrium proﬁle reaches all information sets with positive probability, then it will also reach all outcomes with positive probability. But if it does so, the fact that no player can proﬁt by deviating from his Nash strategy implies that there would exist no information set where he would want to deviate. In other words, his actions at all information sets are credible. If, on the other hand, the Nash strategies leave some information sets oﬀ the path of play, then the Nash requirement has no bite: whatever the player does at these information sets is “irrelevant” as it cannot aﬀect his payoﬀs. It is under these circumstances that he may be picking an action that he would not never choose if the information set is actually reached. Notice that unlike U, R, the other PSNE D, L does reach all information sets with positive probability. In this case, Nash’s requirement is suﬃcient to establish optimality of the strategies everywhere. As we shall see, our solutions will always be Nash equilibria. It’s just that not all Nash equilibria will be reasonable.
We now look at two examples that demonstrate that this problem occurs not only in games of certainty, complete and perfect information, but also in games of certainty with imperfect information, and games of uncertainty with imperfect information.
1.2 The Little Horsey
What are the Nash equilibria? Let’s convert this to normal form, the result is in Fig. 4 (p. 5).
By inspection, we see that there are two Nash equilibria in pure strategies, U, L and D, R.
However, there is something unsatisfying about the second one. In the D, R equilibrium, player 2 seems to behave irrationally. If her information set is ever reached, playing L strictly dominates R. So player 1 should not be induced to player D by the incredible threat to play L.
However since player 2’s information set is oﬀ the equilibrium path, Nash equilibrium does not evaluate the optimality of play there.
1.3 Giving Gifts There are two players and player 1 receives a book which, with probability p is a small game theory pocket reference, and with probability 1 − p is a Star Trek data manual. The player sees the book, wraps it up, and decides whether to oﬀer it to player 2 as a gift. Player 2 hates Star Trek and is currently suﬀering in a graduate game theory course, so she would prefer to get the game theory references but not the Star Trek manual. Unfortunately, she cannot know what is being oﬀered until she accepts it.
Consider the extensive-form game in Fig. 5 (p. 5). Player 1, who knows what gift he has for player 2, oﬀers the wrapped gift to player 2. If the gift is accepted, then player 1 derives a positive payoﬀ because everyone likes when their gifts are accepted. Player 1 hates the humiliation of having a gift rejected, so the payoﬀ is −1.
Player 2 strictly prefers accepting the game theory book to not accepting it; she is indiﬀerent between not accepting this book and accepting the Star Trek manual, but hates rejecting the Star Trek manual more than the game theory book because while dissing game theory is cool, dissing Star Trek is embarrassing.
Let’s construct the strategic form of this game. Player 2 has only two strategies: accept or reject. Player 1 has two information sets, and therefore four pure strategies that specify what to do with each book. Each strategy has two components: aG aS, with aG specifying what to do if the book is the game theory reference, and aS specifying what to do if it is the Star Trek manual. Fig. 6 (p. 6) shows the strategic form.
GG, Y is a Nash equilibrium for any value of p because p − 1 0 p 1. Player 1 oﬀers the gift regardless of its type, and player 2 accepts always. In addition, NN, N is a Nash equilibrium. Player 1 never oﬀers any gifts, and player 2 refuses any gifts if oﬀered.
However, the problem with proﬁle NN, N is that it prescribes an action for player 2 that is clearly irrational: if the game ever reaches player 2’s information set, then accepting a gift strictly dominates not accepting a gift regardless of what the gift is.
Because the strategic form ignores timing, Nash equilibrium only ensures optimality at the start of the game. That is, equilibrium strategies are optimal if the other players follow their equilibrium strategies. But we cannot see whether the strategies continue to be optimal once the game begins. We now turn to solving extensive form games directly in behavior strategies. (Recall that we shall refer to them as mixed strategies.) 2 Perfect Bayesian Equilibrium
2.1 Conditional Beliefs In the example game in Fig. 3 (p. 4), player 2 does not observe player 1’s choice: if she gets to move, all she knows is that player 1 has either chosen U or M. In the game in Fig. 5 (p. 5), player 2 only observes a gift oﬀer but does not know what the gift is. We shall require that player 2 form beliefs about the probability of being at any particular node in her information set. Obviously, if the information set consists of a single node, then, if that information set is reached, the probability of being at that node is 1. If there are more than one nodes in the information set, the belief will be a probability distribution over the nodes.
Let’s look at the game in Fig. 5 (p. 5). Let q denote player 2’s belief that she is at the left node in her information set (given that the set is reached, this is the probability of player 1 having oﬀered the game theory book), and 1 − q be the probability that she is at the right node (given that the set is reached, this is the probability of player 1 having oﬀered the Star Trek book). That is, p is player 2’s initial belief (or the prior) of the book being a game theory reference; and q is player 2’s conditional belief (updated belief, or posterior) about the book being a game theory reference given that player 1 has oﬀered it as a gift.
For example, suppose that player 2 believes player 1 is playing the strategy NG. If player 2 observes a gift oﬀer, then she concludes that the probability of the gift being the Star Trek book is 1 because only player 1 would ever oﬀer this book as a gift. More generally, player 2 will update her belief conditional on arriving at the information set. Of course, if player 2 thinks player 1 plays the strategy GG, then the updated belief will equal the prior: Player 2 does not learn anything about the gift from the player 1’s behavior. Putting all this together
yields the ﬁrst requirement: