Requirement 1 (Beliefs). At each information set, the player who gets to move must have a belief about which node in the information set has been reached by the play of the game.
Note that this requirement does not tell us anything about how these beliefs are formed (but you can probably already guess that they involve Bayes rule). Before studying this, let's see how simply having these beliefs can help solve some of the problems we encountered.
For this, we need to introduce a second requirement.
2.2 Sequential Rationality
A strategy is sequentially rational for player i at the information set h if player i would actually want to choose the action prescribed by the strategy if h is reached. Because we require players to have beliefs at each information set, we can "split" each information set and evaluate optimal behavior at all of its nodes, including ones in information sets that are not reached along the equilibrium path. For now, we are not interested in how these beliefs are formed, just that players have them. For example, regardless of what player 1 does, player 2 will have some updated belief q at her information set. A continuation game refers to an information set and all nodes that follow from it. Using their beliefs, players can calculate the expected payoffs from continuation games. In general, the optimal action at an information set may depend on which node in that set the play has reached. To calculate the expected payoff of an action at information set h, we therefore have to consider the continuation game.
More formally, take any two nodes x and y such that y follows x in the game tree, and consider the mixed strategy profile σ. Let P(y|σ, x) denote the probability of reaching y starting from x and following the moves prescribed by σ. That is, P(y|σ, x) is the conditional probability that the path of play would go through y after x if all players chose according to σ and the game started at x. It is the product of all chance probabilities and move probabilities given by σ for the branches connecting x and y. Naturally, if y does not follow x, then P(y|σ, x) = 0. Player i's expected utility in the continuation
game starting at node x then is:
Ui(σ|x) = Σ_{z∈Z} P(z|σ, x) ui(z),

where Z is the set of terminal nodes in the game, and ui(·) is the Bernoulli payoff function specifying player i's utility from the outcome z ∈ Z. Note that this payoff only depends on the components of σ that are applied to nodes that follow x. Anything prior to x is ignored.
These expected utilities are therefore conditional on node x having been reached by the path of play.
For example, consider Selten’s game from Fig. 2 (p. 3) and suppose σ1 = ( 1/3, 2/3), and σ2 = (0.5, 0.5). The continuation game from player 1’s information set includes the entire game tree. What is player 1’s expected payoﬀ? Let z1 denote the outcome (D, L), z2 denote
the outcome (D, R), and z3 denote the outcome (U). The expected payoff then is:

U1(σ) = P(z1|σ)u1(z1) + P(z2|σ)u1(z2) + P(z3|σ)u1(z3) = σ1(D)σ2(L)(3) + σ1(D)σ2(R)(0) + σ1(U)(2).
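The calculation can be checked numerically. Here is a minimal sketch, assuming σ1 places probability 1/3 on U and 2/3 on D (the text does not label the components of σ1, so this ordering is an assumption); the helper name and path encoding are illustrative:

```python
from fractions import Fraction as F

def continuation_payoff(paths):
    """U_i(sigma | x) = sum over terminal nodes z of P(z | sigma, x) * u_i(z),
    where P(z | sigma, x) is the product of the move probabilities on the
    branches leading from x to z."""
    total = F(0)
    for branch_probs, payoff in paths:
        p = F(1)
        for q in branch_probs:
            p *= q
        total += p * payoff
    return total

# Selten's game from player 1's information set, with sigma2 = (1/2[L], 1/2[R])
# and player 1's payoffs u1(D, L) = 3, u1(D, R) = 0, u1(U) = 2:
paths = [
    ([F(2, 3), F(1, 2)], 3),  # z1 = (D, L)
    ([F(2, 3), F(1, 2)], 0),  # z2 = (D, R)
    ([F(1, 3)], 2),           # z3 = (U)
]
print(continuation_payoff(paths))  # 5/3
```

Exact rationals (`fractions.Fraction`) avoid the floating-point noise that would otherwise obscure an exact answer like 5/3.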
How does player 2 evaluate the optimality of her behavior using her beliefs? She calculates expected utilities as usual and then chooses the strategy that yields the highest expected payoff in the continuation game. That is, if the information set h is a singleton, the behavior strategy profile σ is sequentially rational for i at h if, and only if, player i's strategy σi maximizes player i's expected payoff at h when all other players behave according to σ.
In Selten's example, player 2 has only one sequentially rational strategy at her information set (the one following D), which is trivial to verify. The expected payoff in the continuation game following this node is maximized if she chooses L with certainty. For player 1, the calculation involves
player 2’s strategy. Player 1 would choose D if:
U1(D, σ2) ≥ U1(U, σ2)
σ2(L)(3) + σ2(R)(0) ≥ 2
σ2(L) ≥ 2/3.
Thus, strategy D is sequentially rational for any σ2(L) > 2/3; strategy U is sequentially rational for any σ2(L) < 2/3; and both D and U are sequentially rational if σ2(L) = 2/3.
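The 2/3 cutoff is easy to verify computationally. A small sketch (the function name and the string encoding of actions are illustrative; the payoffs are the ones used in the derivation above):

```python
from fractions import Fraction as F

def u1(action, s2_L):
    """Player 1's continuation payoff in Selten's game, using the payoffs
    from the derivation: u1(D, L) = 3, u1(D, R) = 0, u1(U) = 2."""
    if action == "D":
        return s2_L * 3 + (1 - s2_L) * 0
    return F(2)  # U ends the game immediately with payoff 2

# D is strictly better above the cutoff, U below it, and they tie exactly at it.
assert u1("D", F(3, 4)) > u1("U", F(3, 4))
assert u1("D", F(1, 2)) < u1("U", F(1, 2))
assert u1("D", F(2, 3)) == u1("U", F(2, 3))
```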
If information sets are singletons, sequential rationality is straightforward to deﬁne in the way we just did. What do we do for information sets that contain more than one node?
This is where beliefs come in: we calculate the expected payoff from choosing a particular action conditional on these beliefs. That is, σi is sequentially rational for player i at the (nonsingleton) information set h if, and only if, σi maximizes player i's expected payoff at h, computed with the belief probabilities that player i assigns to the nodes x ∈ h given that h has been reached, and assuming that play continues according to σ.
The intuition is as follows. When a player ﬁnds himself at some information set, he must be able to assess the consequences of choosing some action. This is straightforward when the information set is a singleton: the player will know precisely what will happen given his choice. But what if the information set is not a singleton? In that case, the same action could have diﬀerent consequences depending on which node in the information set the player is actually at. This is where beliefs come in. Since a belief is just a probability distribution over the nodes in this information set, the player can compute the expected payoﬀ from choosing a particular action. To do this, he takes the payoﬀ from taking that action at some node in the information set and multiplies it by the probability that he is at that node; he does that for all nodes in the information set and adds the results together. In other words, the usual expected payoﬀ calculation!
For example, consider the game in Fig. 3 (p. 4), and let p denote player 2's belief that she is at the left node in her information set conditional on this set having been reached. Her expected payoff from choosing L is then p × (1) + (1 − p) × (2) = 2 − p; analogously, her expected payoff from choosing R is p × (0) + (1 − p) × (1) = 1 − p. She would choose L whenever 2 − p > 1 − p; that is, always (this inequality holds regardless of the value of p). Hence, the unique sequentially rational strategy for player 2 at her information set is to choose L with certainty.
A similar thing happens with the gift game from Fig. 5 (p. 5). Player 2 can calculate the expected payoff from choosing Y, which is q(1)+(1−q)(0) = q, and the expected payoff from choosing N, which is q(0) + (1 − q)(−1) = q − 1. Since q > q − 1 for all values of q, it is never optimal for player 2 to choose N regardless of the beliefs player 2 might hold. Therefore, the strategy N is not sequentially rational because there is no belief that player 2 can have that will make it optimal at her information set. In other words, the unique sequentially rational strategy is to choose Y with certainty.
We now require that in equilibrium players should only choose sequentially rational strategies (actually, we can prove that this must be the case given beliefs, but we need to learn how to derive these beliefs ﬁrst).
Requirement 2 (Sequential Rationality). Given their beliefs, the players’ strategies must be sequentially rational. That is, at each information set the actions by the players must form a Nash equilibrium in the continuation game.
This is now sufficient to rule out the implausible Nash equilibria in the three examples we have seen. Consider Selten's game. As we saw above, player 2's unique sequentially rational strategy is L (and it forms a Nash equilibrium in the trivial decision game at her information set). Therefore, σ2∗(L) = 1 in equilibrium. Since choosing D is sequentially rational for player 1 for any σ2(L) > 2/3, this implies that D is the unique equilibrium sequentially rational strategy for player 1. We conclude that the only Nash equilibrium that involves sequentially rational strategies is (D, L). In other words, we ruled out all implausible equilibria.
Let's apply this to the game in Fig. 3 (p. 4). As we have seen, the expected utility from playing L is 2 − p and the expected utility from playing R is 1 − p. Since 2 − p > 1 − p for all values of p, Requirement 2 prevents player 2 from choosing R. Simply requiring player 2 to have beliefs and act optimally given these beliefs eliminates the implausible equilibrium (D, R).
Finally, as we have seen in the game from Fig. 5 (p. 5), the only sequentially rational strategy for player 2 is Y. Hence, Requirement 2 eliminates the implausible equilibrium NN, N there as well.
Let’s now look at a more interesting example in Fig. 7 (p. 10). This is the same game as the one in Fig. 5 (p. 5) but with a slight variation in the payoﬀs. This actually looks a lot more like a gift-giving game. In this situation, player 2 strictly prefers not to accept the Star Trek book, and is indiﬀerent between rejecting gifts regardless of what they are. Accepting the preferred gift is the best outcome. Player 1, in turn, still wants his gift accepted but he also prefers to please player 2. Even then, he’d rather have her accept even the gift she does not like than suﬀer the humiliation of rejection.
Player 2's expected payoff from Y is q(1) + (1 − q)(−1) = 2q − 1 and her expected payoff from N is q(0) + (1 − q)(0) = 0. Therefore, player 2 will accept whenever 2q − 1 > 0, or q > 1/2, reject whenever q < 1/2, and be indifferent between accepting and rejecting whenever q = 1/2. Her sequentially rational response now depends on her beliefs, but we still don't know where these beliefs come from. How do we determine what q is going to be?
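The belief-dependent best response can be written down directly. A sketch (the function name and set-of-actions return convention are illustrative; the expected payoffs E[Y] = 2q − 1 and E[N] = 0 are the ones just derived):

```python
from fractions import Fraction as F

def best_response(q):
    """Player 2's sequentially rational action(s) in the Fig. 7 gift game,
    given her belief q that the gift is the preferred book."""
    e_y = 2 * q - 1  # E[Y] = 2q - 1; E[N] = 0
    if e_y > 0:
        return {"Y"}
    if e_y < 0:
        return {"N"}
    return {"Y", "N"}  # indifferent exactly at q = 1/2

assert best_response(F(3, 4)) == {"Y"}
assert best_response(F(1, 4)) == {"N"}
assert best_response(F(1, 2)) == {"Y", "N"}
```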
Requirements 1 and 2 ensure that players have beliefs and behave optimally given these beliefs. However, they do not require that these beliefs be reasonable. We now investigate how players can form these beliefs in a reasonable way.
2.3 Consistent Beliefs
In equilibrium, player 2’s updated belief must be consistent with the probability distribution induced by any chance events and player 1’s strategy. That is, we require that player 2 update her prior using information consistent with what she thinks player 1’s strategy is. For example, in the gift-giving game, if player 2 thinks player 1 is playing the strategy NG, then if player 2 ever observes a gift oﬀer, she must conclude that the book is the Star Trek manual, and therefore q = 0.
Recall that for a given equilibrium, an information set is on the equilibrium path if it is reached with positive probability if the game is played according to the equilibrium strategies, and is oﬀ the equilibrium path if it is certain that it cannot be reached.
Consider the NN, N equilibrium in the game from Fig. 5 (p. 5). If the game is played according to these strategies, then player 2's information set is never reached, and so it is off the equilibrium path. If, however, player 1 plays G with positive probability at at least one of his information sets (e.g., he mixes when the book is the game theory reference, offering the gift with probability 1/2), then the information set is on the equilibrium path because there is some positive probability that it will actually be reached if the game is played according to these strategies.
Along the equilibrium path, beliefs can be easily updated via Bayes rule from the strategies.
That is, given player 2’s prior belief and player 1’s strategy, player 2 can observe player 1’s action and form a posterior belief; i.e. a belief conditional on the observable action of player
1. The belief gives an intuitive answer to the question: given that the information set h is reached, what is the probability of being at a particular decision node x ∈ h? That probability is

π(x) = P(x|σ) / Σ_{x′∈h} P(x′|σ),

which we got from Bayes rule.
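The Bayes-rule computation is mechanical. A minimal sketch (the function name is illustrative, and the mixing probability 1/4 for the Star Trek book is a hypothetical number, not a value from the text):

```python
from fractions import Fraction as F

def consistent_beliefs(reach_probs):
    """Bayes rule over an information set: pi(x) = P(x | sigma) divided by
    the sum of P(x' | sigma) over all nodes x' in the set."""
    total = sum(reach_probs.values())
    if total == 0:
        raise ValueError("information set is off the path of play")
    return {x: p / total for x, p in reach_probs.items()}

# Gift-game illustration: nature picks each book with probability 1/2; suppose
# player 1 offers the game theory book for sure and the Star Trek book with
# probability 1/4. Player 2's belief after seeing an offer:
q = consistent_beliefs({"game theory": F(1, 2) * 1,
                        "star trek": F(1, 2) * F(1, 4)})
print(q["game theory"])  # 4/5
```

Note that the function deliberately refuses to divide by zero: when the information set is off the path of play, Bayes rule pins down nothing, which is exactly why the requirement below says "whenever possible."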
Similarly, suppose there is a mixed strategy equilibrium in the game in Fig. 3 (p. 4) in which player 1 chooses U with probability q, puts probability r on the other action leading into player 2's information set, and chooses D with probability 1 − q − r. In this case, player 2's belief at the left node will be p = q/(q + r). More generally, we have the following requirement.
Requirement 3 (Weak Consistency). Beliefs must be weakly consistent with strategies.
That is, they are obtained from strategies and observed actions via Bayes rule whenever possible.
With the ideas of consistent beliefs and sequentially rational strategies, we have the following important result.
Theorem 2. Suppose σ is an equilibrium in behavior strategies in an extensive form game with perfect recall.
Let h ∈ H be an information set that occurs with positive probability under σ, and let π be a vector of beliefs that is weakly consistent with σ. Then σ is sequentially rational for every player i at the information set h given the beliefs π.
That is, at information sets that are visited with positive probability in an equilibrium, the equilibrium strategies are always sequentially rational. This means that if all information sets are reached with positive probability in some Nash equilibrium, then the equilibrium strategies must be sequentially rational. This implies that in these cases, nothing is lost by the strategic form: The actions speciﬁed by the Nash equilibrium strategy proﬁle for the strategic form representation would still be sequentially rational to carry out in the extensive form as well. This should not be too surprising: the examples we gave had problems because the incredible actions occurred at information sets oﬀ the equilibrium path.
To illustrate the result of the theorem, recall Myerson’s card game whose solution expressed in behavior strategies is:
((2/3[F], 1/3[R]), r), (2/3[m], 1/3[p]).
All information sets are reached with positive probability in this strategy proﬁle, and so the theorem tells us that these strategies must be sequentially rational. Let’s verify that this is the case and check that each player would actually want to choose the actions speciﬁed by the strategies if the moves were not planned in advance.
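Before walking through the information sets, here is a sketch of the check for player 2, assuming the standard zero-sum stakes of the card game (meeting a raise wins or loses 2, passing after a raise loses 1; these numbers are an assumption consistent with the payoffs quoted for player 1 below):

```python
from fractions import Fraction as F

prior = F(1, 2)   # each card is equally likely
bluff = F(1, 3)   # player 1 raises with the losing card with probability 1/3

# Bayes rule at player 2's information set, which is reached only after a raise:
p_win = prior * 1 / (prior * 1 + prior * bluff)  # belief: player 1 holds the winner

# Assumed stakes: meeting loses 2 against the winning card, wins 2 against
# the losing card; passing loses 1 either way.
e_meet = p_win * (-2) + (1 - p_win) * 2
e_pass = F(-1)

print(p_win, e_meet, e_pass)  # 3/4 -1 -1
```

Player 2 is exactly indifferent between meeting and passing, which is precisely what makes mixing between m and p sequentially rational at her information set.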
Consider information set c for player 1 (chance has chosen the winning red card). If he raises, he would get at least 1 (if player 2 passes) and at most 2 (if player 2 meets).