# «Contents. 1 Nash Equilibrium in Extensive Form Games 2 1.1 Selten’s Game......... 1.2 The Little Horsey....... 1.3 Giving Gifts.. ...»

This now means that if you use the multiple-deviation strategy up to that point and follow the original OSDP strategy from that point on, you would get at least as good a payoﬀ (again, because the last deviation could not have been the proﬁtable one, so the original OSDP strategy will do at least as good in that subgame). You then go up one step to the new “last” deviation and argue that this deviation cannot be proﬁtable either: since we are comparing a subgame with this deviation and the original OSDP strategy to follow with the OSDP strategy itself, the fact that the original strategy satisﬁes OSDP implies that this particular deviation cannot be proﬁtable. Hence, we can replace this deviation with the action from the OSDP strategy too and obtain at least as good a payoﬀ as the multi-deviation strategy. You repeat this process until you reach the ﬁrst stage with a deviation and you reach the contradiction because this deviation cannot be proﬁtable by itself either. In other words, if a strategy satisﬁes OSDP, it must be subgame perfect.

An example here may be helpful. Since in equilibrium we hold all other players strategies constant when we check for proﬁtable deviations, the diagram in Fig. 22 (p. 36) omits the strategies for the other players and shows only player 1’s moves at his information sets.

Label the information sets consecutively with small Roman numerals for ease of exposition.

Suppose that the strategy (adegi) satisﬁes OSDP. We want to show that there will be no more proﬁtable other strategies even if they involve multiple deviations from this one. To make the illustration even more helpful, I have bolded the actions speciﬁed by the OSDP strategy.

i

Because (adegi) satisﬁes OSDP, we can infer certain things about the ordering of the payoﬀs. For example, OSDP implies that changing from g to h at (iv) cannot be proﬁtable, which implies u ≥ v. Also, at (v), changing from i to j cannot be proﬁtable, so y ≥ z. At (ii), changing to c cannot be proﬁtable; since the strategy speciﬁes playing g at (iv), this deviation leads to w ≥ u. At (iii), changing to f cannot be proﬁtable. Since the original strategy speciﬁes i at (v), this deviation will lead to x ≥ y. Finally, at (i) changing to b cannot be proﬁtable. Since the original strategy speciﬁes e at (iii), this deviation will lead to w ≥ x. The implications of

**OSDP are listed as follows:**

These inequalities now imply some further relationships: from the ﬁrst and the third, we get w ≥ y, and putting this together with the last yields w ≥ z as well. Furthermore, from the third and last we obtain x ≥ z, and from the second and fourth we obtain w ≥ v. Putting

**everything together yields the following orderings of the payoﬀs:**

w≥x≥y ≥z w ≥ u ≥ v.

and We can now check whether there exist any proﬁtable multi-stage deviations. (Obviously, there will be no single-stage proﬁtable deviations because the strategy satisﬁes OSDP.) Take, for example, an alternative strategy that deviates at (ii) and (iv); that is in the subgame starting at (ii), it speciﬁes ch. This will lead to the outcome v, which cannot improve on w, the outcome from following the original strategy. Consider another alternative strategy which deviates twice in the subgame starting at (iii); i.e., it prescribes f j, which would lead to the outcome z. This cannot improve on x, the outcome the player would get from following the original strategy. Going up to (i), consider a strategy that deviates at (i) and (v). That is, it prescribes b and j at these information sets. Since (v) is still never reached, this actually boils down to a one-shot deviation with the outcome x, which (not surprisingly) cannot improve on w, which is what the player can get from following the original strategy. What if he deviated at (i) and (iii) instead? This would lead to y, which is also no better than w. What if he deviated at (i), (iii), and (v)? This would lead to z, which is also no better than w. Since all other deviations that start at (i) leave (ii) and (iv) oﬀ the path of play, there is no need to consider them. This example then shows how OSDP implies subgame-perfection. Intuitively, if a strategy satisﬁes OSDP, then it implies a certain preference ordering, which in turn ensures that no multi-stage deviations will be proﬁtable.

To see how the proof would work here. Take the longest deviation, e.g., a strategy that deviates at (i), (iii), and (v). Since it leaves (ii) and (iv) oﬀ the path, let’s consider (bdf gj) as such a supposedly better alternative. Observe now that because (adegi) satisﬁes OSDP, the deviation to j at (v) cannot be improving. This means that the strategy (bdf gi) is at least as good as (bdf gj). Hence, if (bdf gj) is better than the original, then (bdf gi) must also be better. Consider now (bdf gi): since it matches the original at (v), OSDP implies that the deviation to f cannot be improving. Hence, the strategy (bdegi) is at least as good as (bdf gi), which implies it is also at least as good as (bdf gj). Hence, if (bdf gj) is better than the original, then (bdegi) must also be better. However, (bdegi) matches the

Let’s now check if player 1’s strategy is indeed subgame perfect. Recall that this requires that it is optimal for all subgames. This is easy to see for the subgame that begins with player 1’s second information set that follows history (b, d). How about player 1’s choice at the ﬁrst information set? If we were to examine all possible deviations, we must check the alternative strategies (ae), (af ), and (be) because these are the other strategies available for that subgame. The one-shot deviation principle allows us to check just one thing: whether player 1 can beneﬁt by deviating from b to a at his ﬁrst information set! (This is why the OSDP is your friend.) In this case, deviating to a would get player 1 a payoﬀ of 1 instead of 3, which he would get if he stuck to his equilibrium strategy. Therefore, this deviation is not proﬁtable. We already saw that deviating to e in the subgame that begins with player 1’s second information set is not proﬁtable either. Therefore, by the OSDP, the strategy is subgame perfect.

This proof does not work for games with inﬁnite horizon because the essential step in it requires that there is a ﬁnite number of possible deviations. However, in a game with inﬁnite horizon, there are strategies with an inﬁnite number of deviations. Fortunately, if we assume that the payoﬀ function takes the form of a discounted sum of per-period payoﬀs, we can get the result “back.” Again fortunately, all inﬁnite-horizon games that we shall look at will have payoﬀ functions that meet this requirement. The OSDP may not be that helpful in ﬁnding SPE. However, it’s extremely useful for checking whether some proﬁle is SPE.

**4.5 The Rubinstein Bargaining Model**

There are at least two basic ways one can approach the bargaining problem. (The bargaining problem refers to how people would divide some ﬁnite beneﬁt among themselves.) Nash initiated the axiomatic approach with his Nash Bargaining Solution (he did not call it that, of course). This involves postulating some desirable characteristics that the distribution must meet and then determining whether there is a solution that meets these requirements. This is very prominent in economics but we shall not deal with it here.

Instead, we shall look at strategic bargaining. Unlike the axiomatic solution, this approach involves specifying the bargaining protocol (i.e. who gets to make oﬀers, who gets to respond to oﬀers, and when) and then solving the resulting extensive form game.

People began analyzing simple two-stage games (e.g. ultimatum game where one player makes an oﬀer and the other gets to accept or reject it) to gain insight into the dynamics of bargaining. Slowly they moved to more complicated settings where one player makes all the oﬀers while the other accepts or rejects, with no limit to the number of oﬀers that can be made. The most attractive protocol is the alternating-oﬀers protocol where players take turns making oﬀers and responding to the other player’s last oﬀer.

The alternating-oﬀers game was made famous by Ariel Rubinstein in 1982 when he published a paper showing that while this game has inﬁnitely many Nash equilibria (with any division supportable in equilibrium), it had a unique subgame perfect equilibrium! Now this is a great result and since it is the foundation of most contemporary literature on strategic bargaining, we shall explore it in some detail.15

**4.5.1 The Basic Alternating-Oﬀers Model**

Two players, A and B, bargain over a partition of a pie of size π 0 according to the following procedure. At time t = 0 player A makes an oﬀer to player B about how to partition the pie.

If player B accepts the oﬀer, then an agreement is made and they divide the pie accordingly, ending the game. If player B rejects the oﬀer, then she makes a counteroﬀer at time t = 1. If the counteroﬀer is accepted by player A, the players divide the pie accordingly and the game ends. If player A rejects the oﬀer, then he makes a counter-counteroﬀer at time t = 2. This process of alternating oﬀers and counteroﬀers continues until some player accepts an oﬀer.

To make the above a little more precise, we describe the model formally. Two players, A and B, make oﬀers at discrete points in time indexed by t = (0, 1, 2,...). At time t when t is even (i.e. t = 0, 2, 4,...) player A oﬀers x ∈ [0, π ] where x is the share of the pie A would keep and π − x is the share B would keep in case of an agreement. If B accepts the oﬀer, the division of the pie is (x, π − x). If player B rejects the oﬀer, then at time t + 1 she makes a counteroﬀer y ∈ [0, π ]. If player A accepts the oﬀer, the division (π − y, y) obtains.

Generally, we shall specify a proposal as an ordered pair, with the ﬁrst number representing player A’s share. Since this share uniquely determines player B’s share (and vice versa) each proposal can be uniquely characterized by the share the proposer oﬀers to keep for himself.

The payoﬀs are as follows. While players disagree, neither receives anything (which means that if they perpetually disagree then each player’s payoﬀ is zero). If some player agrees on a partition (x, π −x) at some time t, player A’s payoﬀ is δt x and player B’s payoﬀ is δt (π −x).

The players discount the future with a common discount factor δ ∈ (0, 1). This captures the time preferences of the players: it is better to obtain an agreement sooner than later.

Here’s how it works. Suppose agreement is reached on a partition (2, π − 2) at some time t. Player A’s payoﬀ is 2δt. Since 0 δ 1, as t increases, δt decreases. Let δ =.9 (so a dollar tomorrow is only worth 90 cents today). If this agreement is reached at t = 0, player A’s payoﬀ (2)(.9)0 = 2. If the agreement is reached at t = 1, player A’s payoﬀ is (2)(.9)1 = 1.8. If it happens at t = 2, player A’s payoﬀ is (2)(.9)2 = 1.62. At t = 10, the payoﬀ is (2)(.9)10 ≈.697, at t = 100, it is only (2)(.9)100 =.000053, and so on and so forth.

The point is that the further in the future a player gets some share, the less attractive this same share is compared to getting it sooner.

The Rubinstein bargaining model is extremely attractive because it can be easily modiﬁed, adapted, and extended to various settings. There is signiﬁcant cottage industry that does just that. The Muthoo (1999) book gives an excellent overview of the most important developments. The discussion that follows is taken almost verbatim from Muthoo’s book. If you are serious about studying bargaining, you should deﬁnitely get this book.

Yours truly also tends to use variations of the Rubinstein model in his own work on intrawar negotiations.

This completes the formal description of the game. You can draw the extensive form tree for several periods, but since the game is not ﬁnite (there’s an inﬁnite number of possible oﬀers at each information set and the longest terminal history is inﬁnite—the one where players always reject oﬀers), we cannot draw the entire tree.

**4.5.2 Nash Equilibria**

Let’s ﬁnd the Nash equilibria in pure strategies for this game. Actually, we cannot ﬁnd all Nash equilibria because there’s an inﬁnite number of those. What we can do, however, is characterize the payoﬀs that players can get in equilibrium. It turns out that any division of the pie can be supported in some Nash equilibrium. To see this, consider the strategies where player A demands x ∈ (0, π ) in the ﬁrst period, then π in each subsequent period where he gets to make an oﬀer, and always rejects all oﬀers. This is a valid strategy for the bargaining game. Given this strategy, player B does strictly better by accepting x instead of rejecting forever, so she accepts the initial oﬀer and rejects all subsequent oﬀers. Given that B accepts the oﬀer, player A’s strategy is optimal.

The problem, of course, is that Nash equilibrium requires strategies to be mutually best responses only along the equilibrium path. It is just not reasonable to suppose that player A can credibly commit to rejecting all oﬀers regardless of what player B does. To see this, suppose at some time t 0, player B oﬀers y π to player A. According to the Nash equilibrium strategy, player A would reject this (and all subsequent oﬀers) which yields a payoﬀ of 0. But player A can do strictly better by accepting π − y 0! The Nash equilibrium is not subgame perfect because player A cannot credibly threaten to reject all oﬀers.

**4.5.3 The Unique Subgame Perfect Equilibrium**

Since this is an inﬁnite horizon game, we cannot use backward induction to solve it. However, since every subgame that begins with an oﬀer by some player is structurally identical with all subgames that begin with an oﬀer by that player, we shall look for an equilibrium with two intuitive properties: (1) no delay: whenever a player has to make an oﬀer, the equilibrium oﬀer is immediately accepted by the other player; and (2) stationarity: in equilibrium, a player always makes the same oﬀer.