
Econometrica, Vol. 68, No. 3 (May, 2000), 605–641

REINFORCEMENT-BASED VS. BELIEF-BASED LEARNING MODELS IN EXPERIMENTAL ASYMMETRIC-INFORMATION GAMES

BY NICK FELTOVICH¹

This paper examines the abilities of learning models to describe subject behavior in

experiments. A new experiment involving multistage asymmetric-information games is

conducted, and the experimental data are compared with the predictions of Nash

equilibrium and two types of learning model: a reinforcement-based model similar to that

used by Roth and Erev (1995), and belief-based models similar to the “cautious fictitious play” of Fudenberg and Levine (1995, 1998). These models make predictions that are qualitatively similar: cycling around the Nash equilibrium that is much more apparent

than movement toward it. While subject behavior is not adequately described by Nash

equilibrium, it is consistent with the qualitative predictions of the learning models. We

examine several criteria for quantitatively comparing the predictions of alternative models. According to almost all of these criteria, both types of learning model outperform

Nash equilibrium. According to some criteria, the reinforcement-based model performs

better than any version of the belief-based model; according to others, there exist versions

of the belief-based model that outperform the reinforcement-based model. The abilities

of these models are further tested with respect to the results of other published

experiments. The relative performance of the two learning models depends on the

experiment, and varies according to which criterion of success is used. Again, both models

perform better than equilibrium in most cases.

KEYWORDS: Equilibrium, asymmetric information, zero-sum game, learning, calibration, model comparison.

1. INTRODUCTION

LEARNING MODELS ARE BECOMING more and more widely used by experimental

economists as alternatives to the equilibrium predictions of game theory. A

number of researchers have presented results demonstrating that, in many cases,

learning models are better able to describe and predict experimental results

than static Nash equilibrium. The models that have been proposed vary widely

in the way they suppose learning occurs, and they often differ to some extent in

their predictions. Many fall into one of two broad classes: belief-based models

and reinforcement-based models. Belief-based models have in common the

assumption that players hold beliefs over the likely play of others, and they

choose their own actions based in some way on their expected payoff given these

beliefs. Reinforcement-based models, on the other hand, do not explicitly require players to form beliefs about other players’ likely actions (in fact, they may not even require players to realize that there are other players). Rather, their strategies receive reinforcement related to the payoffs they earn, and over time players adjust their play so that strategies leading to higher payoffs become more likely.

¹ Part of this research was done while the author was supported by a University of Pittsburgh Andrew Mellon predoctoral fellowship. I thank David Cooper, John Duffy, Ido Erev, John Kagel, John Miller, Jack Ochs, Florenz Plassman, Bob Slonim, Nathaniel Wilcox, three anonymous referees, and especially an editor, Alvin Roth, and Shmuel Zamir for helpful discussions and comments. Any remaining errors are actually part of the equilibrium strategy of a more complex game.

The objective of this paper is to examine the abilities of reinforcement-based

and belief-based learning models to characterize the play of subjects in experiments. We accomplish this objective in two ways. First, we conduct a new experiment and compare the observed subject behavior to the predictions of these two classes of learning model, as well as Nash equilibrium, using several criteria of goodness of fit. Second, we use these same criteria to compare data

from previous experiments by other researchers to the predictions of these

models.

The new experiment uses members of a class of multistage two-player constant-sum games (described in Section 2), whose strategic structure involves

the optimal use of private information. Besides being a severe test of the

assumptions of Nash equilibrium, this environment seems a natural one for

eliciting belief-based learning. Both types of learning model can capture some of

the subtle strategic interplay essential to games of this type. In addition, the

learning models give rise to dynamics which behave quite differently from

equilibrium play, and to some extent, from each other. Play in the experiment is

described poorly by stationary Nash equilibrium; in both games, play starts and

remains far away from the equilibrium outcome, even after many repetitions. In

contrast, given the behavior of subjects during early rounds of the experiment,

both types of learning model describe well some gross qualitative features of the

changes in subject behavior from early rounds to later rounds. We quantitatively

compare the models using several criteria of goodness of fit. Which learning model best characterizes the experimental data depends on which criterion is used. The belief-based models (for some parameterizations) describe aggregate

behavior better, while the reinforcement-based model makes more accurate

predictions of behavior at the individual level. However, according to almost all

of our criteria, both types of learning model perform substantially better than

Nash equilibrium; the improvement achieved by moving from one learning

model to another is small relative to that achieved by moving from equilibrium

to either type of learning model.

This paper is not the first attempt to compare learning models. Boylan and El-Gamal (1993) compared the predictions of two belief-based models (Cournot play and fictitious play) applied to two earlier experiments using three 15×15 simultaneous-move games and six 3×3 simultaneous-move games. They found that for some games, play was more likely to be consistent with Cournot learning, while for other games, play was more likely to be consistent with fictitious play learning. However, when the entire set of experimental data was considered (using a Bayesian updating procedure), fictitious play was overwhelmingly more likely to be the correct model. Cheung and Friedman (1997) also compared different belief-based models; in addition to Cournot play and fictitious play, they considered a hybrid model that included these two models as special cases, and further generalized the model with two additional parameters. (The resulting model is very similar to the belief-based models we use in this paper.) Cheung and Friedman performed an experiment using four 2×2 games and several experimental procedures, and found that there is substantial heterogeneity in learning behavior across experimental subjects. The play of some subjects was consistent with Cournot, the play of others was consistent with fictitious play, and the play of yet others was consistent with neither, but with the more general model. Like Boylan and El-Gamal, Cheung and Friedman found that both the game and the experimental procedures used affected the type of learning that took place, in the sense that the distribution of model parameters varied across games (though one parameter changed very little from game to game). Sarin and Vahid (1999) used data from several experiments (the same experiments as used by Erev and Roth (1998), described below) to compare two reinforcement-based models: the “fixed reference point” model of Erev and Roth (1998), and their own “SV” model, which assumes that an agent chooses the strategy that maximizes payoff, given the agent’s beliefs about the likely payoff to each strategy, which are a weighted average of past payoffs to each strategy. Sarin and Vahid found that the SV model describes play at least as well as, and sometimes better than, the fixed reference point model.

Recently, some researchers have compared not just different learning models, but different types of learning model. Mookherjee and Sopher (1994) compared the predictions of a reinforcement-based model to those of a belief-based model (fictitious play). They used data from a Matching Pennies experiment, which they performed with two information treatments: one in which players were told only their own payoffs at the end of a round, and one in which they were told the actual payoff matrix and their opponent’s choices, as well as their own payoffs. The former treatment gives subjects enough information for reinforcement learning, but not enough for belief learning, and thus seems tailor-made for reinforcement learning; indeed, observed play was consistent with the reinforcement-based model. The latter treatment gives subjects enough information for belief learning, but neither the belief-based model nor the reinforcement-based model described play well. Mookherjee and Sopher (1997) also compared reinforcement-based and belief-based models, this time with regard to data from a new experiment which used two 4×4 and two 6×6 games. When these slightly more complex games were used, they found that the reinforcement-based model did a better job of describing play than the belief-based model, even though subjects were given enough information for belief learning.

A possible shortcoming of Mookherjee and Sopher’s work is that, though they compared many different learning models, they used only the data from their own experiments.² Thus the generality of their results is open to question. Two recent papers have used data from several experiments, performed by different researchers under different sets of experimental procedures. Erev and Roth (1998) compared the ability of reinforcement-based models and belief-based models to characterize aggregate behavior in several one-stage simultaneous-move games with unique equilibria in completely mixed strategies. They found that the reinforcement-based models describe play better than the belief-based models, and that the better belief-based models are the ones that are more similar to reinforcement-based models. Camerer and Ho (1996, 1999) considered a general model that includes both belief-based and reinforcement-based models as special cases, and used several one-stage simultaneous-move games. By fitting model parameters via maximum likelihood estimation, they concluded that versions of their model that combine elements of belief-based and reinforcement-based learning perform much better than either pure belief-based models or pure reinforcement-based models.

² The advantage of using data from many experiments has been pointed out by Erev and Roth (1998).

These papers have shed light on the nature of learning in games, but they are not without their shortcomings. Erev and Roth considered only a handful of belief-based models chosen ex ante, so the fact that they found a reinforcement-based model to work well may merely reflect “bad luck” in their choice of belief-based model specification.³ (They perform sensitivity analyses on the reinforcement-based model, so it is unlikely that their results are due to particularly “good luck” in their choice of reinforcement-based model specification.) On the other hand, Camerer and Ho’s model is so general that there may exist a parameterization that is consistent with any experimental outcome; moreover, when they find that a particular combination of parameters best fits a set of experimental data, questions come to mind concerning how sensitive these best parameters are to small changes in the game, whether one could predict which parameter values are appropriate for which games, and so on. Even within their sample of games, they found that the best parameters vary greatly. Additionally, all of the work cited above considered only a narrow class of games (those in which players’ strategies are single actions), so it is unclear how generalizable the conclusions are to more complex games. The asymmetric-information games used in our new experiment address this last limitation; they are more complex than the simultaneous-move games studied by Erev and Roth and by Camerer and Ho. Also, we look at a two-parameter family of belief-based models, which is part of the class proposed by Fudenberg and Levine (1995, 1998); this family of models is more general than that used by Erev and Roth, but not as general as that used by Camerer and Ho.

The results of the second part of this paper reconcile somewhat the seemingly different results of Erev-Roth and Camerer-Ho. We revisit some of the data sets they considered, and apply the same criteria we used to compare the learning models’ predictions regarding the asymmetric-information game. Some of our criteria are similar in spirit to those used by Erev and Roth, and some are similar to those used by Camerer and Ho. No criterion is used by both, however, so even though they appeared to reach different conclusions (particularly concerning the utility of reinforcement-based models), it was unclear whether this was because they used different games or different criteria of goodness of fit. We find that both are important; the relative abilities of the learning models to characterize experimental data sets depend on both the game used and the criterion of goodness of fit. Thus neither type of model is always better. However, as was true when the asymmetric-information game data were used, both types of learning model usually represent a substantial improvement over static Nash equilibrium.

³ To be fair, it should be pointed out that they make no claim to have found the “correct” model of learning, merely that the reinforcement-based model does a good job of characterizing certain features of the data.

2. THE NEW EXPERIMENT

The new experiment uses a class of games from Aumann and Maschler (1995); see Figure 1. First, nature chooses one of two payoff matrices; the Left matrix is chosen with probability p ∈ (0, 1). The chosen matrix determines a constant-sum stage game, which is played twice.⁴ We denote by stage one play of the stage game, and by round one play of the entire (two-stage) game. Player 1 (the row player) is told at the beginning of the round which matrix was chosen; Player 2 (the column player) is not told until the end of the round. A player’s payoff for the round is equal to the sum of the two stage-game payoffs, calculated according to the matrix that was chosen. Players’ actions are announced at the end of each stage, but payoffs are not announced until the end of the round. We refer to members of this class of games as G(p) for given p. We will deal with two particular games: G(.50) and G(.34).

FIGURE 1. The component game.

⁴ A constant-sum game is used in an attempt to keep subjects’ preferences as closely tied as possible to the payoffs of the game; we try to limit the influence of subjects’ tastes for nonpecuniary aspects of outcomes, such as fairness or spite. (A further step in this direction is our repeated use of loaded words such as “opponent” in the instructions to subjects.) These steps are taken in order to maximize the likelihood that any lack of equilibrium play in the experiment is not due to subjects’ unwillingness to play what we think to be their equilibrium strategies, but rather their inability to figure out what they are (or belief that their opponents are unable to figure out what they are, or belief that their opponents believe that they are unable to figure out what they are, etc.). This is important when studying learning, because some researchers have shown that static concepts that generalize Nash equilibrium to allow for players’ tastes for such nonpecuniary aspects can often describe experimental subject behavior well. Some examples of such generalizations are Rabin’s (1993) fairness equilibrium, Fehr and Schmidt’s (1999) fairness theory, Bolton and Ockenfels’s (1998) theory of equity, reciprocity, and competition, and Levine’s (1998) theory of altruism.

2.1. Strategies and Notation

We will restrict our attention to two components of players’ behavioral strategies: Player 1’s first-stage move (conditional on nature’s move) and Player 2’s second-stage move (conditional on Player 1’s first-stage move). It is in these components that the complexity of the game lies. To see why, recall that Player 1 has a piece of private information: the actual payoff matrix. It can be seen from Figure 1 that in either matrix, one of her actions (A in the Left matrix and B in the Right matrix) can possibly earn her a payoff of one for the stage (depending on Player 2’s choice of action), while the other action gives her zero for the stage with certainty. We will use the term stage-dominant action (sda) to refer to the former, since these actions correspond to Player 1’s weakly dominant strategy in the one-stage analogue to this game. Since the sda depends on which matrix was chosen by nature, Player 1’s private information is potentially valuable in the sense that she may use it to earn a payoff higher than 2p(1 − p), the maxmin expected payoff she would receive if she did not know the payoff matrix. However, the only way Player 1 can benefit from her private information is by revealing it, that is, by playing in the Left game differently (possibly in a stochastic way) from her play in the Right game. Player 1 completely reveals by playing the sda in the first stage with certainty, that is, by playing A if the Left game is chosen and B if the Right game is chosen. She partially reveals by playing A in the first stage with higher probability in the Left game than in the Right game. It may seem that completely revealing is the best course of action for Player 1, since it gives her the best chance of earning the point in the first stage. But since her first-stage actions are observable by Player 2 before he chooses his second-stage actions, Player 2 would then be able to infer the payoff matrix from Player 1’s first-stage action, and could hold Player 1 to a payoff of zero in the second stage by playing B in the Left game and A in the Right game; this set of actions is the best response to Player 1’s completely revealing strategy (brcr). Player 2’s playing the brcr against a completely revealing Player 1, along with suitable play in the first stage, limits Player 1 to an expected payoff of Min{p, 1 − p}, making her no better off, and for almost all p strictly worse off, than if she hadn’t had the information at all. Even if Player 1 is only

partially revealing, Player 2 can use Bayesian updating to revise his prior beliefs about which matrix was chosen and lower Player 1’s expected payoff. Player 1’s primary strategic problem is to choose her first-stage action to balance the current (first-stage) gain from using her extra information against the future (second-stage) loss caused by revealing it. Player 2’s main strategic problem is to choose his second-stage action to take into account his best inference of which matrix was chosen, given his observation of Player 1’s behavior. These problems are closely related.

TABLE I
NOTATION

Symbol        Meaning: “Probability of ...”
P(sda₁|L)     stage-dominant action in the first stage of the Left matrix (Player 1)
P(sda₁|R)     stage-dominant action in the first stage of the Right matrix (Player 1)
P(brcr|A)     best response to completely revealing following a Player 1 choice of A (Player 2)
P(brcr|B)     best response to completely revealing following a Player 1 choice of B (Player 2)
P(sda₁)       stage-dominant action in the first stage (Player 1)
P(brcr)       best response to completely revealing (Player 2)
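The two payoff benchmarks discussed above can be compared numerically. The following is a minimal sketch, assuming only the two expressions given in the text, 2p(1 − p) and Min{p, 1 − p}; the function names are hypothetical and are introduced here for illustration.

```python
def uninformed_maxmin(p: float) -> float:
    """Maxmin expected payoff 2p(1 - p) for a Player 1 who does not know the matrix."""
    return 2 * p * (1 - p)


def fully_revealing_bound(p: float) -> float:
    """Expected payoff Min{p, 1 - p} when Player 1 completely reveals and Player 2 plays the brcr."""
    return min(p, 1 - p)


# Compare the two benchmarks for the two games used in the experiment.
for p in (0.50, 0.34):
    print(f"p = {p:.2f}: uninformed = {uninformed_maxmin(p):.4f}, "
          f"completely revealing = {fully_revealing_bound(p):.4f}")
```

Under these expressions the two benchmarks coincide at p = .50 and the completely revealing bound is strictly lower at p = .34, which matches the statement that complete revelation leaves Player 1 no better off, and for almost all p strictly worse off.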

Table I shows some of the notation used to describe players’ behavioral strategies, and Table II summarizes the Nash equilibrium predictions for the two games. (A full description of the Nash equilibria of G(p) for any p ∈ (0, 1) is given by Feltovich (1997).) The prediction for G(.50) is not unique; there is a one-parameter family of Player 2 strategies, each of which is consistent with equilibrium play. Also, the equilibrium value of P(brcr) depends on Player 1’s choices of P(sda₁|L) and P(sda₁|R) and is generally not equal to β. In both games, P(sda₁|L) and P(sda₁|R) are strictly less than one, reflecting Player 1’s strategic problem: how quickly to reveal her private information. (In the second stage, Player 1 optimally plays her sda with probability one, reflecting the fact that at that point, there is no harm to her in revealing all her information.) In G(.50), Player 1’s equilibrium strategy has her essentially ignoring the private information in the first stage, while in G(.34), she optimally partially reveals her private information.
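As a quick arithmetic check on the notation, the unconditional probability P(sda₁) should equal the average of the two conditional probabilities, weighted by nature’s choice of matrix. The sketch below assumes that the G(.34) row of Table II reads P(sda₁|L) = .971 and P(sda₁|R) = .500 (a reading of the table, not a statement from the text); the function name is hypothetical.

```python
def unconditional_sda(p: float, sda_given_left: float, sda_given_right: float) -> float:
    """P(sda1) = p * P(sda1|L) + (1 - p) * P(sda1|R), by the law of total probability."""
    return p * sda_given_left + (1 - p) * sda_given_right


# G(.34): the Left matrix is chosen with probability .34 (assumed table reading).
p_sda = unconditional_sda(0.34, 0.971, 0.500)
print(f"P(sda1) in G(.34) = {p_sda:.3f}")
```

With these inputs the weighted average rounds to .660, the P(sda₁) value listed among the G(.34) predictions.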

TABLE II
NASH EQUILIBRIUM PREDICTIONS

                                                                                   Expected payoff
Game     P(sda₁|L)   P(sda₁|R)   P(brcr|A)   P(brcr|B)    P(sda₁)   P(brcr)          Player 1   Player 2
G(.50)   .500        .500        β           1.500 − β    .500      [.500, 1.000]    0.75       1.25
G(.34)   .971        .500        .500        1.000        .660      .670             0.67       1.33

Note: β may take on any value in [0.500, 1.000].

2.2. Experimental Design

In the experiment, one of these two-stage component games was chosen and played for 40 rounds, with players remaining in the same roles and playing against the same opponent, but with the payoff matrix (Left or Right) chosen independently each round (and p held fixed). A total of 122 subjects, mostly undergraduates at the University of Pittsburgh, participated in the experiment. Nine experimental sessions were conducted, with between eight and twenty subjects in a session. In addition to a participation fee of $10.00, subjects could earn a bonus of $10.00 if they earned a point in a round and stage that were randomly chosen after the experiment.⁵

The payment scheme used here is equivalent to the “binary lottery” scheme of Roth and Malouf (1979); in particular, the utility functions of expected utility maximizing individuals would be linear in the number of points earned. Without loss of generality, we can therefore define payoffs in the forty-times-repeated game to be the sum of the payoffs in each individual component game. Since the games are constant-sum, the corresponding repeated games are also constant-sum. The unique equilibrium of the repeated G(.34) game is the component game equilibrium repeated 40 times. Since G(.50) has a continuum of Nash equilibria, the repeated game does not have a unique Nash equilibrium; any sequence of component-game Nash equilibria (including stationary play of a particular component-game equilibrium) yields a Nash equilibrium of the repeated game.

2.3. The Reinforcement-Based Model

The equilibrium prediction for G(.50) and G(.34) is play that is stationary from round to round (the only exception being that, in G(.50), Player 2 can choose each round from a one-parameter family of equilibrium strategies). We

now examine the alternative predictions of two types of learning model.

The reinforcement-based model is adapted from the models of Roth and Erev (1995). (See Roth and Erev (1995) and Erev and Roth (1998) for additional motivation behind this model and its generalizations.) Because of its close relation to their basic model, and its lack of free parameters, it will be referred to as the RE₀ model. Roth and Erev successfully use variations of this model for predicting subject behavior in many experiments, despite (or perhaps because of) the low level of rationality the model attributes to individuals. Rather than endowing players with the high degree of cognitive sophistication implicit in equilibrium predictions, this model posits that players merely learn, over time, to play better strategies (strategies leading to higher realized payoffs) more often and worse strategies less often.

Specifically, in round $t$, players have a nonnegative initial propensity $q^t(\alpha \mid \Psi)$ for playing action $\alpha$ ($\alpha = A$ or $B$) at information set $\Psi$. The strength of propensities ($Q^t(\Psi)$) in round $t$ is the sum of the propensities for playing both actions at information set $\Psi$: $Q^t(\Psi) = q^t(A \mid \Psi) + q^t(B \mid \Psi)$. For any $t \geq 1$, propensities for round $t+1$ are found by adding the payoff earned in round $t$ to the round-$t$ propensities of the actions that were played in round $t$:

\[
q^{t+1}(\alpha \mid \Psi) =
\begin{cases}
q^t(\alpha \mid \Psi) + \pi & \text{if $\Psi$ was reached, $\alpha$ was played, and the payoff for the round was $\pi$,} \\
q^t(\alpha \mid \Psi) & \text{if $\Psi$ was not reached or $\alpha$ was not played.}
\end{cases}
\]

Thus in each round, each player will augment two propensities by the total payoff for the round.⁶ Initial (round-1) propensities are exogenous. The probability of playing strategy $\alpha$ at information set $\Psi$ in round $t$ is the corresponding propensity, divided by the strength of propensities at information set $\Psi$ in round $t$:

\[
p^t(\alpha \mid \Psi) = \frac{q^t(\alpha \mid \Psi)}{Q^t(\Psi)}.
\]

⁵ The Appendix contains the instructions given to subjects. More details about the experimental procedures are available from the author.
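The propensity update and choice rule just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the author’s simulation code: the information-set labels ("L1", "L2", "R1", "R2") and function names are hypothetical, and initial propensities of 3 per action are chosen only so that the strength at each information set is six.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (information set, action)


def strength(q: Dict[Key, float], info_set: str) -> float:
    """Q^t(Psi): sum of the propensities for actions A and B at an information set."""
    return q[(info_set, "A")] + q[(info_set, "B")]


def choice_probability(q: Dict[Key, float], info_set: str, action: str) -> float:
    """p^t(alpha | Psi) = q^t(alpha | Psi) / Q^t(Psi)."""
    return q[(info_set, action)] / strength(q, info_set)


def reinforce(q: Dict[Key, float], played: Iterable[Key], payoff: float) -> Dict[Key, float]:
    """Add the round's total payoff to every (information set, action) pair
    that was actually played; leave all other propensities unchanged."""
    played = set(played)
    return {key: value + payoff if key in played else value for key, value in q.items()}


# Hypothetical Player 1 information sets: stage 1 and stage 2 under each matrix.
q0 = {(s, a): 3.0 for s in ("L1", "L2", "R1", "R2") for a in ("A", "B")}

# Suppose the Left matrix was drawn, Player 1 played A in both stages,
# and her total payoff for the round was 1.
q1 = reinforce(q0, [("L1", "A"), ("L2", "A")], payoff=1.0)
print(choice_probability(q1, "L1", "A"))  # 4/7
```

Two propensities are augmented by the round’s total payoff, as in the text, and the probability of A at the first-stage Left information set rises from 1/2 to 4/7 while the Right information sets are untouched.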

2.4. The Belief-Based Models

The RE₀ model assumes that players make decisions according only to past payoffs from actions. Taken literally, this means that decisions are made without regard to many features of the payoff matrix and the history of opponent’s plays (though, of course, these affect the player’s payoffs and thus indirectly influence later decisions). In the experiment, however, players do know the payoff matrix and are matched to the same opponent in every round (this latter fact should increase the relative usefulness in this experiment of the history of past play in forecasting future opponent actions, compared to many experimental designs). It seems reasonable to expect that an appropriate model of learning in this experiment should take this knowledge into account. Also, the very nature of Player 2’s situation (trying to infer the payoff matrix from Player 1’s move) suggests that he ought to form beliefs about the way his opponent behaves. Therefore, in addition to the RE₀ model, we consider a two-parameter family of belief-based models adapted from the model of Fudenberg and Levine (1995, 1998).⁷

⁶ The alternative, augmenting each propensity by the payoff for the stage in which it was played, would result in bluffing (Player 1 actions that sacrifice any chance of positive payoff in the first stage in favor of a higher probability of positive payoff in the second stage) never being reinforced, even when it is successful.

⁷ Similar models have been used for describing subject behavior in experiments by, for example, Cheung and Friedman (1997) and Mookherjee and Sopher (1997).

According to these models, players hold beliefs (conjectures) concerning the likely play of their opponent(s), and they choose strategies based on their expected payoffs given these beliefs. Specifically, players’ beliefs are characterized by nonnegative belief weights over opponents’ actions at each information set. The weight on an opponent playing action $\alpha$ ($\alpha = A$ or $B$) at information set $\Psi$ in round $t$ is $\omega^t(\alpha \mid \Psi)$. The strength of beliefs at information set $\Psi$ ($\Omega^t(\Psi)$) is the sum of the weights at $\Psi$: $\Omega^t(\Psi) = \omega^t(A \mid \Psi) + \omega^t(B \mid \Psi)$. For any $t \geq 1$, weights for round $t+1$ are found by increasing the weight of each action that was observed in round $t$:

\[
\omega^{t+1}(\alpha \mid \Psi) =
\begin{cases}
(1-\delta)\,\omega^t(\alpha \mid \Psi) + 1 & \text{if $\Psi$ was reached and $\alpha$ was played by the opponent,} \\
(1-\delta)\,\omega^t(\alpha \mid \Psi) & \text{if $\Psi$ was not reached or $\alpha$ was not played by the opponent.}
\end{cases}
\]

The parameter $\delta$ determines the relative amount of bearing given to past outcomes relative to current outcomes in forming beliefs. If $\delta = 0$, outcomes in all rounds have equal import, while if $\delta = 1$, only the most recent outcome is considered. If $\delta \in (0, 1)$, more recent outcomes are more important than previous outcomes, while if $\delta < 0$, the opposite is true. Initial belief weights are exogenous. The assessed probability of the opponent’s playing action $\alpha$ at information set $\Psi$ in round $t$ is the corresponding belief weight, divided by the strength of beliefs at information set $\Psi$ in round $t$:

\[
\mu^t(\alpha \mid \Psi) = \frac{\omega^t(\alpha \mid \Psi)}{\Omega^t(\Psi)}.
\]
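The belief-weight update and the assessed probabilities can be illustrated with a short sketch. It assumes only the update rule and normalization described above; the information-set label "psi", the function names, and the initial weights of 3 per action are hypothetical choices made for the example.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (information set, opponent action)


def update_weights(w: Dict[Key, float], observed: Iterable[Key], delta: float) -> Dict[Key, float]:
    """Scale every belief weight by (1 - delta), then add 1 to the weight of each
    (information set, action) pair that was reached and played by the opponent."""
    observed = set(observed)
    return {key: (1.0 - delta) * value + (1.0 if key in observed else 0.0)
            for key, value in w.items()}


def assessed_probability(w: Dict[Key, float], info_set: str, action: str) -> float:
    """mu^t(alpha | Psi): belief weight divided by the strength of beliefs at Psi.
    Undefined (division by zero) if both weights at Psi are zero."""
    total = w[(info_set, "A")] + w[(info_set, "B")]
    return w[(info_set, action)] / total


w0 = {("psi", "A"): 3.0, ("psi", "B"): 3.0}

# The opponent is observed playing A at psi.
equal_import = update_weights(w0, [("psi", "A")], delta=0.0)     # weights 4 and 3
recency_weighted = update_weights(w0, [("psi", "A")], delta=0.5)  # weights 2.5 and 1.5

print(assessed_probability(equal_import, "psi", "A"))      # 4/7
print(assessed_probability(recency_weighted, "psi", "A"))  # 0.625
```

With δ = 0 the new observation counts the same as each unit of prior weight, while with δ = .5 the old weights are halved first, so the same observation moves the assessed probability further.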

Given these assessed probabilities, a player’s perceived expected payoff $\Pi^e(s \mid \mu^t)$ to each available pure strategy $s$ can be calculated. The player’s chosen strategy in round $t$ is determined from these expected payoffs; the probability of a player choosing the pure strategy $s$ in round $t$ given beliefs $\mu^t$ is

\[
\text{Prob}(s \text{ chosen in round } t) = \frac{\exp\bigl(\lambda \cdot \Pi^e(s \mid \mu^t)\bigr)}{\sum_{s' \in S} \exp\bigl(\lambda \cdot \Pi^e(s' \mid \mu^t)\bigr)}, \tag{1}
\]

where $S$ is the set of all pure strategies. The parameter $\lambda$ determines the extent to which the player responds optimally to her beliefs. If $\lambda = 0$, she chooses A and B with equal likelihood at each information set irrespective of expected payoff, while as $\lambda$ gets large, her strategy approaches best-response play; we will call this limiting case the version of the model with $\lambda = \infty$. We will refer to members of this family of models as BA($\lambda$, $\delta$) for given $\lambda$ and $\delta$ (where BA stands for Bayesian).⁸
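The role of λ in equation (1) can be seen in a small numerical sketch. It assumes the expected payoffs to each pure strategy have already been computed; the strategy labels, payoff values, and function name are hypothetical. Subtracting the maximum payoff before exponentiating is a standard numerical-stability device that leaves the probabilities in (1) unchanged.

```python
import math
from typing import Dict


def logit_choice(expected_payoffs: Dict[str, float], lam: float) -> Dict[str, float]:
    """Choice probabilities proportional to exp(lam * expected payoff), as in equation (1)."""
    best = max(expected_payoffs.values())
    # Shift by the maximum so the largest exponent is zero (avoids overflow for large lam).
    weights = {s: math.exp(lam * (v - best)) for s, v in expected_payoffs.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


payoffs = {"s1": 1.0, "s2": 0.5}  # hypothetical perceived expected payoffs

for lam in (0.0, 2.0, 1000.0):
    print(lam, logit_choice(payoffs, lam))
```

At λ = 0 both strategies are chosen with probability one half regardless of payoff, and as λ grows the probability of the higher-payoff strategy approaches one, which is the best-response limit the text labels λ = ∞.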

⁸ Fudenberg and Levine (1995, 1998) examine some of the theoretical properties of models such as this one, which they call “cautious fictitious play.” In our notation, standard fictitious play would be called the BA(∞, 0) model. The relationship of cautious fictitious play to standard fictitious play is analogous to that between McKelvey and Palfrey’s (1995, 1998) quantal (logistic) response equilibrium to Nash equilibrium. In particular, the (stochastic) fixed points of cautious fictitious play are logistic response equilibria.

2.5. Implications of the Learning Models

Since play according to either the reinforcement-based or the belief-based

models is allowed to change over time, it is plausible that the predictions of

these models could differ from those of static Nash equilibrium. In order to get

some idea of what these differences are, we simulate experiments based on the

two games. For each game, sets of 200 simulations were performed according to

the RE₀ and BA(∞, 0) models. Each simulation consisted of one simulated

Player 1 and one simulated Player 2 playing the same game for at least forty

rounds. Initial propensities and beliefs were chosen randomly; strengths of

initial propensities and beliefs were set to six at each information set.⁹

The simulation results are shown in Figure 2. In order to show the dynamic features more clearly, the unit (P(sda₁), P(brcr)) square is partitioned into four rectangular subregions determined by the Nash equilibrium points (in the case of P(brcr) in G(.50), the midpoint of the interval of equilibrium values). In each subregion is a circle and several connected line segments. The center of each circle represents the average starting point of all simulations with starting points in that region, the area of a circle is proportional to the number of simulations with starting points in that region, and the line segments follow the average path of play of the simulations starting in that region. In the RE₀ simulations, segment endpoints show average play in rounds 1, 40, 100, and 400; in the BA(∞, 0) simulations, they show average play in rounds 1 and multiples of 5 up to 40.

FIGURE 2. Trajectories of learning model simulations (random initial propensities and beliefs).

⁹ The qualitative aspects of the simulation results are robust to small changes in parameter values. For details, see Feltovich (1997).

In both games, and for both learning models, there is a rough counterclockwise

movement of simulation trajectories around the equilibrium (the intersection

of the dotted lines). In the RE₀ simulations, there is little if any discernible

movement of trajectories toward equilibrium. In the BA(∞, 0) simulations, such

movement can readily be seen; however, it is less pronounced than the counter-

clockwise movement. Thus, if these learning models accurately describe the way

players change their behavior over time, we would expect the trajectories from

the actual experiment to move counterclockwise about the equilibrium point (or

center of the segment of equilibria) with much less, if any, movement toward

this point. Importantly, this implication is quite robust to the choice of learning

model. Here, we see that fictitious play and RE₀ have similar implications;

Feltovich (1997) shows that simulation trajectories arising from other BA

models are qualitatively similar to both.¹⁰

The implications of the learning models contrast sharply with the equilibrium

prediction of stationary play. However, like the equilibria of the games, the

learning models demonstrate the dilemma faced by Player 1 in deciding how

much (if at all) to reveal in the first stage. If players play according to RE₀,

Player 1’s over-revealing makes Player 2’s expected payoff to playing the brcr

higher, so that it will be reinforced more on average and eventually played more

often. Then, sda₁ will result in lower payoffs on average and will eventually be

played less often. If players play according to one of the BA models, Player 1’s

revealing changes Player 2’s beliefs so as to increase his assessed probability of

Player 1 revealing, making brcr a better response and increasing the likelihood

that he actually plays the brcr; such play eventually changes Player 1’s beliefs,

lowering the expected payoff to, and eventually the probability of choosing, the

sda. By similar reasoning, according to either learning model, Player 1’s

“under-revealing” would eventually result in Player 2’s playing the brcr less

often, and the change in Player 2’s play would eventually cause Player 1 to

reveal more often. This argument also explains the counterclockwise direction of

the simulation trajectories. At any point in the unit square except for the Nash

equilibrium point itself, the best-response correspondence gives (P(sda₁),

¹⁰ The similarity between belief-based and reinforcement-based models is not surprising, in view

of the fact that they can be shown to be special cases of a more general model. Camerer and Ho’s

(1999) “experience-weighted attraction (EWA)” is a learning model in which, like RE₀, strategies

that are played are reinforced by their payoff, but unlike RE₀, strategies that are not played are also

reinforced by the payoff they would have earned had they been played, scaled by some constant

(normally between zero and one). If the value of this constant is zero, EWA reduces to a model

similar to RE₀, given suitable values for EWA’s other parameters. Less obviously, if the value of the

constant is one, Camerer and Ho show that EWA reduces to a model very much like our BA models

(again, given suitable values for other parameters).


P(brcr)) pairs that lie counterclockwise from that point. According to either

model, if play is initially out of equilibrium, it will (on average) move in the

direction of the best response, that is, counterclockwise. As an extreme example, note

that Cournot best-response play (the BA(∞, 1) model) would result in individual

play paths that jump from the top-right corner to the top-left corner, to the

bottom-left corner, to the bottom-right corner, back to the top-right corner, and

so on, each round moving to the next corner in the counterclockwise direction.
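
The corner-to-corner cycle just described can be illustrated with a minimal sketch. It assumes a stylized threshold rule in place of the games' actual payoff matrices: Player 2 switches to the brcr whenever Player 1 plays the sda more often than some cutoff, and Player 1 switches to the sda whenever Player 2 plays the brcr less often than some cutoff. The cutoffs `SDA_CUTOFF` and `BRCR_CUTOFF` and the function names are hypothetical, chosen only for illustration.

```python
# Stylized Cournot best-response dynamics on the corners of the unit square.
# The cutoffs are illustrative assumptions, not values derived from the
# payoff matrices of the games.
SDA_CUTOFF = 0.50    # hypothetical indifference point for Player 2
BRCR_CUTOFF = 0.75   # hypothetical indifference point for Player 1


def best_response(p_sda, p_brcr):
    """Return the pure best-response pair to the previous round's play."""
    # Player 2 plays the brcr only if Player 1 is over-revealing.
    next_brcr = 1.0 if p_sda > SDA_CUTOFF else 0.0
    # Player 1 plays the sda only if Player 2 under-plays the brcr.
    next_sda = 1.0 if p_brcr < BRCR_CUTOFF else 0.0
    return next_sda, next_brcr


def cournot_path(start, rounds):
    """Iterate simultaneous best responses, starting from `start`."""
    path = [start]
    for _ in range(rounds):
        path.append(best_response(*path[-1]))
    return path


# Points are (P(sda), P(brcr)); (1, 1) is the top-right corner.
path = cournot_path((1.0, 1.0), 4)
```

Under these assumptions the path visits the top-right, top-left, bottom-left, and bottom-right corners in turn before returning to its starting point.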

3. SUMMARY OF EXPERIMENTAL RESULTS

Table III gives a summary of aggregate behavior in the experiment.¹¹ Shown

are the population relative frequencies corresponding to the probabilities

P(sda₁) and P(brcr) for the entire forty rounds of each cell, and also for each of

four ten-round “blocks” within each cell. (We will refer to rounds 1–10 as

“block 1,” rounds 11–20 as “block 2,” and so on.) Starred relative frequencies

are those that are significantly different from Nash equilibrium behavior.¹²

It is easy to see that Nash equilibrium poorly describes aggregate behavior in

this experiment. In both cells, the population relative frequency of sda₁ starts

and remains well above the equilibrium prediction, though it decreases over

time.¹³ In both cells, the population relative frequency of brcr increases over

time (the direction of the best response to the play of the Player 1 population).¹⁴

These features of behavior are also visible in Figure 3, which plots the relative

frequencies of sda₁ and brcr in the two cells for each block (the dots connected

by lines) and the Nash equilibria. The equilibrium of the G(.34) cell is the center

of the ‘+’ and the equilibria of the G(.50) cell are the points of the vertical line

segment (the horizontal segment marks the center of the vertical segment). The

¹¹ The raw data from this experiment are available on request; see Feltovich (1997) for further

analysis of the data.

¹² Binomial test. For descriptions of the nonparametric statistical tests used in this paper, see

Siegel and Castellan (1988).

¹³ A referee pointed out the similarity between the tendency of Player 1s here to overplay the sda

in the first stage, earning a higher first-stage payoff than in equilibrium but sacrificing the

second-stage payoff, and the “melioration” theory of Herrnstein and Prelec (1992). The motivation

behind this theory usually relies on examples of dynamic inconsistency in individual choice (for

example, “self-control” problems such as overeating or overspending); however, the experimental

results used as evidence for melioration generally rely on individuals’ not understanding the

relationship between current actions and the future decision-making environment (see, e.g.,

Herrnstein, Prelec, and Vaughan (1986)). A hypothesis explaining the high frequency of sda play in the

first stage is that Player 1s notice that this action is better (in a myopic sense), but fail to consider

the likely response of Player 2s in the second stage; this explanation is consistent with the latter

aspect of melioration. It is difficult to imagine a realistic scenario in which high frequencies of sda

play in the first stage are due to lack of self-control.

¹⁴ As was mentioned in Section 2.1, the equilibrium probability of brcr in G(.34) depends on

Player 1 play in the first stage. Taking the observed values of P(sda₁) into account, the equilibrium

predictions are .800, .769, .770, .772, for blocks 1 through 4, respectively, and .778 for the entire

G(.34) cell.


TABLE III

EXPERIMENTAL RELATIVE FREQUENCIES

                    Block 1   Block 2   Block 3   Block 4   All Rounds
G(.50)   P(sda₁)    .845**    .852**    .797**    .741**    .808**
         P(brcr)    .759      .886      .917      .914      .869
G(.34)   P(sda₁)    .872**    .819**    .800**    .781**    .818**
         P(brcr)    .684**    .794      .728      .791      .749*

*: significantly different from equilibrium at 5% level.

**: significantly different from equilibrium at 0.1% level.

directions of changes between blocks are consistent with the learning models,

though more so for the G(.50) cell than the G(.34) cell.

In order to quantify this apparent counterclockwise movement and detect

convergence, if any, of play to equilibrium, we disaggregate the data by pairs of

players and change the coordinate system from standard rectangular coordinates

to polar coordinates. The location of a point is represented by the ordered pair

(r, θ), where r is the distance from that point to the origin (here, the Nash

equilibrium point or center of the segment of equilibria), and θ ∈ [0, 2π) is the

measure of the angle the point makes with the origin and some fixed reference

ray (here, the ray pointing straight down from the origin). The values for r and θ

are given by

$$
r = \sqrt{\bigl(P(\mathrm{sda}_1) - P_{\mathrm{equilibrium}}(\mathrm{sda}_1)\bigr)^2 + \bigl(P(\mathrm{brcr}) - P_{\mathrm{equilibrium}}(\mathrm{brcr})\bigr)^2};
\qquad
\cos\theta = \frac{.75 - P(\mathrm{brcr})}{r}.
$$
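
The change of coordinates can be sketched in a few lines. The function names are illustrative. The angle is recovered with a two-argument arctangent, measured counterclockwise from the ray pointing straight down, so that it agrees with the cosine formula. Wrapping changes in θ into (−π, π] is an assumption about how movement across the reference ray might be handled; it is not a procedure stated in the text.

```python
import math


def polar(p_sda, p_brcr, eq_sda, eq_brcr):
    """Return (r, theta) of a (P(sda1), P(brcr)) point about the equilibrium.

    theta lies in [0, 2*pi) and is measured counterclockwise from the ray
    pointing straight down, so cos(theta) = (eq_brcr - p_brcr) / r.
    theta is not meaningful when r is zero.
    """
    dx = p_sda - eq_sda
    dy = p_brcr - eq_brcr
    r = math.hypot(dx, dy)
    theta = math.atan2(dx, -dy) % (2 * math.pi)
    return r, theta


def ccw_change(theta_from, theta_to):
    """Signed change in angle wrapped into (-pi, pi]; positive means
    counterclockwise. The wrapping rule is an assumption of this sketch."""
    d = (theta_to - theta_from) % (2 * math.pi)
    return d - 2 * math.pi if d > math.pi else d


# A hypothetical pair of players moving from (.7, .7) to (.5, .8) about (.50, .75)
r1, t1 = polar(0.7, 0.7, 0.50, 0.75)
r2, t2 = polar(0.5, 0.8, 0.50, 0.75)
moved_ccw = ccw_change(t1, t2) > 0
moved_inward = r2 < r1
```

In this example the point moves counterclockwise and toward the origin, so both flags are true.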

FIGURE 3. Aggregate subject behavior in experiment (all players, 10-round blocks).


For each pair of players, increases in θ indicate movement in the counterclockwise

direction around the equilibrium point and decreases in r indicate movement

toward equilibrium. Wilcoxon signed ranks tests of changes in r and θ

from the first to the last block in G(.50) find a significant increase in the

population distribution of θ (p < .001) but no significant change in the population

distribution of r (p > .10). Page tests of changes in r and θ over all four

blocks yield the same result in all cases. Wilcoxon tests of changes in r and θ

from the first to the last block in G(.34) find a significant increase in the

population distribution of r (p < .01), as well as a significant increase in the

population distribution of θ (p < .001). Page tests of changes in G(.34) yield

somewhat weaker results; the increase in r is not significant, while the increase

in θ is only significant at the 10% level. We conclude that individual

(P(sda₁), P(brcr)) pairs move counterclockwise over time in both games, but

there is no evidence of movement toward equilibrium.¹⁵

4. COMPARISON BETWEEN EXPERIMENTAL RESULTS AND LEARNING

MODEL PREDICTIONS

The results mentioned in the previous section suggest the experimental data

possess features that are inconsistent with static Nash equilibrium, but consistent

with the gross qualitative predictions of the learning models.¹⁶ We now wish

to compare the degree of success of these models. In the next section, we test

the individual-level predictions of the models by comparing the decisions made

by players in particular situations with the decisions predicted by the models in

those situations. In the following section, we test the aggregate-level predictions

of the models by running additional simulations in which initial behavior is

similar to that in the experiment, and seeing how closely simulation trajectories

track observed experimental trajectories.

4.1. Use of the Learning Models for Characterizing Individual

Decision-Making Behavior

One way to look at learning models is as forecast rules that, given information

from previous rounds, predict (possibly probabilistically) a subject’s choices in

the current round. In this section, we will examine the accuracy of the forecasts

of the RE₀ and BA models, as well as stationary equilibrium, using as baselines

¹⁵ There is no inconsistency in this difference between aggregate results and individual-level

results. Changes in individual play paths may not show up when aggregated, particularly if there is a

lot of heterogeneity in the individual play paths. As a simple example, consider two pairs of players.

The first pair’s (P(sda₁), P(brcr)) pair is (.7, .7) in the first block and (.5, .8) in the second, and the

second pair’s (P(sda₁), P(brcr)) pair is (.3, .8) in the first block and (.5, .7) in the second. The

(P(sda₁), P(brcr)) pair for each pair of players moves counterclockwise about (.50, .75) as well as

toward it, but aggregating the pairs gives stationary play of (.50, .75) “on average.”

¹⁶ Feltovich (1999) gives some evidence that the RE₀ model outperforms static Nash equilibrium

in characterizing the experimental data.


three inertial models. The inertial models predict that players will behave in the

current round exactly the same as in the previous round with probability p_same.

For example, a Player 1 who chose the sda in the first stage in the previous

round will choose it in the first stage of this round with probability p_same,

irrespective of which payoff matrix was chosen by nature in either round. We

denote this class of inertial models by IN(p_same); the members we examine are

the IN(0.50), IN(0.75), and IN(1.00) models (IN(0.50) predicts that both actions

are played with equal probability at each reached information set and could thus

be considered “completely random” play).

Because predictions of early-round play according to the RE₀ and BA models

depend heavily on unknown initial conditions (propensities or beliefs), we look

only at the models’ predictions of behavior in the last thirty rounds. In the case

of the RE₀ model, we assume that the propensity for playing action α at

information set Ψ in round t is exactly equal to the sum of payoffs received in

rounds up to t − 1 in which Ψ was reached and α was played; in the case of the

BA models, we assume that the belief weight on an opponent’s playing action α

at information set Ψ is equal to the number of times Ψ was reached and α was

played (for models with δ = 0) or the weighted sum of times Ψ was reached and

α was played (for models with δ ≠ 0). In other words, initial propensities or

initial beliefs have been completely drowned out.¹⁷ Given their propensities or

beliefs, players’ predicted probabilities are obtained as discussed in Section 2.
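
The bookkeeping behind these assumptions can be sketched as follows. This is only an illustration: the data layout (tuples of information set, action, and payoff) and the function names are hypothetical, only the unweighted δ = 0 case is shown for beliefs, and turning weights into probabilities by simple shares (with 1/2 when nothing has been observed, and nonnegative payoffs) is an assumption of the sketch rather than the rule from Section 2.

```python
from collections import defaultdict


def re0_propensities(own_history):
    """Sum of payoffs received for each (information set, action) pair.

    own_history: iterable of (info_set, action, payoff) from earlier rounds.
    """
    propensity = defaultdict(float)
    for info_set, action, payoff in own_history:
        propensity[(info_set, action)] += payoff
    return propensity


def belief_counts(opponent_history):
    """Number of times the opponent played each action at each information
    set (the delta = 0 case; no weighting of older observations)."""
    counts = defaultdict(float)
    for info_set, action in opponent_history:
        counts[(info_set, action)] += 1.0
    return counts


def share(weights, info_set, action, actions=("A", "B")):
    """Share of total weight on `action` at `info_set` (assumed rule)."""
    total = sum(weights[(info_set, a)] for a in actions)
    if total <= 0:
        return 0.5  # nothing observed yet: hypothetical fifty-fifty default
    return weights[(info_set, action)] / total


# Hypothetical history for one player at one information set
history = [("stage 1", "A", 3.0), ("stage 1", "B", 1.0), ("stage 1", "A", 2.0)]
prop = re0_propensities(history)
p_a = share(prop, "stage 1", "A")
```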

We ﬁrst assess the accuracy of our models using three measures of closeness

of predictions to actual choices: mean squared deviation (MSD), log likelihood

(ln(L)), and a proportion of inaccuracy (POI) score. All three criteria are

derived by pairing the predicted probability of A being chosen (denoted

p_pred(A)) according to the behavioral model being considered and the actual

probability that A was chosen (denoted p_act(A)), which is either zero or

one, for each choice made by either type of player at any information set. Then

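$$
\mathrm{MSD} = \Bigl( \frac{1}{N} \sum \bigl( p_{\mathrm{pred}}(A) - p_{\mathrm{act}}(A) \bigr)^2 \Bigr)^{1/2},
$$

$$
\ln(L) = \sum_{A\ \mathrm{chosen}} \ln\bigl( p_{\mathrm{pred}}(A) \bigr) + \sum_{B\ \mathrm{chosen}} \ln\bigl( 1 - p_{\mathrm{pred}}(A) \bigr),
$$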

where N is the total number of observations. The MSD statistic used here is

equivalent to the “quadratic scoring rule,” whose desirable theoretical properties

are examined by Selten (1998). It is also very similar to the “mean

probability score” discussed by Yates (1982) and used later in this section. The

POI score is meant to put models with deterministic predictions on the same

¹⁷ If, contrary to this assumption, initial propensities or beliefs have not been drowned out at this

time, there should be substantial differences between the analysis performed here and the analysis

that would have been performed had we used the last twenty rounds or the last ten (in which case

initial propensities or beliefs would have a noticeably smaller effect), rather than the last thirty. This

turns out not to be the case; using the last ten or twenty rounds does not change the “ranking” of

the various behavioral models. Some evidence of this will be given shortly.

TABLE IV

ABILITIES OF BEHAVIORAL MODELS TO PREDICT DISAGGREGATED DECISIONS

ROUNDS 11–40 (21–40); N = 1830 (1220)

Model              MSD           POI           ln(L)                Posterior Prob.
RE₀                .348 (.350)   .182 (.186)   −3161.3 (−2037.9)    >0.999 (>0.999)
BA(∞, 0)           .548 (.553)   .302 (.307)   −∞ (−∞)              0.000 (0.000)
BA(4.44, −0.105)   .424 (.423)   .264 (.266)   −3969.6 (−2627.7)    <0.001 (<0.001)
BA(0.19, −0.205)   .492 (.493)   .236 (.236)   −4963.9 (−3313.2)    <0.001 (<0.001)
Equilibrium        .414 (.414)   .299 (.299)   −∞ (−∞)              0.000 (0.000)
IN(0.50)           .500 (.500)   .500 (.500)   −5073.8 (−3382.6)    <0.001 (<0.001)
IN(0.75)           .426 (.428)   .239 (.241)   −4025.1 (−2695.9)    <0.001 (<0.001)
IN(1.00)           .489 (.491)   .239 (.241)   −∞ (−∞)              0.000 (0.000)

footing as those with stochastic predictions; it treats the highest-probability

choice according to a model as ‘‘the’’ prediction of the model and determines

the proportion of wrong predictions. The POI score is found by calculating the

mean of the following values over all choices: when action α (either A or B) is

chosen, the value 0 if p_pred(α) > 1/2, the value 1/2 if p_pred(α) = 1/2, or the

value 1 if p_pred(α) < 1/2. For a deterministic model such as fictitious play or

the IN(1.00) model, the POI score is exactly equal to the square of the model’s

MSD score.
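
The three criteria can be computed directly from paired predictions and outcomes, as in the following minimal sketch. The function names are illustrative, and each observation is assumed to be coded as the predicted probability of A together with 1 if A was chosen and 0 if B was chosen.

```python
import math


def msd(preds, acts):
    """Square root of the mean squared gap between p_pred(A) and p_act(A)."""
    n = len(preds)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, acts)) / n)


def log_likelihood(preds, acts):
    """Sum of ln p_pred(A) over A choices and ln(1 - p_pred(A)) over B choices."""
    total = 0.0
    for p, a in zip(preds, acts):
        q = p if a == 1 else 1.0 - p  # probability given to the realized choice
        if q == 0.0:
            return float("-inf")      # a choice the model ruled out
        total += math.log(q)
    return total


def poi(preds, acts):
    """Proportion of inaccuracy: 0, 1/2, or 1 per choice, averaged."""
    score = 0.0
    for p, a in zip(preds, acts):
        q = p if a == 1 else 1.0 - p
        if q < 0.5:
            score += 1.0
        elif q == 0.5:
            score += 0.5
    return score / len(preds)


# Hypothetical data: four choices, a fifty-fifty model, a deterministic model
acts = [1, 0, 1, 1]
fifty_fifty = [0.5, 0.5, 0.5, 0.5]
deterministic = [1.0, 0.0, 0.0, 1.0]  # wrong on the third choice
```

For the deterministic predictions in this example, POI equals the square of MSD, and the log likelihood is −∞ because one realized choice was assigned probability zero.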

Table IV summarizes the predictive abilities of the models. In addition to

RE₀, Nash equilibrium, fictitious play (BA(∞, 0)), and the three inertial models,

we show the MSD, POI, and ln(L) of the “best” BA models in terms of each of

these criteria.¹⁸ Keeping in mind that better predictive power is implied by lower

MSD and POI and by higher ln(L), we can see that RE₀ is best according to all

three criteria. Fictitious play fares particularly badly here: worse than fifty-fifty

randomization according to the MSD criterion. Even the best of the BA models

performs better than the IN(0.75) model according to only two of the three

criteria, and is far worse than RE₀. We also see that while Nash equilibrium

performed poorly in describing many features of the data, it actually does

reasonably well here. In fact, according to the MSD criterion, RE₀ is the only

model that performs better than stationary Nash equilibrium play. Shown in

parentheses are corresponding statistics that cover only the last twenty rounds,

rather than the last thirty. Using this smaller set of rounds does not change any

of our conclusions.

¹⁸ The best model, according to a given criterion, was found by a grid search over values of λ and

δ to three significant digits. In order to give the BA model the best possible chance, different

parameter values were allowed each time the criterion of goodness of fit was changed. The best

model according to the ln(L) criterion was almost the same as the best one according to MSD, not

only in the parameter space but also in their values for the three criteria, so the best model

according to ln(L) is reported here as their common optimizer. A sensitivity analysis suggests that at

the optima, all three statistics are robust to small changes in both λ and δ, so adding significant

digits will not lead to substantially better values.


Because the models used here are not nested, we cannot use a straightforward

likelihood-ratio test to compare them. However, we can use the very similar

“minimal prior information” posterior odds criterion that was developed by

Klein and Brown (1984) and used by Harless and Camerer (1994) to compare

models of decision making. The posterior odds criterion for some Model 1

versus some other Model 2 is

$$
\bigl[ n^{-(K_1 - K_2)/2} \bigr] \left[ \frac{\text{Maximized Likelihood under Model 1}}{\text{Maximized Likelihood under Model 2}} \right],
$$

where n is the sample size and K₁ and K₂ are the number of free parameters in

Model 1 and Model 2, respectively. (The BA model has two free parameters; the

other models have none.) Given these pairwise odds, we can calculate the

posterior probability of each model being correct, given that one of these

models is correct; these are shown in Table IV. Not surprisingly, given the

values of the ln(L) statistic, RE₀ is the “odds-on” favorite. Of course, there is

always the possibility that none of the given models is the “correct” one, so we

cannot conclude that RE₀ is the correct model, only that the other models are

(with extremely high probability) incorrect.
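
One way to turn the pairwise odds into posterior probabilities is sketched below, working in logs to avoid underflow. The log likelihoods are the rounds 11–40 values from Table IV; taking n = 1830, treating each model as equally likely a priori, and the function and dictionary names are assumptions of this sketch rather than details given in the text.

```python
import math

NEG_INF = float("-inf")

# name -> (maximized log likelihood, number of free parameters)
TABLE_IV = {
    "RE0": (-3161.3, 0),
    "BA(inf, 0)": (NEG_INF, 0),
    "BA(4.44, -0.105)": (-3969.6, 2),
    "Equilibrium": (NEG_INF, 0),
    "IN(0.50)": (-5073.8, 0),
    "IN(0.75)": (-4025.1, 0),
    "IN(1.00)": (NEG_INF, 0),
}


def posterior_probs(models, n):
    """Posterior probability of each model, given that one of them is correct.

    Each model gets the score ln(L) - (K / 2) ln(n); the difference between
    two scores is the log of the pairwise posterior odds criterion.
    Assumes at least one model has a finite log likelihood.
    """
    scores = {m: ll - 0.5 * k * math.log(n) for m, (ll, k) in models.items()}
    top = max(scores.values())
    weights = {m: math.exp(s - top) for m, s in scores.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


probs = posterior_probs(TABLE_IV, n=1830)
```

With these inputs the RE₀ model receives essentially all of the posterior probability, in line with the entries reported in Table IV; the result is not sensitive to the value assumed for n.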

In order to illustrate the predictive ability of the models, we also show

reliability diagrams for each model.¹⁹ A reliability diagram is a graphical representation

of the predictive ability (sometimes called external correspondence) of a

model, and depicts its decomposition into two distinct components, calibration

and resolution. Calibration is a measure of the ‘‘accuracy’’ of a model’s forecasts;

in a well calibrated model, predicted probabilities conform closely on average to

actual relative frequencies. (For example, if one looks at the situations in which

a well calibrated model predicted that the probability of choosing A is 0.2, A

should actually have been chosen about 20% of the time.) Resolution is a

measure of the “informativeness” of a model’s forecasts; a well resolved model

partitions the set of predicted/actual choices into subsets in which A is actually

chosen either almost never or almost always. Calibration and resolution are

both desirable properties for a model to have, but it is possible for a model to be

well calibrated but poorly resolved, well resolved but poorly calibrated, good at

both, or poor at both (hypothetical examples are shown in Figure 4). For

example, since there are only two possible predictions at any information set, a

model that always makes the wrong prediction is very poorly calibrated but very

well resolved. The forecasts from an ideal behavioral model would be both

perfectly calibrated and perfectly resolved; the model would predict A (with

probability one) every time A is actually chosen, and B every time B is chosen.

For probabilistic models such as RE₀ and the BA models with finite λ, as well as

for mixed-strategy Nash equilibrium, this is generally not possible. However, we

will examine the extent to which the models approach this ideal.

¹⁹ A thorough review of many of the concepts used here, particularly the notions of calibration

and resolution, is given by Yates (1982).


FIGURE 5. Calibration graphs of inertial models.

The reliability diagrams are constructed as follows: first, actual and predicted

decisions over rounds 11–40 are paired; second, each decision pair is classified

according to its predicted probability of choosing A into one of eleven intervals

([0, 0.05], (0.05, 0.15], (0.15, 0.25], ..., (0.95, 1]); third, the mean predicted probability

of an A choice (denoted p_pred(A)) and the actual relative frequency of A

choices (denoted p_act(A)) are calculated for the pairs in each interval, yielding

up to eleven ordered pairs; fourth, these ordered pairs are graphed in

(p_pred(A), p_act(A)) space, along with the number of observations represented by

each ordered pair. A perfectly calibrated model would yield ordered pairs on the

45° line. A perfectly resolved model would yield ordered pairs on the p_act = 0

and p_act = 1 lines (see Figure 4).

The models’ reliability diagrams are shown in Figures 5 and 6. We can see the

high degree of calibration of the IN(0.50) and IN(0.75) models, though both are

very poorly resolved. The IN(1.00) model is both poorly calibrated and poorly

resolved. The RE₀ and BA models, as well as equilibrium, are well calibrated in

the very weak sense that p_pred and p_act are positively correlated. However, RE₀

and the “best” BA model are clearly better calibrated than fictitious play or

equilibrium play (though it should be noted that, as poorly as equilibrium

characterized other aspects of subject behavior, it performs rather well here).

It is difficult to tell from the reliability diagrams whether RE₀ or the best BA

model is better calibrated, but based on the diagrams, we can quantitatively

examine the success of each behavioral model using the following three criteria:


Sanders calibration (C_s), Sanders resolution (R_s), and mean probability score

(PS), measures of calibration, resolution, and overall predictive ability, respectively.

These measures are defined as follows:

$$
C_s = \frac{1}{N} \sum_{j=1} N_j \bigl( p_{\mathrm{pred}}(A) - p_{\mathrm{act}}(A) \bigr)^2,
$$

$$
R_s = \frac{1}{N} \sum_{j=1} N_j \, p_{\mathrm{act}}(A) \bigl( 1 - p_{\mathrm{act}}(A) \bigr),
$$

$$
PS = C_s + R_s,
$$

where N_j is the number of pairs whose predicted probability lies in the jth

interval, and N is the total number of pairs.²⁰ Table V reports the performance

of the models according to these criteria. According to calibration scores alone,

the best models are BA(0.19, −0.205) and fifty-fifty randomization; however,

they are the worst resolved. The difficulty in ascertaining from the diagrams

whether RE₀ or the best BA model is better calibrated is apparent here; RE₀ is

slightly better than BA(4.46, −0.107), but the difference between them is

negligible. The RE₀ model has by far the best resolution, and while the best BA

model has better calibration than equilibrium, its resolution is worse.
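
The construction of the reliability diagram points and the three Sanders measures can be sketched together. The interval edges follow the eleven intervals listed earlier in this section; the function names and the coding of each outcome as 1 (A chosen) or 0 (B chosen) are assumptions made for illustration.

```python
from bisect import bisect_left

# Upper edges of the intervals [0, 0.05], (0.05, 0.15], ..., (0.95, 1]
EDGES = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95, 1.0]


def reliability_bins(preds, acts):
    """Return (N_j, mean p_pred, relative frequency of A) per non-empty interval."""
    groups = {}
    for p, a in zip(preds, acts):
        groups.setdefault(bisect_left(EDGES, p), []).append((p, a))
    bins = []
    for j in sorted(groups):
        pairs = groups[j]
        n_j = len(pairs)
        mean_pred = sum(p for p, _ in pairs) / n_j
        freq_act = sum(a for _, a in pairs) / n_j
        bins.append((n_j, mean_pred, freq_act))
    return bins


def sanders_scores(bins):
    """Return (C_s, R_s, PS) from the binned pairs; lower is better for each."""
    n = sum(n_j for n_j, _, _ in bins)
    c_s = sum(n_j * (pp - pa) ** 2 for n_j, pp, pa in bins) / n
    r_s = sum(n_j * pa * (1 - pa) for n_j, _, pa in bins) / n
    return c_s, r_s, c_s + r_s


# Hypothetical data: a model that is always wrong with certainty
acts = [1, 0, 1, 1]
always_wrong = [0.0, 1.0, 0.0, 0.0]
c_s, r_s, ps = sanders_scores(reliability_bins(always_wrong, acts))
```

In this example the always-wrong model is as poorly calibrated as possible (C_s = 1) yet perfectly resolved (R_s = 0), which is the case mentioned in the discussion of calibration and resolution.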

4.2. Use of Initialized Models for Tracking Aggregate Play

It was shown in the previous section that the RE₀ model predicts individual

decisions, given histories of play up to the current round, better than the other

models we considered. While this provides evidence that individuals’ decision-making

processes (in this game) might be usefully approximated by a reinforcement-based

model, such predictions are not the only use of learning models.

TABLE V

CALIBRATION AND RESOLUTION OF MODELS

ROUNDS 11–40; N = 1830

Model              C_s     R_s     PS
RE₀                .0034   .1329   .1362
BA(∞, 0)           .0799   .2015   .2815
BA(4.44, −0.105)   .0037   .1750   .1788
BA(0.19, −0.205)   .0000   .2499   .2500
BA(4.46, −0.107)   .0035   .1750   .1786
Equilibrium        .0050   .1726   .1775
IN(0.50)           .0001   .2499   .2500
IN(0.75)           .0007   .1917   .1923
IN(1.00)           .0629   .1917   .2546

²⁰ It can be shown that PS is approximately the square of MSD, so we already know that RE₀ will

have the best PS score. However, it is still instructive to see if it is best calibrated, best resolved, or

both.

Also useful are predictions of aggregate behavior; a model that performs poorly

in predicting individual decisions may work well on average. Furthermore, we

have been looking at models’ predictions of play in a given round given what has

happened from the ﬁrst round up to that round. Since a researcher will not have

this information until after the experiment is performed, it would be useful for a

model to make accurate predictions of behavior in all rounds, given only some

initial conditions. We will now analyze the success of the models in making

these types of predictions.

In order to accomplish this, we have run additional sets of simulations similar

to those in Section 2.5, but different in two notable ways. First, the output of the

new simulations consists of observed actions averaged over blocks, instead of

behavioral strategies from particular rounds, so that the simulation results are

more directly comparable to the experimental data reported in Section 3.

Second, instead of using randomly chosen ﬁrst-round propensities or beliefs, we

initialize propensities or beliefs so that ﬁrst-round play in the simulations

matches ﬁrst-round play in the experiment as closely as possible. For the RE₀

simulations, we use the actual ﬁrst-round relative frequencies from the experi-

ment as estimates of initial probabilities. For the BA simulations, we set initial

belief probabilities to obtain play as close as possible to the actual ﬁrst-round

relative frequencies as the outcome of expected-payoff maximization for the

simulations with λ = ∞ or via Equation 1 in Section 2.4 for the simulations with

finite λ. As before, strengths of initial propensities and beliefs were set to six.

For each model, we ran 100 sets of 29 simulations of the G(.50) game and 100

sets of 32 simulations of the G(.34) game, 100 times the actual number of

observations in the experiment.

Figures 7 and 8 plot, for the RE₀ and BA(6.67, −0.030) simulations, respectively,

mean relative frequencies of sda₁ and brcr over each ten-round block.

(The BA(6.67, −0.030) model is the best-fitting according to the criterion that

will be used in this section.) Each small circle represents the (P(sda₁), P(brcr))

pair corresponding to the average play of a set of pairs of simulated players;

each large circle represents the (P(sda₁), P(brcr)) pair corresponding to the

average play of all pairs of actual players over the same block. (In these figures,

the radii of the circles are not meant to indicate any aspect of the data; they are

meant only to make the one experimental point stand out among the 100

simulation points.) Large pluses show the Nash equilibria, and in the figures

corresponding to the BA model, small pluses show the logistic response equilibrium.

None of the simulations exactly match the experimental data. The RE₀

simulation means and those of the BA(6.67, −0.030) model are clustered near

the corresponding experimental means in each block, though the former seems

to be somewhat closer to the G(.50) data and the latter to the G(.34) data. This

closeness is quantiﬁed in Table VI, which reports the squared difference

between simulation and experimental means for both games, both players, and

each of the four blocks. In the bottom row of the table, the sixteen squared


TABLE VI

FIT OF MODEL SIMULATIONS TO EXPERIMENTAL MEANS (SQUARED DEVIATIONS)

Game   Variable   Block   RE₀   BA(∞, 0)   BA(4.46, −0.107)   BA(6.67, −0.030)   IN(0.50) (i.i.d.)   Equil.

GŽ.50.

PŽsda₁.

1

2

3

4

.00006 .00002

.00001 .02826

.00179 .05471

.00781 .05724

.00020

.00067

.00035

.00425

.00032

.00001

.00002

.00015

.00639 .11902

.12390 .12390

.08821 .08821

.05808 .05808

PŽbrcr.

PŽsda₁.

PŽbrcr.

1

2

3

4

.00602 .00407

.00038 .00062

.00108 .00000

.00028 .00049

.00063

.01069

.01831

.01156

.00000

.00128

.00156

.06708 .00000

.14900 .00000

.17389 .00000

.17140 .00000

GŽ.34.

1

2

3

4

.00001 .00043

.00462 .00571

.00845 .03047

.01249 .03532

.00432

.00038

.00007

.00004

.00036

.00040

.00000

.00032

.13838 .04494

.10176 .02528

.09000 .01960

.07896 .01464

1

2

3

4

.00837 .00001

.02043 .00722

.00163 .01506

.00601 .09992

.00763

.02565

.00871

.02289

.00026

.00317

.00073

.00142

.03386 .00020

.08644 .01538

.05198 .00336

.08468 .01464

Sum of Squared Deviations 0.07945 0.33955

0.12310

0.02762

1.6166 0.52726

differences are summed to produce a single measure of a model’s closeness to

actual play. For comparison, we also report squared differences between the

experimental means and the means implied by stationary equilibrium play,

fictitious play, the BA(4.46, −0.107) model (the best BA model according to

ln(L)), and the IN(0.50) model.²¹
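
The summary measure is simple to reproduce for any model whose block means are known. The sketch below does so for fifty-fifty play, using the block frequencies from Table III; the block 4 value of P(brcr) in G(.34) is read here as .791, the value consistent with the reported all-rounds frequency of .749. The dictionary layout and function name are illustrative.

```python
# Experimental block means (blocks 1 to 4), taken from Table III
EXPERIMENT = {
    ("G(.50)", "sda1"): [0.845, 0.852, 0.797, 0.741],
    ("G(.50)", "brcr"): [0.759, 0.886, 0.917, 0.914],
    ("G(.34)", "sda1"): [0.872, 0.819, 0.800, 0.781],
    ("G(.34)", "brcr"): [0.684, 0.794, 0.728, 0.791],
}


def sum_squared_deviations(model_means, observed=EXPERIMENT):
    """Sum of the sixteen squared gaps between model and experimental means."""
    return sum(
        (m - o) ** 2
        for key, obs in observed.items()
        for m, o in zip(model_means[key], obs)
    )


# IN(0.50): every action is predicted to be played half of the time
fifty_fifty = {key: [0.5] * 4 for key in EXPERIMENT}
total = sum_squared_deviations(fifty_fifty)
```

Under these inputs the total is about 1.617, which agrees with the largest of the column sums in the bottom row of Table VI.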

Keeping in mind that lower numbers imply simulated play closer to actual

play, we see that the worst models by far were Nash equilibrium play and the

IN(0.50) model. Even fictitious play performs better than these models, though

it is also far from actual play. The RE₀ model and the BA model that was best

at describing individual decisions (according to ln(L)) are much closer, though

the former is somewhat better than the latter. However, the BA(6.67, −0.030)

model is much closer to the actual data than even the RE₀ model.

Since there exist values of λ and δ that make the BA model better able to

track the experimental data than RE₀, one may be tempted to conclude that the

BA model is better than the RE₀ model. Two caveats apply, though. First, which

model is “better” depends on what criteria of “goodness” are being used; as was

shown earlier, RE₀ is better than all of our BA models when the criteria are

those used in Section 4.1 (the BA(6.67, −0.030) model has an MSD of .434, a

²¹ Here, we are only comparing simulation and experimental means, with no regard paid to

variances. This is intentional; our interest here is in determining the accuracy of predictions of

experimental means using simulation means, not in determining which model would have been

“most likely” to produce the observed data (in which case a measure of dispersion would have been

important).


FIGURE 9. Parameter values for which the BA model outperforms the RE₀ model.

POI of .280, an ln(L) of −4162.4, a C_s of .0061, an R_s of .1768, and a PS of .1830,

all worse than those of RE₀). Second, even if the criterion used is the one from

this section, there are also values of λ and δ that make the BA model less able

to track the experimental data than RE₀. Before running the experiment, there

was no way of knowing which parameter values should have been chosen.

It is therefore of some value to know how likely we would have been to have
chosen parameter values that gave predictions better than those of RE₀. Figure
9 shows the (λ, δ) pairs that make the BA model better according to the
criterion used in this section. We can see that while it would certainly have been
possible to luckily guess good parameter values, it would by no means have been
a sure bet. In particular, if one supposed a positive value for δ (that is, in forming
beliefs, players put more weight on recent opponent actions than on earlier ones),
then it would have been unlikely indeed that a better model than RE₀ would
have been selected, and the improvement over RE₀ would have been negligible


(the best model with positive δ, the BA(6.50, 0.01) model, yields a sum of
squared deviations of .07252, only slightly better than RE₀).²²
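A hedged sketch of the kind of grid comparison summarized in Figure 9 follows. The scoring function here is a hypothetical stand-in (a simple bowl-shaped function), not the BA model itself, and the benchmark value is likewise invented for illustration. The point is only to show how one might list the (λ, δ) pairs whose score falls below a benchmark and compute what share of the grid they make up.

```python
from typing import Callable, List, Sequence, Tuple


def pairs_beating_benchmark(score: Callable[[float, float], float],
                            lambdas: Sequence[float],
                            deltas: Sequence[float],
                            benchmark: float) -> List[Tuple[float, float]]:
    """Return every (lambda, delta) grid point whose score is below the benchmark."""
    return [(lam, dlt) for lam in lambdas for dlt in deltas
            if score(lam, dlt) < benchmark]


def toy_score(lam: float, dlt: float) -> float:
    # Hypothetical stand-in for "sum of squared deviations of a BA model":
    # a smooth bowl with its minimum at (6.0, -0.05). Not the paper's model.
    return 0.05 + 0.01 * (lam - 6.0) ** 2 + 4.0 * (dlt + 0.05) ** 2


lambdas = [4.0, 5.0, 6.0, 7.0, 8.0]
deltas = [-0.10, -0.05, 0.0, 0.05]
benchmark = 0.075  # hypothetical score of the benchmark model

winners = pairs_beating_benchmark(toy_score, lambdas, deltas, benchmark)
share = len(winners) / (len(lambdas) * len(deltas))
```

The share of winning grid points gives a rough sense of how likely an uninformed choice of parameters would have been to beat the benchmark, under the strong assumption that every grid point is equally likely to be chosen.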

This is not to say that the BA model is not appropriate as a model of learning
in this experiment, only that at this time, RE₀ is more useful for generating
quantitative predictions ex ante (as opposed to fitting play ex post). Before the
experiment, we had no idea which BA model was appropriate, and the results of
this, and the previous, section suggest that an "uneducated guess" of parameter
values would have produced predictions worse than those of RE₀. There is
currently no intuition available regarding how to make an "educated guess."
Further research may accomplish this; eventually it may be possible to know,
before the experiment is performed, which BA model to use to generate
predictions. The simplest way this could occur would be if one small subset of
the BA parameter space always produced suitable models (the results of the
next section imply that this is unlikely). More generally, it might be the case that
there exists a mapping from "types" of game (and possibly experimental
procedures) to parameter combinations, such that given a strategic environment, one
could choose appropriate parameters. If it is eventually determined that such a
correspondence exists, the BA model will turn out to be very useful indeed. But
until this is determined, a simple model like RE₀ will continue to be functional
not only as a model of individual choice, but also as a model of aggregate
behavior.

5. SOME EVIDENCE FROM OTHER EXPERIMENTS

We now look at the abilities of the RE₀ and BA models to characterize data
from other experiments. Using data from other experiments allows us to
determine how generalizable our results are; that is, whether RE₀ is best for a
large class of games, or whether the game used here happened to be a lucky
choice. As mentioned in the introduction, Erev and Roth (1998) and Camerer
and Ho (1999) use data from previous experiments to test learning models. Erev
and Roth examine the data from six previous experiments and one new
experiment. They assess the ability of several models to characterize aggregate
features of data and find that in the games they examine, a generalization of the
RE₀ model works best. Camerer and Ho examine the data from five previous
experiments (all different from those examined by Erev and Roth). They
develop a very general learning model that includes as special cases all of the
models in this paper, and use likelihood ratios to test the general model versus
"restricted" versions of the model such as reinforcement learning and fictitious

²² Furthermore, we should remember that we have only used one particular reinforcement-based
model throughout this paper. As mentioned previously, Roth and Erev (1995) and Erev and Roth
(1998) consider several variations of the RE₀ model. We use only one here because it does such a
good job that it seems that the small improvement in fit that comes from adding free parameters
would not be worth the reduced generality. However, if it is claimed that one should try all versions
of BA models in order to give it the best possible chance, an argument could be made that one
should do this for the reinforcement-based model, too.


play. In almost all cases, they find that the best (maximum-likelihood) version of
the model is quite different from the restricted versions, and that the former is
significantly better than any of the latter.

We will look at four data sets, each of which comes from an experiment
involving a simultaneous-move game. Two were used by Erev and Roth: Erev
and Roth's (1998) "Matching Pennies" game and Ochs's (1995) 2×2 games.
The other two were used by Camerer and Ho: Mookherjee and Sopher's (1997)
6×6 constant-sum games, and Van Huyck, Battalio, and Beil's (1991)
median-action game. As we did in Section 4.1 with the asymmetric-information game
data, we look at the models' abilities to predict individual decisions. As before,
we attempt to limit the reliance of the models on unobservable initial
propensities or beliefs by not comparing predictions of behavior with actual play in early
rounds. For each data set, we compare model predictions with play from round
11 through the last round of play, with the exception of the data from Van Huyck
et al., whose experiment only lasted ten rounds; we consider rounds 3 through
10 of this data set. The data we examine range from 75% to 96% of each
experiment. For comparison, and in order to examine the plausibility of our
assumption that initial propensities or beliefs are drowned out by the time we
reach the rounds at which we look, we also consider the data from a smaller
subset of rounds of each experiment (roughly the last 50% of each experiment).

As a basis for comparison, in addition to the RE₀ and BA models, we
consider stationary equilibrium (if the equilibrium is unique), the IN(0.75)
model in which players repeat their previous action with probability 0.75 and
play each other action with equal share of the remaining 0.25 probability, and
the IN(1/m) model in which players play each of the m possible actions with
equal probability. For each of these models, we report in Table VII the MSD,
POI, and ln(L) scores, as well as the posterior probability of each model (again,
given that one of these models is the correct one) and the rounds considered.
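The two inertial benchmarks just described can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the paper: the function names are hypothetical, actions are indexed 0 to m−1, and the log-likelihood is taken over one player's action sequence starting from the second observation.

```python
import math
from typing import List, Sequence


def inertial_prediction(prev_action: int, m: int, p_repeat: float) -> List[float]:
    """IN(p): probability p on the previous action, the rest split equally
    over the other m - 1 actions. With p = 1/m this is uniform play."""
    other = (1.0 - p_repeat) / (m - 1)
    return [p_repeat if a == prev_action else other for a in range(m)]


def inertial_log_likelihood(actions: Sequence[int], m: int, p_repeat: float) -> float:
    """ln(L) of a player's actions from the second one onward under IN(p)."""
    total = 0.0
    for prev, cur in zip(actions, actions[1:]):
        total += math.log(inertial_prediction(prev, m, p_repeat)[cur])
    return total


def uniform_log_likelihood(n_decisions: int, m: int) -> float:
    """ln(L) under IN(1/m): every decision has probability 1/m."""
    return n_decisions * math.log(1.0 / m)


two_action = inertial_prediction(prev_action=0, m=2, p_repeat=0.75)
six_by_six = uniform_log_likelihood(n_decisions=1200, m=6)
```

Because IN(1/m) assigns probability 1/m to every decision, its ln(L) depends only on the number of decisions. With 1200 decisions and m = 6, the sketch gives about −2150.1, which agrees with the IN(1/6) entry reported in Table VII for the Mookherjee and Sopher data.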

According to both MSD and ln(L) (and thus posterior probability), RE₀ is
the best for characterizing Erev and Roth's data, but it is only slightly better
than the IN(0.75) model according to MSD, and worse according to POI. Even
the best BA model fares worse than RE₀ according to all three criteria, but it
nonetheless improves upon the stationary equilibrium prediction.²³ None of
these results changes when we look at rounds 251–500 rather than 11–500.

Ochs's data present a somewhat weaker case for the RE₀ model; according to
MSD, it outperforms the rest of the models, and the best BA model is even
worse than the IN(0.75) model. On the other hand, according to POI, RE₀ is
worse than BA and only slightly better than IN(0.75). According to ln(L), RE₀ is

worse than BA and both inertial models; as with POI, the BA model outper-

²³ The best BA model is determined separately for each data set, time frame, and criterion, again
using a grid search. The optimal values of λ and δ vary greatly between experiments and between
criteria, and vary somewhat between time frames. For example, using the longer time frame and the
ln(L) criterion, the best BA model is BA(0.791, −0.133) for the Erev and Roth data set,
BA(0.612, 0.236) for the Ochs data set, BA(31.0, −27.6) for the Van Huyck et al. data set, and
BA(0.556, 0.153) for the Mookherjee and Sopher data set.


TABLE VII
ABILITIES OF MODELS TO PREDICT DISAGGREGATED DECISIONS (PREVIOUS EXPERIMENTS)

Experiment                Model        MSD          POI          ln(L)                 Posterior Prob.

ER: Matching pennies      RE₀          .474 (.459)  .361 (.333)  −6370.3 (−3043.7)     >0.999 (>0.999)
Rounds 11(251)–500        BA           .496 (.495)  .441 (.428)  −6715.9 (−3411.0)     <0.001 (<0.001)
N=4900(2500)              equilibrium  .500 (.500)  .500 (.500)  −6792.8 (−3465.7)     <0.001 (<0.001)
                          IN(0.50)     .500 (.500)  .500 (.500)  −6792.8 (−3465.7)     <0.001 (<0.001)
                          IN(0.75)     .482 (.473)  .340 (.322)  −6474.4 (−3206.1)     <0.001 (<0.001)

Ochs: three 2×2 games     RE₀          .335 (.333)  .271 (.261)  −7490.4 (−4045.7)     <0.001 (<0.001)
Rounds 11(33)–56 or 64    BA           .370 (.380)  .260 (.265)  −4344.4 (−2508.9)     >0.999 (>0.999)
N=2464(1408)              equilibrium  .379 (.380)  .393 (.393)  −10441.8 (−5628.3)    <0.001 (<0.001)
                          IN(0.50)     .390 (.397)  .500 (.500)  −7674.4 (−4219.2)     <0.001 (<0.001)
                          IN(0.75)     .350 (.348)  .273 (.260)  −7472.7 (−4113.6)     <0.001 (<0.001)

vHBB: three 7×7 games     RE₀          .188 (.145)  .134 (.070)  −598.8 (−190.5)       <0.001 (<0.001)
Rounds 3(6)–10            BA           .154 (.112)  .090 (.046)  −526.0 (−130.9)       >0.999 (>0.999)
N=864(540)                equilibrium  –
                          IN(1/7)      .350 (.350)  .979 (.991)  −1681.3 (−1050.8)     <0.001 (<0.001)
                          IN(0.75)     .190 (.147)  .126 (.056)  −563.6 (−242.1)       <0.001 (<0.001)

MS: two 6×6 games         RE₀          .369 (.372)  .670 (.691)  −2896.1 (−1919.8)     <0.001 (<0.001)
Rounds 11(21)–40          BA           .372 (.372)  .771 (.762)  −2144.3 (−1429.3)     0.269 (0.079)
N=1200(800)               equilibrium  .355 (.357)  .622 (.634)  −∞ (−∞)               0.000 (0.000)
                          IN(1/6)      .373 (.373)  .833 (.833)  −2150.1 (−1433.4)     0.731 (0.921)
                          IN(0.75)     .422 (.429)  .708 (.734)  −2647.1 (−1819.8)     <0.001 (<0.001)

Note: Statistics outside (inside) parentheses correspond to round numbers outside (in) parentheses.

forms all other models. Looking at only rounds 33 and afterward does not

change the MSD or ln(L) results, but it does eliminate the BA model's
advantage in POI (the IN(0.75) model is now slightly better).

The data of Van Huyck, Battalio, and Beil present even stronger evidence
against RE₀; according to all three criteria, the best BA model is better than
RE₀ (and every other model), and in all cases the difference is substantial. The
RE₀ model performs even more poorly than IN(0.75) (though slightly better
according to MSD). The BA model is still the best when we look at rounds 6–10
rather than 3–10, but RE₀ is better than IN(0.75) according to ln(L).

The conclusions based on the Mookherjee and Sopher data are ambiguous.
According to the ln(L) criterion and hence the posterior probabilities, the best
BA model and the two inertial models are far better than RE₀, which is in turn
far better than stationary equilibrium; the best model according to ln(L) is the
BA model, though the difference between BA and the IN(1/6) model is small
enough that the posterior odds actually favor the IN(1/6) model by about 3 to 1.
On the other hand, according to the MSD and POI criteria, stationary
equilibrium is best, followed by RE₀, then by the other three. (This game has a
dominated strategy which is played occasionally by subjects. Thus, though
equilibrium does rather well in terms of MSD and POI, its ln(L) score is −∞.)
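As a quick check of the "about 3 to 1" figure, the posterior odds can be computed from the posterior probabilities that Table VII reports for this data set over the longer time frame (0.731 for IN(1/6) and 0.269 for the best BA model). The notation below is introduced here only for illustration.

```latex
\[
\frac{\Pr\bigl(\mathrm{IN}(1/6) \mid \text{data}\bigr)}
     {\Pr\bigl(\mathrm{BA} \mid \text{data}\bigr)}
  = \frac{0.731}{0.269} \approx 2.7 .
\]
```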

Article Doi

DOI: 10.1111/1468-0262.00125
