Perspective-Taking and Depth of Theory-of-Mind Reasoning in Sequential-Move Games
Abstract
Theory-of-mind (ToM) involves modeling an individual’s mental states to plan one’s action and to anticipate others’ actions through recursive reasoning that may be myopic (with limited recursion) or predictive (with full recursion). ToM recursion was examined using a series of two-player, sequential-move matrix games with a maximum of three steps. Participants were assigned the role of Player I, controlling the initial and the last step, or of Player II, controlling the second step. Appropriate for the assigned role, participants either anticipated or planned Player II’s strategy at the second step, and then determined Player I’s optimal strategy at the first step. Participants more readily used predictive reasoning as Player II (i.e., planning one’s own move) than as Player I (i.e., anticipating an opponent’s move), although they did not differ when translating reasoning outcome about the second step to optimal action in the first step. Perspective-taking influenced likelihood of predictive reasoning, but it did not affect the rate at which participants acquired it during the experimental block. We conclude that the depth of ToM recursion (related to perspective-taking mechanisms) and rational application of belief–desire to action (instrumental rationality) constitute separate cognitive processes in ToM reasoning.
1. Introduction
Strategic interpersonal interaction, as modeled by mathematical game theory, involves rational players weighing their choice of actions through analyzing player-specific payoffs associated with outcomes that are jointly determined by their own and their opponents’ choices (Von Neumann & Morgenstern, 1944; Luce & Raiffa, 1957). Traditional game theoretic approaches assume that players take full advantage of common knowledge and rationality (CKR) in games of complete information (Binmore, 1992; Osborne & Rubinstein, 1994). Common knowledge is said to exist when all players know something to be true, know that all players know it to be true, know that all players know all players know it to be true, and so on. Normative solutions (equilibria) of games require recursive modeling of other players to its full depth, leading to the framework of epistemic game theory (Brandenburger & Dekel, 1993; Mertens & Zamir, 1985). However, experimental evidence has repeatedly demonstrated the failure of human players to fully employ CKR in a number of ways (see Colman, 2003a). For simultaneous-move, dominance-solvable games, in which a player is allowed to iteratively eliminate dominated strategies through successive role-switching, shallow depths of recursion were empirically observed, for example, in the Beauty-Contest game (Duffy & Nagel, 1997; Ho, Camerer, & Weigelt, 1998; Nagel, 1995, 1998) and in two-person, three-strategy games (Stahl & Wilson, 1994, 1995). In sequential-move games with finite horizon, such as simple Stackelberg games (Beard & Beil, 1994; Brandts & Holt, 1995; Schotter, Weigelt, & Wilson, 1994) and the Centipede game (Aumann, 1998; McKelvey & Palfrey, 1992; Parco, Rapoport, & Stein, 2002; Rosenthal, 1981), human performance deviates from the optimal solutions prescribed under the CKR assumption.
Repeated findings of lack of full depth of reasoning have led to the notion of “bounded rationality” (Simon, 1957), in which limitations of the cognitive system (such as a limited-capacity working memory) may constrain reasoning processes in games (Colman, 2003b; Rosenthal, 1989).
Though working-memory limitation may actually be an organism’s adaptive response to an environment that is complex and with limitless information fidelity (see Gigerenzer & Selten, 2001), another likely cause of lack of full-depth recursive reasoning about games is that humans rely on specific mechanisms for understanding the intentionality of other players, so-called theory-of-mind (ToM), which is rooted in developmental psychology (Baron-Cohen, 1995; Flavell, 1999; Leslie, 1994). ToM is the ability to understand others as possessing mental lives and having intentions, desires, beliefs, and thoughts like one’s own, and to comprehend and theorize about the existence of private mental states that influence actions and choices (Estes, Wellman, & Woolley, 1989; Halford, 1993; Perner, 1991; Wellman, 1990, 1993). We hypothesized that when reasoning strategically in games, participants invoke ToM to construct a mental model of their opponents based on lay theories about the opponents’ intentions and beliefs and use this model to craft strategies (Hedden & Zhang, 2002; Zhang & Hedden, 2003). This mental model approach is consistent with earlier studies of reasoning in games. For example, McCabe, Smith, and LePore (2000) found that cooperative outcomes were more frequent in extensive-form than normal-form games because the extensive form facilitates the attribution of cooperative intentions to the opponent, suggesting that “intentionality detection” was invoked during strategic reasoning. Perner (1979) presented children (4–10 years old) with two-person, two-strategy games in which one player had a dominant strategy. Only older children were able to predict that the opponent would choose the dominant strategy, whereas younger children tended to perseverate on their own payoffs, consistent with the developmental stages of ToM reasoning and perspective-taking in children (Selman, 1980).
The mental model approach to bounded rationality in games calls for a differentiation of human suboptimality in gaming as arising from failures of “instrumental rationality” (namely, finding an optimal action consistent with the constructed mental model) versus failures due to insufficient depth in the ToM model constructed. In the latter case, depth of ToM recursion is subject to both cognitive architectural constraints (i.e., one’s reflexive ability) and the knowledge embodied in one’s mental model (i.e., an estimate about the level of strategic sophistication an opponent possesses). When one’s model takes into account only one’s own intentions and desires, zeroth-order reasoning is said to be used. First-order reasoning anticipates the intentions and desires of the opponent, whereas second-order reasoning accounts for the opponent’s anticipation of one’s own intentions and desires.
Hedden and Zhang (2002) developed a paradigm to address the use of ToM-based models for determining an optimal action (see Fig. 1 below for sample games). Participants, in the role of Player I, played a series of two-player, sequential-move games with a single opponent, an experimental confederate in the role of Player II who used either a zeroth-order or first-order strategy. In each game, participants predicted the opponent’s action, and in doing so, revealed their model of the opponent, before making their own choice, which revealed their ability for making optimal choices based jointly on their model of the opponent and an analysis of relevant payoffs. Results demonstrated that, although, on average, participants’ ToM models were dynamically modified following interactions with the opponent, a significant number of college-student participants operated with a myopic (first-order) model even toward the end of the experiment. In contrast, their ability to rationally apply their predictions to optimal decision making remained invariably adequate. Hedden and Zhang’s (2002) paradigm has recently been applied to second-order reasoning in children (Flobbe, Verbrugge, Hendriks, & Krämer, 2008) and to reasoning in competitive games (Goodie, Doshi, & Young, 2012).

Fig. 1. The two-person, three-step sequential-move game. (a) A generic game board, with capital letters indicating cells A–D, and arrows indicating the direction of play. Each game begins in Cell A and terminates whenever one of the players decides not to move to the next cell during his or her turn. Players receive the payoffs indicated in the game-ending cell; the left payoff number is for Player I and the right payoff number for Player II. (b) A diagnostic game in which myopic and predictive reasoning yield opposite choices. In this example, a myopic Player II will move to Cell C (with payoff 4) if given the opportunity, whereas a predictive Player II, knowing that Player I would move to Cell D from C (preferring payoff 4 to 3), will end the game in Cell B (to get a payoff of 2 instead of 1). So Player I will stop at A (content with payoff 2) if reasoning with second-order ToM but will move to Cell B from A (hoping to get payoff 3 or 4 afterward) if reasoning with first-order ToM. (c) A nondiagnostic game in which myopic and predictive reasoning yield identical choices. In this example, Player II should always decide to end the game in Cell B (and get payoff 2) instead of moving to Cell C (and get payoff 1), whether out of myopic reasoning (without considering Player I’s potential countermove at Cell C) or predictive reasoning (having considered and determined that Player I will not move from Cell C to D due to a reduction in payoff from 4 to 2). Therefore, Player I, reasoning with first- or second-order ToM, will end the game at A (to get a payoff of 3 instead of 1) without moving to B.
One might argue that the failure of certain participants in Hedden and Zhang (2002) to reason with full depth (i.e., second-order prediction) could be due to limits on the number of look-ahead steps in sequential planning (a working-memory failure in carrying out backward induction), as opposed to failures in “reflexive” reasoning, in which a player reasons about the opponent’s beliefs about the player’s own intentions (immaturity of ToM mechanism). A method for distinguishing these hypotheses is to ask participants to reason about the same games from different perspectives (roles) in which the steps of reasoning are identical. If sequential planning (including backward induction) is invoked without necessarily involving ToM reflexivity, then exchanging participant role should not lead to any differences in performance. On the other hand, if participants use ToM as a basis for reasoning, the perspective one takes will have a profound consequence on the depth of reasoning employed because, almost by definition, ToM order is increased/decreased by adding/removing one layer of recursion associated with a shift of perspective.
This study used the same design as Hedden and Zhang (2002), but it manipulated the perspective taken by participants. Participants were assigned the role of either Player I or Player II, but they reasoned about the same choice (at Cell B), from either a first-person (“planning”) perspective or a third-person (“anticipation”) perspective. Our goal was to determine whether participants invoked ToM-based mental models in reasoning about games, and to investigate the relationship of perspective-taking to the depth of ToM reasoning.
2. Methods
2.1. Participants
Participants were 85 undergraduate students at the University of Michigan who received class credit for participating. All participants gave informed consent and were debriefed at the conclusion of the experiment. To monitor their motivation, we calculated each participant’s error at the end of the training block and error on catch trials embedded in the experimental block, which used nondiagnostic games in which players should give the same response regardless of whether they are engaging in predictive or myopic reasoning. Apparently unmotivated or confused participants were excluded from further analysis: This removed 20% of participants in the role of Player I (P-I), leaving 28 participants in the analyses, and 28% of participants in the role of Player II (P-II), leaving 36 participants. Data for participants in the role of P-I were taken from Experiment 2 of Hedden and Zhang (2002), but they were reanalyzed to allow comparison with data collected from participants assigned the role of P-II.
2.2. Materials
2.2.1. Experimental game blocks
Games used in this study were identical to the ones used by Hedden and Zhang (2002). Two players, called Player I (P-I) and Player II (P-II), take turns deciding whether a game should continue or terminate, with the goal of earning maximal points for themselves (see Fig. 1). The stage and outcome of a game can be one of four cells (labeled A, B, C, and D), with each game beginning in Cell A. Each player, in turn (indicated by the block letters), decides whether to continue the game by moving to the next cell in the direction indicated by the arrow, or to terminate the game so that both players collect the respective payoffs indicated by the pair of numbers in that cell—left payoff number to P-I, right payoff number to P-II. The first (from A to B) and the final (from C to D) choice points are controlled by P-I, whereas the second choice point (from B to C) is controlled by P-II. The integer payoff values are understood so that 4 (or 1) is the most (or least) preferred outcome. The “payoff structure,” that is, the combination of payoffs in all cells for both players, is unique in each game and is common knowledge to both players.
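The move rules just described can be illustrated with a short Python sketch. It is not the software used in the experiment: the function names are hypothetical, and the payoff values are assumed ones chosen only to be consistent with the verbal description of the diagnostic game in Fig. 1b. Strict comparisons are used on the assumption that a player never faces tied payoffs.

```python
# Each cell maps to (payoff to P-I, payoff to P-II).
# ASSUMPTION: hypothetical payoffs consistent with the description of Fig. 1b.
GAME = {"A": (2, 3), "B": (1, 2), "C": (3, 4), "D": (4, 1)}

P1, P2 = 0, 1  # index of each player's payoff within a cell


def final_cell_from_c(game):
    """P-I, at the last choice point, moves from C to D only if D pays more."""
    return "D" if game["D"][P1] > game["C"][P1] else "C"


def p2_moves_at_b(game, predictive):
    """P-II's choice at B under myopic or predictive reasoning."""
    if predictive:
        # Predictive: take P-I's countermove at C into account.
        target = final_cell_from_c(game)
    else:
        # Myopic: compare B with C only, ignoring what P-I does at C.
        target = "C"
    return game[target][P2] > game["B"][P2]


def p1_moves_at_a(game, assumes_predictive_p2):
    """P-I's best choice at A, given its model of P-II's reasoning."""
    if p2_moves_at_b(game, assumes_predictive_p2):
        outcome = final_cell_from_c(game)
    else:
        outcome = "B"
    return game[outcome][P1] > game["A"][P1]
```

With these assumed payoffs, a myopic P-II moves at B and a predictive P-II stays, so a P-I who models P-II as myopic moves at A, whereas a P-I who models P-II as predictive stays.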
Sixty-four distinct games were assembled into three blocks as described by Hedden and Zhang (2002). The first (“training block”) consists of 24 trivial games, in which P-II’s payoff for B is either smaller or larger than P-II’s payoffs at both C and D. These games were used to familiarize players with the rules, procedures, and the noncooperative nature of this game. Two test blocks followed, each consisting of 16 diagnostic games plus 4 nondiagnostic “catch” games. A game is diagnostic (see Fig. 1b) if myopic and predictive reasoning about P-II’s move at B, and consequently P-I’s move at A, yield opposite choices. In a nondiagnostic game (see Fig. 1c), both myopic and predictive reasoning yield the same, though not necessarily obvious, decision. Nondiagnostic games were used to discern (catch) unmotivated or otherwise inattentive participants (because either shallower or deeper reasoning would lead to the same “correct” answer, but not its opposite choice). Games used for the two test blocks were balanced for the number of stay/move choices of P-II (for either myopic or predictive reasoning) and the number of stay/move decisions by P-I, if performed optimally. The 64 games were presented in a fixed order, with each game trial-unique. The entire game sequence is given in Appendix B of Hedden and Zhang (2002).
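The three game categories can be expressed as simple predicates. The sketch below is one possible reading of the definitions above, not the procedure used to assemble the actual game set; all names and payoff values are hypothetical, with the two example games chosen to match the verbal descriptions of Fig. 1b and Fig. 1c.

```python
P1, P2 = 0, 1  # index of each player's payoff within a cell (P-I, P-II)


def p2_moves_at_b(game, predictive):
    """P-II's choice at B; the predictive version anticipates P-I's move at C."""
    reached = "C"
    if predictive and game["D"][P1] > game["C"][P1]:
        reached = "D"
    return game[reached][P2] > game["B"][P2]


def p1_moves_at_a(game, predictive):
    """P-I's best choice at A when P-II is modeled as myopic or predictive."""
    if p2_moves_at_b(game, predictive):
        p1_payoff = max(game["C"][P1], game["D"][P1])
    else:
        p1_payoff = game["B"][P1]
    return p1_payoff > game["A"][P1]


def is_trivial(game):
    """Training game: P-II's payoff at B is below or above both C and D."""
    b, c, d = (game[cell][P2] for cell in "BCD")
    return b < min(c, d) or b > max(c, d)


def is_diagnostic(game):
    """Myopic and predictive reasoning disagree at B and, in turn, at A."""
    return (p2_moves_at_b(game, False) != p2_moves_at_b(game, True)
            and p1_moves_at_a(game, False) != p1_moves_at_a(game, True))


# ASSUMPTION: hypothetical payoffs matching the descriptions of Fig. 1b and 1c.
FIG_1B = {"A": (2, 3), "B": (1, 2), "C": (3, 4), "D": (4, 1)}
FIG_1C = {"A": (3, 4), "B": (1, 2), "C": (4, 1), "D": (2, 3)}
```

Under these assumptions the first game is diagnostic, and the second is neither diagnostic nor trivial, which is the profile of a catch game.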
2.2.2. Design and procedure
Participants were assigned the role of either P-I or P-II to play against an opponent who used predictive reasoning throughout the games. The experimenter introduced a confederate (ostensibly another student participating in the study) to each participant with instructions that they would play simple matrix games in separate rooms via computer terminals. Participants watched the confederate being led into one room before being taken into another room. Participants were instructed that the game was noncooperative with a goal to earn as many points as possible without regard to points earned by the opponent. (However, they were not instructed to disregard the opponent’s payoff values.) Participants started the training block after an explanation of the rules and a sample game. After the training block, two blocks of 20 test games were presented. All games were presented on a Power Macintosh 9500 using an AppleVision 1710 AV (Apple, Inc., Cupertino, CA) display monitor or a Dell Dimension 4300 using SyncMaster 570V TFT (Samsung Electronics America, Ridgefield, NJ) display monitor (for participants assigned as P-I and P-II, respectively). Responses via mouse clicks to on-screen prompts were recorded electronically. A player’s points were tallied and displayed after each game, whereas the opponent’s points were not tallied.
For each game, participants were prompted to reason about (i) P-II’s move at B and (ii) P-I’s move at A. When assigned as P-I, participants were first asked “whether the opponent will stay or move if the game progresses to Cell B,” and secondly “whether or not you will move away from Cell A.” When assigned as P-II, the participant was first asked “whether you will move if the game progresses to Cell B,” and secondly “whether it is best for the opponent to stay in or move away from Cell A.” In the following, (i) will be referred to as “anticipation” question (when assigned as P-I) or “planning” question (when assigned as P-II), and (ii) as “rationality” question regardless of the participant’s assigned role. To avoid possible dynamic inconsistency at different nodes of a decision tree (Busemeyer, Weg, Barkan, Li, & Ma, 2000), the answer by a P-II participant regarding the decision at B is binding, as is the answer of a P-I participant regarding the decision at A. The role of a participant was consistent throughout the experiment.
Despite appearances, participants were playing against a computerized opponent programmed to play with an always predictive ToM strategy. Random delays were incorporated into the software so that participants would believe that they were actually playing with an attentive opponent. After finishing all games, participants completed an exit questionnaire, were thanked, and debriefed about the nature of the computerized opponent. The experimental session lasted 40–50 min.
2.2.3. Data analysis and scoring
Answers to the anticipation/planning question were scored based on whether a participant’s choice indicated myopic or predictive reasoning. The answer was given a score of 0 (myopic) if a P-II participant, when planning, had not considered P-I’s potential countermove at Cell C, or if a P-I participant, when anticipating, did not realize that P-II’s action would have taken into account the participant’s own future action at Cell C. Otherwise, the answer was given a score of 1 (predictive). For the training block, and for catch trials (nondiagnostic games) in the test blocks, the anticipation/planning question was scored simply as correct or incorrect.
The participant’s answer to the rationality question was compared with the optimal decision consistent with the answer to the anticipation/planning question regarding P-II’s move at Cell B. Any inconsistency was scored as a rationality error, regardless of whether the planned or anticipated action in Cell B was myopic or predictive.
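The two scores can be sketched for a single diagnostic game as follows. This is an illustrative reading of the scoring rules, not the original analysis code; the function names are hypothetical, the example payoffs are assumed values matching the description of Fig. 1b, and the sketch assumes that P-I, on reaching Cell C, would take the better of the C and D payoffs.

```python
P1, P2 = 0, 1  # index of each player's payoff within a cell (P-I, P-II)


def predictive_choice_at_b(game):
    """True if a predictive P-II would move from B to C."""
    reached = "D" if game["D"][P1] > game["C"][P1] else "C"
    return game[reached][P2] > game["B"][P2]


def score_trial(game, says_p2_moves, says_p1_moves):
    """Return (predictive score, rationality error) for one diagnostic game.

    says_p2_moves: answer to the anticipation/planning question (move at B?).
    says_p1_moves: answer to the rationality question (move at A?).
    """
    # In a diagnostic game the myopic and predictive answers are opposite,
    # so any answer that is not predictive is scored as myopic (0).
    predictive = int(says_p2_moves == predictive_choice_at_b(game))

    # Optimal choice at A *given the participant's own answer* about B.
    if says_p2_moves:
        p1_payoff_if_move = max(game["C"][P1], game["D"][P1])
    else:
        p1_payoff_if_move = game["B"][P1]
    optimal_move = p1_payoff_if_move > game["A"][P1]

    rationality_error = int(says_p1_moves != optimal_move)
    return predictive, rationality_error


# ASSUMPTION: hypothetical payoffs consistent with the description of Fig. 1b.
GAME = {"A": (2, 3), "B": (1, 2), "C": (3, 4), "D": (4, 1)}
```

For example, a myopic answer at B followed by a move at A is scored as myopic but rational, whereas a predictive answer at B followed by a move at A is scored as predictive with a rationality error.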
3. Results
The 32 diagnostic games in the two test blocks were grouped into eight sets, each containing four successively presented games with one of each strategically equivalent type (see Hedden & Zhang, 2002, for a discussion of game types). A participant’s response to the planning/anticipation questions in each set would have a score (called “predictive score,” or PS) of 1, 0.75, 0.5, 0.25, or 0, if the number of predictive choices made was 4, 3, 2, 1, or 0, respectively. A distribution of PSs was generated separately for participants as P-I and for participants as P-II. Fig. 2 presents the frequency histograms of the PSs for the eight serial positions (“game-set position”) in the experiment under the two conditions of role assignment. Two trends are obvious: First, participants initially acted more myopically and gradually became more predictive after progressive interactions with and feedback from their predictive opponent. Second, participants more readily switched to the predictive mode of reasoning when planning for their own move (i.e., acting as P-II) compared with those anticipating their opponent’s move (i.e., acting as P-I). At the first game-set position, participants in the role of P-I and P-II had roughly similar PS distributions, χ²(4, N = 64) = 6.91, p = .14. By the eighth position, participants in the role of P-II had distributions reflecting a greater proportion of predictive responses than those in the role of P-I, χ²(4, N = 64) = 10.97, p = .03.
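As a minimal sketch of how the PS is obtained, the Python function below groups a participant’s 32 trial-level scores (0 = myopic, 1 = predictive) into eight consecutive sets of four and returns the proportion of predictive choices in each. The function name and the response sequence are hypothetical and serve only as an illustration.

```python
def predictive_scores(trial_scores, set_size=4):
    """Proportion of predictive choices in each consecutive set of games."""
    if len(trial_scores) % set_size != 0:
        raise ValueError("number of trials must be a multiple of set_size")
    return [
        sum(trial_scores[start:start + set_size]) / set_size
        for start in range(0, len(trial_scores), set_size)
    ]


# ASSUMPTION: made-up responses of one participant who becomes predictive.
example_trials = [
    0, 0, 0, 0,   # set 1
    0, 1, 0, 0,   # set 2
    1, 0, 1, 0,   # set 3
    1, 1, 1, 0,   # set 4
    1, 1, 1, 1,   # set 5
    1, 1, 1, 1,   # set 6
    1, 1, 1, 1,   # set 7
    1, 1, 1, 1,   # set 8
]
example_ps = predictive_scores(example_trials)
```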

Fig. 2. Histograms showing the percentage of participants with each prediction score at each game set position in the experiment. A game set consists of four successive diagnostic games that are balanced with respect to optimal choices based on myopic and predictive reasoning for both Player I and Player II. (a) Histogram for participants assigned the role of Player I. (b) Histogram for participants assigned the role of Player II.
We calculated the mean PS as a function of game-set position for the two experimental conditions (Fig. 3a). A repeated-measures ANOVA was conducted on this mean PS in which game-set position was a within-subject variable and participant role was a between-subject variable. The main effect of role was significant, F(1, 62) = 11.43, MSE = 0.594, p = .001. The main effect of position was also significant, F(7, 434) = 40.16, MSE = 0.048, p < .001, with both the linear, F(1, 62) = 91.74, MSE = 0.131, p < .001, and quadratic, F(1, 62) = 19.82, MSE = 0.061, p < .001, trends being significant. The interaction of Role × Position was significant, F(7, 434) = 2.60, MSE = 0.048, p = .012, with only the quadratic trend achieving significance, F(1, 62) = 5.56, MSE = 0.061, p = .022. Simple effects tests at each position found no significant differences between participants in the two roles at either the first position, t(62) = 0.75, MD = 0.07, p = .45, or the second position, t(62) = 1.32, MD = 0.12, p = .19. Significant differences were found in all positions thereafter—smallest t(62) = 2.57, MD = 0.25, p = .012. The ratios of the means, calculated as the mean PS for P-I divided by the mean PS for P-II, for the significant positions (3–8) were 0.56, 0.66, 0.65, 0.66, 0.68, and 0.71, with an average ratio of 0.65. This ratio, which estimates Prob(predictive reasoning as P-I)/Prob(predictive reasoning as P-II), reflects a disadvantage of depth in reasoning from a third-person perspective compared with that from a first-person perspective, and it remains relatively constant despite improvement in the PS as the experiment progressed.
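The average ratio can be checked directly from the six reported values. The short Python sketch below uses only the numbers given in the text; the variable names are arbitrary.

```python
# Ratios of mean PS (P-I divided by P-II) at game-set positions 3 to 8,
# as reported in the text.
ratios = [0.56, 0.66, 0.65, 0.66, 0.68, 0.71]

average_ratio = sum(ratios) / len(ratios)
print(round(average_ratio, 2))
```

The six ratios span a narrow range, from 0.56 to 0.71, which is the sense in which the ratio is described as relatively constant across positions.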

Fig. 3. Average performance of the participant population (a and c) and a subset of “converters” (b) as a function of game set position, when participants take the role of Player I or Player II. (a) The mean prediction score for participants in each role (0 = myopic, 1 = predictive). (b) The cumulative proportion of participants “converting” from myopic reasoning to predictive reasoning at each game set position. Cumulative proportion is shown among the set of converters (i.e., participants who always had a prediction score of 1 beginning with a given game set position). (c) The mean rationality error for participants in each role (0 = no rationality error, 1 = rationality error).
By examining individual performance, we were able to identify a subset (43% for P-I role and 64% for P-II role) of participants who started from myopic reasoning, “converted” to predictive reasoning during the test blocks, and remained predictive for the rest of the games. The game-position after which each “converter” consistently scored 1s was identified, and chi-square tests on this time-of-conversion found no significant difference for converters as P-I and P-II, whether nonconverters were excluded, χ²(df = 7, N = 35) = 5.51, p = .60, or included, χ²(df = 8, N = 64) = 7.85, p = .45, as a category in the analysis; the cumulative proportions are shown in Fig. 3b. This analysis indicates that the 0.65 ratio calculated above reflects the relative likelihood that a participant will learn to engage predictive reasoning (43%/64% = 0.67) when playing as P-I compared with as P-II, instead of reflecting the time course (rate of learning) through which predictive reasoning is acquired.
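One way to operationalize the time of conversion is sketched below in Python. The exact criterion used in the analysis is not spelled out beyond the description above, so this is an assumed reading: a converter is a participant whose PS sequence does not start at 1 and ends in an unbroken run of 1s, and the conversion position is the 1-indexed game-set position at which that run begins. The function name and example sequences are hypothetical.

```python
def conversion_position(ps_by_set):
    """Return the 1-indexed position where the final run of PS = 1 begins.

    Returns None for nonconverters: participants who are not predictive at
    the end, or who scored 1 from the very first game set.
    """
    if not ps_by_set or ps_by_set[-1] != 1:
        return None
    start = len(ps_by_set)
    # Walk backward while the preceding game set was also fully predictive.
    while start > 1 and ps_by_set[start - 2] == 1:
        start -= 1
    return start if start > 1 else None


# Relative likelihood of converting as P-I versus P-II, from reported values.
likelihood_ratio = 0.43 / 0.64
```

For instance, the hypothetical sequence 0, 0.25, 0.5, 1, 0.75, 1, 1, 1 yields a conversion position of 6, because the score of 1 at position 4 is not sustained.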
In contrast, when rationality errors are averaged over each game-set (Fig. 3c), a repeated-measures ANOVA yielded no significant differences in the error rates for participant role, F(1, 62) = 0.01, MSE = 0.136, p = .91, or for the Role × Position interaction, F(7, 434) = 1.23, MSE = 0.029, p = .28. Indeed, participants in the two roles did not differ in their rationality errors for any single game-set position—largest t(62) = 1.50, MD = 0.08, p = .14. The main effect of position was significant, F(7, 434) = 5.90, MSE = 0.029, p < .001. This was due primarily to lower levels of rationality errors in the second test block than in the first, as evidenced by the significant linear contrast, F(1, 62) = 19.81, MSE = 0.043, p < .001. These results indicate that instrumental rationality, which deals with translating mental model prediction to an optimal decision, is distinct from depth of recursive reasoning, which is affected by perspective-taking.
To investigate whether the difference in PS between the planning and the anticipation versions of the question reflects a difference in the sophistication of ToM reasoning, we measured response time (RT) for participants to generate myopic and predictive choices when acting as P-I or P-II (Fig. 4). Participants who made exclusively myopic or predictive choices were excluded from the analysis. A repeated-measures ANOVA with choice type (myopic versus predictive) as a within-subject variable and participant role as a between-subject variable found a significant main effect of choice type, F(1, 55) = 31.50, MSE = 3.53, p < .001, with myopic predictions (M = 7.27, SE = 0.44) taking less time than predictive ones (M = 9.25, SE = 0.56). There was no main effect of role, F(1, 55) = 0.82, MSE = 3.53, p = .37, nor a Role × Choice Type interaction, F(1, 55) = 0.08, MSE = 25.32, p = .79. Simple effects tests showed a significant difference in RT between myopic and predictive choices both for participants acting as P-I, t(25) = 3.96, MD = 2.30, p = .001, and for participants acting as P-II, t(30) = 3.90, MD = 1.66, p < .001, with predictive choices associated with longer RTs.

Fig. 4. Mean response times for answering the planning/anticipation question and for answering the rationality question, following either myopic or predictive responses to the planning/anticipation question. (a) Mean response times when participants are assigned the role of Player I (anticipation mode). (b) Mean response times when participants are assigned the role of Player II (planning mode).
In comparison, when average RTs for answering the rationality question following myopic and predictive choices (“choice type”) were separately calculated for both participant roles, a repeated-measures ANOVA with choice type as a within-subject variable and participant role as a between-subject variable found no effect of choice type, F(1, 52) = 0.31, MSE = 0.92, p = .58, no significant effect of role, F(1, 52) = 3.52, MSE = 0.92, p = .07, nor a Role × Choice Type interaction, F(1, 52) = 2.76, MSE = 6.70, p = .10. Taken together, the results in Figs. 3 and 4 indicate that anticipation/planning, but not instrumental rationality, was influenced by perspective-taking, and that the perspective taken influenced the likelihood (frequency) of, but not the RT consumed for, engaging in predictive reasoning.
4. Discussion
This study manipulated participants’ perspectives in games in order to differentiate ToM-based recursive reasoning from the confounding factor of decision horizon in look-ahead planning or backward induction in multistage games. In predicting Player II’s optimal choice at Cell B, participants adopt a first-person perspective (1PP, “planning”) when assigned the role of Player II, or a third-person perspective (3PP, “anticipation”) when assigned the role of Player I. The need for sequential planning is equivalent for both assignments—the payoff comparisons involved are formally identical and require the same working memory load. However, the present data showed a clear advantage for 1PP over 3PP in achieving predictive reasoning (i.e., in taking into account Player I’s countermove upon arriving in Cell C). Although most participants in 1PP and 3PP began with a myopic ToM strategy, those in 1PP were more likely to eventually acquire predictive ToM reasoning. Participants in 3PP are placed farther up in the analysis stream (compared with 1PP), with the corresponding disadvantage of having to process one more level of ToM recursion, cf. 3, 4 of Hedden and Zhang (2002). This suggests that we are more ready to anticipate others’ reactions to an action we plan and less prepared for the possibility that others, when planning their action, may have already taken into account possible counterreactions from ourselves.
Although perspective-taking (i.e., shifting perspective from 3PP to 1PP) did increase the likelihood of predictive reasoning, as revealed by the relative proportion of participants eventually acquiring predictive reasoning (converters), perspective-taking did not affect the rate of acquisition for predictive reasoning, as seen in the relatively stable ratio (at different game-positions) between mean prediction scores for 1PP and 3PP. Nor did perspective-taking change the cognitive processing used to carry out predictive reasoning, as evidenced by equivalent RTs for 1PP and 3PP. Finally, participants’ ability to rationally apply the knowledge of planned or anticipated actions to optimize decisions at an earlier step was not affected by their perspectives.
The delineation of “instrumental rationality” (the ability to rationally choose optimal actions given a belief–desire state) from “inductive rationality” (the ability to establish the most predictive model of the opponent) in games strongly suggests that these are separate components of theory-of-mind (see also Jones & Zhang, 2003). The former may consist of a utility-based decision-making system involving lay psychology of belief/desire/action, whereas the latter may involve, in addition, reflexive modeling of minds that shifts perspectives freely. Recent functional imaging studies of the neural basis of ToM (reviewed in Gallagher & Frith, 2003) revealed differential brain activations associated with 1PP and 3PP (Ruby & Decety, 2001; Vogeley et al., 2001; Vogeley & Fink, 2003). Specific neurocognitive processes associated with systematic reference to self from 3PP (i.e., second-order reasoning) remain to be demonstrated. Future research will determine the cognitive factors involved and their limitations when people establish and revise their mental models about opponents as intentional agents, and the conditions for the emergence of a reflexive mental model in which they themselves are humbly contained.
Acknowledgments
We thank Sabrina Yeung, Jill Finster, R. Scott Kobetis, and Elisabeth Mohr for help in executing the experiment, and Kirk Hedden and Xiaoqin Hu for programming assistance.