The Dynamics of Human Behaviour in Poker
Marc Ponsen (a), Karl Tuyls (b), Steven de Jong (a), Jan Ramon (c), Tom Croonenborghs (d), Kurt Driessens (c)

(a) Universiteit Maastricht, Netherlands
(b) Technische Universiteit Eindhoven, Netherlands
(c) Katholieke Universiteit Leuven, Belgium
(d) Biosciences and Technology Department, KH Kempen University College, Belgium
Abstract
In this paper we investigate the evolutionary dynamics of strategic behaviour in the game of poker by means of data gathered from a large number of real-world poker games. We perform this study from an evolutionary game theoretic perspective using the Replicator Dynamics model. We investigate the dynamic properties by studying how players switch between different strategies under different circumstances, what the basins of attraction of the equilibria look like, and what the stability properties of the attractors are. We illustrate the dynamics using a simplex analysis. Our experimental results confirm existing domain knowledge of the game, namely that certain strategies are clearly inferior while others can be successful given certain game conditions.
1 Introduction
Although the rules of the game of poker are simple, it is a challenging game to master. There exist many
books written by domain experts on how to play the game (see, e.g., [2, 4, 9]). A general consensus is that a
winning poker strategy should be adaptive: a player should change the style of play to prevent becoming too predictable and, moreover, should adapt the game strategy to the opponents. In the latter case, players may want to vary their actions during a specific game, but they can also consider changing their overall game strategy over a series of games (e.g., playing a more aggressive or defensive style of poker).
Although some studies exist on modeling poker players and providing a best response given the opponent model (see, e.g., [1, 8, 10]), not much research focuses on overall strategy selection. In this paper we address this issue by investigating the evolutionary dynamics of strategic player behaviour in the game of poker. We perform this study from an evolutionary game-theoretic perspective using the Replicator Dynamics (RD) [5, 6, 11, 12]. More precisely, we investigate the dynamic properties by studying how players switch between different strategies (based on the principle of selection of the fittest) under different circumstances, what the basins of attraction of the equilibria look like, and what the stability properties of the attractors are.
A complicating factor is that the RD can only be applied straightforwardly to simple normal form games
as for instance the Prisoner’s Dilemma game [3]. Applying the RD to poker by assembling the different
actions in the different phases of the game for each player will not work, because this leads to an overly
complex table with too many dimensions. To address this problem, overall strategies (i.e., behaviour over a
series of games, henceforth referred to as meta strategies) of players may be considered. Using these meta
strategies, a heuristic payoff table can then be created that enables us to apply different RD models and
perform our analysis. This approach has been used before in the analysis of behaviour of buyers and sellers
in automated auctions [7, 13, 14]. Conveniently, for the game of poker several meta strategies are already defined in the literature. This allows us to apply RD to the game of poker. An important difference from previous work is that we use real-world poker games from which the heuristic payoff table is derived, as opposed to
the artificial data used in the auction studies. We observed poker games played on a poker website, in which
human players competed for real money at various stakes.
Therefore, the contributions of this paper are twofold. First, we provide new insights into the dynamics
of strategic behaviour in the complex game of poker using RD models. These insights may prove useful
for strategy selection by human players but can also aid in creating strong artificial poker players. Second,
unlike other studies, we apply RD models to real-world human data.
The remainder of this paper is structured as follows. We start by explaining the poker variant we focus
on in our research, namely No-Limit Texas Hold’em poker, and describe some well-known meta strategies for this game. Next we elaborate on the Replicator Dynamics and continue with a description of our
methodology. We end with experiments and a conclusion.
2 Background
In this section we will first briefly explain the rules of the game of poker. Then we will discuss meta
strategies as defined by domain experts.
2.1 Poker
Poker is a card game played between at least two players. In a nutshell, the object of the game is to win
games (and consequently win money) by either having the best card combination at the end of the game,
or by being the only active player. The game includes several betting rounds wherein players are allowed
to invest money. Players can remain active by at least matching the largest investment made by any of the
players, or they can choose to fold (i.e., stop investing money and forfeit the game). In the case that only
one active player remains, i.e., all other players chose to fold, the active player automatically wins the game.
The winner receives the money invested by all the players.
In this paper we focus on the most popular poker variant, namely No-Limit Texas Hold’em. This game
includes 4 betting rounds (or phases), respectively called the pre-flop, flop, turn and river phase. During the
first betting round, all players are dealt two private cards (which we will refer to as a player’s hand) that are only known to that specific player. To encourage betting, two players are obliged to invest a small amount in the first round (the so-called small and big blind). One by one, the players can decide whether or not they want to participate in this game. If they do, they have to invest at least the current bet; this is known as calling. Players may also decide to raise the bet. If they do not wish to participate, players fold, resulting in the possible loss of any money they have bet thus far. During the remaining three betting phases,
the same procedure is followed. In every phase, community cards appear on the table (respectively 3 in
the flop phase, and 1 in the other phases). These cards apply to all the players and are used to determine
the card combinations (e.g., a pair or three-of-a-kind may be formed from the player’s private cards and the
community cards).
2.2 Meta strategies
There exists a lot of literature on winning poker strategies, mostly written by domain experts (see, e.g.,
[2, 4, 9]). These poker strategies may describe how to best react in detailed situations in a poker game, but
also how to behave over large numbers of games. Typically, experts describe these so-called meta strategies
based on only a few features. For example, an important feature in describing a player’s meta strategy is the percentage of times this player voluntarily sees the flop (henceforth abbreviated as VSF), since this may give insight into the player’s hand selection. If a particular player chooses to play more than, say, 40% of the games, he or she may play lower-quality hands (see [9] for a hand categorization) compared to players that only rarely see the flop. The standard terminology is a loose strategy for the first approach and a tight strategy for the latter. Another important feature is the so-called aggression factor of a player (henceforth abbreviated as AGR). The aggression factor indicates whether a player plays offensively (i.e., bets and raises often) or defensively (i.e., calls often). This aggression factor is calculated as:

AGR = (%bet + %raise) / %call
A player with a low aggression-factor is called passive, while a player with a high aggression-factor is simply
called aggressive. The thresholds for these features can vary depending on the game context. Taking into
account these two features, we can construct four meta strategies, namely: 1) loose-passive (LP), 2) loose-
aggressive (LA), 3) tight-passive (TP), and 4) tight-aggressive (TA). Again note that these meta-strategies
are derived from poker literature.
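
As a minimal sketch of this classification (illustrative only; the concrete thresholds of 0.35 for VSF and 2.0 for AGR are the ones used later in Section 4, and the function names are our own), a player’s meta strategy can be derived from the two features as follows:

```python
def aggression_factor(n_bets: int, n_raises: int, n_calls: int) -> float:
    """AGR = (%bet + %raise) / %call; since the percentages share the same
    denominator, the ratio reduces to raw action counts. Returns inf if the
    player never calls."""
    if n_calls == 0:
        return float("inf")
    return (n_bets + n_raises) / n_calls


def meta_strategy(vsf: float, agr: float,
                  vsf_threshold: float = 0.35,
                  agr_threshold: float = 2.0) -> str:
    """Map a player's VSF and aggression factor onto one of the four meta strategies."""
    style = "loose" if vsf > vsf_threshold else "tight"
    temper = "aggressive" if agr > agr_threshold else "passive"
    return f"{style}-{temper}"


# Example: a player who sees 45% of flops and bets/raises 2.5 times as often as calling.
agr = aggression_factor(n_bets=30, n_raises=20, n_calls=20)
print(meta_strategy(vsf=0.45, agr=agr))   # -> "loose-aggressive"
```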
Experts argue that the TA strategy is the most profitable strategy, since it combines patience (waiting
for quality hands) with aggression after the flop. One could already claim that any aggressive strategy
dominates all passive strategies, simply by looking at the rules of the poker game. Note that games can be
won by having the best card combination, but also by betting all opponents out of the pot. However, most
poker literature will argue that adapting one’s playing style is the most important feature of any winning poker strategy. This applies to detailed poker situations, i.e., varying actions based on the current opponent(s), but also to varying playing style on a broader scale (e.g., switching between meta strategies). We will next investigate how
players (should) switch between meta strategies in the game of No-Limit Texas Hold’em poker.
3 Methodology
In this section we concisely explain the methodology we will follow to perform our analysis. We start by
explaining Replicator Dynamics (RD) and the heuristic payoff table that is used to derive average payoffs
for the various meta strategies. Then we explain how we approximate the Nash equilibria of interactions
between the various meta strategies. Finally, we elucidate our algorithm for visualizing and analyzing the
dynamics of the different meta strategies in a simplex plot.
3.1 Replicator Dynamics
The RD [11, 16] are a system of differential equations describing how a population of strategies evolves through time. The RD presume a number of agents (i.e., individuals) in a population, where each agent is programmed to play a pure strategy. Hence, we obtain a certain mixed population state x, where x_i denotes the population share of agents playing strategy i. Each time step, the population shares for all strategies are changed based on the population state and the rewards in a payoff table. Note that single actions are typically considered in this context, but in our study we look at meta strategies.

An abstraction of an evolutionary process usually combines two basic elements, i.e., selection and mutation. Selection favors some population strategies over others, while mutation provides variety in the population. In this research, we will limit our analysis to the basic RD model based solely on selection of the most fit strategies in a population. Equation 1 represents this form of RD:

    dx_i/dt = [(Ax)_i − x · Ax] x_i        (1)

In Equation 1, the state x of the population can be described as a probability vector x = (x_1, x_2, ..., x_n) which expresses the different densities of all the different types of replicators (i.e., strategies) in the population, with x_i representing the density of replicator i. A is the payoff matrix that describes the different payoff values that each individual replicator receives when interacting with other replicators in the population. Hence (Ax)_i is the payoff that replicator i receives in a population with state x, whereas x · Ax describes the average payoff in the population. The growth rate (dx_i/dt)/x_i of the proportion of replicator i in the population equals the difference between the replicator’s current payoff and the average payoff in the population. For more information, we refer to [3, 5, 15].
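
As a minimal sketch of Equation 1 (not the code used for our experiments), the selection-only replicator dynamics can be computed and integrated numerically as follows; the payoff matrix below is a toy example, not the poker payoff table:

```python
import numpy as np

def replicator_gradient(x: np.ndarray, A: np.ndarray) -> np.ndarray:
    """dx_i/dt = [(Ax)_i - x.Ax] * x_i  (selection-only replicator dynamics)."""
    fitness = A @ x                # (Ax)_i: payoff of each strategy in state x
    avg_fitness = x @ fitness      # x.Ax: average payoff in the population
    return (fitness - avg_fitness) * x

def evolve(x0, A, dt=0.01, steps=5000):
    """Forward-Euler integration of the replicator dynamics from state x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * replicator_gradient(x, A)
        x = np.clip(x, 0.0, None)
        x /= x.sum()               # re-normalise so x stays on the simplex
    return x

# Toy 3-strategy payoff matrix in which the first strategy strictly dominates.
A = np.array([[3.0, 3.0, 3.0],
              [2.0, 2.0, 2.0],
              [1.0, 1.0, 1.0]])
print(evolve([0.2, 0.5, 0.3], A))   # converges towards (1, 0, 0)
```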
3.2 The Heuristic Payoff Table
The heuristic payoff table represents the payoff table of the poker game for the different meta strategies
the different agents can employ. In essence it replaces the Normal Form Game (NFG) payoff table for the
atomic actions. For a complex game such as poker it is impossible to use the atomic NFG, simply because
the table has too many dimensions to be able to represent it. Therefore, we look at heuristic strategies as
outlined in Section 2.2.
Let’s assume we have A agents and S strategies. This would require S^A entries in our NFG table. We now make a few simplifications, i.e., we do not consider different types of agents, we assume all agents can choose from the same strategy set, and all agents receive the same payoff for being in the same situation. This setting corresponds to the setting of a symmetric game. This means we consider a game where the payoffs for playing a particular strategy depend only on the strategies employed by the other agents, but not on who is playing them. Under this assumption we can seriously reduce the number of entries in the heuristic payoff table. More precisely, we need to consider the different ways of dividing our A agents over all possible S strategies. This boils down to the binomial coefficient:

    C(A + S − 1, A)

Suppose we consider 3 heuristic strategies and 6 agents; this leads to a payoff table of 28 entries, which is a serious reduction from 3^6 = 729 entries in the general case. As an example, the next table illustrates what the heuristic payoff table looks like for three strategies S_1, S_2 and S_3:

            S_1   S_2   S_3   U_1   U_2   U_3
    P =     s_1   s_2   s_3   u_1   u_2   u_3
            ...   ...   ...   ...   ...   ...

Consider for instance the first row of this table: in this row there are s_1 agents that play strategy S_1, s_2 agents that play strategy S_2, and s_3 agents that play strategy S_3. Furthermore, u_i is the respective expected payoff for playing strategy S_i. We call a tuple (s_1, s_2, s_3, u_1, u_2, u_3) a profile of the game. To determine the payoffs u_i
in the table, we compute expected payoffs for each profile from real-world poker data we assembled. More
precisely, we look in the data for the appearance of each profile and compute from these data points the
expected payoff for the used strategies. However, because payoff in the game of poker is non-deterministic,
we need a significant number of independent games to be able to compute representative values for our table
entries. In Section 4 we provide more details on the data we used and on the process of computing the payoff
table.
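
As a minimal sketch (not the code used for our analysis), the discrete profiles that index such a heuristic payoff table can be enumerated directly; the function and variable names below are illustrative only:

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

def profiles(num_agents: int, strategies: list[str]):
    """All ways of distributing num_agents over the strategies (symmetric game),
    each returned as a tuple of counts, e.g. (4, 1, 1) for strategies (S1, S2, S3)."""
    for combo in combinations_with_replacement(strategies, num_agents):
        counts = Counter(combo)
        yield tuple(counts[s] for s in strategies)

strategies = ["LP", "LA", "TP"]            # any three of the four meta strategies
rows = list(profiles(6, strategies))
print(len(rows), comb(6 + len(strategies) - 1, 6))   # both print 28
```

In the actual analysis, each such profile row is then associated with the expected payoffs u_i, estimated by averaging the outcomes of all observed games with that table configuration (see Section 4).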
3.3 Approximating Nash Equilibria
In this section we describe how we can determine which of the restpoints of the RD are effectively Nash equilibria (note that a restpoint of the RD is not necessarily Nash). The approach we describe is based on the work of Walsh et al. and Vytelingum et al. [13, 14]. A Nash equilibrium occurs when no player can increase its payoff by changing its strategy unilaterally. For the sake of clarity we follow the notation of [14].

The expected payoff of an agent playing a strategy j ∈ S (here, slightly abusing notation, S denotes the set of strategies rather than the number of strategies of Section 3.2), given a mixed-strategy p (the population state), is denoted as u(e_j, p). This corresponds to (Ax)_i in Equation 1. The value of u(e_j, p) can be computed by considering the results from a large number of poker games with a player playing strategy j and the other agents selected from the population, with a mixed-strategy p. For each game and every strategy, the individual payoffs of agents using strategy j are averaged. The Nash equilibrium is then approximated as the argument to the minimisation problem given in Equations 2 and 3:

    v(p) = Σ_{j ∈ S} ( max[ u(e_j, p) − u(p, p), 0 ] )^2        (2)

    p_nash = argmin_p [ v(p) ]        (3)

Here, u(p, p) is the average payoff of the entire population and corresponds with the term x · Ax of Equation 1. Specifically, p_nash is a Nash equilibrium if and only if it is a global minimum of v(p), and p is a global minimum if v(p) = 0. We solve this non-linear minimisation problem using the Amoeba non-linear optimiser [14].
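
The minimisation in Equations 2 and 3 can be sketched as follows, assuming the expected payoffs u(e_j, p) are provided by some estimator derived from the heuristic payoff table (here a generic expected_payoff callback, an assumption of this sketch); we use SciPy’s Nelder–Mead routine, which implements the same downhill-simplex ("Amoeba") method, as a stand-in for the optimiser of [14]:

```python
import numpy as np
from scipy.optimize import minimize

def v(p: np.ndarray, expected_payoff) -> float:
    """Equation 2: sum over strategies of max(u(e_j, p) - u(p, p), 0)^2."""
    p = np.clip(p, 1e-12, None)
    p = p / p.sum()                              # project onto the simplex
    u_pure = np.array([expected_payoff(j, p) for j in range(len(p))])
    u_mixed = p @ u_pure                         # u(p, p): population average payoff
    gains = np.maximum(u_pure - u_mixed, 0.0)
    return float(np.sum(gains ** 2))

def approximate_nash(expected_payoff, n_strategies: int, restarts: int = 20, seed: int = 0):
    """Equation 3: argmin_p v(p), solved with Nelder-Mead (downhill simplex)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        p0 = rng.dirichlet(np.ones(n_strategies))    # random start on the simplex
        res = minimize(v, p0, args=(expected_payoff,), method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    p = np.clip(best.x, 0.0, None)
    return p / p.sum(), best.fun                     # candidate equilibrium and residual v(p)

# Toy example: expected payoffs given directly by a matrix game, u(e_j, p) = (Ap)_j.
A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
p_nash, residual = approximate_nash(lambda j, p: (A @ p)[j], n_strategies=3)
print(p_nash, residual)   # close to the uniform mixed equilibrium, residual near 0
```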
3.4 Simplex Analysis
The simplex analysis allows us to graphically and analytically study the dynamics of strategy changes.
Before explaining this analysis, we first introduce a definition of a simplex. Given n elements which are
randomly chosen with probabilities (x
1
, x
2
, . . . , x
n
), there holds x
1
, x
2
, . . . , x
n
0 and
P
n
i=1
x
i
= 1. We
denote the set of all such probability distributions over n elements as Σ
n
or simply Σ if there is no confusion
possible. Σ
n
is a (n 1)-dimensional structure and is called a simplex. One degree of freedom is lost
due to the normality constraint. For example in Figure 1, Σ
2
and Σ
3
are shown. In the figures throughout
the experiments we use Σ
3
, projected as an equilateral triangle as in Figure 1(b), but we drop the axes and
1
The use of S differs from that in Section 3.2. Here S represents the set of strategies, unlike the number of strategies in Section 3.2.
x
1
x
2
0
1
1
(a) Σ
2
x
1
x
2
x
3
0
1
1
1
(b) Σ
3
Figure 1: The unit simplices Σ
2
(a; left) and Σ
3
(b; right).
labels. Since we use four meta strategies and Σ
3
concerns only three, this implies that we need to show four
simplexes Σ
3
, from each of which one strategy is missing.
Using the generated heuristic payoff table, we can now visualize the dynamics of the different agents
in a simplex as follows. To calculate the RD at any point s = (x
1
, x
2
, x
3
) in our simplex, we consider
N (i.e., many) runs with mixed-strategy s; x
1
is the percentage of the population playing strategy S
1
, x
2
is the percentage playing strategy S
2
and x
3
is is the percentage playing strategy S
3
. For each run, each
poker agent selects their (pure) strategy based on this mixed-strategy. Given the number of players using the
different strategies (S
1
, S
2
, S
3
), we have a particular profile for each run. This profile can be looked up in
our table, yielding a specific payoff for each player. The average of the payoffs of each of these N profiles
gives the payoffs at s = (x
1
, x
2
, x
3
). Provided with these payoffs we can easily compute the RD by filling
in the values of the different variables in Equation 1. This yields us a gradient at the point s = (x
1
, x
2
, x
3
).
Starting from a particular point within the simplex, we can now generate a smooth trajectory (consisting
of a piecewise linear curve) by moving a small distance in the calculated direction, until the trajectory reaches
an equilibrium. A trajectory does not necessarily settle at a fixed point. More precisely, an equilibrium to
which trajectories converge and settle is known as an attractor, while a saddle point is an unstable equilibrium
at which trajectories do not settle. Attractors and saddle points are very useful measures of how likely it is
that a population converges to a specific equilibrium.
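
A minimal sketch of this trajectory construction follows, assuming a payoffs_at function that returns the average strategy payoffs at a population state (in our case obtained from the heuristic payoff table as described above); the toy payoff matrix below is illustrative only:

```python
import numpy as np

def trace_trajectory(x0, payoffs_at, step=0.01, iters=10_000, tol=1e-9):
    """Follow the replicator-dynamics gradient from x0 until the motion becomes
    negligible (an attractor) or the iteration budget runs out."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(iters):
        f = payoffs_at(x)                       # expected payoff of each strategy at state x
        grad = (f - x @ f) * x                  # Equation 1
        x = np.clip(x + step * grad, 0.0, None)
        x /= x.sum()                            # stay on the simplex
        path.append(x.copy())
        if np.linalg.norm(grad) < tol:
            break
    return np.array(path)

# Toy payoff function standing in for the heuristic-payoff-table lookup.
A = np.array([[3.0, 1.0, 0.0], [2.0, 2.0, 1.0], [0.0, 1.0, 3.0]])
path = trace_trajectory([0.2, 0.5, 0.3], payoffs_at=lambda x: A @ x)
print(path[0], "->", path[-1])
```

Tracing many such trajectories from a grid of initial conditions produces direction fields such as those shown in Figures 2 and 3.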
4 Experiments and results
We collected a total of 1,599,057 No-Limit Texas Hold’em games with six or more starting players. As a first step we needed to determine the strategy of a player at any given point. If a player played fewer than 50 games in total, we argue that we do not have sufficient data to establish a strategy, and therefore we ignore this player (and game). If the player played at least 50 games, we used an interval of 50 games to collect
statistics for this specific player, and then determined the VSF and AGR values. We set the thresholds
respectively to 0.35 and 2.0, i.e., if VSF > 0.35, then the player is considered loose (and tight otherwise),
and if AGR > 2 then the player is considered aggressive (and passive otherwise). These are commonly
used thresholds for a No-Limit Texas Hold’em game (see e.g., [2, 4, 9]). The resulting strategy was then
associated with the specific player for all games in the interval of 50 games. Having estimated all players’
strategies, it is now possible to determine the table configuration (i.e., the number of players playing any of
the four meta strategies) for all games. Finally, we can compute the average payoffs for all strategies given
a particular table configuration and produce a profile (see Section 3.2).
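
As an illustration of this aggregation step (a sketch only; the record layout and numbers below are hypothetical and not taken from our data set), games can be grouped by their table configuration and averaged per strategy as follows:

```python
from collections import defaultdict

STRATS = ["LA", "LP", "TA", "TP"]

def build_payoff_table(games):
    """games: iterable of lists of (strategy_label, payoff) pairs, one list per game.
    Returns {profile: {strategy: average payoff}}, where a profile is the tuple of
    counts of (LA, LP, TA, TP) players seated in that game."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for game in games:
        profile = tuple(sum(1 for s, _ in game if s == m) for m in STRATS)
        for strat, payoff in game:
            sums[profile][strat] += payoff
            counts[profile][strat] += 1
    return {prof: {s: sums[prof][s] / counts[prof][s] for s in sums[prof]}
            for prof in sums}

# Tiny fabricated example with two 6-player games (illustrative only).
g1 = [("TA", 40), ("LA", -10), ("LP", -15), ("LP", -15), ("TP", 5), ("TA", -5)]
g2 = [("TA", 25), ("LA", -5), ("LP", -10), ("LP", -10), ("TP", 0), ("TA", 0)]
print(build_payoff_table([g1, g2]))
```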
We plotted four simplexes that resulted from our RD analysis in Figure 2. Recall from Section 3.4
that these simplexes show the dynamic behavior of the participating players, each having a choice of three strategies. This means that the evolution of the strategies employed in the population is visualized for
every possible initial condition of the game. The initial condition determines in which basin of attraction
we end up, leading to some specific attractor or repeller. These restpoints (i.e. attractors or repellers) are
potentially Nash equilibria.
What we can immediately see from the plots is that both passive strategies LP and TP (except in plot a) are repellers. In particular, the LP strategy is a strong repeller. This suggests that no matter what the game situation is, when playing the LP strategy it is always rational to switch strategy to, for example, TA or LA. This nicely confirms the claim made earlier (and in the literature), namely that aggressive strategies dominate their passive counterparts.

[Figure 2: The direction field of the RD using the heuristic payoff table, considering the four described meta-strategies. Dots represent the Nash equilibria.]
The dots indicated on the plots represent the Nash equilibria of the respective games. (Due to space constraints we only discuss the Nash equilibria of Figures 2a-2b and 3a-3b; for completeness, the equilibria of Figures 2c and 2d are also indicated.) Figure 2a contains three Nash equilibria, of which two are mixed and one is pure. The mixed equilibrium on the TP-LP axis is evolutionarily unstable, as a small deviation in a player’s strategy might lead the dynamics away from this equilibrium to one of the others. The mixed equilibrium on the LP-TA axis is stable. As one can see, this equilibrium lies close to the pure strategy TA. This means that TA is played with a higher probability than LP. Finally, there is also one stable pure equilibrium present, i.e., TP. Of the stable equilibria, TP has the largest basin of attraction.
Figure 2b contains 3 Nash equilibria of which one is mixed and two are pure. As one can see from the
picture, the mixed Nash equilibrium is evolutionarily unstable, i.e., any small perturbation of this equilibrium
immediately leads the dynamics away from it to one of the other pure Nash equilibria. This means that if
one of the players would decide to slightly change its strategy at the equilibrium point, the dynamics of the
entire population would drastically change. The mixed Nash equilibrium almost corresponds to the situation
in which the three strategies are played with equal probability, i.e., a uniform distribution. The pure Nash
equilibria LA and TA are both evolutionarily stable. LA has a larger basin of attraction than TA (similar to
plot a), which does not completely correspond with the expectations of domain experts (it is assumed by domain experts that, in general, TA is the most profitable strategy).

[Figure 3: The direction field of the RD using the heuristic payoff table, based on data from games with players active at the flop.]
One possible explanation is the following: we noticed that some strategies (depending on the used
thresholds for VSF and AGR) are less played by humans compared to other strategies. Therefore, a table
configuration with a large number of agents playing these scarcely played strategies, results in few instances
and possibly a distorted average payoff due to the high variance of profits in the game of No-Limit Texas
Hold’em. In particular, we observed that table configurations with many humans playing a tight strategy had only few instances (e.g., the payoffs used in plot a, with two tight strategies in the simplex, were calculated using 40% fewer instances than those in plot b). A severe constraint on the number of instances is currently our chosen representation of a profile. In the previous experiment, we used games with six or more starting players and counted the number of occurrences of the four strategies. An alternative way of interpreting the data is to consider only players active at the flop. Since most of the time only four or fewer players (and a maximum of six players in our data) are active at the flop, this results in fewer profiles.
Basically, we generalize over the number of players starting at the beginning of the game and only focus
on the interaction between strategies during the phases that most influence the average payoffs. The results
from these experiments are illustrated in Figure 3.
In Figures 3a and 3b we have a single pure Nash equilibrium corresponding to a dominant strategy, i.e., TA. These
equilibria, and the evolution to them from any arbitrary initial condition, confirm the conclusions of domain
experts.
5 Conclusion
In this paper we investigated the evolutionary dynamics of strategic behaviour of players in the game of No-
Limit Texas Hold’em poker. We performed this study from an evolutionary game theoretic perspective using
Replicator Dynamic models. We investigated the dynamic properties by studying how human players should
switch between different strategies under different circumstances, and what the Nash equilibria look like.
We observed poker games played at an online poker site and used this data for our analysis. Based on domain
knowledge, we identified four distinct meta strategies in the game of poker. We then computed the heuristic
payoff table to which we applied the Replicator Dynamics model. The resulting plots confirm what is claimed by domain experts, namely that aggressive strategies often dominate their passive counterparts, and that the Loose-Passive strategy is an inferior one.
For future work, we will examine the interactions between the meta strategies along several other dimensions, namely more detailed meta strategies (i.e., based on more features), a varying number of players,
different parameter settings and different Replicator Dynamic models (e.g., including mutation). We are
also interested in performing this study using simulated data (which we can generate much faster). Finally,
since it is clear from our current experiments that the Loose-Passive strategy is an inferior one, we can focus
on the switching dynamics between the remaining strategies given the presence of a fixed number of players
playing the Loose-Passive strategy. This way, we focus on the dynamics for the strategies that matter.
6 Acknowledgments
Marc Ponsen is sponsored by the Interactive Collaborative Information Systems (ICIS) project, supported
by the Dutch Ministry of Economic Affairs, grant nr. BSIK03024. Jan Ramon and Kurt Driessens are post-doctoral fellows of the Research Foundation - Flanders (FWO). The authors wish to express their gratitude to P. Vytelingum for his insightful comments on the construction of the heuristic payoff table.
References
[1] A. Davidson, D. Billings, J. Schaeffer, and D. Szafron. Improved opponent modeling in poker. In
Proceedings of The 2000 International Conference on Artificial Intelligence (ICAI’2000), pages 1467–
1473, 2000.
[2] D. Brunson. Doyle Brunson’s Super System: A Course in Power Poker. Cardoza, 1979.
[3] H. Gintis. Game Theory Evolving: A Problem-Centered Introduction to Modeling Strategic Interaction.
Princeton University Press, 2001.
[4] D. Harrington. Harrington on Hold’em: Expert Strategy for No Limit Tournaments. Two Plus Two Publishing, 2004.
[5] J. Hofbauer and K. Sigmund. Evolutionary Games and Population Dynamics. Cambridge University
Press, 1998.
[6] J. Maynard-Smith. Evolution and the Theory of Games. Cambridge University Press, 1982.
[7] S. Phelps, S. Parsons, and P. McBurney. Automated trading agents versus virtual humans: an evolu-
tionary game-theoretic comparison of two double-auction market designs. In Proceedings of the 6th
Workshop on Agent-Mediated Electronic Commerce, New York, NY, 2004.
[8] M. Ponsen, J. Ramon, T. Croonenborghs, K. Driessens, and K. Tuyls. Bayes-relational learning of
opponent models from incomplete information in no-limit poker. In Twenty-third Conference of the As-
sociation for the Advancement of Artificial Intelligence (AAAI-08), pages 1485–1487, Chicago, USA,
2008.
[9] D. Sklansky. The Theory of Poker. Two Plus Two Publishing, 1987.
[10] F. Southey, M. Bowling, B. Larson, C. Piccione, N. Burch, D. Billings, and D. C. Rayner. Bayes’
bluff: Opponent modelling in poker. In Proceedings of the 21st Conference in Uncertainty in Artificial
Intelligence (UAI ’05), pages 550–558, 2005.
[11] P. Taylor and L. Jonker. Evolutionary stable strategies and game dynamics. Math. Biosci., 40:145–156,
1978.
[12] K. Tuyls, P. ’t Hoen, and B. Vanschoenwinkel. An evolutionary dynamical analysis of multi-agent
learning in iterated games. The Journal of Autonomous Agents and Multi-Agent Systems, 12:115–153,
2006.
[13] P. Vytelingum, D. Cliff, and N. R. Jennings. Analysing buyers and sellers strategic interactions in
marketplaces: an evolutionary game theoretic approach. In Proc. 9th Int. Workshop on Agent-Mediated
Electronic Commerce, Hawaii, USA, 2007.
[14] W. E. Walsh, R. Das, G. Tesauro, and J. O. Kephart. Analyzing complex strategic interactions in multi-
agent systems. In P. Gmytrasiewicz and S. Parsons, editors, Proceedings of the 4th Workshop on Game
Theoretic and Decision Theoretic Agents, 2001.
[15] J. W. Weibull. Evolutionary Game Theory. MIT Press, 1996.
[16] E. Zeeman. Dynamics of the evolution of animal conflicts. Journal of Theoretical Biology, 89:249–
270, 1981.