The Dynamics of Human Behaviour in Poker
Marc Ponsen (a), Karl Tuyls (b), Steven de Jong (a), Jan Ramon (c), Tom Croonenborghs (d), Kurt Driessens (c)

(a) Universiteit Maastricht, Netherlands
(b) Technische Universiteit Eindhoven, Netherlands
(c) Katholieke Universiteit Leuven, Belgium
(d) Biosciences and Technology Department, KH Kempen University College, Belgium
Abstract
In this paper we investigate the evolutionary dynamics of strategic behaviour in the game of poker by means of data gathered from a large number of real-world poker games. We perform this study from an evolutionary game theoretic perspective using the Replicator Dynamics model. We investigate the dynamic properties by studying how players switch between different strategies under different circumstances, what the basins of attraction of the equilibria look like, and what the stability properties of the attractors are. We illustrate the dynamics using a simplex analysis. Our experimental results confirm existing domain knowledge of the game, namely that certain strategies are clearly inferior while others can be successful given certain game conditions.
1 Introduction
Although the rules of the game of poker are simple, it is a challenging game to master. There exist many
books written by domain experts on how to play the game (see, e.g., [2, 4, 9]). A general consensus is that a
winning poker strategy should be adaptive: a player should change the style of play to prevent becoming too predictable and, moreover, should adapt the game strategy to the opponents. In the latter case, players may want to vary their actions during a specific game, but they can also consider changing their overall game strategy over a series of games (e.g., playing a more aggressive or defensive style of poker).
Although some studies exist on modeling poker players and providing a best response given the opponent model (see, e.g., [1, 8, 10]), not much research focuses on overall strategy selection. In this paper we address this issue by investigating the evolutionary dynamics of strategic player behaviour in the game of poker. We perform this study from an evolutionary game-theoretic perspective using the Replicator Dynamics (RD) [5, 6, 11, 12]. More precisely, we investigate the dynamic properties by studying how players switch between different strategies (based on the principle of selection of the fittest) under different circumstances, what the basins of attraction of the equilibria look like, and what the stability properties of the attractors are.
A complicating factor is that the RD can only be applied straightforwardly to simple normal form games
as for instance the Prisoner’s Dilemma game [3]. Applying the RD to poker by assembling the different
actions in the different phases of the game for each player will not work, because this leads to an overly
complex table with too many dimensions. To address this problem, overall strategies (i.e., behaviour over a
series of games, henceforth referred to as meta strategies) of players may be considered. Using these meta
strategies, a heuristic payoff table can then be created that enables us to apply different RD models and
perform our analysis. This approach has been used before in the analysis of behaviour of buyers and sellers
in automated auctions [7, 13, 14]. Conveniently, for the game of poker several meta strategies are already defined in the literature. This allows us to apply RD to the game of poker. An important difference from previous work is that we use real-world poker games from which the heuristic payoff table is derived, as opposed to
the artificial data used in the auction studies. We observed poker games played on a poker website, in which
human players competed for real money at various stakes.
Therefore, the contributions of this paper are twofold. First, we provide new insights into the dynamics
of strategic behaviour in the complex game of poker using RD models. These insights may prove useful
for strategy selection by human players but can also aid in creating strong artificial poker players. Second,
unlike other studies, we apply RD models to real-world human data.
The remainder of this paper is structured as follows. We start by explaining the poker variant we focus
on in our research, namely No-Limit Texas Hold’em poker, and describe some well-known meta strategies for this game. Next we elaborate on the Replicator Dynamics and continue with a description of our
methodology. We end with experiments and a conclusion.
2 Background
In this section we will first briefly explain the rules of the game of poker. Then we will discuss meta
strategies as defined by domain experts.
2.1 Poker
Poker is a card game played between at least two players. In a nutshell, the object of the game is to win
games (and consequently win money) by either having the best card combination at the end of the game,
or by being the only active player. The game includes several betting rounds wherein players are allowed
to invest money. Players can remain active by at least matching the largest investment made by any of the
players, or they can choose to fold (i.e., stop investing money and forfeit the game). In the case that only
one active player remains, i.e., all other players chose to fold, the active player automatically wins the game.
The winner receives the money invested by all the players.
In this paper we focus on the most popular poker variant, namely No-Limit Texas Hold’em. This game
includes 4 betting rounds (or phases), respectively called the pre-flop, flop, turn and river phase. During the
first betting round, all players are dealt two private cards (which we will refer to as a player’s hand) that are only known to that specific player. To encourage betting, two players are obliged to invest a small amount in the first round (the so-called small and big blind). One by one, the players can decide whether or not they want to participate in this game. If they do, they have to invest at least the current bet; this is known as calling. Players may also decide to raise the bet. If they do not wish to participate, players fold, resulting in the possible loss of any money they have bet thus far. During the remaining three betting phases,
the same procedure is followed. In every phase, community cards appear on the table (respectively 3 in
the flop phase, and 1 in the other phases). These cards apply to all the players and are used to determine
the card combinations (e.g., a pair or three-of-a-kind may be formed from the player’s private cards and the
community cards).
2.2 Meta strategies
There exists a lot of literature on winning poker strategies, mostly written by domain experts (see, e.g.,
[2, 4, 9]). These poker strategies may describe how to best react in detailed situations in a poker game, but
also how to behave over large numbers of games. Typically, experts describe these so-called meta strategies
based on only a few features. For example, an important feature in describing a player’s meta strategy is the percentage of times this player voluntarily sees the flop (henceforth abbreviated as VSF), since this may give insight into the player’s hand selection. If a particular player chooses to play more than, say, 40% of the games, he or she may play lower-quality hands (see [9] for a hand categorization) compared to players that only rarely see the flop. The standard terminology is a loose strategy for the first approach and a tight strategy for the latter. Another important feature is the so-called aggression factor of a player (henceforth abbreviated as AGR). The aggression factor indicates whether a player plays offensively (i.e., bets and raises often) or defensively (i.e., calls often). This aggression factor is calculated as:

AGR = (%bet + %raise) / %call
A player with a low aggression-factor is called passive, while a player with a high aggression-factor is simply
called aggressive. The thresholds for these features can vary depending on the game context. Taking into
account these two features, we can construct four meta strategies, namely: 1) loose-passive (LP), 2) loose-
aggressive (LA), 3) tight-passive (TP), and 4) tight-aggressive (TA). Again note that these meta-strategies
are derived from poker literature.
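
As a minimal sketch of this classification (illustrative only; the concrete thresholds of 0.35 for VSF and 2.0 for AGR are the ones used later in Section 4, and the function names are our own), a player’s meta strategy can be derived from the two features as follows:

```python
def aggression_factor(n_bets: int, n_raises: int, n_calls: int) -> float:
    """AGR = (%bet + %raise) / %call; since the percentages share the same
    denominator, the ratio reduces to raw action counts. Returns inf if the
    player never calls."""
    if n_calls == 0:
        return float("inf")
    return (n_bets + n_raises) / n_calls


def meta_strategy(vsf: float, agr: float,
                  vsf_threshold: float = 0.35,
                  agr_threshold: float = 2.0) -> str:
    """Map a player's VSF and aggression factor onto one of the four meta strategies."""
    style = "loose" if vsf > vsf_threshold else "tight"
    temper = "aggressive" if agr > agr_threshold else "passive"
    return f"{style}-{temper}"


# Example: a player who sees 45% of flops and bets/raises 2.5 times as often as calling.
agr = aggression_factor(n_bets=30, n_raises=20, n_calls=20)
print(meta_strategy(vsf=0.45, agr=agr))   # -> "loose-aggressive"
```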
Experts argue that the TA strategy is the most profitable strategy, since it combines patience (waiting
for quality hands) with aggression after the flop. One could already claim that any aggressive strategy
dominates all passive strategies, simply by looking at the rules of the poker game. Note that games can be
won by having the best card combination, but also by betting all opponents out of the pot. However, most
poker literature will argue that adapting one’s playing style is the most important feature of any winning poker strategy. This applies to detailed poker situations, i.e., varying actions based on the current opponent(s), but also to varying playing style on a broader scale (e.g., switching between meta strategies). We will next investigate how
players (should) switch between meta strategies in the game of No-Limit Texas Hold’em poker.
3 Methodology
In this section we concisely explain the methodology we will follow to perform our analysis. We start by
explaining Replicator Dynamics (RD) and the heuristic payoff table that is used to derive average payoffs
for the various meta strategies. Then we explain how we approximate the Nash equilibria of interactions
between the various meta strategies. Finally, we elucidate our algorithm for visualizing and analyzing the
dynamics of the different meta strategies in a simplex plot.
3.1 Replicator Dynamics
The RD [11, 16] are a system of differential equations describing how a population of strategies evolves through time. The RD presume a number of agents (i.e., individuals) in a population, where each agent is programmed to play a pure strategy. Hence, we obtain a certain mixed population state x, where x_i denotes the population share of agents playing strategy i. Each time step, the population shares for all strategies are changed based on the population state and the rewards in a payoff table. Note that single actions are typically considered in this context, but in our study we look at meta strategies.

An abstraction of an evolutionary process usually combines two basic elements, i.e., selection and mutation. Selection favors some population strategies over others, while mutation provides variety in the population. In this research, we will limit our analysis to the basic RD model based solely on selection of the most fit strategies in a population. Equation 1 represents this form of RD:

    dx_i/dt = [(Ax)_i − x · Ax] x_i        (1)

In Equation 1, the state x of the population can be described as a probability vector x = (x_1, x_2, ..., x_n) which expresses the different densities of all the different types of replicators (i.e., strategies) in the population, with x_i representing the density of replicator i. A is the payoff matrix that describes the different payoff values that each individual replicator receives when interacting with other replicators in the population. Hence (Ax)_i is the payoff that replicator i receives in a population with state x, whereas x · Ax describes the average payoff in the population. The growth rate (dx_i/dt)/x_i of the proportion of replicator i in the population equals the difference between the replicator’s current payoff and the average payoff in the population. For more information, we refer to [3, 5, 15].
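
As a minimal sketch of Equation 1 (not the code used for our experiments), the selection-only replicator dynamics can be computed and integrated numerically as follows; the payoff matrix below is a toy example, not the poker payoff table:

```python
import numpy as np

def replicator_gradient(x: np.ndarray, A: np.ndarray) -> np.ndarray:
    """dx_i/dt = [(Ax)_i - x.Ax] * x_i  (selection-only replicator dynamics)."""
    fitness = A @ x                # (Ax)_i: payoff of each strategy in state x
    avg_fitness = x @ fitness      # x.Ax: average payoff in the population
    return (fitness - avg_fitness) * x

def evolve(x0, A, dt=0.01, steps=5000):
    """Forward-Euler integration of the replicator dynamics from state x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * replicator_gradient(x, A)
        x = np.clip(x, 0.0, None)
        x /= x.sum()               # re-normalise so x stays on the simplex
    return x

# Toy 3-strategy payoff matrix in which the first strategy strictly dominates.
A = np.array([[3.0, 3.0, 3.0],
              [2.0, 2.0, 2.0],
              [1.0, 1.0, 1.0]])
print(evolve([0.2, 0.5, 0.3], A))   # converges towards (1, 0, 0)
```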
3.2 The Heuristic Payoff Table
The heuristic payoff table represents the payoff table of the poker game for the different meta strategies
the different agents can employ. In essence it replaces the Normal Form Game (NFG) payoff table for the
atomic actions. For a complex game such as poker it is impossible to use the atomic NFG, simply because
the table has too many dimensions to be able to represent it. Therefore, we look at heuristic strategies as
outlined in Section 2.2.
Let’s assume we have A agents and S strategies. This would require S^A entries in our NFG table. We now make a few simplifications, i.e., we do not consider different types of agents, we assume all agents can choose from the same strategy set, and all agents receive the same payoff for being in the same situation. This setting corresponds to the setting of a symmetric game. This means we consider a game where the payoffs for playing a particular strategy depend only on the strategies employed by the other agents, but not on who is playing them. Under this assumption we can seriously reduce the number of entries in the heuristic payoff table. More precisely, we need to consider the different ways of dividing our A agents over all possible S strategies. This boils down to the binomial coefficient:

    C(A + S − 1, A)

Suppose we consider 3 heuristic strategies and 6 agents; this leads to a payoff table of 28 entries, which is a serious reduction from 3^6 = 729 entries in the general case. As an example, the next table illustrates what the heuristic payoff table looks like for three strategies S_1, S_2 and S_3:

            S_1   S_2   S_3   U_1   U_2   U_3
    P =     s_1   s_2   s_3   u_1   u_2   u_3
            ...   ...   ...   ...   ...   ...

Consider for instance the first row of this table: in this row there are s_1 agents that play strategy S_1, s_2 agents that play strategy S_2, and s_3 agents that play strategy S_3. Furthermore, u_i is the respective expected payoff for playing strategy S_i. We call a tuple (s_1, s_2, s_3, u_1, u_2, u_3) a profile of the game. To determine the payoffs u_i
in the table, we compute expected payoffs for each profile from real-world poker data we assembled. More
precisely, we look in the data for the appearance of each profile and compute from these data points the
expected payoff for the used strategies. However, because payoff in the game of poker is non-deterministic,
we need a significant number of independent games to be able to compute representative values for our table
entries. In Section 4 we provide more details on the data we used and on the process of computing the payoff
table.
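
As a minimal sketch (not the code used for our analysis), the discrete profiles that index such a heuristic payoff table can be enumerated directly; the function and variable names below are illustrative only:

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

def profiles(num_agents: int, strategies: list[str]):
    """All ways of distributing num_agents over the strategies (symmetric game),
    each returned as a tuple of counts, e.g. (4, 1, 1) for strategies (S1, S2, S3)."""
    for combo in combinations_with_replacement(strategies, num_agents):
        counts = Counter(combo)
        yield tuple(counts[s] for s in strategies)

strategies = ["LP", "LA", "TP"]            # any three of the four meta strategies
rows = list(profiles(6, strategies))
print(len(rows), comb(6 + len(strategies) - 1, 6))   # both print 28
```

In the actual analysis, each such profile row is then associated with the expected payoffs u_i, estimated by averaging the outcomes of all observed games with that table configuration (see Section 4).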
3.3 Approximating Nash Equilibria
In this section we describe how we can determine which of the restpoints of the RD are effectively Nash equilibria (note that a restpoint of the RD is not necessarily Nash). The approach we describe is based on the work of Walsh et al. and Vytelingum et al. [13, 14]. A Nash equilibrium occurs when no player can increase its payoff by changing its strategy unilaterally. For the sake of clarity we follow the notation of [14].

The expected payoff of an agent playing a strategy j ∈ S (here, slightly abusing notation, S denotes the set of strategies rather than the number of strategies of Section 3.2), given a mixed-strategy p (the population state), is denoted as u(e_j, p). This corresponds to (Ax)_i in Equation 1. The value of u(e_j, p) can be computed by considering the results from a large number of poker games with a player playing strategy j and the other agents selected from the population, with a mixed-strategy p. For each game and every strategy, the individual payoffs of agents using strategy j are averaged. The Nash equilibrium is then approximated as the argument to the minimisation problem given in Equations 2 and 3:

    v(p) = Σ_{j ∈ S} ( max[ u(e_j, p) − u(p, p), 0 ] )^2        (2)

    p_nash = argmin_p [ v(p) ]        (3)

Here, u(p, p) is the average payoff of the entire population and corresponds with the term x · Ax of Equation 1. Specifically, p_nash is a Nash equilibrium if and only if it is a global minimum of v(p), and p is a global minimum if v(p) = 0. We solve this non-linear minimisation problem using the Amoeba non-linear optimiser [14].
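
The minimisation in Equations 2 and 3 can be sketched as follows, assuming the expected payoffs u(e_j, p) are provided by some estimator derived from the heuristic payoff table (here a generic expected_payoff callback, an assumption of this sketch); we use SciPy’s Nelder–Mead routine, which implements the same downhill-simplex ("Amoeba") method, as a stand-in for the optimiser of [14]:

```python
import numpy as np
from scipy.optimize import minimize

def v(p: np.ndarray, expected_payoff) -> float:
    """Equation 2: sum over strategies of max(u(e_j, p) - u(p, p), 0)^2."""
    p = np.clip(p, 1e-12, None)
    p = p / p.sum()                              # project onto the simplex
    u_pure = np.array([expected_payoff(j, p) for j in range(len(p))])
    u_mixed = p @ u_pure                         # u(p, p): population average payoff
    gains = np.maximum(u_pure - u_mixed, 0.0)
    return float(np.sum(gains ** 2))

def approximate_nash(expected_payoff, n_strategies: int, restarts: int = 20, seed: int = 0):
    """Equation 3: argmin_p v(p), solved with Nelder-Mead (downhill simplex)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        p0 = rng.dirichlet(np.ones(n_strategies))    # random start on the simplex
        res = minimize(v, p0, args=(expected_payoff,), method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    p = np.clip(best.x, 0.0, None)
    return p / p.sum(), best.fun                     # candidate equilibrium and residual v(p)

# Toy example: expected payoffs given directly by a matrix game, u(e_j, p) = (Ap)_j.
A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
p_nash, residual = approximate_nash(lambda j, p: (A @ p)[j], n_strategies=3)
print(p_nash, residual)   # close to the uniform mixed equilibrium, residual near 0
```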
3.4 Simplex Analysis
The simplex analysis allows us to graphically and analytically study the dynamics of strategy changes.
Before explaining this analysis, we first introduce a definition of a simplex. Given n elements which are
randomly chosen with probabilities (x
1
, x
2
, . . . , x
n
), there holds x
1
, x
2
, . . . , x
n
0 and
P
n
i=1
x
i
= 1. We
denote the set of all such probability distributions over n elements as Σ
n
or simply Σ if there is no confusion
possible. Σ
n
is a (n 1)-dimensional structure and is called a simplex. One degree of freedom is lost
due to the normality constraint. For example in Figure 1, Σ
2
and Σ
3
are shown. In the figures throughout
the experiments we use Σ
3
, projected as an equilateral triangle as in Figure 1(b), but we drop the axes and
1
The use of S differs from that in Section 3.2. Here S represents the set of strategies, unlike the number of strategies in Section 3.2.
x
1
x
2
0
1
1
(a) Σ
2
x
1
x
2
x
3
0
1
1
1
(b) Σ
3
Figure 1: The unit simplices Σ
2
(a; left) and Σ
3
(b; right).
labels. Since we use four meta strategies and Σ
3
concerns only three, this implies that we need to show four
simplexes Σ
3
, from each of which one strategy is missing.
Using the generated heuristic payoff table, we can now visualize the dynamics of the different agents
in a simplex as follows. To calculate the RD at any point s = (x
1
, x
2
, x
3
) in our simplex, we consider
N (i.e., many) runs with mixed-strategy s; x
1
is the percentage of the population playing strategy S
1
, x
2
is the percentage playing strategy S
2
and x
3
is is the percentage playing strategy S
3
. For each run, each
poker agent selects their (pure) strategy based on this mixed-strategy. Given the number of players using the
different strategies (S
1
, S
2
, S
3
), we have a particular profile for each run. This profile can be looked up in
our table, yielding a specific payoff for each player. The average of the payoffs of each of these N profiles
gives the payoffs at s = (x
1
, x
2
, x
3
). Provided with these payoffs we can easily compute the RD by filling
in the values of the different variables in Equation 1. This yields us a gradient at the point s = (x
1
, x
2
, x
3
).
Starting from a particular point within the simplex, we can now generate a smooth trajectory (consisting
of a piecewise linear curve) by moving a small distance in the calculated direction, until the trajectory reaches
an equilibrium. A trajectory does not necessarily settle at a fixed point. More precisely, an equilibrium to
which trajectories converge and settle is known as an attractor, while a saddle point is an unstable equilibrium
at which trajectories do not settle. Attractors and saddle points are very useful measures of how likely it is
that a population converges to a specific equilibrium.
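
A minimal sketch of this trajectory construction follows, assuming a payoffs_at function that returns the average strategy payoffs at a population state (in our case obtained from the heuristic payoff table as described above); the toy payoff matrix below is illustrative only:

```python
import numpy as np

def trace_trajectory(x0, payoffs_at, step=0.01, iters=10_000, tol=1e-9):
    """Follow the replicator-dynamics gradient from x0 until the motion becomes
    negligible (an attractor) or the iteration budget runs out."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(iters):
        f = payoffs_at(x)                       # expected payoff of each strategy at state x
        grad = (f - x @ f) * x                  # Equation 1
        x = np.clip(x + step * grad, 0.0, None)
        x /= x.sum()                            # stay on the simplex
        path.append(x.copy())
        if np.linalg.norm(grad) < tol:
            break
    return np.array(path)

# Toy payoff function standing in for the heuristic-payoff-table lookup.
A = np.array([[3.0, 1.0, 0.0], [2.0, 2.0, 1.0], [0.0, 1.0, 3.0]])
path = trace_trajectory([0.2, 0.5, 0.3], payoffs_at=lambda x: A @ x)
print(path[0], "->", path[-1])
```

Tracing many such trajectories from a grid of initial conditions produces direction fields such as those shown in Figures 2 and 3.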
4 Experiments and results
We collected a total of 1,599,057 No-Limit Texas Hold’em games with six or more starting players. As a first step we needed to determine the strategy of a player at any given point. If a player played fewer than 50 games in total, we argue that we do not have sufficient data to establish a strategy, and therefore we ignore this player (and game). If the player played at least 50 games, we used an interval of 50 games to collect
statistics for this specific player, and then determined the VSF and AGR values. We set the thresholds
respectively to 0.35 and 2.0, i.e., if VSF > 0.35, then the player is considered loose (and tight otherwise),
and if AGR > 2 then the player is considered aggressive (and passive otherwise). These are commonly
used thresholds for a No-Limit Texas Hold’em game (see e.g., [2, 4, 9]). The resulting strategy was then
associated with the specific player for all games in the interval of 50 games. Having estimated all players’
strategies, it is now possible to determine the table configuration (i.e., the number of players playing any of
the four meta strategies) for all games. Finally, we can compute the average payoffs for all strategies given
a particular table configuration and produce a profile (see Section 3.2).
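
As an illustration of this aggregation step (a sketch only; the record layout and numbers below are hypothetical and not taken from our data set), games can be grouped by their table configuration and averaged per strategy as follows:

```python
from collections import defaultdict

STRATS = ["LA", "LP", "TA", "TP"]

def build_payoff_table(games):
    """games: iterable of lists of (strategy_label, payoff) pairs, one list per game.
    Returns {profile: {strategy: average payoff}}, where a profile is the tuple of
    counts of (LA, LP, TA, TP) players seated in that game."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for game in games:
        profile = tuple(sum(1 for s, _ in game if s == m) for m in STRATS)
        for strat, payoff in game:
            sums[profile][strat] += payoff
            counts[profile][strat] += 1
    return {prof: {s: sums[prof][s] / counts[prof][s] for s in sums[prof]}
            for prof in sums}

# Tiny fabricated example with two 6-player games (illustrative only).
g1 = [("TA", 40), ("LA", -10), ("LP", -15), ("LP", -15), ("TP", 5), ("TA", -5)]
g2 = [("TA", 25), ("LA", -5), ("LP", -10), ("LP", -10), ("TP", 0), ("TA", 0)]
print(build_payoff_table([g1, g2]))
```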
We plotted four simplexes that resulted from our RD analysis in Figure 2. Recall from Section 3.4
that these simplexes show the dynamic behavior of the participating players, each having a choice of three strategies. This means that the evolution of the strategies employed in the population is visualized for
every possible initial condition of the game. The initial condition determines in which basin of attraction
we end up, leading to some specific attractor or repeller. These restpoints (i.e. attractors or repellers) are
potentially Nash equilibria.
What we can immediately see from the plots is that both passive strategies LP and TP (except in plot a) are repellers. In particular, the LP strategy is a strong repeller. This suggests that no matter what the game situation is, when playing the LP strategy it is always rational to switch strategy to, for example, TA or LA. This nicely confirms the claim made earlier (and in the literature), namely that aggressive strategies dominate their passive counterparts.

[Figure 2: The direction field of the RD using the heuristic payoff table, considering the four described meta-strategies. Dots represent the Nash equilibria.]
The dots indicated on the plots represent the Nash equilibria of the respective games. (Due to space constraints we only discuss the Nash equilibria of Figures 2a-2b and 3a-3b; for completeness, the equilibria of Figures 2c and 2d are also indicated.) Figure 2a contains three Nash equilibria, of which two are mixed and one is pure. The mixed equilibrium on the TP-LP axis is evolutionarily unstable, as a small deviation in a player’s strategy might lead the dynamics away from this equilibrium to one of the others. The mixed equilibrium on the LP-TA axis is stable. As one can see, this equilibrium lies close to the pure strategy TA. This means that TA is played with a higher probability than LP. Finally, there is also one stable pure equilibrium present, i.e., TP. Of the stable equilibria, TP has the largest basin of attraction.
Figure 2b contains 3 Nash equilibria of which one is mixed and two are pure. As one can see from the
picture, the mixed Nash equilibrium is evolutionarily unstable, i.e., any small perturbation of this equilibrium
immediately leads the dynamics away from it to one of the other pure Nash equilibria. This means that if
one of the players would decide to slightly change its strategy at the equilibrium point, the dynamics of the
entire population would drastically change. The mixed Nash equilibrium almost corresponds to the situation
in which the three strategies are played with equal probability, i.e., a uniform distribution. The pure Nash
equilibria LA and TA are both evolutionarily stable. LA has a larger basin of attraction than TA (similar to
plot a), which does not completely correspond with the expectations of domain experts (it is assumed by domain experts that, in general, TA is the most profitable strategy).

[Figure 3: The direction field of the RD using the heuristic payoff table, based on data from games with players active at the flop.]
One possible explanation is the following: we noticed that some strategies (depending on the used
thresholds for VSF and AGR) are less played by humans compared to other strategies. Therefore, a table
configuration with a large number of agents playing these scarcely played strategies, results in few instances
and possibly a distorted average payoff due to the high variance of profits in the game of No-Limit Texas
Hold’em. In particular, we observed that table configurations with many humans playing a tight strategy had only few instances (e.g., the payoffs used in plot a, with two tight strategies in the simplex, were calculated using 40% fewer instances than those in plot b). A severe constraint on the number of instances is currently our chosen representation of a profile. In the previous experiment, we used games with six or more starting players and counted the number of occurrences of the four strategies. An alternative way of interpreting the data is to consider only players active at the flop. Since most of the time only four or fewer players (and a maximum of six players in our data) are active at the flop, this results in fewer profiles.
Basically, we generalize over the number of players starting at the beginning of the game and only focus
on the interaction between strategies during the phases that most influence the average payoffs. The results
from these experiments are illustrated in Figure 3.
In Figures 3a and 3b we have a single pure Nash equilibrium corresponding to a dominant strategy, i.e., TA. These
equilibria, and the evolution to them from any arbitrary initial condition, confirm the conclusions of domain
experts.
5 Conclusion
In this paper we investigated the evolutionary dynamics of strategic behaviour of players in the game of No-
Limit Texas Hold’em poker. We performed this study from an evolutionary game theoretic perspective using
Replicator Dynamic models. We investigated the dynamic properties by studying how human players should
switch between different strategies under different circumstances, and what the Nash equilibria look like.
We observed poker games played at an online poker site and used this data for our analysis. Based on domain
knowledge, we identified four distinct meta strategies in the game of poker. We then computed the heuristic
payoff table to which we applied the Replicator Dynamics model. The resulting plots confirm what is claimed by domain experts, namely that aggressive strategies often dominate their passive counterparts, and that the Loose-Passive strategy is an inferior one.
For future work, we will examine the interactions between the meta strategies along several other dimensions, namely more detailed meta strategies (i.e., based on more features), a varying number of players,
different parameter settings and different Replicator Dynamic models (e.g., including mutation). We are
also interested in performing this study using simulated data (which we can generate much faster). Finally,
since it is clear from our current experiments that the Loose-Passive strategy is an inferior one, we can focus
on the switching dynamics between the remaining strategies given the presence of a fixed number of players
playing the Loose-Passive strategy. This way, we focus on the dynamics for the strategies that matter.
6 Acknowledgments
Marc Ponsen is sponsored by the Interactive Collaborative Information Systems (ICIS) project, supported
by the Dutch Ministry of Economic Affairs, grant nr. BSIK03024. Jan Ramon and Kurt Driessens are post-doctoral fellows of the Research Foundation - Flanders (FWO). The authors wish to express their gratitude to P. Vytelingum for his insightful comments on the construction of the heuristic payoff table.
References
[1] A. Davidson, D. Billings, J. Schaeffer, and D. Szafron. Improved opponent modeling in poker. In
Proceedings of The 2000 International Conference on Artificial Intelligence (ICAI’2000), pages 1467–
1473, 2000.
[2] D. Brunson. Doyle Brunson’s Super System: A Course in Power Poker. Cardoza, 1979.
[3] H. Gintis. Game Theory Evolving: A Problem-Centered Introduction to Modeling Strategic Interaction.
Princeton University Press, 2001.
[4] D. Harrington. Harrington on Hold’em: Expert Strategy for No Limit Tournaments. Two Plus Two Publishing, 2004.
[5] J. Hofbauer and K. Sigmund. Evolutionary Games and Population Dynamics. Cambridge University
Press, 1998.
[6] J. Maynard-Smith. Evolution and the Theory of Games. Cambridge University Press, 1982.
[7] S. Phelps, S. Parsons, and P. McBurney. Automated trading agents versus virtual humans: an evolu-
tionary game-theoretic comparison of two double-auction market designs. In Proceedings of the 6th
Workshop on Agent-Mediated Electronic Commerce, New York, NY, 2004.
[8] M. Ponsen, J. Ramon, T. Croonenborghs, K. Driessens, and K. Tuyls. Bayes-relational learning of
opponent models from incomplete information in no-limit poker. In Twenty-third Conference of the As-
sociation for the Advancement of Artificial Intelligence (AAAI-08), pages 1485–1487, Chicago, USA,
2008.
[9] D. Sklansky. The Theory of Poker. Two Plus Two Publishing, 1987.
[10] F. Southey, M. Bowling, B. Larson, C. Piccione, N. Burch, D. Billings, and D. C. Rayner. Bayes’
bluff: Opponent modelling in poker. In Proceedings of the 21st Conference in Uncertainty in Artificial
Intelligence (UAI ’05), pages 550–558, 2005.
[11] P. Taylor and L. Jonker. Evolutionary stable strategies and game dynamics. Math. Biosci., 40:145–156,
1978.
[12] K. Tuyls, P. ’t Hoen, and B. Vanschoenwinkel. An evolutionary dynamical analysis of multi-agent
learning in iterated games. The Journal of Autonomous Agents and Multi-Agent Systems, 12:115–153,
2006.
[13] P. Vytelingum, D. Cliff, and N. R. Jennings. Analysing buyers and sellers strategic interactions in
marketplaces: an evolutionary game theoretic approach. In Proc. 9th Int. Workshop on Agent-Mediated
Electronic Commerce, Hawaii, USA, 2007.
[14] W. E. Walsh, R. Das, G. Tesauro, and J. O. Kephart. Analyzing complex strategic interactions in multi-
agent systems. In P. Gmytrasiewicz and S. Parsons, editors, Proceedings of the 4th Workshop on Game
Theoretic and Decision Theoretic Agents, 2001.
[15] J. W. Weibull. Evolutionary Game Theory. MIT Press, 1996.
[16] E. Zeeman. Dynamics of the evolution of animal conflicts. Journal of Theoretical Biology, 89:249–
270, 1981.