A Monte-Carlo Approach for Ghost Avoidance in the
Ms. Pac-Man Game
Bruce K. B. Tong
Dept. of Electronic Engineering
City University of Hong Kong
kbtong@cityu.edu.hk
Chi Wan Sung
Dept. of Electronic Engineering
City University of Hong Kong
albert.sung@cityu.edu.hk
Abstract— Ms. Pac-Man is a challenging, classic arcade game
that provides an interesting platform for Artificial Intelligence
(AI) research. This paper reports the first Monte-Carlo approach
to develop a ghost avoidance module of an intelligent agent that
plays the game. Our experimental results show that the look-ahead ability of Monte-Carlo simulation often prevents Ms. Pac-Man from being trapped by ghosts and significantly reduces the chance of losing Ms. Pac-Man's life. Our intelligent agent has
achieved a high score of around 21,000. It is sometimes capable of
clearing the first three stages and playing at the level of a novice
human player.
Keywords- Monte-Carlo simulation; Ms. Pac-Man; game;
competition; computational intelligence
I. INTRODUCTION
Computer games are often used as test-beds for
development of AI techniques. One of the reasons is that
computer games share similar problems with the real world but
with simpler rules and clearer objectives. In recent years,
computer games have received much attention in AI and
Computational Intelligence (CI) research. Some competitions
have been held in various IEEE conferences, including the
competitions of Ms. Pac-Man [1], Mario [2] and simulated car
racing [3]. These competitions provide a level playing field to
compare the performance of different CI techniques.
Pac-Man was a popular and classic arcade game developed
by the Namco Company in 1980. Different variations of the
games have been developed subsequently. The original Pac-
Man is a one-player game where the human player controls
Pac-Man to traverse a maze, avoid the four ghosts (non-player
characters) and eat all the pills to clear a stage. The ghosts in
the original version have deterministic actions and travel path,
making it possible for players to find the optimal strategy to
play the game. Ms. Pac-Man, a variation of the original game,
adds random elements in the ghosts’ movements. Based on the
current game status, each ghost chooses its own path according
to its own probability distribution. This variation prevents players from hard-coding an optimal path and thus increases the fun and difficulty of the game.
As in Pac-Man, the objective of Ms. Pac-Man in each stage is to eat all the pills in the maze of that stage. In total, there are four mazes in the Ms. Pac-Man game. The
first two stages utilize the first maze, the next three stages the
second maze, the next three the third maze, and the next the
fourth maze. Fig. 1 shows the screen capture of the first stage
of the game. There are four ghosts, namely Blinky (red), Pinky
(pink), Inky (light blue) and Sue (orange). The ghosts are
movable non-player characters. They live in the cage (center of
the maze) initially and go out one by one after the game has
started. They aim to shorten the distance between Ms. Pac-Man and themselves, but each has a different probability distribution governing its random moves.
Figure 1. The first maze of Ms. Pac-Man (left) and the edible ghosts (right)
The number of pills differs between mazes. The first
maze contains 220 pills (small white dot) and 4 power pills
(big white dot). Both kinds of pills are non-movable objects.
When a pill is eaten, the player receives 10 points. Once Ms.
Pac-Man eats a power pill, the ghosts are changed to edible
state (dark blue) temporarily as shown in Fig. 1. The power
pills give Ms. Pac-Man the ability to capture the edible ghosts,
allowing a player to get many bonus points. Eating the power
pill gets 50 points, the first ghost 200, the second consecutive
ghost 400, the third consecutive ghost 800 and the fourth
consecutive ghost 1600.
Bonus fruits occasionally appear in the maze and move
throughout the game randomly. The score of eating bonus fruit
ranges from 100 to 5,000, depending on which stage the player
is playing. Ms. Pac-Man has three lives at the beginning of the game. An extra life is awarded when the player first earns 10,000 points.
This paper describes an initial Monte-Carlo approach for
ghost avoidance in the Ms. Pac-Man game. The rest of this
paper is organized as follows: in Section II we review the
related research in Pac-Man and Ms. Pac-Man, Section III
describes our designed framework of agent, Section IV
describes the Monte-Carlo method for ghost avoidance, Section
V presents details and results of our experiments, and Section
VI provides conclusions and discusses the future work.
II. RELATED WORK
A. Pac-Man
Koza [4] and Rosca [5] are two of the earliest researchers who used Pac-Man as an example problem domain to study the effectiveness of genetic programming for task prioritization. Koza's work utilized a modified version of the game, with a maze that replicated the first level of Ms. Pac-Man and different score values for the items. According to Szita and Lorincz [6], the score reported for Koza's implementation would have been equivalent to approximately 5,000 points, which corresponds to human novice level in their Pac-Man version.
Bonet and Stauffer [7] proposed a reinforcement learning
technique. They used a neural network and temporal difference
learning (TDL) in a 10 x 10 Pac-Man-centred window. The
mazes were simplified to have only one ghost and no power pills. They showed basic pill-pursuit and ghost-avoidance behaviors using complex learning tasks.
Gallagher and Ryan [8] made an early attempt to develop a rule-based evolutionary Pac-Man agent, based on a simple finite-state machine and parameterized rules to control the movement of Pac-Man. The parameters were evolved using the Population-Based Incremental Learning (PBIL) algorithm. The learning process was run in a reduced version of Pac-Man with no power pills and one deterministic ghost. The results showed that some degree of machine learning was achieved, although the complexity of the game had been greatly reduced.
Gallagher and Ledwich [9] proposed to use neuroevolution
to learn to play Pac-Man based on the raw on-screen
information. Their work utilized a modified version of the
original game with no power pills and one non-deterministic
ghost only. Although no evolved agent was able to clear a
maze in their experiments, this work showed the potential of
pure machine learning from very limited information.
Thompson, McMillan, Levine and Andrew [10] proposed a simple finite-state machine with a look-ahead strategy that plays Pac-Man. The agent looks ahead by deploying an A* search algorithm to find the fastest path to the node containing pills. The optimized agent achieved 7,826 points over 50 games of Pac-Man.
B. Ms. Pac-Man
Lucas [11] described an initial approach to developing a Ms. Pac-Man agent. His control algorithm used a neural network to evaluate the next move of Ms. Pac-Man. His agent utilized a feature vector of handcrafted data consisting of the distances from Ms. Pac-Man to each ghost and edible ghost, to the nearest pill, to the nearest power pill and to the nearest junction. The best evolved agent, run in the full-scale Ms. Pac-Man game, achieved 4,780 points over 100 games, equivalent to a reasonable human novice.
Szita and Lorincz [6] proposed a cross-entropy
optimization algorithm that learns to play Ms. Pac-Man. The
best performing agent obtained an average score of 8,186, comparable to the performance of a set of five non-experienced human players who played the same version of the game and averaged 8,064 points.
Wirth and Gallagher [12] described an influence-map
model for producing a Ms. Pac-Man agent. The model captures
the essentials of the game. The highest average score of the optimized agent was 6,848, showing performance comparable to that of a novice human player.
Handa and Isozaki [13][14] proposed an evolutionary fuzzy system for playing Ms. Pac-Man. Their method employs fuzzy logic and the evolution of fuzzy rule parameters. In addition, reinforcement learning was proposed to learn actions in critical situations. The best evolved agent achieved 5,739 points over 10 runs of the Ms. Pac-Man game.
More recently RAMP [15], a rule-based Ms. Pac-Man
agent developed by Fitzgerald, Kemeraitis and Congdon, won
WCCI 2008, with a high score of 15,970 points and an average
of 11,166.
Burrow and Lucas [16] compared the performance of
learning to play Ms. Pac-Man using evolution and Temporal
Difference Learning (TDL). The study was run on a modified
version of Ms. Pac-Man and results showed that evolution of
multi-layer perceptrons (MLP) performed better.
DeLooze and Viner [17] proposed using a fuzzy Q-learning algorithm to train an agent to play Ms. Pac-Man. The optimized agent averaged between 3,000 and 5,000 points.
Robles and Lucas [18] proposed a simple tree search
method to evaluate the best path for Ms. Pac-Man. The tree
was simplified to ignore the movements of ghosts and/or
changes in ghost state. It achieved a high score of 15,640 and
an average of 7,664 in CIG 2009 and was the runner-up in the
competition. This work demonstrated the importance of a look-ahead strategy in agent design.
ICE Pambush 2 [19] and 3 [20], developed by Thawonmas et al., won the competitions held at CEC 2009 and CIG 2009 respectively, achieving a maximum score of 24,640 and an average of 13,059 in CEC 2009, and a maximum of 30,010 and an average of 17,102 in CIG 2009. It is also a rule-based agent. It uses the A* algorithm to search for the pill with the lowest cost, where the cost is composed mainly of the distances between Ms. Pac-Man and the ghosts. The main reason for its high scores is that it aggressively lures the ghosts to the power pills and then ambushes them.
Different techniques have been applied to Pac-Man and Ms. Pac-Man. Except for the scores reported from the competitions, it should be emphasized that many of the scores quoted above were obtained on different versions of Pac-Man and Ms. Pac-Man simulators. This section therefore gives only a rough idea of the relative performance of previous work.
function hash(int[][] pixel) {
  int h = 0;
  // r[i][j]: pre-generated 32-bit random numbers, one per pixel position
  for (i = 0; i < 8; i++)
    for (j = 0; j < 8; j++)
      h += r[i][j] * pixel[i][j];
  return h;
}
III. FRAMEWORK OF AGENT
Our agent first captures the entire screen and looks for the
game window. Once the game window has been found, it will
capture the game screen (which is a rectangular area much smaller than the full screen) every 30 milliseconds until the end of the game. After capturing the game screen, our
agent will analyze and extract the game objects. Some of the
objects are movable (e.g., Ms. Pac-Man, ghosts and fruits)
while the others are unmovable (e.g., pills and power pills).
Based on the extracted information and the history of the
locations of the movable objects, the agent should decide the
most suitable direction for Ms. Pac-Man’s next move.
A. Game maze representation
We represent the game mazes using weighted graphs. Road
intersections are represented by vertices. Two vertices are
connected by an edge if the corresponding road intersections
can reach each other without passing other intersections. The
distance between these neighboring intersections is represented
by the weight of the corresponding edge. This information
about the topologies of mazes is preprocessed and stored in a
file. At the beginning of the game, our agent loads the file into
memory. It then computes the shortest distance between any
pair of vertices using Dijkstra’s algorithm, and the result is
stored in memory. The shortest distance between any two
positions is computed on-demand. Our algorithm is the same as
that in [13]. We present it as follows:
We denote the two given points by pa and pb. In the mazes of the Ms. Pac-Man game, a point pi that does not lie on a road intersection must be connected to exactly two intersections, which we name vi1 and vi2. There are four cases we need to consider:
Case 1: pa and pb are both vertices
Case 2: pa is a vertex and pb is not
Case 3: pb is a vertex and pa is not
Case 4: Neither pa nor pb is a vertex
For case 1, the result can be obtained directly as it has been
computed by the Dijkstra’s algorithm Dijkstra(pa, pb) at the
beginning of the game. For case 2, the shortest distance is
min(Dijkstra(pa, vb1) + distance(vb1, pb), Dijkstra(pa, vb2) +
distance(vb2, pb)), where distance(a, b) is the distance along the edge between the two points a and b that lie on the same edge. Case
3 can be handled in the same way as in case 2. For case 4, the
shortest distance is min(distance(pa, va1) + Dijkstra(va1, vb1) +
distance(vb1, pb), distance(pa, va1) + Dijkstra(va1, vb2) +
distance(vb2, pb), distance(pa, va2) + Dijkstra(va2, vb1) +
distance(vb1, pb), distance(pa, va2) + Dijkstra(va2, vb2) +
distance(vb2, pb)).
The shortest direction from one point to another can also be
obtained simultaneously. Once the distance and direction have
been computed, the results will be cached in memory. Later
when the agent needs the information again, it can be retrieved
without further calculations.
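To make the computation concrete, the following sketch (in Java-like code) shows the on-demand distance lookup for the four cases above. The data structures, field names and the precomputed table dijkstra[][] are illustrative assumptions, not the paper's actual implementation.

// Sketch of the on-demand shortest-distance lookup described above.
// Assumes dijkstra[u][v] holds the precomputed vertex-to-vertex shortest
// distance, and each non-vertex point knows its two enclosing vertices
// (v1, v2) together with its on-edge distance to each of them.
final class MazeDistance {
    static final class Point {
        final boolean isVertex;
        final int vertexId;          // valid when isVertex
        final int v1, v2;            // enclosing vertices when not a vertex
        final double dToV1, dToV2;   // on-edge distances to v1 and v2
        Point(boolean isVertex, int vertexId, int v1, int v2, double dToV1, double dToV2) {
            this.isVertex = isVertex; this.vertexId = vertexId;
            this.v1 = v1; this.v2 = v2; this.dToV1 = dToV1; this.dToV2 = dToV2;
        }
    }

    static double shortest(double[][] dijkstra, Point a, Point b) {
        if (a.isVertex && b.isVertex)                       // case 1
            return dijkstra[a.vertexId][b.vertexId];
        if (a.isVertex)                                     // case 2
            return Math.min(dijkstra[a.vertexId][b.v1] + b.dToV1,
                            dijkstra[a.vertexId][b.v2] + b.dToV2);
        if (b.isVertex)                                     // case 3: symmetric to case 2
            return shortest(dijkstra, b, a);
        double best = Double.MAX_VALUE;                     // case 4: try all four vertex pairs
        best = Math.min(best, a.dToV1 + dijkstra[a.v1][b.v1] + b.dToV1);
        best = Math.min(best, a.dToV1 + dijkstra[a.v1][b.v2] + b.dToV2);
        best = Math.min(best, a.dToV2 + dijkstra[a.v2][b.v1] + b.dToV1);
        best = Math.min(best, a.dToV2 + dijkstra[a.v2][b.v2] + b.dToV2);
        return best;
    }
}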
B. Game object recognition
We divide each maze into 28 × 30 square cells, each of
which contains 8 × 8 pixels. There are two kinds of cells:
passable or impassable. The walls and blocked areas are examples of impassable cells. As these cells cannot be entered by Ms. Pac-Man, ghosts or fruits, their pixel colors always remain unchanged throughout the game. To reduce
computation time, we ignore all impassable cells in the game
object recognition process.
Passable cells are either empty or contain a pill or a power pill, and may be partially covered by movable objects (Ms. Pac-Man, ghosts and/or fruits). To speed up the recognition process, the bitboard idea from Chess programming [21] is used. Sixty-four 32-bit random numbers r_{x,y} are initially assigned to the 8 x 8 pixels of a cell.
The hash value of a cell is obtained by the equation:
hash(pixel) = Σ_{x,y=0}^{7} r_{x,y} · pixel(x, y)
The corresponding pseudocode for obtaining the hash value of a cell is shown in Fig. 2. A cell's pixels are represented by a 2D array of 32-bit integers. The hash value is also a 32-bit number, which can uniquely identify the different combinations of the 64 pixels in a cell if the initial random numbers are selected carefully.
In most cases, a passable cell contains no movable object, so its hash value equals that of an empty cell, a pill cell or a power-pill cell. The worst case occurs when Ms. Pac-Man, the four ghosts and a bonus fruit each occupy two cells, so that sixteen cells in total have hash values different from these three.
Figure 2. The pseudo code of obtaining the hash value of a cell
To determine these movable objects, ICE Pambush 2's pixel color counting method [19] is used together with the history of the movable objects. The color counting method counts the number of pixels of each color in the cell; for example, if the number of red pixels is greater than a certain threshold, the cell is considered to contain Blinky. The pixel color counting method is simple but not accurate enough in some cases, since it considers only color and not the shape of the object. Misrecognition may happen when an object of the same color (e.g., a bonus fruit) appears on the screen. A simple solution is to also consider the history of the object. For example, if a red object is detected but Blinky (and the edible ghosts) was not moving nearby previously, the red object cannot be Blinky.
Since calculating the hash values of cells is relatively fast, the agent can quickly identify the locations of movable objects and leave the finer recognition of these objects to the color counting method.
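As an illustration of this second stage, the sketch below classifies a flagged cell by counting red pixels and then applying the history check described above. The color test, threshold value and history representation are assumptions made for the example only.

// Sketch of the color-counting step with the history check described above.
final class GhostClassifier {
    static final int RED_THRESHOLD = 20;     // assumed threshold, for illustration only

    static boolean looksRed(int argb) {      // crude red test, illustrative only
        int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
        return r > 200 && g < 80 && b < 80;
    }

    // Returns true if the cell is recognised as Blinky: enough red pixels AND
    // Blinky was seen near this cell in the previous frame (history check).
    static boolean isBlinky(int[][] cellPixels, int cellX, int cellY,
                            int prevBlinkyX, int prevBlinkyY) {
        int redCount = 0;
        for (int[] row : cellPixels)
            for (int p : row)
                if (looksRed(p)) redCount++;
        if (redCount < RED_THRESHOLD) return false;
        // History check: a red object far from Blinky's last known position
        // cannot be Blinky (it is more likely a bonus fruit).
        int dist = Math.abs(cellX - prevBlinkyX) + Math.abs(cellY - prevBlinkyY);
        return dist <= 2;
    }
}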
C. Multi-mode framework
The objective of the game is to achieve the highest possible score. To do so, one must eat as many pills and fruits as possible, avoid being captured by the ghosts, and eat power pills at suitable times followed by capturing as many edible ghosts as possible. As our agent uses a graph structure to model the maze, the distances between the ghosts and Ms. Pac-Man are computed accurately. If the ghosts are far away, we say that Ms. Pac-Man is in a safe situation; otherwise she is in a dangerous situation. We further divide the safe situation into different modes, which have different short-term objectives, as described below.
If there is an edible ghost near Ms. Pac-Man and no non-edible ghost near her, Ms. Pac-Man enters the capture mode. In this mode, her objective is to chase the edible ghosts and capture as many of them as possible.
If there is a power pill near Ms. Pac-Man, she enters the ambush mode. In this mode, Ms. Pac-Man waits at the corner close to the power pill. When the ghosts are close enough, she eats the power pill and enters the capture mode to capture the nearby edible ghosts. If the ghosts are not approaching, Ms. Pac-Man leaves this mode (and enters the pill mode described below so as to eat other pills instead of wasting time at the corner). Since capturing edible ghosts provides significant bonus scores, the ambush mode is the key to obtaining high scores in the agent design.
In all other cases, Ms. Pac-Man enters the pill mode. In this mode, her objective is to eat as many pills as possible, as quickly as possible, in order to score and clear the stage.
Currently we use greedy methods in all modes under the safe situation. When Ms. Pac-Man is in the capture mode, she chases the nearest edible ghost along the shortest path. When she is in the pill mode, she moves towards the nearest pill with maximum ghost distance. When Ms. Pac-Man is in a dangerous situation, Monte-Carlo simulation is used, as described in Section IV.
Our multi-mode framework allows us to easily add (or remove) mode modules to include new (or remove old) features. For example, we observed that our agent sometimes cannot clear a stage because some orphan pills remain on disjoint edges. It is hard for the current agent to plan a path to eat these pills (while at the same time avoiding being captured by ghosts). In this case, we may add a new mode to handle the situation: if the number of remaining pills is less than a threshold value, Ms. Pac-Man enters an endgame mode, and a dedicated algorithm can be designed to tackle the problem specifically.
D. Mode transition
With the above modes alone, we observe that the behavior of our agent is erratic, especially when she leaves one mode and enters another. The reason is that the short-term objectives of different modes usually conflict with each other. For example, in the capture mode Ms. Pac-Man chases the edible ghosts, which may lead her far away from a set of pills. But when the edible ghosts run too far away, Ms. Pac-Man may enter the pill mode and immediately move in the reverse direction (towards the set of pills). After several time units, an edible ghost may turn back and approach Ms. Pac-Man (an edible ghost's direction is purely random). At that moment, Ms. Pac-Man enters the capture mode again and immediately reverses her direction. The above scenario is summarized in Fig. 3.
Figure 3. The scenario of mode conflicts: (a) Ms. Pac-Man enters the capture mode to chase the edible ghost; (b) Ms. Pac-Man enters the pill mode when the edible ghost is too far away; (c) the edible ghost approaches Ms. Pac-Man (due to its random moves); (d) Ms. Pac-Man enters the capture mode again to chase the edible ghost.
The above problem cannot be adequately handled by hysteresis alone. A better solution is to add transition modes between the designed modes, with hysteresis between neighboring modes. A transition mode can be modeled as a weighted function of the two connected modes. For example, the capture mode wants to minimize the total distance between Ms. Pac-Man and the edible ghosts (f_capture = min Σ_i distance(pacman, edible ghost i)), while the pill mode wants to minimize the total distance between Ms. Pac-Man and a set of pills (f_pill = min Σ_i distance(pacman, pill i)). The transition mode between the capture mode and the pill mode is defined as f_capture_pill = α · f_capture + (1 − α) · f_pill, where α ∈ [0, 1] is a weighting parameter.
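For illustration, the following sketch shows how such a weighted transition mode could be evaluated over the legal moves. The generic state and move types, the objective interfaces and the fixed weight alpha are assumptions made for the example.

// Sketch of a transition mode as a weighted blend of two mode objectives.
final class TransitionMode {
    interface Objective<S, M> { double cost(S state, M move); }  // lower is better

    // f_capture_pill = alpha * f_capture + (1 - alpha) * f_pill, alpha in [0, 1].
    static <S, M> M bestMove(S state, Iterable<M> legalMoves,
                             Objective<S, M> fCapture, Objective<S, M> fPill,
                             double alpha) {
        M best = null;
        double bestCost = Double.MAX_VALUE;
        for (M m : legalMoves) {
            double c = alpha * fCapture.cost(state, m) + (1.0 - alpha) * fPill.cost(state, m);
            if (c < bestCost) { bestCost = c; best = m; }
        }
        return best;
    }
}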
IV. MONTE-CARLO SIMULATION IN ADVANCE IN
DANGEROUS STATE
The Monte-Carlo simulation module is called when Ms.
Pac-Man is in the dangerous state. We define the dangerous
state as the case where Ms. Pac-Man is surrounded by one or
more ghosts and likely to lose her life immediately if she
makes a wrong move. An example of the dangerous state is
shown in Fig. 4. It is not easy to design a simple static method to find a safe path, as the ghosts are not static objects: they move according to Ms. Pac-Man's actions together with probabilistic random decisions. It is therefore reasonable to use
Monte-Carlo simulation, whose look-ahead capability provides accurate survival probabilities for Ms. Pac-Man's different move choices.
Figure 4. An example of a dangerous situation
In our simulation module, we have made a number of
assumptions and changes to the game rules so as to simplify
the implementation and reduce the complexity. The
assumptions and changes are listed as follows:
The speeds of Ms. Pac-Man and the four ghosts are the
same and are constant throughout the game.
The coordinates of Ms. Pac-Man and the four ghosts are integers; the coordinates reported from the recognition process have already been rounded to integers.
Ms. Pac-Man decides her direction only at intersection points. If she were allowed to change direction at any non-intersection point in the simulation, she would often perform a random walk and the simulation results would not reflect the true values.
Blinky moves towards Ms. Pac-Man's current coordinate by the shortest path with probability 0.9 when he is at an intersection point. With probability 0.1, he selects a valid direction randomly.
With probability 0.8 when Pinky is at an intersection point, he heads towards Ms. Pac-Man's next arrival vertex by the shortest path. But if he is already very close to Ms. Pac-Man, he moves towards Ms. Pac-Man's current coordinate instead. With probability 0.2, he selects a valid direction randomly.
With probability 0.7 when Inky is at an intersection point, he follows Ms. Pac-Man's previous move by moving towards Ms. Pac-Man's previous vertex using the shortest path. But if he is already very close to Ms. Pac-Man, he moves towards Ms. Pac-Man's current coordinate instead. With probability 0.3, he selects a valid direction randomly.
With probability 0.5 when Sue is at an intersection point, she moves towards Ms. Pac-Man's current coordinate by the shortest path. With probability 0.5, she selects a valid direction randomly. (A code sketch of this simulated ghost policy is given after this list.)
Ms. Pac-Man enters the energetic state after she has eaten a power pill. The time that Ms. Pac-Man stays in this state is 30 units of cell travelling time. In other words, Ms. Pac-Man returns to the normal state after travelling 30 cells from the moment she eats a power pill.
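The following sketch illustrates the simulated ghost policy listed above: at an intersection, each ghost chases its target with its own probability and otherwise picks a random valid direction. The helper inputs (the shortest-path direction towards the target and the list of legal directions) are assumed to be supplied by the agent's maze model.

// Sketch of the simulated ghost policy used inside the Monte-Carlo playouts.
import java.util.List;
import java.util.Random;

final class SimulatedGhost {
    enum Type { BLINKY, PINKY, INKY, SUE }

    final Type type;
    final Random rng = new Random();

    SimulatedGhost(Type type) { this.type = type; }

    double chaseProbability() {
        switch (type) {
            case BLINKY: return 0.9;
            case PINKY:  return 0.8;
            case INKY:   return 0.7;
            default:     return 0.5;   // Sue
        }
    }

    // Called only when the ghost is at an intersection point.
    // 'towardTarget' is the first move of the shortest path to this ghost's target
    // (Ms. Pac-Man's current position, next vertex or previous vertex, depending on
    // the ghost type); 'legal' lists the valid directions at this intersection.
    int chooseDirection(int towardTarget, List<Integer> legal) {
        if (rng.nextDouble() < chaseProbability())
            return towardTarget;                         // chase by shortest path
        return legal.get(rng.nextInt(legal.size()));     // random valid direction
    }
}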
The game tree of Ms. Pac-Man is relatively deep and it is not feasible to take enough simulation samples in real time. Our proposed Monte-Carlo method is therefore a modified version: we perform simulation in advance and use a shallow search instead of the traditional full-depth search.
A. Shallow search
When the depth of the game tree is shortened, the leaf nodes represent intermediate game states instead of ending game states. This causes a problem in general situations but not in the dangerous state. The reason is that it is not easy to design a reasonable function to evaluate leaf nodes that are only intermediate game states; the evaluation of intermediate game states usually conflicts with the final game objectives. Maximizing an intermediate game state does not necessarily lead to a good ending game state and vice versa; the local optimum in a shallow search is not necessarily the global optimum in a full search.
In the dangerous state, however, when Ms. Pac-Man makes a wrong decision she dies and that simulation stops immediately, so the leaf nodes represent ending game states rather than intermediate ones. The evaluation of these ending game states is much easier and more accurate. Each simulation stops when Ms. Pac-Man dies or when she remains alive after travelling a number of vertices. The simulation backward-propagates the life status (alive or dead), and the agent then selects the move with the maximum alive rate.
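A minimal sketch of this shallow Monte-Carlo evaluation is given below. The Simulator interface stands in for the agent's internal game model (an assumption for the example); the playout logic follows the description above: force the candidate first move, play random moves until death or the vertex limit, back-propagate the life status and pick the move with the highest alive rate.

// Sketch of the shallow Monte-Carlo evaluation of candidate moves.
import java.util.List;
import java.util.Random;

final class ShallowMonteCarlo {
    interface Simulator {
        List<Integer> legalMoves();
        Simulator copy();                       // independent copy of the current state
        void applyPacManMove(int move);         // also advances ghosts and fruit one step
        boolean pacManDead();
        boolean reachedVertexLimit(int limit);  // true after visiting 'limit' vertices
    }

    static int bestMove(Simulator root, int samplesPerMove, int vertexLimit, Random rng) {
        int best = -1;
        double bestAliveRate = -1.0;
        for (int move : root.legalMoves()) {
            int alive = 0;
            for (int s = 0; s < samplesPerMove; s++) {
                Simulator sim = root.copy();
                sim.applyPacManMove(move);                     // forced first move
                while (!sim.pacManDead() && !sim.reachedVertexLimit(vertexLimit)) {
                    List<Integer> legal = sim.legalMoves();    // subsequent moves are random
                    sim.applyPacManMove(legal.get(rng.nextInt(legal.size())));
                }
                if (!sim.pacManDead()) alive++;                // back-propagate life status
            }
            double rate = alive / (double) samplesPerMove;
            if (rate > bestAliveRate) { bestAliveRate = rate; best = move; }
        }
        return best;
    }
}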
B. Simulation in advance
Making a shallow search alone does not completely solve the problem. As for all real-time games, the time constraint is very tight. For example, Ms. Pac-Man has to make a decision at an intersection of the maze within about 10 milliseconds; otherwise she keeps moving in her original direction and misses the chance to turn at that intersection. The time is so short that it is not feasible to take enough simulation samples.
To solve the problem, we simulate the next move from the current situation in advance. The scenario is shown in Fig. 5. When Ms. Pac-Man is still at location (x, y) and we know that she is moving along direction d, the next location (x', y') can be determined. From the moment Ms. Pac-Man enters (x, y), simulation starts. The agent simulates the situation in which Ms. Pac-Man is forced to enter (x', y') after one time unit (the ghosts and fruits also update their locations accordingly); she then selects the subsequent moves randomly until the maximum distance or maximum number of vertices has been reached. The life status is backward-propagated to (x', y'), and another simulation from (x, y) is started. When Ms. Pac-Man actually enters (x', y') in the real game, the simulations are stopped. The simulation results are then retrieved and the move with the maximum alive rate (say, direction d') is chosen. The current location of Ms. Pac-Man is now (x', y') and her current direction is d', so the corresponding next location (x'', y'') can be determined, and the next in-advance simulation starts immediately.
We observe that Ms. Pac-Man takes about 100 milliseconds to travel one cell in the game (slower if the cell contains a pill, and faster if Ms. Pac-Man turns a corner). In other words, our agent can make use of this 100-millisecond interval to simulate the next decision. Experiments have shown that this time is enough to take a reasonable number of samples (a sketch of this in-advance simulation loop is given after Fig. 5).
Figure 5. The scenario of simulation in advance: (a) simulation starts when Ms. Pac-Man is at (x, y); (b) Ms. Pac-Man is forced to enter (x', y') in the simulation; (c) the subsequent moves are selected randomly and the life status is backward-propagated to (x', y').
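The in-advance timing can be sketched as follows, reusing the Simulator interface and playout loop from the previous sketch. The loop keeps sampling playouts rooted at the forced next cell (x', y') until the travel-time budget of roughly 100 ms expires, then returns the move with the highest alive rate; the budget handling and bookkeeping here are illustrative assumptions.

// Sketch of the "simulation in advance" loop run while Ms. Pac-Man travels one cell.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class InAdvanceSimulation {
    // Samples playouts for each candidate move from the forced next cell until the
    // time budget expires, then returns the move with the highest survival rate.
    static int simulateWhileTravelling(ShallowMonteCarlo.Simulator nextCellState,
                                       long budgetMillis, int vertexLimit, Random rng) {
        Map<Integer, int[]> stats = new HashMap<>();           // move -> {alive, total}
        long deadline = System.currentTimeMillis() + budgetMillis;
        while (System.currentTimeMillis() < deadline) {
            for (int move : nextCellState.legalMoves()) {
                ShallowMonteCarlo.Simulator sim = nextCellState.copy();
                sim.applyPacManMove(move);
                while (!sim.pacManDead() && !sim.reachedVertexLimit(vertexLimit)) {
                    List<Integer> legal = sim.legalMoves();
                    sim.applyPacManMove(legal.get(rng.nextInt(legal.size())));
                }
                int[] s = stats.computeIfAbsent(move, k -> new int[2]);
                if (!sim.pacManDead()) s[0]++;
                s[1]++;
            }
        }
        int best = -1;                                         // -1 if no sample completed
        double bestRate = -1.0;
        for (Map.Entry<Integer, int[]> e : stats.entrySet()) {
            double rate = e.getValue()[0] / (double) Math.max(1, e.getValue()[1]);
            if (rate > bestRate) { bestRate = rate; best = e.getKey(); }
        }
        return best;
    }
}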
V. PERFORMANCE EVALUATION
We designed a ghost-avoidance strategy for the Ms. Pac-Man agent using Monte-Carlo simulation:
MC-Alive: Each simulation backward-propagates the alive-or-dead status. The depth of the shallow simulation is limited to visiting at most five vertices of the game maze. The agent then selects the move with the maximum alive rate.
We designed another, simple greedy strategy for ghost avoidance for comparison:
Greedy: Selects the move that maximizes the minimum distance between Ms. Pac-Man and the ghosts. If there is a tie, the second minimum distance is maximized, and so on.
It should be emphasized that MC-Alive and Greedy use the same framework and differ only in the module invoked for ghost avoidance in the dangerous situation. In the following experiments, the dangerous state is entered when the distance between Ms. Pac-Man and the nearest ghost is less than 7 cells, or the distance between Ms. Pac-Man and the second nearest ghost is less than 10 cells. Once in the dangerous situation, these thresholds are extended to 9 and 12 cells respectively so as to create a hysteresis effect.
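A small sketch of this dangerous-state test with hysteresis is shown below; the thresholds follow the text above, while the class and method names are illustrative.

// Sketch of the dangerous-state test with hysteresis.
final class DangerDetector {
    private boolean dangerous = false;

    // d1 = distance to the nearest ghost, d2 = distance to the second nearest ghost (in cells).
    boolean update(double d1, double d2) {
        if (!dangerous) {
            if (d1 < 7 || d2 < 10) dangerous = true;      // enter the dangerous state
        } else {
            if (d1 >= 9 && d2 >= 12) dangerous = false;   // relaxed thresholds to leave it
        }
        return dangerous;
    }
}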
A. Experiments in the real Ms. Pac-Man game
The experiments were conducted in the real Ms. Pac-Man game (via the screen-capture mechanism), with 25 games for each strategy, run on Windows XP Professional SP3 on an Intel Core i7-860 2.8 GHz CPU with 4 GB of memory. The screen-capture mechanism is widely used in the competition [1]. The average game time of MC-Alive and Greedy is roughly 10 minutes and 5 minutes per game, respectively.
TABLE I. THE SCORES OF DIFFERENT STRATEGIES FOR 25 GAMES

Agent      Min     Max      Mean     Std Dev
Greedy     1,980   13,260   6,361    3,473.7
MC-Alive   2,890   22,670   12,872   5,397.6

Agent      25th Percentile   Median   75th Percentile
Greedy     3,610             5,590    10,320
MC-Alive   8,980             13,860   16,990
Figure 6. The score distribution of the two strategies for 25 games.
The scores of the two strategies are listed in Table I. There is a significant difference between them. MC-Alive achieved a high score of 22,670 and an average score of 12,872 over 25 games, while Greedy achieved a maximum score of only 13,260 and an average score of 6,361. The average score, 25th percentile, median and 75th percentile of MC-Alive are all higher, at about double those of Greedy. The look-ahead ability of Monte-Carlo simulation prevents Ms. Pac-Man from being trapped by the ghosts. It successfully saves Ms. Pac-Man's life in dangerous situations, allowing the agent to play for longer and to achieve higher scores.
We also ran 25 games of ICE Pambush 3 [22], the only open-source Ms. Pac-Man agent from the recent Ms. Pac-Man competitions, on the same system for comparison. Its high score and mean score were 36,860 and 18,440 respectively; our scores are lower by 38% and 30% respectively. We must re-emphasize, however, that our agent currently focuses on ghost avoidance only and has no intelligent pill-eating or ghost-hunting strategy beyond the straightforward greedy one.
TABLE II. THE NUMBER OF STAGES CLEARED BY DIFFERENT STRATEGIES FOR 25 GAMES

Agent      Min   25th Percentile   Median   75th Percentile   Max
Greedy     0     0                 0        1                 1
MC-Alive   0     1                 2        2                 3
Figure 7. The distribution of the number of stages cleared by the two strategies for 25 games.
Besides the scores, we also compare the number of stages cleared by the two strategies. The results, listed in Table II, are consistent with those in Table I. MC-Alive is generally able to clear more stages than Greedy. Half of the time, Greedy cannot clear the first stage; in other words, Greedy often loses all three of its lives on stage 1. MC-Alive, by contrast, advanced to stage 4 in its best run, and its median number of cleared stages is two.
B. Experiments in the dangerous situations
In order to pay more attention to the dangerous situations,
we conducted an individual set of experiments. Instead of
running a whole game, different typical and dangerous
situations were selected to test the ghost avoidance
performance of the two strategies. In dangerous situations it
seems to be a good move for Ms. Pac-Man in short term but
often be a poor choice if she can look-ahead deeper in the long
run. The correct move is sometimes a bad choice in short term
but indeed a good move if we look-ahead evaluating it.
Figure 8. A selected example of a dangerous situation
In the example of Fig. 8, Ms. Pac-Man has three choices,
namely A, B and C. The correct move for Ms. Pac-Man should
be moving towards A and escaping via D. Greedy advises Ms.
Pac-Man to move to B since it maximizes the distances to
Blinky and Pinky. But Ms. Pac-Man will be trapped in the
corner easily. Although the right choice, A, shortens the distance between Ms. Pac-Man and Pinky, MC-Alive is able to look ahead and determine that Ms. Pac-Man can reach D via A and escape from the two ghosts.
Figure 9. Another selected example of a dangerous situation
Fig. 9 is another typical example. Ms. Pac-Man also has
three choices, namely A, B and C. The correct move for Ms.
Pac-Man should be moving towards A and escaping via D
followed by E. Greedy advises Ms. Pac-Man to move to B
since this move maximizes her distances to Blinky, Pinky and
Inky. But Ms. Pac-Man will be trapped in the top long corridor
easily. In contrast, MC-Alive correctly selects A, since the simulations report that the alive rates of B and C are much lower than that of A.
VI. CONCLUSIONS
The aim of this paper is to study the effectiveness of applying Monte-Carlo simulation to ghost avoidance in Ms. Pac-Man. This work is the first approach that uses a Monte-Carlo method to decide the next move of Ms. Pac-Man in a dangerous situation. The look-ahead ability of Monte-Carlo simulation allows Ms. Pac-Man to find a safe path and avoid being captured by ghosts in a dynamic game environment. We propose a modified Monte-Carlo method in which shallow simulations are carried out in advance to overcome the tight timing requirement of this real-time game. The results have shown that our method can reduce the chance of losing Ms. Pac-Man's life and raise the score significantly.
We are currently aiming to develop a high-scoring agent to participate in the Ms. Pac-Man competition. We plan to use CI techniques, instead of the current greedy methods, to eat pills efficiently and to lure ghosts to power pills and ambush them effectively in order to achieve higher scores.
Monte-Carlo simulation is good at evaluating average performance within a reasonably short time, but weak at planning long-term goals. For example, the result of using Monte-Carlo simulation to decide the path for eating pills is unsatisfactory. We are considering Monte-Carlo Tree Search (MCTS), an extension of Monte-Carlo simulation applied to a minimax tree, to overcome this problem. MCTS is good at exploring promising paths in a tree and has been applied successfully to Go [23][24] and Poker [25]. Further experiments using MCTS in the Ms. Pac-Man agent will be conducted.
REFERENCES
[1] S. M. Lucas, “Ms. Pac-Man competition,” SIGEVOlution, vol. 2, no. 4,
pp. 37-38, 2007.
[2] J. Togelius, S. Karakovskiy and R. Baumgarten, “The 2009 Mario AI
Competition,” IEEE Congress on Evolutionary Computation, pp. 1-8,
2010
[3] D. Loiacono, J. Togelius, P. L. Lanzi, L. Kinnaird-Heether, S. M. Lucas, M. Simmerson, D. Perez, R. G. Reynolds and Y. Saez, "The WCCI 2008 Simulated Car Racing Competition," IEEE Symposium on Computational Intelligence and Games, pp. 119-126, 2008.
[4] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
[5] J. P. Rosca, “Generality versus size in genetic programming.”
Proceedings of the Genetic Programming 1996 Conference, pp. 381-387,
MIT Press, 1996.
[6] I. Szita and A. Lorincz, "Learning to play using low-complexity rule-based policies: illustrations through Ms. Pac-Man," Journal of Artificial Intelligence Research, vol. 30, no. 1, pp. 659-684, 2007.
[7] J. S. D. Bonet and C. P. Stauffer, "Learning to play Pac-Man using incremental reinforcement learning," Proceedings of the Congress on Evolutionary Computation, 1999.
[8] M. Gallagher and A. Ryan, “Learning to play Pac-Man: an evolutionary,
rule-based approach,” The 2003 Congress on Evolutionary Computation,
vol. 4, pp. 2462-2469, 2003.
[9] M. Gallagher and M. Ledwich, “Evolving Pac-Man players: Can we
learn from raw input?,” IEEE Symposium on Computational Intelligence
and Games, pp. 282-287, 2007.
[10] T. Thompson, L. McMillan, J. Levine and A. Andrew, “An evaluation of
the benefits of look-ahead in Pac-Man,” IEEE Symposium on
Computational Intelligence and Games, pp. 310-315, 2008.
[11] S. M. Lucas, “Evolving a neural network location evaluator to play Ms.
Pac-Man,” IEEE Symposium on Computational Intelligence and Games,
pp. 203-210, 2005.
[12] N. Wirth and M. Gallagher, “An influence map model for playing Ms.
Pac-Man,” IEEE Symposium on Computational Intelligence and Games,
pp. 228-233, 2008.
[13] H. Handa and M. Isozaki, “Evolutionary fuzzy systems for generating
better Ms. Pac-Man players,” IEEE International Conference on Fuzzy
Systems, pp. 2182-2185, 2008.
[14] H. Handa, “Constitution of Ms. Pac-Man player with critical-situation
learning mechanism,” Fourth International Workshop on Computational
Intelligence & Applications, pp. 48-53, 2008
[15] A. Fitzgerald, P. Kemeraitis and C. B. Congdon, “RAMP: A rule-based
agent for Ms. Pac-Man,” IEEE Congress on Evolutionary Computation,
pp. 2646-2653, 2009.
[16] P. Burrow and S. M. Lucas, “Evolution versus temporal difference
learning for learning to play Ms. Pac-Man,” IEEE Symposium on
Computational Intelligence and Games, pp. 53-60, 2009.
[17] L. DeLooze and W. Viner, “Fuzzy Q-Learning in a Nondeterministic
Environment: Developing an Intelligent Ms. Pac-Man Agent,” IEEE
Symposium on Computational Intelligence and Games, pp. 162-169,
2009.
[18] D. Robles, S. M. Lucas, “A simple tree search method for playing Ms.
Pac-Man,” IEEE Symposium on Computational Intelligence and Games,
pp. 249-255, 2009.
[19] R. Thawonmas and H. Matsumoto, “Automatic controller of Ms. Pac-
Man and its performance: Winner of the IEEE CEC 2009 software agent
Ms. Pac-Man competition,” Proc. of Asia Simulation Conference 2009
(JSST 2009), 2009.
[20] H. Matsumoto, T. Ashida, Y. Ozasa, T. Maruyama and R. Thawonmas,
“ICE Pambush 3,” Controller description paper,
http://cswww.essex.ac.uk/staff/sml/pacman/cig2009/ICEPambush3/ICE
%20Pambush%203.pdf (accessed 25/10/2010)
[21] A. L. Samuel, "Some studies in machine learning using the game of Checkers," IBM Journal of Research and Development, vol. 3, issue 3, pp. 210-229, 1959.
[22] H. Matsumoto, T. Ashida, Y. Ozasa, T. Maruyama and R. Thawonmas,
“ICE Pambush 3,” Controller source code,
http://cswww.essex.ac.uk/staff/sml/pacman/cig2009/ICEPambush3/ICE
%20Pambush%203.zip (accessed 25/10/2010)
[23] S. Gelly and D. Silver, “Achieving master level play in 9×9 computer
go,” Proceedings of AAAI, pp. 1537–1540, 2008.
[24] C. S. Lee, M. H. Wang, G. Chaslot, J. B. Hoock, A. Rimmel, O. Teytaud, S. R. Tsai, S. C. Hsu and T. P. Hong, "The computational intelligence of MoGo revealed in Taiwan's Computer Go tournaments," IEEE Transactions on Computational Intelligence and AI in Games, vol. 1, no. 1, pp. 73-89, 2009.
[25] G. V. d. Broeck, K. Driessens and J. Ramon, “Monte-Carlo Tree Search
in Poker Using Expected Reward Distributions,” Proceedings of the 1st
Asian Conference on Machine Learning: Advances in Machine
Learning, pp. 367-381, 2009
2010 2nd International IEEE Consumer Electronics Society's Games Innovations Conference
978-1-4244-7180-5/10/$26.00 ©2010 IEEE
... games, the time constraint is very tight [31] expecting a player to make a decision towards an action within about 10 ms (milliseconds). The next move is simulated based on the information taken from the current situation. ...
... • Visit count N (v) (a non negative integer) [31] The ratio of number of times the node has been visited Q(v), to the total reward of playouts N(v) is defined in Eqn .1, [3], [32], [23] and [31]. ...
... • Visit count N (v) (a non negative integer) [31] The ratio of number of times the node has been visited Q(v), to the total reward of playouts N(v) is defined in Eqn .1, [3], [32], [23] and [31]. ...
Conference Paper
Full-text available
Artificial and Computational Intelligence in computer games play an important role that could simulate various aspects of real life problems. Development of artificial intelligence techniques in real time decision-making games can provide a platform for the examination of tree search algorithms. In this paper, we present a rehabilitation system known as RehabGame in which the Monte-Carlo Tree Search algorithm is used. The objective of the game is to combat the physical impairment of stroke/ brain injury casualties in order to improve upper limb movement. Through the process of a real-time rehabilitation game, the player decides on paths that could be taken by her/his upper limb in order to reach virtual goal objects. The system has the capability of adjusting the difficulty level to the player's ability by learning from the movements made and generating further subsequent objects. The game collects orientation, muscle and joint activity data and utilizes them to make decisions on game progression. Limb movements are stored in the search tree which is used to determine the location of new target virtual fruit objects by accessing the data saved in the background from different game plays. It monitors the enactment of the muscles strain and stress through the Myo armband sensor and provides the next step required for the rehabilitation purpose. The results from two samples show the effectiveness of the Monte-Carlo Tree Search in the RehabGame by being able to build a coherent hand motion. It progresses from highly achievable paths to the less achievable ones, thus configuring and personalizing the rehabilitation process.
... References Original (Screen-Capture) [9], [10], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35] Public Variant [36], [37], [38], [39], [40], [41], [42], [43] Ms Pac-Man vs Ghosts engine [12], [44], [20], [45], [46], [47], [13], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67] Ms Pac-Man vs Ghost Team engine [14] Own implementation [68], [69], [70], [71], [72], [73], [74], [75], [76], [77], [78], [79], [80], [81], [82], [83], [84], [85], [86], [87], [88], [89], [90], [91], [92] in the most publications. Prior to the competitions described above, papers were largely fragmented, with each using their own, often much simplified version of the game. ...
... Rule-based & Finite State Machines [71], [16], [15], [72], [18], [23], [24], [9], [52], [10], [65] 67 Tree Search & Monte Carlo [20], [25], [26], [74], [13], [29], [30], [49], [51], [59], [56], [61] Evolutionary Algorithms [68], [69], [47], [45], [46], [48], [53], [50], [58], [57], [59], [60], [63] Neural Networks [70], [38], [75] Neuro-evolutionary [12], [36], [37], [44], [28], [31], [32], [33], [77], [62], [67], [64], [43] Reinforcement Learning [73], [21], [19], [22], [78], [41], [42], [34], [82], [92], [35] Other [27], [17], [54], [79], [90], [91] Game psychology [93], [94], [95], [96], [97], [98], [99] 7 Psychology [100], [101], [81] 3 Robotics [102], [103] 2 Sociology [104], [105] 2 Brain Computer Interfaces [83], [84], [85] 3 Biology and Animals [106] 1 Education [102], [107], [103], [80] 4 Other [108], [39], [109], [40] 4 ...
... In a similar fashion to [29], Tong and Sung [26] propose a ghost avoidance module based on MC simulations (but not MCTS) to allow their controller to evade ghosts more efficiently. The authors' algorithm is based on [16]. ...
Article
Full-text available
Pac-Man and its equally popular successor Ms Pac-Man are often attributed to being the frontrunners of the golden age of arcade video games. Their impact goes well beyond the commercial world of video games and both games have featured in numerous academic research projects over the last two decades. In fact, scientific interest is on the rise and many avenues of research have been pursued, including studies in robotics, biology, sociology and psychology. The most active field of research is computational intelligence, not least because of popular academic gaming competitions that feature Ms Pac-Man. This paper summarises peer-reviewed research that focuses on either game (or close variants thereof) with particular emphasis on the field of computational intelligence. The potential usefulness of games like Pac-Man for higher education is also discussed and the paper concludes with a discussion of prospects for future work.
... The average score in the experiments was 55785.5 points, with a maximum score of 111610. Tong and Sung (2010) designed Ms. Pac-Man agent based on the Monte Carlo (MC) simulation to help it avoid being caught by ghosts and able to survive for longer periods of time in the game. The DA is used as the search algorithm for finding the shortest path between two points in the maze. ...
... Tong et al. (2011) designed Ms. Pac-Man agent based on the Monte Carlo Endgame Module (EM) to help it avoid being caught by the ghosts and able to survive for longer periods of time in the game. The system is basically an extension of Tong and Sung (2010) work. The EM has two main functions, called the path generation and path testing. ...
Article
Full-text available
AbstrakTeknik Kecerdasan Buatan (AI) berjaya digunakan dan diaplikasikan dalam pelbagai bidang, termasukpembuatan, kejuruteraan, ekonomi, perubatan dan ketenteraan. Kebelakangan ini, terdapat minat yangsemakin meningkat dalam Permainan Kecerdasan Buatan atau permainan AI. Permainan AI merujukkepada teknik yang diaplikasikan dalam permainan komputer dan video seperti pembelajaran, pathfinding,perancangan, dan lain-lain bagi mewujudkan tingkah laku pintar dan autonomi kepada karakter dalampermainan. Objektif utama kajian ini adalah untuk mengemukakan beberapa teknik yang biasa digunakandalam merekabentuk dan mengawal karakter berasaskan komputer untuk permainan Ms Pac-Man antaratahun 2005-2012. Ms Pac-Man adalah salah satu permainan yang digunakan dalam siri pertandinganpermainan diperingkat antarabangsa sebagai penanda aras untuk perbandingan pengawal autonomi.Kaedah analisis kandungan yang menyeluruh dijalankan secara ulasan dan sorotan literatur secara kritikal.Dapatan kajian menunjukkan bahawa, walaupun terdapat berbagai teknik, limitasi utama dalam kajianterdahulu untuk mewujudkan karakter permaianan Pac Man adalah kekurangan Generalization Capabilitydalam kepelbagaian karakter permainan. Hasil kajian ini akan dapat digunakan oleh penyelidik untukmeningkatkan keupayaan Generalization AI karakter permainan dalam Pasaran Permainan KecerdasanBuatan. Abstract Artificial Intelligence (AI) techniques are successfully used and applied in a wide range of areas, includingmanufacturing, engineering, economics, medicine and military. In recent years, there has been anincreasing interest in Game Artificial Intelligence or Game AI. Game AI refers to techniques applied incomputer and video games such as learning, pathfinding, planning, and many others for creating intelligentand autonomous behaviour to the characters in games. The main objective of this paper is to highlightseveral most common of the AI techniques for designing and controlling the computer-based charactersto play Ms. Pac-Man game between years 2005-2012. The Ms. Pac-Man is one of the games that used asbenchmark for comparison of autonomous controllers in a series of international Game AI competitions.An extensive content analysis method was conducted through critical review on previous literature relatedto the field. Findings highlight, although there was various and unique techniques available, the majorlimitation of previous studies for creating the Ms. Pac-Man game characters is a lack of generalizationcapability across different game characters. The findings could provide the future direction for researchersto improve the Generalization A.I capability of game characters in the Game Artificial Intelligence market.
... It was first proposed as a framework for game AI in the paper [124], which illustrated the detailed procedures for the MCTS. It is further implemented in path planning of multiple games, including Ms. Pac-Man [125,126] and a two-player turn-based strategy board game called Go (in which the multi-agent Monte Carlo is considered) [127]. Simulation in games has shown positive results. ...
Article
Full-text available
In recent years, unmanned aerial vehicles (UAVs) have gained popularity due to their flexibility, mobility, and accessibility in various fields, including search and rescue (SAR) operations. The use of UAVs in SAR can greatly enhance the task success rates in reaching inaccessible or dangerous areas, performing challenging operations, and providing real-time monitoring and modeling of the situation. This article aims to help readers understand the latest progress and trends in this field by synthesizing and organizing papers related to UAV search and rescue. An introduction to the various types and components of UAVs and their importance in SAR operations is settled first. Additionally, we present a comprehensive review of sensor integrations in UAVs for SAR operations, highlighting their roles in target perception, localization, and identification. Furthermore, we elaborate on the various applications of UAVs in SAR, including on-site monitoring and modeling, perception and localization of targets, and SAR operations such as task assignment, path planning, and collision avoidance. We compare different approaches and methodologies used in different studies, assess the strengths and weaknesses of various approaches, and provide insights on addressing the research questions relating to specific UAV operations in SAR. Overall, this article presents a comprehensive overview of the significant role of UAVs in SAR operations. It emphasizes the vital contributions of drones in enhancing mission success rates, augmenting situational awareness, and facilitating efficient and effective SAR activities. Additionally, the article discusses potential avenues for enhancing the performance of UAVs in SAR.
... An achievement of MCTS on Pac-Man is the one presented in [61], where an agent is designed to solve the problem of avoiding pincer moves (every escape path for Pac-Man is blocked) of the ghosts. A number of MCTS agents has also proposed for achieving specific goals in the Ms. Pac-Man game, such as ghost avoidance [131] and endgame situations [132]. Recently, in [97] a real-time MCTS approach has been proposed for controlling the Pac-Man character. ...
Thesis
Full-text available
This dissertation studies the problem of developing intelligent agents, which are able to acquire skills in an autonomous way, simulating human behaviour. An autonomous intelligent agent acts effectively in an unknown environment, directing its activity towards achieving a specific goal based on some performance measure. Through this interaction, a rich amount of information is received, which allows the agent to perceive the consequences of its actions, identify important behavioural components, and adapt its behaviour through learning. In this direction, the present dissertation concerns the development, implementation and evaluation of machine learning techniques for building intelligent agents. Three important and very challenging tasks are considered: i) approximate reinforcement learning, where the agent's policy is evaluated and improved through the approximation of the value function, ii) Bayesian reinforcement learning, where the reinforcement learning problem is modeled as a decision-theoretic problem, by placing a prior distribution over Markov Decision Processes (MDPs) that encodes the agent's belief about the true environment, and iii) Development of intelligent agents on games, which constitute a really challenging platform for developing machine learning methodologies, involving a number of issues that should be resolved, such as the appropriate choice of state representation, continuous action spaces, etc..In the first part, we focus on the problem of value function approximation suggesting two different methodologies. Firstly, we propose the Relevance Vector Machine Temporal Difference (RVMTD) algorithm, which constitutes an advanced kernelized Bayesian methodology for model-free value function approximation, employing the RVM regression framework as a generative model. The key aspect of RVMTD is the restructure of the policy evaluation problem as a linear regression problem. An online kernel sparsification technique is adopted, rendering the RVMTD practical in large scale domains. Based on this scheme, we derive recursive low-complexity formulas for the online update of the model observations. For the estimation of the unknown model coefficients a sparse Bayesian methodology is adopted that enhances model capabilities. Secondly, a model-based reinforcement learning algorithm is proposed, which is based on the online partitioning of the input space into clusters. As the data arrive sequentially to the learner, an online extension of the vanilla EM algorithm is used for clustering. In this way, a number of basis functions are created and updated automatically. Also, statistics are kept about the dynamics of the environment that are subsequently used for policy evaluation. Finally, the least-squares solution is used for the estimation of the unknown coefficients of the value function model.In the second part, we address the Bayesian reinforcement learning problem proposing two advanced Bayesian algorithms. Firstly, we present the Linear Bayesian Reinforcement Learning (LBRL) algorithm showing that the system dynamics can be estimated accurately by a Bayesian linear Gaussian model, which takes into account correlations in the state features. Policies are estimated by applying approximate dynamic programming on a transition model that is sampled from the current posterior. This form of approximate Thompson sampling results in a good exploration in unknown MDPs. 
Secondly, the Cover Tree Bayesian Reinforcement Learning (CTBRL) algorithm is proposed which constitutes an online tree-based Bayesian approach for reinforcement learning. The main idea of CTBRL is the construction of a cover tree from the observations, which remains efficient in high dimensional spaces. In this way, we create a set of partitions of the state space. An efficient non-parametric Bayesian conditional density estimator is also introduced on the cover tree structure.This is a generalized context tree, endowed with a multivariate linear Bayesian model at each node and is used for the estimation of the dynamics of the underlying environment. Thus, taking a sample for the posterior, we obtain a piecewise linear Gaussian model of the dynamics. The main advantages of this approach are its flexibility and efficiency, rendering it suitable for reinforcement learning problems in continuous state spaces. In the third part of this thesis, we consider the problem of developing intelligent agents in two challenging games, the Ms.PacMan and the AngryBirds. Firstly, we propose the RL-PacMan agent, which is based on an abstract but informative state space representation. The adopted representation is able to encode a game scene, giving the opportunity to our agent to distinguish different situations. For discovering a good or even optimal policy, we use the model-free SARSA(ι) reinforcement learning algorithm. In our study, we demonstrate that an efficient state representation is of central interest for the design of an intelligent agent. Finally, we propose the AngryBER agent, which is based on an efficient tree structure for representing each game screenshot. This representation has the advantage of establishing an informative feature space and modifying the task of game playing to a regression problem. A Bayesian ensemble regression framework is used for the estimation of the return of each action, where each pair of ‘object material' and ‘bird type' has its own regression model. After each shot, the regression model is incrementally updated, in a fully closed form.The AngryBER agent participated in the international AIBIRDS 2014 competition winning the 2nd price among 12 participants.
... In contrast, simulation-based methods like Monte-Carlo Tree Search (MCTS) proved useful in previous years of this competition. The MCTS approach by Tong and Sung [11] was capable of avoiding ghosts and attaining a maximum score of 21,000 points. Robles and Lucas [12] applied a tree search method to the screen-capture version of the game. ...
... A number of authors made use of EAs to design NN-based controllers [17]–[19]. Besides EAs and GAs, a wide range of other techniques have been applied, such as Ant Colonies [20], Monte Carlo methodologies [21], [22], Monte Carlo Tree Search (MCTS) [23], [24], and Reinforcement Learning [25]. Alhejali and Lucas [26] applied a genetic algorithm to enhance the performance of a Ms. Pac-Man agent created using MCTS, showing an impressive 18% increase in the average score. ...
Article
Full-text available
In recent years, thanks to the Ms. Pac-Man vs Ghosts competition, the game of Ms. Pac-Man has gained increasing attention from academics in the field of Computational Intelligence. In this work, we contribute to this research stream by presenting a simple Genetic Algorithm with Lexicographic Ranking (GALR) for the optimization of Flocking Strategy-based ghost controllers. Flocking Strategies are a paradigm for intelligent agents characterized by emergent behavior and very low computational and memory requirements, making them well suited for commercial applications and mobile devices. In particular, we study empirically the effect of optimizing homogeneous and heterogeneous teams. The computational analysis shows that the Flocking Strategy-based controllers generated by the proposed GALR outperform other ghost controllers included in the competition framework and presented in the literature.
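As a rough illustration of the kind of controller such an approach optimizes (the actual rule set, encoding and parameters in the cited work may differ), a flocking-based ghost move with a handful of GA-tunable weights could look like this; in a heterogeneous team each ghost would carry its own weight triple.

    import math

    def flocking_move(ghost, other_ghosts, pacman, w_cohesion, w_separation, w_target):
        """Steer one ghost by a weighted sum of simple flocking rules.
        The three weights are the parameters a GA would tune; positions are (x, y) tuples."""
        def diff(a, b):
            return (a[0] - b[0], a[1] - b[1])

        # Cohesion: head towards the centroid of the other ghosts
        if other_ghosts:
            cx = sum(g[0] for g in other_ghosts) / len(other_ghosts)
            cy = sum(g[1] for g in other_ghosts) / len(other_ghosts)
            cohesion = diff((cx, cy), ghost)
        else:
            cohesion = (0.0, 0.0)

        # Separation: move away from ghosts that are too close
        separation = [0.0, 0.0]
        for g in other_ghosts:
            d = math.dist(ghost, g)
            if 0 < d < 4:
                separation[0] += (ghost[0] - g[0]) / d
                separation[1] += (ghost[1] - g[1]) / d

        # Target attraction: head towards Ms. Pac-Man
        target = diff(pacman, ghost)

        sx = w_cohesion * cohesion[0] + w_separation * separation[0] + w_target * target[0]
        sy = w_cohesion * cohesion[1] + w_separation * separation[1] + w_target * target[1]
        # Collapse the steering vector onto one of the four legal moves
        if abs(sx) >= abs(sy):
            return "right" if sx > 0 else "left"
        return "down" if sy > 0 else "up"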
... Their method sets a target location in the current maze as a long-term goal for Pac-Man, while MCTS computes the optimal route to the target in order to determine the best move. Other Monte-Carlo based agents were developed for achieving specific goals in Ms Pac-Man, such as ghost avoidance [13] and endgame situations [14], demonstrating the potential of Monte-Carlo methods for Pac-Man agents. In 2011, the first MCTS agent won the Ms Pac-Man screen-capture competition [7]. ...
Article
Full-text available
In this article, Monte-Carlo Tree Search (MCTS) is introduced for controlling the Pac-Man character in the real-time game Ms Pac-Man. MCTS is used to find an optimal path for the agent at each turn, determining the move to make based on the results of numerous randomized simulations. Several enhancements are introduced in order to adapt MCTS to the real-time domain. Ms Pac-Man is an arcade game in which the protagonist has several goals but no conclusive terminal state. Unlike games such as Chess or Go, there is no state in which the player wins the game. Instead, the game has two subgoals: 1) surviving and 2) scoring as many points as possible. Decisions must be made within a strict time constraint of 40 ms. The Pac-Man agent has to compete with a range of different ghost teams, so only limited assumptions can be made about their behavior. In order to expand the capabilities of existing MCTS agents, four enhancements are discussed: 1) a variable-depth tree, 2) simulation strategies for the ghost team and Pac-Man, 3) including long-term goals in scoring, and 4) re-using the search tree over several moves with a decay factor. The agent described in this article was entered in both the WCCI'12 and the CIG'12 Pac-Man vs Ghost Team competitions, where it achieved second and first place, respectively. The experiments show that MCTS is a viable technique for the Pac-Man agent. Moreover, the enhancements improve overall performance against four different ghost teams.
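A compressed sketch of the MCTS loop underlying such an agent is given below, ignoring the ghost-model and long-term-goal enhancements discussed above. The GameState interface (legal_moves, apply, terminal, evaluate) and the exact time budget are assumptions for illustration only.

    import math, random, time

    class Node:
        def __init__(self, state, parent=None, move=None):
            self.state, self.parent, self.move = state, parent, move
            self.children, self.visits, self.value = [], 0, 0.0
            self.untried = list(state.legal_moves())

    def uct_select(node, c=1.0):
        # UCT: trade off the mean simulation value against an exploration bonus
        return max(node.children, key=lambda ch:
                   ch.value / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits))

    def mcts(root_state, time_budget=0.035, max_depth=30):
        """Run randomized simulations until the (roughly 40 ms) budget expires,
        then return the most-visited move."""
        root = Node(root_state)
        deadline = time.time() + time_budget
        while time.time() < deadline:
            node = root
            # 1) Selection: descend while the node is fully expanded
            while node.children and not node.untried:
                node = uct_select(node)
            # 2) Expansion: attach one untried child move
            if node.untried:
                move = node.untried.pop(random.randrange(len(node.untried)))
                child = Node(node.state.apply(move), node, move)
                node.children.append(child)
                node = child
            # 3) Simulation: random playout up to a variable depth
            state, depth = node.state, 0
            while depth < max_depth and not state.terminal():
                state = state.apply(random.choice(state.legal_moves()))
                depth += 1
            reward = state.evaluate()   # e.g. survival plus normalized score
            # 4) Backpropagation
            while node is not None:
                node.visits += 1
                node.value += reward
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits).move if root.children else None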
Conference Paper
During the last years, the well-known Ms. Pac-Man video game has been, and still is, an interesting test bed for research on various concepts from the broad area of Artificial Intelligence (AI). Among these concepts is the use of Genetic Programming (GP) to control the game from a human player's perspective. Several GP-based approaches have been examined so far; traditionally they define two types of GP terminals: one type for information retrieval, the second for issuing actions (commands) to the game world. However, by using these action terminals the controller has to manage actions issued to the game during their runtime and to monitor their outcome. In order to avoid the need for active task management, this paper presents a novel approach to the design of a GP-based Ms. Pac-Man controller: the proposed approach relies solely on information-retrieval terminals in order to rate all possible directions of movement at every time step during a running game. Based on these rating values, the controller can move the agent through the mazes of the game world of Ms. Pac-Man. With this design, which forms the main contribution of our work, we decrease the overall GP solution complexity by removing all action-control management tasks from the system. It is demonstrated that, by following the proposed approach, such a system can successfully control an autonomous agent in a computer game environment at the level of an amateur human player.
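The decision loop implied by that design can be sketched as follows, with the evolved GP tree abstracted into a rating function over per-direction features; the particular feature set and state helpers here are hypothetical.

    ACTIONS = ["up", "down", "left", "right"]

    def features(state, direction):
        """Information-retrieval terminals evaluated for one direction
        (hypothetical helper methods on 'state')."""
        return {
            "dist_pill": state.distance_to_nearest_pill(direction),
            "dist_ghost": state.distance_to_nearest_ghost(direction),
            "dist_power": state.distance_to_nearest_power_pill(direction),
        }

    def choose_move(state, gp_rate):
        """Rate every legal direction with the evolved GP expression and take
        the best one; no action-management terminals are needed."""
        legal = [d for d in ACTIONS if state.is_legal(d)]
        return max(legal, key=lambda d: gp_rate(features(state, d)))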
Article
Full-text available
In this paper, we describe the outline of our Ms. Pac-Man controller, ICE Pambush 2, the winner of the IEEE CEC 2009 Software Agent Ms. Pac-Man Competition. One striking feature of ICE Pambush 2 is its ability to lure ghosts and ambush them. It is also equipped with improved image-processing and decision-making modules, leading to scores more than 8000 points higher than those of its predecessor, ICE Pambush, submitted to the previous competition at the IEEE WCCI 2008. In addition, the score of ICE Pambush 2 is higher than those of controllers reported in the literature. At the time of writing, ICE Pambush 2 holds the world records in this series of Software Agent Ms. Pac-Man Competitions for both the maximum score of 24,640 and the average score of 13,059 over ten trials.
Conference Paper
Full-text available
This paper investigates various factors that affect the ability of a system to learn to play Ms. Pac-Man. For this study, Ms. Pac-Man provides a game of appropriate complexity and has the advantage that many other papers have been published on systems that learn to play it. The results indicate that temporal difference learning (TDL) performs most reliably with a tabular function approximator, and that the chosen reward structure can have a dramatic impact on performance. When using a multi-layer perceptron as a function approximator, evolution outperforms TDL by a significant margin. Overall, the best results were obtained by evolving multi-layer perceptrons.
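As a loose illustration of the "evolving multi-layer perceptrons" alternative mentioned above, a simple (1+λ) evolution strategy over the network's weight vector could be organized as below; the fitness function (here assumed to return the controller's average game score) and all constants are illustrative, not the cited setup.

    import random

    def evolve_mlp_weights(evaluate, n_weights, generations=100, offspring=10, sigma=0.1):
        """(1+lambda) evolution strategy: keep the best weight vector seen so far.
        'evaluate(weights)' is assumed to return the controller's average game score."""
        best = [random.gauss(0.0, 1.0) for _ in range(n_weights)]
        best_fitness = evaluate(best)
        for _ in range(generations):
            for _ in range(offspring):
                child = [w + random.gauss(0.0, sigma) for w in best]
                fitness = evaluate(child)
                if fitness >= best_fitness:
                    best, best_fitness = child, fitness
        return best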
Conference Paper
Full-text available
This paper describes the simulated car racing competition held in association with the IEEE WCCI 2008 conference. The organization of the competition is described, along with the rules, the software used, and the submitted car racing controllers. The results of the competition are presented, followed by a discussion about what can be learned from this competition, both about learning controllers with evolutionary methods and about competition organization. The paper is co-authored by the organizers and participants of the competition.
Article
IEEE WCCI 2008 in Hong Kong played host to the latest Ms Pac-Man competition, organised by Simon Lucas as an activity of the IEEE CIS Games Technical Committee. The competition attracted 11 entries from teams around the world, with the winning entry by Alan Fitzgerald, Peter Kemeraitis, and Clare Bates Congdon from the University of Southern Maine (USM) achieving a high score of 15,970.
Conference Paper
This paper reports the results of training an intelligent agent to play the Ms. Pac-Man video game using variations of a fuzzy Q-learning algorithm. This approach allows us to address the nondeterministic aspects of the game as well as to find a successful self-learning, adaptive playing strategy. The strategy presented is a table-based learning strategy in which the intelligent agent analyzes the current situation of the game, stores membership values for each of several contributors to that situation (distance to the closest pill, distance to the closest power pill, and distance to the closest ghost), and makes decisions based on these values.
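A minimal sketch of such a table-based fuzzy scheme is given below; the membership functions, state factorization and update rule are simplified placeholders, and the cited work's exact formulation may differ.

    import itertools
    from collections import defaultdict

    LABELS = ["near", "medium", "far"]
    ACTIONS = ["up", "down", "left", "right"]
    Q = defaultdict(float)   # Q[(pill_label, power_label, ghost_label, action)]

    def memberships(distance):
        """Rough memberships of a distance in {near, medium, far}."""
        near = max(0.0, 1.0 - distance / 5.0)
        far = min(1.0, max(0.0, (distance - 5.0) / 10.0))
        medium = max(0.0, 1.0 - near - far)
        return {"near": near, "medium": medium, "far": far}

    def fuzzy_q(dists, action):
        """Q-value of an action as a membership-weighted sum over fuzzy states;
        dists = (distance to pill, to power pill, to ghost)."""
        ms = [memberships(d) for d in dists]
        return sum(ms[0][a] * ms[1][b] * ms[2][c] * Q[(a, b, c, action)]
                   for a, b, c in itertools.product(LABELS, repeat=3))

    def update(dists, action, reward, next_dists, alpha=0.1, gamma=0.9):
        """Distribute the TD error over fuzzy states in proportion to their firing strength."""
        target = reward + gamma * max(fuzzy_q(next_dists, a) for a in ACTIONS)
        delta = target - fuzzy_q(dists, action)
        ms = [memberships(d) for d in dists]
        for a, b, c in itertools.product(LABELS, repeat=3):
            w = ms[0][a] * ms[1][b] * ms[2][c]
            Q[(a, b, c, action)] += alpha * w * delta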
Conference Paper
In this paper, we develop a Ms. Pac-Man playing agent based on an influence map model. The proposed model is as simple as possible while capturing the essentials of the game, and it has three main parameters with an intuitive relationship to the agent's behavior. Experimental results are presented exploring the model's performance over its parameter space using random and systematic global exploration and a greedy algorithm. The model parameters can be optimized without difficulty despite the noisy fitness function used, and the performance of the optimized agents is comparable to the best published results for a Ms. Pac-Man playing agent. Nevertheless, some difficulties were observed in terms of the model and the software system.
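A bare-bones influence map of the kind described above might be computed as follows, with three intuitive parameters (pill attraction, ghost repulsion, distance decay); the maze helpers (walkable_cells, path_distance, neighbours) and parameter values are assumptions for illustration.

    def build_influence_map(maze, pills, ghosts, w_pill=1.0, w_ghost=-5.0, decay=0.8):
        """Each walkable cell accumulates positive influence from pills and
        negative influence from ghosts, attenuated by decay**distance."""
        influence = {}
        for cell in maze.walkable_cells():
            score = 0.0
            for p in pills:
                score += w_pill * decay ** maze.path_distance(cell, p)
            for g in ghosts:
                score += w_ghost * decay ** maze.path_distance(cell, g)
            influence[cell] = score
        return influence

    def greedy_move(maze, influence, pacman_cell):
        """Move to the neighbouring cell with the highest influence."""
        return max(maze.neighbours(pacman_cell), key=lambda c: influence[c])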