Conference Paper

Fuzzy Q-learning in a nondeterministic environment: Developing an intelligent Ms. Pac-Man agent

Authors: DeLooze and Viner

Abstract

This paper reports results from training an intelligent agent to play the Ms. Pac-Man video game using variations of a fuzzy Q-learning algorithm. This approach allows us to address the nondeterministic aspects of the game as well as to find a successful self-learning, or adaptive, playing strategy. The strategy presented is a table-based learning strategy in which the intelligent agent analyzes the current game situation, stores membership values for each of several contributors to that situation (distance to the closest pill, distance to the closest power pill, and distance to the closest ghost), and makes decisions based on these values.
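The paper itself is not reproduced here, but the abstract's description maps onto a small, self-contained sketch: fuzzy membership functions turn the three distances into linguistic labels, fuzzy state aggregation collapses those labels into a tiny state index, and a tabular Q-learning rule selects and updates actions. Everything below (thresholds, label names, learning rates) is illustrative rather than the authors' actual design.

import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]

def memberships(distance, near=4.0, far=12.0):
    # Illustrative fuzzy labels for a maze distance; thresholds are made up.
    if distance <= near:
        mu_near = 1.0
    elif distance >= far:
        mu_near = 0.0
    else:
        mu_near = (far - distance) / (far - near)
    mu_far = 1.0 - mu_near
    mu_medium = 1.0 - abs(mu_near - mu_far)   # peaks between 'near' and 'far'
    return {"near": mu_near, "medium": mu_medium, "far": mu_far}

def aggregate_state(d_pill, d_power_pill, d_ghost):
    # Fuzzy state aggregation: keep the dominant label of each input,
    # so only 3 * 3 * 3 = 27 aggregated states can ever occur.
    labels = []
    for d in (d_pill, d_power_pill, d_ghost):
        mu = memberships(d)
        labels.append(max(mu, key=mu.get))
    return tuple(labels)                       # e.g. ('near', 'far', 'medium')

Q = defaultdict(float)                         # table: (state, action) -> value
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def choose_action(state):
    if random.random() < EPSILON:              # occasional exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

With three labels per input, as assumed here, the table never grows beyond 27 states and four actions, which is what keeps plain Q-learning workable despite the nondeterministic ghost behaviour.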

... References:
Original (Screen-Capture): [9], [10], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35]
Public Variant: [36], [37], [38], [39], [40], [41], [42], [43]
Ms Pac-Man vs Ghosts engine: [12], [44], [20], [45], [46], [47], [13], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67]
Ms Pac-Man vs Ghost Team engine: [14]
Own implementation: [68], [69], [70], [71], [72], [73], [74], [75], [76], [77], [78], [79], [80], [81], [82], [83], [84], [85], [86], [87], [88], [89], [90], [91], [92]
... in the most publications. Prior to the competitions described above, papers were largely fragmented, with each using their own, often much simplified version of the game. ...
... Rule-based & Finite State Machines: [71], [16], [15], [72], [18], [23], [24], [9], [52], [10], [65] 67
Tree Search & Monte Carlo: [20], [25], [26], [74], [13], [29], [30], [49], [51], [59], [56], [61]
Evolutionary Algorithms: [68], [69], [47], [45], [46], [48], [53], [50], [58], [57], [59], [60], [63]
Neural Networks: [70], [38], [75]
Neuro-evolutionary: [12], [36], [37], [44], [28], [31], [32], [33], [77], [62], [67], [64], [43]
Reinforcement Learning: [73], [21], [19], [22], [78], [41], [42], [34], [82], [92], [35]
Other: [27], [17], [54], [79], [90], [91]
Game psychology: [93], [94], [95], [96], [97], [98], [99] 7
Psychology: [100], [101], [81] 3
Robotics: [102], [103] 2
Sociology: [104], [105] 2
Brain Computer Interfaces: [83], [84], [85] 3
Biology and Animals: [106] 1
Education: [102], [107], [103], [80] 4
Other: [108], [39], [109], [40] 4 ...
... DeLooze and Viner [19] make use of Fuzzy Q-Learning to develop a controller for the screen-capture version of Ms Pac-Man. Fuzzy Q-Learning combines fuzzy state aggregation with Q-learning: fuzzy state aggregation builds states from multiple fuzzy sets, reducing the total number of states that need to be considered and thereby making Q-learning an applicable technique. ...
Article
Full-text available
Pac-Man and its equally popular successor Ms Pac-Man are often attributed to being the frontrunners of the golden age of arcade video games. Their impact goes well beyond the commercial world of video games and both games have featured in numerous academic research projects over the last two decades. In fact, scientific interest is on the rise and many avenues of research have been pursued, including studies in robotics, biology, sociology and psychology. The most active field of research is computational intelligence, not least because of popular academic gaming competitions that feature Ms Pac-Man. This paper summarises peer-reviewed research that focuses on either game (or close variants thereof) with particular emphasis on the field of computational intelligence. The potential usefulness of games like Pac-Man for higher education is also discussed and the paper concludes with a discussion of prospects for future work.
... 1. The existing methodologies addressing this problem are diverse, but all have fallen far short of expert human players [7]–[12]. The previous approaches use fairly short-term greedy strategies and fail to effectively consider future game states, which is an essential capability needed for success. ...
... There are initially 220 dots in the maze, so note that the presented approach was capable of clearing at least one maze on most of its attempts before being caught by the ghosts. This level of success is significantly greater than what has been accomplished with most existing automated players that are not hand coded [7]–[10]. One of the major strengths of the method is its ability to plan optimal paths relatively far into the future, which is a common shortfall shared by the currently highest-scoring programs. The methods were also tested with a slightly different game format to focus on the skills that may be more relevant to applications outside of Ms. Pac-Man. ...
Article
This paper presents a model-based approximate λ-policy iteration approach using temporal differences for optimizing paths online for a pursuit-evasion problem, where an agent must visit several target positions within a region of interest while simultaneously avoiding one or more actively pursuing adversaries. This method is relevant to applications such as robotic path planning, mobile-sensor applications, and path exposure. The methodology described utilizes cell decomposition to construct a decision tree and implements a temporal difference-based approximate λ-policy iteration to combine online learning with prior knowledge through modeling, to achieve the objectives of minimizing the risk of being caught by an adversary and maximizing a reward associated with visiting target locations. Online learning and frequent decision tree updates allow the algorithm to quickly adapt to unexpected movements by the adversaries or dynamic environments. The approach is illustrated through a modified version of the video game Ms. Pac-Man, which is shown to be a benchmark example of the pursuit-evasion problem. The results show that the approach presented in this paper outperforms several other methods as well as most human players. Keywords: Approximate dynamic programming, Reinforcement learning, Path planning, Pursuit-evasion games
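The abstract above gives no equations; the temporal-difference machinery that λ-policy iteration builds on is the familiar TD(λ) backup with eligibility traces, sketched below over a generic discretised state space. This is background only, not the paper's cell-decomposition and decision-tree model.

from collections import defaultdict

GAMMA, ALPHA, LAMBDA = 0.95, 0.1, 0.8

V = defaultdict(float)      # value of each (discretised) cell/state
E = defaultdict(float)      # eligibility traces

def td_lambda_step(state, reward, next_state):
    # One TD(lambda) backup: the TD error is propagated to every recently
    # visited state in proportion to its eligibility trace.
    delta = reward + GAMMA * V[next_state] - V[state]
    E[state] += 1.0                       # accumulating trace
    for s in list(E):
        V[s] += ALPHA * delta * E[s]
        E[s] *= GAMMA * LAMBDA            # traces decay geometrically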
... The same group recently compared Temporal Difference Learning and Evolution approaches for this task in [6]. Fuzzy techniques were also exploited together with Evolutionary Strategy and Q-Learning in [7] and [8], respectively. Other recent work includes research focusing on map circumstances in [9] and [10], a controller based on simple tree-search in [11], and an attempt to learn low-complexity play policies in [12]. ...
Conference Paper
Full-text available
This paper describes an application of Evolutionary Strategy to optimizing ten distance parameters and seven cost parameters in our Ms Pac-Man controller, ICE Pambush 3, which was the winner of the IEEE CIG 2009 competition. Targeting the first game level, we report our results from 14 possible optimization schemes, arising from combinations of which initial values to choose (those originally used in ICE Pambush 3 or those randomly assigned) and which parameter types to optimize first (the distance parameters, the cost parameters, or both). We have found that the best optimization scheme is to first optimize the distance parameters, with their initial values set to random values, and then the cost parameters, again with random initial values. The optimized ICE Pambush 3 using the parameters from this scheme shows a 17% improvement in performance on the first game level compared to the original ICE Pambush 3.
... [2010], DeLooze and Viner [2009], Galván-López et al. [2010], Wirth and Gallagher [2008]. The Bell et. ...
Thesis
Full-text available
The merit of Evolutionary Algorithms (EAs) as a means of automatic problem solving has been demonstrated numerous times on a diverse set of problem types across a range of different domains. The central hypothesis of this thesis is that by improving the expressiveness of EAs we can better support their deployment in domains in which context-sensitive decision making is useful. After describing the principal structures and operations which allow EAs to operate effectively as a general problem-solving technique, we describe a sample problem and outline how two EA types, Genetic Programming (GP) and Grammatical Evolution (GE), might be configured to solve it. After some foundational elements of the discipline of game design are presented, we highlight how a move towards more formal specifications of design elements presents new opportunities for the deployment of EAs as a means of Procedural Content Generation (PCG). Subsequently, a set of experiments is described in which a system, designed to support the encoding of data-type information using a variant of GP called Strongly Typed Genetic Programming (STGP), is used to generate Player Character (PC) controllers for the digital video game Ms. Pac-Man. Following this, an overview of Formal Grammars (FGs) is presented and the principal structures and operations of a third EA type, GE, are described, after which a number of FGs more expressive than the Context Free Grammar (CFG), the grammar traditionally used with GE, are outlined. Finally, we outline a new GE variant designed to support the use of Attribute Grammars (AGs), a means of specifying solution semantics in addition to syntax, and describe a set of experiments conducted with it. After highlighting the gains that can be made by using this GE variant in traditional problem domains such as symbolic regression, we discuss its potential as a means of PCG in digital video games.
... Other agents make use of much more complex behaviour to achieve success playing Ms PacMan. The strongest one at the time of writing is based on a hybrid reward architecture [6], with strong showings from Q-learning [7], Deep Q-Networks [8] and Monte-Carlo Tree Search [9]. Success has also been had with neural networks [10] [11] [12]. ...
... There are many AI agents available for Ms PacMan, from simple rules-based agents to strong Q-learning [13], neural network [17] or MCTS agents [21]. For this experiment, to see if this method is applicable to the game, the quick rules-based agent mentioned above is used, as it can represent a lower-level human player and many more games can be simulated in a short amount of time. ...
Conference Paper
Games, particularly online games, have an ongoing requirement to exhibit the ability to react to player behaviour and change their mechanics and available tools to keep their audience both entertained and feeling that their strategic choices and in-game decisions have value. Game designers invest time both gathering data and analysing it to introduce minor changes that bring their game closer to a state of balance, a task with a lot of potential that has recently come to the attention of researchers. This paper first provides a method for automating the process of finding the best game parameters to reduce the difficulty of Ms PacMan through the use of evolutionary algorithms and then applies the same method to a much more complex and commercially successful PC game, StarCraft, to curb the prowess of a dominant strategy. Results show both significant promise and several avenues for future improvement that may lead to a useful balancing tool for the games industry.
... Artificial intelligence techniques have also been utilized to create Pac-Man agents. Reinforcement learning is one such example [17][18][19][20][21][22]. In [17], Burrow and Lucas compared the performance of temporal-difference learning and evolutionary algorithms. ...
Article
Full-text available
Conventional reinforcement learning methods for Markov decision processes rely on weakly-guided, stochastic searches to drive the learning process. It can therefore be difficult to predict what agent behaviors might emerge. In this paper, we consider an information-theoretic approach for performing constrained stochastic searches that promote the formation of risk-averse to risk-favoring behaviors. Our approach is based on the value of information, a criterion that provides an optimal trade-off between the expected return of a policy and the policy's complexity. As the policy complexity is reduced, there is a high chance that the agents will eschew risky actions that increase the long-term rewards. The agents instead focus on simply completing their main objective in an expeditious fashion. As the policy complexity increases, the agents will take actions, regardless of the risk, that seek to decrease the long-term costs. A minimal-cost policy is sought in either case; the obtainable cost depends on a single, tunable parameter that regulates the degree of policy complexity. We evaluate the performance of value-of-information-based policies on a stochastic version of Ms. Pac-Man. A major component of this paper is demonstrating that ranges of policy complexity values yield different game-play styles and analyzing why this occurs. We show that low-complexity policies aim to only clear the environment of pellets while avoiding invulnerable ghosts. Higher-complexity policies implement multi-modal strategies that compel the agent to seek power-ups and chase after vulnerable ghosts, both of which reduce the long-term costs.
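Value-of-information policies of this kind are commonly computed with a Blahut–Arimoto-style fixed-point iteration in which a single inverse-temperature parameter trades expected return against policy complexity. The sketch below is that generic iteration, assuming a known Q table and state distribution; it only illustrates how one tunable parameter moves the policy from a state-independent action mix to near-greedy behaviour, and is not the authors' exact algorithm.

import numpy as np

def voi_policy(Q, p_s, beta, iters=50):
    # Q: (n_states, n_actions) expected returns; p_s: state distribution.
    # Small beta -> policy collapses toward one state-independent action mix
    # (low complexity); large beta -> near-greedy in every state.
    n_s, n_a = Q.shape
    pi = np.full((n_s, n_a), 1.0 / n_a)          # start from a uniform policy
    for _ in range(iters):
        p_a = p_s @ pi                           # marginal action distribution
        logits = np.log(p_a + 1e-12) + beta * Q  # trade off prior vs return
        logits -= logits.max(axis=1, keepdims=True)
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)      # renormalise per state
    return pi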
... Overall, the average score of the evolved controller was around 5500 points. DeLooze and Viner (2009) reported an autonomous agent trained to control Ms. Pac-Man in a non-deterministic environment by using Fuzzy State Aggregation (FSA) and Q-Learning (QL). This combined system is known as Fuzzy Q-Learning (FQL). ...
Article
Full-text available
Abstract: This paper discusses the function of cartoons as a medium for spreading propaganda during the three years and eight months of the Japanese occupation of Malaya. Propaganda cartoons sympathetic to the Japanese administration (Dai Nippon) have not yet been critically analysed for their contribution as an effective method of indoctrination, alongside propaganda delivered through radio broadcasts, written pamphlets and propaganda films. Cartoons published in newspapers such as Malai Sinpo, The Malay Mail, Syonan Times and Penang Daily News gave the population of Malaya at the time an impression of the romantic liberation doctrines of 'Asia for Asians' and the 'Greater East Asia Co-Prosperity Sphere' (the Dai Toa Senso campaign) proudly promoted by the Dai Nippon government. Propaganda cartoons also appeared in major magazines such as Semangat Asia, Fajar Asia and Suara Timor. The contribution of cartoons as a source for recording the history of the Japanese occupation of Malaya is regarded as being as significant as written records and reports. Furthermore, matters that receive little attention in the nation's historical documentation, namely propaganda supportive of the Japanese administration, form the core of this research. More notably, that documentation used the power of the visual (the cartoon) as the narrative appeal of its historiography. Keywords: Cartoons, Propaganda, Newspapers, Magazines, Dai Nippon
... It seems that as the mutation probabilities increased, the performance of ESNet increased as well. Furthermore, it was shown to produce relatively better results compared with other previous studies that utilized a computational intelligence approach (Lucas, 2005; Handa, 2010; DeLooze and Viner, 2009). Hence, we have shown empirical evidence that ESNet can improve the learning capability of the ANN by evolving its weights and biases in a dynamic video game setting. ...
Article
Full-text available
Problem statement: The retail sales of computer and video games have grown enormously during the last few years, not just in the United States (US) but all over the world. This is the reason a lot of game developers and academic researchers have focused on game-related technologies, such as graphics, audio, physics and Artificial Intelligence (AI), with the goal of creating newer and more fun games. In recent years, there has been an increasing interest in game AI for producing intelligent game objects and characters that can carry out their tasks autonomously. Approach: The aim of this study is to create an autonomous intelligent controller to play the game with no human intervention. Our approach is to use a simple but powerful evolutionary algorithm called Evolution Strategies (ES) to evolve the connection weights and biases of feed-forward Artificial Neural Networks (ANN) and to examine its learning ability through computational experiments in a non-deterministic and dynamic environment, the well-known arcade game Ms. Pac-Man. The resulting algorithm is referred to as an Evolution Strategies Neural Network, or ESNet. Results: The comparison of ESNet with two random systems, Random Direction (RandDir) and Random Neural Network (RandNet), yields promising results. The contribution of this work also focuses on the comparison between ESNet configurations with different mutation probabilities. The results show that ESNet with a high mutation probability recorded higher mean scores than RandDir, RandNet and ESNet with a low mutation probability. Conclusion: Overall, the proposed algorithm has very good performance, with a high probability of automatically generating successful game AI controllers for the video game.
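A minimal sketch of the ESNet idea, assuming the neural network is reduced to a flat weight vector and the game is hidden behind a stand-in fitness function: a simple (mu, lambda) Evolution Strategy with Gaussian mutation of the weights. Population sizes and the mutation step below are placeholders, not the paper's settings.

import numpy as np

def play_game(weights):
    # Stand-in fitness: in ESNet this would run Ms. Pac-Man with a
    # feed-forward net parameterised by `weights` and return the average score.
    return float(np.random.rand())

def evolve(n_weights, mu=5, lam=20, sigma=0.1, generations=100):
    parents = [np.random.randn(n_weights) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = parents[np.random.randint(mu)]
            child = parent + sigma * np.random.randn(n_weights)  # Gaussian mutation
            offspring.append(child)
        # (mu, lambda) selection: keep the best offspring as the next parents
        offspring.sort(key=play_game, reverse=True)
        parents = offspring[:mu]
    return parents[0]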
... Another Fuzzy controller for a car racing championship is discussed in (Perez et al. 2009), and also a Fuzzy-based architecture has been tested in The Open Car Racing Simulator (TORCS) in (Onieva et al. 2010). There is a Fuzzy Q-learning method which has been implemented in the game of Pac-Man (DeLooze and Viner 2009). An agent-based Fuzzy system has been applied in the Battle City game into the bargain (Li et al. 2004). ...
Conference Paper
Full-text available
Nowadays computer games have become a billion dollar industry. One of the important factors in success of a game is its similarity to the real world. As a result, many AI approaches have been exploited to make game characters more believable and natural. One of these approaches which has received great attention is Fuzzy Logic. In this paper a Fuzzy Rule-Based System is employed in a fighting game to reach higher levels of realism. Furthermore, behavior of two fighter bots, one based on the proposed Fuzzy logic and the other one based on a scripted AI, have been compared. It is observed that the results of the proposed method have less behavioral repetition than the scripted AI, which boosts human players' enjoyment during the game.
... For instance, it has been used for the design of the behavior of the enemy ghosts in a Pac-Man clone (Namco, 1980;Shaout, King, & Reisner, 2006); however, heavy tuning was needed to achieve a reasonable behavior. Fuzzy Q-learning, borrowed from the fields of robotics, was used in a Ms. Pac-Man clone (DeLooze & Viner, 2009;Midway, 1982). Li et al. (2004) mention fuzzy control as a practical method for generating subtle behavior and use it in a Belief-Desire-Intention (BDI) framework as part of decision making for a BattleCity (Namco, 1995) clone. ...
Article
Artificial intelligence (AI) plays a major role in modern video games by making them feel both more realistic and more fun to play. Game intelligence usually works alongside the game logic, in the background, invisible to the players who enjoy the resulting character behaviors, the adaptive gameplay, and the procedurally generated content. However, artificial intelligence can also have a central role and become a major component of the overall gameplay (as, for instance, in the video game Black & White). In this paper, we define the genre of scripting video games and introduce Fuzzy Tactics, a video game we developed that has an innovative gameplay based on fuzzy logic and uses fuzzy rules as its core game mechanic and user-interaction mechanism. In Fuzzy Tactics, players lead their troops into battle by specifying a set of fuzzy rules that determines the battle behavior of the units. Fuzzy logic is the only means that players have to interact with the game and to command their troops. Thus, it becomes the main game mechanic that allows us to (i) extend the depth of the game, (ii) keep the interaction intuitive, and (iii) increase the replayability and the educational value of the game.
... This level of success is greater than what has been accomplished with automated players that are not hand-coded [10]–[13], and the results would be very difficult for most human players to match. However, the approach has flaws that make it weaker than the average human player in some situations. ...
Article
This paper presents an approach for optimizing paths online for a pursuit-evasion problem where an agent must visit several target positions within a region of interest while simultaneously avoiding one or more actively-pursuing adversaries. This is relevant to applications such as robotic path planning, mobile-sensor applications, and path exposure. The methodology described utilizes cell decomposition to construct a modified decision tree to achieve the objective of minimizing the risk of being caught by an adversary and maximizing a reward associated with visiting the target locations. By computing paths online, the algorithm can quickly adapt to unexpected movements by the adversaries or dynamic environments. The approach is illustrated through a modified version of the video game Ms. Pac-Man which is shown to be a benchmark example of the pursuit-evasion problem. The results show that the approach presented in this paper runs in real-time and outperforms several other methods as well as most human players.
... Moreover, in [28] the performance of learning to play using evolution and temporal difference learning was compared; the results showed that evolving a multi-layer perceptron performed better. A fuzzy Q-learning algorithm was proposed in [29]. The work in [30] demonstrated the importance of a look-ahead strategy in agent design by using a simple tree search method to evaluate the best path for Ms. Pac-Man. ...
Article
The video game industry is an emerging market which continues to expand. From its early beginnings, developers have focused mainly on sound and graphical applications, paying less attention to developing game bots or other kinds of non-player characters (NPCs). However, recent advances in artificial intelligence offer the possibility of developing game bots which are dynamically adjustable to several difficulty levels as well as variable game environments. Previous works reveal a lack of swarm intelligence approaches for developing these kinds of agents. Considering the potential of particle swarm optimization, due to its emergent properties and self-adaptation to dynamic environments, further investigation into this field must be undertaken. This research focuses on developing a generic framework based on swarm intelligence, and in particular on ant colony optimization, such that it allows the general implementation of real-time bots that work in dynamic game environments. The framework has been adapted to allow the implementation of intelligent agents for the classic game Ms. Pac-Man. These were trialed at the Ms. Pac-Man competitions held during the 2011 International Congress on Evolutionary Computation.
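The framework is described only at a high level above; the ant-colony mechanic it rests on is pheromone-biased move selection plus evaporation and reinforcement, sketched generically below. The pheromone and heuristic tables, and all constants, are illustrative assumptions rather than the paper's design.

import random

ALPHA, BETA, RHO = 1.0, 2.0, 0.1   # pheromone weight, heuristic weight, evaporation

def choose_next(current, neighbours, pheromone, heuristic):
    # Classic ACO transition rule: pick the next maze node with probability
    # proportional to pheromone^alpha * heuristic^beta.
    weights = [(pheromone[(current, n)] ** ALPHA) * (heuristic[(current, n)] ** BETA)
               for n in neighbours]
    total = sum(weights)
    r, acc = random.uniform(0, total), 0.0
    for n, w in zip(neighbours, weights):
        acc += w
        if r <= acc:
            return n
    return neighbours[-1]

def evaporate_and_deposit(pheromone, path, reward):
    # Global update: all trails evaporate, edges on a rewarding path are reinforced.
    for edge in pheromone:
        pheromone[edge] *= (1.0 - RHO)
    for edge in path:
        pheromone[edge] += reward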
... DeLooze and Viner [17] proposed using fuzzy Q-learning algorithm to train agent playing Ms. Pac-Man. The optimized agent averaged between 3,000 and 5,000 points. ...
Conference Paper
Full-text available
Ms. Pac-Man is a challenging, classic arcade game that provides an interesting platform for Artificial Intelligence (AI) research. This paper reports the first Monte-Carlo approach to develop a ghost avoidance module of an intelligent agent that plays the game. Our experimental results show that the look-ahead ability of Monte-Carlo simulation often prevents Ms. Pac-Man being trapped by ghosts and reduces the chance of losing Ms. Pac-Man's life significantly. Our intelligent agent has achieved a high score of around 21,000. It is sometimes capable of clearing the first three stages and playing at the level of a novice human player.
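A hedged sketch of the look-ahead idea: for each legal move, run many short random playouts against a ghost model and keep the move with the best estimated survival rate. The forward-model interface (simulate_step, is_caught, legal_moves) is hypothetical and stands in for whatever game simulator is available.

import random

def survival_rate(game_state, first_move, playouts=100, horizon=20):
    # Estimate the probability of surviving `horizon` steps after `first_move`
    # by Monte-Carlo simulation with random follow-up moves.
    survived = 0
    for _ in range(playouts):
        state = game_state.copy()
        move = first_move
        for _ in range(horizon):
            state = simulate_step(state, move)     # hypothetical forward model
            if is_caught(state):                   # hypothetical terminal test
                break
            move = random.choice(state.legal_moves())
        else:
            survived += 1
    return survived / playouts

def ghost_avoidance_move(game_state):
    # Pick the first move whose playouts die least often.
    return max(game_state.legal_moves(),
               key=lambda m: survival_rate(game_state, m))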
... Ohno and Ogasawara [14] aimed to provide a cognitive model that could estimate human performance in a computer operation task while emphasising bidirectional interactive (BDI) tasks, i.e., tasks in which a computer interacts with the user by changing the environment visually and audibly. DeLooze and Viner [15] combined fuzzy logic with reinforcement learning in a method known as fuzzy Q-learning to develop an intelligent agent in a nondeterministic environment, represented by Ms. Pac-Man. Genetic programming has also been used with many other games. ...
Conference Paper
Full-text available
This paper uses genetic programming (GP) to evolve a variety of reactive agents for a simulated version of the classic arcade game Ms. Pac-Man. A diverse set of behaviours were evolved using the same GP setup in three different versions of the game. The results show that GP is able to evolve controllers that are well-matched to the game used for evolution and, in some cases, also generalise well to previously unseen mazes. For comparison purposes, we also designed a controller manually using the same function set as GP. GP was able to significantly outperform this hand-designed controller. The best evolved controllers are competitive with the best reactive controllers reported for this problem.
Article
Full-text available
Abstract: Artificial Intelligence (AI) techniques are successfully used and applied in a wide range of areas, including manufacturing, engineering, economics, medicine and the military. In recent years, there has been an increasing interest in Game Artificial Intelligence, or Game AI. Game AI refers to techniques applied in computer and video games, such as learning, pathfinding, planning, and many others, for creating intelligent and autonomous behaviour in game characters. The main objective of this paper is to highlight several of the most common AI techniques used for designing and controlling computer-based characters to play the Ms. Pac-Man game between the years 2005-2012. Ms. Pac-Man is one of the games used as a benchmark for comparing autonomous controllers in a series of international Game AI competitions. An extensive content analysis was conducted through a critical review of previous literature in the field. The findings highlight that, although various and unique techniques are available, the major limitation of previous studies in creating Ms. Pac-Man game characters is a lack of generalization capability across different game characters. The findings could provide a future direction for researchers to improve the generalization AI capability of game characters in the Game AI market.
Conference Paper
Computer games are most engaging when their difficulty is well matched to the player’s ability, thereby providing an experience in which the player is neither overwhelmed nor bored. In games where the player interacts with computer-controlled opponents, the difficulty of the game can be adjusted not only by changing the distribution of opponents or game resources, but also through modifying the skill of the opponents. Applying evolutionary algorithms to evolve the artificial intelligence that controls opponent agents is one established method for adjusting opponent difficulty. Less-evolved agents (i.e., agents subject to fewer generations of evolution) make for easier opponents, while highly-evolved agents are more challenging to overcome. In this publication we test a new approach for difficulty adjustment in games: orthogonally evolved AI, where the player receives support from collaborating agents that are co-evolved with opponent agents (where collaborators and opponents have orthogonal incentives). The advantage is that game difficulty can be adjusted more granularly by manipulating two independent axes: by having more or less adept collaborators, and by having more or less adept opponents. Furthermore, human interaction can modulate (and be informed by) the performance and behavior of collaborating agents. In this way, orthogonally evolved AI both facilitates smoother difficulty adjustment and enables new game experiences.
Conference Paper
Artificial Intelligence constitutes a continuum of attempts to model adaptive, learning and cognitive abilities in all the varying degrees of complexity we know from biology and psychology. The purpose of the present research paper is to design a cognitive cellular automata agent with conflict-level spatial problem-solving abilities. Such an agent will have the capability to reason, learn and plan in a manner similar to a human being. The agent architecture has a fuzzy inference system to implement the "perceive-reason-act" decision cycle of a mobile cellular automata reflex agent. In essence, the agent is expected to execute an Observe-Orient-Decide-Act (OODA) loop. A cognitive model is developed to compute the best next move at each time instant for the goal-oriented, rational and utility-driven mobile cellular automata agent. Experiments are to be planned and conducted to evaluate the problem-solving abilities of such an agent when immersed in a conflict situation.
Article
This paper presents a model-based approach for computing real-time optimal decision strategies in the pursuit-evasion game of Ms. Pac-Man. The game of Ms. Pac-Man is an excellent benchmark problem for pursuit-evasion games with multiple, active adversaries that adapt their pursuit policies based on Ms. Pac-Man's state and decisions. In addition to evading the adversaries, the agent must pursue multiple fixed and moving targets in an obstacle-populated environment. This paper presents a novel approach by which a decision-tree representation of all possible strategies is derived from the maze geometry and the dynamic equations of the adversaries, or ghosts. The proposed models of ghost dynamics and decisions are validated through extensive numerical simulations. During the game, the decision tree is updated and used to determine optimal strategies in real time, based on state estimates and game predictions obtained iteratively over time. The results show that the artificial player obtained by this approach is able to achieve high game scores and to handle high game levels, in which the characters' speeds and maze complexity become challenging even for human players.
Conference Paper
Ms. Pac-Man, one of the classic arcade games has recently gained attention in the field of game AI through the yearly competitions of various kinds held at e.g. CIG. We have implemented an Influence Map-based controller for Ms. Pac-Man as well as for the ghosts within the game. We show that it is able to handle a number of various situations through the interesting behaviors emerging through the interplay of the different maps. It is also significantly better than the previous implementations based on similar techniques, such as potential fields.
Conference Paper
In this work, we develop a game controller called HillClimbingNet (Hill-Climbing Neural Network) for playing Ms. Pac-Man that combines the hill-climbing concept with a simple feed-forward neural network. Computational experiments have been conducted to evaluate and compare the proposed algorithm against the Random Direction (RandDir) and Random Neural Network (RandNet) systems. According to the simulation results, HillClimbingNet achieved an average score of 6290, compared with only 439 for RandDir and 735 for RandNet. HillClimbingNet thus shows very good performance.
Conference Paper
This paper explores the idea of combining the hill-climbing concept with feed-forward artificial neural networks (ANN) to develop intelligent controllers to play the Ms. Pac-Man game. The resulting algorithm is referred to as HillClimbingNet. A comparison with a random system, called RandNet, is conducted on the same problem. We also present a survey of the effects of the two most popular probability density functions, uniform and Gaussian distributions/mutators, on the introduced algorithm. The results clearly indicate the strong potential of the hill-climbing strategy as a direct search method, in tandem with a Gaussian-based mutator, to optimize the ANN for playing Ms. Pac-Man.
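The hill-climbing strategy amounts to an accept-if-better loop over the network's weight vector with a Gaussian mutator; a minimal sketch follows, with the fitness function standing in for the actual game score and the step size chosen arbitrarily.

import numpy as np

def fitness(weights):
    # Stand-in: would return the Ms. Pac-Man score achieved by a
    # feed-forward network using these weights.
    return float(np.random.rand())

def hill_climb(n_weights, sigma=0.05, steps=10000):
    best = np.random.randn(n_weights)
    best_score = fitness(best)
    for _ in range(steps):
        candidate = best + sigma * np.random.randn(n_weights)  # Gaussian mutator
        score = fitness(candidate)
        if score >= best_score:          # accept only non-worsening moves
            best, best_score = candidate, score
    return best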
Conference Paper
Recently, there has been an increasing interest in game artificial intelligence (AI). Game AI is a system that makes game characters behave like human beings, able to make smart decisions to achieve a target in a computer or video game. This study therefore focuses on an automated method of generating an artificial neural network (ANN) controller that displays good playing behaviour for a commercial video game. We create a neural-based game controller for screen-capture Ms. Pac-Man using a multi-objective evolutionary algorithm (MOEA) for training, or evolving, the architectures and connection weights (including biases) of the ANN, corresponding to the conflicting goals of minimizing ANN complexity and maximizing the Ms. Pac-Man game score. In particular, we have chosen the commonly used Pareto Archived Evolution Strategy (PAES) algorithm for this purpose. After the entire training process is completed, the controller is tested for generalization using the optimized networks in single-network (single-net) and neural-network-ensemble (multi-net) settings. The multi-net model is compared to the single-net model, and the results reveal that the neural network ensemble is able to learn to play with good strategies in a complex, dynamic and difficult game environment, which is not achievable by an individual neural network.
Conference Paper
Full-text available
RAMP is a rule-based agent for playing Ms. Pac-Man according to the rules stipulated in the 2008 World Congress on Computational Intelligence Ms. Pac-Man Competition. During the competition, our highest score was 15,970, outscoring the eleven other entrants. In runs reported here, RAMP achieves an average score over 10,000 and a high score of 18,560 across 100 runs; the highest score RAMP has achieved to date is 19,000. These scores are better than those of typical human novice players, including the paper authors themselves. The system was designed to have an evolutionary component; however, this was not developed in time for the competition, which instead used hand-coded rules. We have found the process of tuning the rule sets and accompanying parameters to be a time-consuming and inexact process that is expected to benefit from an evolutionary computation approach. This paper describes our initial implementation as well as our progress towards adding an evolutionary computation component to enable the agent to learn to play the game.
Article
Full-text available
Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and ability to continually learn even as the operating environment changes. Additionally, by applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual agent to learn from its own experience, but also opens up the opportunity for the individual agents to learn from the other agents in the system, thus accelerating the rate of learning. This research presents the novel use of fuzzy state aggregation, as the means of function approximation, combined with the fastest policy hill climbing methods of Win or Lose Fast (WoLF) and policy-dynamics based WoLF (PD-WoLF). The combination of fast policy hill climbing and fuzzy state aggregation function approximation is tested in two stochastic environments: Tileworld and the simulated robot soccer domain, RoboCup. The Tileworld results demonstrate that a single agent using the combination of FSA and PHC learns quicker and performs better than combined fuzzy state aggregation and Q-learning reinforcement learning alone. Results from the multi-agent RoboCup domain again illustrate that the policy hill climbing algorithms perform better than Q-learning alone in a multi-agent environment. The learning is further enhanced by allowing the agents to share their experience through a weighted strategy sharing.
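For readers unfamiliar with policy hill climbing, the sketch below shows the generic WoLF-PHC update for a single agent: learn Q as usual, then move the policy toward the greedy action with a small step when "winning" and a larger step when "losing", judged against a running average policy. It is a simplified illustration of the underlying algorithm, not the paper's FSA-based function-approximation version, and all constants are placeholders.

import random
from collections import defaultdict

ACTIONS = [0, 1, 2, 3]
ALPHA, GAMMA = 0.1, 0.9
DELTA_WIN, DELTA_LOSE = 0.01, 0.04                 # lose fast: bigger policy step

Q = defaultdict(float)
pi = defaultdict(lambda: 1.0 / len(ACTIONS))       # current policy
pi_avg = defaultdict(lambda: 1.0 / len(ACTIONS))   # running average policy
counts = defaultdict(int)

def wolf_phc_update(s, a, r, s2):
    # Standard Q-learning backup
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
    # Update the running average policy for this state
    counts[s] += 1
    for b in ACTIONS:
        pi_avg[(s, b)] += (pi[(s, b)] - pi_avg[(s, b)]) / counts[s]
    # "Winning" if the current policy beats the average policy in expectation
    winning = (sum(pi[(s, b)] * Q[(s, b)] for b in ACTIONS)
               > sum(pi_avg[(s, b)] * Q[(s, b)] for b in ACTIONS))
    delta = DELTA_WIN if winning else DELTA_LOSE
    best = max(ACTIONS, key=lambda b: Q[(s, b)])
    # Move probability mass toward the greedy action, then renormalise
    for b in ACTIONS:
        step = delta if b == best else -delta / (len(ACTIONS) - 1)
        pi[(s, b)] = min(1.0, max(0.0, pi[(s, b)] + step))
    total = sum(pi[(s, b)] for b in ACTIONS)
    for b in ACTIONS:
        pi[(s, b)] /= total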
Article
Full-text available
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
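For reference, the tabular update analysed in this paper is

Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t \left[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \right],

and convergence to the optimal action-values with probability 1 holds provided every state-action pair is sampled infinitely often and the step sizes satisfy \sum_t \alpha_t = \infty and \sum_t \alpha_t^2 < \infty.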
Article
Full-text available
It is widely accepted that the use of more compact representations than lookup tables is crucial to scaling reinforcement learning (RL) algorithms to real-world problems. Unfortunately, almost all of the theory of reinforcement learning assumes lookup table representations. In this paper we address the pressing issue of combining function approximation and RL, and present 1) a function approximator based on a simple extension to state aggregation (a commonly used form of compact representation), namely soft state aggregation, 2) a theory of convergence for RL with arbitrary, but fixed, soft state aggregation, 3) a novel intuitive understanding of the effect of state aggregation on online RL, and 4) a new heuristic adaptive state aggregation algorithm that finds improved compact representations by exploiting the non-discrete nature of soft state aggregation. Preliminary empirical results are also presented.
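A minimal sketch of the soft-state-aggregation approximator described above: each state is represented by a probability distribution over a handful of clusters, the Q estimate is the membership-weighted sum of per-cluster action values, and the TD error is credited to clusters in proportion to those memberships. The cluster probabilities are assumed to be given, and the update form is a generic illustration rather than the paper's exact algorithm.

import numpy as np

def q_value(p_clusters, theta, action):
    # Q(s, a) = sum_c P(c | s) * theta[c, a]: soft cluster memberships
    # weight a small table of per-cluster action values.
    return float(p_clusters @ theta[:, action])

def td_update(theta, p_clusters, action, reward, p_next, alpha=0.1, gamma=0.9):
    n_actions = theta.shape[1]
    target = reward + gamma * max(float(p_next @ theta[:, b]) for b in range(n_actions))
    delta = target - q_value(p_clusters, theta, action)
    theta[:, action] += alpha * delta * p_clusters   # credit spread over clusters
    return theta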
Article
IEEE WCCI 2008 in Hong Kong played host to the latest Ms Pac-Man competition, organised by Simon Lucas as an activity of the IEEE CIS Games Technical Committee. The competition attracted 11 entries from teams all around the world, with the winning entry by Alan Fitzgerald, Peter Kemeraitis, and Clare Bates Congdon from the University of Southern Maine (USM) achieving a high-score of 15,970.
Conference Paper
In this paper we develop a Ms. Pac-Man playing agent based on an influence map model. The proposed model is as simple as possible while capturing the essentials of the game. Our model has three main parameters that have an intuitive relationship to the agent's behavior. Experimental results are presented exploring the model's performance over its parameter space using random and systematic global exploration and a greedy algorithm. The model parameters can be optimized without difficulty despite the noisy fitness function used. The performance of the optimized agents is comparable to the best published results for a Ms. Pac-Man playing agent. Nevertheless, some difficulties were observed in terms of the model and the software system.
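An influence map of this kind can be built by letting each object spread a distance-decayed contribution over the maze and then steering toward the most attractive neighbouring cell. The grid-based sketch below is illustrative; the weights and decay rate here are stand-ins and not necessarily the paper's three model parameters.

import numpy as np

W_PILL, W_GHOST, DECAY = 1.0, -5.0, 0.8    # illustrative parameter values

def influence_map(shape, pills, ghosts):
    # Each pill adds positive, each ghost negative influence, decaying
    # exponentially with Manhattan distance from its source cell.
    grid = np.zeros(shape)
    ys, xs = np.indices(shape)
    for (py, px) in pills:
        grid += W_PILL * DECAY ** (np.abs(ys - py) + np.abs(xs - px))
    for (gy, gx) in ghosts:
        grid += W_GHOST * DECAY ** (np.abs(ys - gy) + np.abs(xs - gx))
    return grid

def best_move(pos, grid):
    # Step toward the neighbouring cell with the highest influence.
    y, x = pos
    moves = {"up": (y - 1, x), "down": (y + 1, x), "left": (y, x - 1), "right": (y, x + 1)}
    legal = {m: p for m, p in moves.items()
             if 0 <= p[0] < grid.shape[0] and 0 <= p[1] < grid.shape[1]}
    return max(legal, key=lambda m: grid[legal[m]])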
Article
A fuzzy set is a class of objects with a continuum of grades of membership. Such a set is characterized by a membership (characteristic) function which assigns to each object a grade of membership ranging between zero and one. The notions of inclusion, union, intersection, complement, relation, convexity, etc., are extended to such sets, and various properties of these notions in the context of fuzzy sets are established. In particular, a separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.
Article
We have previously proposed evolutionary fuzzy systems for playing Ms. Pac-Man in the competitions. As a consequence of the evolution, reflective action rules are acquired, such that Pac-Man tries to eat pills effectively until ghosts come close. Such rules work well; however, they are sometimes too reflective, so that Pac-Man moves toward ghosts of her own accord in longer corridors. In this paper, a critical-situation learning module is combined with the evolved fuzzy systems, i.e., the reflective action module. The critical-situation learning module is composed of Q-learning with CMAC. Location information about the surrounding ghosts and the existence of power pills is given to Pac-Man as the state. The module issues a punishment whenever Pac-Man is caught by the ghosts, and therefore learns which (state, action) pairs cause her death. Using the learnt Q-values, Pac-Man tries to survive much longer. Experimental results on Ms. Pac-Man show that the proposed method is promising, since it captures critical situations well. However, because of the large amount of memory required by CMAC, real-time responses tend to be lost.
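Q-learning with CMAC, as used in the critical-situation module, approximates Q with several overlapping tilings of the state variables. The generic tile-coding sketch below illustrates the idea; the feature design and constants are assumptions, and the size of the weight table is exactly the memory cost the abstract warns about.

from collections import defaultdict

N_TILINGS, TILE_SIZE = 8, 4.0
ALPHA, GAMMA = 0.1 / N_TILINGS, 0.9
weights = defaultdict(float)               # (tiling, tile coords, action) -> weight

def active_tiles(state):
    # Map a small continuous state vector to one offset grid cell per tiling.
    tiles = []
    for t in range(N_TILINGS):
        offset = t * TILE_SIZE / N_TILINGS
        coords = tuple(int((x + offset) // TILE_SIZE) for x in state)
        tiles.append((t, coords))
    return tiles

def q(state, action):
    return sum(weights[(t, c, action)] for (t, c) in active_tiles(state))

def cmac_q_update(state, action, reward, next_state, actions):
    target = reward + GAMMA * max(q(next_state, b) for b in actions)
    delta = target - q(state, action)
    for (t, c) in active_tiles(state):
        weights[(t, c, action)] += ALPHA * delta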
Conference Paper
Fuzzy logic plays an important role in the design of reactive robot behaviours. This paper presents a learning approach to the development of a fuzzy logic controller based on the delayed rewards from the real world. The delayed rewards are apportioned to the individual fuzzy rules by using reinforcement Q-learning. The efficient exploration of a solution space is one of the key issues in the reinforcement learning. A specific genetic algorithm is developed in this paper to trade off the exploration of learning spaces and the exploitation of learned experience. The proposed approach is evaluated on some reactive behaviour of the football-playing robots.
Conference Paper
In this paper, we propose a reinforcement learning method called a fuzzy Q-learning where an agent determines its action based on the inference result by a fuzzy rule-based system. We apply the proposed method to a soccer agent that tries to learn to intercept a passed ball, i.e., it tries to catch up with a passed ball by another agent. In the proposed method, the state space is represented by internal information that the learning agent maintains such as the relative velocity and the relative position of the ball to the learning agent. We divide the state space into several fuzzy subspaces. We define each fuzzy subspace by specifying the fuzzy partition of each axis of the state space. A reward is given to the learning agent if the distance between the ball and the agent becomes smaller or if the agent catches up with the ball. It is expected that the learning agent finally obtains the efficient positioning skill through trial-and-error.
Conference Paper
Presents the first results in understanding the reasons for cooperative advantage between reinforcement learning agents. We consider a cooperation method which consists of using and updating a common policy. We tested this method on a complex fuzzy reinforcement learning problem and found that cooperation brings larger than expected benefits. More precisely, we found that K cooperative agents each learning for N time steps outperform K independent agents each learning in a separate world for K*N time steps. We explain the observed phenomenon and determine the necessary conditions for its presence in a wide class of reinforcement learning problems
Conference Paper
We consider a pseudo-realistic world in which one or more opportunities appear and disappear in random locations. Agents use fuzzy reinforcement learning to learn which opportunities are most worthy of pursuing based on their promised rewards, expected lifetimes, path lengths and expected path costs. We show that this world is partially observable because the history of an agent influences the distribution of its future states. We implement a coordination mechanism for allocating opportunities to different agents in the same world. Our results show that optimal team performance results when agents behave in a partially selfish way. We also implement a cooperation mechanism in which agents share experience by using and updating one joint behavior policy. Our results demonstrate that K cooperative agents each learning in a separate world for N time steps outperform K independent agents each learning in a separate world for K*N time steps, with this result becoming more pronounced as the degree of partial observability in the environment increases
Article
Fuzzy logic is a natural basis for modelling and solving problems involving imprecise knowledge and continuous systems. Unfortunately, fuzzy logic systems are invariably static (once created they do not change) and subjective (the creator imparts their beliefs on the system). In this paper we address the question of whether systems based on fuzzy logic can effectively adapt themselves to dynamic situations.
Article
This paper presents a fuzzy logic controller (FLC) for the implementation of some behaviours of Sony legged robots. Adaptive Heuristic Critic (AHC) reinforcement learning is employed to refine the FLC. The actor part of the AHC is a conventional FLC in which the parameters of the input membership functions are learned from an immediate internal reinforcement signal. This internal reinforcement signal comes from a prediction of the evaluation value of a policy together with the external reinforcement signal. The evaluation value of a policy is learned by temporal difference (TD) learning in the critic part, which is also represented by an FLC. A genetic algorithm (GA) is employed for learning the internal reinforcement of the actor part because it is more efficient in searching than other trial-and-error search approaches.