Fig 1 - uploaded by Krzysztof Krawiec
The game of small-board Go


Source publication
Conference Paper
Full-text available
In this paper we apply Coevolutionary Temporal Difference Learning (CTDL), a hybrid of coevolutionary search and reinforcement learning proposed in our former study, to evolve strategies for playing the game of Go on small boards (5×5). CTDL works by interlacing exploration of the search space provided by one-population competitive coevolution and...

Context in source publication

Context 1
... is played by two players, black and white, typically on a 19 × 19 board, though the rules can be easily applied to any board size. Figure 1 shows the board state of an exemplary 5 × 5 Go game. Players make moves alternately, black first, by placing their stones on unoccupied intersections of the grid formed by the board. ...
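A minimal sketch of this placement rule, assuming a simple list-of-lists board representation (capture, ko, and scoring rules are deliberately omitted, and the helper names are hypothetical):

```python
# Minimal sketch of alternating stone placement on a 5x5 Go board.
# Capture, ko, and scoring rules are omitted for brevity.
EMPTY, BLACK, WHITE = ".", "B", "W"

def new_board(size=5):
    """Return an empty size x size grid of intersections."""
    return [[EMPTY] * size for _ in range(size)]

def place_stone(board, row, col, color):
    """Place a stone on an unoccupied intersection."""
    if board[row][col] != EMPTY:
        raise ValueError("intersection already occupied")
    board[row][col] = color

board = new_board()
place_stone(board, 2, 2, BLACK)  # black moves first
place_stone(board, 2, 3, WHITE)  # then white
```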

Similar publications

Article
Full-text available
Ensemble pruning has been widely applied to improve the capacity of multiple learner system. Both diversity and classification accuracy of learners are considered as two key factors for achieving an ensemble with competitive classification ability. Considering that extreme learning machine (ELM) is characterized by excellent training rate and gener...
Article
Full-text available
α-L-fucosidases (EC 3.2.1.51, FUC), belonging to the glycoside hydrolase family 29 (GH29), play important roles in several biological processes and are markers used for detecting hepatocellular carcinoma. In this study, a protein sequence similarity network (SSN) was generated and a subsequent evolutionary analysis was performed to understand the e...
Article
Full-text available
This paper aims to explore coevolution of emotional contagion and behavior for microblog sentiment analysis. Accordingly, a deep learning architecture (denoted as MSA-UITC) is proposed for the target microblog. Firstly, the coevolution of emotional contagion and behavior is described by the tie strength between microblogs, that is, with the spread...
Article
Full-text available
In many species, male lifespan is shorter than that of females, often attributed to sexual selection favouring costly expression of traits preferred by females. Coevolutionary models of female preferences and male traits predict that males can be selected to have such life histories; however, this typically requires that females also pay some costs...

Citations

... The methods employed could iteratively build such a policy from data. For the policy to be evaluated by on-policy algorithms, it must primarily be greedy regarding value function estimations [27]- [35]. ...
Article
Full-text available
Blackjack is a classic casino game in which the player attempts to outsmart the dealer by drawing a combination of cards whose face values add up to just under or equal to 21 while still exceeding the dealer's hand. This study considers a simplified variation of blackjack in which the dealer plays no active role after the first two draws. A different game regime is modeled for each of one to ten multiples of the conventional 52-card deck. Irrespective of the number of standard decks used, the game is played as a randomized discrete-time process. To determine the optimal course of action in terms of policy, we train an agent (a decision maker) to optimize over the decision space of the game, treating the process as a finite Markov decision chain. To choose the most effective course of action, we mainly investigate Monte Carlo-based reinforcement learning approaches and compare them with Q-learning, dynamic programming, and temporal difference learning. Framing the game as a reinforcement learning problem, this study presents the performance of the distinct model-free policy iteration techniques.
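As a rough illustration of the Monte Carlo family of methods the study investigates, first-visit Monte Carlo evaluation of a fixed policy on a toy blackjack-like process could be sketched as follows. The passive dealer, the uniform card values, and the "hit below 18" threshold policy are illustrative assumptions here, not the study's exact setup:

```python
import random

# First-visit Monte Carlo evaluation of a fixed "hit below 18" policy
# in a toy blackjack-like game: the player repeatedly draws cards worth
# 1-10 and busts above 21; the dealer plays no active role.
def play_episode(rng):
    total = rng.randint(2, 11)             # initial two-card hand (toy values)
    visited = []
    while total < 18:                      # fixed threshold policy: hit below 18
        visited.append(total)
        total += rng.randint(1, 10)
    reward = 1.0 if total <= 21 else -1.0  # busting loses, otherwise win
    return visited, reward

def mc_evaluate(episodes=10000, seed=0):
    rng = random.Random(seed)
    returns, counts = {}, {}
    for _ in range(episodes):
        visited, reward = play_episode(rng)
        for s in set(visited):             # first-visit: count each state once
            returns[s] = returns.get(s, 0.0) + reward
            counts[s] = counts.get(s, 0) + 1
    return {s: returns[s] / counts[s] for s in returns}

values = mc_evaluate()
```

Averaging full-episode returns per visited state like this is what distinguishes Monte Carlo evaluation from the bootstrapping updates of temporal difference learning compared in the study.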
... Still, it sounds reasonable to combine these approaches into a hybrid algorithm exploiting different characteristics of the search process performed by each method. In our previous works [36], [16] a method termed Coevolutionary Temporal Difference Learning (CTDL) was proposed and applied to learn WPC strategies. CTDL maintains a population of players and alternately performs temporal difference learning and coevolutionary learning. ...
... Moreover, CEL and CTDL (also: EL and ETDL) differ only in the way they search the weight space (weight mutation vs. TDL). Furthermore, where possible, the parameters for the above algorithms were taken directly from our previous research [36], [16] or related works [20], [24], [26]. In some cases the parameters were determined by preliminary experiments. ...
Article
Full-text available
This study investigates different methods of learning to play the game of Othello. The main questions posed concern scalability of algorithms with respect to the search space size and their capability to generalize and produce players that fare well against various opponents. The considered algorithms represent strategies as n-tuple networks, and employ self-play temporal difference learning (TDL), evolutionary and coevolutionary learning, and hybrids thereof. To assess the performance, three different measures are used: score against an a priori given opponent (a fixed heuristic strategy), against opponents trained by other methods (round-robin tournament), and against the top-ranked players from the online Othello League. We demonstrate that although evolutionary-based methods yield players that fare best against a fixed heuristic player, it is the coevolutionary temporal difference learning (CTDL), a hybrid of coevolution and TDL, that generalizes better and proves superior when confronted with a pool of previously unseen opponents. Moreover, CTDL scales well with the size of representation, attaining better results for larger n-tuple networks. By showing that a strategy learned in this way wins against the top entries from the Othello League, we conclude that it is one of the best 1-ply Othello players obtained to date without explicit use of human knowledge.
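An n-tuple network of the kind described above scores a position by summing look-up-table entries indexed by the contents of fixed sets of board locations. A minimal evaluation sketch, where the tuple layout and base-3 board encoding are illustrative assumptions:

```python
# Minimal n-tuple network evaluation sketch for an 8x8 Othello board
# encoded as a flat list with values 0 (empty), 1 (black), 2 (white).
# Each tuple is a fixed set of board indices; the cell contents of a
# tuple index into that tuple's weight look-up table (LUT).
def tuple_index(board, locations):
    idx = 0
    for loc in locations:
        idx = idx * 3 + board[loc]   # base-3 encoding of cell contents
    return idx

def evaluate(board, tuples, weights):
    """Sum the LUT entries selected by each tuple's cell contents."""
    return sum(weights[t][tuple_index(board, locs)]
               for t, locs in enumerate(tuples))

tuples = [(0, 1, 2), (0, 8, 16)]          # two illustrative 3-tuples
weights = [[0.0] * 3 ** 3 for _ in tuples]  # one LUT per tuple
board = [0] * 64                            # empty board
```

Scaling the representation, as the study does, amounts to using more and longer tuples, which grows the LUTs exponentially in tuple length.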
... Encouraged by those results, we wondered whether CTDL would prove beneficial also for other purposes, and decided to apply it to the more challenging game of smallboard Go. Preliminary results of these efforts were presented in the paper by Krawiec and Szubert (2010), which provides a complete account of this endeavor. ...
Article
Full-text available
Evolving small-board Go players using coevolutionary temporal difference learning with archives

We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique which interweaves two search processes that operate in the intra-game and inter-game mode. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution, according to which new strategies are generated and added to the sample. We analyze CTDL's sensitivity to all important parameters, including the trace decay constant that controls the lookahead horizon of TDL, and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects the search performance, and find out that the archived approach is superior to other techniques considered here and produces strategies that outperform a handcrafted weighted piece counter strategy and simple liberty-based heuristics. This encouraging result can be potentially generalized not only to other strategy representations used for small-board Go, but also to various games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge.
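The intra-game component described above can be illustrated with a gradient-descent TD(0) update for a weighted piece counter, where the evaluation is a squashed dot product of weights and board contents. The tanh squashing and learning rate below are common choices in the WPC literature, assumed here for illustration rather than taken from the paper:

```python
import math

# Gradient-descent TD(0) update for a weighted piece counter (WPC).
# The board is a vector x with one entry per intersection (e.g. +1 for
# black, -1 for white, 0 for empty); the evaluation is tanh(w . x).
def wpc_value(weights, board):
    s = sum(w * x for w, x in zip(weights, board))
    return math.tanh(s)

def td_update(weights, board, next_value, alpha=0.01):
    """Move the evaluation of `board` toward the value of the next state."""
    v = wpc_value(weights, board)
    delta = next_value - v      # TD error between consecutive game states
    grad = 1.0 - v * v          # derivative of tanh at the current sum
    for i, x in enumerate(board):
        weights[i] += alpha * delta * grad * x
    return delta
```

Applied along a self-play game trajectory, each update nudges the evaluation of a state toward the evaluation of its successor, which is the "difference between values for consecutively visited game states" the abstract refers to.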
... Still, it sounds reasonable to combine these approaches into a hybrid algorithm exploiting different characteristics of the search process performed by each method. In [23] and [10] a method termed Coevolutionary Temporal Difference Learning (CTDL) was proposed and applied to learn WPC strategies. CTDL maintains a population of players and alternately performs TD learning and coevolutionary learning. ...
Conference Paper
Full-text available
We propose Coevolutionary Gradient Search, a blueprint for a family of iterative learning algorithms that combine elements of local search and population-based search. The approach is applied to learning Othello strategies represented as n-tuple networks, using different search operators and modes of learning. We focus on the interplay between the continuous, directed, gradient-based search in the space of weights, and fitness-driven, combinatorial, coevolutionary search in the space of entire n-tuple networks. In an extensive experiment, we assess both the objective and relative performance of algorithms, concluding that the hybridization of search techniques improves the convergence. The best algorithms not only learn faster than constituent methods alone, but also produce top ranked strategies in the online Othello League.
... To this aim, we propose a hybrid method referred to as Coevolutionary Temporal Difference Learning (CTDL) that works by interlacing global search provided by competitive coevolution and local search by means of temporal difference learning. In our previous research we have already evaluated this method on the board games of Othello [3] and small-board Go [4]. Here, we perform further investigation on CTDL in the context of Othello, a domain, where both CEL and TDL were independently tested and compared [2]. ...
Article
Full-text available
Hybridization of global and local search techniques has already produced promising results in the fields of optimization and machine learning. It is commonly presumed that approaches employing this idea, like memetic algorithms that combine evolutionary algorithms with local search, benefit from complementary characteristics of constituent methods and maintain the right balance between exploration and exploitation of the search space. While such extensions of evolutionary algorithms have been intensively studied, hybrids of local search with coevolutionary algorithms have not received much attention yet. In this paper we attempt to fill this gap by presenting Coevolutionary Temporal Difference Learning (CTDL) that works by interlacing global search provided by competitive coevolution and local search by means of temporal difference learning. We verify CTDL by applying it to the board game of Othello, where it learns board evaluation functions represented by a linear architecture of weighted piece counter. The results of a computational experiment show CTDL’s superiority when compared to coevolutionary algorithm and temporal difference learning alone, both in terms of performance of elaborated strategies and computational cost. In order to further exploit CTDL’s potential, we also extend it by an archive that keeps track of selected well-performing solutions found so far and uses them to improve search convergence. The overall conclusion is that the fusion of various forms of coevolution with a gradient-based local search can be highly beneficial and deserves further research.
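The interlacing of global and local search that CTDL performs can be sketched schematically as a loop that alternates a TDL pass over each individual with one generation of competitive coevolution. The round-robin fitness, truncation selection, and Gaussian weight mutation below are illustrative assumptions, not necessarily the paper's exact operators:

```python
import random

# Schematic CTDL main loop: alternate TD-based learning on each individual
# (local, gradient-driven search in weight space) with one generation of
# competitive coevolution (global, fitness-driven search over strategies).
def ctdl(population, td_learn, play, generations=50, rng=None):
    rng = rng or random.Random(0)
    for _ in range(generations):
        for individual in population:          # local search phase: TDL
            td_learn(individual)
        fitness = {id(p): 0 for p in population}
        for a in population:                   # global phase: round-robin games
            for b in population:
                if a is not b and play(a, b):  # play() -> True if a beats b
                    fitness[id(a)] += 1
        population.sort(key=lambda p: fitness[id(p)], reverse=True)
        elite = population[: len(population) // 2]
        population[:] = elite + [
            [w + rng.gauss(0, 0.1) for w in parent]  # mutated elite copies
            for parent in elite
        ]
    return population
```

The archive extension mentioned in the abstract would additionally store selected well-performing individuals across generations and reuse them as opponents; it is omitted here to keep the skeleton minimal.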