Fig 6. Learning curve for BreakAway with bad user advice.

Source publication
Conference Paper
Full-text available
We describe a reinforcement learning system that transfers skills from a previously learned source task to a related target task. The system uses inductive logic programming to analyze experience in the source task, and transfers rules for when to take actions. The target task learner accepts these rules through an advice-taking algorithm, which al...

Context in source publication

Context 1
... its inequalities reversed, this bad advice instructs the learner to pass backwards, shoot when far away from the goal and at a narrow angle, and move when close to the goal. Figure 6 shows the results of this advice, both in AI 2 transfer and alone. This experiment shows that while bad advice can decrease the positive effect of transfer, it does not cause the AI 2 system to impact learning negatively. ...
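
To make the reversed-inequality advice concrete, here is a minimal sketch of what such "bad advice" rules could look like when written out as simple predicates. The feature names (dist_to_goal, angle_to_goal, teammate_behind) and thresholds are illustrative assumptions; the actual advice in the paper is expressed as logical rules consumed by an advice-taking RL learner, not Python code.

```python
# Illustrative sketch only: feature names and thresholds are hypothetical,
# and the paper's advice is given as first-order rules, not Python predicates.

def bad_advice(state):
    """Return the action suggested by the (deliberately bad) advice, or None."""
    if state["teammate_behind"]:                              # pass backwards
        return "pass_to_teammate_behind"
    if state["dist_to_goal"] > 20 and state["angle_to_goal"] < 15:
        return "shoot"                                        # shoot when far away, narrow angle
    if state["dist_to_goal"] < 10:
        return "move_ahead"                                    # move when already close to the goal
    return None                                                # no advice applies; learner acts normally

# Hypothetical state encoding, just to show the rule firing:
state = {"teammate_behind": False, "dist_to_goal": 25.0, "angle_to_goal": 10.0}
print(bad_advice(state))  # -> "shoot"
```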

Similar publications

Conference Paper
Full-text available
We describe an application of inductive logic programming to transfer learning. Transfer learning is the use of knowledge learned in a source task to improve learning in a related target task. The tasks we work with are in reinforcement learning domains. Our approach transfers relational macros, which are finite-state machines in which the transition...
Conference Paper
Full-text available
Repeatability of head-related transfer function (HRTF) measurements is a critical issue in intra- and inter-laboratory setups. In this paper, simulated perceptual variabilities of HRTFs are computed as an attempt to understand if different acquisition methods achieve similar results in terms of psychoacoustic features. We consider 12 HRTF indepen...
Conference Paper
Full-text available
Transfer learning could speed up reinforcement learning in many applications. Toward the fully autonomous reinforcement learning transfer agent, the mapping between the source task and target task should be learned instead of human-designed. To this end, this paper proposes an autonomous inter-task mapping learning method via artificial neural netw...
Article
Full-text available
The aim of this study was to investigate if making the skill acquisition phase more difficult or easier would enhance performance in soccer juggling, and if this practice has a positive inter-task transfer effect to ball reception performance. Twenty-two adolescent soccer players were tested in juggling a soccer ball and in the control of an approa...
Article
Full-text available
This report is an overview of our work on transfer in reinforcement learning using advice-taking mechanisms. The goal in transfer learning is to speed up learning in a target task by transferring knowledge from a related, previously learned source task. Our methods are designed to do so robustly, so that positive transfer will speed up learning b...

Citations

... This field has continued to receive attention over the years, including early attempts at leveraging action advising for knowledge transfer between simple RoboCup tasks (Torrey et al., 2005). Subsequent work (Torrey et al., 2006; 2010) leveraged inductive logic programming for similar purposes in order to accommodate imperfect advice; however, these methods were heavily constrained by the rigid incorporation of domain knowledge. ...
Preprint
Transfer learning can be applied in deep reinforcement learning to accelerate the training of a policy in a target task by transferring knowledge from a policy learned in a related source task. This is commonly achieved by copying pretrained weights from the source policy to the target policy prior to training, under the constraint that they use the same model architecture. However, not only does this require a robust representation learned over a wide distribution of states -- often failing to transfer between specialist models trained over single tasks -- but it is largely uninterpretable and provides little indication of what knowledge is transferred. In this work, we propose an alternative approach to transfer learning between tasks based on action advising, in which a teacher trained in a source task actively guides a student's exploration in a target task. Through introspection, the teacher is capable of identifying when advice is beneficial to the student and should be given, and when it is not. Our approach allows knowledge transfer between policies agnostic of the underlying representations, and we empirically show that this leads to improved convergence rates in Gridworld and Atari environments while providing insight into what knowledge is transferred.
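
The abstract above describes a teacher-student advising loop in which the teacher only intervenes when it judges its advice useful. The sketch below illustrates that general shape, assuming the teacher exposes some confidence estimate for the current state; the class names, the confidence criterion, and the threshold are placeholders, not the preprint's actual introspection mechanism.

```python
import random

# Minimal sketch of introspective action advising. The Teacher/Student interfaces
# and the confidence threshold are assumptions made for illustration.

class Teacher:
    def confidence(self, state):
        # Placeholder: e.g. margin between the top-2 Q-values of the source policy.
        return random.random()

    def advise(self, state):
        return 0  # placeholder action from the source policy

class Student:
    def act(self, state):
        return random.randrange(4)  # placeholder exploratory action

    def update(self, state, action, reward, next_state):
        pass  # e.g. a Q-learning or DQN update

def advised_step(env_step, state, teacher, student, threshold=0.8):
    """Take one environment step, following the teacher only when it is confident."""
    if teacher.confidence(state) >= threshold:
        action = teacher.advise(state)      # advice is given
    else:
        action = student.act(state)         # student explores on its own
    next_state, reward = env_step(state, action)
    student.update(state, action, reward, next_state)
    return next_state
```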
... A significant amount of importance is given to developing deep networks that are able to learn strong heuristics (Ernandes and Gori 2004) and policies (Torrey et al. 2006). This is particularly common in Reinforcement Learning, which uses positive and negative feedback to learn the correct sequence of next actions. ...
Preprint
Full-text available
Learning a well-informed heuristic function for hard task planning domains is an elusive problem. Although there are known neural network architectures to represent such heuristic knowledge, it is not obvious what concrete information is learned and whether techniques aimed at understanding the structure help in improving the quality of the heuristics. This paper presents a network model to learn a heuristic capable of relating distant parts of the state space via optimal plan imitation using the attention mechanism, which drastically improves the learning of a good heuristic function. To counter the limitation of the method in the creation of problems of increasing difficulty, we demonstrate the use of curriculum learning, where newly solved problem instances are added to the training set, which, in turn, helps to solve problems of higher complexities and far exceeds the performances of all existing baselines including classical planning heuristics. We demonstrate its effectiveness for grid-type PDDL domains.
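
The curriculum idea described in this abstract (newly solved problem instances are fed back into the training set so harder instances become solvable) can be summarized with a short schematic loop. The function names below are placeholders and the sketch does not reproduce the paper's network or planner.

```python
# Schematic sketch of the curriculum-learning loop described above: instances the
# current heuristic manages to solve are added to the training set, the model is
# retrained, and harder instances become reachable. All names are placeholders.

def curriculum_train(model, instances, train, try_solve, rounds=10):
    training_set = []
    unsolved = list(instances)            # roughly ordered by increasing difficulty
    for _ in range(rounds):
        newly_solved = []
        for problem in unsolved:
            plan = try_solve(model, problem)      # search guided by the current heuristic
            if plan is not None:
                newly_solved.append((problem, plan))
        if not newly_solved:
            break                                  # no progress; stop the curriculum
        training_set.extend(newly_solved)          # imitate the newly found plans
        model = train(model, training_set)
        solved_set = {prob for prob, _ in newly_solved}
        unsolved = [p for p in unsolved if p not in solved_set]
    return model
```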
... For example, if the agent were to control an airplane, changes in gravity, weight, or friction would count as transition dynamic differences. The second to fourth places go to navigation-related tasks, namely changes in the goal (95) [s_f] or start (91) [s_i] position, or the level layout (56). Values that the agent receives are also well suited to accelerate the transition. ...
... We also looked at what kind of problem TRL is mostly applied to. The most often recurring applications are navigation (122), robotics (56), classic control (42), and games (26). The navigation domain is the most diverse. ...
... Many of the experiments focus on increasing the number of players involved from 3v2 to 4v3 [54] or 3v2 to 6v5 [55] in KeepAway. Others explore multi-task experiments such as MoveDownfield to BreakAway [56]. ...
Preprint
Full-text available
The idea of transfer in reinforcement learning (TRL) is intriguing: being able to transfer knowledge from one problem to another problem without learning everything from scratch. This promises quicker learning and learning more complex methods. To gain an insight into the field and to detect emerging trends, we performed a database search. We note a surprisingly late adoption of deep learning that starts in 2018. The introduction of deep learning has not yet solved the greatest challenge of TRL: generalization. Transfer between different domains works well when domains have strong similarities (e.g. MountainCar to Cartpole), and most TRL publications focus on different tasks within the same domain that have few differences. Most TRL applications we encountered compare their improvements against self-defined baselines, and the field is still missing unified benchmarks. We consider this to be a disappointing situation. For the future, we note that: (1) A clear measure of task similarity is needed. (2) Generalization needs to improve. Promising approaches merge deep learning with planning via MCTS or introduce memory through LSTMs. (3) The lack of benchmarking tools will be remedied to enable meaningful comparison and measure progress. Already Alchemy and Meta-World are emerging as interesting benchmark suites. We note that another development, the increase in procedural content generation (PCG), can improve both benchmarking and generalization in TRL.
... In the second algorithm, called PLPR, the Policy Library is created when learning new policies and reusing past policies. Torrey et al. (2006) introduced inductive logic programming for analyzing the previous experience of the source task and transferring rules for when to take actions. Through an advice-taking algorithm, the target task learner could benefit from outside imperfect guidance. ...
Article
Full-text available
Collision avoidance for robots and vehicles in unpredictable environments is a challenging task. Various control strategies have been developed for the agent (i.e., robots or vehicles) to sense the environment, assess the situation, and select the optimal actions to avoid collision and accomplish its mission. In our research on autonomous ships, we take a machine learning approach to collision avoidance. The lack of available ship steering data of human ship masters has made it necessary to acquire collision avoidance knowledge through reinforcement learning (RL). Given that the learned neural network tends to be a black box, it is desirable that a method is available which can be used to design an agent's behavior so that the desired knowledge can be captured. Furthermore, RL with complex tasks can be either time consuming or unfeasible. A multi-stage learning method is needed in which agents can learn from simple tasks and then transfer their learned knowledge to closely related but more complex tasks. In this paper, we explore the ways of designing agent behaviors through tuning reward functions and devise a transfer RL method for multi-stage knowledge acquisition. The computer simulation-based agent training results have shown that it is important to understand the roles of each component in a reward function and the various design parameters in transfer RL. The settings of these parameters are all dependent on the complexity of the tasks and the similarities between them.
... However, this requires extensive prior knowledge about the task and agents to generate the appropriate mappings. Since there is a cost of human involvement to generate the mappings for each application, transfer learning is most applicable for tasks and agents that are not frequently subject to change, such as RoboCup competitions where it is commonly applied [22]. ...
... Equation (21) reaches a maximum of 1 only when the preference levels are equal, is zero when either agent has zero preference across all actions, and can be negative when the preference levels oppose each other. The similarity measure is updated as an exponential moving average each time the adviser is polled for its advice, given by (22), where ρ is a constant decay rate. During each advice round, advisers are then selected in the order of most to least similar: ...
... As all robots are homogeneous, the advice mechanism must evaluate advisers based on their skill at the task. The relevance of each adviser is indicated by the similarity measure in (22). The appropriate behaviour of the mechanism is expected to attribute a greater similarity to the experts with more experience as time proceeds. ...
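
Since equations (21) and (22) are not reproduced in these excerpts, the following is only a sketch of their general shape: an exponential-moving-average update of the similarity measure with a constant decay rate ρ, followed by ordering advisers from most to least similar. The instantaneous similarity value and the exact placement of ρ in the update are assumptions.

```python
# Sketch of the EMA similarity update and adviser ordering described above.
# `instant_similarity` stands in for equation (21); the update only mirrors
# the general exponential-moving-average form of equation (22).

def update_similarity(prev_similarity, instant_similarity, rho):
    """EMA update with a constant decay rate rho in (0, 1) (assumed convention)."""
    return (1.0 - rho) * prev_similarity + rho * instant_similarity

def rank_advisers(similarities):
    """Return adviser ids ordered from most to least similar."""
    return sorted(similarities, key=similarities.get, reverse=True)

# Example: adviser 'B' is polled and its similarity is refreshed.
sims = {"A": 0.4, "B": 0.1, "C": -0.2}
sims["B"] = update_similarity(sims["B"], instant_similarity=0.9, rho=0.3)  # -> 0.34
print(rank_advisers(sims))  # -> ['A', 'B', 'C']
```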
... Each RoboCup-2D game pits two teams against each other in a game of soccer on a 2-D virtual soccer field, with game mechanics of the agent participants and the game controlled and communicated through a host server [3]. Reinforcement-based machine learning has shown promise in developing strategies, tactics, and set-play policies for autonomous teams playing in RoboCup-2D that could scale up the domain and transfer to larger teams and playing field sizes [5], [19], [20]. Verbancsics has further demonstrated the advantages of exploiting and encoding domain spatial regularities that compress the representation of the learned task Keepaway using Hypercube-based NeuroEvolution of Augmenting Topologies (HyperNEAT) [7], [13], [22]. ...
... In the second algorithm, called PLPR, the policy library is created when learning new policies and reusing past policies. Torrey (2006) introduced inductive logic programming for analyzing the previous experience of the source task and transferring rules for when to take actions. Through an advice-taking algorithm, the target task learner could benefit from outside imperfect guidance. ...
Chapter
Full-text available
It is often hard for a reinforcement learning (RL) agent to utilize previous experience to solve new, similar but more complex tasks. In this research, we combine transfer learning with reinforcement learning and investigate how the hyperparameters of both transfer learning and reinforcement learning impact the learning effectiveness and task performance in the context of autonomous robotic collision avoidance. A deep reinforcement learning algorithm was first implemented for a robot to learn, from its experience, how to avoid randomly generated single obstacles. After that, the effect of transferring previously learned experience was studied by introducing two important concepts: transfer belief, i.e., how much a robot should believe in its previous experience, and transfer period, i.e., how long the previous experience should be applied in the new context. The proposed approach has been tested for collision avoidance problems by altering the transfer period. It is shown that transfer learning on average gave a ~50% speed increase at ~30% competence levels, and there exists an optimal transfer period where the variance is the lowest and learning speed is the fastest.
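
To illustrate how the two hyperparameters above could enter an agent's decision rule, here is a rough sketch that blends source and target Q-values, weighting the source by the transfer belief and only during the transfer period. This blending rule is an assumed formulation for illustration, not the chapter's exact method.

```python
import numpy as np

# Rough sketch: blend the source policy's Q-values with the target learner's,
# weighted by the transfer belief and applied only during the transfer period.
# The blending rule itself is an assumption made for illustration.

def select_action(q_target, q_source, step, transfer_belief=0.5, transfer_period=10_000):
    """Pick an action from a belief-weighted mixture of target and source Q-values."""
    if step < transfer_period:
        q_mix = transfer_belief * q_source + (1.0 - transfer_belief) * q_target
    else:
        q_mix = q_target                 # past the transfer period: trust only the new task
    return int(np.argmax(q_mix))

q_target = np.array([0.1, 0.3, 0.2])
q_source = np.array([0.9, 0.0, 0.1])
print(select_action(q_target, q_source, step=100))     # early: source experience still steers the choice
print(select_action(q_target, q_source, step=20_000))  # later: purely the target estimates
```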
... In [41], the authors have addressed this problem using a transfer hierarchy. Approaches such as a humanprovided mapping [45] and a statistical relational model [44] to assess similarities between a source and a target task have also been proposed to mitigate negative transfer. Other techniques to learn source to target task mappings efficiently include an experience-based learning framework called MASTER [42] and an experts algorithm which is used to select a candidate policy for solving an unknown Markov Decision Process task [40]. ...
Article
Full-text available
In this paper we propose a machine learning technique for real-time robot path planning for an autonomous robot in a planar environment with obstacles where the robot possesses no a priori map of its environment. Our main insight in this paper is that a robot’s path planning times can be significantly reduced if it can refer to previous maneuvers it used to avoid obstacles during earlier missions, and adapt that information to avoid obstacles during its current navigation. We propose an online path planning algorithm called LearnerRRT that utilizes a pattern matching technique called Sample Consensus Initial Alignment (SAC-IA) in combination with an experience-based learning technique to adapt obstacle boundary patterns encountered in previous environments to the current scenario, followed by corresponding adaptations in the obstacle-avoidance paths. Our proposed algorithm LearnerRRT works as a learning-based reactive path planning technique which enables robots to improve their overall path planning performance by locally improving maneuvers around commonly encountered obstacle patterns by accessing previously accumulated environmental information. We have conducted several experiments in simulations and hardware to verify the performance of the LearnerRRT algorithm and compared it with a state-of-the-art sampling-based planner. LearnerRRT on average takes approximately 10% of the planning time and 14% of the total time taken by the sampling-based planner to solve the same navigation task based on simulation results, and takes only 33% of the planning time, 46% of the total time and 95% of the total distance compared to the sampling-based planner based on our hardware results.
... The workflow we have followed for our number plate detection is a transfer learning [17] approach, which is quite popular in deep learning applications [23,24]. The pre-trained CNN has already learned image features that can be transferred by fine-tuning the network to our number plate detection task. ...
Conference Paper
Full-text available
In this paper, we propose a method to detect number plates of vehicles registered in Bangladesh. Our approach was to pre-train a deep Convolutional Neural Network with CIFAR-10 data, then fine-tune the network by training it further with our dataset to create the Regions with Convolutional Neural Network (R-CNN) object detector. For training the R-CNN, Region of Interest (ROI) labelled data was required. We have observed that using training data with a bigger ROI, which encapsulates the entire number plate within it, enables the R-CNN to detect number plates more accurately. The proposed method can detect number plates with more than 99% accuracy.
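
As a generic illustration of the pretrain-then-fine-tune workflow referred to above, the sketch below freezes an ImageNet-pretrained torchvision backbone and trains a fresh classification head on target data. This is a standard fine-tuning pattern, not the paper's CIFAR-10-pretrained network or its R-CNN detector; the class count and optimizer settings are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Generic fine-tuning sketch in the spirit of the approach above: reuse learned
# image features and retrain only a new head on the target (number plate) data.
# Not the paper's pipeline; backbone choice and hyperparameters are illustrative.

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                     # freeze the transferred features

num_classes = 2                                     # e.g. plate vs. background (assumed)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new head, trained from scratch

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a batch from the target dataset."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```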
... The essence of TL is that generalization of knowledge can come about across tasks (Taylor and Stone, 2009); that is to say, the knowledge acquired in previous work is beneficial for subsequent learning. It is an excellent idea to integrate RL and TL, but not until recent years has transfer in RL domains received a great deal of attention (Konidaris et al., 2012; Lazaric et al., 2008; Taylor and Stone, 2009; Torrey et al., 2006). In order to devise outstanding transfer methods in RL domains, three points should be considered (Brys et al., 2015; Taylor and Stone, 2009): first and foremost, when to transfer, that is, the selection of source and target task; secondly, what to transfer, that is, the form of knowledge obtained in the previous task; and last but not least, how to transfer, that is, the way in which the agent reuses the obtained knowledge. ...
... All in all, it is feasible and meaningful to conduct research on transferring knowledge from demonstration trajectories to RL in different tasks, as we propose in this paper, and as far as we know, similar research is scarce (Guofang et al., 2015). As for what to transfer, the value function (Taylor and Stone, 2009), optimal policy (Fernández and Veloso, 2006), option (Konidaris et al., 2012), skill (Torrey et al., 2006) and so on have been regularly chosen as the acquired knowledge in previous relevant research, but in LfD, it is difficult to form knowledge for transfer. In this paper, two forms, namely the k-nearest neighbour (k-NN) of the current state in the source samples (Harrington, 2012) and the visit frequency of homologous states (McGovern and Barto, 2001), are adopted as knowledge extracted from demonstration trajectories for transfer. ...
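
The first form of transferred knowledge mentioned above, the k-nearest neighbours of the current state among demonstration samples, can be sketched with a standard nearest-neighbour lookup. How the retrieved neighbours are then used (here, a simple majority vote over demonstrated actions) is an assumption for illustration; the cited work defines its own transfer mechanism.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Sketch of k-NN lookup over demonstration states. The majority-vote use of the
# neighbours is an assumption; the cited work defines its own transfer mechanism.

demo_states = np.random.rand(500, 4)         # states visited in demonstration trajectories
demo_actions = np.random.randint(0, 3, 500)  # actions taken in those states

knn = NearestNeighbors(n_neighbors=5).fit(demo_states)

def suggest_action(current_state):
    """Majority-vote action among the k nearest demonstration states."""
    _, idx = knn.kneighbors(current_state.reshape(1, -1))
    votes = demo_actions[idx[0]]
    return int(np.bincount(votes).argmax())

print(suggest_action(np.random.rand(4)))
```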