Figure 1 - uploaded by Gabriel Victor de la Cruz
Atari 2600 game screenshots of Pong, Freeway, and Beamrider, from left to right.

Source publication
Poster
Full-text available
Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using a deep neural network as its function approximator and by learning directly from raw images. A drawback of using raw images is that deep RL must learn the state feature representation from the raw images in addition to learning a policy. As...

Context in source publication

Context 1
... test our approach in three Atari games: Pong, Freeway, and Beamrider, as shown in Figure 1. The games have 6, 3, and 9 actions, respectively. ...

Similar publications

Article
Full-text available
Current work is limited in using GDL to represent domain knowledge for Automated Negotiations, because GDL does not support imperfect-information games in negotiation scenarios. In this paper, we extend GDL and improve the automated negotiation model so that the framework can describe negotiation scenarios with imperfect information, and e...

Citations

... The latter does not take into account when or how often the advice is used. [Comparison table of the cited transfer-learning approaches, listing the knowledge transferred (actions, Q-values, policies, or RL-tuples) and whether it is fixed or dynamic; EF-OnTL transfers RL-tuples.] Our work differs from the above as it replaces the need for an expert agent by dynamically selecting a learning agent as teacher, to exploit the different knowledge gained in different parts of a multi-agent system. Furthermore, while most of the related work continuously overrides the target policy by advising actions, Q-values, or policies, EF-OnTL transfers an experience batch less frequently, designed around the gap between source and target beliefs. ...
Preprint
Full-text available
Transfer learning in Reinforcement Learning (RL) has been widely studied to overcome training issues of Deep-RL, i.e., exploration cost, data availability and convergence time, by introducing a way to enhance the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for a novice agent, a good understanding of the task by the expert agent is required for such transfer to be effective. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems. No dedicated expert exists, and the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from the RL agent-environment interaction. We demonstrate EF-OnTL's effectiveness against a no-transfer scenario and advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall comparable performance when compared against advice-based baselines, while not requiring any external input or threshold tuning. EF-OnTL outperforms no-transfer with an improvement related to the complexity of the task addressed.
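
To make the sars-RND idea above concrete, here is a minimal sketch of random-network-distillation uncertainty computed over full (state, action, reward, next-state) tuples; the architecture, dimensions, and training loop are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of RND-style uncertainty over (s, a, r, s') tuples, in the
# spirit of sars-RND; architecture, dimensions and training loop are assumptions.
import torch
import torch.nn as nn

def make_net(in_dim, out_dim=64):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

class SarsRND(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        in_dim = state_dim + action_dim + 1 + state_dim  # s, a, r, s'
        self.target = make_net(in_dim)      # fixed, randomly initialised network
        self.predictor = make_net(in_dim)   # trained to match the target
        for p in self.target.parameters():
            p.requires_grad = False

    def uncertainty(self, s, a, r, s_next):
        x = torch.cat([s, a, r, s_next], dim=-1)
        # Prediction error is large for rarely seen transitions -> uncertainty signal.
        return (self.predictor(x) - self.target(x)).pow(2).mean(dim=-1)

# Minimise the prediction error on visited transitions (toy batch shown here).
model = SarsRND(state_dim=8, action_dim=2)
opt = torch.optim.Adam(model.predictor.parameters(), lr=1e-4)
s, a = torch.randn(32, 8), torch.randn(32, 2)
r, s_next = torch.randn(32, 1), torch.randn(32, 8)
loss = model.uncertainty(s, a, r, s_next).mean()
opt.zero_grad()
loss.backward()
opt.step()
```
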
... Early techniques of RLfD typically extract prior knowledge by pre-training expert policies with supervised learning methods [15,16,17,18,19] or by pushing them into a memory replay buffer for value evaluation [20]. Once an expert policy is trained, it is used in the subsequent RL as is, without differentiating how helpful it is in assisting the agent to learn or adapting how it can assist in different states. ...
Preprint
Full-text available
The current work on reinforcement learning (RL) from demonstrations often assumes the demonstrations are samples from an optimal policy, an unrealistic assumption in practice. When demonstrations are generated by sub-optimal policies or have sparse state-action pairs, a policy learned from sub-optimal demonstrations may mislead an agent with incorrect or non-local action decisions. We propose a new method called Local Ensemble and Reparameterization with Split and Merge of expert policies (LEARN-SAM) to improve efficiency and make better use of sub-optimal demonstrations. First, LEARN-SAM employs a new concept, the lambda-function, based on a discrepancy measure between the current state and demonstrated states, to "localize" the weights of the expert policies during learning. Second, LEARN-SAM employs a split-and-merge (SAM) mechanism that separates the helpful parts of each expert demonstration and regroups them into new expert policies, so that the demonstrations are used selectively. Both the lambda-function and the SAM mechanism help boost the learning speed. Theoretically, we prove the invariance property of the reparameterized policy before and after the SAM mechanism, providing theoretical guarantees for the convergence of the employed policy gradient method. We demonstrate the superiority of the LEARN-SAM method and its robustness with varying demonstration quality and sparsity in six experiments on complex continuous control problems of low to high dimensions, compared to existing methods for RL from demonstration.
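
As a rough illustration of the lambda-function idea, the sketch below weights expert advice by the distance from the current state to the nearest demonstrated state; the Gaussian kernel and the bandwidth `sigma` are assumptions for illustration, not the paper's exact definition.

```python
# Sketch of a state-locality weight for expert policies, in the spirit of the
# lambda-function: the weight decays with distance to the nearest demonstrated
# state. The Gaussian kernel and bandwidth `sigma` are illustrative assumptions.
import numpy as np

def locality_weight(state, demo_states, sigma=1.0):
    d2 = np.min(np.sum((demo_states - state) ** 2, axis=1))  # squared distance to nearest demo state
    return np.exp(-d2 / (2.0 * sigma ** 2))                  # ~1 near demonstrations, ~0 far away

demo_states = np.random.randn(100, 4)    # demonstrated states (toy data)
state = np.random.randn(4)               # current state
w = locality_weight(state, demo_states)  # scales the expert policy's influence at this state
```
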
... The RL fine-tuning phase extends the NPG gradient to optimize by following g_aug = Σ_{(s,a)∈ρ_π} ∇_θ ln π_θ(a|s) A^π(s,a) + Σ_{(s,a)∈ρ_D} ∇_θ ln π_θ(a|s) w(s,a), where ρ_π is the trajectory obtained by following policy π_θ, ρ_D is the demonstration data, and w(s,a) is the weighting function w(s,a) = λ_0 λ_1^k max_{(s',a')∈ρ_π} A^π(s',a'), where λ_0 and λ_1 are hyper-parameters, and k is the iteration counter. Cruz et al. (2018) used a multiclass-classification deep neural network trained from human demonstrations in the pre-training phase; the neural network implicitly learns the underlying features by only using the cross-entropy loss function. The network uses raw images of the domain as input, and it has a single output for each valid action. ...
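
A minimal sketch of the pre-training step described above, assuming a standard Atari-style convolutional network and preprocessing (84x84 frames, 4-frame stack) rather than the cited work's exact architecture, might look like this:

```python
# Sketch: pre-train a policy network on human demonstrations with cross-entropy.
# Architecture, frame preprocessing (84x84, 4 stacked frames) and optimiser are assumptions.
import torch
import torch.nn as nn

n_actions = 6  # e.g. Pong

policy = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
    nn.Linear(512, n_actions),                 # one output per valid action
)

opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

frames = torch.rand(32, 4, 84, 84)             # demonstration frames (toy batch)
actions = torch.randint(0, n_actions, (32,))   # demonstrator's actions as labels

logits = policy(frames)
loss = loss_fn(logits, actions)                # cross-entropy implicitly learns the features
opt.zero_grad()
loss.backward()
opt.step()
# The learned weights then initialise the network used by the deep RL agent.
```
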
... Human-guided RL is not limited to approaches like IRL, Batch RL, or RLED. Other approaches can provide information to guide the RL process, as mentioned by Taylor (2018). Other forms of human-in-the-loop guidance can also be used for RL, where the knowledge is made available by a human during the online learning process and can be seen as feedback. ...
Article
Full-text available
Reinforcement learning from expert demonstrations (RLED) is the intersection of imitation learning with reinforcement learning that seeks to take advantage of both learning approaches. RLED uses demonstration trajectories to improve sample efficiency in high-dimensional spaces. RLED is a promising new approach to behavioral learning through demonstrations from an expert teacher. RLED considers two possible knowledge sources to guide the reinforcement learning process: prior knowledge and online knowledge. This survey focuses on novel methods for model-free reinforcement learning guided through demonstrations, commonly but not necessarily provided by humans. The methods are analyzed and classified according to the impact of the demonstrations. Challenges, applications, and promising directions for improving the discussed methods are also presented.
... Computational tools, architectures, and implementations, as well as optimization techniques, are improving rapidly. For example, pretrained models [42][43][44] (e.g., AlexNet-T [45], GoogLeNet-T [46]) perform measurably better than untrained models (e.g., AlexNet-U [47]). This has been successful in developing AI classifiers for the adult retina, most prominently for diabetic retinopathy. ...
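
As a hedged illustration of the pretrained-versus-untrained comparison, the sketch below builds an ImageNet-pretrained AlexNet and a randomly initialised one for fine-tuning on a small classification task; the class count and the torchvision version (0.13 or later for the `weights` argument) are assumptions.

```python
# Sketch: build an ImageNet-pretrained AlexNet ("AlexNet-T"-style) and an untrained
# one ("AlexNet-U"-style) for fine-tuning on a small classification task.
# Requires torchvision >= 0.13 for the `weights` argument; the class count is illustrative.
import torch.nn as nn
from torchvision import models

def build_classifier(pretrained: bool, num_classes: int = 2):
    weights = models.AlexNet_Weights.DEFAULT if pretrained else None
    net = models.alexnet(weights=weights)
    net.classifier[6] = nn.Linear(4096, num_classes)  # replace the final ImageNet layer
    return net

model_t = build_classifier(pretrained=True)   # transferred (pretrained) features
model_u = build_classifier(pretrained=False)  # randomly initialised baseline
```
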
Article
Artificial intelligence (AI) applications are diverse and serve varied functions in clinical practice. The most successful products today are clinical decision tools used by physicians, but autonomous AI is gaining traction. Widespread use of AI is limited in part because of concerns about bias, fault-tolerance, and specificity. Adoption of AI often depends on removing cost and complexity in clinical workflow integration, providing clear incentives for use, and providing clear demonstration of clinical outcome. Existing wide-angle photographic screening could be integrated into the clinical workflow based on prior implementations for premature babies and linked with AI interpretation with existing technology. Incidence of retinal abnormality, clinical considerations, AI performance, grading variation for AI-augmented human grading, and cost and policy aspects play a significant role. Improved outcomes for newborns and a relatively high estimated incidence of abnormality have been named as benefits that counterbalance costs in the long term. [Ophthalmic Surg Lasers Imaging Retina. 2021;52:S17–S22.]
... [6] and [7] propose to use demonstrations to initialize the replay buffer so that the training of a deep Q-network ([6]) or a deep deterministic policy gradient (DDPG) network ([7]) can benefit. [12], [9], [13] propose to directly use demonstrations to initialize neural network parameters by pretraining them with an imitation learning objective. Besides pretraining a DRL model, imitation learning can also be used to construct an auxiliary imitation learning loss ([7], [14]). ...
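
A generic sketch of the replay-buffer initialisation idea, assuming a simple buffer layout and toy demonstration data rather than the cited implementations, is shown below.

```python
# Sketch: seed a DQN/DDPG-style replay buffer with demonstration transitions before
# RL training starts; the buffer layout and the toy demonstration data are assumptions.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

# Toy placeholder for recorded demonstrations (state, action, reward, next_state, done).
demo_transitions = [((0.0, 0.0), 1, 0.5, (0.1, 0.0), False) for _ in range(1000)]

buffer = ReplayBuffer()
for transition in demo_transitions:        # 1) pre-fill the buffer with demonstrations
    buffer.add(*transition)

batch = buffer.sample(32)                  # 2) RL training then samples minibatches that
                                           #    mix demonstration and self-generated data
```
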
Preprint
Full-text available
Our work aims at efficiently leveraging ambiguous demonstrations for the training of a reinforcement learning (RL) agent. An ambiguous demonstration can usually be interpreted in multiple ways, which severely hinders the RL agent from learning stably and efficiently. Since an optimal demonstration may also suffer from being ambiguous, previous works that combine RL and learning from demonstration (RLfD) may not work well. Inspired by how humans handle such situations, we propose to use self-explanation (an agent generates explanations for itself) to recognize valuable high-level relational features as an interpretation of why a successful trajectory is successful. In this way, the agent can provide some guidance for its own RL learning. Our main contribution is the Self-Explanation for RL from Demonstrations (SERLfD) framework, which can overcome the limitations of traditional RLfD works. Our experimental results show that an RLfD model can be improved by using our SERLfD framework in terms of training stability and performance.
... goal task. de la Cruz Jr et al. (2018) and de la Cruz Jr et al. (2019) collect labeled datasets from human demonstrations on the target task in order to pretrain the network. ...
Preprint
Full-text available
Pretraining is a common technique in deep learning for increasing performance and reducing training time, with promising experimental results in deep reinforcement learning (RL). However, pretraining requires a relevant dataset for training. In this work, we evaluate the effectiveness of pretraining for RL tasks, with and without distracting backgrounds, using both large, publicly available datasets with minimal relevance, as well as case-by-case generated datasets labeled via self-supervision. Results suggest filters learned during training on less relevant datasets render pretraining ineffective, while filters learned during training on the in-distribution datasets reliably reduce RL training time and improve performance after 80k RL training steps. We further investigate, given a limited number of environment steps, how to optimally divide the available steps into pretraining and RL training to maximize RL performance. Our code is available on GitHub
... To reduce the training time of deep reinforcement learning, a lot of work has been carried out, such as combining transfer learning (Barrett et al., 2010) and pre-training neural networks (Cruz et al., 2017). Among them, there is a kind of human-guided method, which aims to improve the sample efficiency of deep reinforcement learning. ...
Article
Full-text available
Purpose
This paper aims to realize fully distributed multi-UAV collision detection and avoidance based on deep reinforcement learning (DRL), to deal with the problem of low sample efficiency in DRL and speed up training, and to improve the applicability and reliability of the DRL-based approach in multi-UAV control problems.
Design/methodology/approach
In this paper, a fully distributed collision detection and avoidance approach for multi-UAV based on DRL is proposed. A method that integrates human experience into policy training via a human-experience-based adviser is proposed. The authors propose a hybrid control method which combines the learning-based policy with traditional model-based control. Extensive experiments, including simulations, real flights and comparative experiments, are conducted to evaluate the performance of the approach.
Findings
A fully distributed multi-UAV collision detection and avoidance method based on DRL is realized. The reward curve shows that the training process is significantly accelerated when human experience is integrated, and the mean episode reward is higher than with the pure DRL method. The experimental results show that the DRL method with human experience integration yields a significant improvement over the pure DRL method for multi-UAV collision detection and avoidance. Moreover, the safer flight brought by the hybrid control method has also been validated.
Originality/value
The fully distributed architecture is suitable for large-scale unmanned aerial vehicle (UAV) swarms and real applications. The DRL method with human experience integration significantly accelerates training compared to the pure DRL method. The proposed hybrid control strategy makes up for the shortcomings of two-dimensional light detection and ranging and other puzzles in applications.
... A major challenge of deep RL is that it often requires a prohibitively large amount of training time and data to reach a reasonable accuracy, making it inapplicable in realworld settings [3]. Leveraging humans to provide demonstrations (known as learning from demonstration (LfD)) in RL has recently gained traction as a possible way of speeding up deep RL [4], [5], [6]. ...
... In LfD, actions demonstrated by the human are considered as the ground truth labels for a given input game/image frame. An agent closely simulates the demonstrator's policy at the start, and later on, learns to surpass the demonstrator [3]. However, LfD holds a distinct challenge, in the sense that it often requires the agent to acquire skills from only a few demonstrations and interactions due to the time and expense of acquiring them [7]. ...
... We propose the technique of pre-training the underlying deep neural network to speed up training in deep RL. It enables the RL agent to learn better features leading to better performance without changing the policy learning strategies [3]. In supervised methods, pre-training helps regularisation and enables faster convergence compared to randomly initialised networks [8]. ...
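
One way to picture this, as a sketch under assumed shapes and architecture rather than the cited method's exact code, is to copy the pre-trained network's weights into the RL agent's network and then run the unchanged RL algorithm:

```python
# Sketch: initialise the RL agent's network from a pre-trained network and then run
# the unchanged RL algorithm; shapes and architecture are illustrative assumptions.
import copy
import torch.nn as nn

def make_net(n_actions=6):
    return nn.Sequential(
        nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
        nn.Linear(512, n_actions),
    )

pretrained = make_net()                                # assume this was trained on demonstrations
q_network = make_net()
q_network.load_state_dict(pretrained.state_dict())     # transfer the learned features
target_network = copy.deepcopy(q_network)
# From here, standard DQN (or any other deep RL algorithm) proceeds unchanged.
```
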
Preprint
Reinforcement Learning (RL) is a semi-supervised learning paradigm in which an agent learns by interacting with an environment. Deep learning in combination with RL provides an efficient method to learn how to interact with the environment and is called Deep Reinforcement Learning (deep RL). Deep RL has achieved tremendous success in gaming, such as AlphaGo, but its potential has rarely been explored for challenging tasks like Speech Emotion Recognition (SER). Deep RL applied to SER can potentially improve the performance of an automated call-centre agent by dynamically learning emotion-aware responses to customer queries. While the policy employed by the RL agent plays a major role in action selection, there is currently no RL policy tailored for SER. In addition, an extended learning period is a general challenge for deep RL, which can impact the speed of learning for SER. Therefore, in this paper, we introduce a novel policy, the "Zeta policy", which is tailored for SER, and apply pre-training in deep RL to achieve a faster learning rate. Pre-training with a cross dataset was also studied to explore the feasibility of pre-training the RL agent with a similar dataset in a scenario where no real environmental data is available. The IEMOCAP and SAVEE datasets were used for the evaluation, with the problem being to recognize four emotions (happy, sad, angry and neutral) in the utterances provided. Experimental results show that the proposed Zeta policy performs better than existing policies. The results also support that pre-training can reduce the training time by reducing the warm-up period and is robust to the cross-corpus scenario.
... Works in (Cruz et al., 2017) and (Nair et al., 2017) apply ideas of imitation learning to fit the demonstration dataset in a supervised manner. (Xiang and Su, 2019) use the mean episode rewards in demonstration data as a baseline to amplify the incentive in high reward zone. ...
Conference Paper
Full-text available
Using direct reinforcement learning (RL) to accomplish a task can be very inefficient, especially in robotic configurations where interactions with the environment are lengthy and costly. Instead, learning from expert demonstration (LfD) is an alternative approach to gain better performance in an RL setting, which also greatly improves sample efficiency. We propose a novel demonstration learning framework for actor-critic based algorithms. Firstly, we put forward an environment pre-training paradigm to initialize the model parameters without interacting with the target environment, which effectively avoids the cold-start problem in deep RL scenarios. Secondly, we design a general-purpose LfD framework for most of the mainstream actor-critic RL algorithms that include a policy network and a value function, such as PPO, SAC, TRPO and A3C. Thirdly, we build a dedicated model training platform to perform the human-robot interaction and numerical experimentation. We evaluate the method in six MuJoCo simulated locomotion environments and on our robot control simulation platform. Results show that several epochs of pre-training can improve the agent's performance over the early stage of training, and the final converged performance of the RL algorithm is also boosted by external demonstrations. In general, sample efficiency is improved by 30% with the proposed method. Our demonstration pipeline makes full use of the exploration property of the RL algorithm, which is feasible for fast teaching of robots in dynamic environments.
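
A generic sketch of such an LfD hook for actor-critic methods, assuming toy networks, toy data, and an illustrative behaviour-cloning coefficient rather than the paper's implementation, is to add an auxiliary imitation term to the usual actor-critic loss:

```python
# Sketch: add an auxiliary behaviour-cloning term to a generic actor-critic loss.
# Networks, toy data and the coefficient `bc_coef` are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions, bc_coef = 8, 4, 0.1
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

# Toy batches standing in for on-policy rollouts and for expert demonstrations.
obs, actions = torch.randn(32, obs_dim), torch.randint(0, n_actions, (32,))
advantages, returns = torch.randn(32), torch.randn(32, 1)
demo_obs, demo_actions = torch.randn(32, obs_dim), torch.randint(0, n_actions, (32,))

log_probs = torch.log_softmax(actor(obs), dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
policy_loss = -(log_probs * advantages).mean()            # vanilla policy-gradient term
value_loss = (critic(obs) - returns).pow(2).mean()        # critic regression
bc_loss = F.cross_entropy(actor(demo_obs), demo_actions)  # imitate the demonstrated actions

loss = policy_loss + 0.5 * value_loss + bc_coef * bc_loss
opt.zero_grad()
loss.backward()
opt.step()
```
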
... A major challenge of deep RL is that it often requires a prohibitively large amount of training time and data to reach a reasonable accuracy, making it inapplicable in real-world settings [8]. Leveraging humans to provide demonstrations (known as learning from demonstration (LfD)) in RL has recently gained traction as a possible way of speeding up deep RL [9,10,11]. ...
... In LfD, actions demonstrated by the human are considered as the ground truth labels for a given input game/image frame. An agent closely simulates the demonstrator's policy at the start, and later on, learns to surpass the demonstrator [8]. However, LfD holds a distinct challenge, in the sense that it often requires the agent to acquire skills from only a few demonstrations and interactions due to the time and expense of acquiring them [12]. ...
... Pre-training the underlying deep neural network is another approach to speed up training in deep RL. It enables the RL agent to learn better features which leads to better performance without changing the policy learning strategies [8]. In supervised methods, pre-training helps regularisation and enables faster convergence compared to randomly initialised networks [13]. ...
Preprint
Full-text available
Deep reinforcement learning (deep RL) is a combination of deep learning with reinforcement learning principles to create efficient methods that can learn by interacting with their environment. This has led to breakthroughs in many complex tasks, such as playing the game "Go", that were previously difficult to solve. However, deep RL requires significant training time, making it difficult to use in various real-life applications such as Human-Computer Interaction (HCI). In this paper, we study pre-training in deep RL to reduce the training time and improve the performance of Speech Recognition, a popular application of HCI. To evaluate the performance improvement in training, we use the publicly available "Speech Command" dataset, which contains utterances of 30 command keywords spoken by 2,618 speakers. Results show that pre-training with deep RL offers faster convergence compared to non-pre-trained RL while achieving improved speech recognition accuracy.