Fig. 3
The model-free approach. The dashed box occurs off-line, while the solid boxes occur on-line, during actual test episodes.


Source publication
Article
Full-text available
This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain's complex transition dynamics and high-dimensional state and action spaces, the need to explore effic...

Contexts in source publication

Context 1
... MDPs were evolved using the procedure described in Sect. 3.1. Then, for each test MDP of the competition, the first 10 episodes were spent evaluating each of these specialized policies in that test MDP. Finally, whichever specialized policy performed the best was used for the remaining 990 episodes of that test MDP. This strategy, depicted in Fig. 3, allows the agent to adapt on-line to each test MDP in a sample-efficient way, without needing an accurate model. Figure 4 (left) shows the results of the generalized helicopter hovering event at the 2008 RL Competition, in which this model-free approach won first place. Of the six entries that successfully completed test runs, only ...
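To make the selection strategy above concrete, here is a minimal Python sketch, assuming hypothetical helpers: a list of off-line evolved specialized_policies and a run_episode(policy, mdp) function that returns one episode's return.

```python
# Minimal sketch of the on-line selection strategy described above.
# `specialized_policies` (evolved off-line) and `run_episode(policy, mdp)`
# are hypothetical helpers, not part of the original source.

def adapt_to_test_mdp(specialized_policies, test_mdp, run_episode,
                      eval_episodes=10, total_episodes=1000):
    """Evaluate each off-line evolved policy briefly, then run the best one."""
    returns = []
    scores = {i: [] for i in range(len(specialized_policies))}
    # Spend the first episodes cycling through the specialized policies.
    for ep in range(eval_episodes):
        i = ep % len(specialized_policies)
        ret = run_episode(specialized_policies[i], test_mdp)
        scores[i].append(ret)
        returns.append(ret)
    # Keep whichever specialized policy performed best during evaluation.
    evaluated = [i for i in scores if scores[i]]
    best = max(evaluated, key=lambda i: sum(scores[i]) / len(scores[i]))
    # Use it for the remaining episodes (990 of 1000 in the competition setting).
    for _ in range(total_episodes - eval_episodes):
        returns.append(run_episode(specialized_policies[best], test_mdp))
    return returns
```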
Context 2
... quality of learning. Lower values of k make faster progress at the beginning because they can evaluate policies more quickly; higher values of k plateau higher because they can more accurately make fine distinctions between policies. While values of k > 1 perform best in the long run, good performance is still possible with k = 1. In contrast, Fig. 13 shows results when r = 0.3. In this case, k = 1 performs poorly, as a single fitness evaluation is not enough to guide evolution. When k = 2, evolution makes significant progress but plateaus early. To achieve good performance, k ≥ 5 is ...
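As an illustration of the k-evaluation trade-off described above (not the paper's implementation), the sketch below averages k noisy fitness evaluations per candidate inside a simple generational loop; evaluate and mutate are hypothetical domain-specific hooks.

```python
def noisy_fitness(policy, evaluate, k):
    """Average k independent, noisy evaluations of a single policy."""
    return sum(evaluate(policy) for _ in range(k)) / k

def evolve(population, evaluate, mutate, k, n_generations):
    for _ in range(n_generations):
        # Larger k -> less noisy fitness signal, but k times the evaluation cost.
        ranked = sorted(population, key=lambda p: noisy_fitness(p, evaluate, k),
                        reverse=True)
        parents = ranked[: max(1, len(ranked) // 2)]   # truncation selection
        population = parents + [mutate(p) for p in parents]
    return population
```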

Similar publications

Article
Full-text available
Face recognition (FR) methods report significant performance by adopting the convolutional neural network (CNN) based learning methods. Although CNNs are mostly trained by optimizing the softmax loss, the recent trend shows an improvement of accuracy with different strategies, such as task-specific CNN learning with different loss functions, fine-t...

Citations

... LITERATURE REVIEW Important work in this field [9][10][11] made it possible to develop an Advanced Neuroevolutionary Genetic Algorithm (ANGA) for improved protection. Because the defense landscape changes constantly, Smith et al. argue that new ways are needed to make AI models more adaptable and useful. ...
Article
Full-text available
Different methods have been developed to make optimization processes more efficient and effective, a significant advance in the area of evolutionary optimization. This abstract discusses four well-known methods: Neuroevolution of Augmenting Topologies (NEAT), Genetic Algorithms (GAs), Genetic Programming (GP), and the Advanced Neuroevolutionary Genetic Algorithm (ANGA), focusing on key performance indicators: Fitness Metrics, Generalization, Efficiency and Speed, and Overall Performance. With scores of 90% in Fitness Metrics and 88% in Generalization, NEAT, a neuroevolutionary method, performs strongly on competitive tasks, though its Efficiency and Speed score is lower at 80%. GAs, known for their population-based approach, score highest in Efficiency and Speed (90%) but somewhat lower in Fitness Metrics and Generalization (89% and 85%, respectively). GP, which focuses on evolving programs, scores 88%, 85%, and 85% in Fitness Metrics, Generalization, and Efficiency and Speed, respectively. The new ANGA algorithm stands out as the strongest performer across all tests, with 93% in Fitness Metrics, 94% in Generalization, and 93% in Efficiency and Speed; its Overall Performance score of 97.78% reflects its effectiveness as a whole, making ANGA a promising method for genetic optimization.
... Approaches to safe RL broadly fall into two categories: (i) modification of the exploration process, e.g., with risk metrics, and (ii) modification of the optimality criterion with a safety factor, e.g., penalties. The first category of RL methods generally requires external knowledge, e.g., to leverage imitation learning and/or engineer risk metrics for exploration (Abbeel et al., 2010;Koppejan and Whiteson, 2011;Marchesini et al., 2022). Even though some of these methods are not originally developed with safety in mind, the added guidance can often empirically reduce the number of catastrophic events during the learning process (Garcia and Fernández, 2012). ...
... It comprises RL techniques which, in addition to maximizing reward, satisfy certain criteria regarding the performance (safety) of the system during the learning and/or deployment processes. This is especially important for real-world environments with critical safety requirements, such as helicopter flight [102] and gas turbine control [70]. One possible approach to safe RL is to transform the reward function so that it includes some notion of risk. ...
Preprint
Full-text available
The field of Sequential Decision Making (SDM) provides tools for solving Sequential Decision Processes (SDPs), where an agent must make a series of decisions in order to complete a task or achieve a goal. Historically, two competing SDM paradigms have vied for supremacy. Automated Planning (AP) proposes to solve SDPs by performing a reasoning process over a model of the world, often represented symbolically. Conversely, Reinforcement Learning (RL) proposes to learn the solution of the SDP from data, without a world model, and represent the learned knowledge subsymbolically. In the spirit of reconciliation, we provide a review of symbolic, subsymbolic and hybrid methods for SDM. We cover both methods for solving SDPs (e.g., AP, RL and techniques that learn to plan) and for learning aspects of their structure (e.g., world models, state invariants and landmarks). To the best of our knowledge, no other review in the field provides the same scope. As an additional contribution, we discuss what properties an ideal method for SDM should exhibit and argue that neurosymbolic AI is the current approach which most closely resembles this ideal method. Finally, we outline several proposals to advance the field of SDM via the integration of symbolic and subsymbolic AI.
... One approach that has been proposed to address this problem has been to include an explicit safety factor in the stage costs of the sequential decision making problem, that is a term that estimates the probability the state will transition into an unsafe state given the current control action (Sato et al., 2001; Gaskett, 2003; Geibel and Wysotzki, 2005). In addition to modifications of the cost function, a different stream of literature has considered explicitly modifying the exploration process of states to mitigate the risks of reaching an unsafe state due to random exploration (Martín H and Lope, 2009; Koppejan and Whiteson, 2011; Garcia and Fernández, 2012). Concepts of safety have also been applied in bandit problems (Sui et al., 2015; Sun et al., 2017), where the notion of risk comes from a set of constraints set on the rewards of the arms and less from a predefined unsafe set. ...
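A minimal sketch of the "safety factor in the stage cost" idea described above, assuming hypothetical base_cost and p_unsafe functions; lam weights the estimated probability of reaching an unsafe state.

```python
# Illustrative sketch: the estimated probability of transitioning to an unsafe
# state is added to the stage cost as a weighted penalty.
# `base_cost` and `p_unsafe` are hypothetical, problem-specific functions.

def safe_stage_cost(state, action, base_cost, p_unsafe, lam=10.0):
    # c_safe(s, a) = c(s, a) + lam * P(next state is unsafe | s, a)
    return base_cost(state, action) + lam * p_unsafe(state, action)
```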
Preprint
A key challenge in sequential decision making is optimizing systems safely under partial information. While much of the literature has focused on the cases of either partially known states or partially known dynamics, the challenge is further exacerbated when both states and dynamics are partially known. Computing heparin doses for patients fits this paradigm, since the concentration of heparin in the patient cannot be measured directly and the rates at which patients metabolize heparin vary greatly between individuals. While many proposed solutions are model-free, they require complex models and have difficulty ensuring safety. However, if some of the structure of the dynamics is known, a model-based approach can be leveraged to provide safe policies. In this paper we propose such a framework to address the challenge of optimizing personalized heparin doses. We use a predictive model parameterized individually by patient to predict future therapeutic effects. We then leverage this model using a scenario-generation-based approach that is capable of ensuring patient safety. We validate our models with numerical experiments by comparing the predictive capabilities of our model against existing machine learning techniques and demonstrating how our dosing algorithm can treat patients in a simulated ICU environment.
... More recent work has introduced methods to directly optimize the reward function subject to constraints [7], or to satisfy constraints with high probability [8,9,10]. Alternatively, some safe RL techniques use external knowledge, e.g., imitation learning, and/or risk metrics during exploration [11,12,13]. Even though some of these methods were not originally developed with safety in mind, the added guidance can often empirically reduce the number of catastrophic events during the learning process [14]. ...
Preprint
Full-text available
Enabling reinforcement learning (RL) to explicitly consider constraints is important for safe deployment in real-world process systems. This work exploits recent developments in deep RL and optimization over trained neural networks to introduce algorithms for safe training and deployment of RL agents. We show how optimization over trained neural-network state-action value functions (i.e., a critic function) can explicitly incorporate constraints and describe two corresponding RL algorithms: the first uses constrained optimization of the critic to give optimal actions for training an actor, while the second guarantees constraint satisfaction by directly implementing actions from optimizing a trained critic model. The two algorithms are tested on a supply chain case study from OR-Gym and are compared against state-of-the-art algorithms TRPO, CPO, and RCPO.
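The second algorithm described above selects actions by optimizing a trained critic subject to constraints. The sketch below approximates that idea with a simple search over sampled candidate actions rather than a full optimization solver; q_value, constraint, and sample_action are hypothetical stand-ins.

```python
# Illustrative sketch (not the paper's algorithm): pick the sampled action with
# the highest critic value among those that satisfy the constraint g(s, a) <= 0.

def constrained_greedy_action(state, q_value, constraint, sample_action,
                              n_candidates=256):
    best_a, best_q = None, float("-inf")
    for _ in range(n_candidates):
        a = sample_action()
        if constraint(state, a) > 0:        # constraint violated, skip candidate
            continue
        q = q_value(state, a)
        if q > best_q:
            best_a, best_q = a, q
    return best_a                           # None if no sampled action was feasible
```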
... Changes to the exploration process are made by risk-aware heuristics [6] or by initial or on-the-fly knowledge demonstrations [7,8]. Risk-aware heuristics are computationally cheap, but cannot guarantee strong confidence bounds around the estimated safety. ...
... Interestingly, neuroevolutionary methods achieve promising results using simple evolutionary algorithms [7], and proposing new strategies for these simple algorithms can significantly improve the convergence of training a neural network and its prediction performance. Although neuroevolutionary methods are generally computationally expensive, especially in deep learning [8], they are still preferred because they also offer flexibility in choosing the cost function (e.g., reward maximization [9]). ...
Article
Full-text available
Neural networks have demonstrated their usefulness for solving complex regression problems in circumstances where alternative methods do not provide satisfactory results. Finding a good neural network model is a time-consuming task that involves searching through a complex multidimensional hyperparameter and weight space in order to find the values that provide optimal convergence. We propose a novel neural network optimizer that leverages the advantages of both an improved evolutionary competitive algorithm and gradient-based backpropagation. The method consists of a modified, hybrid variant of the Imperialist Competitive Algorithm (ICA). We analyze multiple strategies for initialization, assimilation, revolution, and competition, in order to find the combination of ICA steps that provides optimal convergence and enhance the algorithm by incorporating a backpropagation step in the ICA loop, which, together with a self-adaptive hyperparameter adjustment strategy, significantly improves on the original algorithm. The resulting hybrid method is used to optimize a neural network to solve a complex problem in the field of chemical engineering: the synthesis and swelling behavior of the semi- and interpenetrated multicomponent crosslinked structures of hydrogels, with the goal of predicting the yield in a crosslinked polymer and the swelling degree based on several reaction-related input parameters. We show that our approach has better performance than other biologically inspired optimization algorithms and generates regression models capable of making predictions that are better correlated with the desired outputs.
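A simplified sketch of a hybrid ICA loop with a gradient step, in the spirit of the method described above but not the authors' implementation; loss(w) and grad(w) are hypothetical functions returning the network's training loss and its gradient for a flat weight vector, and the competition step is omitted for brevity.

```python
import numpy as np

def hybrid_ica(loss, grad, dim, n_countries=20, n_imperialists=4,
               n_iters=100, assimilation=0.5, revolution=0.1, lr=1e-2,
               rng=np.random.default_rng(0)):
    countries = rng.normal(size=(n_countries, dim))
    for _ in range(n_iters):
        # Rank countries; the best become imperialists, the rest colonies.
        order = np.argsort([loss(c) for c in countries])
        countries = countries[order]
        imperialists = countries[:n_imperialists]
        colonies = countries[n_imperialists:]
        for i in range(len(colonies)):
            colony = colonies[i]
            imp = imperialists[i % n_imperialists]
            # Assimilation: move the colony toward its imperialist.
            colony = colony + assimilation * rng.random() * (imp - colony)
            # Revolution: occasional random perturbation to keep diversity.
            if rng.random() < revolution:
                colony = colony + rng.normal(scale=0.1, size=dim)
            # Backpropagation-style step: local gradient refinement of the colony.
            colonies[i] = colony - lr * grad(colony)
        countries = np.vstack([imperialists, colonies])
    return min(countries, key=loss)
```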
... Conversely, the applications in real-world robotic domains also pose important new challenges for RL research. In particular, many real-world robotic environments and tasks, such as human-related robotic environments [6], helicopter manipulation [7,8], autonomous vehicle [9], and aerial delivery [10], have very low tolerance for violations of safety constraints, as such violation can cause severe consequences. This raises a substantial demand for safe reinforcement learning techniques. ...
... H et al. [7] and Koppejan et al. [8] studied RL applications to helicopter control under strict safety assumptions. Wen et al. developed a safe RL approach called Parallel Constrained Policy Optimization (PCPO) specifically to enhance the safety of autonomous vehicles [9]. ...
Preprint
As safety violations can lead to severe consequences in real-world robotic applications, the increasing deployment of Reinforcement Learning (RL) in robotic domains has propelled the study of safe exploration for reinforcement learning (safe RL). In this work, we propose a risk preventive training method for safe RL, which learns a statistical contrastive classifier to predict the probability of a state-action pair leading to unsafe states. Based on the predicted risk probabilities, we can collect risk preventive trajectories and reshape the reward function with risk penalties to induce safe RL policies. We conduct experiments in robotic simulation environments. The results show the proposed approach has comparable performance with the state-of-the-art model-based methods and outperforms conventional model-free safe RL approaches.
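A minimal sketch of the reward-reshaping step described above, assuming a hypothetical risk_model whose predict_proba(state, action) returns the estimated probability that the pair leads to an unsafe state; the contrastive training of that classifier is abstracted away.

```python
# Illustrative sketch: subtract a risk penalty, weighted by the classifier's
# predicted probability of an unsafe outcome, from the environment reward.
# `risk_model` and `penalty_weight` are hypothetical names.

def reshape_reward(reward, state, action, risk_model, penalty_weight=5.0):
    p_risk = risk_model.predict_proba(state, action)   # P(unsafe outcome | s, a)
    return reward - penalty_weight * p_risk
```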
... Table 6: Different types of robot applications using different methods.
Method | Experiment type | Robot / platform
A genetic algorithm [277] | Simulation experiments | Webots mobile robots [292]
A Particle Swarm Optimization algorithm [200] | Simulation experiments | NAO robots [174]
A learning algorithm [184] | Real-world experiments | The Aldebaran Nao humanoid robots [311]
An iterative optimization algorithm [82] | Real-world experiments | The Aldebaran Nao humanoid robots [311]
A kinetics teaching method [203] | Real-world experiments | NAO robots [174]
A policy gradient method [138] | Simulation experiments | The Webots simulation package [185]
A Deep RL method [74] | Simulation experiments | MuJoCo robots [276]
A Neuroevolutionary RL method [140] | Simulation experiments | Helicopter control [19]
A policy search method [136] | Real-world experiments | A Barrett robot arm ...
Preprint
Full-text available
Reinforcement learning has achieved tremendous success in many complex decision making tasks. When it comes to deploying RL in the real world, safety concerns are usually raised, leading to a growing demand for safe reinforcement learning algorithms, such as in autonomous driving and robotics scenarios. While safety control has a long history, the study of safe RL algorithms is still in the early stages. To establish a good foundation for future research in this thread, in this paper, we provide a review for safe RL from the perspectives of methods, theory and applications. Firstly, we review the progress of safe RL from five dimensions and come up with five problems that are crucial for safe RL being deployed in real-world applications, coined as "2H3W". Secondly, we analyze the theory and algorithm progress from the perspectives of answering the "2H3W" problems. Then, the sample complexity of safe RL methods is reviewed and discussed, followed by an introduction of the applications and benchmarks of safe RL algorithms. Finally, we open the discussion of the challenging problems in safe RL, hoping to inspire more future research on this thread. To advance the study of safe RL algorithms, we release a benchmark suite, an open-sourced repository containing the implementations of major safe RL algorithms, along with tutorials at the link: https://github.com/chauncygu/Safe-Reinforcement-Learning-Baselines.git.
... In reinforcement learning (RL), an agent perceives consecutive states of its environment and acts after each observation to maximize long-term cumulative expected reward [1]. One key challenge to the widespread deployment of RL in safetycritical systems is the difficulty of ensuring that an RL agent's policies are safe, especially when the system model and its surrounding environment are both unknown and subject to noise [2], [3], [4]. In this work, we consider the case of RL for guaranteed-safe navigation of mobile robots, such as autonomous cars or delivery drones, where safety means collision avoidance. ...
... We ensure safety by computing our robot's forward reachable set (FRS) for a given motion plan, then adjusting the ...

Algorithm 1: Safe RL with BRSL
initialize the RL agent with a random policy π_θ, environment model µ_ϕ, an empty replay buffer B, a max number of time steps n_iter, and a safe plan p_0
for each episode do
    initialize task with reward function ρ
    x_1 ← observe initial environment state ...
Preprint
Full-text available
Reinforcement learning (RL) is capable of sophisticated motion planning and control for robots in uncertain environments. However, state-of-the-art deep RL approaches typically lack safety guarantees, especially when the robot and environment models are unknown. To justify widespread deployment, robots must respect safety constraints without sacrificing performance. Thus, we propose a Black-box Reachability-based Safety Layer (BRSL) with three main components: (1) data-driven reachability analysis for a black-box robot model, (2) a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and (3) a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions. In simulation, BRSL outperforms other state-of-the-art safe RL methods on a Turtlebot 3, a quadrotor, and a trajectory-tracking point mass with an unsafe set adjacent to the area of highest reward.
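A highly simplified sketch of a reachability-based safety layer in the spirit of the abstract above, not BRSL itself: the reachable set is approximated by an axis-aligned box around a learned one-step prediction, and a proposed action is replaced by a fallback if that box overlaps an obstacle box; predict_next_state, the box obstacle representation, and the disturbance bound are assumptions for illustration.

```python
import numpy as np

def box_reachable_set(state, action, predict_next_state, disturbance=0.1):
    # Approximate the one-step reachable set as a box around the predicted
    # next state, inflated by a disturbance bound (an assumption of this sketch).
    center = predict_next_state(state, action)
    return center - disturbance, center + disturbance        # (lower, upper)

def boxes_overlap(lo_a, hi_a, lo_b, hi_b):
    return bool(np.all(hi_a >= lo_b) and np.all(hi_b >= lo_a))

def safe_action(state, proposed_action, fallback_action,
                predict_next_state, obstacles):
    lo, hi = box_reachable_set(state, proposed_action, predict_next_state)
    for obs_lo, obs_hi in obstacles:
        if boxes_overlap(lo, hi, obs_lo, obs_hi):
            return fallback_action          # correct the unsafe proposed action
    return proposed_action
```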