Article

Multi-agent policy learning-based path planning for autonomous mobile robots

December 2023
Engineering Applications of Artificial Intelligence 129

December 2023
129

DOI:10.1016/j.engappai.2023.107631

Authors:

Lixiang Zhang

Beijing Institute of Technology

Yan Yan

Shanghai Dianji University

Chen Yang

The University of Western Ontario

Show all 5 authorsHide

The study addresses path planning problems for autonomous mobile robots (AMRs), considering their kinematics, where performance and responsiveness are often incompatible. This study proposes a multi-agent policy learning-based method to tackle this challenge in dynamic environments. The proposed method features a centralized learning and decentralized execution-based path planning framework designed to meet performance and responsiveness requirements. The problem is modeled as a partial observation Markov Decision Process for policy learning while considering the kinematics using conventional neural networks. Then, an improved proximal policy optimization algorithm is developed with highlight experience replay that corrects failed experiences to speed up the learning processes. The experimental results show that the proposed method outperforms the baselines in both static and dynamic environments. The proposed method shortens the movement distance and time in static environments by about 29.1% and 5.7%, as well as in dynamic environments by about 21.1% and 20.4%, respectively. The runtime is maintained in milliseconds across various environments, taking only 0.07 s. Overall, the proposed method is valid and efficient in ensuring the performance and responsiveness of AMRs when dealing with complex and dynamic path planning problems.

Path Planning for Autonomous Mobile Robot Using Intelligent Algorithms

Article

Full-text available

Jun 2024

Machine learning technologies are being integrated into robotic systems faster to enhance their efficacy and adaptability in dynamic environments. The primary goal of this research was to propose a method to develop an Autonomous Mobile Robot (AMR) that integrates Simultaneous Localization and Mapping (SLAM), odometry, and artificial vision based on deep learning (DL). All are executed on a high-performance Jetson Nano embedded system, specifically emphasizing SLAM-based obstacle avoidance and path planning using the Adaptive Monte Carlo Localization (AMCL) algorithm. Two Convolutional Neural Networks (CNNs) were selected due to their proven effectiveness in image and pattern recognition tasks. The ResNet18 and YOLOv3 algorithms facilitate scene perception, enabling the robot to interpret its environment effectively. Both algorithms were implemented for real-time object detection, identifying and classifying objects within the robot's environment. These algorithms were selected to evaluate their performance metrics, which are critical for real-time applications. A comparative analysis of the proposed DL models focused on enhancing vision systems for autonomous mobile robots. Several simulations and real-world trials were conducted to evaluate the performance and adaptability of these models in navigating complex environments. The proposed vision system with CNN ResNet18 achieved an average accuracy of 98.5%, a precision of 96.91%, a recall of 97%, and an F1-score of 98.5%. However, the YOLOv3 model achieved an average accuracy of 96%, a precision of 96.2%, a recall of 96%, and an F1-score of 95.99%. These results underscore the effectiveness of the proposed intelligent algorithms, robust embedded hardware, and sensors in robotic applications. This study proves that advanced DL algorithms work well in robots and could be used in many fields, such as transportation and assembly. As a consequence of the findings, intelligent systems could be implemented more widely in the operation and development of AMRs.

Highly Self-Adaptive Path-Planning Method for Unmanned Ground Vehicle Based on Transformer Encoder Feature Extraction and Incremental Reinforcement Learning

Article

Full-text available

Apr 2024

Path planning is an indispensable component in guiding unmanned ground vehicles (UGVs) from their initial positions to designated destinations, aiming to determine trajectories that are either optimal or near-optimal. While conventional path-planning techniques have been employed for this purpose, planners utilizing reinforcement learning (RL) exhibit superior adaptability within exceedingly complex and dynamic environments. Nevertheless, existing RL-based path planners encounter several shortcomings, notably, redundant map representations, inadequate feature extraction, and limited adaptiveness across diverse environments. In response to these challenges, this paper proposes an innovative and highly self-adaptive path-planning approach based on Transformer encoder feature extraction coupled with incremental reinforcement learning (IRL). Initially, an autoencoder is utilized to compress redundant map representations, providing the planner with sufficient environmental data while minimizing dimensional complexity. Subsequently, the Transformer encoder, renowned for its capacity to analyze global long-range dependencies, is employed to capture intricate correlations among UGV statuses at continuous intervals. Finally, IRL is harnessed to enhance the path planner’s generalization capabilities, particularly when the trained agent is deployed in environments distinct from its training counterparts. Our empirical findings demonstrate that the proposed method outperforms traditional uniform-sampling-based approaches in terms of execution time, path length, and trajectory smoothness. Furthermore, it exhibits a fivefold increase in adaptivity compared to conventional transfer-learning-based fine-tuning methodologies.

Collision avoidance for AGV based on deep reinforcement learning in complex dynamic environment

Article

Full-text available

Jan 2023

To improve the collision avoidance capability of Automated Guided Vehicles (AGV) in the complex dynamic environment of smart factories,enable them to carry out material handling tasks more safely and efficiently following the global path,a local collision avoidance method based on deep reinforcement learning was proposed.The problem of collision avoidance of AGV was formulated as Partial Observational Markov Decision Process (POMDP) in which observation space,action space and reward function were expatiated.Tracking of the global path was achieved by setting different reward values.Then a Deep Deterministic Policy Gradient (DDPG) method was further implemented to solve collision avoidance policy.The trained policy was validated in various simulated scenarios,and the effectiveness was proved.The experimental results showed the proposed approach could respond to the complex dynamic environment and reduce the time and distance of collision avoidance.

Deep reinforcement learning for dynamic scheduling of energy-efficient automated guided vehicles

Article

Full-text available

Oct 2023
J INTELL MANUF

Automated guided vehicle (AGV) scheduling has become a hot topic in recent years as manufacturing systems become flexible and intelligent. However, little research regards dynamic AGV scheduling considering energy consumption, particularly battery replacement. This paper proposes a novel method that employs deep reinforcement learning to address the dynamic scheduling of energy-efficient AGVs with battery replacement in production logistics systems. The bi-objective joint optimization problem of AGV scheduling and battery replacement management is modeled as a Markov Decision Process, which supports data-driven decision-making. Then, this paper constructs a deep reinforcement learning-based optimization architecture and develops a novel dueling deep double Q network algorithm to maximize the long-term rewards for optimizing material handling’s tardiness and energy consumption. Numerical experiments and a case study demonstrate that the proposed algorithm is more efficient and cleaner than state-of-the-art methods. The proposed method can significantly improve customer satisfaction and reduce production costs within flexible manufacturing processes, particularly in Industry 4.0.

Dynamic Obstacle Avoidance and Path Planning through Reinforcement Learning

Article

Full-text available

Jul 2023

The use of reinforcement learning (RL) for dynamic obstacle avoidance (DOA) algorithms and path planning (PP) has become increasingly popular in recent years. Despite the importance of RL in this growing technological era, few studies have systematically reviewed this research concept. Therefore, this study provides a comprehensive review of the literature on dynamic reinforcement learning-based path planning and obstacle avoidance. Furthermore, this research reviews publications from the last 5 years (2018–2022) to include 34 studies to evaluate the latest trends in autonomous mobile robot development with RL. In the end, this review shed light on dynamic obstacle avoidance in reinforcement learning. Likewise, the propagation model and performance evaluation metrics and approaches that have been employed in previous research were synthesized by this study. Ultimately, this article’s major objective is to aid scholars in their understanding of the present and future applications of deep reinforcement learning for dynamic obstacle avoidance.

A copula‐based method of risk prediction for autonomous underwater gliders in dynamic environments

Article

Full-text available

Apr 2023
RISK ANAL

Autonomous underwater gliders (AUGs) are effective platforms for oceanic research and environmental monitoring. However, complex underwater environments with uncertainties could pose the risk of vehicle loss during their missions. It is therefore essential to conduct risk prediction to assist decision making for safer operations. The main limitation of current studies for AUGs is the lack of a tailored method for risk analysis considering both dynamic environments and potential functional failures of the vehicle. Hence, this study proposed a copula‐based approach for evaluating the risk of AUG loss in dynamic underwater environments. The developed copula Bayesian network (CBN) integrated copula functions into a traditional Bayesian belief network (BBN), aiming to handle nonlinear dependencies among environmental variables and inherent technical failures. Specifically, potential risk factors with causal effects were captured using the BBN. A Gaussian copula was then employed to measure correlated dependencies among identified risk factors. Furthermore, the dependence analysis and CBN inference were performed to assess the risk level of vehicle loss given various environmental observations. The effectiveness of the proposed method was demonstrated in a case study, which considered deploying a Slocum G1 Glider in a real water region. Risk mitigation measures were provided based on key findings. This study potentially contributes a tailored tool of risk prediction for AUGs in dynamic environments, which can enhance the safety performance of AUGs and assist in risk mitigation for decision makers.

Dynamic RRT: Fast Feasible Path Planning in Randomly Distributed Obstacle Environments

Article

Full-text available

Mar 2023
J INTELL ROBOT SYST

For path planning problems based on Rapidly exploring Random Trees (RRT), most new nodes merely explore the environment unless they are sampled directly from the subset that can optimize the path. This paper proposes the Dynamic RRT algorithm, which aims to plan a feasible path while balancing the convergence time and path length in an environment with randomly distributed obstacles. It estimates the length of a path from the start node to the goal node that is constrained to pass through an extended tree node, and this path length is heuristically taken as the major axis diameter of the informed subset. Then new node sampling is performed directly in this subset to optimize the estimated path. In addition, the idea of dynamic programming is employed to decompose the planning problem into subproblems by updating the node selected through Pareto dominance as the new start node to optimize the distance to the goal. Simulation results confirm the performance of the proposed algorithm in balancing the convergence time and path length and demonstrate that the convergence time is faster than that of RRT, while the path length is better than that of RRT*. Dynamic RRT also shows better performance than Lower Bound Tree-RRT(LBT-RRT), and Informed RRT* takes more time to compute a path of the same length.

Review of Autonomous Path Planning Algorithms for Mobile Robots

Article

Full-text available

Mar 2023

Mobile robots, including ground robots, underwater robots, and unmanned aerial vehicles, play an increasingly important role in people’s work and lives. Path planning and obstacle avoidance are the core technologies for achieving autonomy in mobile robots, and they will determine the application prospects of mobile robots. This paper introduces path planning and obstacle avoidance methods for mobile robots to provide a reference for researchers in this field. In addition, it comprehensively summarizes the recent progress and breakthroughs of mobile robots in the field of path planning and discusses future directions worthy of research in this field. We focus on the path planning algorithm of a mobile robot. We divide the path planning methods of mobile robots into the following categories: graph-based search, heuristic intelligence, local obstacle avoidance, artificial intelligence, sampling-based, planner-based, constraint problem satisfaction-based, and other algorithms. In addition, we review a path planning algorithm for multi-robot systems and different robots. We describe the basic principles of each method and highlight the most relevant studies. We also provide an in-depth discussion and comparison of path planning algorithms. Finally, we propose potential research directions in this field that are worth studying in the future.

Reinforcement learning and digital twin-based real-time scheduling method in intelligent manufacturing systems

Article

Full-text available

Oct 2022

Optimization efficiency and decision-making responsiveness are two conflicting objectives to be considered in intelligent manufacturing. Therefore, we proposed a reinforcement learning and digital twin-based real-time scheduling method, called twins learning, to satisfy multiple objectives simultaneously. First, the interaction of multiple resources is constructed in a virtual twin, including physics, behaviors, and rules to support the decision-making. Then, the real-time scheduling problems are modeled as Markov Decision Process and reinforcement learning algorithms are developed to learn better scheduling policies. The case study indicates the proposed method has excellent adaptability and learning capacity in intelligent manufacturing.

Distributed Real-Time Scheduling in Cloud Manufacturing by Deep Reinforcement Learning

Article

Full-text available

Dec 2022

With the extensive application of automated guided vehicles, real-time production scheduling considering logistics services in cloud manufacturing (CM) becomes an urgent problem. Thus, this study focuses on the distributed real-time scheduling (DRTS) of multiple services to respond to dynamic and customized orders. First, a DRTS framework with cloud–edge collaboration is proposed to improve performance and satisfy responsiveness, where distributed actors and one centralized learner are deployed in the edge and cloud layer, respectively. And, the DRTS problem is modeled as a semi-Markov decision process, where the processing services sequencing and logistics services assignment are considered simultaneously. Then, we developed a distributed dueling deep Q network (D3QN) with cloud–edge collaboration to optimize the weighted tardiness of jobs. The experimental results show that the proposed D3QN obtains lower weighted tardiness and shorter flow-time than other state-of-the-art algorithms. It indicates the proposed DRTS method has significant potential to provide efficient real-time decision-making in CM.

Motion planning and control for mobile robot navigation using machine learning: a survey

Article

Full-text available

Jun 2022
AUTON ROBOT

Moving in complex environments is an essential capability of intelligent mobile robots. Decades of research and engineering have been dedicated to developing sophisticated navigation systems to move mobile robots from one point to another. Despite their overall success, a recently emerging research thrust is devoted to developing machine learning techniques to address the same problem, based in large part on the success of deep learning. However, to date, there has not been much direct comparison between the classical and emerging paradigms to this problem. In this article, we survey recent works that apply machine learning for motion planning and control in mobile robot navigation, within the context of classical navigation systems. The surveyed works are classified into different categories, which delineate the relationship of the learning approaches to classical methods. Based on this classification, we identify common challenges and promising future directions.

Thorough Review Analysis of Safe Control of Autonomous Vehicles: Path Planning and Navigation Techniques

Article

Full-text available

Feb 2022

Mobile robot path planning has passed through multiple phases of development and took up several challenges. Up to now and with the new technology in hands, it becomes less complicated to conduct path planning for mobile robots and avoid both static and dynamic obstacles, so that collision-free navigation is ensured. Thorough state of the art review analysis with critical scrutiny of both safe and optimal paths for autonomous vehicles is addressed in this study. Emphasis is given to several developed techniques based using sampling algorithms, node-based optimal algorithms, mathematic model-based algorithms, bio-inspired algorithms, which includes neural network algorithms, and then multi-fusion-based algorithms, which combine different methods to overcome the drawbacks of each. All of these approaches consider different conditions and they are used for multiple domains.

Path Planning for Autonomous Mobile Robots: A Review

Article

Full-text available

Nov 2021
SENSORS-BASEL

Providing mobile robots with autonomous capabilities is advantageous. It allows one to dispense with the intervention of human operators, which may prove beneficial in economic and safety terms. Autonomy requires, in most cases, the use of path planners that enable the robot to deliberate about how to move from its location at one moment to another. Looking for the most appropriate path planning algorithm according to the requirements imposed by users can be challenging, given the overwhelming number of approaches that exist in the literature. Moreover, the past review works analyzed here cover only some of these approaches, missing important ones. For this reason, our paper aims to serve as a starting point for a clear and comprehensive overview of the research to date. It introduces a global classification of path planning algorithms, with a focus on those approaches used along with autonomous ground vehicles, but is also extendable to other robots moving on surfaces, such as autonomous boats. Moreover, the models used to represent the environment, together with the robot mobility and dynamics, are also addressed from the perspective of path planning. Each of the path planning categories presented in the classification is disclosed and analyzed, and a discussion about their applicability is added at the end.

A dynamic scheduling method for self-organized AGVs in production logistics systems

Article

Full-text available

Nov 2021

Automated guided vehicles (AGV) with different carrying capacities are required for complex material handling in smart factories, which causes resource waste. To minimize the delay and reduce the cost of logistics systems, this paper proposes a dynamic scheduling method for self-organized AGVs (SAGV) in production logistics systems, where multiple identical SAGVs can communicate and freely combine with others as one vehicle to perform one task. Using an improved gene expression programming to learning dynamic dispatching rules, experimental results show that dispatching rules learned are efficient and the cost of logistic systems by using SAGVs is significantly reduced.

A Swarm Intelligence Graph-Based Pathfinding Algorithm Based on Fuzzy Logic (SIGPAF): A Case Study on Unmanned Surface Vehicle Multi-Objective Path Planning

Article

Full-text available

Nov 2021

Advances in robotic motion and computer vision have contributed to the increased use of automated and unmanned vehicles in complex and dynamic environments for various applications. Unmanned surface vehicles (USVs) have attracted a lot of attention from scientists to consolidate the wide use of USVs in maritime transportation. However, most of the traditional path planning approaches include single-objective approaches that mainly find the shortest path. Dynamic and complex environments impose the need for multi-objective path planning where an optimal path should be found to satisfy contradicting objective terms. To this end, a swarm intelligence graph-based pathfinding algorithm (SIGPA) has been proposed in the recent literature. This study aims to enhance the performance of SIGPA algorithm by integrating fuzzy logic in order to cope with the multiple objectives and generate quality solutions. A comparative evaluation is conducted among SIGPA and the two most popular fuzzy inference systems, Mamdani (SIGPAF-M) and Takagi–Sugeno–Kang (SIGPAF-TSK). The results showed that depending on the needs of the application, each methodology can contribute respectively. SIGPA remains a reliable approach for real-time applications due to low computational effort; SIGPAF-M generates better paths; and SIGPAF-TSK reaches a better trade-off among solution quality and computation time.

Quadrotor Autonomous Navigation in Semi-Known Environments Based on Deep Reinforcement Learning

Article

Full-text available

Oct 2021

In the application scenarios of quadrotors, it is expected that only part of the obstacles can be identified and located in advance. In order to make quadrotors fly safely in this situation, we present a deep reinforcement learning-based framework to realize autonomous navigation in semi-known environments. Specifically, the proposed framework utilizes the dueling double deep recurrent Q-learning, which can implement global path planning with the obstacle map as input. Moreover, the proposed framework combined with contrastive learning-based feature extraction can conduct real-time autonomous obstacle avoidance with monocular vision effectively. The experimental results demonstrate that our framework exhibits remarkable performance for both global path planning and autonomous obstacle avoidance.

A parallel algorithm for multi-AGV systems

Article

Full-text available

Apr 2022

Automated guided vehicles are widely used in various applications, especially in manufacturing. In this paper, we present a novel parallel algorithm for multi-AGV systems. The overall structure is composed of three parts: task assignment, path planning, and vehicle navigation. According to the priorities of AGVs, a greedy method is introduced to assign jobs. For path planning, a mixed-integer programming model of a multi-AGV system is formulated, which can be transformed into a series of sub-problems under certain conditions. Then, an improved routing method with a penalty item is adopted to generate the optimal paths of AGVs. To avoid collisions between vehicles, a simple and effective control method based on resource locking is proposed under the premise of parallelization. Furthermore, we design the experiments according to the warehousing systems. Various practical simulations are performed to illustrate the efficiency and robustness of the new algorithm.

Cloud-Based Multi-Robot Path Planning in Complex and Crowded Environment Using Fuzzy Logic and Online Learning

Article

Full-text available

Jun 2021

This paper considers an autonomous cloud-based multi-robot system designed to execute highly repetitive tasksin a dynamic environment such as a modern megastore. Cloud level is intended for performing the most demandingoperations in order to unload the robots that are users of cloud services in this architecture. For path planningon global level D* Lite algorithm is applied, bearing in mind its high efficiency in dynamic environments. In orderto introduce smart cost map for further improvement of path planning in complex and crowded environment, implementationof fuzzy inference system and learning algorithm is proposed. The results indicate the possibility ofapplying a similar concept in different real-world robotics applications, in order to reduce the total paths length,as well as to minimize the risk in path planning related to the human-robot interactions.

PRIMAL2: Pathfinding Via Reinforcement and Imitation Multi-Agent Learning - Lifelong

Article

Full-text available

Apr 2021

Multi-agent path finding (MAPF) is an indispensable component of large-scale robot deployments in numerous domains ranging from airport management to warehouse automation. In particular, this work addresses lifelong MAPF (LMAPF) – an online variant of the problem where agents are immediately assigned a new goal upon reaching their current one – in dense and highly structured environments, typical of real-world warehouse operations. Effectively solving LMAPF in such environments requires expensive coordination between agents as well as frequent replanning abilities, a daunting task for existing coupled and decoupled approaches alike. With the purpose of achieving considerable agent coordination without any compromise on reactivity and scalability, we introduce PRIMAL $_2$ , a distributed reinforcement learning framework for LMAPF where agents learn fully decentralized policies to reactively plan paths online in a partially observable world. We extend our previous work, which was effective in low-density sparsely occupied worlds, to highly structured and constrained worlds by identifying behaviors and conventions which improve implicit agent coordination, and enable their learning through the construction of a novel local agent observation and various training aids. We present extensive results of PRIMAL $_2$ in both MAPF and LMAPF environments and compare its performance to state-of-the-art planners in terms of makespan and throughput. We show that PRIMAL $_2$ significantly surpasses our previous work and performs comparably to these baselines, while allowing real-time re-planning and scaling up to 2048 agents.

Graph Neural Networks for Decentralized Multi-Robot Path Planning

Conference Paper

Full-text available

Oct 2020

A Motion Planning Algorithm for Redundant Manipulators Using Rapidly Exploring Randomized Trees and Artificial Potential Fields

Article

Full-text available

Feb 2021

This paper presents a novel motion planner for redundant robotic manipulators by utilizing rapidly exploring randomized trees and artificial potential fields. Rapidly exploring randomized trees and artificial potential fields are two well-known navigational strategies in the robotics industry; however, their potential benefits and synergy when implemented together has been largely unexplored. In the proposed motion planner, rapidly exploring randomized trees is first used to determine a suitable path for the end-effector to follow that maneuvers around all obstacles in the robot’s workspace. Once a path has been determined, attractive and repulsive potential fields are implemented at all points along the path and are used in a gradient optimization algorithm to determine joint trajectories to reach the desired location. To supplement the attractive and repulsive potential fields, an orientation field is proposed to minimize the error between the actual end-effector orientation and the desired orientation during joint trajectory planning. The motion planner is examined through both analytical and experimental evaluation using the 7 degrees of freedom Kinova JACO and Kinova Gen3 robotic arms. The conclusions drawn from this work substantiate the efficacy and superiority of the proposed two-stage motion planner for the control of redundant manipulators in obstacle-ridden environments.

HPPRM: Hybrid Potential Based Probabilistic Roadmap Algorithm for Improved Dynamic Path Planning of Mobile Robots

Article

Full-text available

Dec 2020

Path planning and navigation is a very important problem in robotics, especially for mobile robots operating in complex environments. Sampling based planners such as the probabilistic roadmaps (PRM) have been widely used for different robot applications. However, due to the random sampling of nodes in PRM, it suffers from narrow passage problem that generates unconnected graph. The problem is addressed by increasing the number of nodes but at higher computation cost affecting real-time performance. To address this issue, in this paper, we propose an improved sampling-based path planning method for mobile robot navigation. The proposed method uses a layered hybrid Probabilistic Roadmap (PRM) and the Artificial Potential Field (APF) method for global planning. We used a decomposition method for node distribution that uses map segmentation to produce regions of high and low potential, and propose a method of reducing the dispersion of sample set during the roadmap construction. Our method produces better goal planning queries with a smaller graph and is computationally efficient than the traditional PRM. The proposed planner called the Hybrid Potential based Probabilistic Roadmap (HPPRM) is an improved sampling method with respect to success rate and calculation cost. Furthermore, we present a method for reactive local motion planning in the presence of static and dynamic obstacles in the environment. The advantage of the proposed method is that it can avoid local minima and successfully generate plans in complex maps such as narrow passages and bug trap scenarios that are otherwise difficult for the traditional sample-based methods. We show the validity of our method with experiments in simulation and real environments for both local and global planning. The results indicate that the proposed HPPRM is effective for autonomous mobile robot navigation in complex environments. The success rate of the proposed method is higher than 95% both for local and global planning.

The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method

Article

Full-text available

Mar 2020
KNOWL-BASED SYST

The unmanned surface vehicle (USV) has been widely used to accomplish missions in the sea or dangerous marine areas for ships with sailors, which greatly expands protective capability and detection range. When USVs perform various missions in sophisticated marine environment, autonomous navigation and obstacle avoidance will be necessary and essential. However, there are few effective navigation methods with real-time path planning and obstacle avoidance in dynamic environment. With tailored design of state and action spaces and a dueling deep Q-network, a deep reinforcement learning method ANOA (Autonomous Navigation and Obstacle Avoidance) is proposed for the autonomous navigation and obstacle avoidance of USVs. Experimental results demonstrate that ANOA outperforms deep Q-network (DQN) and Deep Sarsa in the efficiency of exploration and the speed of convergence not only in static environment but also in dynamic environment. Furthermore, the ANOA is integrated with the real control model of a USV moving in surge, sway and yaw and it achieves a higher success rate than Recast navigation method in dynamic environment.

RL-RRT: Kinodynamic Motion Planning via Learning Reachability Estimators from RL Policies

Article

Full-text available

Jul 2019

This paper addresses two challenges facing sampling-based kinodynamic motion planning: a way to identify good candidate states for local transitions and the subsequent computationally intractable steering between these candidate states. Through the combination of sampling-based planning, a Rapidly Exploring Randomized Tree (RRT) and an efficient kinodynamic motion planner through machine learning, we propose an efficient solution to long-range planning for kinodynamic motion planning. First, we use deep reinforcement learning to learn an obstacle-avoiding policy that maps a robot's sensor observations to actions, which is used as a local planner during planning and as a controller during execution. Second, we train a reachability estimator in a supervised manner, which predicts the RL policy's time to reach a state in the presence of obstacles. Lastly, we introduce RL-RRT that uses the RL policy as a local planner, and the reachability estimator as the distance function to bias tree-growth towards promising regions. We evaluate our method on three kinodynamic systems, including physical robot experiments. Results across all three robots tested indicate that RL-RRT outperforms state of the art kinodynamic planners in efficiency, and also provides a shorter path finish time than a steering function free method. The learned local planner policy and accompanying reachability estimator demonstrate transferability to the previously unseen experimental environments, making RL-RRT fast because the expensive computations are replaced with simple neural network inference.

Expanded Douglas–Peucker Polygonal Approximation and Opposite Angle-Based Exact Cell Decomposition for Path Planning with Curvilinear Obstacles

Article

Full-text available

Feb 2019

The Expanded Douglas–Peucker (EDP) polygonal approximation algorithm and its application method for the Opposite Angle-Based Exact Cell Decomposition (OAECD) are proposed for the mobile robot path-planning problem with curvilinear obstacles. The performance of the proposed algorithm is compared with the existing Douglas–Peucker (DP) polygonal approximation and vertical cell decomposition algorithm. The experimental results show that the path generated by the OAECD algorithm with EDP approximation appears much more natural and efficient than the path generated by the vertical cell decomposition algorithm with DP approximation.

An Overview of Nature-Inspired, Conventional, and Hybrid Methods of Autonomous Vehicle Path Planning

Article

Full-text available

Jul 2018
J ADV TRANSPORT

Safe and smooth mobile robot navigation through cluttered environment from the initial position to goal with optimal path is required to achieve intelligent autonomous ground vehicles. There are countless research contributions from researchers aiming at finding solution to autonomous mobile robot path planning problems. This paper presents an overview of nature-inspired, conventional, and hybrid path planning strategies employed by researchers over the years for mobile robot path planning problem. The main strengths and challenges of path planning methods employed by researchers were identified and discussed. Future directions for path planning research is given. The results of this paper can significantly enhance how effective path planning methods could be employed and implemented to achieve real-time intelligent autonomous ground vehicles.

3Dana: A path planning algorithm for surface robotics

Article

Full-text available

Apr 2017
ENG APPL ARTIF INTEL

Autonomous navigation is a research topic that has received considerable attention in robotics. Generally, it is a two step process: (i) generate a global route to the goal and (ii) local motion of the robot along the route. The focus of this paper is on the first part of the process. Some common techniques used are based on heuristic search algorithms that obtain (sub)optimal paths by usually exploiting a rather simplistic terrain representation. Then, the paths generated hardly take into account relevant terrain features, which leads to potentially unsafe paths in realistic environments. This paper presents two contributions: a mathematical formulation for any DTM that can be used by heuristic search algorithms, and a path planning algorithm that generates candidate paths that are safer than the ones obtained by previous approaches. This algorithm, called 3Dana, considers different parameters to maximize the path quality: the maximum slope allowed by the robot and the heading changes during the path. These constraints allow discarding infeasible paths while minimizing the heading changes. To demonstrate the effectiveness of the algorithm proposed, we present results for different scenarios, which include an evaluation of the algorithm in real Mars DTMs.

Dynamic Failure Analysis of Process Systems Using Principal Component Analysis and Bayesian Network

Article

Full-text available

Feb 2017

Modern industrial processes are highly instrumented with more frequent recording of data. This provides abundant data for safety analysis; however, these data resources have not been well used. This paper presents an integrated dynamic failure prediction analysis approach using principal component analysis (PCA) and the Bayesian network (BN). The key process variables that contribute the most to process performance variations are detected with PCA; while the Bayesian network is adopted to model the interactions among these variables to detect faults and predict the time-dependent probability of system failure. The proposed integrated approach uses big data analysis. The structure of BN is learned using past historical data. The developed BN is used to detect faults and estimate system failure risk. The risk is updated subsequently as new process information is collected. The updated risk is used as a decision-making parameter. The proposed approach is validated through a case of a crude oil distillation unit operation.

Optimal Any-Angle Pathfinding In Practice

Article

Full-text available

May 2016
JAIR

Any-angle pathfinding is a fundamental problem in robotics and computer games. The goal is to find a shortest path between a pair of points on a grid map such that the path is not artificially constrained to the points of the grid. Prior research has focused on approximate online solutions. A number of exact methods exist but they all require super-linear space and pre-processing time. In this study, we describe Anya: a new and optimal any-angle pathfinding algorithm. Where other works find approximate any-angle paths by searching over individual points from the grid, Anya finds optimal paths by searching over sets of states represented as intervals. Each interval is identified on-the-fly. From each interval Anya selects a single representative point that it uses to compute an admissible cost estimate for the entire set. Anya always returns an optimal path if one exists. Moreover it does so without any offline pre-processing or the introduction of additional memory overheads. In a range of empirical comparisons we show that Anya is competitive with several recent (sub-optimal) online and pre-processing based techniques and is up to an order of magnitude faster than the most common benchmark algorithm, a grid-based implementation of A∗.

Sampling-Based Robot Motion Planning: A Review

Article

Full-text available

Feb 2014

Motion planning is a fundamental research area in robotics. Sampling-based methods offer an efficient solution for what is otherwise a rather challenging dilemma of path planning. Consequently, these methods have been extended further away from basic robot planning into further difficult scenarios and diverse applications. A comprehensive survey of the growing body of work in sampling-based planning is given here. Simulations are executed to evaluate some of the proposed planners and highlight some of the implementation details that are often left unspecified. An emphasis is placed on contemporary research directions in this field. We address planners that tackle current issues in robotics. For instance, real-life kinodynamic planning, optimal planning, replanning in dynamic environments, and planning under uncertainty are discussed. The aim of this paper is to survey the state of the art in motion planning and to assess selected planners, examine implementation details and above all shed a light on the current challenges in motion planning and the promising approaches that will potentially overcome those problems.

Theta*: Any-Angle Path Planning on Grids

Article

Full-text available

Jan 2014
JAIR

Grids with blocked and unblocked cells are often used to represent terrain in robotics and video games. However, paths formed by grid edges can be longer than true shortest paths in the terrain since their headings are artificially constrained. We present two new correct and complete any-angle path-planning algorithms that avoid this shortcoming. Basic Theta* and Angle-Propagation Theta* are both variants of A* that propagate information along grid edges without constraining paths to grid edges. Basic Theta* is simple to understand and implement, fast and finds short paths. However, it is not guaranteed to find true shortest paths. Angle-Propagation Theta* achieves a better worst-case complexity per vertex expansion than Basic Theta* by propagating angle ranges when it expands vertices, but is more complex, not as fast and finds slightly longer paths. We refer to Basic Theta* and Angle-Propagation Theta* collectively as Theta*. Theta* has unique properties, which we analyze in detail. We show experimentally that it finds shorter paths than both A* with post-smoothed paths and Field D* (the only other version of A* we know of that propagates information along grid edges without constraining paths to grid edges) with a runtime comparable to that of A* on grids. Finally, we extend Theta* to grids that contain unblocked cells with non-uniform traversal costs and introduce variants of Theta* which provide different tradeoffs between path length and runtime.

Path Planning Techniques for Mobile Robots: Review and Prospect

Article

May 2023
EXPERT SYST APPL

Risk-based path planning for autonomous underwater vehicles in an oil spill environment

Article

Oct 2022
OCEAN ENG

Autonomous underwater vehicles (AUVs) are advanced platforms for detecting and mapping oil spills in deep water. However, their applications in complex spill environments have been hindered by the risk of vehicle loss. Path planning for AUVs is an effective technique for mitigating such risks and ensuring safer routing. Yet previous studies did not address path searching problems for AUVs based on probabilistic risk reasoning. This study aims to propose an offboard risk-based path planning approach for AUVs operating in an oil spill environment. A risk model based on the Bayesian network was developed for probabilistic reasoning of risk states given varied environmental observations. This risk model further assisted in generating a spatially-distributed risk map covering a potential mission area. An A*-based searching algorithm was then employed to plan an optimal-risk path through the constructed risk map. The proposed planner was applied in a case study with a Slocum G1 Glider in a real-world spill environment around Baffin Bay. Simulation results proved that the optimal-risk planner outperforms in risk mitigation while achieving competitive path lengths and mission efficiency. The proposed method is not constrained to AUVs but can be adapted to other marine robotic systems.

Deep Reinforcement Learning: A Survey

Article

Sep 2022

Deep reinforcement learning (DRL) integrates the feature representation ability of deep learning with the decision-making ability of reinforcement learning so that it can achieve powerful end-to-end learning control capabilities. In the past decade, DRL has made substantial advances in many tasks that require perceiving high-dimensional input and making optimal or near-optimal decisions. However, there are still many challenging problems in the theory and applications of DRL, especially in learning control tasks with limited samples, sparse rewards, and multiple agents. Researchers have proposed various solutions and new theories to solve these problems and promote the development of DRL. In addition, deep learning has stimulated the further development of many subfields of reinforcement learning, such as hierarchical reinforcement learning (HRL), multiagent reinforcement learning, and imitation learning. This article gives a comprehensive overview of the fundamental theories, key algorithms, and primary research domains of DRL. In addition to value-based and policy-based DRL algorithms, the advances in maximum entropy-based DRL are summarized. The future research topics of DRL are also analyzed and discussed.

Thermal Coal Price Forecasting via the Neural Network

Article

Apr 2022

Thermal coal price forecasts represent an essential issue to investors and policy makers, given its importance as a strategic energy source. The current work aims at exploring usefulness of non-linear auto-regressive neural networks for this forecast problem based upon a data-set of closing prices recorded on a daily basis of thermal coal traded in China Zhengzhou Commodity Exchange during January 4, 2016 – December 31, 2020, which is an important financial index not sufficiently explored in the literature in terms of its price forecasts. Through testing a variety of model settings over algorithms, delays, hidden neurons, and data splitting ratios, the model that produces performance of good accuracy and stabilities is reached. Particularly, the model has five delays and ten hidden neurons and is constructed with the Levenberg-Marquardt algorithm based on the ratio of 80%–10%–10% of the data for training–validation–testing. It leads to relative root mean square errors of 1.48%, 1.49%, and 1.47% for the training, validation, and testing phases, respectively. Usefulness of neural networks for the price forecast issue of thermal coal is demonstrated. Forecast results here could serve as standalone technical forecasts and be combined with other forecasts when conducting policy analysis that involves forming perspectives of trends in prices.

House price forecasting with neural networks

Article

Sep 2021

The house market has been rapidly growing for the past decade in China, making price forecasting an important issue to the people and policy makers. We approach this problem by exploring neural networks for forecasting of house prices from one hundred major cities for the period of June 2010–May 2019, serving as the first study with such wide coverage for the emerging Chinese market through a machine learning technique. We aim at constructing simple and accurate neural networks as a contribution to pure technical forecasting of house prices. To facilitate the analysis, we investigate different model settings over the algorithm (the Levenberg-Marquardt, scaled conjugate gradient, and Bayesian regularization), delay (from two to six), hidden neuron (two, three, five, and eight), and data spitting ratio (70%–15%–15%, 60%–20%–20%, and 80%–10%–10% for trainingvalidationtesting), and arrive at a rather simple neural network with only four delays and three hidden neurons that leads to stable performance of 1% average relative root mean square error across the one hundred cities for the training, validation, and testing phases. We demonstrate the usefulness of the machine learning approach to the house price forecasting problem in the Chinese market. Our results could be used on a standalone basis or combined with fundamental forecasting in forming perspectives of house price trends and conducting policy analysis. Our empirical framework should not be difficult to deploy, which is an important consideration to many decision makers, and has potential to be generalized for house price forecasting of other cities in China.

Energy-efficient path planning for a single-load automated guided vehicle in a manufacturing workshop

Article

May 2021
COMPUT IND ENG

With the aggravation of the global greenhouse effect and environmental pollution, energy saving and emission reduction have already become the consensus of the manufacturing industry to enhance sustainability. A material handling system is an essential component of a modern manufacturing system, and its energy consumption (EC) is a non-negligible part when evaluating the total production EC. As typical transport equipment, automated guided vehicles (AGVs) have been widely applied in various types of manufacturing workshops. Correspondingly, AGV path planning is usually a multi-objective optimization problem, and closely related to the workshop logistics efficiency and the smoothness of the whole manufacturing process. However, the optimization objectives that current AGV path planning research mostly focuses on are transport distance, time, and cost, while EC or EC-related environmental impact indicators are seldom touched on. To address this, an investigation into the energy-saving oriented path planning is executed for a single-load AGV in a discrete manufacturing workshop environment. Based on the analysis of AGV EC characteristics from the perspective of motion state and vehicle structure, transport distance and EC are selected as two optimization objectives, and an energy-efficient AGV path planning (EAPP) model is formulated. Further, two solution methods, i.e., the two-stage solution method and the particle swarm optimization-based method, are put forward to solve the established model. Moreover, the experimental study verifies the effectiveness of the proposed model and its solution methods and indicates that transport task execution order has a significant impact on AGV transport EC.

Multirobot coordination with deep reinforcement learning in complex environments

Article

Apr 2021
EXPERT SYST APPL

In the multiple autonomous robot system, it is very important to complete path planning coordinately and effectively in the processes of interference avoidance, resource allocation and information sharing. In traditional multirobot coordination algorithms, most of the solutions are in known environments, the target position that each robot needs to move to and the robot priority are set, which limits the autonomy of the robot. Only using visual information to solve the problem of multirobot coordination is still less. This paper proposes a multi-robot cooperative algorithm based on deep reinforcement learning to make the robot more autonomous in the process of selecting target positions and moving. We use the end-to-end approach, using only the top view, that is, a robot-centered top view, and the first-person view, that is, the image information collected from the first-person perspective of the robot, as input. The proposed algorithm, which includes a dueling neural network structure, can solve task allocation and path planning; we call the algorithm TFDueling. Through its perception and understanding of the environment, the robot can reach the target position without collision, and the robot can move to any target position. We compare the proposed algorithm, TFDueling, with different input structure algorithms, TDueling and FDueling, and with different neural network structures, TFDQN and TFDDQN. Experiments show that the proposed TFDueling algorithm has the highest accuracy and robustness.

Hybrid path planning using positioning risk and artificial potential fields

Article

Mar 2021
AEROSP SCI TECHNOL

Most conventional path generation algorithms search for an optimal path that avoids collisions with obstacles under the constraint of platforms' kino-dynamics. These conventional algorithms usually assume that a user position and obstacle locations are accurately known at any point in navigation environments. However, in a positioning network, the accuracy of a position estimate varies depending on, e.g., the ranging accuracy, network geometry, multipath error, and signal blockages which may lead to unexpected situations, including collision and low efficiency path planning. Therefore, positioning accuracy must be considered in path generation to ensure a reliable navigation capability and collision avoidance. To consider positioning accuracy in path planning, the proposed method in this paper uses a mixture of potential and positioning risk fields that generates a hybrid directional flow to guide an unmanned vehicle (UV) in a safe and efficient path. The results of simulations showed that the proposed method generated successful paths for around 90% percent of the tested routes, while using only the potential field method failed for around 50%. To demonstrate the effectiveness of the proposed local hybrid path planning method, we perform an experiment using a small-size quadcopter, and the results are analyzed and discussed.

Path planning optimization of indoor mobile robot based on adaptive ant colony algorithm

Article

Mar 2021
COMPUT IND ENG

In view of the shortcomings of traditional ant colony algorithm (ACO) in path planning of indoor mobile robot, such as a long time path planning, non-optimal path for the slow convergence speed, and local optimal solution characteristic of ACO, an improvement adaptive ant colony algorithm (IAACO) is proposed in this paper. In IAACO,firstly, in order to accelerate the real-time and safety of robot path planning, angle guidance factor and obstacle exclusion factor are introduced into the transfer probability of ACO;Secondly, heuristic information adaptive adjustment factor and adaptive pheromone volatilization factor are introduced into the pheromone update rule of ACO, to balance the convergence and global search ability of ACO; Finally, the multi-objective performance indexes are introduced to transform the path planning problem into a multi-objective optimization problem, so as to realize the comprehensive global optimization of robot path planning. The experimental results of main parameters selection, path planning performance in different environments, diversity of the optimal solution show that IAACO can make the robot attain global optimization path, and high real-time and stability performances of path planning.

A Hybrid Path Planning Strategy of Autonomous Underwater Vehicles

Conference Paper

Sep 2020

A comparative review on mobile robot path planning: Classical or meta-heuristic methods?

Article

Oct 2020
ANNU REV CONTROL

The involvement of Meta-heuristic algorithms in robot motion planning has attracted the attention of researchers in the robotics community due to the simplicity of the approaches and their effectiveness in the coordination of the agents. This study explores the implementation of many meta-heuristic algorithms, e.g. Genetic Algorithm (GA), Differential Evolution (DE), Particle Swarm Optimization (PSO) and Cuckoo Search Algorithm (CSA) in multiple motion planning scenarios. The study provides comparison between multiple meta-heuristic approaches against a set of well-known conventional motion planning and navigation techniques such as Dijkstra’s Algorithm (DA), Probabilistic Road Map (PRM), Rapidly Random Tree (RRT) and Potential Field (PF). Two experimental environments with difficult to manipulate layouts are used to examine the feasibility of the methods listed. several performance measures such as total travel time, number of collisions, travel distances, energy consumption and displacement errors are considered for assessing feasibility of the motion planning algorithms considered in the study. The results show the competitiveness of meta-heuristic approaches against conventional methods. Dijkstra ’s Algorithm (DA) is considered a benchmark solution and Constricted Particle Swarm Optimization (CPSO) is found performing better than other meta-heuristic approaches in unknown environments.

CPC Algorithm: Exact Area Coverage by a Mobile Robot Using Approximate Cellular Decomposition

Article

Oct 2020

A new coverage path planning (CPP) algorithm, namely cell permeability-based coverage (CPC) algorithm, is proposed in this paper. Unlike the most CPP algorithms using approximate cellular decomposition, the proposed algorithm achieves exact coverage with lower coverage overlap compared to that with the existing algorithms. Apart from a formal analysis of the algorithm, the performance of the proposed algorithm is compared with two representative approximate cellular decomposition-based coverage algorithms reported in the literature. Results of demonstrative experiments on a TurtleBot mobile robot within the robot operating system/Gazebo environment and on a Fire Bird V robot are also provided.

Robot path planning optimization method based on heuristic multi-directional rapidly-exploring tree

Article

Jul 2020
COMPUT ELECTR ENG

Aiming at the problems of low exploration efficiency and high optimal solution cost for existing robot path planning methods, a robot path planning optimization method based on heuristic multi-directional rapidly-exploring tree is implemented. In high-dimensional configuration space, drawing on the Rapid-exploration Random Tree (RRT) related algorithm idea, directional sampling control module works under the guidance of the robot goal course as a heuristic exploration. A flexible multi-directional rapidly-exploring tree construction method is used due to a degree of directional instability. In accordance with the principle of the centripetal growth of the tree, new multi-directional trees will be built on demand to arrive at specific coverage of the space. Then based on the previous path exploration vertices, through the merging method of the trees, a closed loop path is formed and optimized to finally generate a relative optimal path. Simulation experiment results show that this method could effectively improve the exploring efficiency with low computational cost.

Bayesian Stochastic Petri Nets (BSPN) - A New Modelling Tool for Dynamic Safety and Reliability Analysis

Article

Jul 2019
RELIAB ENG SYST SAFE

An efficient formalism for safety analysis should be: (i) able to consider the failure behaviour of complex engineering systems, and (ii) dynamic in nature to capture changing conditions and have wider applicability. The current formalisms used for safety analysis are lacking in one of the above-listed criteria. Bayesian network (BN) allows the modelling of failure of systems where the inter-nodal dependencies are represented exclusively by conditional probabilities. Stochastic Petri nets (SPN) enable the study of the dynamic behaviour of complex systems; however, they lack the ability to adapt to changes in the data and operating conditions. This paper proposes a hybrid formalism that strengthens SPN with BN capabilities. The proposed formalism is graphical and uses advance feature such as predicates to perform the data updating functions. This ability enables the analysis of continuous input data without the necessity of time-slice discretization process. The proposed formalism is termed “Bayesian Stochastic Petri Nets” (BSPN). It provides a dynamic assessment of safety by capturing additional sets of data rends. In BSPN, the conditional probability is captured as a time-dependent function to allow consideration of the cumulative effect of the failure scenario. The BSPN implementation is demonstrated with an example illustrating the modelling capabilities.

Matrix completion as a post-processing technique for probabilistic roadmaps

Article

Feb 2019

Inspired by the recent literature on matrix completion, this paper describes a novel post-processing algorithm for probabilistic roadmaps (PRMs). We argue that the adjacency matrix associated with real roadmaps can be decomposed into the sum of low-rank and sparse matrices. Given a PRM with n vertices and only (Formula presented.) collision-checked candidate edges, our algorithm numerically computes a relaxation of this decomposition, which estimates the status of all (Formula presented.) possible edges in the full roadmap with high accuracy, without performing any additional collision checks. Typical results from our experiments on problems from the Open Motion Planning Library indicate that after checking (Formula presented.) of the possible edges, the algorithm estimates the full visibility graph with (Formula presented.) accuracy. The practical utility of the algorithm is that the average path length across the resulting denser edge set is significantly shorter (at the cost of somewhat increased spatial complexity and query times). An ancillary benefit is that the resulting low-rank plus sparse decomposition readily reveals information that would be otherwise difficult to compute, such as the number of convex cells in free configuration space and the number of vertices in each. We believe that this novel connection between motion planning and matrix completion provides a new perspective on sampling-based planning and may guide future algorithm development.

UAV Collision Avoidance Based on Varying Cells Strategy

Article

Oct 2018

This paper presents a trajectory planning strategy for collision avoidance in unmanned aerial vehicles (UAVs). Firstly, a varying cells strategy is proposed to integrate aerodynamic constraints into trajectory planning. Basic avoidance actions in the varying cells strategy are adapted accordingly to go through different cells, enabling more flexible avoidance maneuvers. Secondly, given the limited decision-making time involved, offline and online path planning methods are developed to increase the convergence rate. Finally, Monte Carlo simulations demonstrate that the proposed method satisfies aerodynamic constraints, while both the convergence and collision avoidance rates are better than that achieved by fixed cell-based methods

Multi-objective multi-robot path planning in continuous environment using an enhanced Genetic Algorithm

Article

Aug 2018
EXPERT SYST APPL

This paper presents a hybrid approach for path planning of multiple mobile robots in continuous environments. For this purpose, first, an innovative Artificial Potential Field (APF) algorithm is presented to find all feasible paths between the start and destination locations in a discrete gridded environment. Next, an enhanced Genetic Algorithm (EGA) is developed to improve the initial paths in continuous space and find the optimal path between start and destination locations. The proposed APF works based on a time-efficient deterministic scheme to find a set of feasible initial paths and is guaranteed to find a feasible path if one exists. The EGA utilizes five customized crossover and mutation operators to improve the initial paths. In this paper, path length, smoothness, and safety are combined to form a multi-objective path planning problem. In addition, the proposed method is extended to deal with multiple mobile robot path planning problem. For this purpose, a new term is added to the objective function which measures the distance between robots and a collision removal operator is added to the EGA to remove possible collision between paths. To assess the efficiency of the proposed algorithm, 12 planar environments with different sizes and complexities were examined. Evaluations showed that the control parameters of the proposed algorithm do not affect the performance of the EGA considerably. Moreover, a comparative study has been made between the proposed algorithm, A* PRM, B-RRT and Particle Swarm Optimization (PSO). The comparative study showed that the proposed algorithm outperforms PSO as well as well-recognized deterministic (A*) and probabilistic (PRM and B-RRT) path planning algorithms in terms of path length, run time, and success rate. Finally, simulations proved the efficiency of the proposed algorithm for a four-robot path planning problem. In this case, not only the proposed algorithm determined collision-free paths, but also it found near optimal solution for all robots.

A Generalized Stochastic Petri Net model of route learning for emergency egress situations

Article

Apr 2018
ENG APPL ARTIF INTEL

Route learning is an essential activity for a person visiting a new environment. The element of forgetting a location (called decision point) along a route, where a change in direction is needed, is of immense importance especially during emergency evacuation scenarios. It is this element that has not been given the attention it deserves in developing a route learning algorithm. This work proposes a model of route learning in a new environment based on landmarks using generalized stochastic Petri nets because landmarks based route learning has been observed as a method natural to humans. The model takes information about landmarks along a route and associated navigation commands and then chooses whether to save this information as part of the learned route or not. The selection is made by exploiting stochastic transitions for which the firing rates are dependent on the type of landmark encountered at a decision point. The final output is a route having some decision points missing; resembling the situation that humans encounter after they visit a route in a new environment. The model results closely match empirical results obtained with human subjects.

Hindsight Experience Replay

Article

Jul 2017

Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum. We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. We show that our policies trained on a physics simulation can be deployed on a physical robot and successfully complete the task.

Grey Wolf optimizer for unmanned combat aerial vehicle path planning

Article

Sep 2016
ADV ENG SOFTW

Human-level control through deep reinforcement learning

Article

Feb 2015
NATURE

The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Multi-agent policy learning-based path planning for autonomous mobile robots

Abstract

No full-text available

Recommended publications

Research on Path Planning for Robots with Improved A* Algorithm under Bidirectional JPS Strategy

Path Planning of a Mobile Robot Based on the Improved Rapidly Exploring Random Trees Star Algorithm

Evaluation of a Smart Mobile Robotic System for Industrial Plant Inspection and Supervision

A quantitative investigation for deployment of mobile collaborative robots in high-value manufacturi...