Autonomous Robots (2021) 45:51–76
https://doi.org/10.1007/s10514-020-09947-4
Reinforcement based mobile robot path planning with improved
dynamic window approach in unknown environment
Lu Chang1 · Liang Shan1 · Chao Jiang2 · Yuewei Dai1,3
Received: 1 November 2019 / Accepted: 8 September 2020 / Published online: 30 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Mobile robot path planning in an unknown environment is a fundamental and challenging problem in the field of robotics.
Dynamic window approach (DWA) is an effective method of local path planning; however, some of its evaluation functions
are inadequate and the algorithm for choosing the weights of these functions is lacking, which makes it highly dependent
on the global reference and prone to fail in an unknown environment. In this paper, an improved DWA based on Q-learning
is proposed. First, the original evaluation functions are modified and extended by adding two new evaluation functions to
enhance the performance of global navigation. Then, considering the balance of effectiveness and speed, we define the state
space, action space and reward function of the adopted Q-learning algorithm for the robot motion planning. After that, the
parameters of the proposed DWA are adaptively learned by Q-learning and a trained agent is obtained to adapt to the unknown
environment. At last, through a series of comparative simulations, the proposed method shows higher navigation efficiency and
success rate in the complex unknown environment. The proposed method is also validated in experiments based on the XQ-4
Pro robot to verify its navigation capability in both static and dynamic environments.
Keywords Robot navigation · Path planning · DWA · Q-learning · Evaluation function
This work is supported by the Natural Science Foundation of Jiangsu
Province (BK20191286) and the Fundamental Research Funds for the
Central Universities (30920021139).
Electronic supplementary material The online version of this article
(https://doi.org/10.1007/s10514-020-09947-4) contains
supplementary material, which is available to authorized users.
Liang Shan (corresponding author)
shanliang@njust.edu.cn
Lu Chang
116110001162@njust.edu.cn
Chao Jiang
chao.jiang@uwyo.edu
Yuewei Dai
dywjust@163.com
1 School of Automation, Nanjing University of Science and Technology, Nanjing, Jiangsu, People’s Republic of China
2 Department of Electrical and Computer Engineering, University of Wyoming, Laramie, WY 82071, USA
3 School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, People’s Republic of China
1 Introduction
In the fields of industrial and agricultural production, intel-
ligent logistics, space exploration and emergency assis-
tance, the application of mobile robots has become more
widespread. Robots can carry different tools such as robotic arms, rangefinders and fire extinguishers to accomplish different tasks. The basis for completing these tasks is that the robot can move autonomously and adaptively.
Path planning is one of the key technologies of mobile
robots (Durrant 1994), which is described as finding a
collision-free path connecting the starting point and the tar-
get in the working environment. The optimization indicator
of the path can be selected as shortest length, minimum
travel time, minimum collision probability, passing through
specific locations, and so on. According to the environment
awareness and planning scope, the path planning algorithms
can be divided into (1) global path planning, where the robot
finds a collision-free path connecting the starting point with
the target under the known global environment map, which is
usually conducted once and obtains a reference global path;
(2) local path planning, where the robot moves according to
the real-time information acquired by range sensors such as
lidar (Zhang and Singh 2017), camera (Pinto et al. 2016),
ultrasonic (Kim and Kim 2010), which is generally applied
for obstacle avoidance and performed at each time step. In an
actual workspace, the surrounding environment of the robot
is relatively easy to detect with these sensors. For example, the lidar obtains the nearest obstacle distance in each direction, the camera obtains the image of a certain angle of view, and the ultrasonic sensor measures the obstacle speed. This information helps to model the surrounding environment. However, due to the lack of understanding of the whole workspace and some ambiguous factors like moving objects or pedestrians, the complete global map is difficult to obtain. Therefore,
the global path planning is hard to conduct in an unknown
environment and the local path planning is a better method
to realize the mobile robot autonomous navigation based on
the surrounding environment.
To illustrate the application scenario of this paper, the
robot task in a warehouse is taken as a motivating exam-
ple. The warehouse robot needs to transport goods from the
warehouse entrance to a certain position in the warehouse
every day, but the distribution of other goods in the ware-
house is unknown. The robot can only perceive its location
in the warehouse, the position of the target and the surround-
ing obstacles (pedestrians or other goods) within perception
range. In such circumstances, only the static global map
acquired offline is known to the robot, and the robot moves
mainly based on the local information it perceives in real time.
Therefore, this paper aims to provide an effective local path
planning approach to solve the robot navigation problem in
unknown environments.
In recent years, machine learning (ML) techniques have
been extensively exploited in autonomous robots which are
endowed with adaptive learning ability based on accumulated
knowledge. Reinforcement learning (RL) is an important
branch of ML, which continuously updates the agent’s action
policies according to the feedback from the environment by
trial-and-error. Benefiting from the development of simula-
tion technology and the accumulation of measured data, the
samples for training are increasingly available. RL is show-
ing promising results in the field of mobile robot navigation
for its powerful learning ability in complex environments.
The structure of path planning algorithms is refined and the
parameters of these algorithms are optimized by the richer
data. In addition, early studies also used ML to enhance the
awareness of the environment which is vast, dynamic or par-
tially non-structured (Das et al. 2016).
Focusing on the local path planning algorithm, this work
aims to improve the original dynamic window approach
(DWA) to enhance the global navigation capability, and
proposes a Q-learning based method to learn the opti-
mal parameter adaptation in DWA. Q-learning algorithm
(Watkins 1992) is one of the RL algorithms, which has
many features that make it suitable for robot path planning.
Firstly, the Q-learning algorithm does not require any prior
information of the environment, but accumulates the aware-
ness of environment through interacting with it. Similarly,
the robot does not have any information of the global map,
but moves only based on the action policy and the real-time
perception of the surrounding environment. Secondly, as an
off-policy method, the Q-learning algorithm makes past experience reusable for non-real-time learning, and the action used to update the Q-table does not have to be the action actually taken. This learning method
can effectively save the training cost and improve the training
efficiency for mobile robots. Thirdly, the training data of Q-
learning is unlabeled and not every single action is given an
immediate reward from the environment. Most of the time,
the overall reward of an action sequence is obtained after a
complete interaction is finished. Similarly, the environment
may not tell the robot whether a single action is right, but
gives a reward after each episode, that is, at the end of each path
planning (arriving at target, colliding with obstacles, being
forced to stop or other termination conditions). Finally, the
main goal of Q-learning algorithm is to train a general model
for the same set of tasks. Similarly, the trained model (agent
Q) on a map containing complex conditions can also obtain
satisfactory planning results in other environments.
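For reference, a minimal sketch of the tabular Q-learning update outlined above is given below; the state/action sizes, reward handling and hyper-parameters are illustrative placeholders, not the state space, action space or reward function defined later in Sect. 5.

```python
import numpy as np

# Minimal tabular Q-learning sketch; sizes and parameters are placeholders.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2     # learning rate, discount, exploration

def choose_action(state):
    """Epsilon-greedy action selection over the current Q-table."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    """Off-policy temporal-difference update: the maximum over the next state's
    actions is used, so the updating action need not actually be executed."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```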
Our contribution in this paper is two-fold. Firstly, we
improve the three evaluation functions in DWA by modi-
fying the insufficiency of the original functions and adding
two new functions to take into account special situations.
These five evaluation functions enhance the efficiency of the
robot navigation to the target under normal conditions, and
solve the problem of local optimum caused by the lack of
the global information. Secondly, an adaptive tuning method for the DWA parameters based on Q-learning is proposed. To the best
of our knowledge, our work in this paper is the first work to
combine the DWA with Q-learning and adjust the weights of
each evaluation function in DWA. Unlike other studies (to be discussed in Sect. 2) where the robot speed is directly generated by Q-learning, in this paper Q-learning is used
to adjust the parameters in DWA and the speed taken by the
robot is still generated by DWA, which ensures the coher-
ence and realizability of the path. The parameters adjusted
dynamically are the weights of each evaluation function and
the forward simulation time, which constitutes the action
space of the agent Q. The state space and reward function are
designed based on the robot pose and its relationship with the obstacle and target.
The rest of the paper is organized as follows. Section 2 introduces the related achievements on the common path planning algorithms, and analyzes their merits and demerits. Section 3 presents the definition and assumptions of the problem. Section 4 proposes the improved DWA by modifying and adding its evaluation functions, and implements the simulation to verify its effectiveness. Section 5 discusses the algorithm designed for parameter tuning in DWA based on Q-learning, and demonstrates the training process and the simulation results that show the superiority of the trained agent. Section 6 conducts the hardware experiments to validate the navigation performance of the proposed method in real unknown environments. Section 7 concludes this paper.
2 Related works
The content of this paper covers the path planning algorithm
of the mobile robot and its combination with RL. A summary of the relevant works and results in these areas is as follows.
Over the past few decades, the path planning technology
has been studied by many scholars. The methods based on
different ideas and map descriptions have been proposed,
such as potential field method (PFM), grid-based method,
sampling-based planning (SBP), intelligent algorithm, DWA,
RL, etc. In this section, we briefly introduce other kinds of
path planning algorithms, then focus on the DWA and RL
and analyze their shortcomings.
2.1 Overview of path planning algorithms
Grid-based method is a classic global path planning algo-
rithm that divides the map into the two-dimensional grids.
If there are obstacles close enough to a certain grid, this
grid is regarded as an unreachable grid. Otherwise, it is a reachable grid. A path connecting the starting point and the target will be found among the reachable grids. Repre-
sentative algorithms include A* (Hart et al. 1968), REA*
(Zhang et al. 2016), limited-damage A* (Bayili and Polat 2011), D* (González et al. 2017), D* Lite (Likhachev et al. 2008), Focused D* (Stentz 1995), etc. Among them, the A* algorithm and its improved methods are mainly used for
the shortest path calculation on the static map, while the
D* algorithm and its improved methods are for the situation
where the obstacle and target are alterable. The improvement
directions of the grid-based method generally are speeding
up the computation (Zhao et al. 2018), increasing the search-
ing direction to shorten the path (Xin et al. 2014) or ensuring
the safety and feasibility of the trajectory (Chang et al. 2019).
The grid-based method often depends on the known global
map and clear obstacle position, and the result of this method
is a reference path, which still requires a path tracking algorithm to guide the robot motion.
PFM (Khatib 1986) can be used in both global and local
path planning. Its basic principle is to establish a virtual
potential field where the target establishes a gravitational
field while the obstacle establishes a repulsive one. The robot
moves along the negative gradient direction of the superim-
posed field until it reaches the target. An enhanced PFM
(Li and Chou 2016) combined with the Levenberg-Marquardt (LM) algorithm and the k-trajectory algorithm was proposed
to overcome the inherent oscillation problem of the basic
PFM. In order to assist the human-robot interaction, an online
local PFM was defined by adapting animal motion attributes,
where the improved potential field enabled the robot to move
along the edge of the obstacle (Bence et al. 2016). The local
optimum and the goal non-reachable with obstacles nearby
(GNRON) are common problems of PFM. To solve these
problems, a stronger attractive function was proposed to
ensure that the robot reaches the target successfully and a
rotational force was introduced to allow the robot to escape
from the deadlock positions (Azzabi and Nouri 2019). PFM
is proved to be an effective path planning algorithm where the
effect of the environment is expressed as the action of force,
and the desired velocity is calculated by the resultant force.
The velocity can serve as the command signal of the motor.
However, PFM requires the positions or velocities of all the
obstacles, which are hard to detect in real time. Solu-
tions of the common problems like GNRON and trajectory
oscillation highly depend on the global map.
The biological or physical laws have been summarized
from the nature to serve various fields, which is called the
intelligent algorithm. Some latest research about the intel-
ligent algorithm applied in the robot path planning are as
bellow. A DEQPSO method combining with the differen-
tial evolution (DE) algorithm and quantum-behaved particle
swarm optimization (QPSO) was proposed, and a safe and
flyable path was generated in the presence of different threat
environments for the unmanned aerial vehicle (UAV) by this
method (Fu et al. 2013). To reduce the time of reaching
the optimal path, a generalized intelligent water drops algo-
rithm (IWD) was proposed based on fuzzy local. This method
divides the graph into equal sections, compares the paths on
them with a fuzzy inference system and determines the worth
of each solution by the comparison (Monfared and Salman-
pour 2015). The self-organising map (SOM) algorithm was
proposed as a solution to the multi-robot path planning prob-
lem for active perception and data collection tasks. (Best et al.
2017). An evolutionary approach to solve the mobile robot
path planning problem was proposed, which combined the
artificial bee colony (ABC) algorithm as a local search pro-
cedure and the evolutionary programming (EP) algorithm to
refine the feasible path found by a set of local procedures
(Contreras-Cruz et al. 2015). An application of the bacterial
foraging optimization (BFO) to the problem of mobile robot
navigation was explored to determine the shortest feasible
path to move from any current position to the target position
in an unknown environment with moving obstacles (Hos-
sain and Ferdous 2015). Intelligent algorithms may obtain
the best result on a particular map by iterative processes and
complex calculations, but are hard to execute when the global
map is unknown or dynamic. Besides this, intelligent algo-
rithms usually only obtain reference paths.
Sampling based planning (SBP) is unique in the fact
that planning occurs by sampling the configuration space
(C-space). In a sense SBP attempts to capture the con-
nectivity of the C-space by sampling it (Elbanhawi and
Simic 2014). The main SBP algorithm includes probabilis-
tic roadmap method (PRM), randomized potential plan-
ner (RPP), rapidly-exploring random trees (RRT), explor-
ing/exploiting tree (EET), expansive space trees (EST), etc.
The potential function based-RRT* incorporating the arti-
ficial potential field algorithm in RRT* was proposed to
decrease the number of iterations, which led to more
efficient memory utilization and an accelerated convergence
rate (Qureshi and Ayaz 2016). The notion of flexible PRM
was introduced to solve the problem of planning paths in
the workplaces containing obstacles and regions with pref-
erences expressed as degrees of desirability (Khaled et al.
2013). The EET planner deliberately trades probabilistic
completeness for computational efficiency. When the avail-
able information captures the structure, the planner became
increasingly exploitative. Otherwise, the planner increased
local configuration space exploration (Rickert et al. 2008).
Although some achievements have been made to address the uncertainty (Agha-Mohammadi et al. 2014) or kinodynamic constraints (Karaman and Frazzoli 2011) in active environments, there is still much room for improvement of SBP algorithms
amongst moving uncontrollable obstacles and under stochas-
tic dynamic and sensing conditions.
2.2 Path planning with dynamic window approach
The local path planning and obstacle avoidance problem was first formulated as a constrained velocity space optimization problem and solved by the curvature velocity method (CVM) (Simmons 1996). On the basis of CVM,
the more complete dynamic window approach (DWA) was
proposed which considered the robot physical constraints,
environmental constraints and current speed (Fox et al. 2002).
DWA algorithm first obtains a sampling window of the speed
based on the kinematics model and current speed of the robot,
then generates the trajectory for each set of speed within the
window, finally evaluates these trajectories by the evalua-
tion function to find the optimal speed at the next moment.
According to the sensor and robot pose information, the eval-
uation functions often consider three factors of speed, route
angle and distance form obstacle. This method can directly
obtain the desired linear and angular speed while consider-
ing the robot kinematics model, which makes the trajectory
smoother and suitable for robot motion. Viewing the DWA as
a model predictive control method, a version of DWA which
is tractable and convergent was proposed using the control
Lyapunov function (CLF) framework (Ogren and Leonard
2005). A method based on the integration of focused D* and
DWA with some adaptations providing efficient avoidance
of moving obstacles was proposed to enhance the naviga-
tion capability in partially unknown environments (Seder and
Petrović 2007). To improve the vulnerable performance of
DWA facing the trap situations, a global dynamic window
approach (GDWA) with a scalar-valued function represent-
ing the distance from the goal point was proposed (Kiss
and Tevesz 2012; Maroti et al. 2013). An energy efficient
local path planner in dynamic environments was presented,
which extended the DWA to incorporate a cost function based
on energy consumption predicted using a linear regression
(LR) model (Henkel et al. 2016). To improve the accep-
tance of robot navigation and adapt the trajectory to mimic
human behavior, the biomimetical DWA (BDWA) was pro-
posed whose reward function was extracted from real traces
of different motor disabilities navigating in a hospital envi-
ronment (Ballesteros et al. 2017). A FGM-DWA algorithm
was proposed to achieve the safe, smooth and fast navigation
where the follow the gap method (FGM) was to guide the
robot globally through the optimum gap and the DWA was
to avoid the oscillations or collisions caused by the robot
dynamics (Aykut and Volkan 2018).
However, the existing DWA is still not ideal in two aspects.
First, the mechanism of the existing evaluation function is
not reasonable enough, which may lead to a better trajec-
tory associated with a lower score. The number of evaluation
functions is inadequate to deal with some special situations
where the robot is very close to the target point or faces
the spiral obstacle distribution. Second, in the path planning
process, the degree of demand for each evaluation function is
time-varying, which changes with the positional relationship
between the robot, obstacle and target. However, the fixed
weights will not be able to adapt to the dynamic situation,
that is, the robot cannot decide which evaluation function
should be taken into special consideration under different
circumstances.
These two aspects may cause some unsatisfactory condi-
tions. For example, the speed of the robot is too low when
it is near the target, or the robot is too close to the obstacle
when avoiding it. When the obstacle is in the spiral distri-
bution, the robot may be trapped inside it and cannot escape
(called local minima problem). When facing dense obstacles,
the robot may not choose the shorter path between them but
bypasses these areas, which makes the overall path longer.
For the first aspect, GDWA was proposed to eliminate the
previously mentioned local minima problem by defining a
navigation function (NF) (Brock and Oussama 1999). NF
is a scalar-valued function defined on the reachable locations in the global map and has exactly one minimum, namely
the target position. The NF values at each reachable position
were obtained by global path planning algorithm like wave-
front propagation (WP) (Kiss and Tevesz 2012) or Dijkstra
(Maroti et al. 2013). The evaluation function of GDWA is
designed based on the defined NF using its value and gra-
dient, which is still a form of weighted summation. GDWA
eliminates the local minima problem in most cases, but may
fail to converge to the target position or still get trapped due to
the unreasonable weights or zero velocity. More importantly,
the definition of NF highly depends on the accurate global
map, and the idea of GDWA is similar to functional optimization. In this paper, we know little about the global map or the obstacles in it, so there is no such NF to optimize. We
only use the real-time detection of the surrounding obstacle.
For the second aspect, a version of GDWA changed the
evaluation function into a non-weighted form. This function only considers the position of the robot and the value of NF, which makes it contain only one term (Kiss 2012). Another
kind of method is adjusting each weight separately. For exam-
ple, the weight of evaluation subfunction considering the
velocity became adaptive according to the distance from the
obstacle (Wang et al. 2019), or the weight of one subfunc-
tion was adjusted when other weights were fixed (Xu 2017).
Without the global information or NF serving as the basis for
optimization, the existing optimization algorithms are hard
to apply. Adjusting weights separately splits the relationship
between them, which may miss the best weights combina-
tion. In contrast, our method adjusts all weights jointly and
chooses different combinations for different conditions.
2.3 Path planning with reinforcement learning
One of the most interesting approaches applied in path plan-
ning is RL. In the process of RL, the robot learns the best
strategy of path planning by interacting with the environment,
just like some living creatures. In the proposed approach, the
robot path planning was solved with Q-learning, which was
first introduced for learning with delayed rewards (Kröse
1995). One of the first application of Q-learning in robot
navigation utilized the example trajectories to bootstrap the
value function approximation and split the learning into two
phases (Smart and Kaelbling 2002). In order to improve the
hit rate and reduce the size of the Q-table, the number of
the states was limited based on a new definition for the state
space (Jaradat et al. 2011). A new methodology was proposed
for Q-learning with improved particle swarm optimization
(IPSO) to reduce the complexity of the classical Q-learning
and the robot’s energy consumption by saving turning angles
and path length (Das et al. 2015).
In recent years, there has been a new trend of combining Q-
learning with deep learning. Under the circumstance where
the global information is available, a new approach using
Q-learning and a neural network planner was proposed to
solve the problem of autonomous movement in environment
containing both static and dynamic obstacles (Duguleana
and Mogan 2016). A model-based path planning method
was proposed with Q-learning, whose Q-value was approx-
imated with a neural network named deep Q-learning and
the reward was calculated from the grid map (Sharma et al.
2017). A log-based reward function was introduced in deep
Q-learning to increase the success rate of obstacle avoid-
ance for the wheeled mobile robot (Mohanty et al. 2017).
Q-learning was also applied in collaborative path planning
system with holonic multi agent architecture (Lamini et al.
2015). Combining with Boltzmann policy to avoid trapping
in local optimum, Q-learning can remarkably improve the
efficiency of the multi-robot system, reducing the number of
explorations and converging the process (Wang et al. 2014).
In the aforementioned studies that applied Q-learning in robot navigation, the authors optimized the generation mode of Q-values or the selection of reward functions to reduce the operating cost of programs and robots, or extended the application to multiple agents. However, the state space only considers the positional relationship of the robot with the obstacle or target, but ignores the orientation and velocity of the robot, which may not fully describe the robot state. In addition, the action space is relatively simple, generally turning right or left, or moving one step in one of several directions (especially in a grid map). These blunt actions tend to make the path too incoherent to track and cause some better paths to be ignored. In contrast, Q-learning applied in our proposed
method is to enhance the performance of DWA, which pre-
serves the advantage of feasible and superior path of DWA.
2.4 Summary and motivation
As mentioned, most of the existing path planning algorithms
rely heavily on known static global maps. If the map has
dynamic obstacles, their state (location, shape or velocity)
needs to be highly understood. Therefore, these algorithms
may not perform well in environments with unknown static
and dynamic obstacles. In addition, they often obtain the
reference path rather than velocity command, so more sen-
sors or methods should be adopted to track the path, which
increases the complexity of the navigation and computation
for the mobile robot.
In this paper, the navigation scenario contains only the
outline of the global map but not the obstacles in it, and the
obstacles are detected in real time by the sensors on the robot,
so the path planning algorithm can only use the information
of surrounding obstacles. Meanwhile, to simplify the naviga-
tion framework, it is better for the algorithm to obtain the desired velocity directly instead of the desired path. Therefore, we find that DWA meets these requirements and is suitable as the path planning algorithm. As for the two defects of DWA analyzed above, we correct them by enhancing the evaluation functions and making the parameters self-adaptive based on Q-learning.
3 Problem definition and assumptions
Similar to the warehouse task above, we define the navigation problem considered in this paper in general. The initial positions of the robot and target are preset, and the robot only knows the outline of the global map (like the walls of the warehouse), which means the obstacles inside the map are completely unknown. The task of the robot is to reach the target while keeping a safe distance from the obstacles. The robot
can reach the target with any velocity and does not need to
stop at the target.
Besides the outline of the global map, all the information the robot can get is obtained by the various sensors it is equipped with (gyroscope, lidar, encoder, etc.). At each time step, the infor-
mation required by the proposed algorithm can be obtained
based on these sensors. Concretely, we make the following
assumptions about the navigation problem.
Assumption 1 The location, orientation and velocity of the
robot in the global map are known at each time step.
Assumption 2 The position of the target in the global map is
known at each time step.
Assumption 3 The distance of the nearest obstacle in each
direction of a circle is known at each time step (realized by
the lidar on the robot). The structure of these data is expressed
as [Θ, D], where Θ = [θ_1, θ_2, ..., θ_n]^T are angles in the range [0, 2π) with a certain resolution; D = [d_1, d_2, ..., d_n]^T are the distances of the nearest obstacle in the i-th angle, and d_i is not greater than the detection range; n is the angular resolution of the lidar, meaning that n data pairs of angle and distance will be obtained after the lidar scans a circle.
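As an illustration of Assumption 3, the scan can be held as the paired arrays [Θ, D] and converted to obstacle points for the planner; this is a sketch under the stated assumption of n evenly spaced beams, and the function and variable names are our own.

```python
import numpy as np

# Hypothetical lidar scan with n beams over [0, 2*pi); theta and d correspond
# to the [Theta, D] structure of Assumption 3 (detection range of 5 m assumed).
n = 360
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
d = np.full(n, 5.0)

def scan_to_points(x, y, yaw, theta, d, max_range=5.0):
    """Convert the (angle, distance) pairs to obstacle points in the global
    frame, dropping beams that hit nothing (distance at the detection range)."""
    hit = d < max_range
    gx = x + d[hit] * np.cos(yaw + theta[hit])
    gy = y + d[hit] * np.sin(yaw + theta[hit])
    return np.stack([gx, gy], axis=1)
```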
The robot in this paper is differential driven, and we only
consider its forward and rotational motion. The kinematics
model of the robot is shown in Eq. (1).
\dot{x} = v\cos\theta, \quad \dot{y} = v\sin\theta, \quad \dot{\theta} = \omega, \quad \dot{v} = a, \quad \dot{\omega} = \alpha    (1)

where [x, y] is the position of the robot; θ is the orientation of the robot; v and ω are the linear velocity and angular velocity of the robot; a and α are the linear acceleration and angular acceleration of the robot.
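A minimal sketch of integrating the kinematics model of Eq. (1) with a fixed time step, which is also how a candidate speed pair (v, ω) can be rolled out into a predicted trajectory in the next section; the function names and the simple Euler integration are our own assumptions.

```python
import math

def motion(state, v, omega, dt):
    """One Euler step of the model in Eq. (1); state = [x, y, theta],
    with v and omega held constant over dt."""
    x, y, theta = state
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return [x, y, theta]

def predict_trajectory(state, v, omega, dt, horizon):
    """Roll the model forward over the prediction period T = horizon and
    return the list of poses forming the predicted arc."""
    traj = [list(state)]
    t = 0.0
    while t < horizon:
        state = motion(state, v, omega, dt)
        traj.append(list(state))
        t += dt
    return traj
```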
4 Improved DWA algorithm
4.1 Basic DWA algorithm
DWA algorithm transforms the path planning problem into
the constrained optimization problem of the velocity space,
and controls the robot motion by outputting the real-time
optimal speed.
The speed set (v, ω) is subject to the following restrictions:

(1) The velocity of the robot cannot exceed the maximum:

0 \le v_t \le v_{max}, \quad -\omega_{max} \le \omega_t \le \omega_{max}    (2)

where v_t and ω_t are the linear speed and angular speed of the robot at time t; v_max and ω_max are the maximum linear and angular velocities.

(2) The acceleration of the robot cannot exceed the maximum:

v_t - a_{max}\Delta t \le v_{t+1} \le v_t + a_{max}\Delta t, \quad \omega_t - \alpha_{max}\Delta t \le \omega_{t+1} \le \omega_t + \alpha_{max}\Delta t    (3)

where v_{t+1} and ω_{t+1} are the linear speed and angular speed of the robot at time t+1; a_max and α_max are the maximum linear acceleration and angular acceleration of the robot; Δt is the time step.
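The admissible velocity window implied by Eqs. (2) and (3) is the intersection of the absolute speed limits and the velocities reachable within one time step; a sketch under these assumptions follows, with the sampling resolutions taken from Table 1 further below (names are ours).

```python
import numpy as np

def dynamic_window(v_t, omega_t, v_max, omega_max, a_max, alpha_max, dt):
    """Velocity window for the next time step: the intersection of the absolute
    limits of Eq. (2) with the accelerations reachable within dt, Eq. (3)."""
    v_lo = max(0.0, v_t - a_max * dt)
    v_hi = min(v_max, v_t + a_max * dt)
    w_lo = max(-omega_max, omega_t - alpha_max * dt)
    w_hi = min(omega_max, omega_t + alpha_max * dt)
    return v_lo, v_hi, w_lo, w_hi

def sample_velocities(window, dv=0.01, dw=np.pi / 36):
    """Enumerate candidate (v, omega) pairs inside the window with the speed
    resolutions of Table 1."""
    v_lo, v_hi, w_lo, w_hi = window
    return [(float(v), float(w))
            for v in np.arange(v_lo, v_hi + 1e-9, dv)
            for w in np.arange(w_lo, w_hi + 1e-9, dw)]
```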
The feasible speed sets are obtained by sampling the
velocity space which matches the kinematics model, and the
predicted trajectories are generated by these speed sets and
the trajectory prediction time period T. These trajectories,
which will be some arcs, are scored by the evaluation func-
tion to find the best one and the corresponding speed set. The
original evaluation function is shown in Eq. (4).
J(v, \omega) = \sigma[\, w_1 \cdot heading(v, \omega) + w_2 \cdot obsdist(v, \omega) + w_3 \cdot velocity(v, \omega) \,]    (4)

Equation (4) contains three evaluation subfunctions and their coefficients w_1, w_2, w_3; σ denotes the normalization process.
Concretely, function heading(v, ω) calculates the angle
θ between the orientation of the robot at the end of the predicted trajectory and the direction from that position to the
target. This function evaluates the degree of the trajectory
being toward the target. The schematic diagram of angle θ
is shown in Fig. 1a. The smaller the angle is, the higher the
score will be.
Function obsdist(v, ω) calculates the minimum distance from each point of the predicted trajectory to the obstacle. This function evaluates the degree of the trajectory being away from the obstacle. The schematic diagram of finding the minimum distance is shown in Fig. 1b. Taking three points as an example, d_2 is the minimum distance. The farther the
distance is, the higher the score will be. If the minimum
distance of a trajectory is less than the safe distance, this
trajectory will be discarded directly and removed from the
sampling space.
Fig. 1 Schematic diagram of (a) the angle θ, (b) the minimum distance d

Function velocity(v, ω) evaluates the linear and angular velocity of the robot, which favors fast and straight movement. The higher the linear speed and the smaller the angular velocity are, the higher the score will be.
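For concreteness, a sketch of how the three original subfunctions can be computed for one predicted trajectory (a list of poses [x, y, θ]); the helper names and the exact combination used for the speed score are our assumptions based on the description above, not code from the paper.

```python
import math
import numpy as np

def heading_score(traj, goal):
    """Angle term of Eq. (4): deviation between the robot heading at the end of
    the predicted trajectory and the direction from that point to the target.
    A smaller deviation yields a larger score."""
    x, y, theta = traj[-1]
    angle_to_goal = math.atan2(goal[1] - y, goal[0] - x)
    dev = abs(math.atan2(math.sin(angle_to_goal - theta),
                         math.cos(angle_to_goal - theta)))
    return math.pi - dev

def obsdist_score(traj, obstacles, safe_dist):
    """Obstacle term of Eq. (4): minimum distance from any trajectory point to
    any obstacle point (obstacles is an (N, 2) array). Returns None to signal
    that the trajectory violates the safe distance and must be discarded."""
    pts = np.array([[p[0], p[1]] for p in traj])
    d = float(np.min(np.linalg.norm(pts[:, None, :] - obstacles[None, :, :], axis=2)))
    return None if d < safe_dist else d

def velocity_score(v, omega, omega_max):
    """Speed term of Eq. (4): favor fast, straight motion (our assumed form)."""
    return v + (omega_max - abs(omega))
```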
4.2 Improved DWA algorithm
The basic DWA algorithm only contains three kinds of
evaluation functions, which is not adaptable to unknown
environments. The robot may hesitate in front of the obstacle
for a long time and be easy to fall into the spiral obstacle,
caused by the lack of the number of the existing evaluation
functions and the mechanism of these function being not
quite appropriate. In this section, we modify the existing eval-
uation functions to improve the score of better trajectories in
each specific subfunction and add two new subfunctions to
consider more complex situations.
4.2.1 Modified subfunction heading(v, ω)
This modification is to reduce the negative effects that func-
tion heading(v, ω) may cause when the robot approaches
the target, whose method is changing the reference position
used to calculate the angle.
For the predicted trajectory to be prospective enough, the
trajectory prediction time period T is generally dozens of times of the time step, that is, T = (10–30)Δt. Function
heading(v, ω) calculates the angle based on the final robot
position of the predicted trajectory, which is the position the
robot can reach by moving at the current speed for the whole
time period T. However, the speed will be re-selected after
Δt, which means that the robot will only move along a short
section of the whole predicted trajectory.
Figure 2 shows the comparison of the two predicted trajectories. Obviously, trajectory A is closer to the ideal one, which means the velocity corresponding to trajectory A is better. So the score of trajectory A should be higher than that of trajectory B. If the reference position is at the end of the predicted trajectory, we can see from Fig. 2 that θ_{B,o} < θ_{A,o}, which makes trajectory A get the lower score. If the reference position is set somewhere close to the current position, we can see from Fig. 2 that θ_{A,n} < θ_{B,n}. It means trajectory A will get the higher score, which makes the robot turn to the target in time.
Fig. 2 Trajectory comparison for function heading(v, ω). Notations: (1) The black arc means the predicted trajectory. (2) The green arc means the ideal trajectory. (3) The green star means the target. (4) The grey rectangle (if there is one) means the obstacle. (5) The ideal trajectory, target and obstacle are the same in the comparison. These notations are also applicable below (Color figure online)
The distance from the current position, which is called the travelled distance, is directly determined by the number of time steps. In practice, the travelled distance d should be set first, and the number of time steps required to move such a distance can be estimated from the current speed. Generally, when d is short enough, the robot trajectory can be considered a straight line, so the number of time steps can be estimated by Eq. (5).

n_{\Delta t} = \mathrm{fix}(d / v)    (5)

where v is the current linear speed; fix(·) is the rounding function.

When operating the function heading(v, ω), the reference position of the robot is the position after n_{Δt} time steps in the predicted trajectory.
Finally, function heading(v, ω) returns an angle between
the orientation of the new reference position and the line
pointing from the new reference position to the target.
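A sketch of the modified heading(v, ω) evaluation: instead of converting the travelled distance d to a step count with Eq. (5), this version simply walks along the stored trajectory until the travelled distance exceeds d, which is equivalent for the short, nearly straight segments considered here; the names are our own.

```python
import math

def heading_modified(traj, goal, d_ref):
    """Modified heading(v, omega): evaluate the deviation angle at a reference
    position roughly d_ref metres along the predicted trajectory (Sect. 4.2.1)
    rather than at the trajectory end."""
    travelled, idx = 0.0, 0
    for i in range(1, len(traj)):
        travelled += math.hypot(traj[i][0] - traj[i - 1][0],
                                traj[i][1] - traj[i - 1][1])
        idx = i
        if travelled >= d_ref:          # stop once the travelled distance reaches d_ref
            break
    x, y, theta = traj[idx]
    angle_to_goal = math.atan2(goal[1] - y, goal[0] - x)
    dev = abs(math.atan2(math.sin(angle_to_goal - theta),
                         math.cos(angle_to_goal - theta)))
    return math.pi - dev                # smaller deviation -> higher score
```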
4.2.2 Modified subfunction obsdist(v, ω)
This modification is to reduce the negative effects that func-
tion obsdist(v, ω) may cause in the environment with many
obstacles, whose method is similar to that described in Sec-
tion 4.2.1.
Figure 3 shows the comparison of two predicted trajectories. Obviously, trajectory A is closer to the ideal one, which means the velocity corresponding to trajectory A is better. In terms of the original function, as can be seen from Fig. 3, trajectory A will be discarded since d_{A,o} is less than the safe distance. Actually, all the trajectories similar to trajectory A will suffer the risk of being discarded while the trajectories similar to trajectory B can get a higher score, which makes the navigation poor. Function obsdist(v, ω) has two effects: trajectory scoring and discarding.

Fig. 3 Trajectory comparison for function obsdist(v, ω)

The trajectory discarding part should be more cautious as it removes a possible speed set directly, which will then not be evaluated by the other subfunctions, and the speed is re-selected at each time step. So a trajectory similar to trajectory A that will not collide with the obstacle temporarily should not be discarded. The trajectory scoring part remains unchanged. When discarding the trajectory, the minimum distance from the obstacle should be calculated from the beginning of the trajectory to the position after several time steps. As can be seen from Fig. 3, d_{A,n} is similar to d_{B,n}, and both are greater than the safe distance, so the two trajectories can be retained and evaluated by other subfunctions.
The estimation of the number of time steps is similar to Eq. (5). Since the purpose of this function is to avoid obstacles, the value of the travelled distance d should be slightly larger than that of the function heading(v, ω).

Finally, function obsdist(v, ω) returns the shortest distance between the predicted trajectory and obstacles.
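A sketch of the modified obsdist(v, ω) logic: the returned score still uses the whole trajectory, while the discard test only checks the early part of the trajectory up to the (slightly larger) travelled distance d; the names and array layout are our own.

```python
import numpy as np

def obsdist_modified(traj, obstacles, safe_dist, d_check):
    """Modified obsdist(v, omega): score with the minimum obstacle distance over
    the whole trajectory, but discard (return None) only if the early part of
    the trajectory, within the first d_check metres, violates the safe distance."""
    pts = np.array([[p[0], p[1]] for p in traj])
    dists = np.min(np.linalg.norm(pts[:, None, :] - obstacles[None, :, :], axis=2), axis=1)

    # Travelled distance at each trajectory point, used to select the early segment.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    travelled = np.concatenate(([0.0], np.cumsum(seg)))
    early = travelled <= d_check

    if np.min(dists[early]) < safe_dist:
        return None                      # imminent collision on the early segment: discard
    return float(np.min(dists))          # otherwise keep the whole-trajectory score
```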
4.2.3 Modified subfunction velocity(v, ω)
This modification is to reduce the negative effects that func-
tion velocity(v, ω) may cause when the robot needs to move
slowly and circuitously.
The score of this function is determined by the linear and angular speed of the trajectory, which is to allow the robot to move as fast as possible and avoid unnecessary turning. Figure 4 shows the comparison of two predicted trajectories that share the same linear speed, while the angular speed of trajectory A is greater than that of trajectory B. Obviously, trajectory A is better, but trajectory B will get the higher score using the original function, where the scores of the two speeds are independently calculated and simply added. Under the same situation, the higher the linear speed is, the faster the robot reaches the target. Therefore, the scoring function of the linear speed does not need to change. In the environment with many obstacles, it is more reasonable to move circuitously with slower linear speed and larger angular speed, and the larger angular speed will not increase the risk of collision with obstacles too much when the linear speed is slower. Therefore, we modify the scoring function of the angular speed as Eq. (6).

Fig. 4 Trajectory comparison for function velocity(v, ω)

Fig. 5 Schematic diagram of the minimum distance d
\omega^{*} = \omega_{max} - k \, \frac{v}{v_{max}} \, \omega    (6)

where the term v/v_max is added to measure the current linear speed and k is a parameter.
Finally, function velocity(v, ω) returns the score of the current speed, which is calculated by Eq. (7), where σ denotes the normalization of the two speeds.

velocity(v, \omega) = \sigma(\omega^{*} + v)    (7)
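A sketch of the modified velocity(v, ω) score following Eqs. (6) and (7) as reconstructed above; taking the absolute value of ω and deferring the normalization σ (applied later over all candidates) are our assumptions.

```python
def velocity_modified(v, omega, v_max, omega_max, k=1.0):
    """Raw score of Eqs. (6)-(7): the angular-speed penalty is scaled by the
    current linear speed, so slow, circuitous motion among obstacles is not
    punished for turning. Normalization (sigma) happens over all candidates."""
    omega_term = omega_max - k * (v / v_max) * abs(omega)   # Eq. (6)
    return omega_term + v                                    # Eq. (7) before normalization
```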
4.2.4 Added subfunction goaldist(v, ω)
Function goaldist(v, ω) calculates the minimum distance from each point of the predicted trajectory to the target. This function is to evaluate the degree of the trajectory being close to the target. The weight of this function is w_4. Figure 5 shows the method to find the minimum distance. Taking three points as an example, d_2 is the minimum distance. The closer the distance is, the higher the score will be.
This function is added to enhance the motion trend towards the target point especially when there are obstacles around it. Figure 6 shows the situation to compare the two predicted trajectories. Obviously, trajectory A is closer to the ideal one, which means the velocity corresponding to trajectory A is better. In terms of the original evaluation functions, we can see from Fig. 6 that θ_A ≈ θ_B and d_{A,o} < d_{B,o}, which results in the scores of the two trajectories being almost the same, so it is hard to choose the better one. After adding function goaldist(v, ω), we can find in Fig. 6 that d_{A,g} < d_{B,g}, which helps trajectory A to get the higher score.

Fig. 6 Trajectory comparison for function goaldist(v, ω)
Note that the subfunction heading(v, ω) is also to make
the robot move toward the target, but keeping these two
subfunctions is still necessary as they have different effects
and applicable conditions. Though the reference position in
subfunction heading(v, ω) has been modified, it is still pos-
sible that this position in the trajectory overshoots the target,
especially when the robot is very close to the target, which
will adversely impact the navigation effect. At this time,
subfunction goaldist(v , ω) can help the robot turn to the
target with a steady speed. On the other hand, if the robot
is still far away from the target, the minimum distances
to the target of all the trajectories will be large and simi-
lar, which greatly limits the discrimination of subfunction
goaldist(v , ω). At this time, the robot mainly depend on the
subfunction heading(v, ω) to navigate to the target. Actu-
ally, the subfunction goaldi st(v, ω) is designed only to work
when the minimum distance to the target is less than 2m.In
general, these two subfunctions contribute to the robot mov-
ing to the target from the view of orientation and distance, and
work in different conditions, so they are both indispensable.
Finally, function goaldi st(v, ω) returns the shortest dis-
tance between the predicted trajectory and the target.
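A sketch of the added goaldist(v, ω) term, which returns the minimum distance from the predicted trajectory to the target; as described above, the caller only applies this term when the robot is within roughly 2 m of the target (names assumed).

```python
import numpy as np

def goaldist(traj, goal):
    """Added goaldist(v, omega): minimum distance from the predicted trajectory
    to the target. Per Sect. 4.2.4, this term is only used by the caller when
    the robot is within roughly 2 m of the target."""
    pts = np.array([[p[0], p[1]] for p in traj])
    return float(np.min(np.linalg.norm(pts - np.asarray(goal, dtype=float), axis=1)))
```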
4.2.5 Added subfunction oscillation(v, ω)
Function oscillation(v, ω) is to evaluate the proximity of the predicted trajectory and the historical trajectory. The weight of this function is w_5. We describe the map as a two-dimensional cell group H with a certain resolution, which is similar to the grid map, to store the information of the historical trajectory. Each cell has a cost to describe how close the historical trajectory is to this cell. The higher the value of the cell is, the easier it is for the robot to stagnate or wander near this position. The cell group H is updated after each robot movement. The cost of the cells on or near the robot position will be updated with the method shown in Fig. 7, where P is the robot position at time t and R_rec is the radius of the circular area affected by the robot where the cell cost should be updated.

Fig. 7 Updating method of the cell cost
We establish the coordinate system Ω_P with the origin P, where d_ij means the distance from the cell with coordinates (i, j) to the origin P, and w_ij means the cost to be added to cell (i, j) at time t, which is calculated by Eq. (8).

w_{ij} = \frac{(R_{rec} - d_{ij}) \, v_t}{R_{rec} \, v_{max}}    (8)

Adding the speed term to the cost calculation can avoid a rapid increase in the cost of the surrounding cells when the robot speed is slower.
Figure 8a shows the trajectory of the robot obtained by the improved evaluation function in the map with simple obstacles, and the cell group H generated by this trajectory. The color of each cell represents its cost, i.e. the darker the color is, the higher the cost is. The score of the function oscillation(v, ω) of a trajectory is the total cost of the cells swept over by this trajectory. For example, as for the trajectory shown in Fig. 8b, its score is the sum of the costs of all the red cells.
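A sketch of maintaining the cell group H with the cost update of Eq. (8) around the robot position, together with the trajectory scoring described above; the grid resolution, radius R_rec and class layout are our own choices.

```python
import math
import numpy as np

class OscillationMap:
    """Cell group H of Sect. 4.2.5: a grid over the map that accumulates a cost
    around every position the robot has already visited."""

    def __init__(self, width, height, resolution=0.1):
        self.res = resolution
        self.H = np.zeros((int(height / resolution), int(width / resolution)))

    def update(self, x, y, v_t, v_max, r_rec=0.5):
        """Add cost to cells within radius r_rec of the robot following Eq. (8):
        w_ij = (R_rec - d_ij) * v_t / (R_rec * v_max)."""
        ci, cj = int(y / self.res), int(x / self.res)
        r = int(r_rec / self.res)
        for i in range(max(ci - r, 0), min(ci + r + 1, self.H.shape[0])):
            for j in range(max(cj - r, 0), min(cj + r + 1, self.H.shape[1])):
                d_ij = math.hypot((i - ci) * self.res, (j - cj) * self.res)
                if d_ij < r_rec:
                    self.H[i, j] += (r_rec - d_ij) * v_t / (r_rec * v_max)

    def oscillation(self, traj):
        """Score of oscillation(v, omega): total cost of the cells swept by the
        predicted trajectory (each swept cell counted once)."""
        cells = {(int(p[1] / self.res), int(p[0] / self.res)) for p in traj}
        return sum(self.H[i, j] for i, j in cells
                   if 0 <= i < self.H.shape[0] and 0 <= j < self.H.shape[1])
```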
This function is added to prevent the robot from returning to the place travelled or going around in circles, which is especially important when the obstacle is in a spiral distribution.

Fig. 8 (a) Trajectory of the robot with its cell group H, (b) Cells to calculate the oscillation(v, ω) value

Fig. 9 Trajectory comparison for function oscillation(v, ω)
Figure 9 shows the comparison of the two predicted trajectories. Obviously, the robot should move along trajectory A rather than trajectory B. In terms of the original evaluation functions, we can see from Fig. 9 that θ_A ≈ θ_B and d_{A,o} < d_{B,o}, which results in the scores of the two trajectories being almost the same, so it is hard to choose the better one. After adding function oscillation(v, ω), we can find in Fig. 9 that the cell cost in the red circle of trajectory B is larger, while trajectory A overlaps little with the historical trajectory. This difference means the proximity of trajectory B and the historical trajectory is significantly higher than that of trajectory A, which helps trajectory A to get the higher score.
Finally, function oscillation(v, ω) returns the total cost of all the cells swept by the predicted trajectory.
After modifying and adding the evaluation subfunctions,
the improved evaluation function is shown in Eq. (9).
J(v, \omega) = \sigma[\, w_1 \cdot heading(v, \omega) + w_2 \cdot obsdist(v, \omega) + w_3 \cdot velocity(v, \omega) + w_4 \cdot goaldist(v, \omega) + w_5 \cdot oscillation(v, \omega) \,]    (9)
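Putting the pieces together, a sketch of evaluating all candidate speed pairs with Eq. (9): each subfunction is computed per candidate, normalized across candidates (the σ operator), weighted, and summed; goaldist and oscillation values are negated before normalization since smaller raw values should score higher, and trajectories discarded by the obsdist safety check never win. The normalization scheme and handling of discarded candidates are our assumptions.

```python
import numpy as np

def evaluate_candidates(raw_scores, weights):
    """Combine the five subfunctions as in Eq. (9).
    raw_scores maps each subfunction name to a list with one raw value per
    candidate (v, omega); obsdist entries are None for discarded trajectories.
    weights is (w1, ..., w5). Returns the index of the best candidate."""
    names = ["heading", "obsdist", "velocity", "goaldist", "oscillation"]
    n = len(raw_scores["heading"])
    total = np.zeros(n)
    valid = np.array([s is not None for s in raw_scores["obsdist"]])
    for name, w in zip(names, weights):
        vals = np.array([s if s is not None else 0.0 for s in raw_scores[name]],
                        dtype=float)
        if name in ("goaldist", "oscillation"):
            vals = -vals                      # smaller distance / swept cost is better
        rng = vals.max() - vals.min()         # sigma: normalize over all candidates
        if rng > 1e-9:
            vals = (vals - vals.min()) / rng
        total += w * vals
    total[~valid] = -np.inf                   # discarded trajectories never win
    return int(np.argmax(total))
```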
Table 1 Parameters of the robot
Parameter name               | Parameter value
Safety radius R              | 0.4 m
v_max                        | 1 m/s
ω_max                        | 2π rad/s
a_max                        | 0.5 m/s²
α_max                        | 6π rad/s²
Resolution of linear speed   | 0.01 m/s
Resolution of angular speed  | π/36 rad/s
4.3 Simulation analysis
In this section, five simulations in different scenarios are car-
ried out to prove the effectiveness of the improved DWA
algorithm, and the performance is compared with the original
one. These simulations are implemented in smaller environ-
ments which are similar to the ones in Sect. 4.2 to show the
effect of the improved evaluation functions, and in larger
environments with discrete or spiral obstacle distributions to
show the overall performance of the proposed algorithm.
In order to make the simulation results more practical,
the kinematic model of the robot is set the same as the real
robot used in the hardware experiments. The resolution of
speed balances the navigation performance and computation
cost. Considering the real radius of the robot, the indicator
of reaching the target is the distance between the robot and
target being less than 0.05m. The speed and orientation of the
robot are set to zero at the beginning. The robot parameters
are shown in Table 1.
The parameters in the proposed algorithm are also set to ensure comparability, as shown in Table 2. Weights of the namesake evaluation subfunctions (w_1, w_2 and w_3) are the same. Some parameters in these subfunctions (travelled distance in heading(v, ω) and obsdist(v, ω)) and weights of the added evaluation subfunctions (w_4 and w_5) are unique to the improved DWA. The two travelled distances are chosen according to the analysis in Sects. 4.2.1 and 4.2.2. Safety is considered the primary standard of robot motion, so the weight of obsdist(v, ω) (w_2) is twice as much as those of the other evaluation functions, and the weights of the other evaluation functions are the same. All the simulations in this subsection share the same parameters.
4.3.1 Simulation scenario 1
The first simulation scenario is carried out in the environ-
ment similar to Fig. 3, which is to show the effect of the
modified subfunction obsdist(v, ω). The parameters of the
map are shown in Table 3. Figure 10a, b respectively shows
the trajectory of the original and improved DWA algorithm.
Table 2 Parameters of DWA
Parameter name                          | Parameter value
w_1                                     | 1
w_2                                     | 2
w_3                                     | 1
w_4                                     | 1
w_5                                     | 1
Travelled distance in heading(v, ω)     | 0.5 m
Travelled distance in obsdist(v, ω)     | 0.8 m
Table 3 Parameters of simulation scenario 1
Parameter name       | Parameter value
Map size             | 9 m × 6 m
Starting position    | (1 m, 1.5 m)
Target position      | (8 m, 5.5 m)
Initial orientation  | 0 rad
Fig. 10 Trajectory obtained by the (a) original DWA, (b) improved DWA in simulation scenario 1
Obviously, the improved trajectory is better since the robot
moves through the obstacles and directly to the target. If there
are more obstacles under the target, the original trajectory
may become even more sub-optimal. The main reason for this
is that the modified subfunction obsdist(v, ω) changes the way of discarding the trajectory, so that trajectories through the obstacles can be retained.
4.3.2 Simulation scenario 2
The second simulation scenario is carried out in the environment similar to Fig. 4, which is to show the effect of the modified subfunction velocity(v, ω). The parameters of the map are shown in Table 4. Figure 11a, b respectively shows the trajectory of the original and improved DWA algorithm. One can find that the improved trajectory goes through the obstacles with a shorter path and a faster speed. It is mainly because of the modified subfunction obsdist(v, ω), and the modified subfunction velocity(v, ω), which enables the robot to consider the linear and angular speed together. Meanwhile, dangerous trajectories have been discarded by subfunction obsdist(v, ω), so the safety is secured.
Table 4 Parameters of simulation scenario 2
Parameter name       | Parameter value
Map size             | 8 m × 5 m
Starting position    | (2 m, 1.5 m)
Target position      | (7.5 m, 1 m)
Initial orientation  | π/2 rad
Fig. 11 Trajectory obtained by the (a) original DWA, (b) improved DWA in simulation scenario 2
Table 5 Parameters of simulation scenario 3
Parameter name       | Parameter value
Map size             | 4 m × 6 m
Starting position    | (1 m, 1 m)
Target position      | (3.5 m, 5.5 m)
Initial orientation  | π/2 rad
4.3.3 Simulation scenario 3
The third simulation scenario is carried out in the environment similar to Fig. 6, which is to show the effect of the added subfunction goaldist(v, ω). The parameters of the map are shown in Table 5. Figure 12a, b respectively shows the trajectory of the original and improved DWA algorithm. It is obvious that the improved trajectory reaches the target with a shorter length while ensuring the safety margin. The main reason is that the added subfunction goaldist(v, ω) enhances the motion trend towards the target when there are obstacles around it, while the safety is secured by subfunction obsdist(v, ω).
4.3.4 Simulation scenario 4
The fourth simulation scenario is carried out in the environ-
ment with discrete obstacles generated randomly at integer
coordinates within certain ranges, which is to show the over-
all performance of the proposed algorithm. The parameters
of the map are shown in Table 6.
Figure 13a, b respectively shows the trajectory of the orig-
inal and improved DWA algorithm. Obviously, the improved
trajectory is smoother. Figure 13c, d respectively shows the
linear speed curve of the original algorithm and the improved
Fig. 12 Trajectory obtained by the (a) original DWA, (b) improved DWA in simulation scenario 3
algorithm. The improved DWA takes 19.65 s to reach the tar-
get while the original one takes 33.95 s, which proves that the
improved algorithm significantly shortens the moving time
and improves the traffic efficiency. According to Fig. 13c, it
can be found that the original algorithm has lower traffic effi-
ciency in positions near the obstacles or corners (around 9 s, 21 s and 27 s) and near the target (around 33 s).
Table 6 Parameters of simulation scenario 4
Parameter name       | Parameter value
Map size             | 14 m × 14 m
Starting position    | (1 m, 1 m)
Target position      | (13 m, 13 m)
Initial orientation  | 0 rad

Three reasons can account for the low traffic efficiency of the original DWA. Firstly, the original obsdist(v, ω) function calculates the minimum distance from the obstacle over the whole trajectory, which causes the predicted trajectories with larger linear speed to be more likely to collide with obstacles. These trajectories will get a low score or be discarded directly when the robot is near the obstacle. Secondly, the trajectories with larger linear speed and slower angular speed will
get a lower score using the original velocity(v, ω) function, which reduces the traffic efficiency of the robot when moving slowly and circuitously. Thirdly, the original heading(v, ω) function calculates the orientation angle at the final position of the trajectory, which causes the trajectory with larger linear speed to get a lower score due to its end overshooting the target when the robot is near the target. It also delays the robot reaching the target, which can be found in the close-up window of Fig. 13a. In contrast, the improved DWA avoids these three problems and significantly improves the traffic efficiency by enhancing the evaluation function.
4.3.5 Simulation scenario 5
The fifth simulation scenario is carried out in the environment with a spiral obstacle located at the center of the map. This scenario is mainly to prove the effect of function oscillation(v, ω). The parameters of the map are shown in Table 7.
Fig. 13 Results in simulation scenario 4. (a) Trajectory obtained by the original DWA and its close-up window, (b) Trajectory obtained by the improved DWA and its close-up window, (c) Linear speed obtained by the original DWA, (d) Linear speed obtained by the improved DWA

Table 7 Parameters of simulation scenario 5
Parameter name       | Parameter value
Map size             | 11 m × 11 m
Starting position    | (1.5 m, 1.5 m)
Target position      | (10 m, 10 m)
Initial orientation  | 0 rad

Fig. 14 Trajectory obtained by the (a) original DWA, (b) improved DWA in simulation scenario 5

Figure 14a, b respectively shows the trajectories of the original and improved DWA algorithm. Note that the robot does not know the global distribution of obstacles, so it cannot go around the obstacle area directly. It is obvious that the improved trajectory escapes from the spiral obstacle and successfully reaches the target, while the original trajectory falls into local minima and keeps turning around. Note that the function goaldist(v, ω) is designed to work only when the minimum distance of all the predicted trajectories to the target is less than 2 m, so it is function oscillation(v, ω) that causes the different performance of the two trajectories.
Comparing Fig. 14a, b, it can be found that the robot moves
straight (see the initial section of the trajectory) when there
is no obstacle around, and the original or improved DWA
has the same effect on the trajectory generation. The snap-
shots of the positions when the two trajectories differ from
each other are shown in Figs. 15 and 16. The green arcs in
front of the robot are all the predicted trajectories in the sam-
pling window, and the trajectory marked with the dotted red line is the highest-scoring trajectory to be adopted at the current time. In Fig. 16, the cell group H to record the historical trajectory is shown with a method similar to that of Fig. 8. Observing Figs. 15a and 16a, at this time, the historical trajectory and all predicted trajectories of the robot are basically the same. Owing to function oscillation(v, ω), the improved trajectory avoids the predicted trajectories that are heavily biased towards the historical trajectory. In contrast, the original trajectory "forgets" the travelled position and always enters the
spiral obstacle. Observing the subfigures b, c and d of Figs.
15 and 16, the improved trajectory escapes from the spiral
obstacle and avoids the travelled position at all times, while
the original trajectory cannot escape.
Fig. 15 Snapshots of the position in the trajectory obtained by the original DWA in simulation scenario 5. (a)–(d) respectively show the robot position in chronological order
Fig. 16 Snapshots of the position in the trajectory obtained by the improved DWA in simulation scenario 5. (a)–(d) respectively show the robot position in chronological order
From the comparison of the two methods in the five simulations,
we can find that the improved DWA shortens the
trajectory length and improves the capability and efficiency of
reaching the target in different environments with complex
obstacles.
5 DWA parameter adaptive tuning algorithm based on reinforcement learning
The weight terms of each evaluation subfunction have been
considered difficult to choose (Kiss and Tevesz 2012). In this
section, we first analyze the parameters that need to be dynamically
adjusted in DWA, then propose an adaptive tuning
algorithm based on Q-learning to adjust them, and finally
conduct simulations to verify the proposed method.
5.1 Parameter analysis in DWA
Because of the complexity of real working environments, the
demand for each evaluation function is time-varying. For
example, when the robot approaches an obstacle, it should
turn or decelerate at once, so the effect of the function obsdist(v, ω)
needs to increase while the others decrease to prevent
collisions. When the robot approaches the target, it can arrive
quickly from any direction regardless of the approaching angle,
so the effect of the function heading(v, ω) can decrease and
that of goaldist(v, ω) increase. When the current direction
deviates from the target severely, it is possible that the
robot avoids obstacles excessively or goes around in circles, so
the effect of the functions heading(v, ω) and oscillation(v, ω)
should increase. It is hard to cover all situations only by
designing functions, so the weights of each function need to
be self-adaptive with the change of the surrounding environment
and the robot state.
In addition, all evaluation functions use the time
period T. Consider the situation in Fig. 17a: there is a
spiral obstacle near the target, and the robot is still a certain
distance away from the obstacle. If T is shorter, all the
sampled trajectories will be shorter. In terms of the existing
five evaluation functions, trajectory A will get the highest
score. However, after the robot moves along this trajectory
for a while, all trajectories will collide with obstacles, which
makes the robot stop and fail to navigate. In contrast, if T is
longer, the trajectories will be longer, so farther obstacles
can be perceived. Trajectory B may get the highest score,
so that the robot can adjust its heading in advance. However,
consider the situation in Fig. 17b, where there are many
small obstacles around the robot. A longer T results in
longer predicted trajectories. Most trajectories will collide with
obstacles, which reduces the discrimination of the function
obsdist(v, ω) and discards some feasible directions like trajectory
A. Meanwhile, some trajectories may be toward the
target but too close to obstacles, which leads to misjudgment
of the function goaldist(v, ω) like trajectory B. Neither kind of
situation is ideal. Therefore, the time period T
should also be self-adaptive.
Fig. 17 Trajectory analysis for time period T
The time period T essentially determines the distance travelled
from the original position. Over a considerable time
period, the trajectory can no longer be seen as a straight line.
Once the travelled distance d is determined, we can obtain
the time period T by the geometric relationship in Fig. 18 and
Eq. (10). Concretely, if the maximum distance between the
trajectory and the current position is larger than d, T is obtained
from the trajectory whose end position is d meters away from
the original position. Otherwise, T is set to the value at which
the trajectory moves half a cycle.

$$T=\begin{cases}\dfrac{2\arcsin\left(\frac{d}{2r}\right)}{\omega} & 2r>d\\[2mm] \dfrac{\pi}{\omega} & 2r\le d\end{cases} \tag{10}$$

where $r = v/\omega$ is the radius of the predicted trajectory; $v$ and
$\omega$ are the linear and angular speed of the robot.
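As a concrete illustration of Eq. (10), the following sketch (in Python) computes the adaptive time period T from the travelled distance d and the current speeds; the straight-line case ω ≈ 0 is our own assumption, since Eq. (10) only covers ω ≠ 0.

    import math

    def time_period(v, omega, d):
        """Adaptive time period T of Eq. (10): the time needed for the circular-arc
        trajectory to reach a point d metres from the start, capped at half a cycle.
        v, omega: current linear [m/s] and angular [rad/s] speed; d: travelled distance [m]."""
        if abs(omega) < 1e-6:          # straight motion (assumption, not covered by Eq. (10))
            return d / max(v, 1e-6)
        r = v / abs(omega)             # radius of the predicted trajectory, r = v / omega
        if 2.0 * r > d:                # the arc can actually get d metres away from the start
            return 2.0 * math.asin(d / (2.0 * r)) / abs(omega)
        return math.pi / abs(omega)    # otherwise: half a cycle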
According to the analysis above, six parameters need to be
dynamically adjusted during the navigation process, which
are: the weights of each evaluation function w1w5and the
travelled distance d. The algorithm for parameter adaptation
is illustrated in Sect. 5.2.
Fig. 18 Calculation method of time period T
5.2 Parameter adaptive algorithm in DWA based on
Q-learning
5.2.1 Application method of Q-learning in DWA
The tabular Q-learning establishes its agent as an m × n matrix
Q, where m is the state dimension and n is the action dimension.
When the agent executes an action $A_j$ at state $S_i$, the reward
R from the environment is superimposed on position (i, j) of
the matrix Q. The action is selected according to the current
state, the matrix Q and the action selection strategy. The
updating rule of the matrix Q is written as Eq. (11).

$$Q(s,a)=(1-\alpha)Q(s,a)+\alpha\left[R(s,a)+\gamma Q(\tilde{s},\tilde{a})\right] \tag{11}$$

where $\alpha\in(0,1)$ is the learning rate: a larger $\alpha$ means faster
learning and convergence, but may lead to overfitting;
$\gamma\in(0,1)$ is the discount factor: a larger $\gamma$ places more emphasis
on long-term rewards, while a smaller one emphasizes immediate
rewards; $R(s,a)$ is the reward from the environment obtained from
the current state and action; $Q(\tilde{s},\tilde{a})$ is the maximum Q value
among all the actions available in the next state.
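For reference, a minimal sketch (in Python) of the tabular update in Eq. (11) is given below; the matrix shape 48 × 183 anticipates the state and action spaces defined in the following subsections, and the values α = γ = 0.5 are the ones later listed in Table 8.

    import numpy as np

    ALPHA, GAMMA = 0.5, 0.5                # learning rate and discount factor (Table 8)
    Q = np.zeros((48, 183))                # 48 states x 183 actions (Sects. 5.2.2-5.2.3)

    def update_q(Q, s, a, reward, s_next):
        """One tabular Q-learning update following Eq. (11)."""
        best_next = np.max(Q[s_next])      # max Q value over all actions of the next state
        Q[s, a] = (1 - ALPHA) * Q[s, a] + ALPHA * (reward + GAMMA * best_next)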
With the fundamental framework of Q-learning, cus-
tomized algorithm design is needed for the application in
DWA, which includes: definition of the state space, the action
space and the reward function. The design of the Q-learning
based parameter learning algorithm is presented in subse-
quent subsections.
5.2.2 Definition of the state space
In the process of robot motion, the state information depends
on the heading and speed of the robot, the surrounding
obstacle and target. In order to apply the agent to different
environments, the state cannot represent the absolute coor-
dinates and orientation on a specific map. Therefore, some
Fig. 19 Dimension sketch of (a) s1, (b) s2
general and relative features need to be extracted from the
complex environment to constitute the state space, and each
state corresponds to a particular situation. The robot’s state
is uniquely described at any time in a given map. Also, the
state space should not be too large, otherwise it will lead to
“the curse of dimensionality”.
The score of the trajectory in the evaluation function deter-
mines the robot speed at the next time step, so the definition
of state space should be closely related to the evaluation func-
tion. From the previous analysis of the evaluation functions,
we can find that some particular features are frequently con-
sidered as the scoring basis, such as: the distance between the
robot and target, the speed and orientation of the robot and
the surrounding obstacle, which means that these features
play a pivotal role in the operation of DWA. Therefore, we
use these features to constitute the dimensions of state space
Sas Eq. (12).
$$S=[s_1\ s_2\ s_3\ s_4]^{T} \tag{12}$$

where $s_1$–$s_4$ are the respective dimensions of S, explained in detail below.
(1) Dimension $s_1$ represents whether the robot is very close to the target; its value is designed as Eq. (13) and its dimension sketch is shown in Fig. 19a.

$$s_1=\begin{cases}1 & d<3R\\ 2 & d\ge 3R\end{cases} \tag{13}$$
where $d$ is the distance between the robot and the target; $R$ is the
safety radius of the robot.
(2) Dimension $s_2$ is used to represent whether the robot is moving towards the target and the approximate deviation. Its value is designed as Eq. (14) and its dimension sketch is shown in Fig. 19b.

$$s_2=\begin{cases}1 & \theta_{r,g}\in\left[0,\frac{\pi}{3}\right)\\ 2 & \theta_{r,g}\in\left(-\frac{\pi}{3},0\right)\\ 3 & \theta_{r,g}\ \text{else}\end{cases} \tag{14}$$
where $\theta_{r,g}$ is the angle between the orientation of the robot
and the line pointing from the robot position to the target.
Fig. 20 Dimension sketch of s3
(3) Dimension $s_3$ represents whether the position the robot would reach by moving with its current speed for 1 s is too far away from its current position, which also reflects the coverage of all the sampled trajectories. The travelled distance d between the two positions is calculated by Eq. (15).

$$d=\begin{cases}\dfrac{2v}{\omega} & \omega>\pi\\[2mm] \dfrac{2v}{\omega}\sin\dfrac{\omega}{2} & \omega\le\pi\end{cases} \tag{15}$$

where $v$ and $\omega$ are the linear speed and angular speed of the robot.
The value of $s_3$ is designed as Eq. (16) and its dimension sketch is shown in Fig. 20.

$$s_3=\begin{cases}1 & d\le 0.5\ \text{m}\\ 2 & d>0.5\ \text{m}\end{cases} \tag{16}$$
(4) Dimension $s_4$ represents whether there are obstacles
within a certain range around the robot, and the approximate
relationship between the obstacles and the robot motion direction.
Dimension $s_4$ describes the approximate distribution of
the surrounding obstacles with a single angle, which is called
the dominant orientation of obstacles. Laser sensors are widely
used in the field of robotic perception, so the method of laser
scanning can be applied to detect the obstacles around. We
divide the surrounding of the robot into several angles evenly,
and measure the distance of the nearest obstacle on each
angle. If the distance exceeds a given range 5R, it will be set
to zero. In this way, the angle of the nearest obstacle $\theta_{on}$ can
be found, which reflects the distribution of the surrounding obstacles
to a certain extent, but it is not an ideal feature as it abandons
most of the information. Here we present a concept called
the average orientation of obstacles θoa. Generally speaking,
the smaller the distance is, the more dangerous the obstacle
is, and the larger the proportion in the average orientation
should be. Therefore, the distance reciprocal of each angle
can be used as the weight for obtaining the average angle,
and the average orientation of obstacles is the weighted sum of
the angles, which is calculated by Eq. (17).

$$\theta_{oa}=\frac{\sum_{i=1}^{n}\frac{\theta_i}{d_i}}{\sum_{i=1}^{n}\frac{1}{d_i}} \tag{17}$$

where $n$ is the angle resolution; $\theta_i$ is the $i$th angle; $d_i$ is the
distance of the nearest obstacle on the $i$th angle.
Fig. 21 Analysis for the dominant orientation of obstacles
However, the average position is not always accurate. Considering
the two situations in Fig. 21, $\theta_{n,a}$ is the angle between the
orientations of the nearest and average obstacles. In situation
A, $\theta_{n,a}$ is smaller and the average orientation can reflect the
approximate distribution of the surrounding obstacles more
comprehensively. Conversely, the obstacles in situation B are
located on both sides of the robot and there are no obstacles
in the direction of the average orientation, which makes
the average orientation fail to reflect the obstacle distribution.
The reason for this failure is that the obstacles are scattered or
discontinuous, so the average orientation may be the average
of several angles with short distances, which probably points into
the discontinuity. In this case, $\theta_{n,a}$ is generally larger. Considering
that $\theta_{on}$ also has a certain reference value, the dominant
orientation of obstacles $\theta_o$ is set as Eq. (18).

$$\theta_{o}=\begin{cases}\theta_{oa} & \theta_{n,a}<\frac{\pi}{4}\\ \theta_{on} & \theta_{n,a}\ge\frac{\pi}{4}\end{cases} \tag{18}$$
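A minimal sketch (in Python) of how the dominant orientation of obstacles in Eqs. (17) and (18) can be computed from a laser scan is given below; the function name and the treatment of the empty-scan case are our own assumptions, and angle wrapping is ignored for brevity.

    import math

    def dominant_obstacle_orientation(angles, dists, detect_range):
        """Dominant orientation of obstacles, theta_o (Eqs. 17-18).
        angles: scan angles [rad]; dists: nearest-obstacle distance on each angle
        (0 when nothing lies within detect_range = 5R); returns None if no obstacle is in range."""
        valid = [(a, d) for a, d in zip(angles, dists) if 0.0 < d <= detect_range]
        if not valid:
            return None
        theta_on = min(valid, key=lambda p: p[1])[0]            # angle of the nearest obstacle
        num = sum(a / d for a, d in valid)                       # Eq. (17): distance-reciprocal
        den = sum(1.0 / d for a, d in valid)                     # weighted average angle
        theta_oa = num / den
        theta_na = abs(theta_oa - theta_on)                      # deviation between the two angles
        return theta_oa if theta_na < math.pi / 4 else theta_on  # Eq. (18)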
Now, the value of $s_4$ can be designed as Eq. (19) and the dimension sketch is shown in Fig. 22.

$$s_4=\begin{cases}1 & \theta_{r,o}\in\left[0,\frac{\pi}{3}\right)\\ 2 & \theta_{r,o}\in\left(-\frac{\pi}{3},0\right)\\ 3 & \theta_{r,o}\ \text{else}\\ 4 & \text{no obstacle around}\end{cases} \tag{19}$$

where $\theta_{r,o}$ is the angle between the orientation of the robot
and the dominant orientation of the surrounding obstacles.
So far, four kinds of features have been extracted. A single
feature describes some explicit and measurable information,
while the relationships between features can reveal
some hidden information, such as the relationship between
the target and obstacles, the extent of obstacle avoidance needed,
Fig. 22 Dimension sketch of s4
the way the robot moves toward the target, etc. Both the measurable
and hidden information are conducive to the parameter
selection in DWA. According to these four features, the state
space is divided into 48 substates, which constitute the state
dimension of the matrix Q.
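The four features can be combined into a single row index of the matrix Q. The sketch below (in Python) is one possible mapping under our own assumptions about the exact interval boundaries and the ordering of the 48 substates; d_travel is the 1-s travelled distance of Eq. (15) and theta_ro is None when no obstacle lies within the detection range.

    import math

    def state_index(d_goal, theta_rg, d_travel, theta_ro, R):
        """Map the features s1..s4 (Eqs. 13, 14, 16, 19) to one of the 48 discrete states."""
        s1 = 1 if d_goal < 3 * R else 2
        if 0 <= theta_rg < math.pi / 3:
            s2 = 1
        elif -math.pi / 3 < theta_rg < 0:
            s2 = 2
        else:
            s2 = 3
        s3 = 1 if d_travel <= 0.5 else 2
        if theta_ro is None:
            s4 = 4
        elif 0 <= theta_ro < math.pi / 3:
            s4 = 1
        elif -math.pi / 3 < theta_ro < 0:
            s4 = 2
        else:
            s4 = 3
        # Row index of the 48 x 183 matrix Q (2 x 3 x 2 x 4 = 48 combinations).
        return ((s1 - 1) * 3 + (s2 - 1)) * 8 + (s3 - 1) * 4 + (s4 - 1)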
5.2.3 Definition of the action dimensions
The trajectory adopted by the robot has the highest total score
in the sampling space, and the total score is the weighted sum
of each evaluation function. According to the previous analy-
sis, the time period Tis also an important parameter affecting
the score, which is determined by the travelled distance dand
the speed of the robot. The real-time speed is known, so d
can determine T. Therefore, each dimension in the action
space includes six parameters: the weights of each evalua-
tion function w1w5and the travelled distance d. Since each
evaluation function has been normalized before participating
in the general evaluation, their importance can be uniquely
represented by the weights. The base value of each weight is
set to 1, and the weight of the evaluation function requiring
more consideration is set to 2 or 3. For example, if there are
closer obstacles in the motion direction, the weight of the
function obsdist(v, ω) should increase. The travelled dis-
tance is divided into three grades, and the range of value for
each parameter is designed in Eq. (20).
$$w_1, w_2, w_3, w_4, w_5\in\{1,2,3\},\qquad d\in\{1,\,1.5,\,2\} \tag{20}$$
To reduce the dimension of the action space, we do not
use all the combinations of the possible parameter values, but
select a representative part of them, namely the five weights
are chosen from {1, 2} or {1, 3}. After removing the equal-proportion
weight vectors and considering the three-grade travelled
distance, 93 combinations are obtained from the weight set {1, 2}
by $(1+C_5^1+C_5^2+C_5^3+C_5^4)\times 3$, and 90 combinations are obtained
from the weight set {1, 3} by $(C_5^1+C_5^2+C_5^3+C_5^4)\times 3$. In summary,
183 combinations are selected to constitute the action dimension
of the matrix Q.
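A short sketch (in Python) of the resulting action enumeration is shown below, assuming that the all-1 vector is kept as the default while the all-2 and all-3 vectors are dropped as equal-proportion duplicates; it reproduces the 183 combinations counted above.

    from itertools import product

    def build_action_space():
        """Enumerate the candidate parameter sets (w1..w5, d) of Sect. 5.2.3."""
        weight_vectors = []
        for w in product((1, 2), repeat=5):        # weights from {1, 2}
            if w != (2, 2, 2, 2, 2):               # drop the equal-proportion all-2 vector
                weight_vectors.append(w)           # 31 vectors, includes the default (1,1,1,1,1)
        for w in product((1, 3), repeat=5):        # weights from {1, 3}
            if w not in ((1, 1, 1, 1, 1), (3, 3, 3, 3, 3)):
                weight_vectors.append(w)           # 30 further vectors
        distances = (1.0, 1.5, 2.0)                # three grades of travelled distance d
        return [(*w, d) for w in weight_vectors for d in distances]

    actions = build_action_space()
    assert len(actions) == 183                      # action dimension of the matrix Q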
5.2.4 Definition of the reward function
The reward function is to calculate the reward from the envi-
ronment of a specific action on a given state, which is an
indicator of how good or bad the action is. The definition of
reward function should consider different aspects and rely
on the actual movement whose goal is to reach the target
quickly and avoid obstacles. We refer to some Atari game
environments which are commonly applied in RL research to
motivate the reward values, and design some minor rewards
for each change of the state to avoid the reward being too
sparse. Concretely, the principle of the reward function is as
follows. If the robot reaches the target, it gets a large positive
reward (+5000). If the robot collides with obstacles, it
gets a large negative reward (−200). If the robot gets closer
to the target (+10) or farther from the obstacle (+5), it receives
a certain positive reward; otherwise, it receives a certain negative
reward (−10 or −5). In addition, the robot gets a small
negative reward (−2) each time the state changes before
reaching the target. Generally, the robot experiences more
states along a path that costs more time, so we hope this
reward can help the robot experience fewer states and thus obtain a
faster path.
Concretely, when the robot executes an action A in the
previous state $S_{pre}$ and enters the current state $S_{curr}$ ($S_{pre}\ne S_{curr}$),
the reward R of A from the environment is calculated
with Algorithm 1.
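A compact Python version of this reward scheme, mirroring Algorithm 1 (the function signature is our own), reads:

    def reward(d_obs_pre, d_obs_curr, d_tgt_pre, d_tgt_curr, collided, reached):
        """Reward R for one state change, following Algorithm 1."""
        if collided:                                   # collision with an obstacle
            return -200
        if reached:                                    # target reached
            return 5000
        r = 5 if d_obs_curr > d_obs_pre else -5        # farther from / closer to the obstacle
        r += -10 if d_tgt_curr > d_tgt_pre else 10     # farther from / closer to the target
        return r - 2                                   # small penalty for every state change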
5.2.5 Training process of the proposed method
The application of Q-learning in DWA is different from that
in other fields such as a labyrinth, which is mainly reflected in
two aspects. First, in the real environment, the real state of
the robot changes continuously while the 48 states consisting
of the features are discrete, so not every action will change
the state and yield a reward. Note that the previous action
should be maintained if the state has not changed, so that the
investigation time of an action can be guaranteed. The matrix Q
is updated when the state changes, and the position updated in
the matrix Q is determined by the state before this change and
the action causing this change. Second, because of the large
dimension of the matrix Q, when selecting a new action,
especially in the initial stage of training, there will be many zeros
in the matrix Q, which causes some states to have many actions
with the same maximum Q value. In this case, selecting an action
randomly is not appropriate, since it makes the action change too
frequently. The investigation time of an action is not long enough, and
Algorithm 1: Algorithm for calculating the reward R from the environment
Input:
  Nearest distance to the obstacle at S_pre: d_{o,pre};
  Nearest distance to the obstacle at S_curr: d_{o,curr};
  Distance to the target at S_pre: d_{t,pre};
  Distance to the target at S_curr: d_{t,curr}.
Output:
  Reward: R.
1:  R = 0;
2:  if S_curr collides with obstacles then
3:      R = R − 200;
4:      return R;
5:  if S_curr reaches the target then
6:      R = R + 5000;
7:      return R;
8:  if d_{o,curr} > d_{o,pre} then
9:      R = R + 5;
10: else
11:     R = R − 5;
12: if d_{t,curr} > d_{t,pre} then
13:     R = R − 10;
14: else
15:     R = R + 10;
16: R = R − 2;
17: return R;
finally, the learning results are highly random. It should be
checked whether the action set with the maximum Q value,
$A_{maxQ}$, contains the previous action; if it does, the previous
action should be executed to keep its continuity, otherwise one is
selected randomly. The end indicator of a navigation (say
an episode) is the robot reaching the target, being forced to stop, or
colliding with obstacles.
Algorithm 2 illustrates the training process of the proposed
method, where we assume that the initial robot pose does not
meet the end indicator of an episode. Concretely, i defined
in line 1 is the counter of the episodes, and the training ends
when i exceeds the maximum number of episodes N. j defined in line
6 is the counter of the time steps in one episode, by which
we can calculate the time required for a real robot to finish
this episode. The weights chosen in line 10 are the default
weights.
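The selection rule described above can be sketched as follows (in Python); the ε-greedy exploration probability of 0.02 comes from Table 8, while the function name and the NumPy-based tie handling are our own choices.

    import random
    import numpy as np

    def select_action(Q, state, prev_action, epsilon=0.02):
        """Epsilon-greedy selection with previous-action persistence (Sect. 5.2.5):
        when several actions tie for the maximum Q value, the previous action is kept
        if it belongs to the tying set, so each action is investigated long enough."""
        if random.random() < epsilon:                    # exploration
            return random.randrange(Q.shape[1])
        row = Q[state]
        best = np.flatnonzero(row == row.max())          # the set A_maxQ of tying actions
        if prev_action in best:
            return int(prev_action)                      # keep the previous action
        return int(random.choice(best))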
5.3 Simulation analysis
The simulations in this section implement the training process
and show the training results of the proposed method using
MATLAB. To enhance the training effect and make the result
adapt to different environments, we design the map shown in
Fig. 23. In this figure, the distribution of obstacles is varied
and there are many feasible paths connecting the starting point
and the target. Using such a complex map enriches the states
perceived (row of the
Algorithm 2: Algorithm for parameter adaptation in the improved DWA based on Q-learning
Input:
  Information of the map and robot, e.g. starting point, target, kinematic model of the robot, etc;
  Parameters of this algorithm, e.g. maximum number of training episodes N, learning rate α, discount factor γ, etc.
Output:
  Trained agent: Q.
1:  i = 1;
2:  Q = zero matrix of 48 × 183;
3:  while i ≤ N do
4:      Initialize robot pose;
5:      // an episode, where S_pre and S_curr are the previous and current state, A_pre and A_curr are the previous and current action
6:      j = 1;
7:      while True do
8:          S_curr = state perceived;
9:          if j = 1 then
10:             A_curr = [1 1 1 1 1 1.5]^T;
11:         else
12:             if S_pre = S_curr then
13:                 A_curr = A_pre;
14:             else
15:                 Calculate R by Algorithm 1;
16:                 Update Q by Eq. (11);
17:                 if S_curr meets the end indicator of an episode then
18:                     break;
19:                 else
20:                     if strategy is exploitation then
21:                         if A_pre ∈ A_maxQ then
22:                             A_curr = A_pre;
23:                         else
24:                             A_curr = random action in A_maxQ;
25:                     else
26:                         A_curr = random action;
27:         S_pre = S_curr;
28:         A_pre = A_curr;
29:         Robot moves with A_curr for one time step;
30:         j = j + 1;
31:     i = i + 1;
32: return Q;
matrix Q) and the actions chosen (column of the matrix Q)
during the training, which helps to train the agent Q fully
and makes it applicable to other maps. Note that although the
distribution of the obstacles is determined, the robot is set to
use only the information of the obstacles within a certain range,
which reflects the unknown characteristic of the global environment.
This simulation contains three parts: the first part
is the training process to obtain the trained agent, the second
part is the test result to show the superiority of the trained
agent compared with fixed weights, and the third part is the
comparison with other algorithms in different environments.
Table 8 Parameters of the training process
Parameter name       Parameter value
Map size             16 m × 16 m
Starting position    (2.5 m, 2.5 m)
Target position      (14.5 m, 14.5 m)
Initial orientation  π/4 rad
α                    0.5
γ                    0.5
ε in ε-greedy        0.02
5.3.1 Training process
The parameters of the training are set by convention. The
initial speed of the robot is set to zero. Without loss of generality,
the initial orientation of the robot is set to π/4 rad
so that the trajectories are distributed on both sides of the
map diagonal. The kinematic model of the robot is the same as
before. The parameters of the training process are shown in
Table 8.
The training is implemented on an Intel Core i7-8750H 2.2
GHz CPU with 16 GB memory and takes about 6.3 h. After
the training, 359 successful paths were obtained among the
5000 episodes, which are shown in Fig. 23a. It is clear that
multiple paths have been explored by the robot. To show
the training effect more clearly, all the episodes are divided
into 50 groups in order and each group contains 100 episodes,
which are used for statistics. Figure 24 shows the number
of arriving paths and the average time consumption of the
arriving paths of each group respectively. As the agent Q
is continuously updated, the success rate of arrival shows a
rising tendency while the average time consumption shows
a declining tendency, which indicates that the proposed method
helps the robot avoid obstacles and find a faster
path. A trained agent is obtained after the training. Figure
23b shows the trajectory of the robot adopting this trained
agent, whose time consumption is 21.5 s. The robot successfully
reaches the target with a relatively less time-consuming
path, which also reflects the effectiveness of the proposed
method.
5.3.2 Test results
To further verify the performance of the trained agent, it
should be adopted in different environments. Note that the
information that the robot can perceive includes its pose and
speed, and the relative position to the target and to obstacles within
a certain range. This information will vary with different
initial states, including positions and speeds. In addition, the
agent does not record any information about the distribution
of obstacles during the training. Therefore, the performance
Fig. 23 (a) Arriving paths among training, (b) Trajectory adopting the trained agent Q
Fig. 24 The statistical results of each group. (a) Average time consumption of the successful paths (s), (b) Number of successful arrivals, where the discontinuity of the curve means no arriving paths in the corresponding group
of the trained agent can be thoroughly evaluated with an adequate
number of different initial states in the same map.
We randomly set 700 initial states, specifically: the starting
position is randomly distributed in the L-shaped area of
the lower left portion of the map, which is highlighted with
a green rectangle in Fig. 25; the starting linear speed is randomly
chosen in [0, 0.8v_max]; the starting angular speed is 0; the starting
orientation is randomly chosen in [0, 2π).
Figure 25 shows the navigation results of random initial
states adopting the trained agent Q and three different
fixed parameter sets respectively. The parameter set I,
[1 1 1 1 1 1.5]^T, contains the default parameters. This set
gives the same weight to each evaluation subfunction, which
means each of them shares the same importance in different
states, and the travelled distance 1.5 m, which is the medium
grade in Eq. (20). The parameter set II, [1 2 1 1 1 2]^T, is
the same as in the simulation in Sect. 4.3, and favors the
subfunction of obstacle avoidance. The parameter set III,
[2 1 1 1 1 1.5]^T, favors the subfunction of navigation to the
target. In Fig. 25, the length and orientation of the line segments
Fig. 25 Navigation results of random initial states adopting (a) trained agent Q, (b) Parameter set I, (c) Parameter set II, (d) Parameter set III
at the initial position of each path indicate the initial linear
speed and orientation of the robot. Arrival paths are blue
while failure paths are red, and the failure points where
the robot is forced to stop or collides with obstacles are marked
with stars.
One can find from Fig. 25a that the trained agent enables
the robot to move flexibly between obstacles and reach the
target in most cases. In contrast, the arriving paths of Fig.
25b mostly reach the target from the edge or outside of the
obstacle area, and many paths whose starting positions
require going through the middle of the obstacle area
fail. In Fig. 25c, the trend of bypassing the whole obstacle
area is even more obvious, as the subfunction of obstacle
avoidance has the largest weight in parameter set II. The
weight of the subfunction of navigation to the target is set higher
than the other weights in parameter set III. As a result, the tendency
to go through the obstacle area is strong, but the fewest paths
successfully reach the target in Fig. 25d. One can find from these
subfigures that a fixed parameter set makes the robot
always show a specific navigation preference but fails to
consider the different states in which this preference should change,
which leads to a low success rate. The comparison results are shown
in Table 9. Under the same randomly-chosen initial states,
the success rate of the trained agent is significantly higher
than that of the other fixed parameter sets.
This test further proves that the proposed method with the
trained agent Q can find the best action in different states
by learning, and achieves significant effects without long-term
training.
5.3.3 Comparisons with other algorithms
In addition, we test the proposed method in different environments
by comparison with other local path planning
algorithms. The application scenario in this paper does not
include prior knowledge of the obstacles in the global map, so the
comparison algorithms can only use the obstacles within the
detection range for real-time planning.
The rolling RRT algorithm is an improved version of RRT
which uses a rolling detection window instead of the accurate
global map for planning. This planning pattern is very
similar to our proposed method. Figure 26a is the result of
rolling RRT, which is also smoothed by a Bezier curve (Li
et al. 2016). We build the same environment and test our
method with the trained agent Q in this environment, which
is shown in Fig. 26b. Our path is clearly smoother and
shorter without any smoothing process.
The Bug algorithm is a stress-based algorithm which avoids
an obstacle by moving around it. In general, Bug algorithms
do not include the global map, such as Bug2 (Lumelsky
and Stepanov 1987), VisBug (Lumelsky and Skewis 1990)
or E-Bug (Lynda 2015), but some versions, such as K-Bug
(Langer et al. 2007) or TangentBug (Ishay et al. 1998), still
consider it. Figure 27a is the result of a set of Bug algorithms
(Lynda 2015). We build the same environment and test our
method with the trained agent Q in this environment, which is
shown in Fig. 27b. We can find that our trajectory is similar to
the best path (E-Bug) of the Bug family while being smoother
and safer.
Table 9 Comparison of navigation results with fixed parameters and the trained agent
                   Number of navigations   Number of arrivals   Success rate (%)
Trained agent      700                     636                  90.86
Parameter set I    700                     184                  26.29
Parameter set II   700                     266                  38.00
Parameter set III  700                     120                  17.14
Fig. 26 Navigation result of (a) rolling RRT (Li et al. 2016), (b) our method with trained agent Q
Fig. 27 Navigation result of (a) Bug family (Lynda 2015), (b) our method with trained agent Q
Moreover, the comparison algorithms only obtain a path. In
contrast, our paths are actually trajectories, which means that
no path tracking algorithm is needed to follow the path. This
is an advantage of DWA and our method preserves it.
These comparisons prove that our proposed method adapts
to different environments and performs better than the other
algorithms. Another different environment is built in Sect.
6.2.1.
6 Hardware experiments
6.1 Experimental framework
The experiments are carried out on the XQ-4 Pro two-wheeled
mobile robot with an Intel Core i7-4500U 1.8 GHz
CPU, 8 GB memory and the Robot Operating System (ROS)
based on Ubuntu 14.04, which is shown in Fig. 28. This
robot is equipped with many sensors such as a gyroscope, lidar,
infrared sensors and a camera. The robot's kinematic model is the
same as in the simulation in Sect. 5.3. We use a workstation to
remotely log in and operate the XQ-4 Pro through the wireless
network, and the monitoring and control on the workstation
are based on the Rviz software.
Fig. 28 The XQ-4 Pro robot
In ROS, the autonomous navigation task involves many
nodes, which are divided into four categories by function:
the sensor node, the localization node, the path planning node
(named move_base) and the motor control node. The algorithm
proposed in this paper is implemented in the move_base
node in C++. The framework of autonomous navigation
is shown in Fig. 29. The thin dashed box shows the
process of the original path planning, where the A* global path
planner receives the information of the global map and the target,
and obtains a reference path which needs to be tracked
by the original DWA local path planner. However, when the
global map is poorly known or dynamic, a feasible reference
path is hard to obtain. In contrast, our new process shown in
the thick dashed box empowers the DWA algorithm with the
capability of global navigation. As can be seen in the thick
dashed box, the improved DWA algorithm does not need the
global map, but only the target. The trained agent with the optimal
weights obtained by Algorithm 2 is deployed on the
move_base node. Note that the parameters of the training in
Sect. 5.3 are set the same as those in the experiment, so the
trained agent from the simulation can be adopted on the real
robot. The action with the maximum Q value, which contains
the optimal weights of each evaluation function and the
travelled distance, is selected according to the trained agent
and the current state calculated from the perception. Finally,
the best trajectory is obtained and the corresponding speed
command is released to the motor controller.
Fig. 29 Framework of autonomous navigation
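The per-cycle logic of the thick dashed box can be summarized by the sketch below. The real implementation runs in C++ inside the move_base node; here it is written in Python for consistency with the earlier sketches, and perceive_state, dwa_best_trajectory, publish_velocity and the actions list are hypothetical placeholders for the perception, trajectory-evaluation and command interfaces.

    import numpy as np

    def control_cycle(Q, actions, perceive_state, dwa_best_trajectory, publish_velocity):
        """One planning cycle of the deployed improved DWA (Fig. 29)."""
        s = perceive_state()                      # discrete state index from the current perception
        a = int(np.argmax(Q[s]))                  # action with the maximum Q value for this state
        w1, w2, w3, w4, w5, d = actions[a]        # optimal weights and travelled distance
        v, omega = dwa_best_trajectory((w1, w2, w3, w4, w5), d)  # score the sampled trajectories
        publish_velocity(v, omega)                # release the speed command to the motor controller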
In addition, due to the existence of unknown obstacles,
the reference path obtained by the global path planner may
turn out to be infeasible during the navigation. At this time, the
original path planning will rerun the A* global path planner to
get a new reference path considering the obstacles perceived.
This process is time-consuming and inefficient, especially
when there are many unknown obstacles. In contrast, our
new process can navigate and deal with unknown obstacles
without the reference path.
The supplementary materials of this paper include two
videos named “dwa_static” and “dwa_dynamic”. These
videos are the screen recordings of the workstation during
the two experiments respectively. The agent Q deployed in
the experiments is obtained by the training in Sect. 5.3.1, so
the successful arrivals in the videos validate the effectiveness
of the proposed method. The screenshots of the videos are
shown and analyzed in Sect. 6.2.
6.2 Experimental results
The experiments are conducted in an unknown environment
with static or dynamic obstacles to verify the performance
of the proposed method. All the experimental environments
are set up in the corridor of an office building. Note that
we still operate the global path planner to show the differ-
ence between the reference path generated by the global path
planner and the real trajectory obtained by our proposed algo-
rithm. However, the reference path plays no role in the robot
navigation.
Fig. 30 (a) Original map of experiment scenario 1, (b) real experiment scenario 1
6.2.1 Experiment scenario 1
The first experiment scenario is carried out in the environ-
ment with unknown static obstacles. To reflect the char-
acteristic of “unknown”, the original scenario contains no
obstacles, and its 2D map built by simultaneous localization
and mapping (SLAM) is shown in Fig. 30a. However, as can
be seen in Fig. 30b, the actual environment for navigation
is equipped with obstacles such as stools and boxes. The
robot has no prior knowledge of the obstacles and can only
perceive the obstacles during the navigation. The spatial dis-
tribution of these obstacles is set to significantly intercepts
the trajectory of the robot towards the target.
Figure 31 shows the temporal sequence of the snapshots of
robot navigation visualized in ROS Rviz and the correspond-
ing real robot position in experiment 1. Concretely, Fig. 31a
shows the starting position, the target and the reference path.
We can find that the reference path only considers the origi-
nal obstacle (the wall). Figure 31b–d show the movements of
the robot avoiding the added obstacles and Fig. 31e shows the
real trajectory as the robot successfully reaches the target. It
can be seen that the robot moves flexibly between the obsta-
cles and the trajectory is quite different from the reference
path, which proves that the proposed method can navigate
the robot in an unknown environment.
6.2.2 Experiment scenario 2
The second experiment scenario is carried out in the environ-
ment with one moving pedestrian as the dynamic obstacle.
Note that although the dynamic obstacles are not included
in our training process, the original DWA is well improved
for better navigation performance, and the trained agent is
obtained by the complex environment containing various
states defined in Sect. 5.2.2, so the proposed method can
deal with the scenario with simple dynamic obstacles. The
map of the original scenario built by SLAM is shown in Fig.
32a and the actual environment for navigation is shown in
Fig. 32b. In the experiment, a pedestrian walks randomly in
Fig. 31 The process of robot experiment 1. (a)–(e) respectively show the robot position in chronological order
Fig. 32 (a) Original map of experiment scenario 2, (b) real experiment scenario 2
the scenario and significantly intercepts the trajectory of the
robot towards the target twice.
Figure 33 shows the temporal sequence of the snapshots
of the robot navigation visualized in ROS Rviz and the cor-
responding real robot position in experiment 2. Concretely,
Fig. 33a shows the starting position, the target and the ref-
erence path. Since there is no obstacle in the original map,
the reference path goes straight to the target. Figure 33b–g
show the movements of the robot avoiding the moving pedes-
Fig. 33 The process of robot experiment 2. (a)–(h) respectively show the robot position in chronological order
trian. When a pedestrian hinders the robot motion, the robot
steers away from this pedestrian (see Fig. 33b, c, e, f). When
the pedestrian is far from the robot, the robot returns to the
target direction (see Fig. 33d, g). Figure 33h shows the real
trajectory as the robot successfully reaches the target. It can
be seen that the real trajectory avoids the moving pedestrian
and is quite different from the reference path, which proves
that the proposed method can effectively navigate the robot
in the simple dynamic environment.
7 Conclusion
As a classic local path planning algorithm, DWA has little
capability of global search and relies on a reference path,
which makes robot navigation hard to achieve when the
global environment is unknown. In this paper, a modified
method for mobile robot navigation in unknown environments
is proposed using the improved DWA combined with
Q-learning. We improve the original DWA by modifying and
extending the original evaluation functions, whose calculation
methods are redesigned and whose number is increased
to enhance the navigation capability of DWA. In
addition, the weights of each evaluation function are difficult
to choose, so Q-learning is applied to adaptively
tune these weights. In order to apply Q-learning to robot navigation
in an unknown environment, the state space, action
space and reward function are defined. These definitions try
to mimic the human perception, reasoning and handling of
the unknown environment, while balancing navigation
performance and computation cost.
A series of simulations is carried out to verify the performance
of the improved DWA and its combination with
Q-learning. The first part of the simulations shows that the
navigation efficiency of the improved DWA is significantly
better than that of the original DWA in environments with
both discrete and spiral obstacles. In the second part of the
simulations, the training process shows a trend of decreasing
time consumption and increasing success rate, the test result
shows that the success rate of the DWA with the trained
agent is significantly higher than that of the DWA with fixed
parameters, and the comparisons prove the superiority of our
method over other local planners. The proposed method
is further validated by hardware experiments based on the XQ-4 Pro
robot. The experimental results show that the improved DWA
with the trained agent is able to navigate the robot in both static
and dynamic unknown environments.
As future work, dynamic obstacles will be considered in
the training process and more complex obstacles will be set
in the experiments. Also, we may study the behavior of pedestrians
avoiding the robot to further improve the robot's obstacle avoidance.
References
Agha-Mohammadi, A. A., Chakravorty, S., & Amato, N. M. (2014).
Firm: Sampling-based feedback motion-planning under motion
uncertainty and imperfect measurements. The International Jour-
nal of Robotics Research,33(2), 268–304.
Azzabi, A., & Nouri, K. (2019). An advanced potential field method
proposed for mobile robot path planning. Transactions of the
Institute of Measurement and Control,. https://doi.org/10.1177/
0142331218824393.
Ballesteros, J., Urdiales, C., Velasco, A. B. M., & Ramos-Jimenez, G.
(2017). A biomimetical dynamic window approach to navigation
for collaborative control. IEEE Transactions on Human–Machine
Systems,47(6), 1123–1133.
Bayili, S., & Polat, F. (2011). Limited-damage A*: A path search algo-
rithm that considers damage as a feasibility criterion. Knowledge-
Based Systems,24(4), 501–512.
Best, G., Faigl, J., & Fitch, R. (2017). Online planning for multi-robot
active perception with self-organising maps. Autonomous Robots,
42(4), 715–738.
Brock, O., & Oussama, K. (1999). High-speed navigation using the
global dynamic window approach. In 1999 IEEE international
conference (pp. 341–346).
Chang, L., Shan, L., Li, J.,& Dai, Y. W. (2019). The path planning of
mobile robots based on an improved A* algorithm. In 2019 IEEE
16th international conference on networking, sensing and control
(ICNSC) (pp. 257–262).
Contreras-Cruz, M. A., Ayala-Ramirez, V., & Hernandez-Belmonte,
U. H. (2015). Mobile robot path planning using artificial bee
colony and evolutionary programming. Applied Soft Computing,
30(2015), 319–328.
Das, P. K., Behera, H. S., & Panigrahi, B. K. (2015). Intelligent-based
multi-robot path planning inspired by improved classical Q-
learning and improved particle swarm optimization with perturbed
velocity. Engineering Science and Technology, an International
Journal,. https://doi.org/10.1016/j.jestch.2015.09.009.
Das, P. K., Behera, H. S., & Panigrahi, B. K. (2016). Intelligent-based
multi-robot path planning inspired by improved classical q-
learning and improved particle swarm optimization with perturbed
velocity. Engineering Science and Technology, an International
Journal,19(1), 651–669.
Duguleana, M., & Mogan, G. (2016). Neural networks based rein-
forcement learning for mobile robots obstacle avoidance. Expert
Systems with Applications,62(2016), 104–115.
Durrant, W. H. (1994). Where am I? A tutorial on mobile vehicle local-
ization. Industrial Robot,21(2), 11–16.
Elbanhawi, M., & Simic, M. (2014). Sampling-based robot motion plan-
ning: A review. IEEE Access,2(2014), 56–77.
Fox, D., Burgard, W., & Thrun, S. (2002). The dynamic window
approach to collision avoidance. IEEE Robotics & Automation
Magazine,4(1), 23–33.
Fu, Y., Ding, M., Zhou, C., & Han, H. (2013). Route planning for
unmanned aerial vehicle (UAV) on the sea using hybrid differen-
tial evolution and quantum-behaved particle swarm optimization.
IEEE Transactions on Systems, Man, and Cybernetics: Systems,
43(6), 1451–1465.
González, R., Jayakumar, P., & Iagnemma, K. (2017). Stochastic mobil-
ity prediction of ground vehicles over large spatial regions: A
geostatistical approach. Autonomous Robots,41(2), 311–331.
Hart, P. E., Nilsson, N. J., & Raphael, B. (1968). A Formal Basis for the
Heuristic Determination of Minimum Cost Paths. IEEE Transac-
tions on Systems Science and Cybernetics,4(2), 100–107.
Henkel, C., Bubeck, A., & Xu, W. (2016). Energy efficient dynamic
window approach for local path planning in mobile service robotics
*. IFAC PapersOnLine,49(15), 32–37.
Hossain, M. A., & Ferdous, I. (2015). Autonomous robot path plan-
ning in dynamic environment using a new optimization tech-
nique inspired by bacterial foraging technique. Robotics and
Autonomous Systems,64(2015), 137–141.
Ishay, K., Elon, R., & Ehud, R. (1998). TangentBug: A range-sensor-
based navigation algorithm. The International Journal of Robotics
Research,17(9), 934–953.
Jaradat, M. A. K., Al-Rousan, M., & Quadan, L. (2011). Reinforcement
based mobile robot navigation in dynamic environment. Robotics
and Computer-Integrated Manufacturing,27(1), 135–149.
Karaman, S., & Frazzoli, E. (2011). Sampling-based algorithms for
optimal motion planning. The International Journal of Robotics
Research,30(7), 846–894.
Khaled, B., Froduald, K., & Leo, H. (2013). Randomized path
planning with preferences in highly complex dynamic environments. Robotica, 31(8), 1195–1208.
Khatib, O. (1986). Real-time obstacle avoidance for manipulators and
mobile robots. International Journal of Robotics Research,5(1),
90–98.
Kim, S., & Kim, H. (2010). Optimally overlapped ultrasonic sensor ring
design for minimal positional uncertainty in obstacle detection.
International Journal of Control, Automation, and Systems,8(6),
1280–1287.
Kiss, D. (2012). A receding horizon control approach to navigation in
virtual corridors. Applied Computational Intelligence in Engineer-
ing and Information Technology,1(2012), 175–186.
Kiss, D., & Tevesz, G. (2012). Advanced dynamic window based nav-
igation approach using model predictive control. International
conference on methods & models in automation & robotics (pp.
148–153).
Kovács, B., Szayer, G., Tajti, F., Burdelis, M., & Korondi, P. (2016).
A novel potential field method for path planning of mobile robots
by adapting animal motion attributes. Robotics and Autonomous
Systems,82(C), 24–34.
Kröse, Ben J. A. (1995). Learning from delayed rewards. Robotics and
Autonomous Systems,15(4), 233–235.
Lamini, C., Fathi, Y., & Benhlima, S. (2015). Collaborative Q-learning
path planning for autonomous robots based on holonic multi-agent
system. In International conference on intelligent systems: theo-
ries & applications (pp. 1–6).
Langer, R. A., Coelho, L. S., & Oliveira G. H. C. (2007). K-Bug,
a new bug approach for mobile robot’s path planning. Control
applications. In 2007 IEEE international conference on control
applications (pp. 403–408).
Li, G., & Chou, W. (2016). An improved potential field method for
mobile robot navigation. High Technology Letters,22(1), 16–23.
Li, M., Song, Q., Zhao, Q. J., & Zhang, Y. L. (2016). Route planning for
unmanned aerial vehicle based on rolling RRT in unknown envi-
ronment. In 2016 IEEE international conference on computational
intelligence and computing research (pp. 1–4).
Likhachev, M., Ferguson, D., Gordon, G., Stentz, A., & Thrun, S.
(2008). Anytime search in dynamic graphs. Artificial Intelligence,
172(14), 1613–1643.
Lumelsky, V. J., & Skewis, T. (1990). Incorporating range sensing in
the robot navigation function. IEEE Transactions on Systems, Man
and Cybernetics,20(5), 1058–1069.
Lumelsky, V. J., & Stepanov, A. A. (1987). Path-planning strategies for
a point mobile automaton moving amidst unknown obstacles of
arbitrary shape. Algorithmica,2(1), 403–430.
Lynda, D. (2015). E-Bug: New bug path-planning algorithm for
autonomous robot in unknown environment. In 2015 international
conference on information processing, security and advanced communications (pp. 1–8).
Maroti, A., Szaloki, D., Kiss, D., & Tevesz, G. (2013). Investigation
of dynamic window based navigation algorithms on a real robot.
In 2013 IEEE 11th international symposium on applied machine
intelligence and informatics (SAMI) (pp. 95–100).
Mohanty, P. K., Sah, A. K., Kumar, V., & Kundu, S. (2017). Application
of deep Q-learning for wheel mobile robot navigation. Interna-
tional conference on computational intelligence & networks (pp.
88–93).
Monfared, H., & Salmanpour, S. (2015). Generalized intelligent water
drops algorithm by fuzzy local search and intersection operators on
partitioning graph for path planning problem. Journal of Intelligent
& Fuzzy Systems,29(2), 975–986.
Ogren, P., & Leonard, N. E. (2005). A convergent dynamic window
approach to obstacle avoidance. IEEE Transactions on Robotics,
21(2), 188–195.
Özdemi, A., & Sezer, V. (2018). Follow the gap with dynamic window
approach. International Journal of Semantic Computing,12(01),
43–57.
Pinto, A. M., Moreira, E., Lima, J., Sousa, J. P., & Costa, P. (2016). A
cable-driven robot for architectural constructions: a visual-guided
approach for motion control and path-planning. Autonomous
Robots,41(7), 1487–1499.
Qureshi, A. H., & Ayaz, Y. (2016). Potential functions based sampling
heuristic for optimal path planning. Autonomous Robots,40(6),
1079–1093.
Rickert, M., Brock, O., & Knoll, A. (2008). Balancing exploration and
exploitation in motion planning. In IEEE international conference
on robotics & automation (pp. 2812–2817).
Seder, M. & Petrovi´c, I. (2007). Dynamic window based approach to
mobile robot motion control in the presence of moving obstacles. In
Proceedings of the 2007 IEEE international conference on robotics
and automation (pp. 1986–1991).
Sharma, A., Gupta, K., Kumar, A., Sharma, A., & Kumar, R. (2017).
Model based path planning using Q-Learning. In 2017 IEEE
international conference on industrial technology (ICIT) (pp. 837–
842).
Simmons, R. (1996). The curvature-velocity method for local obsta-
cle avoidance. In IEEE international conference on robotics &
automation (pp. 3375–3382).
Smart, W. D., & Kaelbling, L. P. (2002). Effective reinforcement learn-
ing for mobile robots. In IEEE international conference on robotics
and automation (pp. 3404–3410).
Stentz A. (1995). The focussed D* algorithm for real-time replanning. In
International joint conference on artificial intelligence (pp. 1662–
1669).
Wang, Y. X., Tian, Y. Y., Li, X., & Li, L. H. (2019). Self-adaptive
dynamic window approach in dense obstacles. Control and Deci-
sion,34(02), 34–43.
Wang, Z., Shi, Z., Li, Y., & Tu, J. (2014). The optimization of path
planning for multi-robot system using Boltzmann Policy based
Q-learning algorithm. In 2013 IEEE international conference on
robotics and biomimetics (ROBIO) (pp. 1199–1204).
Watkins, C. J. C. H., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8(3–4), 279–292.
Xin, Y., Liang, H. W., Du, M., Mei, T., Wang, Z. L., & Jiang, R. (2014).
An improved A* algorithm for searching infinite neighbourhoods.
Robot,36(5), 627–633.
Xu, Y. L. (2017). Research on mapping and navigation technology of
mobile robot based on ROS. M.A. Thesis. Harbin: Harbin Institute
of Technology.
Zhang, A., Chong, C., & Bi, W. (2016). Rectangle expansion A*
pathfinding for grid maps. Chinese Journal of Aeronautics,29(5),
1385–1396.
Zhang, J., & Singh, S. (2017). Low-drift and real-time lidar odometry
and mapping. Autonomous Robots,41(2), 401–416.
Zhao, X., Wang, Z., Huang, C. K., & Zhao, Y. W. (2018). Mobile robot
path planning based on an improved A* algorithm. Robot,40(06),
137–144.
Publisher’s Note Springer Nature remains neutral with regard to juris-
dictional claims in published maps and institutional affiliations.
Lu Chang received his B.S. degree
in Electrical Engineering from the
Jiangsu University of Science and
Technology, Zhenjiang, P.R. China
in 2016, and the M.S. degree in
System Engineering from the Nan-
jing University of Science and Tech-
nology. He is currently a Ph.D.
student in Control Science and
Engineering at the Nanjing Uni-
versity of Science and Technol-
ogy. His research interests include
robot path planning and SLAM
algorithm, machine learning and
human–robot interaction.
Liang Shan received his B.S.
degree in Electrical Engineering
and the Ph.D. degree in Control
Science and Control Engineering
from the Nanjing University of
Science and Technology, Nanjing,
P.R. China in 2002 and 2007, respec-
tively. He is currently an Asso-
ciate Professor with the Nanjing
University of Science and Tech-
nology. His research interests include
mobile robot technology, system
detection and modeling, and intel-
ligence artificial algorithm.
Chao Jiang received his B.S.
degree in Measuring and Con-
trol Technology and Instrumenta-
tion from Chongqing University,
Chongqing, China, in 2009, and
his Ph.D. degree in Electrical Engi-
neering from Stevens Institute of
Technology, Hoboken, NJ, USA,
in 2019. He worked as a Research
Assistant in the Key Laboratory
of Optoelectronic Technology and
Systems (Chongqing University),
Ministry of Education, China, from
2009 to 2012. He joined the Depart-
ment of Electrical and Computer
Engineering at the University of Wyoming, Laramie, WY in 2019,
where he is currently an Assistant Professor. His main research inter-
ests include autonomous robots, human–robot interaction, robotic
learning, deep reinforcement learning, and multi-robot cooperative
control. He is the recipient of the Innovation & Entrepreneurship Doc-
toral Fellowship at Stevens Institute of Technology from 2012 to 2016
and the recipient of the Outstanding Doctoral Dissertation Award in
Electrical Engineering at Stevens Institute of Technology in 2019. He
is a member of IEEE.
Yuewei Dai received his B.S.
and M.S. degrees in system engi-
neering from the East China Insti-
tute of Technology, Nanjing, P.R.
China, in 1984 and 1987, respec-
tively, and the Ph.D. degree in
control science and engineering
from the Nanjing University of
Science and Technology, in 2002.
He is currently a professor with
the School of Automation, Nan-
jing University of Science and Tech-
nology. His research interests are
in multimedia security, system engi-
neering theory, and network secu-
rity.
... However, it tends to get stuck in local optima in complex environments. Chang [27] proposed an improved DWA algorithm based on Q-learning, which enhances and extends the original evaluation function by introducing sub-functions to enhance global navigation performance. Fu [28] designed a new path planning evaluation function to prevent the robot from getting trapped in local optimal states. ...
Article
Full-text available
In response to challenges faced by mobile robots in global path planning within high-resolution grid maps—such as excessive waypoints, low efficiency, inability to evade random obstacles, and poor maneuverability in narrow passage environments during local path planning—a robot path planning algorithm is proposed. This algorithm integrates state-based decision-making A* algorithm with inertial dynamic window approach. Firstly, the exploration method of the A* algorithm is enhanced to dynamically adapt to the current state of the mobile robot, reducing the number of exploration nodes to improve exploration efficiency. Redundant turning points are eliminated from the original planned path to optimize the global path. Next, a path deviation evaluation function is incorporated into the speed space evaluation function of the dynamic window approach. This function adds weight to forward movement along the original direction, enhancing the robot’s ability to navigate through narrow environments. Finally, key points of the global path are used as sub-goals for local path planning, achieving a fusion of approaches. This enables the robot to simultaneously determine the optimal global path and perform random obstacle avoidance. Experimental verification demonstrates that deploying this integrated algorithm enhances exploration efficiency, reduces path turning points, achieves random obstacle avoidance, and excels in narrow passage environments for mobile robots.
... One commonly used variation of geometric planners is the Dynamic Window Approach (DWA), where the robot's velocities are sampled to generate a trajectory optimized based on obstacle avoidance, reaching the target, and maintaining base velocity [13], [14], [15]. Chang et al. [16] improved on the original DWA evaluation functions by employing Q-learning to adaptively learn the parameters, enhancing global navigation performance and efficiency. However, the method relies on the balance between effectiveness, speed, training quality and the complexity of the environment, which might limit its effectiveness in unpredictable or complex scenarios. ...
Preprint
Full-text available
We present DiPPeST, a novel image and goal conditioned diffusion-based trajectory generator for quadrupedal robot path planning. DiPPeST is a zero-shot adaptation of our previously introduced diffusion-based 2D global trajectory generator (DiPPeR). The introduced system incorporates a novel strategy for local real-time path refinements, that is reactive to camera input, without requiring any further training, image processing, or environment interpretation techniques. DiPPeST achieves 92% success rate in obstacle avoidance for nominal environments and an average of 88% success rate when tested in environments that are up to 3.5 times more complex in pixel variation than DiPPeR. A visual-servoing framework is developed to allow for real-world execution, tested on the quadruped robot, achieving 80% success rate in different environments and showcasing improved behavior than complex state-of-the-art local planners, in narrow environments.
... In addition, research [117] introduces an enhanced version of the dynamic window approach (DWA) utilizing Q-learning. Initially, we enhance the original evaluation functions by introducing two additional ones to improve global navigation performance. ...
Article
Full-text available
A marine autonomous surface vehicle (ASV) is a kind of autonomous marine robot with intelligent and flexible use advantages. They are mainly divided into two categories: unmanned vessels and unmanned sailboats. Marine ASVs are essential in marine science, industry, environmental protection, and national defense. One of the primary challenges faced by marine ASVs is autonomously planning paths in an intricate marine environment. Numerous research findings have surfaced in recent years, including the combination with popular machine learning. However, a systematic literature review is still lacking, primarily a comprehensive comparison of two types of ASV path planning methods. This review first introduces the problem and evaluation indicators of path planning for ASVs. Then, aiming at unmanned vessels and sailboats, respectively, it sorts out various path planning algorithms proposed in the existing literature, including the advantages and limitations of both kinds of ASVs, and discusses them in combination with evaluation indicators. Also, this paper explores how marine environmental factors affect path planning and its corresponding treatment methods. Finally, this review summarizes the challenges of unmanned ship path planning, proposes potential technical solutions and future development directions, and aims to provide references for further development in this field.
Article
Robust local navigation is a critical capability for any mobile robot operating in a real-world, unstructured environment, especially when there are humans or other moving obstacles in the workspace. One of the most commonly used methods for local navigation is the Dynamic Window Approach (DWA), which does not address the problem of dynamic obstacles and depends heavily on the settings of the parameters in its cost function. Thus, it is a static approach that does not adapt to the characteristics of the environment, which can change significantly. On the other hand, data-driven deep learning approaches attempt to adapt to the characteristics of the environment by predicting the appropriate robot motion based on the current observation. However, they cannot guarantee collision-free trajectories for unseen inputs. In this work, we combine the best of both worlds. We propose a neural network to predict the weights of the DWA, which is then used for safe local navigation. To address the problem of dynamic obstacles the proposed method considers a short sequence of observations to allow the network to model the motion of the obstacles and adjust the DWA weights accordingly. The network is trained using the Proximal Policy Optimization (PPO) in a reinforcement learning setting in a simulated dynamic environment. We perform a comprehensive evaluation of the proposed approach in realistic scenarios using range scans of real 3D spaces and show that it outperforms both DWA and purely Deep Learning approaches.
Article
Full-text available
We propose a self-organising map (SOM) algorithm as a solution to a new multi-goal path planning problem for active perception and data collection tasks. We optimise paths for a multi-robot team that aims to maximally observe a set of nodes in the environment. The selected nodes are observed by visiting associated viewpoint regions defined by a sensor model. The key problem characteristics are that the viewpoint regions are overlapping polygonal continuous regions, each node has an observation reward, and the robots are constrained by travel budgets. The SOM algorithm jointly selects and allocates nodes to the robots and finds favourable sequences of sensing locations. The algorithm has a runtime complexity that is polynomial in the number of nodes to be observed and the magnitude of the relative weighting of rewards. We show empirically the runtime is sublinear in the number of robots. We demonstrate feasibility for the active perception task of observing a set of 3D objects. The viewpoint regions consider sensing ranges and self-occlusions, and the rewards are measured as discriminability in the ensemble of shape functions feature space. Exploration objectives for online tasks where the environment is only partially known in advance are modelled by introducing goal regions in unexplored space. Online replanning is performed efficiently by adapting previous solutions as new information becomes available. Simulations were performed using a 3D point-cloud dataset from a real robot in a large outdoor environment. Our results show the proposed methods enable multi-robot planning for online active perception tasks with continuous sets of candidate viewpoints and long planning horizons.
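To illustrate the SOM mechanics underlying such planners, the snippet below implements the textbook ring-SOM adaptation step used in SOM-based routing: the neuron closest to a sampled goal point and its ring neighbours are pulled toward that point. The multi-robot allocation, viewpoint regions, and travel budgets of the actual method are not modelled here; `som_route_step` and its parameters are illustrative choices only.

```python
import numpy as np

def som_route_step(ring, goal, learning_rate=0.8, neighbourhood=5):
    """One adaptation step of a ring-shaped SOM used for route construction:
    the winner neuron and its ring neighbours are pulled toward a sampled
    goal point (textbook SOM-for-routing update, not the paper's full
    multi-robot, budget-constrained variant)."""
    n = len(ring)
    winner = int(np.argmin(np.linalg.norm(ring - goal, axis=1)))   # closest neuron
    for offset in range(-neighbourhood, neighbourhood + 1):
        idx = (winner + offset) % n                                # wrap around the ring
        influence = np.exp(-(offset ** 2) / (2 * (neighbourhood / 2.0) ** 2))
        ring[idx] += learning_rate * influence * (goal - ring[idx])
    return ring

# Example: adapt a 20-neuron ring toward a handful of 2-D viewpoints.
ring = np.random.rand(20, 2)
for goal in np.random.rand(5, 2):
    ring = som_route_step(ring, goal)
```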
Article
To address the problems of the A* algorithm in large scenes, such as high memory overhead and long computation time, an improved A* algorithm is proposed based on the A* algorithm and the jump point search algorithm. The algorithm expands by selecting jump points until the final path is generated. During expansion, many of the unnecessary nodes that A* would add to the OpenList and ClosedList are replaced by jump points, reducing computation. To validate the improved A* algorithm, simulations are carried out on 2D grid maps of different sizes. The simulation results show that the improved A* algorithm expands fewer nodes during pathfinding and finds paths faster, and the acceleration becomes more apparent as the environment map grows. Finally, the improved A* algorithm is implemented on the mobile robot Turtlebot2 for comparative experiments. The experimental results show that the improved A* algorithm runs about 200% faster than the A* algorithm when generating the same path and can meet the path planning requirements of mobile robots.
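The key idea, skipping the intermediate nodes that plain A* would push onto the OpenList, can be sketched with a straight-line "jump" on a 4-connected grid. The Python fragment below is a simplified sketch under those assumptions, not the paper's algorithm: it walks along one direction until it reaches the goal, runs into an obstacle, or passes a cell where a blocked neighbour forces a possible turn.

```python
def jump(grid, x, y, dx, dy, goal):
    """Walk from (x, y) in direction (dx, dy) and return the first cell that
    must be kept as a search node: the goal, or a cell beside a blocked cell
    where the optimal path might have to turn. Intermediate cells are skipped,
    which is the core idea of jump point search; this simplified sketch covers
    straight moves on a 4-connected grid only. grid[x][y] == 1 means blocked."""
    rows, cols = len(grid), len(grid[0])
    while True:
        x, y = x + dx, y + dy
        if not (0 <= x < rows and 0 <= y < cols) or grid[x][y] == 1:
            return None                      # ran off the map or into an obstacle
        if (x, y) == goal:
            return (x, y)
        # A blocked cell perpendicular to the travel direction creates a
        # candidate turning point ("forced neighbour"), so stop skipping here.
        for nx, ny in ((x + dy, y + dx), (x - dy, y - dx)):
            if 0 <= nx < rows and 0 <= ny < cols and grid[nx][ny] == 1:
                return (x, y)

# Example: jump east from (0, 0) toward goal (0, 4) on a small map.
grid = [[0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0]]
print(jump(grid, 0, 0, 0, 1, (0, 4)))        # -> (0, 2), beside the blocked cell
```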
Article
In this paper, both the local minimum and the goal non-reachable with obstacles nearby (GNRON) problems of the artificial potential field (APF) are solved for mobile robot path planning in a bounded environment. The GNRON problem occurs when the goal's position lies inside an obstacle's circle of influence, which gives rise to a repulsive force larger than the attractive force. Therefore, a stronger attractive function is proposed to ensure that the robot reaches the target successfully. The local minimum problem occurs when the robot, the goal, and the centre of the obstacle are aligned, making the total potential force zero before the goal position is reached. To overcome this problem, a novel repulsive potential function is proposed that activates a virtual escaping force when a local minimum is detected. This force acts as a rotational force, allowing the robot to escape from deadlock positions and turn smoothly away from obstacles in the direction of the target. The combination of the new attractive force and the novel repulsive force solves the local minimum and GNRON problems. Finally, systematic simulations, based on a comparative study with other methods from the literature, are carried out to validate and demonstrate the effectiveness of the advanced potential field method in handling the local minimum and GNRON problems.
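For reference, the classical APF force field that such modifications build on can be written in a few lines. The sketch below uses the standard Khatib-style attractive and repulsive forces and adds a perpendicular "escape" push when the net force vanishes away from the goal, echoing (but not reproducing) the paper's virtual escaping force; all gains and thresholds are illustrative assumptions.

```python
import numpy as np

def apf_force(q, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=2.0):
    """Standard (Khatib-style) artificial potential field force, not the paper's
    modified functions: linear attraction to the goal plus a repulsion that is
    active only within distance rho0 of each obstacle. 2-D positions assumed."""
    f = k_att * (goal - q)                                   # attractive force
    for obs in obstacles:
        diff = q - obs
        rho = np.linalg.norm(diff)
        if 0.0 < rho < rho0:                                 # inside circle of influence
            f += k_rep * (1.0 / rho - 1.0 / rho0) * diff / rho ** 3
    # If attraction and repulsion cancel away from the goal (a local minimum),
    # add a perpendicular push so the robot rotates around the obstacle
    # instead of stalling, echoing the paper's virtual escaping force.
    if np.linalg.norm(f) < 1e-3 and np.linalg.norm(goal - q) > 1e-3:
        d = goal - q
        f = np.array([-d[1], d[0]])
    return f

# Example: robot at the origin, goal behind a nearby obstacle.
q, goal = np.zeros(2), np.array([5.0, 0.0])
obstacles = [np.array([1.5, 0.3])]
print(apf_force(q, goal, obstacles))
```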
Article
The Follow the Gap Method (FGM) is an obstacle avoidance method that uses gap arrays: it recursively directs the robot toward the goal state while avoiding obstacles through the safest gap. Since FGM is a purely geometric method, it does not consider robot dynamics, so oscillations or collisions due to robot dynamics are possible. Moreover, FGM calculates only the desired heading angle; it does not provide linear and angular velocity references. The Dynamic Window Approach (DWA) is one of the most popular obstacle avoidance algorithms that does take robot dynamics into consideration; it calculates the best angular and linear velocity pair as chosen by an objective function. In this paper, an FGM-DW approach is proposed that combines the strongest elements of FGM and DWA to achieve safe, smooth, and fast navigation. The FGM-DW approach addresses these concerns and supplies the low-level linear and angular velocity commands that FGM lacks. The performance of FGM-DW is analysed and demonstrated through both simulations and real-world experimental tests.
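A minimal version of the gap-selection step that FGM contributes to FGM-DW is sketched below: beams of a laser scan whose range exceeds a clearance threshold are treated as free, the widest contiguous run of free beams is found, and the heading toward its centre is returned. Goal weighting and the DWA velocity selection that complete FGM-DW are omitted; `largest_gap_heading` and the threshold are illustrative assumptions.

```python
import numpy as np

def largest_gap_heading(ranges, angles, threshold=1.0):
    """Follow-the-Gap style heading selection: beams with range above the
    clearance threshold are free; find the widest contiguous run of free
    beams and steer toward its centre. Simplified sketch only."""
    free = ranges > threshold
    best_len, best_start, run_start = 0, 0, None
    for i, f in enumerate(np.append(free, False)):           # sentinel closes last run
        if f and run_start is None:
            run_start = i
        elif not f and run_start is not None:
            if i - run_start > best_len:
                best_len, best_start = i - run_start, run_start
            run_start = None
    if best_len == 0:
        return None                                           # no traversable gap
    centre = best_start + best_len // 2
    return angles[centre]                                     # heading toward gap centre

# Example with a synthetic 180-degree scan and an obstacle in front.
angles = np.linspace(-np.pi / 2, np.pi / 2, 181)
ranges = np.full(181, 3.0)
ranges[60:100] = 0.4
print(largest_gap_heading(ranges, angles))
```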
Article
Shared control is a strategy used in assistive platforms to combine human and robot commands to achieve a goal. Collaborative control is a specific shared control approach in which the user's and the robot's commands are continuously merged into an emergent one. Robot commands tend to improve efficiency and safety; however, assistance can be rejected by users when their commands are altered too much, which provokes frustration and stress and usually decreases the efficiency of the emergent command. To improve acceptance, robot navigation algorithms can be adapted to mimic human behavior when possible. We propose a novel variation of the well-known dynamic window approach (DWA) that we call biomimetical DWA (BDWA). The BDWA relies on a reward function extracted from real traces of volunteers with different motor disabilities navigating a hospital environment while using a rollator for support. We have compared the BDWA with other reactive algorithms in terms of similarity to paths completed by people with disabilities using a robotic rollator in a rehabilitation hospital unit. The BDWA outperforms all tested algorithms in terms of likeness to human paths and success rate.