Decision-Making for Oncoming Traffic Overtaking Scenario using Double DQN

Shuojie Mo, Xiaofei Pei*, Zhenfu Chen
School of Automotive Engineering
Wuhan University of Technology
Wuhan, China
peixiaofei7@whut.edu.cn
Abstract: Great progress has been made in the field of machine learning in recent years, and learning-based methods have been widely applied to the development of highly automated vehicles. To this end, we introduce a reinforcement learning based decision-making method for autonomous vehicles in an oncoming-traffic overtaking scenario. The goal of reinforcement learning is to learn the optimal decision for the current observation through interactions with the environment, using a reward function to estimate whether a decision is good or not. A Double Deep Q-learning (Double DQN) agent is used to learn policies (control strategies) for both longitudinal speed and lane-change decisions. Prioritized Experience Replay (PER) is used to accelerate convergence of the policies. A two-way 3-car scenario with oncoming traffic is established in SUMO (Simulation of Urban Mobility) to train and test the policies.
Keywords: autonomous vehicle, decision-making, overtaking, reinforcement learning.
I. INTRODUCTION
Overtaking is one of the most complex maneuvers, since it combines both longitudinal and lateral control. The average speed of an autonomous vehicle can be increased efficiently by overtaking, which is nevertheless dangerous and a frequent cause of accidents. In this paper, we focus on the two-way 3-car overtaking scenario with oncoming traffic. A reinforcement learning based decision-making method is introduced to increase the safety and efficiency of the overtaking maneuver.
Overtaking behaviors occur many times in daily driving and are closely related to safety and average velocity. Many approaches have been developed to solve the decision-making problem of overtaking. In [1] and [2], fuzzy control based methods were used to mimic human behavior and reactions during overtaking maneuvers: throttle, brake, and steering-angle commands were output by the fuzzy controllers to overtake the front vehicle. Fenghui Wang et al. [3] proposed an MPC-based overtaking controller built on an estimate of the conflict probability computed from the uncertain relative distance between vehicles; the goal was to find an optimal control input sequence that minimizes the objective function. Karaduman et al. [4] built a 3-car overtaking scenario and used a Bayesian network to estimate the probability of collision.
In this work, a learning-based method is introduced to solve the overtaking decision-making problem. Reinforcement learning (RL) aims to solve sequential decision-making problems under uncertainty by interacting with the world to maximize cumulative reward. Xin Li et al. used a Q-learning method to obtain optimal decisions in a highway overtaking scenario [5]. A multiple-goal RL framework was presented in [6]: considering all of the sub-goals, Ngai and Yung employed seven Q-learning agents, one per goal, together with a fusion function to determine the best action while overtaking on a curve. Both Q-learning methods work well. However, the oncoming overtaking scenario is not discussed, and continuous states need to be quantized in Q-learning methods. Discretizing the state loses information and thus degrades the resulting decisions.
In this paper, we mainly consider the decision-making problem in the two-way 3-car oncoming overtaking scenario using Double Deep Q-learning (Double DQN). The goal of the presented RL-based method is to train an agent that performs better than traditional approaches in safety and time efficiency under changing scenarios. We offer two main contributions: 1) the neural network in Double DQN can handle continuous raw inputs without discarding part of the information; 2) compared to traditional decision-making methods, no vehicle model is required. The paper is organized as follows. In Section II, we introduce the framework of reinforcement learning and the algorithm we use. Section III outlines the problem formulation, and Section IV evaluates the proposed method in a SUMO scenario. Finally, conclusions are given in Section V.
II. BACKGROUND
A. Markov Decision Process
Reinforcement learning learns optimal policies by executing actions to interact with the environment and improving the action strategy according to the received reward [7]. We model the problem as a Markov decision process (MDP), defined by the tuple (S, A, P, R, γ). An agent's behavior is defined by a policy π, which maps states S to a probability distribution over the actions A. A transition function P(s'|s, a) defines the probability of the state changing from s to s' under action a.
A reward function R(s, a) is used to evaluate the decision, and a discount factor γ ∈ [0, 1] is employed to compute the cumulative discounted expected reward. Specifically, the agent seeks a behavior policy π(a|s) that maximizes the cumulative expected reward, also called the action-value function (Q function):
Q^*(s, a) = \max_{\pi} \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^k r_{t+k} \mid s_t = s, a_t = a, \pi \right]    (1)
B. Double Deep Q-learning (Double DQN)
A deep Q-network (DQN) is a neural network that, for a given state s, outputs a vector of action values Q(s, ·; θ), where θ are the parameters of the network. Two important ingredients of the DQN algorithm as proposed by Mnih et al. [8] are the use of a replay memory and of a target network with parameters θ⁻. The replay memory removes correlations in the observation sequence, which dramatically improves performance. The target network is identical to the original (main) network except that its parameters are copied from the main network every τ steps (θ⁻ = θ) and kept fixed on all other steps. The target used by DQN is
\hat{y}_t^{DQN} = r + \gamma \max_{a'} Q(s', a'; \theta^-)    (2)
However, overestimation occurs because the max operator in this target uses the same values for both action selection and action evaluation, in both Q-learning and DQN. This can be mitigated by decoupling the two steps, which is the main idea of Double DQN. Double DQN uses the same two-network architecture as DQN, as shown in Fig. 1: the main network is used to select the greedy action, while the target network is used to estimate its value [9]. The target in Double DQN is as follows:
\hat{y}_t^{DDQN} = r + \gamma \, Q\left(s', \arg\max_{a' \in A} Q(s', a'; \theta); \theta^-\right)    (3)
During learning, parameters are updated on samples from
replay memory. The Q-learning update at iteration i uses the
following loss function:
L_i(\theta_i) = \mathbb{E}_{(s, a, r, s') \sim \mu(D)}\left[ \left( y^{DDQN} - Q(s, a; \theta_i) \right)^2 \right]    (4)
in which D is the replay memory and experience samples (s, a, r, s') ~ μ(D) are drawn from it. Here we use prioritized experience replay (PER) to pick each minibatch of experience, making the most effective use of the replay memory for learning. The central component of prioritized replay is the criterion by which the importance of each transition is measured, namely its TD-error [10]: the larger the TD-error, the more the agent can learn from that transition.
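For concreteness, the following is a minimal Python sketch, not the authors' implementation, of how the Double DQN target in (3), the TD-error used in the loss (4), and the proportional PER priorities could be computed. The callables q_main and q_target and the batch layout are illustrative assumptions.

import numpy as np

def ddqn_targets(q_main, q_target, batch, gamma=0.99):
    """Compute Double DQN targets and TD-errors for a sampled minibatch.

    q_main(s), q_target(s): return Q-value arrays of shape (batch, n_actions).
    batch: dict of arrays 's', 'a', 'r', 's_next', 'done' (names are illustrative).
    """
    # Action selection with the main network (inner argmax in eq. (3))
    a_next = np.argmax(q_main(batch["s_next"]), axis=1)
    # Action evaluation with the target network (outer Q in eq. (3))
    q_next = q_target(batch["s_next"])[np.arange(len(a_next)), a_next]
    # Bootstrapped target; no bootstrapping on terminal transitions
    y = batch["r"] + gamma * q_next * (1.0 - batch["done"])
    # TD-error, used both in the loss (4) and as the PER priority signal
    q_sa = q_main(batch["s"])[np.arange(len(a_next)), batch["a"]]
    return y, y - q_sa

def per_priorities(td_error, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_i = (|delta_i| + eps)^alpha [10]."""
    return (np.abs(td_error) + eps) ** alpha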
III. PROBLEM FORMULATION
A. Oncoming Overtaking Scenario
In this paper we focus on decision making in the oncoming overtaking scenario. Fig. 2 shows a typical two-way 3-car oncoming overtaking scenario built in SUMO (Simulation of Urban Mobility). We assume Auto can accurately obtain its own state and information about its surroundings through its sensor system. At initialization of the driving scenario, the speed of Auto is set to 10 m/s. The initial speed of Vehicle1 lies in [5 m/s, 7 m/s] with an interval of 0.5 m/s, and its initial distance from Auto lies in [30 m, 50 m] with an interval of 5 m. The initial speed of Vehicle2 varies from 10 m/s to 15 m/s with an interval of 0.5 m/s, and its initial distance to Auto varies from 100 m to 300 m with an interval of 5 m. A constant-speed model is applied to Vehicle1, and the Intelligent Driver Model (IDM) [11], the default car-following model, is employed to model the longitudinal dynamics of Vehicle2. In addition, the acceleration of all three vehicles is bounded within [-3 m/s², 3 m/s²].
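As a rough illustration of this initialization, the sketch below samples one random scenario configuration using the stated ranges; the function and variable names are hypothetical, not part of the paper or the SUMO API.

import random

def sample_initial_conditions():
    """Randomly initialize the 3-car oncoming overtaking scenario (Section III-A)."""
    auto_speed = 10.0                                              # Auto, m/s (fixed)
    v1_speed = random.choice([5.0 + 0.5 * i for i in range(5)])    # Vehicle1: 5..7 m/s, step 0.5
    v1_gap = random.choice(range(30, 55, 5))                       # Vehicle1: 30..50 m ahead, step 5
    v2_speed = random.choice([10.0 + 0.5 * i for i in range(11)])  # Vehicle2: 10..15 m/s, step 0.5
    v2_gap = random.choice(range(100, 305, 5))                     # Vehicle2: 100..300 m away, step 5
    return auto_speed, v1_speed, v1_gap, v2_speed, v2_gap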
B. State Space
The state represents information about the ego vehicle and its surroundings for decision making. In this paper, only longitudinal distances are considered. An abstract state vector is defined by (5) to provide adequate environment information:

s^k = \left( v_{auto}^k, x_{auto}^k, \Delta d_i^k, \Delta v_i^k \right)^T, \quad i = 1, 2, \ldots, 6    (5)

where v_{auto} is the current speed of Auto and x_{auto} is its global position along the x-axis. \Delta d and \Delta v describe the relationship between Auto and the surrounding vehicles: the six elements respectively stand for the relative distance d or relative velocity v with respect to the front, left-front, right-front, behind, left-behind, and right-behind vehicles.

Fig. 1. Framework of Double DQN.

Fig. 2. A typical two-way 3-car oncoming overtaking scenario in SUMO. Green stands for the autonomous vehicle Auto, red stands for the slow front vehicle Vehicle1 being overtaken, and yellow stands for the oncoming vehicle Vehicle2.
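A minimal sketch of how the 14-dimensional state vector in (5) could be assembled is given below; the slot ordering follows the text, while the sentinel values for empty slots are an assumption.

import numpy as np

# Surrounding slots in the order given in the text.
SLOTS = ["front", "left_front", "right_front", "behind", "left_behind", "right_behind"]

def build_state(v_auto, x_auto, rel_dist, rel_vel, d_max=300.0):
    """Assemble the state vector of eq. (5).

    rel_dist / rel_vel: dicts mapping a slot name to the relative distance (m)
    or relative velocity (m/s); an empty slot falls back to a large sentinel
    distance and zero relative speed (assumed convention).
    """
    dd = [rel_dist.get(k, d_max) for k in SLOTS]
    dv = [rel_vel.get(k, 0.0) for k in SLOTS]
    return np.array([v_auto, x_auto] + dd + dv, dtype=np.float32)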
C. Action Space
The action space includes longitudinal speed control and the lateral lane-change decision. Five accelerations are chosen to control longitudinal speed: -3 m/s² for medium braking, -1 m/s² for soft braking, 0 m/s² for maintaining speed, 1 m/s² for soft acceleration, and 3 m/s² for rapid acceleration. A lane-change signal LC is used for lateral control. The action space is defined as

A = \left( a_i, LC \right), \quad i = 1, 2, 3, 4, 5
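One possible encoding of this discrete action set is sketched below; the integer indices and dictionary layout are illustrative only.

# Five longitudinal accelerations plus the lane-change signal LC (Section III-C).
ACTIONS = {
    0: {"accel": -3.0},        # medium brake (m/s^2)
    1: {"accel": -1.0},        # soft brake
    2: {"accel": 0.0},         # maintain speed
    3: {"accel": 1.0},         # soft acceleration
    4: {"accel": 3.0},         # rapid acceleration
    5: {"lane_change": True},  # lateral lane-change maneuver
}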
D. Reward Function
The reward function guides the agent toward its goals; therefore, its shape should be based on the learning objectives. In this paper, we expect the agent to overtake the slower front vehicle as quickly as possible without collision. The following aspects are considered:
1) Collision avoidance: Collision is the most dangerous situation and must be avoided. A TTC-based piecewise function is used to formulate this reward:
R_{collision} = \begin{cases} -1.5 & \text{if } 2.5 \le TTC \le 3 \\ -2 & \text{if } 2 \le TTC \le 2.5 \\ -4 & \text{if } TTC \le 2 \\ -40 & \text{if collision} \end{cases}    (6)
2) Speed: The ultimate objective of overtaking is to increase speed for quicker passage. The following speed reward encourages the agent to speed up:

R_{velocity} = 0.2 \times (v_{auto} - 10)    (7)
3) Opposite-lane occupancy: Long-term occupancy of the opposite lane is obviously dangerous, so a small penalty is applied to reduce the occupancy time:

R_{opposite\,lane} = -1    (8)
4) Overtake: When the agent reaches the goal state, i.e., successfully overtakes the slower front vehicle, a large reward is given:

R_{overtake} = 200    (9)
Formally, the final reward is defined as the sum of the above sub-rewards:

R = R_{velocity} + R_{overtake} + R_{opposite\,lane} + R_{collision}    (10)
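The sketch below combines the sub-rewards (6)-(9) into the total reward (10); the argument names and the assumption that a TTC above 3 s incurs no collision penalty are illustrative choices.

def reward(ttc, collided, v_auto, on_opposite_lane, overtook):
    """Per-step reward, eq. (10). ttc is the time-to-collision in seconds
    (use float('inf') when no conflict exists)."""
    # (6) collision avoidance: TTC-based piecewise penalty
    if collided:
        r_collision = -40.0
    elif ttc <= 2.0:
        r_collision = -4.0
    elif ttc <= 2.5:
        r_collision = -2.0
    elif ttc <= 3.0:
        r_collision = -1.5
    else:
        r_collision = 0.0
    # (7) encourage speeds above the initial 10 m/s
    r_velocity = 0.2 * (v_auto - 10.0)
    # (8) small penalty for occupying the opposite lane
    r_opposite = -1.0 if on_opposite_lane else 0.0
    # (9) large terminal reward for a successful overtake
    r_overtake = 200.0 if overtook else 0.0
    return r_velocity + r_overtake + r_opposite + r_collision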
IV. EVALUATION
The Double DQN algorithm described above is applied for decision making and outputs a reasonable policy for overtaking with oncoming traffic in the two-way scenario. The decision time step is set to 0.1 s. Hyper-parameter values were not systematically tuned due to the high computational cost; a relatively good set of hyper-parameters, selected after several tests, is shown in TABLE I. Fig. 3 shows the learning curve of the average cumulative reward. Rewards are averaged per epoch (red line) to show the training process more clearly. As can be seen, the agent converges after about 70 epochs of training. Better performance of the Double DQN agent could be obtained with more training time.
According to the oncoming overtaking scenario presented in Section III, we built a two-way scenario with two other vehicles in SUMO. The speed and position of Vehicle2 are randomly initialized every episode, and either a collision or successfully overtaking the slower Vehicle1 ends an episode.
Fig. 4 shows how the agent acts with different overtaking policies in distinct scenarios with random initializations. In Fig. 4(a), where either the speed of Vehicle2 is not high or the initial position of Vehicle2 is far from Auto, there is enough time for Auto to directly overtake the slower front Vehicle1.

Fig. 3. Learning curve for Double DQN.

Fig. 4. Two typical policies for overtaking. The marks on the left side represent the agent's actions: acceleration, speed maintenance, braking, and the lane-change maneuver.
TABLE I. LIST OF HYPER-PARAMETERS

Hyper-parameter                        Value
episodes                               150000
minibatch size                         32
replay memory size                     250000
target network soft update             0.01
discount factor                        0.99
learning rate                          5e-5
exploration                            1 → 0.1 (annealed over 5e5 steps)
replay start size                      25000
prioritized replay alpha               0.6
importance-sampling weight beta        0.4 → 1 (increased over 5e5 steps)
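For reference, the hyper-parameters of TABLE I can be collected in a single configuration as sketched below; the linear annealing helper for the exploration rate and the PER importance-sampling exponent is an assumed implementation detail, not specified in the paper.

CONFIG = {
    "episodes": 150000,
    "minibatch_size": 32,
    "replay_memory_size": 250000,
    "target_soft_update_tau": 0.01,
    "gamma": 0.99,
    "learning_rate": 5e-5,
    "epsilon_start": 1.0, "epsilon_end": 0.1, "epsilon_steps": 500000,
    "replay_start_size": 25000,
    "per_alpha": 0.6,
    "per_beta_start": 0.4, "per_beta_end": 1.0, "per_beta_steps": 500000,
}

def linear_schedule(step, start, end, total_steps):
    """Linearly anneal a value from start to end over total_steps, then hold."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)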
A different policy is used in Fig. 4(b), where Auto cannot overtake the front vehicle immediately. In this situation, the trained agent acts like a human driver, waiting to overtake until the oncoming Vehicle2 has passed.
To emphasize the time efficiency of the learning-based method in oncoming-traffic scenarios, we compare against the default lane-change model 'LC2013' in SUMO, in which a benefit is computed to estimate the advantage of changing from one lane to another:
b_{l_n}(t) = \frac{v_{pos}(t, l_n) - v_{pos}(t, l_c)}{v_{max}(l)}    (11)
where b_{l_n}(t) is the benefit of changing to lane l_n; l_c and l_n are the vehicle's current and neighboring lanes, respectively; v_{pos}(t, l) is the velocity at which the vehicle could safely drive on lane l; and v_{max}(l) is the maximum velocity the vehicle can reach on lane l. More information about LC2013 can be found in [12] and [13].
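A simplified sketch of the speed-gain criterion in (11) is shown below; the full LC2013 model also includes cooperative, strategic, and keep-right motivations that are omitted here (see [12]).

def lane_change_benefit(v_pos_neighbor, v_pos_current, v_max):
    """Benefit of changing to the neighboring lane, eq. (11):
    the relative speed gain achievable on that lane."""
    return (v_pos_neighbor - v_pos_current) / v_max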
TABLE II. COMPARISON OF DOUBLE DQN AND THE SUMO LANE-CHANGE METHOD

Indicator                                      LC2013 baseline   Double DQN   Improvement
Average speed v_auto^average (m/s)             11.92             13.40        12.4%
Time in opposite lane t_op (s)                 7.08              3.80         46.3%
Overtaking duration t_total (s)                10.43             7.02         32.7%
Opposite lane occupancy t_op / t_total (%)     67.9%             54.1%        20.3%
We tested the above methods for 1000 trials in SUMO. The presented learning-based approach achieves a collision-free overtaking rate of 98.5%. The average overtaking duration t_total, the time spent in the opposite lane t_op, and the average speed v_auto^average are the indicators used for comparison. The results in TABLE II indicate that the RL-based decision-making method efficiently increases the average speed of Auto and reduces the overtaking time in contrast to the traditional method. In addition, the opposite-lane occupancy shows that the specially designed sub-goal reward function works well, thus enhancing the safety of autonomous vehicles.
V. CONCLUSIONS
With the employment of Double DQN, continuous states can be used directly to learn a more precise policy without discretization. Evaluation results indicate that the trained model-free agent performed better in nearly all of the random scenarios in SUMO. In this paper we modeled the decision-making problem as an MDP, where all information was assumed to be fully observable. However, uncertainties from imperfect sensor systems and from other traffic participants are unavoidable in reality. Future work on decision-making for autonomous vehicles will therefore focus on POMDP-based approaches, since the POMDP provides a powerful mathematical framework for dealing with uncertainty.
REFERENCES
[1] J. E. Naranjo, C. Gonzalez, R. Garcia, and T. de Pedro, "Lane-Change
Fuzzy Control in Autonomous Vehicles for the Overtaking Maneuver,"
IEEE Transactions on Intelligent Transportation Systems, vol. 9, no. 3,
pp. 438-450, 2008
[2] J. Perez, V. Milanes, E. Onieva, J. Godoy, and J. Alonso, "Longitudinal
fuzzy control for autonomous overtaking," in 2011 IEEE International
Conference on Mechatronics, 2011, pp. 188-193: IEEE.
[3] F. Wang, M. Yang, and R. Yang, "Conflict-probability-estimation-based overtaking for intelligent vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 2, pp. 366-370, 2009.
[4] O. Karaduman, H. Eren, H. Kurum, and M. Celenk, "Interactive risky
behavior model for 3-car overtaking scenario using joint Bayesian
network," in 2013 IEEE Intelligent Vehicles Symposium (IV), 2013,
pp. 1279-1284: IEEE.
[5] X. Li, X. Xu, and L. Zuo, "Reinforcement learning based overtaking decision-making for highway autonomous driving," in 2015 Sixth International Conference on Intelligent Control and Information Processing, 2015, pp. 336-342.
[6] D. C. K. Ngai and N. H. C. Yung, "A multiple-goal reinforcement learning method for complex vehicle overtaking maneuvers," IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 2, pp. 509-522, Jun. 2011.
[7] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction.
MIT press, 2018.
[8] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[9] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning
with double q-learning," in Thirtieth AAAI Conference on Artificial
Intelligence, 2016.
[10] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized experience replay," arXiv preprint arXiv:1511.05952, 2015.
[11] M. Treiber, A. Hennecke, and D. Helbing, "Congested traffic states in empirical observations and microscopic simulations," Physical Review E, vol. 62, no. 2, p. 1805, 2000.
[12] J. Erdmann, "SUMO’s lane-changing model," in Modeling Mobility
with Open Data: Springer, 2015, pp. 105-123.
[13] D. Krajzewicz, "Traffic simulation with SUMO - simulation of urban mobility," in Fundamentals of Traffic Simulation, Springer, 2010, pp. 269-293.
Article
In this paper, we present a learning method to solve the vehicle overtaking problem, which demands a multitude of abilities from the agent to tackle multiple criteria. To handle this problem, we propose to adopt a multiple-goal reinforcement learning (MGRL) framework as the basis of our solution. By considering seven different goals, either Q-learning (QL) or double-action QL is employed to determine action decisions based on whether the other vehicles interact with the agent for that particular goal. Furthermore, a fusion function is proposed according to the importance of each goal before arriving to an overall but consistent action decision. This offers a powerful approach for dealing with demanding situations such as overtaking, particularly when a number of other vehicles are within the proximity of the agent and are traveling at different and varying speeds. A large number of overtaking cases have been simulated to demonstrate its effectiveness. From the results, it can be concluded that the proposed method is capable of the following: 1) making correct action decisions for overtaking; 2) avoiding collisions with other vehicles; 3) reaching the target at reasonable time; 4) keeping almost steady speed; and 5) maintaining almost steady heading angle. In addition, it should also be noted that the proposed method performs lane keeping well when not overtaking and lane changing effectively when overtaking is in progress.