Decision-Making for Oncoming Traffic Overtaking
Scenario using Double DQN
Shuojie Mo, Xiaofei Pei*, Zhenfu Chen
School of Automotive Engineering
Wuhan University of Technology
Wuhan, China
peixiaofei7@whut.edu.cn
Abstract—Great progress has been made in the field of
machine learning in recent years, and learning-based methods
have been widely utilized for developing highly autonomous
vehicles. To this end, we introduce a reinforcement learning
based decision-making method for autonomous vehicles in an
oncoming-traffic overtaking scenario. The goal of reinforcement
learning is to learn to take the optimal decision for each
observation through interactions with the environment, using a
reward function to estimate how good each decision is. A Double
Deep Q-learning (Double DQN) agent was used to learn policies
(control strategies) for both longitudinal speed and lane-change
decisions. Prioritized Experience Replay (PER) was used to
accelerate convergence of the policies. A two-way 3-car scenario
with oncoming traffic was built in SUMO (Simulation of Urban
Mobility) to train and test the policies.
Keywords—autonomous vehicle, decision-making,
overtaking, reinforcement learning.
I. INTRODUCTION
Overtaking is one of the most complex driving maneuvers,
since it combines both longitudinal and lateral control.
Overtaking can efficiently increase the average speed of an
autonomous vehicle, but it is also dangerous and a frequent
cause of accidents. In this paper, we focus in particular on a
two-way 3-car overtaking scenario with oncoming traffic. A
reinforcement learning based decision-making method is
introduced to increase the safety and efficiency of the
overtaking maneuver.
Overtaking occurs many times in daily driving and is
closely related to both safety and average velocity. Many
approaches have been developed to solve the decision-making
problem of overtaking. In [1] and [2], fuzzy control based
methods were used to mimic human behavior and reactions
during overtaking maneuvers; throttle, brake, and steering-angle
commands were output by the fuzzy controllers to overtake the
front vehicle. Fenghui Wang et al. [3] proposed an MPC-based
overtaking controller built on estimation of the conflict
probability from the uncertain relative distance between
vehicles; the goal was to find an optimal control input sequence
that minimizes an objective function. Karaduman et al. [4] built
a 3-car overtaking scenario using a Bayesian network to
estimate the probability of collision.
In this work, a learning-based method is introduced to
solve the overtaking decision-making problem.
Reinforcement learning (RL) aims to solve sequential
decision-making problems under uncertainty by interacting
with the world to maximize cumulative reward. Xin Li et al.
used a Q-learning method to obtain optimal decisions in a
highway overtaking scenario [5]. A multiple-goal RL
framework was presented in [6]: considering all sub-goals,
Ngai and Yung employed seven Q-learning agents, one per
goal, together with a fusion function to determine the best
action while overtaking on a curve. Both Q-learning methods
work well. However, the oncoming overtaking scenario is not
discussed in these works, and continuous states need to be
quantized in Q-learning methods. Discretizing the state loses
information and thus degrades the resulting decisions.
In this paper, we mainly consider the decision-making
problem in a two-way 3-car oncoming overtaking scenario
using Double Deep Q-learning (DDQN). The goal of the
presented RL-based method is to train an agent that outperforms
traditional approaches in safety and time efficiency across
changeable scenarios. We offer two main contributions: 1) the
neural network in DDQN can handle continuous raw inputs
without losing information; 2) compared with traditional
decision-making methods, no vehicle model is required. The
paper is organized as follows. In Section II, we introduce the
framework of reinforcement learning and the algorithm we use.
Section III outlines the problem formulation, and Section IV
evaluates the proposed method in a SUMO scenario. Finally,
conclusions are given in Section V.
II. BACKGROUND
A. Markov Decision Process
Reinforcement learning learns optimal policies by
executing actions to interact with the environment and
improving the action strategy according to the received
reward [7]. We can model this as a Markov decision process
(MDP) defined by a tuple (S, A, T, R, γ). An agent's behavior
is defined by a policy π, which maps states S to a probability
distribution over the actions A. A transition function T(s'|s, a)
defines the probability of the state changing from s to s' under
action a.
A reward function R(s, a) is used to evaluate decisions, and
a discount factor γ ∈ [0, 1] is employed to calculate the
cumulative discounted expected reward. Specifically, the
agent seeks a behavior policy π* = π(a|s) that maximizes the
cumulative expected reward, also called the action-value
function (Q function):
Q^*(s, a) = \max_\pi \mathbb{E}\Big[ \sum_{k=0}^{\infty} \gamma^k r_{t+k} \;\Big|\; s_t = s, a_t = a, \pi \Big]    (1)
B. Double Deep Q-learning (Double DQN)
A deep Q-network (DQN) is a neural network that, for a
given state s, outputs a vector of action values Q(s, ·; θ), where
θ are the parameters of the network. Two important
ingredients of the DQN algorithm as proposed by Mnih et al.
[8] are the use of a replay memory and of a target network
with parameters θ⁻. The replay memory removes correlations
in the observation sequence, dramatically improving
performance. The target network is identical to the original
(main) network except that its parameters are copied from the
main network every τ steps (θ⁻ = θ) and kept fixed on all
other steps. The target used by DQN is

\hat{y}_t^{DQN} = r + \gamma \max_{a'} Q(s', a'; \theta^-)    (2)
However, the max operator in this target uses the same
values both to select and to evaluate an action, which leads to
overestimation in both Q-learning and DQN. This can be fixed
by decoupling action selection from action evaluation, which
is the main idea of Double DQN. Double DQN uses the same
two-network architecture as DQN, as shown in Fig. 1: the
main network selects the greedy action, while the target
network estimates its value [9]. The target in Double DQN is

\hat{y}_t^{DDQN} = r + \gamma \, Q\big(s', \arg\max_{a' \in A} Q(s', a'; \theta); \theta^-\big)    (3)
During learning, the parameters are updated on samples
drawn from the replay memory. The Q-learning update at
iteration i uses the following loss function:

L_i(\theta_i) = \mathbb{E}_{(s, a, r, s') \sim \mu(D)} \Big[ \big( \hat{y}^{DDQN} - Q(s, a; \theta_i) \big)^2 \Big]    (4)

in which D is the replay memory and experience tuples
(s, a, r, s') ~ μ(D) are sampled from it. Here we use prioritized
experience replay (PER) to pick each minibatch of experience,
making the most effective use of the replay memory for
learning. The central component of prioritized replay is the
criterion by which the importance of each transition is
measured, namely its TD-error [10]: the larger the TD-error,
the more the agent can learn from that transition.
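For concreteness, the target in (3) and the PER-weighted loss in (4) might be computed as in the following minimal PyTorch sketch; the network objects, batch layout, and priority update shown are illustrative assumptions, not the authors' implementation.

import torch

def ddqn_loss(main_net, target_net, batch, weights, gamma=0.99):
    """Double DQN target (Eq. 3) and PER-weighted loss (Eq. 4).

    Sketch assumptions: main_net/target_net are torch.nn.Module
    Q-networks mapping a state batch to per-action values; `batch`
    holds tensors (s, a, r, s_next, done); `weights` are the PER
    importance-sampling weights of the sampled transitions.
    """
    s, a, r, s_next, done = batch
    with torch.no_grad():
        # Select a' with the main network, evaluate it with the target
        # network: this decoupling is the core of Double DQN.
        a_star = main_net(s_next).argmax(dim=1, keepdim=True)
        q_next = target_net(s_next).gather(1, a_star).squeeze(1)
        y = r + gamma * (1.0 - done) * q_next
    q_sa = main_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td_error = y - q_sa
    # PER: importance weights correct the sampling bias; |td_error|
    # would be fed back as the new priorities of these transitions.
    loss = (weights * td_error.pow(2)).mean()
    return loss, td_error.abs().detach()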
III. PROBLEM FORMULATION
A. Oncoming Overtaking Scenario
In this paper we focus on decision-making in an oncoming
overtaking scenario. Fig. 2 shows a typical two-way 3-car
oncoming overtaking scenario built in SUMO (Simulation of
Urban Mobility). We assume Auto can accurately obtain its
own state and information about its surroundings through its
sensor system. At the start of each driving episode, the speed
of Auto is set to 10 m/s. The initial speed of Vehicle1 is drawn
from [5 m/s, 7 m/s] with an interval of 0.5 m/s, and its initial
distance from Auto from [30 m, 50 m] with an interval of 5 m.
The initial speed of Vehicle2 varies from 10 m/s to 15 m/s
with an interval of 0.5 m/s, and its initial distance to Auto
from 100 m to 300 m with an interval of 5 m. A constant-speed
model is applied to Vehicle1, and the Intelligent Driver Model
(IDM) [11], the default car-following model in SUMO, is
employed for the longitudinal dynamics of Vehicle2. In
addition, the acceleration of all three vehicles is limited to
[-3 m/s², 3 m/s²].
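The episode initialization described above can be summarized by a small sketch; the function and key names are illustrative, not taken from the paper:

import random

def sample_initial_conditions():
    """Draw one episode's initial conditions from the stated ranges."""
    return {
        "v_auto": 10.0,                                               # m/s, fixed
        "v_veh1": random.choice([5.0 + 0.5 * k for k in range(5)]),   # 5-7 m/s
        "d_veh1": float(random.choice(range(30, 55, 5))),             # 30-50 m ahead
        "v_veh2": random.choice([10.0 + 0.5 * k for k in range(11)]), # 10-15 m/s
        "d_veh2": float(random.choice(range(100, 305, 5))),           # 100-300 m away
    }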
B. State Space
Fig. 1. Framework of Double DQN.
Fig. 2. A typical two-way 3-car oncoming overtaking scenario in SUMO. Green stands for the autonomous vehicle Auto, red for the slower front vehicle Vehicle1 being overtaken, and yellow for the oncoming vehicle Vehicle2.

The state provides the agent with information about the ego
vehicle and its surroundings for decision-making. In this paper,
only longitudinal distances are considered. An abstract state
vector is defined by (5) to provide adequate environment
information:

s^k = \big( v_{auto}^k, x_{auto}^k, \Delta d_i^k, \Delta v_i^k \big)^T, \quad i = 1, 2, \ldots, 6    (5)
where v_{auto} is the current speed of Auto and x_{auto} is the
global position of Auto along the x-axis. Δd and Δv represent
the relationship between Auto and the surrounding vehicles:
the six elements of each stand for the relative distance d or
relative velocity v with respect to the front, left-front, right-front,
behind, left-behind, and right-behind vehicles, respectively.
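Assembled in code, the state of (5) is a 14-dimensional vector. The following sketch assumes a `neighbors` mapping and a large placeholder distance for empty slots, both of which are illustrative conventions:

import numpy as np

# Slot order as listed above.
SLOTS = ("front", "left_front", "right_front",
         "behind", "left_behind", "right_behind")

def build_state(v_auto, x_auto, neighbors):
    """Build the state vector of Eq. (5): 2 ego values plus
    (relative distance, relative velocity) for each of the 6 slots."""
    state = [v_auto, x_auto]
    for slot in SLOTS:
        dd, dv = neighbors.get(slot, (1000.0, 0.0))  # assumed default for empty slot
        state += [dd, dv]
    return np.asarray(state, dtype=np.float32)       # shape (14,)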
C. Action Space
The action space includes longitudinal speed control and a
lateral lane-change decision. Five accelerations are chosen to
control longitudinal speed: -3 m/s² for medium braking,
-1 m/s² for soft braking, 0 m/s² for maintaining speed, 1 m/s²
for soft acceleration, and 3 m/s² for rapid acceleration. A
lane-change signal LC is used for lateral control. The action
space is defined as

A = \big( a_i, LC \big), \quad i = 1, 2, 3, 4, 5
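One straightforward way to expose this action set to the Q-network is as six discrete indices; the encoding below is our illustration, not the paper's:

# Index -> (action type, value): five accelerations plus the
# lane-change signal LC, giving a 6-dimensional Q-network output.
ACTIONS = {
    0: ("accelerate", -3.0),   # medium brake (m/s^2)
    1: ("accelerate", -1.0),   # soft brake
    2: ("accelerate",  0.0),   # maintain speed
    3: ("accelerate",  1.0),   # soft acceleration
    4: ("accelerate",  3.0),   # rapid acceleration
    5: ("lane_change", None),  # lateral lane-change signal LC
}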
D. Reward Function
The reward function guides the agent toward its goals, so
its shape should reflect the learning objectives. In this paper,
we expect the agent to overtake the slower front vehicle as
quickly as possible without collision. The following aspects
are considered:
1) Collision avoidance: Collision is the most dangerous
situation and must be avoided. A TTC-based piecewise
function is used to formulate this reward:

R_{collision} = \begin{cases} -1.5, & 2.5 \le TTC \le 3 \\ -2, & 2 \le TTC \le 2.5 \\ -4, & TTC \le 2 \\ -40, & \text{collision} \end{cases}    (6)
2) Speed: The ultimate objective of overtaking is to
increase speed for quicker passage. The following speed
reward encourages the agent to speed up:

R_{velocity} = 0.2 \times (v_{auto} - 10)    (7)
3) Opposite lane occupancy: Long-term occupancy of the
opposite lane is obviously dangerous, so a small penalty is
applied to reduce occupancy time:

R_{opposite-lane} = -1    (8)
4) Overtake: When the agent reaches the goal state, i.e.,
successfully overtakes the slower front vehicle, a large reward
is given:

R_{overtake} = 200    (9)
Formally, the final reward is defined as the sum of the
sub-goal rewards:

R = R_{velocity} + R_{overtake} + R_{opposite-lane} + R_{collision}    (10)
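Taken together, (6)-(10) translate directly into a per-step reward. The sketch below assumes TTC values outside the listed intervals incur no penalty, which the paper leaves implicit:

def step_reward(v_auto, ttc, collided, on_opposite_lane, overtook):
    """Sum of the sub-goal rewards in Eqs. (6)-(10)."""
    r = 0.2 * (v_auto - 10.0)       # speed term, Eq. (7)
    if collided:                    # collision term, Eq. (6)
        r -= 40.0
    elif ttc <= 2.0:
        r -= 4.0
    elif ttc <= 2.5:
        r -= 2.0
    elif ttc <= 3.0:
        r -= 1.5
    if on_opposite_lane:            # occupancy penalty, Eq. (8)
        r -= 1.0
    if overtook:                    # goal reward, Eq. (9)
        r += 200.0
    return r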
IV. EVALUATION
The Double DQN algorithm described above is applied to
decision-making and outputs a reasonable overtaking policy
for the two-way scenario with oncoming traffic. The decision
time step is set to 0.1 s. The hyper-parameter values were not
systematically tuned due to the high computational cost; a
relatively good set was selected after several tests and is shown
in Table I. Fig. 3 shows the learning curve of average
cumulative reward. Rewards are averaged per epoch (red line)
to show the training process more clearly. As can be seen, the
agent converges after roughly 70 epochs of training. With more
training time, the Double DQN agent would likely perform
even better.
Following the oncoming overtaking scenario presented in
Section III, we built a two-way scenario with two other
vehicles in SUMO. The speed and position of Vehicle2 are
randomly initialized every episode, and an episode ends either
in a collision or when the slower Vehicle1 has been
successfully overtaken.
Fig. 4 shows how the agent acts under different overtaking
policies in distinct, randomly initialized scenarios. In Fig. 4(a),
where Vehicle2 is either not very fast or starts far from Auto,
there is enough time for Auto to directly overtake the slower
front Vehicle1.
Fig. 3. Learning curve for Double DQN.
Fig. 4. Two typical policies for overtaking. The markers on the left side represent the agent's actions: acceleration, speed maintenance, braking, and lane-change maneuvers.
TABLE I. LIST OF HYPER-PARAMETERS

Hyper-parameter             | Value
----------------------------|-------------------------------------
episodes                    | 150000
minibatch size              | 32
replay memory size          | 250000
target network soft update  | 0.01
discount factor             | 0.99
learning rate               | 5e-5
exploration                 | 1 → 0.1 (annealed over 5e5 steps)
replay start size           | 25000
prioritized replay alpha    | 0.6
importance weights          | 0.4 → 1 (increased over 5e5 steps)
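The two annealed quantities in Table I (exploration ε and the PER importance weight β) can be produced by a common linear schedule, as in this small sketch:

def linear_schedule(start, end, horizon, step):
    """Linearly interpolate from start to end over `horizon` steps."""
    frac = min(step / horizon, 1.0)
    return start + frac * (end - start)

# Table I values: epsilon annealed 1 -> 0.1 and importance weight
# beta increased 0.4 -> 1, both over 5e5 steps.
epsilon = linear_schedule(1.0, 0.1, 5e5, step=1e5)   # -> 0.82
beta    = linear_schedule(0.4, 1.0, 5e5, step=1e5)   # -> 0.52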
A different policy is used in Fig. 4(b), where Auto cannot
overtake the front vehicle immediately. In this situation, the
trained agent behaves like a human driver, waiting to overtake
until the oncoming Vehicle2 has passed.
To emphasize the time efficiency of the learning-based
method in oncoming traffic scenarios, we compare against
SUMO's default lane-change model 'LC2013', in which a
benefit is computed to estimate the advantage of changing
from one lane to another:

b_{l_n}(t) = \frac{ v_{pos}(t, l_n) - v_{pos}(t, l_c) }{ v_{max}(l_c) }    (11)
where b_{l_n}(t) is the benefit of changing to lane l_n; l_c and
l_n are the vehicle's current and neighbor lanes, respectively;
v_{pos}(t, l) is the velocity the vehicle could safely drive at on
lane l; and v_{max}(l) is the maximum velocity the vehicle can
reach on lane l.
More information about the ‘LC2013’ can be found in [12]
and [13].
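As a point of reference, the benefit of (11) is a one-line computation; this sketch follows our reading of the reconstructed formula:

def lane_change_benefit(v_pos_neighbor, v_pos_current, v_max_current):
    """LC2013 benefit of changing to the neighbor lane, Eq. (11):
    the safe-speed gain normalized by the current lane's maximum speed."""
    return (v_pos_neighbor - v_pos_current) / v_max_current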
TABLE II. COMPARISON OF DOUBLE DQN AND THE SUMO LANE-CHANGE METHOD

Indicator                                  | LC2013 (baseline) | Double DQN | Improvement
-------------------------------------------|-------------------|------------|------------
Average speed v_auto^average (m/s)         | 11.92             | 13.40      | 12.4%
Time in opposite lane t_op (s)             | 7.08              | 3.80       | 46.3%
Overtaking duration t_total (s)            | 10.43             | 7.02       | 32.7%
Opposite lane occupancy t_op / t_total (%) | 67.9%             | 54.1%      | 20.3%
We tested both methods for 1000 trials in SUMO. The
learning-based approach we presented achieves a collision-free
overtaking rate of 98.5%. Average overtaking duration t_total,
time in the opposite lane t_op, and average speed v_auto^average
are the indicators for comparison. The results in TABLE II
show that the RL-based decision-making method efficiently
increases the average speed of Auto and reduces the overtaking
time in contrast to the traditional method. In addition, the
opposite-lane occupancy indicates that the specially designed
sub-goal reward function works well, enhancing the safety of
the autonomous vehicle.
V. CONCLUSIONS
With Double DQN, continuous states can be used directly
to learn a more precise policy without discretization.
Evaluation results indicate that the trained model-free agent
performed better in nearly all of the random scenarios in
SUMO. In this paper we modeled the decision-making
problem as an MDP, where all information was assumed to be
completely observable. However, uncertainties from imperfect
sensor systems and other traffic participants are unavoidable in
reality. Future work on decision-making for autonomous
vehicles will explore POMDP-based approaches, as the
POMDP provides a powerful mathematical framework for
dealing with uncertainty.
REFERENCES
[1] J. E. Naranjo, C. Gonzalez, R. Garcia, and T. de Pedro, "Lane-Change
Fuzzy Control in Autonomous Vehicles for the Overtaking Maneuver,"
IEEE Transactions on Intelligent Transportation Systems, vol. 9, no. 3,
pp. 438-450, 2008.
[2] J. Perez, V. Milanes, E. Onieva, J. Godoy, and J. Alonso, "Longitudinal
fuzzy control for autonomous overtaking," in 2011 IEEE International
Conference on Mechatronics, 2011, pp. 188-193: IEEE.
[3] F. Wang, M. Yang, and R. Yang, "Conflict-probability-estimation-
based overtaking for intelligent vehicles," IEEE Transactions on
Intelligent Transportation Systems, vol. 10, no. 2, pp. 366-370, 2009.
[4] O. Karaduman, H. Eren, H. Kurum, and M. Celenk, "Interactive risky
behavior model for 3-car overtaking scenario using joint Bayesian
network," in 2013 IEEE Intelligent Vehicles Symposium (IV), 2013,
pp. 1279-1284: IEEE.
[5] X. Li, X. Xu, and L. Zuo, "Reinforcement learning based overtaking
decision-making for highway autonomous driving," in 2015 Sixth
International Conference on Intelligent Control and Information
Processing, 2015, pp. 336-342.
[6] D. C. K. Ngai and N. H. C. Yung, "A multiple-goal reinforcement
learning method for complex vehicle overtaking maneuvers," IEEE
Transactions on Intelligent Transportation Systems, vol. 12, no. 2,
pp. 509-522, Jun. 2011.
[7] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction.
MIT press, 2018.
[8] V. Mnih et al., "Human-level control through deep reinforcement
learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[9] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning
with double q-learning," in Thirtieth AAAI Conference on Artificial
Intelligence, 2016.
[10] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized
experience replay," arXiv preprint arXiv:1511.05952, 2015.
[11] M. Treiber, A. Hennecke, and D. Helbing, "Congested traffic states
in empirical observations and microscopic simulations," Physical
Review E, vol. 62, no. 2, p. 1805, 2000.
[12] J. Erdmann, "SUMO’s lane-changing model," in Modeling Mobility
with Open Data: Springer, 2015, pp. 105-123.
[13] D. Krajzewicz, "Traffic simulation with SUMO–simulation of urban
mobility," in Fundamentals of traffic simulation: Springer, 2010, pp.
269-293.