Deep Reinforcement Learning-Based Eco-Driving Control for Connected Electric Vehicles at Signalized Intersections Considering Traffic Uncertainties
Jie Li1, Abbas Fotouhi2, Wenjun Pan1, Yonggang Liu1, *, Yuanjian Zhang3 and Zheng Chen4, **
1State Key Laboratory of Mechanical Transmissions & College of Mechanical and Vehicle Engineering
Chongqing University 400044, Chongqing, China
2Advanced Vehicle Engineering Centre, School of Aerospace, Transport and Manufacturing, Cranfield University,
Cranfield, Bedfordshire, MK43 0AL, UK
3Department of Aeronautical and Automotive Engineering, Loughborough University, Leicestershire, LE11 3TU,
UK
4Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming, 650500,
China
* Corresponding Authors: Yonggang Liu (andyliuyg@cqu.edu.cn) and Zheng Chen (chen@kust.edu.cn)
Abstract: Eco-driving control offers great energy-saving potential at multiple signalized intersection scenarios.
However, traffic uncertainties can often lead to errors in ecological velocity planning and result in increased energy
consumption. This study proposes an eco-driving approach with a hierarchical framework to be leveraged at
signalized intersections that considers the impact of traffic uncertainty. The proposed approach leverages a queue-
based traffic model in the upper level to estimate the impact of traffic uncertainty and generate dynamic modified
traffic light information. In the lower level, a deep reinforcement learning-based controller is constructed to optimize
velocity subject to the constraints from the traffic lights and traffic uncertainty, thereby reducing energy
consumption while ensuring driving safety. The effectiveness of the proposed control strategy is demonstrated
through numerous simulation case studies. The simulation results show that the proposed method significantly
improves energy economy and prevents unnecessary idling in uncertain traffic scenarios, as compared to other
approaches that ignore traffic uncertainty. Furthermore, the proposed method is adaptable to different traffic
scenarios and showcases energy efficiency.
Key Words: Eco-driving, deep reinforcement learning, velocity optimization, signalized intersection, connected
electric vehicle.
I. INTRODUCTION
Due to the persistent increase in energy consumption in the transportation sector, energy-saving solutions have attracted significant attention from researchers all over the world [1]. Among those methods, eco-driving control has gradually been recognized as a promising way to promote energy-saving transportation [2]. With the rapid development of connected and automated vehicles (CAVs), vehicles are expected to have access to various driving environment information and to automatically plan their velocity profiles to reduce energy consumption. This capability, referred to as the eco-driving technique, has been widely demonstrated to remarkably reduce energy consumption in different scenarios [3] and is therefore the research focus of this study.
In this study, eco-driving control simply means optimization of the vehicle's velocity profile to avoid unnecessary acceleration or deceleration and keep the vehicle operating in an efficient state. According to different driving scenarios, eco-driving can be divided into single-vehicle and car-following scenarios. In the latter case, the basic idea of eco-driving control is to maintain a safe inter-vehicle distance with preceding vehicles and to reduce energy consumption during car-following. In [4], car-following scenarios are divided into four modes, and model predictive control (MPC) is leveraged to optimize the motor torque for electric CAVs in different driving modes. Xie et al. [5] proposed a predictive ecological adaptive cruise control approach for plug-in hybrid electric vehicles (PHEVs), in which the motion of the preceding vehicle is predicted and the driving speed is optimized accordingly based on MPC. In [6], reinforcement learning (RL) is exploited to improve the fuel economy of CAVs in car-following scenarios by considering the nonlinear efficiency of the powertrain system. Eco-driving control in car-following scenarios can achieve energy saving, safety and driving comfort simultaneously. However, these approaches are designed for specific driving scenarios and are more feasible for congested traffic conditions. For eco-driving control in single-vehicle scenarios, existing methods mainly plan the driving speed by considering complex constraints such as the speed limit, powertrain efficiency and traffic lights [7, 8]. For instance, in [9], a novel step-size discrete dynamic programming (DP) algorithm is exploited to optimize driving speed and consequently reduce energy consumption under different speed limits. Li et al. [10] constructed a data-driven powertrain energy consumption model to achieve co-optimization of velocity and energy management for PHEVs with high calculation efficiency. In particular, signal phase and timing (SPaT) information of the traffic lights, which can be acquired via vehicle-to-infrastructure (V2I) communication, is quite valuable for eco-driving control on urban roadways. Since velocity profiles can be interrupted by signalized intersections, such interruptions result in extra velocity fluctuations and more energy consumption.
Taking SPaT information into consideration, the eco-driving control technique has been widely investigated to promote energy economy and traffic throughput [11]. Early research on eco-driving control at signalized intersections typically calculates a constant reference speed according to the SPaT information when driving through the intersections [12]. However, the employment of a constant driving speed is not practical and cannot achieve optimal performance. In [13], the optimal velocity profile at signalized intersections is simplified into a constant speed, and a constant-speed energy consumption map together with a bi-level DP method is developed to solve the optimal eco-driving control among multiple intersections with high computational efficiency. With the rapid development of machine learning algorithms, RL has received increasing attention for solving the optimal control problem (OCP) of energy saving [14]. To mitigate the high computational burden of traditional optimization-based approaches, a deep reinforcement learning (DRL) approach is proposed to plan an ecological velocity for connected PHEVs in a model-free manner [15]. In [16], variable light spacing and trigonometric signal models are employed to train a DRL controller, thereby enhancing the adaptability to multi-light scenarios. However, the aforementioned approaches do not account for the influence of traffic uncertainties. Consequently, these methods cannot guarantee passing through traffic lights without unnecessary idling on real-world roads.
In real urban road conditions, traffic uncertainties arise from several aspects, which might affect the planned velocity profiles at intersections. The typical uncertainty stems from waiting queues and slow traffic flow in congested conditions, under which the feasible passing window of a traffic light varies dynamically [17]. In addition, signal control, such as bus signal priority control, can also generate uncertainty for eco-driving, as the signal timing might be dynamically adjusted [18]. Since signal control is beyond the scope of this research, the following discussion focuses on the traffic uncertainties from waiting queues and slow traffic flow. To address the influence of traffic uncertainty at signalized intersections, the existing research on eco-driving control can be categorized into two groups. The first group focuses on eco-driving control with constraints of traffic lights and preceding vehicles. It involves switching between three driving modes based on real-time driving conditions: traffic light priority mode, car-following priority mode and emergency braking mode. For instance, in [19], a predictive cruise control system is designed based on a bi-level MPC algorithm, which employs a car-following-oriented and a SPaT-oriented MPC controller to save energy on urban roads. Similarly, Bai et al. [20] proposed a hybrid DRL-based eco-driving control strategy that combines rule-based and RL-based control policies, wherein the RL-based controller addresses traffic light constraints, and the rule-based policy focuses on velocity planning for car-following scenarios and driving safety. In [21], a DRL-based controller is constructed to execute eco-driving control in urban scenarios based on a twin-delayed deep deterministic policy gradient agent that takes into account the states of the host vehicle, the preceding vehicle and the next traffic light. However, these approaches may result in unnecessary idling and deceleration before signalized intersections, as they do not predict the possible waiting queues caused by traffic jams before the intersections. Consequently, the controllers are typically forced to switch to the car-following mode or emergency braking to ensure driving safety when the host vehicle approaches preceding cars, which can deteriorate the energy-saving performance of approaches that neglect traffic uncertainty.
In the second group, the waiting queues before signalized intersections are incorporated into eco-driving control, and velocity is planned to avoid unnecessary idling. Sun et al. [22] formulated a data-driven chance-constrained robust optimization problem based on empirical sample data, in which an effective red-light duration is proposed to describe the feasible passing time of a signalized intersection. In [23], an improved queue discharge prediction method is implemented to predict the discharge time of the waiting queue before a signalized intersection, and a hierarchical control framework is constructed to solve for the ecological velocity. Dong et al. [24] designed a vehicle queue predictor based on the intelligent driver model (IDM), followed by a spatial-domain optimal control strategy to realize energy consumption minimization and speed tracking. In addition, traffic flow models based on shockwave theory have gradually been employed in the prediction of waiting queues [25]. In [26], a kinematic wave model is established to predict traffic dynamics and vehicle queue length on a signalized road. The optimal velocity is then solved by the direct multiple shooting algorithm with consideration of the constraints of modified traffic lights. In [27], a long short-term memory neural network is implemented to predict dynamic traffic flow, thereby improving waiting queue prediction. However, the above-mentioned methods are typically executed by assuming that a waiting queue has already formed or will emerge before signalized intersections at the red phase. In real-world traffic conditions, a waiting queue typically forms gradually while the host vehicle is approaching the next traffic light, and it is difficult to anticipate the emergence of the waiting queue. Moreover, slow traffic flow might also affect the motion of the host vehicle even at the green phase.
Eco-driving control incorporating the queue effect has gained substantial attention recently. Nevertheless, as the above discussion shows, several problems remain unsolved in this area. Firstly, most existing studies are executed via an optimization algorithm; however, the balance between energy-saving performance and computational efficiency is difficult to achieve, especially in complicated urban scenarios. Secondly, in most studies, the influence of traffic uncertainty on eco-driving control at signalized intersections is insufficiently considered. Moreover, most existing studies consider traffic uncertainty only in specific scenarios, in which the queue forms during the red phase. In urban scenarios, however, the formation of queues is difficult to anticipate, and slow traffic flow can also affect the motion of the host vehicle during the green phase. Hence, only taking the waiting queue at the red phase into consideration cannot sufficiently avoid unnecessary deceleration or idling before signalized intersections.
Fig. 1. Illustration of the proposed eco-driving control system.
Motivated by the above analysis, this study proposes a DRL-based eco-driving control approach for multiple signalized intersections, in which a hierarchical framework is implemented as shown in Fig. 1. The upper-level controller estimates traffic uncertainty by using a queue-based traffic flow model and modifies the states of the traffic lights accordingly, while the lower-level controller uses DRL to plan an ecological velocity for the vehicle while considering driving safety and comfort constraints. The main contribution of this study is threefold: 1) an eco-driving approach is proposed for signalized intersection scenarios that accounts for the impact of traffic uncertainty; this is an important consideration as traffic flow in urban areas can be highly variable and unpredictable. 2) A dynamic modified traffic light model is created, which uses the queue-based traffic flow model to correct the states of the traffic lights; this model allows for more accurate traffic light predictions and can improve traffic flow. 3) A DRL-based eco-driving controller is designed to plan an ecological velocity at signalized intersections while considering traffic uncertainties and the constraints of the dynamic modified traffic light. Overall, this study presents a promising approach for eco-driving control in urban areas, which can contribute to a more sustainable and efficient transportation system.
The remainder of this paper is structured as follows: Section II presents the mathematical modeling of the vehicle and traffic uncertainty and formulates the OCP for eco-driving. Section III elaborates the eco-driving control considering traffic uncertainty. Section IV presents the simulation results to verify the proposed method. Finally, the main conclusions and future work are discussed in Section V.
II. MODELING AND PROBLEM STATEMENT
In this section, the vehicle kinematic model, the traffic flow model, and a modified traffic light model are
presented. The modified traffic light model is constructed to estimate the traffic uncertainty at signalized
intersections. In addition, this section formulates the OCP of eco-driving.
A. Vehicle Model
In eco-driving control, a longitudinal kinematics model is typically utilized to describe the motion of vehicles [28]. The longitudinal driving resistance force is presented as:

$$F_r = Mgf\cos(\theta) + Mg\sin(\theta) + \frac{C_D A \rho_{air} v^2}{2} \tag{1}$$

where $F_r$ is the driving resistance force, $M$ is the vehicle mass, $f$ means the rolling resistance coefficient, $g$ is the acceleration of gravity, $\theta$ expresses the road grade, $C_D$ denotes the air drag coefficient, $A$ indicates the frontal area of the vehicle, and $\rho_{air}$ is the air density. The driving resistance force consists of rolling resistance, air resistance, acceleration resistance and road grade resistance. In this research, lane changes are neglected, and the road grade is set to zero. By taking the state variables of the vehicle model to be velocity $v$ and position $d$, the longitudinal kinematics model is formulated as:

$$\dot{v} = a \tag{2}$$

$$\dot{d} = v \tag{3}$$

$$a = \frac{T_m i_t \eta_t / r_{tire} - F_r}{\delta M} \tag{4}$$
where $\dot{v}$ and $\dot{d}$ are the variations of velocity and position at each time step, respectively, $a$ is the longitudinal acceleration, $\delta$ is the correction coefficient of rotating mass, $\eta_t$ is the transmission system efficiency, $i_t$ indicates the transmission ratio, and $T_m$ stands for the torque of the motor. In this study, the basic parameters of an electric vehicle are applied based on our previous study [29] to calculate the energy consumption $E(v, a, \theta)$, as:
$$E(v, a, \theta) = P_b = \begin{cases} P_m / \eta_m, & P_m \ge 0 \\ P_m \eta_m, & P_m < 0 \end{cases} \tag{5}$$

$$P_m = \frac{T_m n_m}{9550} = \frac{T_m v i_t}{9550\, r_{tire}} \tag{6}$$

$$\eta_m = \psi(T_m, n_m) \tag{7}$$
where $P_b$ is the battery power, $P_m$ is the motor power, $\theta$ is the road grade, $\eta_m$ is the motor efficiency, $r_{tire}$ is the radius of the vehicle tire, $n_m$ is the rotation speed of the motor, and $\psi$ represents the motor efficiency map.
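To make the powertrain model concrete, the following Python sketch evaluates (1) and (4)-(7) for a given speed, acceleration and road grade. The vehicle parameters follow Table 2, while the air density, the rotating-mass coefficient $\delta$, the transmission efficiency $\eta_t$ and the constant-efficiency `motor_efficiency` placeholder are illustrative assumptions, since the paper only references the efficiency map $\psi$ from [29].

```python
import math

# Host vehicle parameters (Table 2); the constants in the last two rows are assumptions.
M, A, C_D = 1640.0, 2.27, 0.3146          # mass (kg), frontal area (m^2), drag coefficient
R_TIRE, F_ROLL, I_T = 0.316, 0.008, 7.94  # tire radius (m), rolling coeff., final gear ratio
RHO_AIR, G = 1.225, 9.81                  # air density (kg/m^3), gravity (m/s^2)
DELTA, ETA_T = 1.05, 0.95                 # rotating-mass coeff. and transmission efficiency (assumed)

def motor_efficiency(torque_nm, speed_rpm):
    """Placeholder for the motor efficiency map psi(T_m, n_m) in (7)."""
    return 0.90  # constant efficiency assumed for illustration

def battery_power_kw(v, a, theta=0.0):
    """Resistance force (1), motor torque via (4), and battery power via (5)-(6)."""
    f_r = (M * G * F_ROLL * math.cos(theta) + M * G * math.sin(theta)
           + 0.5 * C_D * A * RHO_AIR * v ** 2)                  # eq. (1)
    t_m = (DELTA * M * a + f_r) * R_TIRE / (I_T * ETA_T)        # eq. (4) solved for T_m
    n_m = v / R_TIRE * I_T * 60.0 / (2.0 * math.pi)             # motor speed (rpm)
    p_m = t_m * n_m / 9550.0                                    # eq. (6), kW
    eta_m = motor_efficiency(t_m, n_m)
    return p_m / eta_m if p_m >= 0 else p_m * eta_m             # eq. (5): traction vs. regeneration

print(battery_power_kw(v=15.0, a=0.5))  # e.g., accelerating at 15 m/s on a flat road
```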
B. Modeling and Prediction of Traffic Uncertainties
To consider traffic uncertainties at signalized intersections, a queue-based traffic flow model [30], referred to as the Lighthill-Whitham-Richards (LWR) model, is built to predict the queue's discharge time and traffic dynamics. As discussed before, not only does the waiting queue have an impact on the motion of the host vehicle, but a slow traffic flow can also confine the speed of the host vehicle. Hence, the influence of traffic uncertainties at signalized intersections is classified into two categories, i.e., waiting queue and slow traffic flow. To consider the influence of the waiting queue, the LWR model is leveraged to describe the traffic dynamics of the waiting queue, as:

$$\frac{\partial \rho(d, t)}{\partial t} + \frac{\partial q(d, t)}{\partial d} = 0 \tag{8}$$

$$v_w = \frac{\Delta q}{\Delta \rho} \tag{9}$$
where $\rho(d, t)$ and $q(d, t)$ represent the traffic density and flow at position $d$ and time $t$, respectively. The parameter $v_w$ is the effective velocity of a shockwave, whereas $\Delta\rho$ and $\Delta q$ denote the changes of traffic density and flow between different regions. The flow-density diagram of the LWR model in a specific region is shown in Fig. 2. With the increase of traffic density, the traffic flow grows until it reaches the maximum traffic flow. By contrast, after reaching the peak point, the traffic flow gradually decreases to zero as the traffic density keeps increasing. At the maximum traffic density point, we assume a waiting queue emerges in this region. Based on (9), when the traffic light turns to the green phase, the dissipating speed of the waiting queue $v_{dis}$ can be expressed as the effective velocity of the shockwave, as:

$$v_{dis} = \frac{q_d - q_u}{\rho_d - \rho_u} \tag{10}$$
where $q_d$ and $q_u$ stand for the traffic flow on the downstream and upstream of the signalized intersection, and $\rho_d$ and $\rho_u$ represent the traffic density on the downstream and upstream of the intersection, respectively. In most situations, the traffic flow on the downstream of a red-phase intersection can be estimated as the maximum traffic flow, and the upstream of the red-phase intersection reaches the maximum traffic density when a waiting queue is formed. Therefore, the queue's dissipation rate after the green light starts is reformulated as:

$$v_{dis} = \frac{q_{max}}{\rho_{max} - \rho_s} \tag{11}$$
where $\rho_s$ is the traffic density corresponding to the maximum traffic flow, $q_{max}$ is the maximum traffic flow, and $\rho_{max}$ is the maximum traffic density. Obviously, in a specific region, the queue's dissipation rate can be treated as a constant, which can be used to predict the dissipation time of the queue.
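As a small worked example, the dissipation rate (11) and the resulting queue discharge time can be computed as below; the flow and density values are illustrative assumptions for a single-lane road, not figures from the paper.

```python
Q_MAX = 1800.0 / 3600.0  # assumed maximum flow: 1800 veh/h converted to veh/s
RHO_S = 0.030            # assumed density at maximum flow (veh/m)
RHO_MAX = 0.140          # assumed jam (maximum) density (veh/m)

def queue_dissipation_speed():
    """Shockwave speed of queue discharge after the green light starts, eq. (11)."""
    return Q_MAX / (RHO_MAX - RHO_S)  # m/s

def queue_discharge_time(queue_length_m):
    """Time for the discharge wave to travel the full queue length."""
    return queue_length_m / queue_dissipation_speed()

print(queue_discharge_time(50.0))  # roughly 11 s for a 50 m queue under these assumptions
```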
Fig. 2. The flow-density diagram for the LWR model.
To consider the influence of a slow traffic flow, the arrival time at a signalized intersection is predicted for the preceding vehicles. For this purpose, the effective speed of the road is calculated based on traffic flow theory [31], as:

$$v_{traff}^{j}(t) = \frac{q^{j}(t)}{\rho^{j}(t)} \tag{12}$$
where $j$ indexes the $j$-th region before the $j$-th traffic light, $v_{traff}^{j}(t)$ indicates the current effective speed in the corresponding region, whereas $q^{j}(t)$ and $\rho^{j}(t)$ are the current traffic flow and density in that region, respectively. If accurate traffic flow data is available, the traffic conditions can be well predicted at intersections. However, actual traffic flow information remains challenging to acquire in the real world. By contrast, traffic count data is easy to monitor and can be mined to characterize the traffic flow to a large extent [32]. In this study, we assume that loop detectors are installed at each signalized intersection and in the middle of every two intersections. In addition, the loop detectors provide a traffic information vector for each passing vehicle, i.e., $[Index_{veh}, t_{index}, v_{index}]$, including the index, arrival time and velocity of the vehicle. With the support of traffic count data, the detectors at intersections can easily estimate the number of vehicles in their region, and the data from the detectors between two traffic lights can be utilized to estimate the traffic flow and density, as:
$$q^{j}(t) = \frac{num_{veh}^{j}}{t - t_1} \tag{13}$$

$$\rho^{j}(t) = \frac{num_{veh}^{j}(t)}{(t - t_1)\, v_{aver}^{j}(t)} \tag{14}$$

$$v_{aver}^{j}(t) = \frac{1}{num_{veh}^{j}(t)} \sum_{i=1}^{num_{veh}^{j}} v_i \tag{15}$$
where $i = 1, 2, 3, \ldots$ indexes the $i$-th vehicle in the region, $num_{veh}^{j}$ is the number of vehicles in the $j$-th region, $t_i$ is the arrival time at the loop detectors for the $i$-th vehicle, and $v_{aver}^{j}$ is the average speed of the passing vehicles. Finally, based on (12) to (15), the arrival time at the next traffic light of each vehicle can be predicted as:
$$t_{i}^{pre}(t) = \frac{d_{traff}^{j}}{v_{traff}^{j}(t)} \tag{16}$$
where $d_{traff}^{j}$ denotes the length of the $j$-th region. Thus, the arrival time at the next traffic light is determined for each preceding vehicle and continuously updated according to the current traffic flow and traffic density. In this manner, the prediction of traffic uncertainty covers both the anticipation of queue discharge and the motion of preceding vehicles.
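A minimal sketch of (12)-(16), assuming each loop detector reports `(index, arrive_time, velocity)` tuples as described above; the detector records and the region length in the usage lines are hypothetical values chosen for illustration.

```python
def region_flow_density(records, t_now):
    """Estimate flow (13), density (14) and average speed (15) of region j from
    loop-detector records [(index, arrive_time, velocity), ...]."""
    num_veh = len(records)
    t_first = records[0][1]                            # arrival time of the first vehicle
    q_j = num_veh / (t_now - t_first)                  # eq. (13), veh/s
    v_aver = sum(r[2] for r in records) / num_veh      # eq. (15), m/s
    rho_j = num_veh / ((t_now - t_first) * v_aver)     # eq. (14), veh/m
    return q_j, rho_j, v_aver

def predicted_arrival(t_now, d_region, q_j, rho_j):
    """Arrival time at the next traffic light via the effective speed (12) and (16)."""
    v_traff = q_j / rho_j                              # eq. (12)
    return t_now + d_region / v_traff

records = [(1, 0.0, 14.0), (2, 6.0, 13.0), (3, 11.0, 15.0)]  # hypothetical detector data
q, rho, _ = region_flow_density(records, t_now=20.0)
print(predicted_arrival(20.0, d_region=300.0, q_j=q, rho_j=rho))
```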
C. Optimal Eco-Driving Control Formulation
In this study, the aim of eco-driving control at signalized intersections is to optimize the host vehicle's velocity profile for energy saving by considering the constraints of driving safety and traffic lights. The state vector $x$ includes the velocity and position of the host vehicle, and the control variable $u$ is the longitudinal acceleration. Therefore, the objective function, the constraints and the system dynamics of the OCP are formulated as:
$$J = \int_{0}^{T} \left( \alpha_1 p_{light} + \alpha_2 p_{safety} + \alpha_3 E(x, u) \right) dt \tag{17}$$

$$\begin{cases} a_{min} \le a \le a_{max} \\ v_{min} \le v \le v_{max} \\ v(0) = v_0,\; d(0) = d_0 \\ T \le T_{max} \end{cases} \tag{18}$$

$$\begin{cases} \dot{v}(t) = a(t) \\ \dot{d}(t) = v(t) \end{cases} \tag{19}$$
where $E(x, u)$ is the instantaneous electric energy consumption, $\alpha_1$ to $\alpha_3$ are adjustable weight factors to balance the different indexes, and $p_{light}$ and $p_{safety}$ are the penalties from traffic lights and preceding vehicles; these penalties are triggered when the vehicle drives through a red light or when a collision happens. $a_{min}$ and $a_{max}$ are the limits of acceleration, $v_{min}$ and $v_{max}$ are the limits of velocity, $v_0$ and $d_0$ are the initial velocity and position of the host vehicle, respectively, and $T$ and $T_{max}$ are the travel time and the maximum travel time for the whole scenario. It can be found that the OCP contains nonlinear constraints from traffic lights and preceding vehicles, and consequently, it remains challenging to solve directly with traditional optimization algorithms. Previous studies attempted to solve this problem by proposing modified optimization frameworks or algorithms, such as an iterative DP algorithm and a hierarchical optimization framework [24, 27]. However, improvements in calculation efficiency are reported to come at the cost of optimality. In this study, a DRL-based method is leveraged to find optimal control decisions for the OCP, thereby promoting performance with a slight calculation burden.
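For concreteness, the objective (17) can be evaluated along a discretized trajectory as sketched below; the step format, the unit penalty values and the crude power model in the usage lines are illustrative assumptions rather than the paper's exact implementation.

```python
DT = 0.1  # simulation step (s), matching the sample time used in Section IV

def trajectory_cost(traj, energy_fn, alpha=(1.0, 1.0, 1.0)):
    """Discretized objective (17): weighted traffic-light penalty, safety penalty
    and energy consumption summed over the trip. traj holds dicts with keys
    v, a, ran_red, crashed; energy_fn(v, a) returns instantaneous power."""
    a1, a2, a3 = alpha
    cost = 0.0
    for step in traj:
        p_light = 1.0 if step["ran_red"] else 0.0   # triggered by a red-light violation
        p_safety = 1.0 if step["crashed"] else 0.0  # triggered by a collision
        cost += (a1 * p_light + a2 * p_safety + a3 * energy_fn(step["v"], step["a"])) * DT
    return cost

# usage with a crude traction-power model (kW), assumed only to make the example runnable
traj = [{"v": 12.0, "a": 0.3, "ran_red": False, "crashed": False}] * 10
print(trajectory_cost(traj, energy_fn=lambda v, a: max(0.0, 1640.0 * a * v) / 1000.0))
```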
III. ECO-DRIVING CONTROL CONSIDERING TRAFFIC UNCERTAINTY AT SIGNALIZED INTERSECTIONS
To solve the multi-objective OCP of eco-driving control at signalized intersections, RL is applied to plan velocity profiles for the host vehicle. The proposed scheme is described in Fig. 3, wherein a hierarchical framework is implemented: the aforementioned traffic model is exploited to predict the traffic uncertainty, and a modified traffic light model is developed to revise the SPaT information in the upper level. In addition, a deep deterministic policy gradient (DDPG) agent is designed to generate acceleration control decisions at multiple intersections under the influence of traffic uncertainty.
Fig. 3. Schematic of the eco-driving control system at signalized intersections considering traffic uncertainties.
A. Implementation of DRL for Eco-Driving Control
RL methods consider control systems as a Markov decision process, in which the next state $s_{t+1}$ only depends on the current state $s_t$ and the control action $a_t$. The action is determined via a control policy $\pi(s_t, a_t)$, and a reward $r$ is observed from the environment. In these methods, the aim of the agent is to learn a policy that maximizes the cumulative reward, also called the Q-value, which can be formulated as follows:

$$Q(s_t, a_t) = r(s_t, a_t) + \max \sum_{i=t+1}^{T} \gamma^{\,i-t}\, r(s_i, a_i) \tag{20}$$

where $T$ is the duration of the whole scenario, and $\gamma$ is the discount factor. Eq. (20), referred to as the Bellman function, can be recursively formulated as:

$$Q(s_t, a_t) = r(s_t, a_t) + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \tag{21}$$
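The recursion (21) is exactly the one-step temporal-difference target used to train the critic; a minimal sketch with $\gamma = 0.99$ from Table 3 (the `done` handling at terminal states is a standard assumption):

```python
GAMMA = 0.99  # discount factor (Table 3)

def td_target(reward, q_next_max, done):
    """Bellman target from eq. (21): r + gamma * max_a' Q(s', a').
    q_next_max is the (target) critic's estimate for the next state."""
    return reward + (0.0 if done else GAMMA * q_next_max)
```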
A classic actor-critic architecture is typically employed in RL methods. In that structure, there are two crucial components, i.e., the policy function and the Q-value function, which account for generating actions and estimating the maximum cumulative reward, respectively. With the advancement of machine learning techniques, various DRL algorithms, including distributed proximal policy optimization (DPPO), DDPG and twin-delayed deep deterministic policy gradient (TD3), have been explored to enhance eco-driving control [33, 34]. The selection of an appropriate DRL algorithm is primarily determined by the action space and convergence performance. The longitudinal control of eco-driving requires a continuous action space to support microscopic CAV control. Additionally, the implementation of a reference target velocity, which will be discussed later, can facilitate training convergence. Previous research showed that DDPG can capture the potential nonlinearity of the policy function and the Q-value function through deep neural networks and generate continuous actions [35]. Considering the requirements of implementation simplicity and continuous actions, the DDPG algorithm is utilized to solve the OCP of eco-driving control. To balance the conflict between energy saving and pursuing speed, a green light optimized speed advisory (GLOSA) algorithm [21] is leveraged to calculate the reference target speed for the DRL controller, which is formulated as:
$$v_{tar} = \max\left( f_{tar}\, v_{tl\_max},\; 1.1\, v_{tl\_min} \right) \tag{22}$$

$$v_{tl\_max} = \begin{cases} v_{max}, & \text{green phase} \\ \dfrac{d_{tl}}{t_{remain}}, & \text{red phase} \end{cases} \tag{23}$$

$$v_{tl\_min} = \begin{cases} \dfrac{d_{tl}}{t_{remain}}, & \text{green phase} \\ \dfrac{d_{tl}}{t_{remain} + t_{green}}, & \text{red phase} \end{cases} \tag{24}$$
where $v_{tl\_max}$ and $v_{tl\_min}$ represent the maximum and minimum velocities to pass the next traffic light at the green phase, respectively. Note that $v_{tl\_max}$ and $v_{tl\_min}$ should satisfy the speed limits of the road. The parameter $d_{tl}$ is the distance to the next traffic light, $t_{remain}$ is the remaining time of the current phase, $t_{green}$ and $t_{red}$ are the time durations of the green and red phases, and $f_{tar}$ is a factor to adjust the target velocity.
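A direct transcription of (22)-(24) in Python, with $f_{tar} = 0.6$ as later used in Section IV; the road speed limits are assumed values, and the bounds are clipped to them as the text requires.

```python
F_TAR = 0.6                          # target-velocity factor (Section IV)
V_MIN_ROAD, V_MAX_ROAD = 0.0, 20.0   # road speed limits (assumed)

def glosa_target_speed(d_tl, t_remain, t_green, phase):
    """GLOSA reference target speed, eqs. (22)-(24)."""
    if phase == "green":
        v_tl_max = V_MAX_ROAD                     # eq. (23), green branch
        v_tl_min = d_tl / t_remain                # eq. (24): arrive before green ends
    else:                                         # red phase
        v_tl_max = d_tl / t_remain                # eq. (23): arrive only after red ends
        v_tl_min = d_tl / (t_remain + t_green)    # eq. (24): catch the end of the next green
    v_tl_max = min(v_tl_max, V_MAX_ROAD)          # respect the road speed limits
    v_tl_min = min(max(v_tl_min, V_MIN_ROAD), V_MAX_ROAD)
    return max(F_TAR * v_tl_max, 1.1 * v_tl_min)  # eq. (22)

print(glosa_target_speed(d_tl=250.0, t_remain=20.0, t_green=30.0, phase="red"))
```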
1) State and action variables
The state vector is designed to describe the current state of the eco-driving system. In some previous studies, all possible variables are selected to represent the state of the host vehicle, traffic light and preceding vehicles [36]. However, this entails a quite large state space, leading to difficulty in learning convergence. Here, the state variables are simplified with the help of the target velocity, and they include the host vehicle velocity $v$, the host vehicle acceleration $a$, the velocity deviation $\Delta v$, the velocity deviation integral $\Delta v_{inte}$ and the velocity deviation derivative $\Delta \dot{v}$. Note that the velocity deviation represents the deviation between the host vehicle's velocity and the target velocity. The host vehicle's velocity $v$ and acceleration $a$ describe the motion of the host vehicle, whereas the other variables provide the necessary information about the speed-pursuing state to pass the next traffic light in a green phase. Furthermore, the acceleration is selected as the action variable to plan the velocity profiles for the host vehicle to achieve energy saving at signalized intersections.
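In code, the observation passed to the agent is therefore a five-element vector; a minimal sketch, assuming the caller carries the running integral and the previous deviation between steps:

```python
def build_state(v, a, v_tar, dv_inte, dv_prev, dt=0.1):
    """Assemble the five state variables: velocity, acceleration, velocity
    deviation, its integral and its finite-difference derivative."""
    dv = v - v_tar                  # deviation from the GLOSA target velocity
    dv_inte = dv_inte + dv * dt     # running integral of the deviation
    dv_dot = (dv - dv_prev) / dt    # derivative of the deviation
    return [v, a, dv, dv_inte, dv_dot], dv_inte, dv
```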
2) Reward function
The reward function should be designed to encourage the agent to pass traffic lights in a green phase and reduce energy consumption at the same time. Therefore, the reward function for signalized intersection scenarios mainly consists of three aspects. The first is an energy consumption reward. The second is that the travel time should be maintained within an acceptable range. The third aspect is related to driving safety, i.e., obeying traffic rules, including traffic lights and speed limits. It should be noted that an emergency braking policy and a car-following strategy [37] are applied to avoid collisions, so this item is not included in the reward function. Based on the above discussion, the reward function $r_e$ corresponding to energy consumption is designed as follows:

$$r_e = -f_e\, E(v, a, \theta) \tag{25}$$
where $f_e$ is a weight factor of the energy consumption reward. Furthermore, a penalty is added to the reward function by considering acceleration, to avoid frequent speed variations, minimize energy consumption and ensure driving comfort. A quadratic penalty function $r_a$ is considered for acceleration, presented below:

$$r_a = -f_a\, a^2 \tag{26}$$
where $f_a$ is a weight factor. This item generates a penalty to the critic module whenever acceleration or deceleration emerges, especially for short maneuvers with high amplitude. In addition, the agent is encouraged to track the target velocity for passing traffic lights at the green phase. A reward function $r_v$ for the velocity deviation penalty is formulated as follows:

$$r_v = -f_v\, (v - v_{tar})^2 \tag{27}$$
where $f_v$ is a weight factor. This penalty item drives the agent to eliminate excessive deviations from the target velocity. However, it does not ensure preferable tracking of the target velocity. For instance, the velocity cannot stably follow the target velocity when the velocity deviation is small, as other reward terms dominate the final reward. Moreover, a positive reward item is necessary to encourage the host vehicle to move forward to the destination. Hence, two additional items of the reward function are designed as:
$$r_{vpr} = f_{vpr}\, \frac{1}{\sqrt{2\pi}\, \sigma} \exp\!\left( -\frac{(dv - \mu)^2}{2\sigma^2} \right) \tag{28}$$

$$r_{tl} = \begin{cases} \beta_{tl}, & v_{tl\_min} \le v \le v_{tl\_max} \\ 0, & v < v_{tl\_min} \;\text{or}\; v > v_{tl\_max} \end{cases} \tag{29}$$
where $r_{vpr}$ is the positive reward for velocity deviation, which follows a normal distribution, $dv$ is the velocity deviation from the reference target speed, $\mu$ is the mean value of the normal distribution, and $\sigma$ is its standard deviation. $r_{tl}$ is the positive reward for velocity that encourages the host vehicle to move forward, as long as the velocity is maintained within the boundaries for passing the next traffic light at a green phase. $\beta_{tl}$ indicates the constant value of this positive reward, and $f_{vpr}$ is a weight factor. Finally, a safety reward is designed to penalize behaviors that violate traffic rules, i.e., when the host vehicle passes a traffic light at a red phase or violates the speed limit. By contrast, the safety reward is zero if no unsafe behavior occurs. Hence, a piecewise function for the safety reward is presented as follows:

$$r_{TL} = \begin{cases} -\beta_{safe}, & \text{traffic rules violated} \\ 0, & \text{otherwise} \end{cases} \tag{30}$$
where $\beta_{safe}$ is the penalty for the safety reward. In this manner, the total reward function can be formulated by adding all items, as presented below:

$$r = r_{TL} + r_{tl} + r_a + r_v + r_{vpr} + r_e \tag{31}$$
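Putting (25)-(31) together, the per-step reward can be sketched as follows. The weights $f_e$, $f_a$, $f_v$, $\beta_{tl}$ and $\beta_{safe}$ follow Table 3, whereas $f_{vpr}$, $\mu$ and $\sigma$ of the Gaussian shaping term are assumptions, since their values are not reported.

```python
import math

F_E, F_A, F_V = 25.0, 0.2, 0.025   # weights from Table 3
BETA_TL, BETA_SAFE = 0.8, 1000.0   # from Table 3
F_VPR, MU, SIGMA = 1.0, 0.0, 1.0   # assumed: not reported in the paper

def step_reward(energy, a, v, v_tar, v_tl_min, v_tl_max, rules_violated):
    """Total reward (31) assembled from eqs. (25)-(30)."""
    r_e = -F_E * energy                                       # eq. (25): energy penalty
    r_a = -F_A * a ** 2                                       # eq. (26): comfort penalty
    dv = v - v_tar
    r_v = -F_V * dv ** 2                                      # eq. (27): deviation penalty
    r_vpr = F_VPR * math.exp(-(dv - MU) ** 2 / (2 * SIGMA ** 2)) \
            / (math.sqrt(2 * math.pi) * SIGMA)                # eq. (28): Gaussian shaping
    r_tl = BETA_TL if v_tl_min <= v <= v_tl_max else 0.0      # eq. (29): passing reward
    r_safe = -BETA_SAFE if rules_violated else 0.0            # eq. (30): safety penalty
    return r_safe + r_tl + r_a + r_v + r_vpr + r_e            # eq. (31)
```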
B. Eco-Driving Approach Considering Traffic Uncertainty
As elaborated earlier, in real-world conditions, the SPaT information of the next traffic light may not exactly match the practical conditions, due to waiting queues and slow traffic flow. To address this limitation, a modified traffic light model is developed, in combination with the established traffic model, to revise the effect of traffic uncertainty on the SPaT information. The modified SPaT information is then transferred to the DRL controller to realize eco-driving control considering the influence of traffic uncertainty, as shown in Fig. 4.
Fig. 4. Schematic of eco-driving control using a modified traffic light model and DRL.
The crucial information of a traffic light consists of its state $st_{tl}$, position $p_{tl}$ and remaining time $t_{re}$. Based on this information, the target velocity can be obtained via (22) to (24). In this study, the lane change action is ignored, and the host vehicle and preceding vehicles are assumed to drive in the same lane. Thus, the existence of any preceding vehicle prevents the host vehicle from passing the signalized intersection, and the modified traffic light state is corrected to the red phase to account for it. The modified state of the next traffic light $st_{tl}^{c}$ is corrected according to the following rule:

$$st_{tl}^{c} = \begin{cases} st_{tl}, & num_{veh}^{j} = 0 \\ red, & num_{veh}^{j} > 0 \end{cases} \tag{32}$$
The modified state of the traffic light switches to red when there are preceding vehicles in the same region; otherwise it keeps the actual state of the traffic light. The motion state of the last preceding vehicle is leveraged to modify the position and remaining time of the next traffic light when $st_{tl}^{c}$ is switched to the red phase. The modification rules are divided into three cases to differentiate the influence of formed waiting queues, forming waiting queues and slow traffic flow.
1) Case I: the queuing process is completed, and the last preceding vehicle has stopped before the next traffic light. The switch conditions are:

$$t_{end}^{pre}(t) \in \left( t_r^b,\, t_r^e \right), \quad t_{end}^{pre}(t) \le t_{cur} \;\&\; v_{end} = 0 \tag{33}$$
where $t_{end}^{pre}(t)$ is the predicted arrival time at the next traffic light for the last preceding vehicle, $t_r^b$ and $t_r^e$ are the starting and ending times of the red phase, $t_{cur}$ means the current time, and $v_{end}$ is the velocity of the last preceding vehicle. The modified position $p_{tl}^{c}$ and the modified remaining time $t_{remain}^{c}$ of the traffic light are formulated as follows:
$$p_{tl}^{c} = p_{tl} - L_{queue} \tag{34}$$

$$L_{queue} = (L_{veh} + L_0)\, num_{queue} \tag{35}$$

$$t_{remain}^{c}(t) = t_{remain}(t) + dt_{remain}(t) + \Delta t \tag{36}$$

$$dt_{remain}(t) = \frac{L_{queue}}{v_{dis}} - t_{remain}(t) \tag{37}$$
where $L_{queue}$ is the queue length, $num_{queue}$ is the number of vehicles in the queue, $L_0$ is the standstill spacing, $L_{veh}$ is the length of each vehicle, $dt_{remain}$ is the correction value of the remaining time, and $\Delta t$ is a timing buffer to ensure safety. In this situation, a waiting queue has already formed, and a constant postponement of the actual remaining time is generated according to the predicted queue dissipation time. This case typically occurs when the host vehicle is approaching the next traffic light.
2) Case II: the waiting queue is forming, and the last preceding vehicle is approaching the next traffic light. The switch conditions are:

$$t_{end}^{pre}(t) \in \left( t_r^b,\, t_r^e \right), \quad t_{end}^{pre}(t) > t_{cur} \;\&\; v_{end} > 0 \tag{38}$$
Similar to Case I, the modified position $p_{tl}^{c}$ is predicted based on (34) and (35), whereas the correction value of the remaining time $dt_{remain}$ is formulated as:

$$dt_{remain}(t) = \frac{L_{queue}(t)}{v_{dis}} + t_{remain}(t) \tag{39}$$
In this situation, the waiting queue is still forming or has not formed yet. However, the traffic model predicts that there will be a waiting queue ahead, and the remaining time of the current phase is dynamically updated. Therefore, the modified remaining time is predicted by combining the motion of the preceding vehicles, the actual remaining time and the predicted queue length. In this case, there is normally a long distance between the host vehicle and the next traffic light.
3) Case III: the last preceding vehicle can pass the next traffic light at the green phase, and the host vehicle will not be affected by any waiting queue. The switch condition is:

$$t_{end}^{pre}(t) \in \left( t_g^b,\, t_g^e \right) \tag{40}$$
where $t_g^b$ and $t_g^e$ are the starting and ending times of the green phase. The last preceding vehicle can pass the next traffic light without idling, which indicates that the motion of the host vehicle will not be affected by any waiting queue. Hence, the modified position of the traffic light is consistent with its actual position, and the correction value of the remaining time $dt_{remain}$ is formulated as:

$$dt_{remain}(t) = t_{end}^{pre}(t) - t_{cur} - t_{remain}(t) \tag{41}$$
In this situation, the velocity profile of the host vehicle is affected by the preceding traffic flow instead of a waiting queue. Therefore, a time-varying postponement of the remaining time is designed to avoid unnecessary idling before a traffic light. On this basis, a modified traffic light model is developed, and the modified position $p_{tl}^{c}$ and the modified SPaT information of the traffic light $[st_{tl}^{c}, t_{remain}^{c}]$ are leveraged to calculate the reference target velocity using the GLOSA algorithm. As a result, the host vehicle can anticipate potential waiting queues or traffic flow before the next traffic light and can reasonably optimize its velocity profile to reduce energy consumption and eliminate unnecessary idling as much as possible. The case-switching logic is summarized in the sketch below.
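A minimal sketch of the case-switching logic in (32)-(41); the standstill spacing, vehicle length and timing buffer are illustrative values, and the Case II inequality directions follow the reconstruction above.

```python
L0, L_VEH, DELTA_T = 2.0, 4.5, 1.0  # standstill spacing, vehicle length, buffer (assumed)

def modify_traffic_light(st_tl, p_tl, t_remain, num_veh, num_queue,
                         t_pre_end, t_cur, v_end, red_window, v_dis):
    """Return the modified state, position and remaining time, eqs. (32)-(41)."""
    if num_veh == 0:                                   # eq. (32): keep the actual light
        return st_tl, p_tl, t_remain
    l_queue = (L_VEH + L0) * num_queue                 # eq. (35)
    arrives_in_red = red_window[0] < t_pre_end < red_window[1]
    if arrives_in_red and t_pre_end <= t_cur and v_end == 0:
        dt_remain = l_queue / v_dis - t_remain         # Case I, eq. (37): queue formed
        p_mod = p_tl - l_queue                         # eq. (34)
    elif arrives_in_red:
        dt_remain = l_queue / v_dis + t_remain         # Case II, eq. (39): queue forming
        p_mod = p_tl - l_queue
    else:
        dt_remain = t_pre_end - t_cur - t_remain       # Case III, eq. (41): slow flow only
        p_mod = p_tl                                   # actual position is kept
    return "red", p_mod, t_remain + dt_remain + DELTA_T  # eq. (36)
```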
IV. SIMULATION VALIDATIONS
To verify the performance of the proposed method, several simulation case studies are conducted on a MATLAB/Simulink platform. A virtual traffic environment model [33, 38], including traffic lights, preceding vehicles, road grade, etc., is employed to test the eco-driving control strategies. In this study, the motion of the preceding vehicles is controlled by the IDM according to real-time driving conditions, and all preceding vehicles are assumed to have the same desired driving speed of 20 m/s. Four benchmark approaches are constructed to validate the proposed method (called MG-DRL). The first benchmark method (i.e., IDM) is an IDM-based strategy, which is selected to simulate the behavior of human drivers in the virtual traffic environment. This model serves as the baseline benchmark in the simulations, as IDM-based methods are widely employed to evaluate driving approaches without eco-driving control [33]. The mathematical description of the IDM is formulated as:
$$a(t) = a_{max} \left[ 1 - \left( \frac{v(t)}{v_{desired}} \right)^{\lambda} - \left( \frac{s^{*}(v(t), \Delta v_{pre}(t))}{s(t)} \right)^{2} \right] \tag{42}$$

$$s^{*}(t) = s_0 + T_0\, v(t) + \frac{v(t)\, \Delta v(t)}{2\sqrt{a_{max}\, b}} \tag{43}$$
where $v_{desired}$ is the desired speed, $\lambda$ is the acceleration exponent, $s^{*}$ is the desired headway distance, $\Delta v_{pre}$ is the velocity deviation to the preceding vehicle, $T_0$ is the constant time headway, $b$ is the desired deceleration, and $s_0$ is the minimum safety distance. The basic parameters of the IDM model in this study are presented in Table 1.
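For reference, the IDM benchmark (42)-(43) with the Table 1 parameters can be sketched as below; the maximum acceleration $a_{max}$ is an assumed value, since Table 1 does not list it.

```python
import math

V_DES, LAMBDA, T0, B_DES, S0 = 20.0, 4.0, 3.0, 1.6, 3.0  # Table 1
A_MAX = 1.5                                              # assumed (not in Table 1), m/s^2

def idm_acceleration(v, gap, dv_pre):
    """IDM acceleration, eqs. (42)-(43): gap is the inter-vehicle distance s(t)
    and dv_pre the velocity difference to the preceding vehicle."""
    s_star = S0 + T0 * v + v * dv_pre / (2.0 * math.sqrt(A_MAX * B_DES))   # eq. (43)
    return A_MAX * (1.0 - (v / V_DES) ** LAMBDA - (s_star / gap) ** 2)     # eq. (42)

print(idm_acceleration(v=15.0, gap=40.0, dv_pre=2.0))
```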
The second benchmark method (i.e., G-IDM) is a GLOSA-based strategy, which plans the velocity profile according to the real SPaT information without considering traffic uncertainties [21]. $f_{tar}$ is set to 0.6, the same value as in the proposed method, and an IDM-based model is employed to follow the velocity planning result obtained from the GLOSA algorithm. Similarly, the third method (i.e., G-DRL) plans the reference velocity based on the GLOSA algorithm and uses the DRL-based controller to optimize the driving speed by considering energy efficiency. The fourth benchmark (i.e., MG-MPC) plans the reference velocity considering the prediction of traffic uncertainties proposed in this study, and an MPC-based method is implemented to plan the ecological velocity across the upcoming signalized intersection [19]. The basic parameters of the host vehicle are presented in Table 2.
Table 1. The basic parameters of the IDM model

Characteristic | Value
Desired speed (m/s) | 20
Acceleration exponent | 4
Constant time headway (s) | 3
Desired deceleration (m/s²) | 1.6
Minimum safety distance (m) | 3
Table 2. The basic parameters of the host vehicle

Characteristic | Value
Mass (kg) | 1640
Frontal area (m²) | 2.27
Air drag coefficient | 0.3146
Tire rolling radius (m) | 0.316
Rolling resistance coefficient | 0.008
Battery capacity (Ah) | 32.5
Final gear ratio | 7.94
A. DRL Parameter Setting and Training
The main parameters of the DRL algorithm are presented in Table 3. Two deep networks are designed to construct the actor network and the critic network, which consist of five and six hidden layers, respectively. These two networks are trained based on an actor-critic structure, and the reward is calculated according to (25) to (31) in each episode. A route with six traffic lights is selected to train the DDPG agent; the maximum number of training episodes is set to 500, and the sample time of the simulation is set to 0.1 s. To train the agent under different driving scenarios, several system parameters are randomized at each episode, including the initial states of the host vehicle and preceding vehicles as well as the SPaT information. The training result is shown in Fig. 5: the average accumulative reward increases sharply at the beginning of training and gradually stabilizes after around 200 training episodes. Due to the randomness of the initial conditions, the accumulative reward of each episode differs slightly after convergence. A trained DRL-based controller is thus obtained for online execution, in which the controller directly generates acceleration decisions according to the state of the system.
Table 3. Parameter settings of the DRL algorithm

Parameter | Value
Number of neurons in critic network | 60
Number of neurons in actor network | 60
Learning rate of critic network | 1.00e-3
Learning rate of actor network | 1.00e-4
Number of layers in critic network | 6
Number of layers in actor network | 5
Number of neurons in hidden layers | 120
Target smooth factor | 1.00e-3
Minibatch size | 268
Discount factor | 0.99
Weight of energy consumption $f_e$ | 25
Weight of acceleration penalty $f_a$ | 0.2
Weight of velocity deviation penalty $f_v$ | 0.025
Positive reward for velocity $\beta_{tl}$ | 0.8
Penalty for safety reward $\beta_{safe}$ | 1000
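A minimal PyTorch sketch of the actor-critic pair implied by Table 3 (five and six hidden layers, the listed learning rates, and a five-dimensional state with a single acceleration action per Section III-A). How the 60- and 120-neuron entries map onto the layers is not fully specified, so using 120-neuron hidden layers throughout is an assumption.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 5, 1, 120  # five states, one acceleration action

def make_mlp(in_dim, out_dim, n_hidden, out_act=None):
    """Fully connected network with ReLU hidden layers."""
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, HIDDEN), nn.ReLU()]
        d = HIDDEN
    layers.append(nn.Linear(d, out_dim))
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

actor = make_mlp(STATE_DIM, ACTION_DIM, 5, nn.Tanh())      # bounded acceleration output
critic = make_mlp(STATE_DIM + ACTION_DIM, 1, 6)            # Q(s, a) estimate
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)  # Table 3 learning rates
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
```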
Fig. 5. Convergence of training.
B. Analysis of Energy Efficiency
To verify the energy-saving performance of the proposed eco-driving approach, a driving scenario with six signalized intersections and six preceding vehicles is constructed to compare the different eco-driving approaches. Besides, the positions of the traffic lights are randomized to distinguish them from the training scenarios. The simulated distance profiles are shown in Fig. 6 (a). Obviously, the velocity profile of the MG-DRL method can be optimized in advance without unnecessary idling or deceleration, and the host vehicle passes each signalized intersection exactly within an available green phase. By contrast, the IDM-based method can only follow the desired speed and has to stop three times before traffic lights in this scenario. As shown in Fig. 6 (b), compared with the IDM-based method, the G-DRL and G-IDM methods plan similarly low speeds to pass the fifth intersection, while the host vehicles of these two approaches are affected by the waiting queue before the traffic light. By contrast, the proposed MG-DRL method and the MG-MPC method successfully predict the emergence of the waiting queue, and a lower driving speed is planned to avoid idling. Fig. 6 (c) illustrates the modified SPaT information accounting for the slow traffic flow. In summary, the proposed modified traffic light model can accurately modify the SPaT information: a modified red phase is observed and dynamically updated when there are preceding vehicles ahead, thus shaping the velocity planning on a complicated urban road.
Fig. 6. Position profiles of different methods when there are six preceding vehicles. (a) the position profiles of the whole
scenario, (b) the position profiles of the fifth intersection, (c) the position profiles of the third intersection.
The generated velocity and acceleration profiles are shown in Fig. 7. As depicted in Fig. 7 (a), the host vehicle under the proposed MG-DRL method maintains a reasonable driving speed throughout the whole trip. A relatively low speed is planned when a potential waiting queue is predicted ahead, so that the next traffic light can be passed smoothly (such as around t = 50 s and t = 300 s in the figure). Compared with the MG-MPC method, the proposed method leverages the DRL-based controller to plan a smoother velocity profile, resulting in improved energy efficiency. Additionally, the velocity profiles under the G-DRL and G-IDM methods exhibit similar trends in most cases. However, the G-DRL method shows velocity fluctuations when the host vehicle approaches the preceding vehicles. The reason is that the G-DRL method does not integrate the influence of preceding vehicles into the DDPG agent; therefore, the simple switching logic in this method leads to frequent switches between the DRL-oriented and car-following modes. It can be observed in Fig. 7 (b) that the accelerations of all methods remain within a reasonable range, ensuring the planning of ecological velocities between intersections. Fig. 8 presents the results of the acceleration distribution and idling time proportions. As shown in Fig. 8 (a), compared with the benchmark methods, the host vehicle under the MG-DRL and MG-MPC methods notably reduces unnecessary acceleration time thanks to the prediction of traffic uncertainties. Furthermore, the proposed method effectively reduces high-level accelerations compared to MG-MPC, thereby enhancing energy efficiency and driving comfort. Moreover, the idling proportion results, shown in Fig. 8 (b), reveal that unnecessary idling is eliminated under the MG-DRL and MG-MPC methods, which significantly contributes to reducing acceleration duration.
Fig. 7. Results of velocity and acceleration profiles. (a) velocity profiles, (b) acceleration profiles.
Fig. 8. Results of acceleration distribution and idling proportion. (a) acceleration distribution, (b) idling proportion.
The simulation results of the different approaches are summarized in Table 4. The proposed method achieves a notable energy-saving improvement of 24.29% compared with the IDM method, which represents human drivers in this study. Furthermore, the proposed method outperforms the other existing eco-driving approaches in energy consumption. In addition, all schemes show only slight differences in overall travel time, indicating that the proposed method improves energy efficiency without remarkably sacrificing traffic efficiency. It should be noted that, compared with the G-IDM method, the G-DRL method consumes more energy due to its frequent mode switching. To sum up, the MG-DRL method is demonstrated to effectively handle the traffic uncertainties across multiple signalized intersections, and its energy-saving performance is significantly improved compared with the benchmark approaches.
Table 4. Simulation results under different approaches

Method | Battery energy consumption (kWh) | Improvement (%) | Travel time (s) | Idling time proportion (%)
IDM | 0.70 | - | 369.6 | 21.73
G-IDM | 0.60 | 14.29 | 370.1 | 16.86
G-DRL | 0.63 | 10.00 | 368.9 | 17.40
MG-MPC | 0.55 | 21.43 | 372.0 | 0
MG-DRL | 0.53 | 24.29 | 370.5 | 0
C. Analysis of Adaptability
To verify the adaptability of the proposed method to different traffic scenarios, a Monte Carlo simulation is conducted to fully explore its performance. The simulation involves 500 trials for each method on the same road, wherein the traffic parameters, including the initial SPaT information of the traffic lights, the number of preceding vehicles and the initial states of the preceding vehicles, are all randomized in each trial. To fully explore the adaptability of the proposed method, the energy consumption, travel time, idling time proportion and low-speed time proportion are summarized for comparison. A speed boundary of 1.39 m/s is selected to judge whether the vehicle is in a low-speed state [27]. Table 5 summarizes the average energy consumption and travel time. Compared with the IDM method, an average improvement of 17.65% is observed, indicating that the proposed method can improve energy efficiency under different traffic conditions. In addition, an average energy consumption reduction of 4.41% is achieved compared with the other three eco-driving approaches. Furthermore, the travel time of the proposed method does not increase significantly, and the idling time and low-speed time proportions are remarkably decreased. A slight average idling time proportion of 1.7% is observed due to prediction errors of the traffic uncertainties, which is still notably reduced compared with the other methods. Note that extreme congestion scenarios, where idling is inevitable for the host vehicle, are not discussed in this study, because eco-driving approaches based on SPaT information have limited contribution to energy saving in such extreme scenarios [26].
Table 5. Average simulation results under different traffic scenarios

Method | Average battery energy consumption (kWh) | Average energy consumption improvement (%) | Average travel time (s) | Time variation (%) | Average idling time proportion (%) | Average low-speed time proportion (%)
IDM | 0.68 | - | 381 | - | 13.35 | 13.02
G-IDM | 0.61 | 11.76 | 370 | -2.89 | 8.72 | 7.84
G-DRL | 0.60 | 13.24 | 369 | -3.15 | 5.25 | 5.69
MG-MPC | 0.58 | 14.71 | 383.2 | 0.58 | 2.23 | 2.89
MG-DRL | 0.56 | 17.65 | 382 | 0.26 | 1.70 | 2.63
Fig. 9. Average energy consumption improvement with different traffic flows.
To analyze the performance under various traffic demands, the simulation results are segmented by traffic flow. The outcomes are presented in Fig. 9, which illustrates the average energy consumption improvement across different traffic flows. The highest average energy efficiency improvement is observed at around 700 veh/h, as traffic lights impose a greater impact on driving speed in these circumstances. Moreover, it is worth noting that the energy consumption improvement does not grow monotonically with increasing traffic flow. Specifically, two tests with different numbers of preceding vehicles are selected to further analyze the influence of different traffic conditions, defined as Scenario A and Scenario B. The numbers of preceding vehicles in these two scenarios are set to 3 and 9, respectively. These two scenarios represent lighter and heavier traffic congestion levels, and the other traffic parameters are set the same as in the simulation in Section IV-B. Fig. 10 illustrates the position profiles of these two scenarios. The results demonstrate that the proposed MG-DRL method retains superior adaptability in different traffic scenarios, and the host vehicle under the MG-DRL method can accurately pass each intersection without idling. As shown in Fig. 10 (a), all four eco-driving approaches have similar trends in their position profiles in the light traffic scenario. The reason is that the waiting queue does not have a significant impact on the motion of the host vehicle in this scenario. Besides, the slow traffic flow matters at the first and second intersections, where the host vehicle plans an ecological speed to eliminate deceleration and extra energy consumption when approaching the preceding vehicles. By contrast, as shown in Fig. 10 (b), the MG-DRL method plans a preferable velocity profile to avoid unnecessary idling when a potential traffic intervention is predicted, especially at the first two intersections. The velocity profiles in the different scenarios are shown in Fig. 11 (a) and (b), and it can be observed that the MG-DRL method can effectively plan the driving speed within a reasonable range. Additionally, the proposed method significantly restrains speed fluctuations, unnecessary decelerations and idling compared with the other approaches. The results verify that the host vehicle can satisfactorily tackle the traffic uncertainties in different traffic scenarios.
(a) (b)
Fig. 10. Position profiles in different traffic scenarios. (a) Scenario A, (b) Scenario B.
Fig. 11. Velocity profiles in different scenarios. (a) Scenario A, (b) Scenario B.
The results of these two scenarios are summarized in Table 6 for performance comparison. It can be observed that the MG-DRL method effectively promotes energy efficiency in different scenarios, with a maximum energy economy improvement of 30.64% over the IDM method in Scenario B. The reason is that waiting queues and slow traffic flow tend to emerge at signalized intersections in this scenario, and the traffic uncertainty has a greater impact on velocity planning under high-level congestion. By contrast, all four eco-driving approaches contribute similarly to energy consumption in Scenario A, which represents a light traffic condition. Compared with the other eco-driving schemes, the proposed MG-DRL method provides a modest average energy efficiency improvement of 3.75% in free-flowing traffic conditions. To conclude, the proposed method shows strong adaptability to different traffic scenarios, and energy efficiency is effectively promoted in all scenarios, especially high-level congestion scenarios.
Table 6. Simulation results under different traffic scenarios

Scenario A
| Method | Energy consumption (kWh) | Improvement (%) | Travel time (s) | Idling time proportion (%) |
|--------|--------------------------|-----------------|-----------------|----------------------------|
| IDM    | 0.67 | -     | 352.9 | 24.6 |
| G-IDM  | 0.54 | 19.4  | 353.6 | 0.93 |
| G-DRL  | 0.52 | 22.39 | 352.7 | 1.70 |
| MG-MPC | 0.53 | 20.9  | 359   | 0    |
| MG-DRL | 0.51 | 23.88 | 355.8 | 0    |

Scenario B
| Method | Energy consumption (kWh) | Improvement (%) | Travel time (s) | Idling time proportion (%) |
|--------|--------------------------|-----------------|-----------------|----------------------------|
| IDM    | 0.62 | -     | 464.6 | 19.84 |
| G-IDM  | 0.56 | 9.68  | 464.5 | 18.20 |
| G-DRL  | 0.54 | 12.9  | 464.7 | 16.40 |
| MG-MPC | 0.46 | 25.81 | 469   | 0     |
| MG-DRL | 0.43 | 30.64 | 468.1 | 0     |
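As a side note on how entries of this kind are obtained, the sketch below shows how the Table 6 metrics can be computed from a simulated velocity trace; the idling-speed threshold, the function name, and the interface are assumptions made for illustration.

```python
# Illustrative sketch of computing the Table 6 metrics from a simulated
# velocity trace; the idling threshold and interface are assumed.
import numpy as np

def table6_metrics(t_s, v_mps, energy_kwh, baseline_kwh, v_idle_mps=0.1):
    """Return improvement (%), travel time (s) and idling proportion (%)."""
    t_s, v_mps = np.asarray(t_s, float), np.asarray(v_mps, float)
    dt = np.diff(t_s, prepend=t_s[0])        # per-sample durations (s)
    travel_time = t_s[-1] - t_s[0]
    idling_pct = dt[v_mps < v_idle_mps].sum() / travel_time * 100.0
    improvement_pct = (baseline_kwh - energy_kwh) / baseline_kwh * 100.0
    return {"improvement_pct": improvement_pct,
            "travel_time_s": travel_time,
            "idling_pct": idling_pct}

# Cross-check with Scenario B: (0.62 - 0.43) / 0.62 * 100 ≈ 30.6 %,
# consistent with the MG-DRL entry in Table 6.
```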
V. CONCLUSIONS
This article proposes a DRL-based eco-driving approach to tackle traffic uncertainty at multiple signalized intersections with limited access to traffic data. The proposed method ecologically plans velocity profiles at signalized intersections, accounting for the potential impacts of slow traffic flow and waiting queues, so as to avoid unnecessary idling and decelerations. A hierarchical framework is developed, in which the upper level predicts the influence of traffic uncertainty based on a dynamic traffic model, and a modified traffic light model is constructed to revise the actual SPaT information. In the lower level, a DRL-based controller is designed to plan the ecological driving speed according to the system's states, including the motion of the host vehicle and the revised SPaT information. Substantial simulations are conducted to validate the effectiveness of the proposed method, and the results demonstrate that it significantly improves energy efficiency compared with both the human driver model and other existing eco-driving approaches, with average improvements of 17.65% and 5.77%, respectively. In addition, the proposed method demonstrates preferable energy economy in different traffic scenarios, and the energy efficiency improvement tends to grow as the traffic congestion level increases.
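To make the interplay of the two levels concrete, the following schematic sketch outlines one decision step of such a hierarchy; the state layout, the queue-discharge parameters, and the policy interface are placeholders rather than the paper's actual implementation.

```python
# Schematic sketch of one decision step of such a two-level hierarchy.
# State layout, queue parameters and policy interface are placeholders.
from dataclasses import dataclass

@dataclass
class HostState:
    position_m: float
    velocity_mps: float

def eco_driving_step(host, red_end_s, queue_length, drl_policy, dt_s=1.0,
                     start_up_s=2.0, headway_s=2.0):
    # Upper level: revise the actual SPaT with the estimated queue
    # discharge time (see the earlier discharge-time sketch).
    discharge_s = start_up_s + headway_s * queue_length if queue_length else 0.0
    modified_red_end_s = red_end_s + discharge_s

    # Lower level: a trained DRL policy maps the system state (host motion,
    # revised SPaT, queue info, ...) to an acceleration command in m/s^2.
    state = (host.position_m, host.velocity_mps, modified_red_end_s, queue_length)
    accel = drl_policy(state)

    # Integrate the host motion over one control interval.
    host.velocity_mps = max(0.0, host.velocity_mps + accel * dt_s)
    host.position_m += host.velocity_mps * dt_s
    return host
```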
Our future work will focus on investigating eco-driving approaches in more complex environments by considering cooperation among multiple connected vehicles. The influence of different DRL algorithms on eco-driving control will also be investigated in our future studies.
ACKNOWLEDGMENTS
This work is funded in part by the National Natural Science Foundation of China (Nos. 52172400 and 52272395) and in part by the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJQN201901539). Any opinions expressed in this paper are solely those of the authors and do not represent those of the sponsors.