Deep Reinforcement Learning-Based Eco-Driving Control for Connected Electric Vehicles at Signalized Intersections Considering Traffic Uncertainties
Jie Li1, Abbas Fotouhi2, Wenjun Pan1, Yonggang Liu1, *, Yuanjian Zhang3 and Zheng Chen4, **
1State Key Laboratory of Mechanical Transmissions & College of Mechanical and Vehicle Engineering
Chongqing University 400044, Chongqing, China
2Advanced Vehicle Engineering Centre, School of Aerospace, Transport and Manufacturing, Cranfield University,
Cranfield, Bedfordshire, MK43 0AL, UK
3Department of Aeronautical and Automotive Engineering, Loughborough University, Leicestershire, LE11 3TU,
UK
4Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming, 650500,
China
* Corresponding Authors: Yonggang Liu (andyliuyg@cqu.edu.cn) and Zheng Chen (chen@kust.edu.cn)
Abstract: Eco-driving control offers great energy-saving potential at multiple signalized intersection scenarios.
However, traffic uncertainties can often lead to errors in ecological velocity planning and result in increased energy
consumption. This study proposes an eco-driving approach with a hierarchical framework to be leveraged at
signalized intersections that considers the impact of traffic uncertainty. The proposed approach leverages a queue-
based traffic model in the upper level to estimate the impact of traffic uncertainty and generate dynamic modified
traffic light information. In the lower level, a deep reinforcement learning-based controller is constructed to optimize
velocity subject to the constraints from the traffic lights and traffic uncertainty, thereby reducing energy
consumption while ensuring driving safety. The effectiveness of the proposed control strategy is demonstrated
through numerous simulation case studies. The simulation results show that the proposed method significantly
improves energy economy and prevents unnecessary idling in uncertain traffic scenarios, as compared to other
approaches that ignore traffic uncertainty. Furthermore, the proposed method is adaptable to different traffic
scenarios and showcases energy efficiency.
Key Words: Eco-driving, deep reinforcement learning, velocity optimization, signalized intersection, connected
electric vehicle.
I. INTRODUCTION
Due to the persistent increase in energy consumption in the transportation sector, energy-saving solutions have attracted significant attention from researchers all over the world [1]. Among those methods, eco-driving control has gradually been recognized as a promising way to promote energy-saving transportation [2]. With the rapid development of connected and automated vehicles (CAVs), vehicles are expected to have access to various driving environment information and to automatically plan their velocity profiles to reduce energy consumption. This capability, referred to as the eco-driving technique, has been widely demonstrated to remarkably reduce energy consumption in different scenarios [3] and is therefore the research focus of this study.
In this study, eco-driving control simply means optimization of the vehicle's velocity profile to avoid unnecessary acceleration or deceleration and keep the vehicle operating in an efficient state. According to different driving scenarios, eco-driving can be divided into single-vehicle and car-following scenarios. In the latter case, the basic idea of eco-driving control is to maintain a safe inter-vehicle distance with preceding vehicles and to reduce energy consumption during car-following. In [4], car-following scenarios are divided into four modes, and model predictive control (MPC) is leveraged to optimize the motor torque for electric CAVs in different driving modes. Xie et al. [5] proposed a predictive ecological adaptive cruise control approach for plug-in hybrid electric vehicles (PHEVs), in which the motion of the preceding vehicle is predicted and the driving speed is optimized accordingly based on MPC. In [6], reinforcement learning (RL) is exploited to improve the fuel economy of CAVs in car-following scenarios by considering the nonlinear efficiency of the powertrain system. Eco-driving control in car-following scenarios can achieve energy saving, safety and driving comfort simultaneously. However, these approaches are designed for specific driving scenarios and are more feasible for congested traffic conditions. For eco-driving control in single-vehicle scenarios, existing methods mainly plan the driving speed by considering complex constraints such as the speed limit, powertrain efficiency and traffic lights [7, 8]. For instance, in [9], a novel step-size discrete dynamic programming (DP) algorithm is exploited to optimize driving speed and consequently reduce energy consumption under different speed limits. Li et al. [10] constructed a data-driven powertrain energy consumption model to achieve co-optimization of velocity and energy management for PHEVs with high calculation efficiency. In particular, signal phase and timing (SPaT) information of the traffic lights, which can be acquired via vehicle-to-infrastructure (V2I) communication, is quite valuable for eco-driving control on urban roadways. Since velocity profiles can be interrupted by signalized intersections, such interruptions result in extra velocity fluctuations and more energy consumption.
Taking SPaT information into consideration, the eco-driving control technique has been widely investigated to promote energy economy and traffic throughput [11]. Early research on eco-driving control at signalized intersections typically calculates a constant reference speed according to the SPaT information when driving through the intersections [12]. However, the employment of a constant driving speed is not practical and cannot achieve optimal performance. In [13], the optimal velocity profile at signalized intersections is simplified into a constant speed, and a constant-speed energy consumption map together with a bi-level DP method is developed to solve the optimal eco-driving control among multiple intersections with high computational efficiency. With the rapid development of machine learning algorithms, RL has received increasing attention for solving the optimal control problem (OCP) of energy saving [14]. To mitigate the high computational burden of traditional optimization-based approaches, a deep reinforcement learning (DRL) approach is proposed to plan an ecological velocity for connected PHEVs in a model-free manner [15]. In [16], variable light spacing and trigonometric signal models are employed to train a DRL controller, thereby enhancing the adaptability to multi-light scenarios. However, the aforementioned approaches do not account for the influence of traffic uncertainties. Consequently, these methods cannot guarantee passing through traffic lights without unnecessary idling on real-world roads.
In real urban road conditions, traffic uncertainties arise from several aspects, which might affect the planned velocity profiles at intersections. The typical uncertainty stems from waiting queues and slow traffic flow in congested conditions, under which the feasible passing window of a traffic light varies dynamically [17]. In addition, signal control, such as bus signal priority control, can also generate uncertainty for eco-driving, as the signal timing might be dynamically adjusted [18]. Since signal control is beyond the scope of this research, the following discussion focuses on the traffic uncertainties from waiting queues and slow traffic flow. To address the influence of traffic uncertainty at signalized intersections, the existing research on eco-driving control can be categorized into two groups. The first group focuses on eco-driving control with constraints of traffic lights and preceding vehicles. It involves switching between three driving modes based on real-time driving conditions: traffic light priority mode, car-following priority mode and emergency braking mode. For instance, in [19], a predictive cruise control system is designed based on a bi-level MPC algorithm, which employs a car-following-oriented and a SPaT-oriented MPC controller to save energy on urban roads. Similarly, Bai et al. [20] proposed a hybrid DRL-based eco-driving control strategy that combines rule-based and RL-based control policies, wherein the RL-based controller addresses traffic light constraints, and the rule-based policy focuses on velocity planning for car-following scenarios and driving safety. In [21], a DRL-based controller is constructed to execute eco-driving control in urban scenarios based on a twin-delayed deep deterministic policy gradient agent that takes into account the states of the host vehicle, the preceding vehicle and the next traffic light. However, these approaches may result in unnecessary idling and deceleration before signalized intersections, as they do not predict the possible waiting queues caused by traffic jams before the intersections. Consequently, the controllers are typically forced to switch to the car-following mode or emergency braking to ensure driving safety when the host vehicle approaches preceding cars, which can deteriorate the energy-saving performance of approaches that neglect traffic uncertainty.
In the second group, the waiting queues before signalized intersections are incorporated into eco-driving control, and velocity is planned to avoid unnecessary idling. Sun et al. [22] formulated a data-driven chance-constrained robust optimization problem based on empirical sample data, in which an effective red-light duration is proposed to describe the feasible passing time of a signalized intersection. In [23], an improved queue discharge prediction method is implemented to predict the discharge time of the waiting queue before a signalized intersection, and a hierarchical control framework is constructed to solve for the ecological velocity. Dong et al. [24] designed a vehicle queue predictor based on the intelligent driver model (IDM), followed by a spatial-domain optimal control strategy to realize energy consumption minimization and speed tracking. In addition, traffic flow models based on shockwave theory have gradually been employed in the prediction of waiting queues [25]. In [26], a kinematic wave model is established to predict traffic dynamics and vehicle queue length on a signalized road. The optimal velocity is then solved by the direct multiple shooting algorithm with consideration of the constraints of modified traffic lights. In [27], a long short-term memory neural network is implemented to predict dynamic traffic flow, thereby improving waiting queue prediction. However, the above-mentioned methods are typically executed by assuming that a waiting queue has already formed or will emerge before signalized intersections at the red phase. In real-world traffic conditions, a waiting queue typically forms gradually while the host vehicle is approaching the next traffic light, and it is difficult to anticipate the emergence of the waiting queue. Moreover, slow traffic flow might also affect the motion of the host vehicle even at the green phase.
Eco-driving control incorporating the queue effect has gained substantial attention recently. Nevertheless, as the above discussion shows, several problems remain unsolved in this area. Firstly, most existing studies are executed via an optimization algorithm; however, the balance between energy-saving performance and computational efficiency is difficult to achieve, especially in complicated urban scenarios. Secondly, in most studies, the influence of traffic uncertainty on eco-driving control at signalized intersections is insufficiently considered. Moreover, most existing studies consider traffic uncertainty only in specific scenarios, in which the queue forms during the red phase. In urban scenarios, however, the formation of queues is difficult to anticipate, and slow traffic flow can also affect the motion of the host vehicle during the green phase. Hence, only taking the waiting queue at the red phase into consideration cannot sufficiently avoid unnecessary deceleration or idling before signalized intersections.
Fig. 1. Illustration of the proposed eco-driving control system.
Motivated by the above analysis, this study proposes a DRL-based eco-driving control approach for multiple signalized intersections, in which a hierarchical framework is implemented as shown in Fig. 1. The upper-level controller estimates traffic uncertainty by using a queue-based traffic flow model and modifies the states of the traffic lights accordingly, while the lower-level controller uses DRL to plan an ecological velocity for the vehicle while considering driving safety and comfort constraints. The main contribution of this study is threefold: 1) an eco-driving approach is proposed for signalized intersection scenarios that accounts for the impact of traffic uncertainty; this is an important consideration as traffic flow in urban areas can be highly variable and unpredictable. 2) A dynamic modified traffic light model is created, which uses the queue-based traffic flow model to correct the states of the traffic lights; this model allows for more accurate traffic light predictions and can improve traffic flow. 3) A DRL-based eco-driving controller is designed to plan an ecological velocity at signalized intersections while considering traffic uncertainties and the constraints of the dynamic modified traffic light. Overall, this study presents a promising approach for eco-driving control in urban areas, which can contribute to a more sustainable and efficient transportation system.
The remainder of this paper is structured as follows: Section II presents the mathematical modeling of the vehicle and traffic uncertainty and formulates the OCP for eco-driving. Section III elaborates the eco-driving control considering traffic uncertainty. Section IV presents the simulation results to verify the proposed method. Finally, the main conclusions and future work are discussed in Section V.
II. MODELING AND PROBLEM STATEMENT
In this section, the vehicle kinematic model, the traffic flow model, and a modified traffic light model are
presented. The modified traffic light model is constructed to estimate the traffic uncertainty at signalized
intersections. In addition, this section formulates the OCP of eco-driving.
A. Vehicle Model
In eco-driving control, a longitudinal kinematics model is typically utilized to describe the motion of vehicles [28]. The longitudinal driving resistance force is presented as:

$$F_r = Mgf\cos(\theta) + Mg\sin(\theta) + \frac{C_D A \rho_{air} v^2}{2} \tag{1}$$

where $F_r$ is the driving resistance force, $M$ is the vehicle mass, $f$ means the rolling resistance coefficient, $g$ is the acceleration of gravity, $\theta$ expresses the road grade, $C_D$ denotes the air drag coefficient, $A$ indicates the frontal area of the vehicle, and $\rho_{air}$ is the air density. The driving resistance force consists of rolling resistance, air resistance, acceleration resistance and road grade resistance. In this research, lane changes are neglected, and the road grade is set to zero. By taking the state variables of the vehicle model to be velocity $v$ and position $d$, the longitudinal kinematics model is formulated as:

$$\dot{v} = a \tag{2}$$

$$\dot{d} = v \tag{3}$$

$$a = \frac{T_m i_t \eta_t / r_{tire} - F_r}{\delta M} \tag{4}$$
where $\dot{v}$ and $\dot{d}$ are the variations of velocity and position at each time step, respectively, $a$ is the longitudinal acceleration, $\delta$ is the correction coefficient of rotating mass, $\eta_t$ is the transmission system efficiency, $i_t$ indicates the transmission ratio, and $T_m$ stands for the torque of the motor. In this study, the basic parameters of an electric vehicle are applied based on our previous study [29] to calculate the energy consumption $E(v, a, \theta)$, as:
$$E(v, a, \theta) = P_b = \begin{cases} P_m / \eta_m, & P_m \ge 0 \\ P_m \eta_m, & P_m < 0 \end{cases} \tag{5}$$

$$P_m = \frac{T_m n_m}{9550} = \frac{T_m v i_t}{9550\, r_{tire}} \tag{6}$$

$$\eta_m = \psi(T_m, n_m) \tag{7}$$
where $P_b$ is the battery power, $P_m$ is the motor power, $\theta$ is the road grade, $\eta_m$ is the motor efficiency, $r_{tire}$ is the radius of the vehicle tire, $n_m$ is the rotation speed of the motor, and $\psi$ represents the motor efficiency map.
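To make the powertrain model concrete, the following Python sketch evaluates (1) and (4)-(7) for a given speed, acceleration and road grade. The vehicle parameters follow Table 2, while the air density, the rotating-mass coefficient $\delta$, the transmission efficiency $\eta_t$ and the constant-efficiency `motor_efficiency` placeholder are illustrative assumptions, since the paper only references the efficiency map $\psi$ from [29].

```python
import math

# Host vehicle parameters (Table 2); the constants in the last two rows are assumptions.
M, A, C_D = 1640.0, 2.27, 0.3146          # mass (kg), frontal area (m^2), drag coefficient
R_TIRE, F_ROLL, I_T = 0.316, 0.008, 7.94  # tire radius (m), rolling coeff., final gear ratio
RHO_AIR, G = 1.225, 9.81                  # air density (kg/m^3), gravity (m/s^2)
DELTA, ETA_T = 1.05, 0.95                 # rotating-mass coeff. and transmission efficiency (assumed)

def motor_efficiency(torque_nm, speed_rpm):
    """Placeholder for the motor efficiency map psi(T_m, n_m) in (7)."""
    return 0.90  # constant efficiency assumed for illustration

def battery_power_kw(v, a, theta=0.0):
    """Resistance force (1), motor torque via (4), and battery power via (5)-(6)."""
    f_r = (M * G * F_ROLL * math.cos(theta) + M * G * math.sin(theta)
           + 0.5 * C_D * A * RHO_AIR * v ** 2)                  # eq. (1)
    t_m = (DELTA * M * a + f_r) * R_TIRE / (I_T * ETA_T)        # eq. (4) solved for T_m
    n_m = v / R_TIRE * I_T * 60.0 / (2.0 * math.pi)             # motor speed (rpm)
    p_m = t_m * n_m / 9550.0                                    # eq. (6), kW
    eta_m = motor_efficiency(t_m, n_m)
    return p_m / eta_m if p_m >= 0 else p_m * eta_m             # eq. (5): traction vs. regeneration

print(battery_power_kw(v=15.0, a=0.5))  # e.g., accelerating at 15 m/s on a flat road
```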
B. Modeling and Prediction of Traffic Uncertainties
To consider traffic uncertainties at signalized intersections, a queue-based traffic flow model [30], referred to as the Lighthill-Whitham-Richards (LWR) model, is built to predict the queue's discharge time and traffic dynamics. As discussed before, not only does the waiting queue have an impact on the motion of the host vehicle, but a slow traffic flow can also confine the speed of the host vehicle. Hence, the influence of traffic uncertainties at signalized intersections is classified into two categories, i.e., waiting queue and slow traffic flow. To consider the influence of the waiting queue, the LWR model is leveraged to describe the traffic dynamics of the waiting queue, as:

$$\frac{\partial \rho(d, t)}{\partial t} + \frac{\partial q(d, t)}{\partial d} = 0 \tag{8}$$

$$v_w = \frac{\Delta q}{\Delta \rho} \tag{9}$$
where $\rho(d, t)$ and $q(d, t)$ represent the traffic density and flow at position $d$ and time $t$, respectively. The parameter $v_w$ is the effective velocity of a shockwave, whereas $\Delta\rho$ and $\Delta q$ denote the changes of traffic density and flow between different regions. The flow-density diagram of the LWR model in a specific region is shown in Fig. 2. With the increase of traffic density, the traffic flow grows until it reaches the maximum traffic flow. By contrast, after reaching the peak point, the traffic flow gradually decreases to zero as the traffic density keeps increasing. At the maximum traffic density point, we assume a waiting queue emerges in this region. Based on (9), when the traffic light turns to the green phase, the dissipating speed of the waiting queue $v_{dis}$ can be expressed as the effective velocity of the shockwave, as:

$$v_{dis} = \frac{q_d - q_u}{\rho_d - \rho_u} \tag{10}$$
where $q_d$ and $q_u$ stand for the traffic flow on the downstream and upstream of the signalized intersection, and $\rho_d$ and $\rho_u$ represent the traffic density on the downstream and upstream of the intersection, respectively. In most situations, the traffic flow on the downstream of a red-phase intersection can be estimated as the maximum traffic flow, and the upstream of the red-phase intersection reaches the maximum traffic density when a waiting queue is formed. Therefore, the queue's dissipation rate after the green light starts is reformulated as:

$$v_{dis} = \frac{q_{max}}{\rho_{max} - \rho_s} \tag{11}$$
where $\rho_s$ is the traffic density corresponding to the maximum traffic flow, $q_{max}$ is the maximum traffic flow, and $\rho_{max}$ is the maximum traffic density. Obviously, in a specific region, the queue's dissipation rate can be treated as a constant, which can be used to predict the dissipation time of the queue.
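As a small worked example, the dissipation rate (11) and the resulting queue discharge time can be computed as below; the flow and density values are illustrative assumptions for a single-lane road, not figures from the paper.

```python
Q_MAX = 1800.0 / 3600.0  # assumed maximum flow: 1800 veh/h converted to veh/s
RHO_S = 0.030            # assumed density at maximum flow (veh/m)
RHO_MAX = 0.140          # assumed jam (maximum) density (veh/m)

def queue_dissipation_speed():
    """Shockwave speed of queue discharge after the green light starts, eq. (11)."""
    return Q_MAX / (RHO_MAX - RHO_S)  # m/s

def queue_discharge_time(queue_length_m):
    """Time for the discharge wave to travel the full queue length."""
    return queue_length_m / queue_dissipation_speed()

print(queue_discharge_time(50.0))  # roughly 11 s for a 50 m queue under these assumptions
```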
Fig. 2. The flow-density diagram for the LWR model.
To consider the influence of a slow traffic flow, the arrival time at a signalized intersection is predicted for the preceding vehicles. For this purpose, the effective speed of the road is calculated based on traffic flow theory [31], as:

$$v_{traff}^{j}(t) = \frac{q^{j}(t)}{\rho^{j}(t)} \tag{12}$$
where $j$ indexes the $j$-th region before the $j$-th traffic light, $v_{traff}^{j}(t)$ indicates the current effective speed in the corresponding region, whereas $q^{j}(t)$ and $\rho^{j}(t)$ are the current traffic flow and density in that region, respectively. If accurate traffic flow data is available, the traffic conditions can be well predicted at intersections. However, actual traffic flow information remains challenging to acquire in the real world. By contrast, traffic count data is easy to monitor and can be mined to characterize the traffic flow to a large extent [32]. In this study, we assume that loop detectors are installed at each signalized intersection and in the middle of every two intersections. In addition, the loop detectors provide a traffic information vector for each passing vehicle, i.e., $[Index_{veh}, t_{index}, v_{index}]$, including the index, arrival time and velocity of the vehicle. With the support of traffic count data, the detectors at intersections can easily estimate the number of vehicles in their region, and the data from the detectors between two traffic lights can be utilized to estimate the traffic flow and density, as:
$$q^{j}(t) = \frac{num_{veh}^{j}}{t - t_1} \tag{13}$$

$$\rho^{j}(t) = \frac{num_{veh}^{j}(t)}{(t - t_1)\, v_{aver}^{j}(t)} \tag{14}$$

$$v_{aver}^{j}(t) = \frac{1}{num_{veh}^{j}(t)} \sum_{i=1}^{num_{veh}^{j}} v_i \tag{15}$$
where $i = 1, 2, 3, \ldots$ indexes the $i$-th vehicle in the region, $num_{veh}^{j}$ is the number of vehicles in the $j$-th region, $t_i$ is the arrival time at the loop detectors for the $i$-th vehicle, and $v_{aver}^{j}$ is the average speed of the passing vehicles. Finally, based on (12) to (15), the arrival time at the next traffic light of each vehicle can be predicted as:
$$t_{i}^{pre}(t) = \frac{d_{traff}^{j}}{v_{traff}^{j}(t)} \tag{16}$$
where $d_{traff}^{j}$ denotes the length of the $j$-th region. Thus, the arrival time at the next traffic light is determined for each preceding vehicle and continuously updated according to the current traffic flow and traffic density. In this manner, the prediction of traffic uncertainty covers both the anticipation of queue discharge and the motion of preceding vehicles.
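A minimal sketch of (12)-(16), assuming each loop detector reports `(index, arrive_time, velocity)` tuples as described above; the detector records and the region length in the usage lines are hypothetical values chosen for illustration.

```python
def region_flow_density(records, t_now):
    """Estimate flow (13), density (14) and average speed (15) of region j from
    loop-detector records [(index, arrive_time, velocity), ...]."""
    num_veh = len(records)
    t_first = records[0][1]                            # arrival time of the first vehicle
    q_j = num_veh / (t_now - t_first)                  # eq. (13), veh/s
    v_aver = sum(r[2] for r in records) / num_veh      # eq. (15), m/s
    rho_j = num_veh / ((t_now - t_first) * v_aver)     # eq. (14), veh/m
    return q_j, rho_j, v_aver

def predicted_arrival(t_now, d_region, q_j, rho_j):
    """Arrival time at the next traffic light via the effective speed (12) and (16)."""
    v_traff = q_j / rho_j                              # eq. (12)
    return t_now + d_region / v_traff

records = [(1, 0.0, 14.0), (2, 6.0, 13.0), (3, 11.0, 15.0)]  # hypothetical detector data
q, rho, _ = region_flow_density(records, t_now=20.0)
print(predicted_arrival(20.0, d_region=300.0, q_j=q, rho_j=rho))
```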
C. Optimal Eco-Driving Control Formulation
In this study, the aim of eco-driving control at signalized intersections is to optimize the host vehicle's velocity profile for energy saving by considering the constraints of driving safety and traffic lights. The state vector $x$ includes the velocity and position of the host vehicle, and the control variable $u$ is the longitudinal acceleration. Therefore, the objective function, the constraints and the system dynamics of the OCP are formulated as:
$$J = \int_{0}^{T} \left( \alpha_1 p_{light} + \alpha_2 p_{safety} + \alpha_3 E(x, u) \right) dt \tag{17}$$

$$\begin{cases} a_{min} \le a \le a_{max} \\ v_{min} \le v \le v_{max} \\ v(0) = v_0,\; d(0) = d_0 \\ T \le T_{max} \end{cases} \tag{18}$$

$$\begin{cases} \dot{v}(t) = a(t) \\ \dot{d}(t) = v(t) \end{cases} \tag{19}$$
where $E(x, u)$ is the instantaneous electric energy consumption, $\alpha_1$ to $\alpha_3$ are adjustable weight factors to balance the different indexes, and $p_{light}$ and $p_{safety}$ are the penalties from traffic lights and preceding vehicles; these penalties are triggered when the vehicle drives through a red light or when a collision happens. $a_{min}$ and $a_{max}$ are the limits of acceleration, $v_{min}$ and $v_{max}$ are the limits of velocity, $v_0$ and $d_0$ are the initial velocity and position of the host vehicle, respectively, and $T$ and $T_{max}$ are the travel time and the maximum travel time for the whole scenario. It can be found that the OCP contains nonlinear constraints from traffic lights and preceding vehicles, and consequently, it remains challenging to solve directly with traditional optimization algorithms. Previous studies attempted to solve this problem by proposing modified optimization frameworks or algorithms, such as an iterative DP algorithm and a hierarchical optimization framework [24, 27]. However, improvements in calculation efficiency are reported to come at the cost of optimality. In this study, a DRL-based method is leveraged to find optimal control decisions for the OCP, thereby promoting performance with a slight calculation burden.
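For concreteness, the objective (17) can be evaluated along a discretized trajectory as sketched below; the step format, the unit penalty values and the crude power model in the usage lines are illustrative assumptions rather than the paper's exact implementation.

```python
DT = 0.1  # simulation step (s), matching the sample time used in Section IV

def trajectory_cost(traj, energy_fn, alpha=(1.0, 1.0, 1.0)):
    """Discretized objective (17): weighted traffic-light penalty, safety penalty
    and energy consumption summed over the trip. traj holds dicts with keys
    v, a, ran_red, crashed; energy_fn(v, a) returns instantaneous power."""
    a1, a2, a3 = alpha
    cost = 0.0
    for step in traj:
        p_light = 1.0 if step["ran_red"] else 0.0   # triggered by a red-light violation
        p_safety = 1.0 if step["crashed"] else 0.0  # triggered by a collision
        cost += (a1 * p_light + a2 * p_safety + a3 * energy_fn(step["v"], step["a"])) * DT
    return cost

# usage with a crude traction-power model (kW), assumed only to make the example runnable
traj = [{"v": 12.0, "a": 0.3, "ran_red": False, "crashed": False}] * 10
print(trajectory_cost(traj, energy_fn=lambda v, a: max(0.0, 1640.0 * a * v) / 1000.0))
```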
III. ECO-DRIVING CONTROL CONSIDERING TRAFFIC UNCERTAINTY AT SIGNALIZED INTERSECTIONS
To solve the multi-objective OCP of eco-driving control at signalized intersections, RL is applied to plan velocity profiles for the host vehicle. The proposed scheme is described in Fig. 3, wherein a hierarchical framework is implemented: the aforementioned traffic model is exploited to predict the traffic uncertainty, and a modified traffic light model is developed to revise the SPaT information in the upper level. In addition, a deep deterministic policy gradient (DDPG) agent is designed to generate acceleration control decisions at multiple intersections under the influence of traffic uncertainty.
Fig. 3. Schematic of the eco-driving control system at signalized intersections considering traffic uncertainties.
A. Implementation of DRL for Eco-Driving Control
RL methods consider control systems as a Markov decision process, in which the next state $s_{t+1}$ only depends on the current state $s_t$ and the control action $a_t$. The action is determined via a control policy $\pi(s_t, a_t)$, and a reward $r$ is observed from the environment. In these methods, the aim of the agent is to learn a policy that maximizes the cumulative reward, also called the Q-value, which can be formulated as follows:

$$Q(s_t, a_t) = r(s_t, a_t) + \max \sum_{i=t+1}^{T} \gamma^{\,i-t}\, r(s_i, a_i) \tag{20}$$

where $T$ is the duration of the whole scenario, and $\gamma$ is the discount factor. Eq. (20), referred to as the Bellman function, can be recursively formulated as:

$$Q(s_t, a_t) = r(s_t, a_t) + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \tag{21}$$
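The recursion (21) is exactly the one-step temporal-difference target used to train the critic; a minimal sketch with $\gamma = 0.99$ from Table 3 (the `done` handling at terminal states is a standard assumption):

```python
GAMMA = 0.99  # discount factor (Table 3)

def td_target(reward, q_next_max, done):
    """Bellman target from eq. (21): r + gamma * max_a' Q(s', a').
    q_next_max is the (target) critic's estimate for the next state."""
    return reward + (0.0 if done else GAMMA * q_next_max)
```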
A classic actor-critic architecture is typically employed in RL methods. In that structure, there are two crucial components, i.e., the policy function and the Q-value function, which account for generating actions and estimating the maximum cumulative reward, respectively. With the advancement of machine learning techniques, various DRL algorithms, including distributed proximal policy optimization (DPPO), DDPG and twin-delayed deep deterministic policy gradient (TD3), have been explored to enhance eco-driving control [33, 34]. The selection of an appropriate DRL algorithm is primarily determined by the action space and convergence performance. The longitudinal control of eco-driving requires a continuous action space to support microscopic CAV control. Additionally, the implementation of a reference target velocity, which will be discussed later, can facilitate training convergence. Previous research showed that DDPG can capture the potential nonlinearity of the policy function and the Q-value function through deep neural networks and generate continuous actions [35]. Considering the requirements of implementation simplicity and continuous actions, the DDPG algorithm is utilized to solve the OCP of eco-driving control. To balance the conflict between energy saving and pursuing speed, a green light optimized speed advisory (GLOSA) algorithm [21] is leveraged to calculate the reference target speed for the DRL controller, which is formulated as:
$$v_{tar} = \max\left( f_{tar}\, v_{tl\_max},\; 1.1\, v_{tl\_min} \right) \tag{22}$$

$$v_{tl\_max} = \begin{cases} v_{max}, & \text{green phase} \\ \dfrac{d_{tl}}{t_{remain}}, & \text{red phase} \end{cases} \tag{23}$$

$$v_{tl\_min} = \begin{cases} \dfrac{d_{tl}}{t_{remain}}, & \text{green phase} \\ \dfrac{d_{tl}}{t_{remain} + t_{green}}, & \text{red phase} \end{cases} \tag{24}$$
where $v_{tl\_max}$ and $v_{tl\_min}$ represent the maximum and minimum velocities to pass the next traffic light at the green phase, respectively. Note that $v_{tl\_max}$ and $v_{tl\_min}$ should satisfy the speed limits of the road. The parameter $d_{tl}$ is the distance to the next traffic light, $t_{remain}$ is the remaining time of the current phase, $t_{green}$ and $t_{red}$ are the time durations of the green and red phases, and $f_{tar}$ is a factor to adjust the target velocity.
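A direct transcription of (22)-(24) in Python, with $f_{tar} = 0.6$ as later used in Section IV; the road speed limits are assumed values, and the bounds are clipped to them as the text requires.

```python
F_TAR = 0.6                          # target-velocity factor (Section IV)
V_MIN_ROAD, V_MAX_ROAD = 0.0, 20.0   # road speed limits (assumed)

def glosa_target_speed(d_tl, t_remain, t_green, phase):
    """GLOSA reference target speed, eqs. (22)-(24)."""
    if phase == "green":
        v_tl_max = V_MAX_ROAD                     # eq. (23), green branch
        v_tl_min = d_tl / t_remain                # eq. (24): arrive before green ends
    else:                                         # red phase
        v_tl_max = d_tl / t_remain                # eq. (23): arrive only after red ends
        v_tl_min = d_tl / (t_remain + t_green)    # eq. (24): catch the end of the next green
    v_tl_max = min(v_tl_max, V_MAX_ROAD)          # respect the road speed limits
    v_tl_min = min(max(v_tl_min, V_MIN_ROAD), V_MAX_ROAD)
    return max(F_TAR * v_tl_max, 1.1 * v_tl_min)  # eq. (22)

print(glosa_target_speed(d_tl=250.0, t_remain=20.0, t_green=30.0, phase="red"))
```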
1) State and action variables
The state vector is designed to describe the current state of the eco-driving system. In some previous studies, all possible variables are selected to represent the state of the host vehicle, traffic light and preceding vehicles [36]. However, this entails a quite large state space, leading to difficulty in learning convergence. Here, the state variables are simplified with the help of the target velocity, and they include the host vehicle velocity $v$, the host vehicle acceleration $a$, the velocity deviation $\Delta v$, the velocity deviation integral $\Delta v_{inte}$ and the velocity deviation derivative $\Delta \dot{v}$. Note that the velocity deviation represents the deviation between the host vehicle's velocity and the target velocity. The host vehicle's velocity $v$ and acceleration $a$ describe the motion of the host vehicle, whereas the other variables provide the necessary information about the speed-pursuing state to pass the next traffic light in a green phase. Furthermore, the acceleration is selected as the action variable to plan the velocity profiles for the host vehicle to achieve energy saving at signalized intersections.
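In code, the observation passed to the agent is therefore a five-element vector; a minimal sketch, assuming the caller carries the running integral and the previous deviation between steps:

```python
def build_state(v, a, v_tar, dv_inte, dv_prev, dt=0.1):
    """Assemble the five state variables: velocity, acceleration, velocity
    deviation, its integral and its finite-difference derivative."""
    dv = v - v_tar                  # deviation from the GLOSA target velocity
    dv_inte = dv_inte + dv * dt     # running integral of the deviation
    dv_dot = (dv - dv_prev) / dt    # derivative of the deviation
    return [v, a, dv, dv_inte, dv_dot], dv_inte, dv
```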
2) Reward function
The reward function should be designed to encourage the agent to pass traffic lights in a green phase and reduce energy consumption at the same time. Therefore, the reward function for signalized intersection scenarios mainly consists of three aspects. The first is an energy consumption reward. The second is that the travel time should be maintained within an acceptable range. The third aspect is related to driving safety, i.e., obeying traffic rules, including traffic lights and speed limits. It should be noted that an emergency braking policy and a car-following strategy [37] are applied to avoid collisions, so this item is not included in the reward function. Based on the above discussion, the reward function $r_e$ corresponding to energy consumption is designed as follows:

$$r_e = -f_e\, E(v, a, \theta) \tag{25}$$
where $f_e$ is a weight factor of the energy consumption reward. Furthermore, a penalty is added to the reward function by considering acceleration, to avoid frequent speed variations, minimize energy consumption and ensure driving comfort. A quadratic penalty function $r_a$ is considered for acceleration, presented below:

$$r_a = -f_a\, a^2 \tag{26}$$
where $f_a$ is a weight factor. This item generates a penalty to the critic module whenever acceleration or deceleration emerges, especially for short maneuvers with high amplitude. In addition, the agent is encouraged to track the target velocity for passing traffic lights at the green phase. A reward function $r_v$ for the velocity deviation penalty is formulated as follows:

$$r_v = -f_v\, (v - v_{tar})^2 \tag{27}$$
where $f_v$ is a weight factor. This penalty item drives the agent to eliminate excessive deviations from the target velocity. However, it does not ensure preferable tracking of the target velocity. For instance, the velocity cannot stably follow the target velocity when the velocity deviation is small, as other reward terms dominate the final reward. Moreover, a positive reward item is necessary to encourage the host vehicle to move forward to the destination. Hence, two additional items of the reward function are designed as:
$$r_{vpr} = f_{vpr}\, \frac{1}{\sqrt{2\pi}\, \sigma} \exp\!\left( -\frac{(dv - \mu)^2}{2\sigma^2} \right) \tag{28}$$

$$r_{tl} = \begin{cases} \beta_{tl}, & v_{tl\_min} \le v \le v_{tl\_max} \\ 0, & v < v_{tl\_min} \;\text{or}\; v > v_{tl\_max} \end{cases} \tag{29}$$
where $r_{vpr}$ is the positive reward for velocity deviation, which follows a normal distribution, $dv$ is the velocity deviation from the reference target speed, $\mu$ is the mean value of the normal distribution, and $\sigma$ is its standard deviation. $r_{tl}$ is the positive reward for velocity that encourages the host vehicle to move forward, as long as the velocity is maintained within the boundaries for passing the next traffic light at a green phase. $\beta_{tl}$ indicates the constant value of this positive reward, and $f_{vpr}$ is a weight factor. Finally, a safety reward is designed to penalize behaviors that violate traffic rules, i.e., when the host vehicle passes a traffic light at a red phase or violates the speed limit. By contrast, the safety reward is zero if no unsafe behavior occurs. Hence, a piecewise function for the safety reward is presented as follows:

$$r_{TL} = \begin{cases} -\beta_{safe}, & \text{traffic rules violated} \\ 0, & \text{otherwise} \end{cases} \tag{30}$$
where $\beta_{safe}$ is the penalty for the safety reward. In this manner, the total reward function can be formulated by adding all items, as presented below:

$$r = r_{TL} + r_{tl} + r_a + r_v + r_{vpr} + r_e \tag{31}$$
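Putting (25)-(31) together, the per-step reward can be sketched as follows. The weights $f_e$, $f_a$, $f_v$, $\beta_{tl}$ and $\beta_{safe}$ follow Table 3, whereas $f_{vpr}$, $\mu$ and $\sigma$ of the Gaussian shaping term are assumptions, since their values are not reported.

```python
import math

F_E, F_A, F_V = 25.0, 0.2, 0.025   # weights from Table 3
BETA_TL, BETA_SAFE = 0.8, 1000.0   # from Table 3
F_VPR, MU, SIGMA = 1.0, 0.0, 1.0   # assumed: not reported in the paper

def step_reward(energy, a, v, v_tar, v_tl_min, v_tl_max, rules_violated):
    """Total reward (31) assembled from eqs. (25)-(30)."""
    r_e = -F_E * energy                                       # eq. (25): energy penalty
    r_a = -F_A * a ** 2                                       # eq. (26): comfort penalty
    dv = v - v_tar
    r_v = -F_V * dv ** 2                                      # eq. (27): deviation penalty
    r_vpr = F_VPR * math.exp(-(dv - MU) ** 2 / (2 * SIGMA ** 2)) \
            / (math.sqrt(2 * math.pi) * SIGMA)                # eq. (28): Gaussian shaping
    r_tl = BETA_TL if v_tl_min <= v <= v_tl_max else 0.0      # eq. (29): passing reward
    r_safe = -BETA_SAFE if rules_violated else 0.0            # eq. (30): safety penalty
    return r_safe + r_tl + r_a + r_v + r_vpr + r_e            # eq. (31)
```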
B. Eco-Driving Approach Considering Traffic Uncertainty
As elaborated earlier, in real-world conditions, the SPaT information of the next traffic light may not exactly match the practical conditions, due to waiting queues and slow traffic flow. To address this limitation, a modified traffic light model is developed, in combination with the established traffic model, to revise the effect of traffic uncertainty on the SPaT information. The modified SPaT information is then transferred to the DRL controller to realize eco-driving control considering the influence of traffic uncertainty, as shown in Fig. 4.
Fig. 4. Schematic of eco-driving control using a modified traffic light model and DRL.
The crucial information of a traffic light consists of its state $st_{tl}$, position $p_{tl}$ and remaining time $t_{re}$. Based on this information, the target velocity can be obtained via (22) to (24). In this study, the lane change action is ignored, and the host vehicle and preceding vehicles are assumed to drive in the same lane. Thus, the existence of any preceding vehicle prevents the host vehicle from passing the signalized intersection, and the modified traffic light state is corrected to the red phase to account for it. The modified state of the next traffic light $st_{tl}^{c}$ is corrected according to the following rule:

$$st_{tl}^{c} = \begin{cases} st_{tl}, & num_{veh}^{j} = 0 \\ red, & num_{veh}^{j} > 0 \end{cases} \tag{32}$$
The modified state of the traffic light switches to red when there are preceding vehicles in the same region; otherwise it keeps the actual state of the traffic light. The motion state of the last preceding vehicle is leveraged to modify the position and remaining time of the next traffic light when $st_{tl}^{c}$ is switched to the red phase. The modification rules are divided into three cases to differentiate the influence of formed waiting queues, forming waiting queues and slow traffic flow.
1) Case I: the queuing process is completed, and the last preceding vehicle has stopped before the next traffic light. The switch conditions are:

$$t_{end}^{pre}(t) \in \left( t_r^b,\, t_r^e \right), \quad t_{end}^{pre}(t) \le t_{cur} \;\&\; v_{end} = 0 \tag{33}$$
where $t_{end}^{pre}(t)$ is the predicted arrival time at the next traffic light for the last preceding vehicle, $t_r^b$ and $t_r^e$ are the starting and ending times of the red phase, $t_{cur}$ means the current time, and $v_{end}$ is the velocity of the last preceding vehicle. The modified position $p_{tl}^{c}$ and the modified remaining time $t_{remain}^{c}$ of the traffic light are formulated as follows:
$$p_{tl}^{c} = p_{tl} - L_{queue} \tag{34}$$

$$L_{queue} = (L_{veh} + L_0)\, num_{queue} \tag{35}$$

$$t_{remain}^{c}(t) = t_{remain}(t) + dt_{remain}(t) + \Delta t \tag{36}$$

$$dt_{remain}(t) = \frac{L_{queue}}{v_{dis}} - t_{remain}(t) \tag{37}$$
where $L_{queue}$ is the queue length, $num_{queue}$ is the number of vehicles in the queue, $L_0$ is the standstill spacing, $L_{veh}$ is the length of each vehicle, $dt_{remain}$ is the correction value of the remaining time, and $\Delta t$ is a timing buffer to ensure safety. In this situation, a waiting queue has already formed, and a constant postponement of the actual remaining time is generated according to the predicted queue dissipation time. This case typically occurs when the host vehicle is approaching the next traffic light.
2) Case II: the waiting queue is forming, and the last preceding vehicle is approaching the next traffic light. The switch conditions are:

$$t_{end}^{pre}(t) \in \left( t_r^b,\, t_r^e \right), \quad t_{end}^{pre}(t) > t_{cur} \;\&\; v_{end} > 0 \tag{38}$$
Similar to Case I, the modified position $p_{tl}^{c}$ is predicted based on (34) and (35), whereas the correction value of the remaining time $dt_{remain}$ is formulated as:

$$dt_{remain}(t) = \frac{L_{queue}(t)}{v_{dis}} + t_{remain}(t) \tag{39}$$
In this situation, the waiting queue is still forming or has not formed yet. However, the traffic model predicts that there will be a waiting queue ahead, and the remaining time of the current phase is dynamically updated. Therefore, the modified remaining time is predicted by combining the motion of the preceding vehicles, the actual remaining time and the predicted queue length. In this case, there is normally a long distance between the host vehicle and the next traffic light.
3) Case III: the last preceding vehicle can pass the next traffic light at the green phase, and the host vehicle will not be affected by any waiting queue. The switch condition is:

$$t_{end}^{pre}(t) \in \left( t_g^b,\, t_g^e \right) \tag{40}$$
where $t_g^b$ and $t_g^e$ are the starting and ending times of the green phase. The last preceding vehicle can pass the next traffic light without idling, which indicates that the motion of the host vehicle will not be affected by any waiting queue. Hence, the modified position of the traffic light is consistent with its actual position, and the correction value of the remaining time $dt_{remain}$ is formulated as:

$$dt_{remain}(t) = t_{end}^{pre}(t) - t_{cur} - t_{remain}(t) \tag{41}$$
In this situation, the velocity profile of the host vehicle is affected by the preceding traffic flow instead of a waiting queue. Therefore, a time-varying postponement of the remaining time is designed to avoid unnecessary idling before a traffic light. On this basis, a modified traffic light model is developed, and the modified position $p_{tl}^{c}$ and the modified SPaT information of the traffic light $[st_{tl}^{c}, t_{remain}^{c}]$ are leveraged to calculate the reference target velocity using the GLOSA algorithm. As a result, the host vehicle can anticipate potential waiting queues or traffic flow before the next traffic light and can reasonably optimize its velocity profile to reduce energy consumption and eliminate unnecessary idling as much as possible. The case-switching logic is summarized in the sketch below.
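A minimal sketch of the case-switching logic in (32)-(41); the standstill spacing, vehicle length and timing buffer are illustrative values, and the Case II inequality directions follow the reconstruction above.

```python
L0, L_VEH, DELTA_T = 2.0, 4.5, 1.0  # standstill spacing, vehicle length, buffer (assumed)

def modify_traffic_light(st_tl, p_tl, t_remain, num_veh, num_queue,
                         t_pre_end, t_cur, v_end, red_window, v_dis):
    """Return the modified state, position and remaining time, eqs. (32)-(41)."""
    if num_veh == 0:                                   # eq. (32): keep the actual light
        return st_tl, p_tl, t_remain
    l_queue = (L_VEH + L0) * num_queue                 # eq. (35)
    arrives_in_red = red_window[0] < t_pre_end < red_window[1]
    if arrives_in_red and t_pre_end <= t_cur and v_end == 0:
        dt_remain = l_queue / v_dis - t_remain         # Case I, eq. (37): queue formed
        p_mod = p_tl - l_queue                         # eq. (34)
    elif arrives_in_red:
        dt_remain = l_queue / v_dis + t_remain         # Case II, eq. (39): queue forming
        p_mod = p_tl - l_queue
    else:
        dt_remain = t_pre_end - t_cur - t_remain       # Case III, eq. (41): slow flow only
        p_mod = p_tl                                   # actual position is kept
    return "red", p_mod, t_remain + dt_remain + DELTA_T  # eq. (36)
```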
IV. SIMULATION VALIDATIONS
To verify the performance of the proposed method, several simulation case studies are conducted on a MATLAB/Simulink platform. A virtual traffic environment model [33, 38], including traffic lights, preceding vehicles, road grade, etc., is employed to test the eco-driving control strategies. In this study, the motion of the preceding vehicles is controlled by the IDM according to real-time driving conditions, and all preceding vehicles are assumed to have the same desired driving speed of 20 m/s. Four benchmark approaches are constructed to validate the proposed method (called MG-DRL). The first benchmark method (i.e., IDM) is an IDM-based strategy, which is selected to simulate the behavior of human drivers in the virtual traffic environment. This model serves as the baseline benchmark in the simulations, as IDM-based methods are widely employed to evaluate driving approaches without eco-driving control [33]. The mathematical description of the IDM is formulated as:
$$a(t) = a_{max} \left[ 1 - \left( \frac{v(t)}{v_{desired}} \right)^{\lambda} - \left( \frac{s^{*}(v(t), \Delta v_{pre}(t))}{s(t)} \right)^{2} \right] \tag{42}$$

$$s^{*}(t) = s_0 + T_0\, v(t) + \frac{v(t)\, \Delta v(t)}{2\sqrt{a_{max}\, b}} \tag{43}$$
where $v_{desired}$ is the desired speed, $\lambda$ is the acceleration exponent, $s^{*}$ is the desired headway distance, $\Delta v_{pre}$ is the velocity deviation to the preceding vehicle, $T_0$ is the constant time headway, $b$ is the desired deceleration, and $s_0$ is the minimum safety distance. The basic parameters of the IDM model in this study are presented in Table 1.
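For reference, the IDM benchmark (42)-(43) with the Table 1 parameters can be sketched as below; the maximum acceleration $a_{max}$ is an assumed value, since Table 1 does not list it.

```python
import math

V_DES, LAMBDA, T0, B_DES, S0 = 20.0, 4.0, 3.0, 1.6, 3.0  # Table 1
A_MAX = 1.5                                              # assumed (not in Table 1), m/s^2

def idm_acceleration(v, gap, dv_pre):
    """IDM acceleration, eqs. (42)-(43): gap is the inter-vehicle distance s(t)
    and dv_pre the velocity difference to the preceding vehicle."""
    s_star = S0 + T0 * v + v * dv_pre / (2.0 * math.sqrt(A_MAX * B_DES))   # eq. (43)
    return A_MAX * (1.0 - (v / V_DES) ** LAMBDA - (s_star / gap) ** 2)     # eq. (42)

print(idm_acceleration(v=15.0, gap=40.0, dv_pre=2.0))
```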
The second benchmark method (i.e., G-IDM) is a GLOSA-based strategy, which plans the velocity profile according to the real SPaT information without considering traffic uncertainties [21]. $f_{tar}$ is set to 0.6, the same value as in the proposed method, and an IDM-based model is employed to follow the velocity planning result obtained from the GLOSA algorithm. Similarly, the third method (i.e., G-DRL) plans the reference velocity based on the GLOSA algorithm and uses the DRL-based controller to optimize the driving speed by considering energy efficiency. The fourth benchmark (i.e., MG-MPC) plans the reference velocity considering the prediction of traffic uncertainties proposed in this study, and an MPC-based method is implemented to plan the ecological velocity across the upcoming signalized intersection [19]. The basic parameters of the host vehicle are presented in Table 2.
Table 1. The basic parameters of the IDM model

Characteristic | Value
Desired speed (m/s) | 20
Acceleration exponent | 4
Constant time headway (s) | 3
Desired deceleration (m/s²) | 1.6
Minimum safety distance (m) | 3
Table 2. The basic parameters of the host vehicle

Characteristic | Value
Mass (kg) | 1640
Frontal area (m²) | 2.27
Air drag coefficient | 0.3146
Tire rolling radius (m) | 0.316
Rolling resistance coefficient | 0.008
Battery capacity (Ah) | 32.5
Final gear ratio | 7.94
A. DRL Parameter Setting and Training
The main parameters of the DRL algorithm are presented in Table 3. Two deep networks are designed to construct the actor network and the critic network, which consist of five and six hidden layers, respectively. These two networks are trained based on an actor-critic structure, and the reward is calculated according to (25) to (31) in each episode. A route with six traffic lights is selected to train the DDPG agent; the maximum number of training episodes is set to 500, and the sample time of the simulation is set to 0.1 s. To train the agent under different driving scenarios, several system parameters are randomized at each episode, including the initial states of the host vehicle and preceding vehicles as well as the SPaT information. The training result is shown in Fig. 5: the average accumulative reward increases sharply at the beginning of training and gradually stabilizes after around 200 training episodes. Due to the randomness of the initial conditions, the accumulative reward of each episode differs slightly after convergence. A trained DRL-based controller is thus obtained for online execution, in which the controller directly generates acceleration decisions according to the state of the system.
Table 3. Parameter settings of the DRL algorithm

Parameter | Value
Number of neurons in critic network | 60
Number of neurons in actor network | 60
Learning rate of critic network | 1.00e-3
Learning rate of actor network | 1.00e-4
Number of layers in critic network | 6
Number of layers in actor network | 5
Number of neurons in hidden layers | 120
Target smooth factor | 1.00e-3
Minibatch size | 268
Discount factor | 0.99
Weight of energy consumption $f_e$ | 25
Weight of acceleration penalty $f_a$ | 0.2
Weight of velocity deviation penalty $f_v$ | 0.025
Positive reward for velocity $\beta_{tl}$ | 0.8
Penalty for safety reward $\beta_{safe}$ | 1000
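A minimal PyTorch sketch of the actor-critic pair implied by Table 3 (five and six hidden layers, the listed learning rates, and a five-dimensional state with a single acceleration action per Section III-A). How the 60- and 120-neuron entries map onto the layers is not fully specified, so using 120-neuron hidden layers throughout is an assumption.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 5, 1, 120  # five states, one acceleration action

def make_mlp(in_dim, out_dim, n_hidden, out_act=None):
    """Fully connected network with ReLU hidden layers."""
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, HIDDEN), nn.ReLU()]
        d = HIDDEN
    layers.append(nn.Linear(d, out_dim))
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

actor = make_mlp(STATE_DIM, ACTION_DIM, 5, nn.Tanh())      # bounded acceleration output
critic = make_mlp(STATE_DIM + ACTION_DIM, 1, 6)            # Q(s, a) estimate
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)  # Table 3 learning rates
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
```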
Fig. 5. Convergence of training.
B. Analysis of Energy Efficiency
To verify the energy-saving performance of the proposed eco-driving approach, a driving scenario with six signalized intersections and six preceding vehicles is constructed to compare the different eco-driving approaches. Besides, the positions of the traffic lights are randomized to distinguish them from the training scenarios. The simulated distance profiles are shown in Fig. 6 (a). Obviously, the velocity profile of the MG-DRL method can be optimized in advance without unnecessary idling or deceleration, and the host vehicle passes each signalized intersection exactly within an available green phase. By contrast, the IDM-based method can only follow the desired speed and has to stop three times before traffic lights in this scenario. As shown in Fig. 6 (b), compared with the IDM-based method, the G-DRL and G-IDM methods plan similarly low speeds to pass the fifth intersection, while the host vehicles of these two approaches are affected by the waiting queue before the traffic light. By contrast, the proposed MG-DRL method and the MG-MPC method successfully predict the emergence of the waiting queue, and a lower driving speed is planned to avoid idling. Fig. 6 (c) illustrates the modified SPaT information accounting for the slow traffic flow. In summary, the proposed modified traffic light model can accurately modify the SPaT information: a modified red phase is observed and dynamically updated when there are preceding vehicles ahead, thus shaping the velocity planning on a complicated urban road.
Fig. 6. Position profiles of different methods when there are six preceding vehicles. (a) the position profiles of the whole
scenario, (b) the position profiles of the fifth intersection, (c) the position profiles of the third intersection.
The generated velocity and acceleration profiles are shown in Fig. 7. As depicted in Fig. 7 (a), the host vehicle under the proposed MG-DRL method maintains a reasonable driving speed throughout the whole trip. A relatively low speed is planned when a potential waiting queue is predicted ahead, so that the next traffic light can be passed smoothly (such as around t = 50 s and t = 300 s in the figure). Compared with the MG-MPC method, the proposed method leverages the DRL-based controller to plan a smoother velocity profile, resulting in improved energy efficiency. Additionally, the velocity profiles under the G-DRL and G-IDM methods exhibit similar trends in most cases. However, the G-DRL method shows velocity fluctuations when the host vehicle approaches the preceding vehicles. The reason is that the G-DRL method does not integrate the influence of preceding vehicles into the DDPG agent; therefore, the simple switching logic in this method leads to frequent switches between the DRL-oriented and car-following modes. It can be observed in Fig. 7 (b) that the accelerations of all methods remain within a reasonable range, ensuring the planning of ecological velocities between intersections. Fig. 8 presents the results of the acceleration distribution and idling time proportions. As shown in Fig. 8 (a), compared with the benchmark methods, the host vehicle under the MG-DRL and MG-MPC methods notably reduces unnecessary acceleration time thanks to the prediction of traffic uncertainties. Furthermore, the proposed method effectively reduces high-level accelerations compared to MG-MPC, thereby enhancing energy efficiency and driving comfort. Moreover, the idling proportion results, shown in Fig. 8 (b), reveal that unnecessary idling is eliminated under the MG-DRL and MG-MPC methods, which significantly contributes to reducing acceleration duration.
Fig. 7. Results of velocity and acceleration profiles. (a) velocity profiles, (b) acceleration profiles.
Fig. 8. Results of acceleration distribution and idling proportion. (a) acceleration distribution, (b) idling proportion.
The simulation results of the different approaches are summarized in Table 4. The proposed method achieves a notable energy-saving improvement of 24.29% compared with the IDM method, which represents human drivers in this study. Furthermore, the proposed method outperforms the other existing eco-driving approaches in energy consumption. In addition, all schemes show only slight differences in overall travel time, indicating that the proposed method improves energy efficiency without remarkably sacrificing traffic efficiency. It should be noted that, compared with the G-IDM method, the G-DRL method consumes more energy due to its frequent mode switching. To sum up, the MG-DRL method is demonstrated to effectively handle the traffic uncertainties across multiple signalized intersections, and its energy-saving performance is significantly improved compared with the benchmark approaches.
Table 4. Simulation results under different approaches

Method | Battery energy consumption (kWh) | Improvement (%) | Travel time (s) | Idling time proportion (%)
IDM | 0.70 | - | 369.6 | 21.73
G-IDM | 0.60 | 14.29 | 370.1 | 16.86
G-DRL | 0.63 | 10.00 | 368.9 | 17.40
MG-MPC | 0.55 | 21.43 | 372.0 | 0
MG-DRL | 0.53 | 24.29 | 370.5 | 0
C. Analysis of Adaptability
To verify the adaptability of the proposed method to different traffic scenarios, a Monte Carlo simulation is conducted to fully explore its performance. The simulation involves 500 trials for each method on the same road, wherein the traffic parameters, including the initial SPaT information of the traffic lights, the number of preceding vehicles and the initial states of the preceding vehicles, are all randomized in each trial. To fully explore the adaptability of the proposed method, the energy consumption, travel time, idling time proportion and low-speed time proportion are summarized for comparison. A speed boundary of 1.39 m/s is selected to judge whether the vehicle is in a low-speed state [27]. Table 5 summarizes the average energy consumption and travel time. Compared with the IDM method, an average improvement of 17.65% is observed, indicating that the proposed method can improve energy efficiency under different traffic conditions. In addition, an average energy consumption reduction of 4.41% is achieved compared with the other three eco-driving approaches. Furthermore, the travel time of the proposed method does not increase significantly, and the idling time and low-speed time proportions are remarkably decreased. A slight average idling time proportion of 1.7% is observed due to prediction errors of the traffic uncertainties, which is still notably reduced compared with the other methods. Note that extreme congestion scenarios, where idling is inevitable for the host vehicle, are not discussed in this study, because eco-driving approaches based on SPaT information have limited contribution to energy saving in such extreme scenarios [26].
Table 5. Average simulation results under different traffic scenarios

Method | Average battery energy consumption (kWh) | Average energy consumption improvement (%) | Average travel time (s) | Time variation (%) | Average idling time proportion (%) | Average low-speed time proportion (%)
IDM | 0.68 | - | 381 | - | 13.35 | 13.02
G-IDM | 0.61 | 11.76 | 370 | -2.89 | 8.72 | 7.84
G-DRL | 0.60 | 13.24 | 369 | -3.15 | 5.25 | 5.69
MG-MPC | 0.58 | 14.71 | 383.2 | 0.58 | 2.23 | 2.89
MG-DRL | 0.56 | 17.65 | 382 | 0.26 | 1.70 | 2.63
Fig. 9. Average energy consumption improvement with different traffic flows.
To analyze the performance under various traffic demands, the simulation results are segmented by traffic flow. The outcomes are presented in Fig. 9, which illustrates the average energy consumption improvement across different traffic flows. The highest average energy efficiency improvement is observed at around 700 veh/h, as traffic lights impose a greater impact on driving speed in these circumstances. Moreover, it is worth noting that the energy consumption improvement does not grow monotonically with increasing traffic flow. Specifically, two tests with different numbers of preceding vehicles are selected to further analyze the influence of different traffic conditions, defined as Scenario A and Scenario B. The numbers of preceding vehicles in these two scenarios are set to 3 and 9, respectively. These two scenarios represent lighter and heavier traffic congestion levels, and the other traffic parameters are set the same as in the simulation in Section IV-B. Fig. 10 illustrates the position profiles of these two scenarios. The results demonstrate that the proposed MG-DRL method retains superior adaptability in different traffic scenarios, and the host vehicle under the MG-DRL method can accurately pass each intersection without idling. As shown in Fig. 10 (a), all four eco-driving approaches have similar trends in their position profiles in the light traffic scenario. The reason is that the waiting queue does not have a significant impact on the motion of the host vehicle in this scenario. Besides, the slow traffic flow matters at the first and second intersections, where the host vehicle plans an ecological speed to eliminate deceleration and extra energy consumption when approaching the preceding vehicles. By contrast, as shown in Fig. 10 (b), the MG-DRL method plans a preferable velocity profile to avoid unnecessary idling when a potential traffic intervention is predicted, especially at the first two intersections. The velocity profiles in the different scenarios are shown in Fig. 11 (a) and (b), and it can be observed that the MG-DRL method can effectively plan the driving speed within a reasonable range. Additionally, the proposed method significantly restrains speed fluctuations, unnecessary decelerations and idling compared with the other approaches. The results verify that the host vehicle can satisfactorily tackle the traffic uncertainties in different traffic scenarios.
(a) (b)
Fig. 10. Position profiles in different traffic scenarios. (a) Scenario A, (b) Scenario B.
Fig. 11. Velocity profiles in different scenarios. (a) Scenario A, (b) Scenario B.
The results of these two scenarios are summarized in Table 6 for performance comparison. It can be observed that the MG-DRL method effectively promotes energy efficiency in different scenarios, with a maximum energy economy improvement of 30.64% over the IDM method in Scenario B. The reason is that waiting queues and slow traffic flow tend to emerge at signalized intersections in this scenario, and the traffic uncertainty has a greater impact on velocity planning under high-level congestion. By contrast, all four eco-driving approaches contribute similarly to energy consumption in Scenario A, which represents a light traffic condition. Compared with the other eco-driving schemes, the proposed MG-DRL method provides a modest average energy efficiency improvement of 3.75% in free-flowing traffic conditions. To conclude, the proposed method shows strong adaptability to different traffic scenarios, and energy efficiency is effectively promoted in all scenarios, especially high-level congestion scenarios.
Table 6. Simulation results under different traffic scenarios

Scenario A
| Method | Energy consumption (kWh) | Improvement (%) | Travel time (s) | Idling time proportion (%) |
|--------|--------------------------|-----------------|-----------------|----------------------------|
| IDM    | 0.67 | -     | 352.9 | 24.6 |
| G-IDM  | 0.54 | 19.4  | 353.6 | 0.93 |
| G-DRL  | 0.52 | 22.39 | 352.7 | 1.70 |
| MG-MPC | 0.53 | 20.9  | 359   | 0    |
| MG-DRL | 0.51 | 23.88 | 355.8 | 0    |

Scenario B
| Method | Energy consumption (kWh) | Improvement (%) | Travel time (s) | Idling time proportion (%) |
|--------|--------------------------|-----------------|-----------------|----------------------------|
| IDM    | 0.62 | -     | 464.6 | 19.84 |
| G-IDM  | 0.56 | 9.68  | 464.5 | 18.20 |
| G-DRL  | 0.54 | 12.9  | 464.7 | 16.40 |
| MG-MPC | 0.46 | 25.81 | 469   | 0     |
| MG-DRL | 0.43 | 30.64 | 468.1 | 0     |
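As a side note on how entries of this kind are obtained, the sketch below shows how the Table 6 metrics can be computed from a simulated velocity trace; the idling-speed threshold, the function name, and the interface are assumptions made for illustration.

```python
# Illustrative sketch of computing the Table 6 metrics from a simulated
# velocity trace; the idling threshold and interface are assumed.
import numpy as np

def table6_metrics(t_s, v_mps, energy_kwh, baseline_kwh, v_idle_mps=0.1):
    """Return improvement (%), travel time (s) and idling proportion (%)."""
    t_s, v_mps = np.asarray(t_s, float), np.asarray(v_mps, float)
    dt = np.diff(t_s, prepend=t_s[0])        # per-sample durations (s)
    travel_time = t_s[-1] - t_s[0]
    idling_pct = dt[v_mps < v_idle_mps].sum() / travel_time * 100.0
    improvement_pct = (baseline_kwh - energy_kwh) / baseline_kwh * 100.0
    return {"improvement_pct": improvement_pct,
            "travel_time_s": travel_time,
            "idling_pct": idling_pct}

# Cross-check with Scenario B: (0.62 - 0.43) / 0.62 * 100 ≈ 30.6 %,
# consistent with the MG-DRL entry in Table 6.
```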
V. CONCLUSIONS
This article proposes a DRL-based eco-driving approach to tackle traffic uncertainty at multiple signalized intersections with limited access to traffic data. The proposed method ecologically plans velocity profiles at signalized intersections, accounting for the potential impacts of slow traffic flow and waiting queues, so as to avoid unnecessary idling and decelerations. A hierarchical framework is developed, in which the upper level predicts the influence of traffic uncertainty based on a dynamic traffic model, and a modified traffic light model is constructed to revise the actual SPaT information. In the lower level, a DRL-based controller is designed to plan the ecological driving speed according to the system's states, including the motion of the host vehicle and the revised SPaT information. Substantial simulations are conducted to validate the effectiveness of the proposed method, and the results demonstrate that it significantly improves energy efficiency compared with both the human driver model and other existing eco-driving approaches, with average improvements of 17.65% and 5.77%, respectively. In addition, the proposed method demonstrates preferable energy economy in different traffic scenarios, and the energy efficiency improvement tends to grow as the traffic congestion level increases.
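To make the interplay of the two levels concrete, the following schematic sketch outlines one decision step of such a hierarchy; the state layout, the queue-discharge parameters, and the policy interface are placeholders rather than the paper's actual implementation.

```python
# Schematic sketch of one decision step of such a two-level hierarchy.
# State layout, queue parameters and policy interface are placeholders.
from dataclasses import dataclass

@dataclass
class HostState:
    position_m: float
    velocity_mps: float

def eco_driving_step(host, red_end_s, queue_length, drl_policy, dt_s=1.0,
                     start_up_s=2.0, headway_s=2.0):
    # Upper level: revise the actual SPaT with the estimated queue
    # discharge time (see the earlier discharge-time sketch).
    discharge_s = start_up_s + headway_s * queue_length if queue_length else 0.0
    modified_red_end_s = red_end_s + discharge_s

    # Lower level: a trained DRL policy maps the system state (host motion,
    # revised SPaT, queue info, ...) to an acceleration command in m/s^2.
    state = (host.position_m, host.velocity_mps, modified_red_end_s, queue_length)
    accel = drl_policy(state)

    # Integrate the host motion over one control interval.
    host.velocity_mps = max(0.0, host.velocity_mps + accel * dt_s)
    host.position_m += host.velocity_mps * dt_s
    return host
```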
Our future work will focus on investigating eco-driving approaches in more complex environments by considering cooperation among multiple connected vehicles. The influence of different DRL algorithms on eco-driving control will also be investigated in our future studies.
ACKNOWLEDGMENTS
This work is funded in part by the National Natural Science Foundation of China (Nos. 52172400 and 52272395) and in part by the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJQN201901539). Any opinions expressed in this paper are solely those of the authors and do not represent those of the sponsors.