A DEEP REINFORCEMENT LEARNING-BASED RAMP METERING CONTROL
FRAMEWORK FOR IMPROVING TRAFFIC OPERATION AT FREEWAY WEAVING
SECTIONS
Mofeng Yang
Jiangsu Key Laboratory of Urban ITS, Southeast University
Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies
Dong Nan Da Xue Rd. #2, Nanjing, China, 211189
Email: yangmofeng@seu.edu.cn
Zhibin Li, Ph.D., Corresponding Author
Jiangsu Key Laboratory of Urban ITS, Southeast University
Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies
Dong Nan Da Xue Rd. #2, Nanjing, China, 211189
Email: lizhibin@seu.edu.cn
Zemian Ke
Jiangsu Key Laboratory of Urban ITS, Southeast University
Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies
Dong Nan Da Xue Rd. #2, Nanjing, China, 211189
Email: kezemian@seu.edu.cn
Meng Li
Jiangsu Key Laboratory of Urban ITS, Southeast University
Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies
Dong Nan Da Xue Rd. #2, Nanjing, China, 211189
Email: seulimeng@163.com
Word count: 6,581 words text + 3 tables × 250 words (each) = 7,331 words
Submission Date: August 1, 2018
Yang, Li, Ke and Li 2
ABSTRACT
Ramp metering (RM) dynamically adjusts the ramp flow merging into the freeway mainline according to
real-time traffic conditions to improve traffic operation. The effectiveness of RM is mainly
determined by its control strategies, which decide how to calculate the metering flow for various traffic states.
Traditional RM control strategies are limited by their response speed to traffic changes and
their online computation workload. They also require substantial human knowledge about
the traffic flow problems in the study segments. In this study, we propose a deep reinforcement
learning-based RM control framework, named DQN-based RM. This new
control framework incorporates the deep Q network (DQN) algorithm and the RM in order to
reduce total travel time on freeways. A typical freeway weaving bottleneck section was simulated
based on the Simulation of Urban Mobility (SUMO) platform. The results show that the proposed
DQN-based RM strategy is able to respond proactively to different traffic states and take
immediate and correct actions to prevent traffic breakdown, without full prior knowledge of traffic
flow theories. The DQN-based RM could reach the optimal control target within a short training
time, and the total travel time was reduced by 51.48% and 50.58% with 15 s and 30 s as the control
cycle. We also compare the performances of various RM strategies. The results show that the
DQN-based RM outperforms the traditional fixed-time and the feedback-based RM control
strategies in mitigating congestions and reducing travel time on freeways.
Keywords: Ramp metering, Deep reinforcement learning, Congestion, Freeway weaving sections,
Bottleneck
INTRODUCTION
The continuous increase of travel demand and flow on freeways causes frequent and severe
congestion, which greatly increases total travel time, delays, crash risks and fuel
consumption (1-4). Freeway weaving sections are typical bottlenecks, usually formed by
a merge area closely followed by a diverge area (5, 6). Within the weaving sections, vehicles
that enter and exit the freeway perform intense lane changes to access the target lanes, which are
the main cause of traffic congestion and traffic breakdown (7). The traffic operation of freeway
weaving bottlenecks should be improved by properly controlling the traffic flow (8).
Ramp metering is the most commonly used active traffic management (ATM) strategy that
controls the number of ramp vehicles merging into the mainline by installing traffic lights and a
set of loop detectors on mainlines and ramps (9-11). The main objective is to set an appropriate
ramp flow release rate in order to minimize the negative impact of ramp flow disturbance on
mainline traffic to prevent the occurrences of traffic breakdown and capacity drop. The general
framework in previous studies is shown in Figure 1 (a). First, a study site containing
recurrent bottlenecks is selected; second, traffic data at loop detector stations are collected. Then, traffic
flow features as well as key congestion attributes (e.g., occupancy thresholds for predicting
capacity drop) are analyzed based on traffic flow theories. Finally, the key parameters in the RM
control strategies are decided.
Early RM studies and practice tend to use the fixed-time control strategies, which decide
the metering rate by setting the control cycle, green light phase and red light phase (10). Such
strategies are easy to implement, but the major drawback is that the fixed parameters fail in
responding to the changing traffic states, which greatly reduce the strategy performances. Later,
the feedback-based RM algorithms were proposed, which were traffic responsive in nature. The
most famous feedback strategy, named the Asservissement Linéaire d’Entrée Autoroutière
(ALINEA) (12), and its variants (13-16), aim at adjusting the bottleneck occupancy at the expected
value to maintain the maximum flow and prevent traffic breakdown. Such approaches have already
been implemented in many practical uses (17, 18). However, the feedback-based RM adjusts the
metering rate passively according to the traffic conditions. The performance is greatly limited by
the feedback nature especially in the fast-changing traffic environments. In addition, the set of
control parameters in the feedback controller rely on human prior knowledge about the capacity
drop and the breakdown probabilities.
Some research used the online optimization methods, such as the model predictive control,
to calculate the metering rate by solving on-line optimal control problems (19-22). Though such
approaches can theoretically obtain the mathematical optimal solutions of the RM control problem,
they require accurate models to predict the traffic dynamics and contain large online computing
workload, which is considered infeasible for large-scale applications. Some researchers proposed
that the traffic flow theory-based RM control strategy improved traffic operation by forming a
free-flow pocket in bottleneck traffic flow (1, 23-25). The strategy was applied and tested on the
I-805 and I-5 freeway section in California and it was found to outperform the feedback-based
strategy in their case study (1). However, such approaches require deep understanding about the
traffic flow characteristics and congestion mechanisms at the target bottlenecks. The obtained
strategies may be site-specific and may not work well when applied on other freeway bottlenecks.
FIGURE 1 (a) General framework for RM studies; (b) DQN-based RM control framework.
In this study, a deep reinforcement learning (RL)-based RM control framework was
proposed for reducing total travel time at freeway bottleneck areas, as shown in Figure 1 (b). The
framework directly bridges up the correlation between the traffic flow data and the RM control
strategies. The RL enables the agent to obtain optimal strategy in an unfamiliar environment
without prior knowledge on traffic flow. In other words, the RL-based RM control strategy does
not require complex analyses based on traffic flow theories and thus reduces the need for
human knowledge. Considering that the complexity and mechanisms of some traffic issues, such as
capacity drop and traffic breakdown, are still not fully explored and understood by researchers,
previous RM strategies have clear cognitive limitations, and our framework has the potential to
lead to improved performance.
Previous researchers have incorporated the Q-learning (QL) (26), which is a basic RL
algorithm (27-29), into the RM control tasks (8, 30-35). Though the training process is extensive,
the QL-based RM can obtain the best metering actions for various traffic states. The effects of the
QL-based RM control strategies are widely reported in the previous studies. However, the QL uses
discretized traffic state sets with large intervals which may not accurately reflect the traffic
conditions. In addition, the QL’s performance is greatly limited by the storage ability of the Q-
table which is used to determine the optimal control actions in the iteration process.
Deep Q learning (DQN) is a combined use of the QL and the deep neural network (DNN)
(36, 37). The DeepMind team has put forward a number of successful applications built on the DQN
and its successors, such as Atari game agents and the AlphaGo and AlphaZero Go programs (36-39). Compared to the
traditional QL, the DQN can deal with more complex tasks with large and continuous state space
by adding the DNN in the process of the agent’s learning structure. Some recent studies have
applied the DQN to solve intersection signal control problems (40-43). They reported that the
DQN finds appropriate and stable signal timing policies, resulting in reductions in total travel time
and collision risks.
A recent study proposed a multi-task deep RL-based RM that achieved a control
performance on par with ALINEA (44). However, that study focused on large-scale freeway
sections with multiple on-ramps and did not pay particular attention to the specific freeway
bottlenecks where the occurrence of capacity drop is the main reason for the low efficiency of
traffic flow.
The literature review shows that none of the previous studies has incorporated the DQN
algorithm with the RM control to reduce total travel time and to improve traffic operation at
freeway bottleneck areas. This study aims to fill the gap. A typical bottleneck is simulated in the
Simulation of Urban Mobility (SUMO). A training procedure for the DQN agent is proposed to
obtain the optimal actions. The effects of the DQN-based RM strategy are evaluated and compared
with some traditional RM control strategies.
METHODOLOGY
The RL approach was initially inspired by behaviorist psychology, which considers how an agent
ought to take actions in an environment to maximize the cumulative reward (27-29). An RL agent
interacts with its environment in discrete time steps, which is typically formulated as a Markov
decision process (MDP). The RL requires the determination of a metering flow control action for
the current traffic state at each decision interval. After the agent has taken a control action, the
current traffic state changes into a new state. The state transition can be evaluated by the reward
function. Therefore, the RM control problem can be formulated as a typical MDP problem and can
be resolved by the RL technique.
Deep Q Network
In a QL, the Q-value is assigned to each state-action pair to evaluate the quality of the action. The
set of Q-values can be represented as:
Q: S × A → R    (1)
where S is the set of possible states, A is the set of possible actions, and R is the set of
rewards. In an infinite horizon discounted reward problem, the agent’s goal is to maximize
Σ_{t=0}^{∞} γ^t · R_t    (2)
where R_t is the reward at time step t, and γ is the discount factor (0 ≤ γ ≤ 1) that defines the relative
importance of current rewards versus those earned later. For a non-deterministic
environment, the Q-value is updated with every new training sample according to
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + κ(s_t, a_t) · [R_{t+1} + γ · max_a Q_t(s_{t+1}, a) − Q_t(s_t, a_t)]    (3)
where Q_{t+1}(s_t, a_t) is the Q-value for the state-action pair (s_t, a_t) at time step t+1, R_{t+1} is the
reward received after performing action a_t at state s_t and moving to the new state s_{t+1}, and κ(s_t,
a_t) is the learning rate which controls how fast the Q-values are altered.
The Q-table is updated in the training process. The Q-values will converge if each state-
action pair is executed several times, and the optimal action for each state is determined by the
action with the largest Q-value. Then, the QL agent can be used for the optimal control according
to the knowledge it obtained in the training process.
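The tabular update of Eq. (3) can be sketched in a few lines of Python; the state names, reward value, and two-action set (0 = green, 1 = red) below are illustrative, not taken from the paper:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, kappa=0.1, gamma=0.99):
    """One tabular Q-learning update following Eq. (3):
    Q(s,a) <- Q(s,a) + kappa * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += kappa * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical two-action metering example (0 = green, 1 = red):
Q = defaultdict(float)                       # unvisited pairs default to 0
actions = (0, 1)
new_q = q_update(Q, s="state_A", a=0, r=5.0, s_next="state_B", actions=actions)
```

Starting from an all-zero table, one update with reward 5.0 moves Q(state_A, green) to kappa × 5.0 = 0.5, illustrating how Q-values accumulate toward the discounted return.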
The Q-table based approach is effective and efficient if the state-action set is not large.
However, in practical applications, the state-action space is usually very large, resulting in the size
of the lookup table growing exponentially. The algorithm is hard to converge, because all state-
action pairs need to be visited multiple times according to the inherent logic of QL. In addition,
limited by the size of the lookup table, the number of variables for traffic state representation is
restricted, which makes the QL incapable of handling more complicated traffic control tasks. The
drawbacks associated with the QL greatly limit its performance and applications.
The DQN incorporates a DNN in the QL and relaxes the limitation of the state size (36,
37). In a DQN, the DNN takes the state as input and maps it to Q-values,
allowing for a larger or continuous state space. There are two common ways to stabilize the
DQN, the target network freezing and the prioritized experience replay (45, 46). The target
network freezing splits the Q-value estimation into two different networks, i.e., a value network to
estimate the Q-value of the current state and a target network to compute the targets. By selecting
an appropriate freezing interval, the targets can be partially stabilized. Experience replay is a way
to sample a batch of past experiences during the training process. In the DQN with prioritized
experience replay (46), the Temporal-Difference (TD) error is used to preferentially sample experiences
with larger expected learning progress.
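The target-network idea can be illustrated with a minimal numpy sketch; the linear Q-function and the weight shapes here are assumptions purely for illustration, not the paper's network:

```python
import numpy as np

def td_targets(rewards, next_states, w_target, gamma=0.99):
    """TD targets r + gamma * max_a Q_target(s', a), computed with the
    frozen target network; here Q(s) = s @ W is a toy linear Q-function."""
    q_next = next_states @ w_target            # shape: (batch, n_actions)
    return rewards + gamma * q_next.max(axis=1)

# The online (value) network is trained every step; the target network is a
# frozen copy that is re-synced only every `freeze_interval` steps:
rng = np.random.default_rng(0)
w_value = rng.normal(size=(4, 2))
w_target = w_value.copy()   # later, every freeze_interval steps: w_target = w_value.copy()
```

Keeping w_target fixed between syncs is what partially stabilizes the regression targets, as described above.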
The key idea is that the transition with higher expected learning progress measured by
absolute TD-error is more likely to be replayed. The priority of transition t is calculated as
p_t = |δ_t| + π    (4)
where δ_t is the TD-error, and π is a small positive constant that prevents transitions from never
being sampled again once their TD-error is zero. The TD-error is calculated as

δ_t = R_t + γ · max_a Q(s_t, a) − Q(s_{t−1}, a_{t−1})    (5)
Then the probability of sampling transition t is determined as
P(t) = p_t^α / Σ_k p_k^α    (6)
where p_t > 0, the exponent α indicates the level of prioritization, and α = 0 corresponds to
uniform sampling.
Importance-Sampling (IS) weights are used to correct bias introduced by prioritized replay
(47). Hence, the parameter of the value network can be updated as
θ ← θ + η · Σ_{t=1}^{k} ω_t · δ_t · ∇_θ Q(s_t, a_t)    (7)
where k is the mini-batch size when sampling, and ω_t is the IS weight of sampled transition t,
calculated as

ω_t = (1/N · 1/P(t))^β    (8)
where N is the memory size, and β = 1 indicates that the non-uniform sampling probabilities are
fully compensated.
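The prioritization and importance-sampling formulas (Eqs. 4, 6 and 8) can be sketched as follows; the small constant eps stands in for the π of Eq. (4), and normalizing the weights by their maximum is a common stabilization choice not stated in the paper:

```python
import numpy as np

def sample_probs(td_errors, alpha=0.6, eps=1e-3):
    """Priorities p_t = |delta_t| + eps (cf. Eq. 4) and sampling
    probabilities P(t) = p_t^alpha / sum_k p_k^alpha (Eq. 6)."""
    p = np.abs(td_errors) + eps
    pa = p ** alpha
    return pa / pa.sum()

def is_weights(probs, n, beta=1.0):
    """Importance-sampling weights w_t = (1 / (N * P(t)))^beta (Eq. 8),
    normalized by their maximum for stability."""
    w = (1.0 / (n * probs)) ** beta
    return w / w.max()
```

Setting alpha=0 recovers uniform sampling, and beta=1 fully compensates the sampling bias, matching the limiting cases described above.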
In our study, the DQN with the prioritized experience replay and target network freezing
was used to support the proposed RM control strategy.
DQN-Based RM Control Strategy
A DQN-based RM control strategy was proposed in this section and the flowchart is shown in
Figure 2. The DQN-based RM consists of two parts, the DQN Agent and the simulation network.
These two parts exchange traffic states, rewards and actions. The DQN agent iterates toward
the optimal control strategy by continuously interacting with the simulation network. Details will
be discussed in the next section.
The DQN-based RM first perceives a particular traffic state and selects a traffic light color
at each decision interval. The traffic light leads the state transition. Then the feedback reward and
the transition of this state-action pair are stored in the replay memory of the DQN agent. The
DQN agent will evaluate the reward, learn and update the policy with a batch of sample memories
with a given learning value and learning rate. The crucial elements in the DQN-based RM are
designed as follows:
(1) State. The selected states should be able to help the agent perceive traffic situations in
the freeway segment. In previous QL-based RM research, states were discretized into several
intervals. However, discrete states are not able to accurately describe the dynamics of freeway
traffic. In our study, three traffic flow variables and one traffic light status were used to represent
the traffic state at a freeway weaving section. They were: the density at the immediate upstream
of the weaving area, the density at the weaving area, the density on the on-ramp, and the color of
the traffic light. The traffic flow variable values are kept to one decimal place and the light color is
represented by 0 (green) and 1 (red).
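As a minimal sketch, the four-element state described above might be encoded as follows; the function name and input units are illustrative:

```python
def encode_state(k_up, k_weave, k_ramp, light_is_red):
    """Build the four-element state vector: the three densities (upstream,
    weaving area, on-ramp), each kept to one decimal place, plus the
    traffic light color encoded as 0 (green) or 1 (red)."""
    return (round(k_up, 1), round(k_weave, 1), round(k_ramp, 1),
            1 if light_is_red else 0)

state = encode_state(34.27, 51.08, 12.5, light_is_red=False)
```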
(2) Action. In this study, the action set is set to be the traffic light color which includes two
choices, i.e. turning green or turning red. The control period should be set to ensure that the effect
(or reward) after the action taken can be perceived by the agent after the interval. In our study, we
tested two control periods for the DQN-based RM which were 15 s and 30 s. In other words, the
DQN-based RM was able to turn the traffic light green or red 15 s or 30 s after the previous
action.
(3) Reward. The objective of the DQN-based RM was to reduce the system total travel
time (TTT). The TTT over a time horizon K can be calculated by:
TTT = η · Σ_{k=1}^{K} N(k)    (9)
where N(k) is the total number of vehicles in the network at time k, and η is the time interval.
In the simulation, the number of vehicles entering the freeway system at each time step
from both upstream mainline and on-ramp was recorded. The number of vehicles leaving the
freeway system at downstream mainline and off-ramp was also recorded. Any control measures
that managed to increase the early exit flows of the freeway section would lead to a decrease in
the total travel time (1-3, 10, 24, 48). Thus, the reward function could be determined by the
discharge flow at the bottleneck, which is defined as:
R(s) = q(t)    (10)
where R(s) was the reward for state s, and q(t) was the bottleneck discharge flow within
interval t, which can be obtained by simply counting the number of vehicles that passed through
the bottleneck.
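Eqs. (9) and (10) amount to simple counting, which can be sketched as:

```python
def total_travel_time(vehicle_counts, eta=1.0):
    """Eq. (9): TTT = eta * sum_k N(k), where N(k) is the number of
    vehicles in the network at step k and eta is the step length."""
    return eta * sum(vehicle_counts)

def reward(passed_vehicle_ids):
    """Eq. (10): the reward is the bottleneck discharge flow, obtained by
    counting distinct vehicles that crossed the bottleneck in the interval."""
    return len(set(passed_vehicle_ids))
```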
(4) Learning Parameters. The DQN has many learning parameters and the selection of
their values greatly affects the performance. The learning rate decides the learning speed of the
agent at each time step. If the learning rate is too large, the gradient descent might be too fast and
the DQN may not find the optimal policy; if the learning rate is too small, the computation cost
and training time become excessive and the algorithm may be hard to converge. In previous studies,
the learning rate was typically set to be a small constant value to ensure that the optimal policy can
be found.
FIGURE 2 Flowchart of the DQN-based RM control strategy.
Another important consideration is to make a balance between exploitation and exploration
when selecting actions. The DQN should fully learn the information that has been presented in the
Q-values. Using pure exploitation may greatly save the learning time, but may also prohibit the
discovery of better actions and lead to local optimization. On the other hand, pure exploration
enhances the capability of discovering new and better actions. However, it may result in a random
action selection without making use of the existing learning results and, accordingly, is quite time
consuming. In our study, the exploitation and exploration were balanced by gradually decreasing
the exploration rate from 0.9 to 0.1 during the learning process. The DQN starts with a large
exploration rate. As the agent's knowledge matures, the exploration rate decreases accordingly.
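A linear annealing schedule is one way to realize the 0.9-to-0.1 decay described above; the paper states only the endpoints, so the linear form is an assumption:

```python
def exploration_rate(epoch, n_epochs, eps_start=0.9, eps_end=0.1):
    """Linearly anneal the exploration rate from eps_start to eps_end over
    the training epochs (clamped so late epochs stay at eps_end)."""
    frac = min(epoch / max(n_epochs - 1, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)
```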
Traditional RM Control Strategies
(1) Fixed-time Control Strategies. The traffic light changes its color according to the pre-designed
cycle length and green/red phases. For the fixed-time RM control strategy, the ramp flow can be
calculated by:
r = λ_r · 1800 · G / c    (11)
where λr is the number of lanes on ramp, G is the green time in the signal cycle, and c is
the signal cycle. This strategy is able to maintain a stable metering rate for simple scenarios.
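Eq. (11) can be evaluated directly; the 1800 veh/h factor corresponds to releasing one vehicle per lane every 2 s of green:

```python
def fixed_time_rate(n_ramp_lanes, green_s, cycle_s):
    """Eq. (11): metered ramp flow r = lambda_r * 1800 * G / c (veh/h)."""
    return n_ramp_lanes * 1800.0 * green_s / cycle_s
```

For a one-lane ramp with a 2 s green phase in a 5 s cycle (one of the fixed-time settings tested later), this gives 720 veh/h.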
(2) Feedback-based Control Strategies. The most representative feedback-based RM strategy,
i.e. ALINEA, is considered in this study for comparison.
r(k) = r(k−1) + K_R · [ô − o_out(k)]    (12)
where r(k) is the metering rate at decision interval k, r(k-1) is the metering rate at decision
interval k−1, K_R > 0 is a regulator parameter, ô is the desired occupancy at the bottleneck, which is
usually set as the critical occupancy, and o_out(k) is the real-time measurement of occupancy at
decision interval k. The change in the metering rate is proportional to the difference between the
expected and measured downstream occupancy.
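A single ALINEA update step (Eq. 12) can be sketched as follows; the clipping bounds are our assumption, since practical deployments bound the metering rate, and occupancies are expressed in percent:

```python
def alinea_rate(r_prev, occ_measured, occ_desired=22.0, k_r=70.0,
                r_min=200.0, r_max=1800.0):
    """Eq. (12): r(k) = r(k-1) + K_R * (o_hat - o_out(k)), with the result
    clipped to practical bounds (bounds are illustrative, not from the paper)."""
    r = r_prev + k_r * (occ_desired - occ_measured)
    return max(r_min, min(r_max, r))
```

When the measured occupancy exceeds the desired value, the rate is reduced; when it falls below, the rate is increased, which is the feedback behavior described above.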
DEVELOPMENT OF SIMULATION PLATFORM
SUMO Simulation Platform
An open-source traffic simulation software, the Simulation of Urban MObility (SUMO), is used
as the simulation platform. Compared to macroscopic simulations such as Cell Transmission
Model, SUMO can capture fine vehicle movements and generate traffic flow data with higher
granularity. Compared to other microscopic simulators, SUMO provides numerous car-following
and lane-changing models that can meet distinct needs for different road segments and study
purposes. More importantly, SUMO gives users greater freedom for further and deeper
developments. It also enables communications and interactions with other software, such as
importing python packages.
SUMO includes two main modules: Netedit module and Sumoconfig module. The Netedit
module enables users to define the network information, the traffic dynamic information, and flow
demand. Sumoconfig module enables users to define simulation information, such as simulation
period, waiting time, etc. SUMO also provides an Application Programing Interface (API), called
TraCI. Users are allowed to retrieve the real-time detector data, or change the state of the network
elements (detectors, traffic control etc.) by calling the TraCI.
In our study, an interactive simulation-control system was established, as shown in Figure
3. The network information was edited and stored within the Netedit module and was incorporated
into the Sumoconfig module. The DQN agent was defined in the Python Integrated Development
Environment (IDE). At the beginning of the learning process, Python IDE calls TraCI API to start
the simulation. The real-time simulation data can be obtained at each decision interval. Data was
then sent back to the Python IDE to feed the DQN agent and generate the real-time RM control strategy.
The state of the traffic control changes according to the given RM strategy, which is also delivered by
the Python IDE. The above steps form a loop. By following this loop, the freeway network can
be continuously simulated to help the DQN agent iterate toward the optimal control strategy.
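The interaction loop can be sketched with a stub standing in for the SUMO side; in the real system the observe/apply calls would go through the TraCI API, and all numbers below are synthetic:

```python
import random

class StubNetwork:
    """Stand-in for the simulation network (real deployments would query
    detectors and set the ramp signal through TraCI)."""
    def observe(self):
        # three illustrative densities: upstream, weaving area, on-ramp
        return tuple(round(random.uniform(10, 60), 1) for _ in range(3))

    def apply(self, action):             # 0 = green, 1 = red
        # synthetic reward: vehicles discharged during the interval
        return random.randint(200, 300)

def run_episode(agent_policy, net, n_intervals=8):
    """One episode of the observe -> act -> reward loop described above."""
    history = []
    for _ in range(n_intervals):
        state = net.observe()
        action = agent_policy(state)
        rwd = net.apply(action)
        history.append((state, action, rwd))
    return history

random.seed(42)
# hypothetical threshold policy: meter (red) when the weaving density is high
log = run_episode(lambda s: 0 if s[1] < 35 else 1, StubNetwork())
```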
FIGURE 3 SUMO simulation platform structure and interaction.
Experiment Design
A freeway weaving section is developed in the simulation model (see Figure 2). The upstream and
downstream sections of the mainline comprise 3 lanes, and the weaving area is 250 meters long
and contains 4 lanes. The speed limit of all lanes is 33 m/s (approximately 75 mph). To simulate real-
world traffic flow features near the bottleneck area, the traffic demand and the parameters in the
SUMO were carefully calibrated based on the empirical loop detector data obtained from the
Freeway Performance Measurement System (PeMS) (50).
The duration of the simulation was set to 4 hours with a 30-min warm-up period. Traffic
demand on the mainline and on-ramp started at 2200 veh/h and 400 veh/h during the warm-up period.
The peak period lasted for two hours with 3000 veh/h on the mainline and 1000 veh/h on the on-
ramp. Then the demand dropped back to 2200 veh/h and 400 veh/h and lasted for another hour.
The capacity of the freeway mainline before capacity drop was found to be 3300 veh/h. The
magnitude of capacity drop was found to be 15.2%.
TRAINING OF DQN-BASED RM CONTROL STRATEGY
Experience replay and the freeze interval are the features that most distinguish the DQN from
the traditional QL algorithms. According to previous studies (46, 48), the replay memory size
should correspond to the specific scenarios of experiments and an appropriate freeze interval can
mitigate the correlation between the experiences and stabilize the learning process. In our study,
the parameters are carefully determined according to preliminary tests and suggestions from
previous studies (36-37, 40-43, 49), as shown in Table 1.
The simulation was activated by calling the TraCI API. The maximum epoch for the
training process is pre-defined. The DQN-based RM starts to perceive the traffic state at the
beginning of every decision interval, chooses an action (green or red light) and generates a
corresponding RM control strategy. The transition of this state-action pair is stored in the replay
memory of the DQN agent. At the end of the decision interval, the DQN agent would
perceive new state and a corresponding reward to the previous action. The DQN agent learns and
updates the policy with a batch of sample memories with the given learning value and learning
rate. During the simulation, the total travel time was computed at each step and the bottleneck
discharge flow was computed every 5 min.
TABLE 1 Learning Parameters for DQN-based RM Control Strategy
Parameter               Value
Optimizer               RMSProp
Replay memory size      10,000
Experience sampling     0.6
Learning rate           0.00025
Batch size              32
Exploration rate        From 0.9 to 0.1
Discount factor         0.99
Freeze interval         2,000
State matrix size       4
State matrix frame      1
Action size             2
SIMULATION RESULTS
Results of DQN-based RM Control Strategy
For the DQN-based RM with the 15 s control cycle, the simulation with the smallest total travel
time was reached at the 160th training epoch; for the DQN-based RM with the 30 s control cycle, the
simulation with the smallest total travel time was reached at the 204th training epoch. The time for
the agent to reach the optimal epoch varies, since the decision interval affects the learning time in
each epoch. The DQN-based RM was capable of quickly finding an optimal solution when facing the
control scenarios. We also considered the situation in which none of the control strategies was
used.
The speed, occupancy, total travel time and bottleneck discharge flow were collected and
calculated. There was no difference between the various strategies in the initial 30 min off-peak
period after the warm-up period. Thus, Figure 4 (a) and (b) illustrate the speed and occupancy
profiles, respectively, starting from the peak period at the bottleneck under different control strategies.
In the no control scenario, congestion occurred and lasted until the end of the simulation. With the
DQN-based RM strategies, the speed and occupancy profiles presented similar trends, such as
maintaining higher speeds and lower occupancy during peak hours. Table 2 lists the mean
speed and occupancy. Compared to the no control scenario, both DQN-based RM strategies
obviously increased the mean speed from 5.87 m/s to 12.95 and 13.82 m/s, and reduced the mean
occupancy from 35.60 to 20.12 and 18.51, respectively. The total travel time was reduced from
3306.26 veh·h to 1604.09 veh·h and 1633.94 veh·h with the DQN-based RM strategies, as
compared to the no control case, indicating a 51.48% and 50.58% reduction respectively. The
bottleneck discharge flow increased from 3185 veh/h to 3483 veh/h and 3462 veh/h with the two
DQN-based RM strategies, indicating a 9.39% and 8.69% increment in the bottleneck discharge
flow. The results suggest that the proposed DQN-based RM control strategy significantly reduced
the total travel time and increased the bottleneck discharge flow at freeway weaving bottlenecks.
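As a sanity check, the reported travel-time reductions follow directly from the values in Table 3:

```python
def pct_reduction(base, new):
    """Percent reduction relative to the no-control baseline."""
    return (base - new) / base * 100.0

# Total travel time: no control = 3306.26 veh·h (Table 3)
tt_cut_15s = pct_reduction(3306.26, 1604.09)   # ~51.48%, as reported
tt_cut_30s = pct_reduction(3306.26, 1633.94)   # ~50.58%, as reported
```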
FIGURE 4 (a) Occupancy profiles (%); (b) Speed profiles (m/s).
TABLE 2 Effects of DQN-based RM Control Strategies
Control strategy               Mean speed (m/s)    Mean occupancy (%)
No control                     5.87                35.60
DQN-based RM (cycle=15 s)      12.95               20.12
DQN-based RM (cycle=30 s)      13.82               18.51
The difference between the two DQN-based RM strategies could be attributed to the
difference in control cycles, which affects the responding speed. After each action was
implemented, the ramp vehicles need a period to merge into the mainline and then impact
the mainline traffic flow. Compared to the DQN-based RM with the 30 s control cycle, the DQN-based RM with the
15 s control cycle can make a change of control action every 15 s which ensures a faster responding
speed to the change of traffic state. As shown in Table 2, the mean occupancy was maintained
around 20% which is close to the occupancy that triggers the capacity drop (which is
approximately 23% in our simulation). Note that if the control cycle is too short, for example 5 s,
it can mislead the agent's learning and degrade control performance because the impact of the
control action on traffic flow may not yet have fully materialized.
Comparison between Different Control Strategies
The performances of different RM control strategies were compared. The fixed-time RM control
strategies used 5 s (with 2 s green phase and 3 s red phase) and 7 s (with 2 s green phase and 5 s
red phase) as the control cycle. Two ALINEA strategies were used with the parameters defined
according to the traffic flow features in the bottleneck section (18): desired occupancy = 22% or
23%, control cycle = 80 s and K_R = 70 veh/h. The QL was also trained for the RM control
based on the same state and action sets as the DQN. The experiments show that the QL was not
able to converge, or in other words, to find the optimal control strategy in the training process.
The main reason for the failure is that the QL has limited ability to deal with large state spaces and
complex control tasks.
The total travel time and the bottleneck discharge flow in different scenarios were
compared in Table 3 and Figure 5. The results show that the two fixed-time RM control strategies
performed the worst, reducing total travel time by only 23.33% and 32.75% and increasing
bottleneck discharge flow by only 0.29% and 2.48%, respectively; the two ALINEA strategies
improved the freeway system moderately, with 41.77% and 38.45% reductions in total travel time
and 7.79% and 7.21% increases in bottleneck discharge flow. All RM control strategies could improve the
freeway efficiency to some extent, but two proposed DQN-based RM strategies outperformed the
others.
The causes for the differences between RM control strategies are discussed. The fixed-time
RM control strategy cannot adjust to different traffic conditions. Though the effects can be slightly
improved by setting proper control cycle parameters, such strategies performed the worst among
all strategies. The control parameters in the ALINEA strategies highly rely on human knowledge
about the capacity drop and breakdown probabilities at bottlenecks, as different expected
occupancies lead to different control effects. In addition, the feedback nature makes these strategies
slow in responding to traffic changes and taking proper actions, which is another major cause
of the reduced effects compared to the DQN-based RM control strategies.
TABLE 3 Comparison Result for Different Control Strategies

Control strategy            Total travel time (veh∙h)  Improvement (%)  Discharge flow (veh/h)  Improvement (%)
No control                  3306.26                    /                3185                    /
Fixed-time 5-s              2534.96                    23.33            3194                    0.29
Fixed-time 7-s              2223.41                    32.75            3264                    2.48
ALINEA (occ=22)             1925.37                    41.77            3433                    7.79
ALINEA (occ=23)             2034.90                    38.45            3415                    7.21
DQN-based RM (cycle=15 s)   1604.09                    51.48            3483                    9.39
DQN-based RM (cycle=30 s)   1633.94                    50.58            3462                    8.69
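The travel-time improvement percentages in Table 3 follow directly from the no-control baseline, as the quick check below confirms. (The discharge-flow percentages are not reproduced here; they were presumably computed from unrounded flows, since the rounded table values give slightly different figures.)

```python
baseline = 3306.26  # total travel time with no control (veh.h), from Table 3
travel_times = {
    "Fixed-time 5-s": 2534.96,
    "Fixed-time 7-s": 2223.41,
    "ALINEA (occ=22)": 1925.37,
    "ALINEA (occ=23)": 2034.90,
    "DQN-based RM (cycle=15 s)": 1604.09,
    "DQN-based RM (cycle=30 s)": 1633.94,
}
for name, tt in travel_times.items():
    improvement = (baseline - tt) / baseline * 100.0
    # Matches the travel-time Improvement column of Table 3
    print(f"{name}: {improvement:.2f}%")
```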
FIGURE 5 (a), (c) Comparison of bottleneck discharge flow with different RM control
strategies; (b), (d) comparison of total travel time with different RM control strategies. A:
No control; B: Fixed-time 5-s; C: Fixed-time 7-s; D: ALINEA (occ=22); E: ALINEA
(occ=23); F: DQN-based RM (cycle=15 s); G: DQN-based RM (cycle=30 s)
CONCLUSIONS AND DISCUSSION
This study proposed a DQN-based RM control strategy aimed at reducing total travel time at
freeway weaving bottlenecks. A novel framework that couples an open-source microscopic
simulation software, SUMO, with a deep reinforcement learning agent was proposed to
automatically obtain the optimal control action. A training procedure was designed for the
DQN-based RM in an off-line scheme, which could obtain the optimal control strategy within a
short training period; the trained agent can then be applied to real-time RM control tasks. The
results showed that the well-trained DQN-based RM is capable of predicting traffic state
transitions and acting in a proactive control scheme. In our simulation tests, improved
performance in reducing total travel time and increasing bottleneck discharge flow was observed
compared with traditional RM control strategies. In addition, compared with the QL-based RM
control, the DQN-based RM uses continuous and larger state inputs, which allows it to perceive
the traffic state and take control actions more precisely.
The major contribution of the proposed framework is that the DQN-based RM directly
builds the mapping between traffic parameters at the freeway bottleneck and the RM actions,
removing the need for human prior knowledge of traffic flow theories (such as oscillation
patterns, congestion triggers, and the capacity drop mechanism). Only the state variables,
actions, reward function, and DNN-related parameters need to be given; no other traffic-related
parameters need to be calibrated in the proposed strategy. The proposed framework can be
further extended and transferred to other control scenarios such as coordinated ramp metering,
variable speed limits, and signal timing.
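To illustrate how little must be specified, the sketch below shows a trained Q-network mapping a detector-based state directly to a metering action via epsilon-greedy selection. Everything here is hypothetical: the state dimension, the action set, and the tiny two-layer network standing in for the trained DNN are illustrative assumptions, not the paper's actual architecture or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical specification: what the framework needs from the user.
STATE_DIM = 8     # e.g. speeds/occupancies from mainline and ramp detectors
ACTIONS = [0, 1]  # e.g. 0 = red phase, 1 = green phase for the next cycle

# Stand-in for the trained DNN: a small two-layer network, random weights.
W1 = rng.normal(size=(STATE_DIM, 16))
W2 = rng.normal(size=(16, len(ACTIONS)))

def q_values(state):
    """Forward pass approximating Q(s, a) for every action."""
    hidden = np.maximum(state @ W1, 0.0)  # ReLU hidden layer
    return hidden @ W2

def act(state, epsilon=0.1):
    """Epsilon-greedy: explore during training, exploit when deployed."""
    if rng.random() < epsilon:
        return int(rng.choice(ACTIONS))
    return ACTIONS[int(np.argmax(q_values(state)))]

state = rng.normal(size=STATE_DIM)  # one (hypothetical) detector reading
print(act(state, epsilon=0.0))      # greedy action of this untrained network
```

Because the state is fed to the network as raw continuous values, no manual discretization or traffic-theoretic calibration is required, which is the practical advantage over tabular QL noted above.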
Although the training process is reasonably fast, the DQN-based RM still requires a
number of state variables and sufficient training time to obtain the optimal control strategy
before it can be applied. For large freeway networks or complex scenarios, the DQN agent may
have difficulty converging, meaning the control performance is not optimized; a balance must be
carefully struck between computing efficiency and learning effectiveness. In addition, the DQN
agent with a DNN behaves like a black box, making its inherent mechanism hard to explain.
This may reduce the easiness and practicability of transferring it to other case studies. A
continuous learning procedure in the reinforcement learning algorithm is recommended
to enhance the robustness and transferability of the control performance when applied to new
environments (51).
In the present study, a typical freeway weaving section was simulated with the proposed
DQN-based RM control strategy. Other types of bottlenecks, caused by lane reductions, traffic
incidents, or work zones, also need to be investigated. Besides, this study used the bottleneck
discharge flow as the reward function in the DQN method, and a successful reduction in travel
time and increase in bottleneck discharge flow were observed. Future studies could consider
other reward functions, such as social equity between ramp and mainline vehicles, for system
optimization. Furthermore, for a complex freeway segment with two or more on-ramps, the
coordination of multiple reinforcement learning agents with RM control strategies and variable
speed limit control strategies can be evaluated to improve the overall performance in reducing
freeway traffic congestion.
ACKNOWLEDGEMENTS
This research was jointly sponsored by the National Natural Science Foundation of China
(51508094), the National Key Research and Development Program of China: Key Projects of
International Scientific and Technological Innovation Cooperation Between Governments
(2016YFE0108000), and the Fundamental Research Funds for the Central Universities
(2242017K40130).
AUTHOR CONTRIBUTION STATEMENT
The authors confirm contribution to the paper as follows: study conception and design: Mofeng
Yang, Zhibin Li, Zemian Ke; simulation: Mofeng Yang, Meng Li; analysis and interpretation of
results: Mofeng Yang, Zhibin Li; draft manuscript preparation: Mofeng Yang, Zhibin Li. All
authors reviewed the results and approved the final version of the manuscript.
REFERENCES
1. Cassidy, M. J., & Rudjanakanoknad, J.. Increasing the Capacity of an Isolated Merge by
Metering its On-ramp. Transportation Research Part B: Methodological, 2005. 39(10):
896-913.
2. Chung, K., Rudjanakanoknad, J., and Cassidy, M. J.. Relation between Traffic
Density and Capacity Drop at Three Freeway Bottlenecks. Transportation Research Part
B: Methodological, 2007. 41(1): 82-95.
3. Zhang, L., and Levinson, D.. Ramp Metering and Freeway Bottleneck Capacity.
Transportation Research Part A: Policy and Practice, 2011. 44(4): 218-235.
4. Oh, S., and Yeo, H.. Microscopic Analysis on the Causal Factors of Capacity Drop in
Highway Merging Sections. Presented at 91st Annual Meeting of the Transportation
Research Board, Washington, D.C., 2012.
5. Transportation Research Board. Highway Capacity Manual. Technical report, Washington,
D.C., 2000.
6. Transportation Research Board. Highway Capacity Manual. Technical report, Washington,
D.C., 2010.
7. Paul Ryus, Mark Vandehey, Lily Elefteriadou, Richard G Dowling, and Barbara K Ostrom.
New TRB Publication: Highway Capacity Manual 2010. TR News, 2011. (273).
8. Yang, H., and Rakha, H.. Reinforcement Learning Ramp Metering Control for Weaving
Sections in a Connected Vehicle Environment. Presented at 96th Annual Meeting of the
Transportation Research Board, Washington, D.C., 2017.
9. Shaaban, K., Khan, M. A., and Hamila, R.. Literature Review of Advancements in
Adaptive Ramp Metering. Procedia Computer Science, 2016. 83: 203-211.
10. Papageorgiou, M., and Kotsialos, A.. Freeway Ramp Metering: An Overview. IEEE
Transactions on Intelligent Transportation Systems, 2002. 3(4): 271-281.
11. Papageorgiou, M., and Papamichail, I.. Overview of Traffic Signal Operation Policies for
Ramp Metering. Transportation Research Record: Journal of the Transportation Research
Board, 2008. 2047: 28-36.
12. Papageorgiou, M., Hadj-Salem, H., and Blosseville, J. M.. ALINEA: A Local Feedback
Control Law for On-ramp Metering. Transportation Research Record: Journal of the
Transportation Research Board, 1991. 1320: 58-67.
13. Zhang, M., Kim, T., Nie, X., Jin, W., Chu, L., and Recker, W.. California Partners for
Advanced Transit and Highways (PATH), 2001.
14. Smaragdis, E., Papageorgiou, M., and Kosmatopoulos, E.. A Flow-maximizing Adaptive
Local Ramp Metering Strategy. Transportation Research Part B: Methodological, 2004.
38(3): 251-270.
15. Kotsialos, A., Papageorgiou, M., Hayden, J., Higginson, R., McCabe, K., and Rayman, N..
Discrete Release Rate Impact on Ramp Metering Performance. IEE Proceedings-
Intelligent Transport Systems, 2006. 153(1): 85-96.
16. Smaragdis, E., and Papageorgiou, M.. Series of New Local Ramp Metering Strategies.
Transportation Research Record: Journal of the Transportation Research Board, 2003.
1856: 74-86.
17. Demiral, C., & Celikoglu, H. B.. Application of ALINEA Ramp Control Algorithm to
Freeway Traffic Flow on Approaches to Bosphorus Strait Crossing Bridges. Procedia-
Social and Behavioral Sciences, 2011. 20: 364-371
18. Papageorgiou, M., Hadj-Salem, H., Middelham, F.. “ALINEA Local Ramp Metering:
Summary of Field Results”, Transportation Research Record: Journal of the
Transportation Research Board, 1997. 1603: 90-98
19. Bellemans, T., B.D. Schutter, and B.D. Moor.. Model Predictive Control for Ramp
Metering of Motorway Traffic: A Case Study. Control Engineering Practice, 2006. 4(7):
757–767.
20. Hegyi, A., De Schutter, B., and Hellendoorn, H.. Model Predictive Control for Optimal
Coordination of Ramp Metering and Variable Speed Limits. Transportation Research Part
C: Emerging Technologies, 2005. 13(3): 185-209.
21. Papamichail, I., Kotsialos, A., Margonis, I., and Papageorgiou, M. Coordinated Ramp
Metering for Freeway Networks–A Model-predictive Hierarchical Control Approach.
Transportation Research Part C: Emerging Technologies, 2010. 18(3): 311-331.
22. Zegeye, S. K., De Schutter, B., Hellendoorn, J., Breunesse, E. A., and Hegyi, A.. Integrated
Macroscopic Traffic Flow, Emission, and Fuel Consumption Model for Control Purposes.
Transportation Research Part C: Emerging Technologies, 2013. 31: 158-171.
23. Cassidy, M. J., & Windover, J. R.. Methodology for Assessing Dynamics of Freeway
Traffic Flow. Transportation Research Record: Journal of the Transportation Research
Board, 1995. 1484: 73-79.
24. Cassidy, M. Freeway On-ramp Metering, Delay Savings, and Diverge Bottleneck.
Transportation Research Record: Journal of the Transportation Research Board, 2003.
1856: 1-5.
25. Kim, K., and Cassidy, M. J.. A Capacity-increasing Mechanism in Freeway Traffic.
Transportation Research Part B: Methodological, 2012. 46(9): 1260-1272.
26. Watkins, C., and Dayan, P.. Q-Learning. Machine Learning, 1992. 8(3): 279-292.
27. Sutton, R.S., and Barto, A. G.. Reinforcement Learning: An Introduction. Cambridge, MA:
MIT Press, 1998.
28. Barto, A. G., and Mahadevan, S.. Recent Advances in Hierarchical Reinforcement
Learning. Discrete Event Dynamic Systems, 2003. 13(1-2): 41-77.
29. Mahadevan, S.. Average Reward Reinforcement Learning: Foundations, Algorithms, and
Empirical Results. Machine Learning, 1996. 22(1-3): 159-195.
30. Wang, X., Liu, B., Niu, X., and Miyagi, T.. Reinforcement Learning Control for On-
ramp Metering Based on Traffic Simulation. ICCTP 2009: Critical Issues in
Transportation Systems Planning, Development, and Management, 2009. 1-7.
31. Veljanovska, K., Bombol, K. M., and Maher, T.. Reinforcement Learning Technique in
Multiple Motorway Access Control Strategy Design. PROMET-Traffic&Transportation,
2010. 22(2): 117-123.
32. Davarynejad, M., Hegyi, A., Vrancken, J., and van den Berg, J. Motorway Ramp-metering
Control with Queuing Consideration Using Q-learning. Intelligent Transportation Systems
(ITSC), 2011 14th International IEEE Conference on. IEEE, 2012: 1652-1658.
33. Wang, X. J., Xi, X. M., and Gao, G. F.. Reinforcement Learning Ramp Metering without
Complete Information. Journal of Control Science and Engineering, 2012. 2.
34. Rezaee, K., Abdulhai, B., and Abdelgawad, H.. Application of Reinforcement
Learning with Continuous State Space to Ramp Metering in Real-world Conditions.
Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on.
IEEE, 2012. 1590-1595.
35. Rezaee, K.. Decentralized Coordinated Optimal Ramp Metering Using Multi-agent
Reinforcement Learning. Doctoral dissertation, University of Toronto (Canada), 2014.
36. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and
Riedmiller, M.. Playing Atari with Deep Reinforcement Learning. arXiv preprint, 2013.
arXiv:1312.5602.
37. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... and
Petersen, S.. Human-level Control through Deep Reinforcement Learning. Nature, 2015.
518(7540): 529.
38. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... &
Dieleman, S. Mastering the Game of Go with Deep Neural Networks and Tree Search.
Nature, 2016. 529(7587): 484-489.
39. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ... and
Chen, Y. Mastering the Game of Go without Human Knowledge. Nature, 2017. 550(7676):
354.
40. Li, L., Lv, Y., and Wang, F. Y.. Traffic Signal Timing via Deep Reinforcement Learning.
IEEE/CAA Journal of Automatica Sinica, 2016. 3(3): 247-254.
41. Van der Pol, E., and Oliehoek, F. A.. Coordinated Deep Reinforcement Learners for Traffic
Light Control. Proceedings of Learning, Inference and Control of Multi-Agent Systems at
NIPS, 2016.
42. Van der Pol, E.. Deep Reinforcement Learning for Coordination in Traffic Light Control.
Master’s thesis, University of Amsterdam, 2016.
43. Mousavi, S. S., Schukat, M., and Howley, E.. Traffic Light Control Using Deep Policy-
Gradient and Value-function-based Reinforcement Learning. IET Intelligent Transport
Systems, 2017. 11(7): 417-423.
44. Belletti, F., Haziza, D., Gomes, G., and Bayen, A. M.. Expert Level Control of Ramp
Metering Based on Multi-task Deep Reinforcement Learning. IEEE Transactions on
Intelligent Transportation Systems, 2017.
45. Van Hasselt, H., Guez, A., & Silver, D.. Deep Reinforcement Learning with Double Q-
Learning. AAAI, 2016. 16: 2094-2100.
46. Schaul, T., Quan, J., Antonoglou, I., and Silver, D.. Prioritized Experience Replay. arXiv
preprint, 2015. arXiv:1511.05952.
47. Mahmood, A. R., van Hasselt, H. P., and Sutton, R. S.. Weighted Importance
Sampling for Off-policy Learning with Linear Function Approximation. Advances in
Neural Information Processing Systems, 2014. 3014-3022.
48. Daganzo, C. F.. Fundamentals of transportation and traffic operations. Pergamon Press,
Oxford, 1997.
49. Lu, C., Huang, J., and Gong, J.. Reinforcement Learning for Ramp Control: An Analysis
of Learning Parameters. PROMET-Traffic&Transportation, 2016. 28(4): 371-381.
50. Chen, C., Petty, K., Skabardonis, A., Varaiya, P., and Jia, Z.. Freeway Performance
Measurement System: Mining Loop Detector Data. Transportation Research Record:
Journal of the Transportation Research Board, 2001. 1748: 96-102.
51. Li, Z., Liu, P., Xu, C., Duan, H., & Wang, W.. Reinforcement Learning-based Variable
Speed Limit Control Strategy to Reduce Traffic Congestion at Freeway Recurrent
Bottlenecks. IEEE Transactions on Intelligent Transportation Systems, 2017. 18(11):
3204-3217.