Conference Paper

Digital Twin-Assisted Lane-changing and Variable Speed Limit Control for Weaving Segments

Tingting Fan
dept. Electronic and Information Engineering
The Hong Kong Polytechnic University, Hong Kong
tingting.fan@connect.polyu.hk
Ivan Wang-Hei Ho
dept. Electronic and Information Engineering & Otto Poon Charitable
Foundation Smart Cities Research Institute
The Hong Kong Polytechnic University, Hong Kong
ivanwh.ho@polyu.edu.hk
Edward Chung
dept. Electrical Engineering
The Hong Kong Polytechnic University, Hong Kong
edward.cs.chung@polyu.edu.hk
Abstract—With the rapid growth of traffic demand, the
concentration of lane changes in weaving segments has become a
significant problem on highways, reducing both operational
efficiency and road safety. The cooperative intelligent
transportation system (C-ITS) with vehicle-to-everything
(V2X) technology enables new cooperative strategies to
alleviate this problem. Most existing V2X-assisted traffic
control strategies assume that communication and connectivity
between vehicles are perfect, which does not reflect reality.
To accurately describe the dynamics of vehicular traffic and
node connections, a joint traffic and network simulator is
established to construct a digital twin system, in which two
traffic control strategies, namely the lane-changing (LC)
distribution strategy and the variable speed limit (VSL)
control strategy, are proposed and validated in a complex
vehicular communication network. The simulation results show
that packet loss degrades the performance of V2X-assisted LC
control, and that the integrated strategy improves the overall
average speed by up to 40.4% compared with LC control alone.
Keywords—concentration problem, V2X, digital twin, lane-
changing, variable speed limit
I. INTRODUCTION
Traffic congestion is a growing problem worldwide as traffic
demand increases rapidly. Weaving segments are a common
configuration of freeway systems and are defined as sections
where two or more traffic flows traveling in the same
direction cross one another. Weaving sections are frequently
congested due to the interaction between merging and
diverging vehicles. Existing research shows that most
lane-changing (LC) actions occur in the first 150 meters of a
weaving segment, because anxious drivers tend to execute LC
actions as soon as they enter the area [1]. This uneven
distribution of lane changes is known as the LC concentration
problem of the weaving segment. Given limited capacity and
budget, expanding the current infrastructure or roadways is
not a feasible solution. To address this issue, researchers
have been seeking ways to make better use of the existing
infrastructure resources.
The cooperative intelligent transportation system (C-ITS)
provides a platform that adopts coordinated strategies for
better traffic management. With emerging wireless
communication technologies, C-ITS uses communication
protocols such as dedicated short-range communication
(DSRC) or cellular vehicle-to-everything (C-V2X) to enable
vehicles to communicate through vehicle-to-vehicle (V2V),
vehicle-to-infrastructure (V2I), or vehicle-to-pedestrian
(V2P) communications, which are collectively referred to as
vehicle-to-everything (V2X). V2X technology enables
connected vehicles to share real-time information such as
speed, position, and acceleration, which allows for more
intelligent control strategies, such as speed adjustment and
individual LC advisories [2, 3].
Most research in vehicular networks or transport studies
is carried out in simulation due to the high cost of
real-world experimentation. However, most existing
simulations assume that connectivity and communications
between vehicles are perfect, which does not reflect reality.
In complex urban environments, high-speed vehicular
movement and signal fading make the connectivity of
vehicular networks poor and intermittent [4]. In addition, as
the number of wireless users grows, problems such as packet
collisions and signal interference are becoming more serious
[5]. Indeed, a previous study [6] showed that imperfect
communication degrades the performance of C-ITS services,
as well as decision making in autonomous driving.
Therefore, to accurately describe the dynamics of vehicular
traffic and node connectivity, it is essential to develop a
federated joint transport and communication simulator for
the management of transportation systems. In addition, to
better understand driving behavior in different scenarios and
respond accordingly, real-time sensor readings and input
data from vehicles or a driving simulator should be
incorporated into the joint simulator. Hence, the digital twin
concept is introduced and embedded in the joint simulator.
Through the digital twin, various C-ITS services and
coordinated controls of connected autonomous vehicles can
be developed and verified under a complex vehicular
communication network.
The remainder of this paper is organized as follows:
Section II introduces the architecture of the digital twin
system leveraged in this study. Section III verifies the
robustness of the LC strategy in an imperfect V2X
environment using the digital twin. Section IV proposes
variable speed limit control to compensate for the loss of
traffic efficiency that LC control suffers under
communication interference. Section V concludes the paper
and summarizes the findings.
II. DIGITAL TWIN ARCHITECTURE
A digital twin is a digital representation of a real-world
system that takes real-world data about physical objects or
systems as input and produces predictions or simulations as
output. In this study, digital twin technology is leveraged to
build a vehicular network that allows connected vehicles to
cooperate with each other and with the infrastructure. All
connected vehicles in the digital twin are linked through the
digital world of the system and regarded as replicas of their
real-world counterparts, which can be instantiated in the
simulation.
The digital twin architecture in this study is shown in Fig.
1. It consists of two layers: the physical world and the
digital world. A microscopic traffic simulator, SUMO, models
the road network and traffic mobility, and the V2X
communication environment is built in the network simulator
OMNeT++. The digital world in this study is constructed by
integrating the traffic simulator with the network simulator.
Furthermore, the Unity game engine, chosen for the ease with
which it connects to external driving-simulation hardware
such as steering wheels and pedals, renders the proposed 3D
virtual driving environment. In the physical world of the
digital twin, a hardware Logitech G29 driving controller
captures real-time driving behavior from human drivers.
In summary, the digital twin system can be defined as a
cyber-physical fusion system whose inputs are processed
sensor data or driving data from the physical world, and
whose outputs are driving advisories and warnings fed back
after analyzing the joint simulation results in the digital
world.
III. LC ADVISORY CONTROL IN V2I ENVIRONMENT
In this section, the longitudinal lane-changing distribution
(LLCD) strategy is verified through simulation in the digital
world of the system. Connected vehicles communicate with
the roadside unit (RSU) in the presence of packet loss.
A. V2X-Assisted LLCD Strategy
The LLCD strategy was first proposed by Mai et al. [2], who
conducted extensive experiments with different LC
proportions in different sections to distribute LC
manoeuvres more evenly and thereby alleviate the LC
concentration problem. However, communication issues were
not considered in their study: V2X communication was
assumed to be perfectly reliable. In practice, when vehicles
and RSUs exchange information, uncertain factors such as
time delay and packet loss cause communication failures that
are unavoidable in a real wireless communication network.
In this work, the joint simulator of the digital world
makes it possible to verify the effectiveness of the LLCD
strategy in an unreliable V2X communication environment.
The V2X-assisted LLCD strategy is implemented as follows:
1) Route data collection: First, the origin and
destination (OD) information of each vehicle is collected by
the RSU through V2I communications. Vehicles are
identified as weaving or non-weaving vehicles based on the
OD information. In this work, the communication range and
path loss are not taken into consideration, and all vehicles
entering the road network are assumed to be within the
communication range of the RSU.
2) Vehicle route classification: Weaving vehicles are
classified into ramp-to-freeway (RF) vehicles and
freeway-to-ramp (FR) vehicles, also referred to as merging
and diverging vehicles, respectively. The general idea of the
V2X-assisted LLCD strategy is to let FR vehicles change
lanes first and then distribute the lane changes of RF
vehicles proportionally over the subsequent sections, which
avoids frequent interactions between different traffic flows
and mitigates concentrated lane changes. Therefore, FR
vehicles retain their tendency to change lanes upon entering
the weaving segment without further control, and only RF
vehicles are involved in the LC assignment by receiving LC
recommendations sent by the RSU.
Fig. 1. The architecture of the digital twin system.
3) Assignment of LC position: Due to the instability of
wireless communication links, some packets sent from the
RSU might be lost. The RF vehicles that successfully
receive the packet are assigned a section number that
determines the position where the lane change is allowed to
start. The proportions of the section numbers and the
corresponding allowed LC positions are shown in Table I.
TABLE I. PROPORTION OF SECTION NUMBERS FOR DIFFERENT ALLOWED POSITIONS.

| Section number       | 1  | 2   | 3   |
|----------------------|----|-----|-----|
| Proportion (%)       | 40 | 40  | 20  |
| Allowed position (m) | 50 | 100 | 150 |
4) Lane-changing permission: Once the position where
the lane change is allowed to start has been assigned, the RF
vehicle receives an RSU message every second indicating its
current position. We assume that vehicles can be positioned
accurately at lane level and neglect positioning errors. A
vehicle is not allowed to change lanes until it reaches the
allowed LC position, at which point it looks for a suitable
gap to change lanes. Note that the LC manoeuvre itself is
governed by the human-controlled driving simulator or by
the LC model in the traffic simulator.
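The assignment and permission steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that packet loss is an independent Bernoulli event per packet, and the names `assign_lc_position` and `may_change_lane` are hypothetical. Only the proportions and positions come from Table I.

```python
import random

# Table I: section number -> (proportion, allowed LC start position in metres)
SECTIONS = {1: (0.40, 50.0), 2: (0.40, 100.0), 3: (0.20, 150.0)}


def assign_lc_position(plr, rng):
    """Return the allowed LC start position (m) for one RF vehicle.

    Returns None when the RSU packet carrying the assignment is lost.
    Assumption: each packet is lost independently with probability `plr`.
    """
    if rng.random() < plr:
        return None  # packet lost: no assignment reaches the vehicle
    numbers = list(SECTIONS)
    weights = [SECTIONS[n][0] for n in numbers]
    section = rng.choices(numbers, weights=weights, k=1)[0]
    return SECTIONS[section][1]


def may_change_lane(position_m, allowed_position_m):
    """A controlled RF vehicle may look for a gap only after reaching its allowed position."""
    return position_m >= allowed_position_m


if __name__ == "__main__":
    rng = random.Random(42)
    assigned = [assign_lc_position(0.2, rng) for _ in range(10)]
    print(assigned)
```

With `plr=0.0` every RF vehicle receives an assignment, and with `plr=1.0` none does, which corresponds to the perfect-communication and no-control cases discussed in the case study.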
B. Case Study
The road network investigated in this study is based on
the Manchester Outer Ring Road (M60 Motorway), which is
presented in Fig. 2 (a) and simplified into a one-sided 400-
meter weaving segment, as Fig. 2 (b) shows. The lanes from
the innermost to the outermost are denoted as the auxiliary
lane, the left lane, and the main lane, respectively, with the
on-ramp and off-ramp connected by the auxiliary lane.
Fig. 2. The one-sided weaving segment: (a) M60 Motorway aerial scene;
(b) Layout.
Since the default LC model of the traffic simulator SUMO
does not reproduce the realistic LC concentration
phenomenon, some parameters related to LC eagerness need
to be calibrated to reflect the physical world more accurately.
Table II shows the percentage of lane changes by position
before and after calibration; the calibrated values align
better with the field data [7].
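TABLE II. THE RESULTS OF PARAMETER CALIBRATION.

|                  | RF vehicle (%) | RF vehicle (%) | FR vehicle (%) | FR vehicle (%) |
|------------------|----------------|----------------|----------------|----------------|
| LC position      | 0~150 m        | 150~400 m      | 0~150 m        | 150~400 m      |
| Field data       | 95             | 5              | 100            | 5              |
| Default value    | 9              | 91             | 12.5           | 87.5           |
| Calibrated value | 71.9           | 28.1           | 70.4           | 29.6           |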
Different packet loss rates (PLRs) are introduced in the
network simulator to model unreliable V2X communications.
The average speed along the segment is used as the indicator
of traffic performance and to evaluate the impact of different
PLRs on the LLCD strategy. In SUMO, speeds are collected
by detectors placed at 10 m intervals and aggregated every
minute.
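One way to picture this aggregation is the sketch below, which bins speed samples by detector position and by one-minute period and then averages each bin. It is an illustrative assumption about the post-processing, not SUMO's detector API; the function name `speed_profile` and the `(time, position, speed)` sample format are hypothetical.

```python
from collections import defaultdict


def speed_profile(samples, spacing_m=10.0, period_s=60.0):
    """Average speed per (aggregation period, detector position).

    samples: iterable of (time_s, position_m, speed_mps) tuples.
    Returns a dict {(period_index, detector_position_m): mean speed in m/s}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of speeds, sample count]
    for t, x, v in samples:
        key = (int(t // period_s), int(x // spacing_m) * spacing_m)
        sums[key][0] += v
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


if __name__ == "__main__":
    demo = [(5, 12, 20.0), (30, 18, 30.0), (70, 12, 10.0)]
    print(speed_profile(demo))
```

Averaging the per-period values at each position then gives a speed-versus-distance curve of the kind plotted in the speed profiles.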
Fig. 3. Speed profiles of LC control for different packet loss: (a) Auxiliary
lane; (b) Left lane.
Fig. 3 shows the speed profiles from 0 to 400 m in the
auxiliary lane and the left lane, respectively, where the most
severe lane change turbulence occurs during merging and
diverging.
When PLR is 1, all LC instructions sent from the RSU
are lost and vehicles follow the calibrated LC model in
SUMO, i.e., the no-control case. The speed drops severely at
the beginning of the weaving area due to frequent lane
changes, which reproduces the LC concentration problem of
the physical world. When PLR is 0, vehicles are in a perfect
V2X communication environment, where the concentration
problem is alleviated significantly and the speed varies only
slightly, between about 25 and 29 m/s and between 26 and 29
m/s on the two lanes, respectively. However, as PLR
increases, the overall speed gradually decreases. When PLR
is 0.4 or 0.6, the average speed over time and space is even
lower than in the no-control case, indicating that LC control
is less effective in a poor communication environment. The
reason is that, as packet loss becomes more severe, more
vehicles fail to receive the permission to change lanes in
time when they reach the allowed LC position, which delays
the lane change. Consequently, some vehicles have no chance
to change lanes before the end of the weaving segment and
form a queue in the last section, which becomes more
turbulent than in the no-control case, where vehicles
scramble to change lanes as soon as they enter the weaving
segment.
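The delay mechanism can be illustrated with a simple back-of-the-envelope model. The sketch assumes, as a simplification that the paper does not state, that each per-second RSU message is lost independently with probability equal to the PLR, so the number of lost messages before the first successful one is geometrically distributed. The function names are hypothetical.

```python
def expected_permission_delay_s(plr, message_period_s=1.0):
    """Expected wait until the first RSU message gets through.

    Assumption: independent loss with probability `plr` per message, so the
    expected number of lost messages before a success is plr / (1 - plr).
    """
    if not 0.0 <= plr < 1.0:
        raise ValueError("plr must be in [0, 1); at plr = 1 no message ever arrives")
    return message_period_s * plr / (1.0 - plr)


def expected_delay_distance_m(plr, speed_mps, message_period_s=1.0):
    """Distance travelled while waiting, at a constant speed (a simplification)."""
    return speed_mps * expected_permission_delay_s(plr, message_period_s)


if __name__ == "__main__":
    for plr in (0.2, 0.4, 0.6):
        print(plr, expected_permission_delay_s(plr), expected_delay_distance_m(plr, 25.0))
```

Under this assumption the expected wait grows nonlinearly with the PLR, which is consistent with the qualitative observation that lane changes are pushed toward the end of the 400-meter segment as packet loss worsens.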
IV. VARIABLE SPEED LIMIT CONTROL
A. RL-based Variable Speed Limit Control
Variable speed limit (VSL) control is another important
highway strategy for improving traffic mobility and
smoothing traffic congestion. The VSL controller computes
the optimal speed limit based on real-time traffic conditions
and displays it on the VSL sign at the entrance of the
weaving segment. The VSL control can be treated as a
Markov decision process (MDP) [8], which can be solved by
reinforcement learning (RL) technology. The basic elements
of RL are listed below in detail:
Agent: The agent in this research is a VSL controller, which
sets the speed limits of the controlled lanes according to
traffic variations. In this study, the VSL agent controls the
three lanes of the weaving segment in Fig. 2(b) and sets the
same speed limit for every controlled lane to ensure the
safety of lane changes.
State: The state is a real-time representation of traffic
flow, which depicts the traffic variations of the entire
weaving segment. Previous studies tended to take several
important junctions on highways as locations to describe the
state, and set traffic density and occupancy rates as state
variables [3, 8]. In this research, the occupancy rates are
set as state variables and are measured along the entire
weaving segment.
Action: The speed limit is defined as the action output by
the agent. In practice, the speed limits posted on VSL signs
are discrete values, so the possible values are set as
integers in the simulation. In addition, because a sharp
reduction in the speed limit is unsafe, the decrease of the
action is constrained to at most 11.1 m/s (40 km/h) between
successive control periods.
Reward: As the target of the VSL controller is to improve
the efficiency of the weaving segment, a traffic performance
indicator, the average speed, is defined as the reward of the
VSL agent. It is measured by the detectors placed in the
concentration area (i.e., the bottleneck).
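The action constraint can be sketched as a post-processing step applied to the raw output of the agent. Only the 11.1 m/s bound on the decrease comes from the text. The bounds `v_min` and `v_max` and the function name `next_speed_limit` are hypothetical values chosen for illustration.

```python
import math

MAX_DECREASE_MPS = 11.1  # 40 km/h, the bound stated in the text


def next_speed_limit(proposed_mps, previous_mps, v_min=5, v_max=33):
    """Turn a raw agent output into a posted speed limit (integer m/s).

    The value is rounded to an integer, clipped to [v_min, v_max] (hypothetical
    bounds), and prevented from dropping more than MAX_DECREASE_MPS below the
    previously posted limit.
    """
    limit = round(proposed_mps)
    limit = max(v_min, min(v_max, limit))
    floor = previous_mps - MAX_DECREASE_MPS
    if limit < floor:
        # Smallest integer limit that still respects the maximum decrease.
        limit = math.ceil(floor)
    return int(limit)


if __name__ == "__main__":
    print(next_speed_limit(12.3, 30))  # a large proposed drop is limited
    print(next_speed_limit(25.4, 30))  # a small drop passes through
```

Increases are left unconstrained here because the text restricts only reductions of the speed limit.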
The VSL agent interacts with the traffic environment and
generates actions, i.e., speed limits, according to its
observation of the traffic state. When an action is taken, the
state changes and the agent receives a reward, which judges
how good or bad the current state is. The goal of the agent is
to learn behaviors from the environment by trial and error so
as to maximize its cumulative reward. The deep deterministic
policy gradient (DDPG) algorithm is applied to train the RL
agent, and its framework is illustrated in Fig. 4. The states
are measured by detectors in SUMO and fed into the actor
network, whose goal is to find a policy whose parameters
maximize the cumulative reward (i.e., the Q value) over the
control period, while the critic network estimates the Q
value, which is related to the reward in terms of traffic
efficiency.
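Two building blocks of the standard DDPG algorithm are sketched below: the bootstrapped target that the critic regresses toward, and the soft (Polyak) update of the target networks. The paper does not report its hyperparameters or network details, so the values of `gamma` and `tau` and the flat parameter lists are assumptions made for illustration.

```python
def td_target(reward, next_q, gamma=0.99, done=False):
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).

    `next_q` is the target critic's estimate at the next state, evaluated at
    the action chosen by the target actor. gamma is an assumed discount factor.
    """
    return reward if done else reward + gamma * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


if __name__ == "__main__":
    # Reward: average bottleneck speed (m/s); next_q: target critic estimate.
    print(td_target(reward=25.0, next_q=100.0, gamma=0.9))
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

In a full implementation the actor is updated by ascending the critic's gradient with respect to the action, and both networks are trained on mini-batches drawn from a replay buffer. Those parts are omitted here for brevity.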
B. Experimental Results and Analysis
To study the impact of VSL on V2X-assisted LC control,
the two algorithms are integrated. The digital twin system
establishes a vehicular network environment in which
V2I-assisted LC control is performed, and the VSL agent
interacts with and learns from this environment. We
compare the following three control modes: 1) No control; 2)
LC control alone; 3) Integrated LC & VSL control. The latter
two are each simulated under different packet loss scenarios.
The speed profiles of the above control modes, from 200
m upstream to 100 m downstream of the weaving segment,
are shown in Fig. 5. Lines of the same color indicate LC
control under the same PLR with or without VSL control. The
VSL algorithm clearly has a positive effect on LC control
under different PLRs. Compared with LC control alone, the
integrated scheme improves the overall average speed in the
auxiliary lane by 4.6%, 9.0%, and 40.4% at packet loss rates
of 0.2, 0.4, and 0.6, respectively. The results reveal that
VSL control is more effective when the communication
environment is unstable, especially under more severe packet
loss.
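The improvement percentages are presumably relative changes in average speed with respect to LC control alone. The short sketch below shows that calculation under this assumption. The input speeds are made-up illustrative values, not results from the paper, and the function name is hypothetical.

```python
def relative_improvement_pct(v_integrated_mps, v_lc_only_mps):
    """Relative gain (%) of the integrated scheme over LC control alone."""
    return 100.0 * (v_integrated_mps - v_lc_only_mps) / v_lc_only_mps


if __name__ == "__main__":
    # Hypothetical speeds, for illustration only.
    print(relative_improvement_pct(25.0, 20.0))
```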
Fig. 5. The comparison of LC under different PLRs with or without
VSL: (a) Auxiliary lane; (b) Left lane.
Therefore, it can be concluded that when LC control
operates in an unstable communication environment, the
DDPG-based VSL algorithm can make up for the reduced
efficiency of LC control caused by unstable communications
and intermittent connectivity.
Fig. 4. The DDPG algorithm for VSL.
V. CONCLUSION AND FUTURE WORK
In this study, two traffic control algorithms have been
proposed and integrated within the digital twin system. The
LC control algorithm has been verified in an unstable V2X
simulation environment in the presence of packet loss. The
results show that packet loss degrades the performance of
V2X-assisted LC control, and that the integrated scheme
improves the overall average speed by up to 40.4% compared
with LC control alone, which indicates that RL-based VSL
control can compensate, to a certain extent, for the impact
of serious packet loss on LC control.
ACKNOWLEDGMENT
This work was supported in part by the Otto Poon
Charitable Foundation Smart Cities Research Institute
(Projects Q-CDAS and Q-CDA8), and The Hong Kong
Polytechnic University (Projects 4-ZZMV and 1-ZVTJ).
REFERENCES
[1] M. J. Cassidy and A. D. May, "Proposed analytical technique for
estimating capacity and level of service of major freeway weaving
sections," Transportation Research Record, vol. 1320, pp. 99-109,
1991.
[2] T. Mai, R. Jiang, and E. Chung, "A Cooperative Intelligent Transport
Systems (C-ITS)-based lane-changing advisory for weaving
sections," Journal of Advanced Transportation, vol. 50, no. 5, pp.
752-768, 2016.
[3] Y. Wu, H. Tan, L. Qin, and B. Ran, "Differential variable speed limits
control for freeway recurrent bottlenecks via deep actor-critic
algorithm," Transportation Research Part C: Emerging Technologies,
vol. 117, 2020, doi: 10.1016/j.trc.2020.102649.
[4] I. W.-H. Ho, R. J. North, J. W. Polak, and K. K. Leung, "Effect of
Transport Models on Connectivity of Interbus Communication
Networks," Journal of Intelligent Transportation Systems, vol. 15, no.
3, pp. 161-178, 2011, doi: 10.1080/15472450.2011.594691.
[5] H. J. F. Qiu, I. W. Ho, C. K. Tse, and Y. Xie, "A Methodology for
Studying 802.11p VANET Broadcasting Performance With Practical
Vehicle Distribution," IEEE Transactions on Vehicular
Technology, vol. 64, no. 10, pp. 4756-4769, Oct. 2015, doi:
10.1109/TVT.2014.236703.
[6] M. Sybis et al., "Communication Aspects of a Modified Cooperative
Adaptive Cruise Control Algorithm," IEEE Transactions on
Intelligent Transportation Systems, vol. 20, no. 12, pp. 4513-4523,
Dec. 2019, doi: 10.1109/TITS.2018.2886883.
[7] H. Al-Jameel, "Characteristics of the driver behaviour in weaving
sections: empirical study," International Journal of Engineering
Research and Technology, vol. 2, no. 11, pp. 1430-1446, 2013.
[8] Z. Li, P. Liu, C. Xu, H. Duan, and W. Wang, "Reinforcement
Learning-Based Variable Speed Limit Control Strategy to Reduce
Traffic Congestion at Freeway Recurrent Bottlenecks," IEEE
Transactions on Intelligent Transportation Systems, vol. 18, no. 11,
pp. 3204-3217, 2017, doi: 10.1109/tits.2017.2687620.