Short-Term Traffic Prediction Using Long Short-Term Memory Neural Networks
Zainab Abbas, Ahmad Al-Shishtawy, Sarunas Girdzijauskas∗†, Vladimir Vlassov
KTH Royal Institute of Technology, Stockholm, Sweden. Email: {zainabab, sarunasg, vladv}@kth.se
RISE SICS, Stockholm, Sweden. Email: {ahmad.al-shishtawy, sarunas.girdzijauskas}@ri.se
Abstract—Short-term traffic prediction allows Intelligent
Transport Systems to proactively respond to events before they
happen. With the rapid increase in the amount, quality, and
detail of traffic data, new techniques are required that can
exploit the information in the data in order to provide better
results while being able to scale and cope with increasing
amounts of data and growing cities. We propose and compare
three models for short-term road traffic density prediction
based on Long Short-Term Memory (LSTM) neural networks.
We have trained the models using real traffic data collected
by the Motorway Control System in Stockholm, which monitors
highways and collects flow and speed data per lane every
minute from radar sensors. In order to deal with the challenge
of scale and to improve prediction accuracy, we propose to
partition the road network into road stretches and junctions,
and to model each of the partitions with one or more LSTM
neural networks. Our evaluation results show that partitioning
of roads improves the prediction accuracy by reducing the
root mean square error by a factor of 5. We show that we
can reduce the complexity of the LSTM network by limiting the
number of input sensors, on average to 35% of the original
number, without compromising the prediction accuracy.
Keywords-LSTM; neural networks; traffic prediction
I. INTRODUCTION
Smooth road traffic flow in cities is an ongoing research
challenge, as the demand for road infrastructure increases
faster than the speed at which cities can expand it. The
increase in demand leads to traffic congestion that has a
direct negative impact on society in many aspects, such as
reduced traffic safety, increased pollution, wasted fuel and
time, and increased cost for businesses.
Road transport administrations of large developed cities
acquire real-time traffic data from multiple sources, such
as infrastructure sensors, mobile data, Bluetooth sensors,
and traffic cameras, and apply state-of-the-art techniques
to monitor and analyze road traffic. These systems and
techniques have collectively become known as Intelligent
Transportation Systems (ITS), which aim at providing innovative
solutions to tackle the traffic management problem and
achieve smarter utilization of the transport network.
Short-term traffic prediction is a vital component in any
ITS. Being able to accurately predict the state of the traffic
in the near future enables ITS to be proactive rather than
reactive by actively mitigating potential problems before
they happen.
The common schools of thought on traffic prediction involve
traffic flow theory-based models [1], [2] and statistical
techniques [3]–[5] that commonly use regression [6]–[8] and
neural networks [9]. One limitation of conventional statistical
methods is the increase in complexity when modelling spatial
dependencies, that is, the effect on traffic flow from the
surrounding points of interest. To tackle this, multivariate
methods were used that capture the effect of correlated
regions of interest [10], [11]. In parallel, neural networks
(NNs) for short-term traffic prediction were being explored
[12]–[14].
Simple NNs are too shallow in structure to capture spatio-
temporal data dependencies efficiently. Deep learning has
proven to provide more accurate results in terms of learning
the complex and deep dependencies in traffic data [15].
For example, deep architectures were employed to predict
traffic flow in [16]. Similarly, [17] used deep architectures to
predict congestion. Deep Belief Networks were introduced
in [18] for traffic flow prediction. Stacked encoders were
used to learn traffic flow features [19] and for traffic data
imputation [20]. Deep convolutional NNs were used for traffic
speed prediction in [21]. The authors of [22] introduced the
use of Long Short-Term Memory (LSTM) networks for
traffic prediction and showed that LSTMs are more accurate
than the other models considered in [22].
Considering these deep learning approaches, our work
is related to [22]–[25], which use LSTMs. It is unique
in that we further exploit LSTM capabilities that were
not fully utilized in those approaches. We use more
fine-grained, high-resolution data, which makes training a
single LSTM-based model over the whole highway network
challenging because the number of model parameters can
increase significantly. Therefore, we provide a way to
partition the road network and train LSTMs with data
streaming from the sensors in those partitions. We also reduce
the complexity of our model by using only strategically
important sensors for prediction.
In this paper, we take a data-driven approach to provide
accurate and scalable short-term traffic density prediction for
Motorway Control Systems (MCS). An MCS is the part of an
ITS that focuses on monitoring and controlling highways
in a city due to their importance in keeping a smooth
flow through the city. We use data from the MCS in Stockholm,
which monitors major highways and provides flow and speed
information per lane every minute, measured by radar
sensors (Figure 1) spaced only around 150-400 meters apart
(Figure 6).
Figure 1: Traffic sensors placed on Stockholm highway.
However, even for a relatively small city such as Stockholm,
a deep neural network can quite rapidly become very complex
as the number of inputs (sensors) increases. This complexity,
in turn, might lead to infeasibility due to the time and resource
requirements for training, updating, and real-time prediction.
We propose, evaluate, and discuss various design choices
and architectures, first, to improve the accuracy of prediction,
and second, to reduce the complexity (and potentially
the cost) while maintaining the accuracy.
We propose a novel way of exploiting LSTM networks
by partitioning the road network into smaller sections
containing on average 20-30 sensors and applying LSTMs on
each section. In particular, we show that after training the
LSTMs with the data from all the sensors, in the operational
phase our technique is capable of successfully predicting
short-term traffic using only a small fraction of the initial
sensors (on average up to 35% of the sensors). This makes it
possible to significantly reduce ITS costs by deploying only a
small number of permanent sensors and relying on temporary
(mobile) sensors for the training phase only, instead of
permanently deploying a dense network of sensors.
The main contributions of the paper are:
- We provide different prediction models based on a deep learning approach using LSTMs. These include a single-sensor model that is trained for only one sensor and two multi-sensor models that take into account several adjacently placed sensors. Moreover, we compare their prediction quality and execution time.
- We show that taking the spatio-temporal dependencies into account by using neighbouring sensors in our multi-sensor models improves the prediction accuracy.
- We exploit the potential of our deep neural network in the last model by training it in a way that allows us to predict the traffic of an area using only a small fraction of the total sensors deployed in that area.
The rest of the paper is organized as follows: Section II
provides the background. Section III introduces our
models. Section IV describes the experimental methodology.
Section V explains the models in detail and presents the
experimental results. Finally, conclusions and future work
are given in Section VI.
II. BACKGROUND
A. Traffic Data
There exist different methods, such as mathematical and
statistical models, simulations and visualization, to study,
understand, and analyze road traffic in order to plan, design
and operate transportation systems. The analysis can be
done at a microscopic scale where individual vehicles are
modelled or at a macroscopic scale where the aggregated
traffic behaviour is being modelled.
The common factor of all methods is that they require
measured traffic data. The main reason is that road traffic
depends on collective human behaviour, interactions, and
habits, which differ widely between areas for various reasons,
such as the characteristics of the road users (e.g., age,
driving experience), the types of vehicles (e.g., cars or
trucks) and their physical properties, and environmental
aspects that affect behaviour (e.g., weather, road shape and
type, nearby points of interest). All this makes analyzing
road traffic challenging and evolving over time.
Road traffic data consists of a large number of space-
time parameters. In its most basic form, it consists of traffic
counters which count the number of vehicles passing at
specific points (flow) on the road. Traffic data typically
include other parameters such as speed, vehicle mix (e.g.,
car/truck ratio), road occupancy, origin-destination
information, and vehicle trajectories. Traffic data can benefit
from auxiliary data such as information about accidents,
road work, events and holidays, weather, and road properties
(lanes, type, speed limits).
There exists a variety of sensing techniques used to
collect traffic data. Infrastructure or road-side sensors such
as inductive loops and radars are used to collect macroscopic
flow data at fixed points on the road. GPS and cellular
network data (known as floating car data) are used to
get vehicle trajectory for microscopic analysis. Bluetooth
sensors and automatic number plate recognition can be used
to obtain origin-destination and trip time information. Many
other techniques such as audio/video based vehicle detection
are also used to obtain traffic data.
Floating car data (FCD) is obtained mainly from partici-
pating passengers carrying cell phones in the vehicle. FCD
can provide a good estimate of the traffic speed but might
fail at providing an accurate estimate of the traffic flow and
density. The main advantages of FCD are the wide coverage
and small cost. Infrastructure sensors are more expensive to
install and maintain and they measure data at a fixed location
limiting their coverage. However, data from infrastructure
sensors are more accurate and complete as they measure
and count all vehicles that pass them in real-time. Because
of the improved accuracy that comes at an increased cost,
infrastructure sensors are typically deployed only on critical
road sections such as highways. Macroscopic traffic data
comes at different aggregation levels. The main parameters
of the aggregation are: 1) the frequency of aggregation
(e.g., flow and average speed per minute vs. per hour);
2) aggregation over lanes (i.e., data per lane or across all lanes);
and 3) the spacing between sensors (e.g., every kilometer).
B. Elements of Traffic Flow Theory
Traffic flow theory is the study of dynamic traffic behaviour
on roads. It depends upon the drivers' reactions to different
traffic conditions [26], [27]. It is common practice to
describe traffic behaviour using three traffic variables, namely:
flow q (vehicles per unit time), density k (vehicles per unit
distance), and speed v (distance per unit time). The relation
between these variables can be represented by the following
equation:

q = k × v    (1)
Figure 2: The fundamental traffic flow diagram.
Figure 2 plots the relation between q and k. At low
density, the speed does not depend on the density, and
vehicles move at the free-flow speed v_f. When the density
increases, the flow can reach the maximum value q_max
determined by the road capacity. The density at this point is
called the critical density k_critical. Beyond this, the speed
decreases because it becomes difficult for vehicles to overtake.
Finally, the density reaches k_jam, where the maximum number
of vehicles that fit on the road are stuck in a traffic jam.
This makes the density k an important indicator of congestion.
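To make the use of Equation 1 concrete, the following is a minimal Python sketch, under our own assumptions about units (flow in vehicles per hour, speed in km/h), of how density is derived from a sensor's flow and speed readings; the example values are illustrative and not taken from the dataset.

```python
def density(flow_veh_per_hour, speed_km_per_hour):
    """Density k = q / v (Equation 1); guard against a zero speed reading."""
    if speed_km_per_hour <= 0:
        return 0.0
    return flow_veh_per_hour / speed_km_per_hour

# Illustrative reading (not from the paper's dataset):
k = density(1800.0, 60.0)   # 30 vehicles per km, i.e. below the congested regime
```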
C. Long Short-Term Memory Networks
Recurrent Neural Networks (RNNs) have recently become
popular for learning and capturing latent patterns and behaviour
in sequential data. In contrast to a classic NN, the output
of an RNN depends not only on the current input but also
on the previous state of the network, which acts as a
memory. Such a configuration makes RNNs naturally suitable
for modelling tasks involving sequential data and time series,
such as handwriting recognition, natural language processing,
speech recognition, and machine translation. However,
RNNs have major limitations: in practice they fail
to remember long dependencies and are difficult to
train due to the vanishing gradient problem [28].
In our work, we use Long Short-Term Memory networks
(LSTMs) [29], which are a variant of RNNs. LSTMs are
capable of remembering long-term information by computing
the hidden state of the network differently. The hidden state
of an LSTM contains a chain of memory blocks which have
special gates to control the information maintained in each
cell of the memory block, effectively allowing LSTMs to
selectively decide what to keep or erase from the memory.
The outputs of an LSTM are calculated by combining the
memory with the previous state of the network as
well as the current input.
Complexity: The basic LSTM architecture (Figure 4(a))
consists of three layers: input, LSTM, and output layers.
Data from the input layer is fed to the LSTM layer, where
it recurrently flows and the memory cells are updated with
values based on the input, output and forget gates. Next,
data from the output unit is sent to the output layer.
The computational complexity of an LSTM network per
time step and per weight is O(1) [29]. Therefore,
its learning complexity is O(W), where W is the
number of weights in the network, which can be computed
by the following equation [30]:

W = 4 n_c^2 + 4 n_i n_c + n_c n_o + 3 n_c    (2)

Here, n_c is the number of memory cells, n_i is the number
of inputs fed into the LSTM layer, and n_o is the number of
outputs from the LSTM layer.
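As an illustration, the following sketch evaluates Equation 2 for the parameter settings that appear later in Table I; the function name is ours and the calculation covers a single LSTM layer.

```python
def lstm_weight_count(n_c, n_i, n_o):
    """Number of weights W in one LSTM layer per Equation 2 (Sak et al. [30])."""
    return 4 * n_c * n_c + 4 * n_i * n_c + n_c * n_o + 3 * n_c

# Values from Table I, Area 1 (per LSTM layer):
w_1_1 = lstm_weight_count(n_c=50, n_i=1, n_o=1)     # single-sensor (1-1) model
w_n_n = lstm_weight_count(n_c=200, n_i=33, n_o=33)  # multi-sensor (n-n) model
w_m_n = lstm_weight_count(n_c=150, n_i=10, n_o=33)  # multi-sensor (m-n) model, fewer inputs
```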
III. TRAFFIC PREDICTION
In this section, we describe the input data and the
prediction models used in our work.
A. Time Series Data
Our input data consist of different traffic parameters
measured by the sensors placed in the lanes of highways.
These sensors record the flow q and speed v of the vehicles
passing them per minute. The density k is computed
from these parameters using Equation 1. The density can be
presented as a time series, as shown in Figure 3
(a) and (b).
Figure 3: Density values for a sensor per minute. (a) One month of data. (b) One week of data.
We can see that there is a pattern in the weekly data from
a single sensor shown in Figure 3 (b), where the high peaks
are weekdays and the low ones are weekends. Similarly, there
is a pattern with respect to the time of day. We want
our neural network to learn this density pattern over previous
time stamps and make predictions for future time stamps. To
achieve this, we use LSTMs to remember the pattern in the data.
Since traffic exhibits spatial dependency, we take
neighbouring sensors into account when learning the traffic
behaviour. To do this, we partition the highway network
into areas containing long road stretches and junctions.
Next, we deploy our models over these partitions. Details
about choosing the neighbouring sensors and partitioning the
highway are given in Section IV.
B. Model Design
Figure 4: LSTM architectures. (a) Normal LSTM architecture. (b) Stacked LSTM architecture.
We propose three prediction models: 1) the (1-1) single-
sensor model, which takes into account only one sensor and
predicts traffic for the location of that sensor; 2) the (n-n)
multi-sensor model, which considers n sensors in a given
area of the road and gives predictions for the locations of all
n sensors; and 3) the (m-n) multi-sensor model, which uses
only m significant sensors from an area to make predictions
for all n sensors. The detailed working of these models is
explained in Section V.
Our models use deep RNNs, which capture the complex non-
linear relations in the data more efficiently than a simple
RNN by making use of hierarchical layers [31].
Stacked LSTMs refer to an architecture where multiple
LSTM layers are placed on top of each other, as shown in
Figure 4 (b), to give a more powerful and deeper network
than the conventional architecture in Figure 4 (a).
For estimating traffic density, we empirically found
that the stacked architecture improves our results compared
to the normal architecture. We used two layers in our model;
more than two layers did not improve the accuracy due to
over-fitting.
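For concreteness, the following is a minimal Keras sketch of the two-layer stacked LSTM architecture described above. The layer sizes follow Table I, but the choice of the Adam optimizer, the mean-squared-error loss, and a dense output layer are our assumptions rather than settings reported in the paper.

```python
# Minimal sketch of the stacked (two-layer) LSTM regressor of Figure 4 (b).
# Assumes the Keras Sequential API; hyperparameters below are illustrative.
from keras.models import Sequential
from keras.layers import LSTM, Dense

def build_stacked_lstm(look_back, n_inputs, n_outputs, memory_blocks=200, horizon=1):
    model = Sequential()
    # First LSTM layer returns the full sequence so it can feed the second layer.
    model.add(LSTM(memory_blocks, return_sequences=True,
                   input_shape=(look_back, n_inputs)))
    # Second LSTM layer returns only its final hidden state.
    model.add(LSTM(memory_blocks))
    # One predicted density value per output sensor and per predicted time step.
    model.add(Dense(n_outputs * horizon))
    model.compile(optimizer="adam", loss="mse")
    return model

# e.g., the (n-n) model for Area 1: 33 input sensors, 33 output sensors.
model = build_stacked_lstm(look_back=10, n_inputs=33, n_outputs=33, memory_blocks=200)
```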
The input density data that we feed into the network is
represented in the form of a space-time window. Consider
an area of the highway containing n sensors, labelled
S1, S2, S3, ..., Sn. We take a look-back of L time stamps.
If t is the current time stamp, then a look-back of L
time stamps means the previous density values at
t, t-1, t-2, ..., t-L. Figure 5 shows the input data
representation, where each entry k_{t,s} denotes the density
value of a sensor s, i.e., S1, S2, S3, ..., Sn, at time t,
i.e., t, t-1, t-2, ..., t-L.
Figure 5: Input data representation.
The neural network is trained to predict the density
of the respective sensors at time stamps
t+1, t+2, t+3, ..., t+P, where P controls the prediction
interval. After experimenting with different values of L, we
chose a value of 10 min. A smaller look-back provided too
little information and resulted in lower accuracy, while a
larger look-back increased the input size without improving
the results.
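The sketch below shows one possible way to build the space-time windows of Figure 5 from a density matrix of per-minute readings; the array layout and helper name are our own choices, assuming one row per minute and one column per sensor.

```python
import numpy as np

def make_windows(density, look_back=10, horizon=1):
    """Build inputs/targets from a (T, n_sensors) density matrix.

    X[i] holds densities of all sensors over the `look_back` past minutes,
    Y[i] holds densities of all sensors over the next `horizon` minutes,
    flattened to match a dense output layer."""
    X, Y = [], []
    for t in range(look_back, len(density) - horizon + 1):
        X.append(density[t - look_back:t])              # shape (look_back, n_sensors)
        Y.append(density[t:t + horizon].reshape(-1))    # shape (horizon * n_sensors,)
    return np.array(X), np.array(Y)

# e.g., one month of per-minute readings from 33 sensors: density.shape == (44640, 33)
```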
IV. EXPERIMENTAL METHODOLOGY
This experimental work focuses on evaluating the different
prediction models that we have proposed. Our experiments
are designed to answer the following general questions:
- Accuracy: How accurately can road traffic be estimated using neural networks?
- Accuracy Refinement: Can the accuracy be improved by considering neighbouring sensors?
- Execution Time: Can the execution time (the training time and prediction time) be improved by reducing the complexity of a neural network?
- Scalability: How can the prediction models be deployed over the highway network?
Below, we describe the dataset we use, followed by the
implementation and the metrics that we measure.
A. Dataset
We use a real-world traffic dataset from the Swedish
Transport Administration [32] that consists of readings from
sensors placed on Stockholm highways. Each lane of the
highway contains sensors that are separated by a few hundred
meters. We have used one month of data, which consists of
per-minute sensor readings during that month, i.e., a total of
44,640 minutes of readings, as shown in Figure 3 (a). The data is
further split into 70% training, 15% validation, and 15% test
data. The entire highway network of Stockholm for which
we have sensor data is shown in Figure 6 (a). This
highway network consists of long road stretches connected
by different junction points. We took one long
stretch and one complicated junction for our experiments.
Figure 6 (b) shows the area with the long stretch and Figure 6
(c) shows the area with the junction.
Figure 6: Sensors placed on Stockholm highways. (a) Stockholm highway network. (b) Area 1: long road stretch. (c) Area 2: triangular junction.
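As a sketch of the data preparation, the split below keeps the stated 70/15/15 proportions; preserving the chronological order of the samples is our assumption, since the paper only reports the ratios.

```python
def chronological_split(X, Y, train_frac=0.70, val_frac=0.15):
    """Split windowed samples into training, validation, and test sets in time order."""
    n = len(X)
    i_train = int(n * train_frac)
    i_val = int(n * (train_frac + val_frac))
    return ((X[:i_train], Y[:i_train]),
            (X[i_train:i_val], Y[i_train:i_val]),
            (X[i_val:], Y[i_val:]))
```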
B. Implementation
In our experiments, we used a system with an Intel(R)
Core(TM) i7-4980HQ CPU @ 2.80 GHz, 16 GB of RAM,
and macOS 10.13 High Sierra. We built our machine
learning models using Python 3.6.1 with the Keras 2.0.9
and TensorFlow 1.3.0 libraries.
C. Metrics
We evaluate the following metrics for our models:
- Accuracy: We evaluate the accuracy of prediction models by computing the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) between the predicted and actual traffic density time series.
- Execution Time: We evaluate the execution time of models by measuring their training time and prediction time.
- Estimation Interval: We evaluate the change in accuracy of prediction models for different time intervals.
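For reference, the two accuracy metrics are computed as follows; this is a plain NumPy sketch of the standard definitions.

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Square Error between actual and predicted density series."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted density series."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs(actual - predicted)))
```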
V. PREDICTION MODELS
In this work, we propose different models for short-term
traffic prediction. For every model we discuss the following
parameters: 1) the number of sensors the model covers
for prediction, i.e., the output units of the model; 2) the
computational complexity of the model; 3) the input units
used by the model; and 4) the number of memory blocks
required by the model's LSTM network. Table I contains the
values of these parameters for Area 1 (long road stretch) and
Area 2 (triangular junction), shown in Figure 6 (b) and (c).
The number of memory blocks mentioned in Table I
is for a single LSTM layer, and our models have two
stacked LSTM layers. These memory-block counts were chosen
empirically: we pick the number of memory blocks beyond
which the accuracy stops increasing.
We categorize our models into three types based on the
aforementioned criteria.
A. Single Sensor (1-1) Model
This model predicts the traffic density for a single sensor.
Since both the input and the output of this model are the
traffic density time series of a single sensor, it is less
complicated: the LSTM network has to deal with only one
time series. Figure 7 shows the single sensor model, where
the input density is taken from one sensor S1 to estimate
the future density. In this simple model, the prediction
depends only upon the readings of the single sensor, without
considering any information from neighbouring sensors.

Table I: Parameters for different prediction models.
                        Memory    Area 1             Area 2
  Model                 blocks    Input   Output     Input   Output
  Single Sensor (1-1)     50        1       1          1       1
  Multi-Sensor (n-n)     200       33      33         20      20
  Multi-Sensor (m-n)     150       10      33          8      20
Figure 7: Single Sensor (1-1) Model.
Experimental Setup: We consider a random sample of
sensors from Area 1 and Area 2, shown in Figure 6 (b)
and (c), for the single sensor model. The model is applied to
each sensor, and the execution time, in terms of training
and prediction time, is measured. Next, the accuracy for
different estimation intervals, i.e., 10 min, 20 min, and 30 min,
is computed. We measure the accuracy as the Root Mean
Square Error (RMSE) and Mean Absolute Error (MAE). We
compare our model (LSTM-2, with two stacked LSTM layers)
with classical statistical baseline models, namely Autoregression
(AR), Autoregressive Integrated Moving Average
(ARIMA) [3], and Support Vector Regression (SVR) [33], and
with neural-network-based models, namely a Recurrent Neural
Network (RNN) with two layers, a Feed-Forward Neural
Network (FFN) with two layers, and LSTM-1 with a single
LSTM layer.
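As an example of one of the classical baselines, the sketch below fits SVR [33] on lagged density values of a single sensor; the lag features, hyperparameters, and train/test split are illustrative assumptions, since the paper does not report the baseline configurations.

```python
# Illustrative SVR baseline on a single sensor's density series (scikit-learn).
import numpy as np
from sklearn.svm import SVR

def svr_baseline(series, look_back=10, horizon=10):
    """Fit SVR on lagged densities and predict the value `horizon` minutes ahead."""
    X, y = [], []
    for t in range(look_back, len(series) - horizon):
        X.append(series[t - look_back:t])   # the last `look_back` density values
        y.append(series[t + horizon])       # the density `horizon` minutes later
    X, y = np.array(X), np.array(y)
    split = int(0.7 * len(X))               # simple chronological train/test split
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
    model.fit(X[:split], y[:split])
    return model.predict(X[split:]), y[split:]
```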
Experimental Results: Table II shows the RMSE and
MAE values for different time intervals. As the results
indicate, the stacked LSTM network (named LSTM-2
in the table) performs better than the other prediction models.
The error increases as the estimation interval grows.
Next, we want to evaluate whether the accuracy improves
by taking multiple sensors into account during prediction.

Table II: Accuracy of different models for a single sensor.
             10 min          20 min          30 min
  Model      RMSE    MAE     RMSE    MAE     RMSE    MAE
  AR         6.87    5.9     7.46    6.31    8.09    6.85
  SVR        8.30    7.68    9.19    8.71   10.61   10.19
  ARIMA      7.67    6.74    9.40    8.34   10.86    9.81
  RNN        5.60    2.63    6.65    3.32    7.75    3.39
  FFN        5.62    2.46    6.87    3.46    7.86    3.34
  LSTM-1     5.63    2.41    7.13    3.65    7.71    3.73
  LSTM-2     5.49    2.41    6.62    3.07    7.60    3.45
B. Multi-Sensor (n-n) Model
In the multi-sensor (n-n) model, we consider an area of the
highway and predict the density values for the sensors that
fall within that area. In this case, the prediction is made by
taking the neighbouring sensors into account, which provide
more data for prediction. Figure 8 (a) and (b)
show a road stretch with 10 sensors on it; all of these sensors
are taken as input for this model.
This model is complex because both the number of inputs and
the number of outputs are equal to the total number of
sensors that fall within the area. The more sensors there are,
the more memory blocks are required and the greater the
complexity of the model, according to Eq. 2.
Figure 8: Multi-sensor (n-n) model. (a) Road stretch. (b) Neural network.
Experimental Setup: We take the sensors that fall in Area 1
(long road stretch) and Area 2 (triangular junction), shown
in Figure 6 (b) and (c). For Area 1, we consider the highway
path going north. Area 2 is complicated because it
contains vehicles going in different directions, making it
hard for the neural network to learn the relation between
sensors. Our experiments show an RMSE of up to 10 without
partitioning Area 2, which is reduced by a factor of 5
after partitioning, i.e., to an RMSE of about 2. Therefore, we
partition this area into paths consisting of cars travelling in
the same direction. One such path is shown in Figure 9:
the red path is for cars going north from the west and the east.
We compare our model (LSTM-2, with two stacked LSTM layers)
with neural-network-based models, namely a Recurrent
Neural Network (RNN) with two layers, a Feed-Forward
Neural Network (FFN) with two layers, and LSTM-1 with a
single LSTM layer. We did not use the statistical models because
of their poor accuracy in the previous experiment
(Section V-A) and the complexity of implementing their
multivariate variants.
Figure 9: Area 2: Path in the triangular junction.
The road section of each considered path can be divided
further into three sections: 1) the entrance, which consists of
the first two groups of sensors (a group contains the sensors
from all lanes placed at the same distance reference); 2)
the exit, which consists of the last two groups; and 3) the middle,
which contains all the remaining sensors. We evaluate our model
over these road sections.
Experimental Results: Tables III and IV show the average
RMSE and MAE values for Area 1 (long road stretch)
and Area 2 (triangular junction) over different estimation
intervals. According to our results, the stacked LSTM with two
layers (LSTM-2) has better accuracy in most cases
compared to the other models.
In order to examine how the accuracy is distributed along an
area, we measure the prediction accuracy at the entrance, middle,
and exit sections of the areas. Figures 10 and 11 show the RMSE for
Area 1 (long road stretch) and Area 2 (triangular junction)
over 10 min, 20 min, and 30 min estimation intervals. For
both areas, the error is highest at the entrance of an area,
followed by the middle section, whereas
the exit section has the lowest error. Furthermore, the error
increases with the estimation interval, and this increase is
larger for the entrance section than for the other
sections. The exit section has the lowest prediction error
because the model has more information for
prediction towards the end of the area. The stacked LSTM model
(LSTM-2) has a lower error than the other models.
Table III: Prediction accuracy of different models for multiple sensors in Area 1 (long road stretch).
             10 min          20 min          30 min
  Model      RMSE    MAE     RMSE    MAE     RMSE    MAE
  RNN        3.33    2.35    4.6     3.41    5.0     3.49
  FFN        3.36    2.36    4.4     3.39    5.2     3.44
  LSTM-1     3.14    2.22    3.8     2.67    4.1     3.29
  LSTM-2     2.94    2.06    2.94    2.06    3.22    2.24
Table IV: Prediction accuracy of different models for multiple sensors in Area 2 (triangular junction).
             10 min          20 min          30 min
  Model      RMSE    MAE     RMSE    MAE     RMSE    MAE
  RNN        2.48    1.49    2.52    1.51    2.67    1.07
  FFN        2.48    1.45    2.56    1.52    3.04    1.53
  LSTM-1     2.38    1.40    2.46    1.45    2.57    1.55
  LSTM-2     2.35    1.43    2.45    1.49    2.51    1.52
C. Multi-Sensor (m-n) Model
The multi-sensor (m-n) model is a variant of the multi-sensor
(n-n) model introduced in Section V-B, using the stacked
LSTM (LSTM-2) architecture. Instead of the n sensors that fall in
the area under consideration, we take only m of those n sensors
as input and predict the output for all n sensors. The
m selected sensors include the boundary sensors and the sensors
located at exit and entry points of the highway. The reason
for including those sensors is that they are the most important in
terms of affecting the traffic flow. Intuitively, if we know the
behaviour of cars entering and exiting the highway, we only
have to infer what happens inside the highway. Therefore, we
use the entry and exit sensors as inputs for our neural
network to predict the density for all the sensors. Figure 12 (a)
and (b) show a road stretch with 10 sensors on it; only the
boundary sensors S1, S2, S9, S10 and the sensors located
at entry and exit points, S3 and S8, are taken as input for
this model. In this way, we reduce the complexity of the
neural network by reducing the number of input units and
memory blocks, as per Eq. 2.
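A minimal sketch of the input reduction is shown below for the example of Figure 12, where the boundary sensors S1, S2, S9, S10 and the ramp sensors S3, S8 are kept (0-based indices in the code); the helper name and the reuse of the stacked model defined earlier are our own illustration.

```python
# Keep only the m significant sensors (boundary and entry/exit) in the inputs,
# while the targets still cover all n sensors, as in the (m-n) model.
SIGNIFICANT = [0, 1, 2, 7, 8, 9]   # S1, S2, S3, S8, S9, S10 from Figure 12

def restrict_inputs(X, significant=SIGNIFICANT):
    """X has shape (samples, look_back, n_sensors); select the significant columns."""
    return X[:, :, significant]

# e.g., reusing the builder sketched earlier (an assumption, not the paper's code):
# model = build_stacked_lstm(look_back=10, n_inputs=len(SIGNIFICANT),
#                            n_outputs=10, memory_blocks=150)
```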
Experimental Setup: The experimental setup for the
multi-sensor (m-n) model is similar to the one used for the
multi-sensor (n-n) model in Section V-B. The purpose of this
experiment is to evaluate whether we can reduce the complexity of
the LSTM network by limiting the number of input sensors
without compromising the prediction accuracy.
Experimental Results: Figure 13 (a) and (b) show the RMSE
of the (m-n) model for Area 1 and Area 2 over 10 min, 20
min, and 30 min estimation intervals. The error increases
with the estimation interval, and the entrance has the
highest error compared to the other sections.
Congestion Detection: Our density predictions are useful
for detecting congestion in road traffic. From our experiments,
we find the critical density k_critical (see Figure 2) to
be between 35 and 40 vehicles per km. Using our model, we
were able to correctly predict congestion, i.e., density values
near k_jam (see Figure 2), 94% of the time.
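The following sketch turns the predicted densities into congestion flags using the critical-density range reported above; taking the lower end of that range as the alarm threshold is our assumption.

```python
import numpy as np

K_CRITICAL = 35.0   # vehicles per km, lower end of the 35-40 veh/km range found above

def congestion_flags(predicted_density, threshold=K_CRITICAL):
    """Flag every sensor/time step whose predicted density exceeds the threshold."""
    return np.asarray(predicted_density) > threshold
```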
D. Comparison
Now that we know the accuracy of our models, we want
to know how fast they perform. For this reason, we
compared the execution time of our models by measuring their
training time and prediction time, shown in Figure 14. The
single sensor (1-1) model is fast because it considers
one sensor at a time; however, running several such models
together for multiple predictions over the limited resources of
a system may take longer. The (m-n) multi-sensor
model takes less training and prediction time than
the (n-n) multi-sensor model. This is because the (m-n)
model has fewer input and memory units, which reduces its
complexity and improves its execution time.
E. Discussion
Our experimental results show that using information from
neighbouring sensors gives higher prediction accuracy than
using data from a single sensor. This is because the neural network
is fed with more information: it learns the traffic behaviour
better by using sensors placed together along a path of
the highway. The readings of the sensors placed at the entrance of
the highway indicate the traffic conditions that will propagate
towards the middle and exit sensors. In other words, the model
learns more information for the middle and exit sections.
Therefore, the predictions for these sections are better than
for the entrance section. Additionally, we observed that reducing
the complexity of a model by reducing its input units
and memory units improves its execution time. Such a lower-
complexity model has strong potential to be applied in the
edge computing domain in the future.
VI. CONCLUSION AND FUTURE WORK
Our work comprises three prediction models for estimating
traffic density using stacked LSTM neural networks.
We have implemented and compared these models over
different sections of Stockholm highways using real datasets.
Our multi-sensor (m-n) model, which uses input readings
from only m significant sensors rather than all n sensors,
predicts the density for all n sensors with acceptable accuracy,
comparable to the multi-sensor (n-n) model, which takes
input from all n sensors. Initially, all sensors are required to
train the model; after training, only the significant sensors
need to be kept to predict over all sensors with acceptable
accuracy. To train the model, temporary sensors can be
deployed together with the significant sensors and then
removed or shut down. This allows reducing
the number of sensors and saving infrastructure cost.
Figure 10: RMSE of the (n-n) models for Area 1 (long road stretch), shown for the entrance, middle, and exit sections over 10 min, 20 min, and 30 min prediction intervals. (a) RNN. (b) FFN. (c) LSTM-1. (d) LSTM-2.
Figure 11: RMSE of the (n-n) models for Area 2 (triangular junction), shown for the entrance, middle, and exit sections over 10 min, 20 min, and 30 min prediction intervals. (a) RNN. (b) FFN. (c) LSTM-1. (d) LSTM-2.
Figure 12: Multi-sensor (m-n) model. (a) Road stretch. (b) Neural network.
Our future work includes investigating how accuracy
depends on the size of the road segments and the number of
sensors. We will also study how aggregation levels impact
accuracy. We expect that the fine-grained aggregation used
in this paper captures more detail but is more challenging
to predict due to higher noise levels, compared to a smoother,
coarse-grained aggregation that captures only general trends.
We also intend to develop a method to optimally partition the
road network and place sensors in order to achieve high
prediction accuracy while lowering the infrastructure cost.
Figure 13: RMSE of the (m-n) model for different road sections (entrance, middle, exit) over 10 min, 20 min, and 30 min prediction intervals. (a) RMSE for Area 1: long road stretch. (b) RMSE for Area 2: triangular junction.
Figure 14: Execution time comparison (training and prediction time) of the (1-1), (n-n), and (m-n) models.
ACKNOWLEDGMENT
This work was supported by the project BADA: Big Automotive
Data Analytics in the funding program FFI: Strategic
Vehicle Research and Innovation (grant 2015-00677)
administered by VINNOVA, the Swedish government agency
for innovation systems; by the project BIDAF: Big Data
Analytics Framework for a Smart Society (grant 20140221)
funded by KKS, the Swedish Knowledge Foundation; and by
the Erasmus Mundus Joint Doctorate in Distributed Computing
(EMJD-DC) programme funded by the Education,
Audiovisual and Culture Executive Agency (EACEA) of the
European Commission under FPA 2012-0030.
REFERENCES
[1] C. F. Daganzo, "The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[2] A. Skabardonis and N. Geroliminis, "Real-time estimation of travel times on signalized arterials," Tech. Rep., 2005.
[3] M. S. Ahmed and A. R. Cook, "Analysis of freeway traffic time-series data by using Box-Jenkins techniques," Transportation Research Record: Journal of the Transportation Research Board, no. 722, 1979.
[4] B. M. Williams and L. A. Hoel, "Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results," Journal of Transportation Engineering, vol. 129, no. 6, pp. 664–672, 2003.
[5] N. Juri, A. Unnikrishnan, and S. Waller, "Integrated traffic simulation-statistical analysis framework for online prediction of freeway travel time," Transportation Research Record: Journal of the Transportation Research Board, no. 2039, pp. 24–31, 2007.
[6] P. E. Pfeifer and S. J. Deutsch, "A three-stage iterative procedure for space-time modeling," Technometrics, vol. 22, no. 1, pp. 35–47, 1980.
[7] S. Clark, "Traffic prediction using multivariate nonparametric regression," Journal of Transportation Engineering, vol. 129, no. 2, pp. 161–168, 2003.
[8] H. Sun, H. X. Liu, H. Xiao, R. R. He, and B. Ran, "Short term traffic forecasting using the local linear regression model," in 82nd Annual Meeting of the Transportation Research Board, Washington, DC, 2003.
[9] M. G. Karlaftis and E. I. Vlahogianni, "Statistical methods versus neural networks in transportation research: Differences, similarities and some insights," Transportation Research Part C: Emerging Technologies, vol. 19, no. 3, pp. 387–399, 2011.
[10] A. Stathopoulos and M. G. Karlaftis, "A multivariate state space approach for urban traffic flow modeling and prediction," Transportation Research Part C: Emerging Technologies, vol. 11, no. 2, pp. 121–135, 2003.
[11] B. Williams, "Multivariate vehicular traffic flow prediction: Evaluation of ARIMAX modeling," Transportation Research Record: Journal of the Transportation Research Board, no. 1776, pp. 194–200, 2001.
[12] M. S. Dougherty, H. R. Kirby, and R. D. Boyle, "The use of neural networks to recognise and predict traffic congestion," Traffic Engineering & Control, vol. 34, no. 6, 1993.
[13] P. Vythoulkas, "Alternative approaches to short term traffic forecasting for use in driver information systems," Transportation and Traffic Theory, vol. 12, pp. 485–506, 1993.
[14] H. Zhang, "Recursive prediction of traffic conditions with neural network models," Journal of Transportation Engineering, vol. 126, no. 6, pp. 472–481, 2000.
[15] Y. Bengio et al., "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[16] N. G. Polson and V. O. Sokolov, "Deep learning for short-term traffic flow prediction," Transportation Research Part C: Emerging Technologies, vol. 79, pp. 1–17, 2017.
[17] X. Ma, H. Yu, Y. Wang, and Y. Wang, "Large-scale transportation network congestion evolution prediction using deep learning theory," PLoS One, vol. 10, no. 3, p. e0119044, 2015.
[18] W. Huang, G. Song, H. Hong, and K. Xie, "Deep architecture for traffic flow prediction: Deep belief networks with multitask learning," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191–2201, 2014.
[19] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: A deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2015.
[20] Y. Duan, Y. Lv, W. Kang, and Y. Zhao, "A deep learning based approach for traffic data imputation," in Intelligent Transportation Systems (ITSC), IEEE 17th International Conference on. IEEE, 2014, pp. 912–917.
[21] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, "Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction," Sensors, vol. 17, no. 4, p. 818, 2017.
[22] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, "Long short-term memory neural network for traffic speed prediction using remote microwave sensor data," Transportation Research Part C: Emerging Technologies, vol. 54, pp. 187–197, 2015.
[23] Z. Zhao, W. Chen, X. Wu, P. C. Chen, and J. Liu, "LSTM network: A deep learning approach for short-term traffic forecast," IET Intelligent Transport Systems, vol. 11, no. 2, pp. 68–75, 2017.
[24] Y. Wu and H. Tan, "Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework," arXiv preprint arXiv:1612.01022, 2016.
[25] M. Fouladgar, M. Parchami, R. Elmasri, and A. Ghaderi, "Scalable deep traffic flow neural networks for urban traffic congestion prediction," in International Joint Conference on Neural Networks (IJCNN). IEEE, 2017, pp. 2251–2258.
[26] G. Whitham, "On kinematic waves II. A theory of traffic flow on long crowded roads," in Proc. R. Soc. Lond. A, vol. 229, no. 1178. The Royal Society, 1955, pp. 317–345.
[27] P. I. Richards, "Shock waves on the highway," Operations Research, vol. 4, no. 1, pp. 42–51, 1956.
[28] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
[29] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[30] H. Sak, A. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
[31] M. Hermans and B. Schrauwen, "Training and analysing deep recurrent neural networks," in Advances in Neural Information Processing Systems, 2013, pp. 190–198.
[32] Trafikverket, https://www.trafikverket.se/, 2010.
[33] H. Drucker, C. J. Burges, L. Kaufman, A. J. Smola, and V. Vapnik, "Support vector regression machines," in Advances in Neural Information Processing Systems, 1997, pp. 155–161.