IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 1
A Graph Convolutional Stacked Bidirectional
unidirectional-LSTM Neural Network for Metro
Ridership Prediction
Pengfei CHEN, Xuandi FU, Xue WANG
Abstract—Timely precise metro ridership forecasting is helpful
to reveal real-time traffic demand, which is a crucial but challenging task in modern traffic management. Given the complex spatial correlation and temporal variation of riding behaviour in a metro system, deep learning algorithms have been widely applied owing to their superior performance in capturing spatio-temporal features. However, current deep learning models utilize regular convolutional operations, which can barely provide satisfactory accuracy because they either ignore the realistic topology of a traffic network or fail to capture representative spatio-temporal patterns. To further improve the accuracy of metro ridership prediction, this study proposes a parallel-structured deep learning model that consists of a Graph Convolution Network and a stacked Bidirectional unidirectional Long short-term Memory network (GCN-SBULSTM). The GCN module regards a metro network as a structured graph, and a K-hop matrix, which integrates the travel distance, population flow, and adjacency, is introduced to capture the dynamic spatial correlation among metro stations. The SBULSTM module considers both backward and forward states of ridership time series and can learn complex temporal features with stacked recurrent layers. Experiments are conducted on three real-life metro ridership datasets to demonstrate the effectiveness of the proposed model. Compared with state-of-the-art prediction models, GCN-SBULSTM presents better performance in multiple scenarios and largely improves the efficiency of the training process.
Index Terms—Deep learning model, traffic prediction, spatio-
temporal dependency, origin-destination flow, parallel structure.
Manuscript received ... The research is not funded by a specific project grant. P. CHEN and X. FU contribute equally to this article. (Corresponding author: Pengfei CHEN)
P. CHEN and X. WANG are with the School of Geospatial Engineering and Science, Sun Yat-Sen University, Guangzhou 510275, Guangdong, China, and also with the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519082, Guangdong, China. (e-mail: chenpf9@mail.sysu.edu.cn, wangxue25@mail.sysu.edu.cn)
X. FU is with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA. (e-mail: xuandif@andrew.cmu.edu)
I. INTRODUCTION
MULTI-SCALE precise traffic forecasting is one of the most fundamental and crucial tasks for urban transportation control and management, where metro ridership prediction has attracted increasing attention from both the academic community and the authorities because of the vital position of the subway in the urban public transportation system [1], [2]. It is a challenging task to make collaborative spatial-temporal predictions for metro ridership due to the complicated spatial structure of traffic networks, temporal variations, and uncertainty inherited from human behaviour. Recently, owing to the rapid development of artificial intelligence, computation power and abundant traffic data supported by novel collection and storage techniques, booming deep learning approaches have reshaped prediction-related research and promoted significant progress in traffic forecasting [3]–[5].
Deep learning methods have been reported to outperform traditional statistical models in many applications, especially in time series forecasting [6]. Typical statistical models, such as auto-regressive integrated moving average (ARIMA) [7] and its variants [8], [9], are commonly adopted for single time series prediction, but they ignore the potential dependency among multiple time series under relatively complex traffic conditions. In contrast, deep learning approaches employ multiple processing layers and allow the models to learn abstracted features and non-linear dependencies from large-scale traffic datasets, which makes deep learning methods a major solution in current traffic forecasting.
Given their well-acknowledged performance in time series forecasting, Recurrent Neural Networks (RNN) and their variants, such as Long Short-term Memory (LSTM) and Gated Recurrent Unit (GRU), are widely employed in mainstream studies for traffic forecasting [10]–[12]. However, RNN-based models employ only the temporal features in travel behaviour, while ignoring the underlying spatial dependencies within a traffic network [13]. To capture the spatial dependencies in traffic data, a number of studies utilize Convolutional Neural Networks (CNN) to build prediction models, in which a traffic network is commonly transformed into an image based on its geographic locations [4], [14], [15]. However, CNN-based models only consider the absolute distance relationship among stations in 2D Euclidean space, while the non-Euclidean structural features of traffic networks, such as connectivity, are not fully learned. Also, due to the predefined image size, CNN-based models are prone to generating distorted spatial relationships, which limits their adaptation to the varying structure of traffic networks in the real world [16].
Compared with CNN, Graph Convolutional Network (GCN) provides a more feasible way to model spatial dependencies within a traffic network. Given the inherent graph structure of a traffic network, GCN is naturally capable of preserving realistic topology and capturing the dependencies between metro stations by aggregating nodes' information through graph convolution [17]. However, the effective construction of graphs and the integration of GCN with existing neural networks remain two open problems in current studies. For the first issue, most related works directly adopt physical topologies within a network, such as the adjacency, to build graphs [18]–[20]. Nevertheless, given the latent spatio-temporal dependencies implied in traffic data, such as the travel distance and population flow between traffic sites, some virtual graphs can be built based on prior knowledge to improve the effectiveness of GCN [21], [22]. As for the integration of GCN, many related studies combine GCN and RNN models to build a joint prediction model, so as to capture both spatial and temporal features for forecasting problems. For example, Cui et al. [23] designed an architecture that uses the output of multiple GCNs as the input of LSTM for traffic speed prediction. Jin et al. [24] fused the output of GCN and a variational auto-encoder model and fed the result into a Seq2seq GRU module to predict urban ride-hailing demand. However, the features extracted by such a sequential structure can be distorted when converting the convolution results, which might lead to information loss and uncertain predictions [5].
Based on the aforementioned problems, we propose a
parallel GCN and Stacked Bidirectional Unidirectional LSTM
model (GCN-SBULSTM) for metro ridership forecasting. In
GCN-SBULSTM, both physical topology and virtual graphs,
including adjacency, travel distance and population flow
among metro stations, are used to construct the GCN mod-
ule, and a K-hop weight matrix is introduced to adaptively
determine the extent of neighbor information to be considered
in each graph. The SBULSTM module is used to handle
temporal dependencies, which is capable of capturing long-
term dependencies by considering both the backward and
forward correlations in ridership time series. This architecture
is expected to inherit the merits from both GCN in extracting
realistic spatial dependencies and SBULSTM in capturing
temporal features, while reducing their interference using a
parallel instead of a sequential structure.
In summary, the main contributions of this study include:
• Propose a new deep learning architecture composed of two parallel modules considering both spatial and temporal dependencies for metro ridership prediction;
• Design a novel K-hop weight matrix, which integrates adjacency, travel distance, and population flow among metro stations, to represent metro networks, and incorporate the matrix into the GCN module to enhance the extraction of realistic spatial dependency;
• Integrate stacked bidirectional recurrent layers into the model, which improves its ability to capture long-term context and generate a higher level of representation of sequence data.
II. RELATED WORK
A. Machine Learning for Traffic Forecasting
In early machine learning studies on traffic forecasting, state data from different traffic sites are usually organized by collection timestamp as a batch of time series, and RNN-based models, such as GRU and LSTM, are widely used given their ability to remember important information about the sequential input through internal memory. For instance, Yu et al. [10] combined deep LSTM and a stacked autoencoder to capture both the temporal and static features in traffic data for traffic flow forecasting, and experimental results on real-world data showed that their model can significantly improve predictive performance, especially under extreme conditions such as peak-hour and post-accident scenarios. Considering the periodicity of metro riding behaviour arising from the regularity of people's daily activities, Cui et al. [11] designed a stacked bidirectional and unidirectional LSTM framework, which considers both forward and backward dependencies of traffic data, for traffic speed prediction over whole urban traffic networks. Those studies have demonstrated the superiority of RNN in extracting temporal features for traffic forecasting. However, it is challenging to use solely RNN-based models to maintain the spatial features and topological information in traffic data, which limits their effectiveness in practical applications [13].
Noticing the promising achievements of CNN in computer vision [25], [26], many studies generalize CNN to learn spatial dependencies in Euclidean space. For example, Zhang et al. [4] proposed a deep neural network to predict citywide crowd flow, in which multiple CNN layers were applied on traffic demand heatmaps to extract spatial features. This model was further developed in [15] by integrating residual learning to capture large-scale spatial dependencies. By sequentially connecting CNN and LSTM networks, Yu et al. [27] proposed a spatio-temporal recurrent convolution network (SRCN) for traffic speed forecasting. Yao et al. [28] developed a Deep Multi-View Spatial-Temporal Network (DMVST-Net) for taxi demand prediction, which jointly considered the temporal, spatial, and semantic relations using LSTM, CNN and graph embedding, respectively. To reduce the interference of sequentially connected LSTM and CNN modules, Ma et al. [5] proposed a parallel CNN-BLSTM framework for metro ridership prediction. Their results also showed that the parallel structure could significantly improve prediction accuracy. However, these models inherit the drawback of CNN, which ignores topology information within a traffic network, and this inevitably hampers their performance given the increasing complexity of traffic patterns [16].
B. Graph Convolution Networks
Over the last few years, the emergence of GCN has refreshed the way of modelling traffic data. By treating a traffic network as a graph instead of the predefined image in CNN, GCN can largely preserve the realistic topological information and thus benefits the extraction of comprehensive spatial features [17], [19]. Also, GCN can greatly preserve the globality of metro networks by conducting convolution on the whole structured graph, which theoretically outperforms CNN, which can only capture neighbouring spatial patterns due to its limited kernel window size.
By combining with temporal dynamics, GCN-based models have made significant progress in traffic forecasting problems. For instance, Li et al. [18] proposed a diffusion convolutional recurrent neural network (DCRNN), in which the traffic flow was modelled as a diffusion process on a directed graph and spatial dependency was captured using bidirectional random walks on the graph. Wu et al. [29] developed a novel architecture named Graph-WaveNet, which adopted an adaptive adjacency matrix through an embedding technique to capture hidden spatial dependencies. Yu et al. [30] combined graph convolution and gated temporal convolution to capture precise spatio-temporal correlations for traffic speed forecasting, which also enhanced training efficiency with a reduced number of parameters. Lu et al. [21] used a dynamic weighted graph to model the road relationship and developed an adaptive graph gate convolution network for traffic flow prediction. Beyond the single physical topology of traffic networks, domain knowledge has also been incorporated in recent studies to guide graph construction. For example, Du et al. [31] extracted virtual stations using a density-peak based clustering method and developed a dynamic convolution neural network to predict traffic demands. Liu et al. [22] established a Physical-Virtual Collaboration Graph Network (PVCGN), which incorporates the connection among metro stations, ridership similarity, and inter-station passenger flow into a Graph Convolution Gated Recurrent Unit for spatio-temporal dependency learning. However, due to the large number of parameters in PVCGN, its efficiency is relatively low compared to other models.
III. METHODOLOGY
In this section, we formalize the learning problem of metro ridership forecasting and elaborate on the motivation and detailed steps for the construction and combination of the GCN and SBULSTM modules.
A. Metro ridership forecasting problem
Metro ridership forecasting is a fundamental spatio-
temporal prediction problem given the spatial correlation and
periodicity of people’s daily riding behaviour. Ridership data
of each metro station are commonly summarized using a
specific time interval and thus forming a batch of time series
for further operation. In this study, our goal is to predict the ridership in the next $m$ time intervals given the historical data in the previous $n$ time intervals. Based on the observations from $s$ metro stations, the input data for our model can be expressed as a matrix $X$:
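$$X = [X_{T-n}, X_{T-n+1}, \cdots, X_{T-1}] = \begin{bmatrix} x^1_{T-n} & x^1_{T-n+1} & \cdots & x^1_{T-1} \\ x^2_{T-n} & x^2_{T-n+1} & \cdots & x^2_{T-1} \\ \vdots & \vdots & \ddots & \vdots \\ x^s_{T-n} & x^s_{T-n+1} & \cdots & x^s_{T-1} \end{bmatrix} \tag{1}$$
where $X_{T-i}$ encodes the ridership vector measured at the $i$th time interval before timestamp $T$, and $x^j$ corresponds to the ridership data of the $j$th station.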
In addition to the raw ridership data, the metro network can be represented by an undirected graph, $G = (V, E)$, where $V$ and $E$ denote the sets of stations and lines in the network, respectively. $V$ encodes the features of nodes, which in this task refers to the ridership time series of each station. $E$ encloses all edges linking two nodes, of which the values can be different based on raw data. For example, in a modern metro system, the riding behaviour is always recorded using smart card transaction logs, so some personal information, such as the card ID and travel path, can be used as valuable supporting information for $E$. For simplicity, we use $I$ to represent these additional data. Therefore, the forecasting problem can be formulated as learning a function $f$:
$$[X_{T-n}, \cdots, X_{T-1}; G; I] \xrightarrow{f} [X_T, \cdots, X_{T+m-1}] \tag{2}$$
The resultant prediction is denoted by $\hat{X} = [X_T, \cdots, X_{T+m-1}]$ in the rest of this paper, where each element is a vector of the $s$ stations' ridership at a future time step.
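As a minimal sketch of how the input and target of Eq. (2) might be assembled, the snippet below slices a ridership array into sliding windows. The function name `make_windows`, the toy data, and the array layout (time steps as rows, stations as columns, i.e. the transpose of the matrix in Eq. (1)) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def make_windows(ridership, n, m):
    """Split a (num_steps, s) ridership array into (input, target) pairs.

    Each input holds the n most recent intervals [X_{T-n}, ..., X_{T-1}];
    each target holds the next m intervals [X_T, ..., X_{T+m-1}].
    """
    num_steps = ridership.shape[0]
    inputs, targets = [], []
    for T in range(n, num_steps - m + 1):
        inputs.append(ridership[T - n:T])    # shape (n, s)
        targets.append(ridership[T:T + m])   # shape (m, s)
    return np.stack(inputs), np.stack(targets)

# Hypothetical toy data: 10 time intervals, s = 3 stations.
ridership = np.arange(30).reshape(10, 3)
X, Y = make_windows(ridership, n=4, m=2)
print(X.shape, Y.shape)
```

Each sample pairs `n` historical intervals with the `m` intervals that immediately follow them, so consecutive samples overlap by `n - 1` steps.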
B. Foundation of the Graph Convolution Network Module
Currently, several strategies have been investigated to build effective graphs for traffic forecasting. Commonly employed are the adjacency matrix [32] and the Laplacian matrix [33], [34]. GCN based on the Laplacian matrix incorporates spectral theory into graph convolution, which is often named spectral graph convolution. The classic GCN encodes the adjacency relationship among nodes to represent arbitrarily structured graphs. It normally utilizes a binary-encoded adjacency matrix $A$ to denote the connectivity among nodes. If nodes $i$ and $j$ are directly connected in the metro network, $A_{ij} = 1$; otherwise $A_{ij} = 0$.
However, within collaborated spatial-temporal prediction,
spatial dependency should be considered dynamically, as it
could vary in different scenarios. For example, ridership of
distant stations may exhibit low correlation within a short
counting period, e.g., 10-minute interval, while the correlation
could significantly increase with the length of a target predic-
tion interval due to the city-scale globality of passenger riding
behaviour. Therefore, simply applying a binary adjacency
matrix or predefined stationary distance is not sufficient to
handle complex scenarios.
To tackle the aforementioned problems, we initialize the GCN module by intuitively taking metro stations as nodes, and define three graphs, namely the travel distance graph, population flow graph, and adjacency graph, to weight the edges.
• Adjacency graph: The connection between metro stations is widely acknowledged to affect the relationship of their ridership [18], [30]. However, the traditional adjacency matrix mostly focuses on directly connected stations, i.e. the 1-order neighboured stations, while the indirect connection among stations is ignored. Therefore, as shown in the first row of Figure 1, a K-hop adjacency matrix $A^k$ is adopted in this study to encode the direct and indirect adjacency relationships among metro stations. Given a constant $k$, each element $A^k_{ij}$ is 1 if stations $i$ and $j$ are neighbours of order at most $k$; otherwise, the element is set to 0. Mathematically:
$$A^k_{ij} = \begin{cases} 1, & \mathrm{hop}(i,j) \le k \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
where $\mathrm{hop}(i,j)$ denotes the least number of steps from station $i$ to $j$.
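The K-hop adjacency matrix of Eq. (3) can be sketched with a breadth-first search over the binary adjacency matrix. This is only an illustration under stated assumptions: the helper names `hop_counts` and `k_hop_adjacency`, the 5-station line network, and the choice to set the diagonal to 1 (a station is 0 hops from itself) are hypothetical and not specified in the text.

```python
import numpy as np
from collections import deque

def hop_counts(adj):
    """Least number of steps between every pair of nodes (breadth-first search)."""
    s = adj.shape[0]
    hops = np.full((s, s), np.inf)
    for src in range(s):
        hops[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if np.isinf(hops[src, v]):      # first visit gives the shortest hop count
                    hops[src, v] = hops[src, u] + 1
                    queue.append(v)
    return hops

def k_hop_adjacency(adj, k):
    """A^k_ij = 1 if station j is within k steps of station i, else 0."""
    return (hop_counts(adj) <= k).astype(int)

# Hypothetical 5-station line: 0 - 1 - 2 - 3 - 4
adj = np.zeros((5, 5), dtype=int)
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1

A2 = k_hop_adjacency(adj, k=2)
print(A2)
```

With `k=1` the result reduces to the ordinary adjacency matrix plus the diagonal; larger `k` progressively admits indirect neighbours.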
[Figure 1 panels: (a) Adjacency graph; (b) Travel distance graph; (c) Population flow graph; (d) K-order neighbor matrix; (e) Travel distance matrix; (f) Series of OD matrices; (g) K-hop adjacency matrix; (h) Distance weight*; (i) Population flow weight. Processing steps shown between panels: Selection, K-hop, Gaussian kernel, Normalization.]
Fig. 1. Illustration of the graphs constituting the K-hop weight matrix used in the GCN module (K=2). *Extremely small values (i.e. smaller than 0.005) are shown as zero.
• Travel distance graph: According to the First Law of Geography and results from previous studies, traffic behaviours that occur close to each other are likely to be related [35]–[37]. For instance, the ridership patterns of neighbouring stations along metro lines can be highly correlated, as passengers within a region may have similar daily travel patterns. Therefore, from the geographic point of view, we take the travel distance as an important factor during the initialization of the GCN module. An example is illustrated in the second row of Figure 1. Specifically, following the definition in [18], we calculate the distance weight matrix $D$ using a Gaussian kernel [38], where the element $D_{ij}$ is calculated as:
$$D_{ij} = \exp\left(-\frac{\mathrm{dist}(v_i, v_j)^2}{\sigma^2}\right) \tag{4}$$
where $\mathrm{dist}(v_i, v_j)$ indicates the shortest travel distance along the metro network between stations $v_i$ and $v_j$, and $\sigma$ is the standard deviation of travel distances.
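A short sketch of the Gaussian kernel in Eq. (4) follows. The function name `distance_weights` and the 3-station distance matrix are hypothetical, and computing $\sigma$ over all entries of the matrix (including the zero diagonal) is an assumption, since the text does not say which entries enter the standard deviation.

```python
import numpy as np

def distance_weights(dist):
    """D_ij = exp(-dist_ij^2 / sigma^2), sigma being the std of the distances."""
    sigma = dist.std()   # assumption: std taken over every entry of the matrix
    return np.exp(-(dist ** 2) / sigma ** 2)

# Hypothetical shortest travel distances (in metres) between 3 stations.
dist = np.array([[0.0, 600.0, 1300.0],
                 [600.0, 0.0, 700.0],
                 [1300.0, 700.0, 0.0]])
D = distance_weights(dist)
print(np.round(D, 2))
```

The weights equal 1 on the diagonal and decay towards 0 as the travel distance grows, so nearby stations contribute more to the graph convolution.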
• Population flow graph: Population flow is a virtual connection between metro stations, which reflects their latent dependencies implied by the regularity of passengers' daily activity. A large population flow should indicate a relatively high dependency between metro stations [39], [40]. However, as the population flow varies over time, a series of population flow matrices is generated to dynamically represent the dependency. As shown in the third row of Figure 1, we extract the origin-destination (OD) flows between metro stations and generate the population flow matrix $F$ through normalization. Each element $F_{ij}$ of matrix $F$ is calculated as:
$$F_{ij} = \frac{1}{2}\left(\frac{N_{ji}}{\sum_{k}^{n} N_{jk}} + \frac{N_{ij}}{\sum_{k}^{n} N_{ik}}\right) \tag{5}$$
where $N_{ij}$ is the number of passengers travelling from station $i$ to station $j$ during a specific timespan. In this work, this timespan is defined as the period from the earliest historic frame to the last one. In addition, considering the large number of stations in a common metro system, we set $N_{ij} = 0$ if $A^k_{ij} = 0$ to reduce the interference from distant stations and enlarge the weights of nearby stations with a large population exchange.
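The normalization in Eq. (5), together with the masking rule $N_{ij} = 0$ where $A^k_{ij} = 0$, might be sketched as follows. The function name `population_flow_weights`, the toy OD counts, and the guard against empty rows are illustrative assumptions rather than details given in the paper.

```python
import numpy as np

def population_flow_weights(N, A_k):
    """F_ij = 0.5 * (N_ji / sum_k N_jk + N_ij / sum_k N_ik), with N masked by A^k."""
    N = np.where(A_k > 0, N, 0).astype(float)   # N_ij = 0 wherever A^k_ij = 0
    row_sums = N.sum(axis=1, keepdims=True)     # sum_k N_ik for each origin i
    row_sums[row_sums == 0] = 1.0               # assumption: guard against empty rows
    P = N / row_sums                            # P_ij = N_ij / sum_k N_ik
    return 0.5 * (P + P.T)                      # P.T contributes N_ji / sum_k N_jk

# Hypothetical OD counts between 3 stations (rows: origin, columns: destination).
N = np.array([[0, 30, 10],
              [20, 0, 20],
              [0, 10, 0]])
F = population_flow_weights(N, np.ones((3, 3), dtype=int))

# Same counts, but stations 0 and 2 are treated as outside each other's K-hop range.
A_k = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]])
F_masked = population_flow_weights(N, A_k)
print(np.round(F, 3))
```

Averaging the two directional shares makes $F$ symmetric, which matches the undirected graph assumed for the metro network.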
Finally, we define a K-hop weight matrix, $M^k$, which integrates the graphs of K-hop adjacency, travel distance, and population flow for dynamically capturing spatial dependency among stations. Mathematically:
$$M^k = F \odot D \odot A^k \tag{6}$$
where $\odot$ stands for element-wise multiplication. The threshold $k$ should be treated as a hyperparameter of the model, thereby ensuring that the most significant spatial correlation among stations can be learned.
Based on the proposed K-hop matrix, the graph convolution can be defined as:
$$h^{l+1}_g = g(h^l_g, M^k) \tag{7}$$
$$g(h^l_g, M^k) = \mathrm{ReLU}(M^k h^l_g W^l_g) \tag{8}$$
$$\mathrm{ReLU}(x) = \max(0, x) \tag{9}$$
where $h^l_g$ and $h^{l+1}_g$ are the input graph generated by the former layer $l$ and the output graph at layer $l+1$, respectively. $W_g$ is the trainable weight matrix for generating output features at each layer. A non-linear activation function (ReLU) is employed after each convolutional layer before the features are forwarded to the next layer.
Fig. 2. Architecture of a Bidirectional Long Short-Term Memory network and an LSTM memory cell.
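To make the graph convolution of Eqs. (6)-(9) concrete, the following NumPy sketch builds $M^k$ by element-wise multiplication and applies one layer. All sizes, the random stand-ins for $F$, $D$ and $W_g$, and the function names are hypothetical; a trained model would learn $W_g$ rather than sample it.

```python
import numpy as np

def k_hop_weight_matrix(F, D, A_k):
    """M^k = F (element-wise) D (element-wise) A^k, as in Eq. (6)."""
    return F * D * A_k

def gcn_layer(h, M_k, W):
    """One graph convolution layer: ReLU(M^k h W), as in Eqs. (7)-(9)."""
    return np.maximum(0.0, M_k @ h @ W)

rng = np.random.default_rng(0)
s, in_dim, out_dim = 4, 6, 3                 # hypothetical sizes
F = rng.random((s, s))                       # stand-in population flow weights
D = rng.random((s, s))                       # stand-in distance weights
A_k = np.array([[1, 1, 1, 0],
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [0, 1, 1, 1]])               # stand-in K-hop adjacency
M_k = k_hop_weight_matrix(F, D, A_k)

h0 = rng.random((s, in_dim))                 # node features, one row per station
W0 = rng.standard_normal((in_dim, out_dim))  # stand-in for the trainable W_g
h1 = gcn_layer(h0, M_k, W0)
print(h1.shape)
```

Because $A^k$ is binary, any station pair outside the K-hop range receives a zero weight in $M^k$ regardless of its distance or flow values.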
C. Foundation of the Stacked Bidirectional Unidirectional
LSTM Module
From the temporal perspective, metro ridership variations
possess several special characteristics, including non-linearity,
periodicity and regularity [41]. Considering those features,
the Stacked Bidirectional Unidirectional LSTM (SBULSTM)
framework is adopted to learn the complex temporal pattern
from the historical ridership inputs and to make sequential
predictions [11]. The theoretical foundation and detailed steps
for SBULSTM are elaborated in the following content.
1) Long short-term Memory: The LSTM architecture is the basic unit in SBULSTM for capturing temporal features of metro ridership data. It has been widely acknowledged that LSTM outperforms other recurrent architectures in handling sequence-based tasks with long-term dependencies. Its sophisticated gated memory mechanism helps to avoid the gradient vanishing or exploding problems exhibited by traditional RNN [42]. As demonstrated in Figure 2, each LSTM cell contains three gates: the input gate $i_t$, forget gate $f_t$, and output gate $o_t$. The input gate determines the information to be preserved, the forget gate controls the portion to be abandoned, and the output gate decides the result to be generated [43]. The detailed procedures for calculating the three gates and the cell memory in each memory unit are represented as follows:
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{10}$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{11}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{12}$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{13}$$
where $W_i$, $W_f$, $W_o$, $W_c$ and $U_i$, $U_f$, $U_o$, $U_c$ are the weight matrices and $b_i$, $b_f$, $b_o$ and $b_c$ are the bias vectors of LSTM to be learned during training. $\sigma$ is the gate activation function, which normally indicates the sigmoid function. Based on those three gates, the cell output state $c_t$ and the hidden layer output $h_t$ of the current cell can be generated as follows:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{14}$$
$$h_t = o_t \odot \tanh(c_t) \tag{15}$$
where $\odot$ stands for element-wise multiplication, and $\tanh$ is the hyperbolic tangent function. Here, when taking the ridership prediction problem as an example, only the last element of the output vector
2) Bidirectional Long short-term Memory: A bidirectional LSTM network is utilized for capturing the periodicity and regularity of metro ridership. The plain LSTM structure can only make use of forward dependencies and inevitably filters out useful information along the long gated memory chain. The bidirectional LSTM structure helps solve this problem by concatenating forward and backward LSTM layers [44]. It can employ hidden states from both directions, compensating for the information loss along the chain within LSTM. Therefore, bidirectional LSTM is better able to capture long-term contextual dependencies in sequential prediction tasks and to make more precise sequential predictions [45], [46]. Apart from that, the periodicity of the metro ridership pattern is another reason for including backward temporal dependency in the model. Unlike traffic incidents, wind speed or other randomly organized features, metro traffic possesses strong periodicity and regularity. Utilizing bidirectional information can enhance the ability to model the periodic pattern of metro ridership and to make comprehensive predictions.
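The bidirectional LSTM network contains two parallel LSTM layers, one for each propagation direction, as shown in Figure 2.
$$\overrightarrow{h}_t = \mathrm{LSTM}_{fw}(x_t, \overrightarrow{h}_{t-1}) \tag{16}$$
$$\overleftarrow{h}_t = \mathrm{LSTM}_{bw}(x_t, \overleftarrow{h}_{t+1}) \tag{17}$$
$\mathrm{LSTM}_{fw}$ and $\mathrm{LSTM}_{bw}$ denote the forward and backward LSTM, respectively. $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the hidden states of the input temporal feature $x_t$ learned from the bidirectional LSTM. The bidirectional hidden state $u_t$ for each input $x_t$ is obtained by concatenating the generated forward and backward hidden states:
$$u_t = [\overrightarrow{h}_t, \overleftarrow{h}_t] \tag{18}$$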
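The LSTM update of Eqs. (10)-(15) and the bidirectional concatenation of Eq. (18) can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the parameter dictionary layout, the random initialisation, zero initial states, and all sizes are hypothetical, and no training is performed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(in_dim, hid_dim, rng):
    """Random parameters for one LSTM layer (hypothetical initialisation)."""
    p = {}
    for gate in "ifoc":
        p["W" + gate] = rng.standard_normal((hid_dim, in_dim)) * 0.1
        p["U" + gate] = rng.standard_normal((hid_dim, hid_dim)) * 0.1
        p["b" + gate] = np.zeros(hid_dim)
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update following Eqs. (10)-(15)."""
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])
    c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

def run_lstm(xs, p, hid_dim):
    """Run an LSTM over xs of shape (steps, in_dim); returns all hidden states."""
    h, c = np.zeros(hid_dim), np.zeros(hid_dim)
    out = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        out.append(h)
    return np.stack(out)

def bilstm(xs, p_fw, p_bw, hid_dim):
    """Concatenate forward and backward hidden states per time step (Eq. 18)."""
    fw = run_lstm(xs, p_fw, hid_dim)
    bw = run_lstm(xs[::-1], p_bw, hid_dim)[::-1]   # re-align to the original order
    return np.concatenate([fw, bw], axis=1)

rng = np.random.default_rng(0)
steps, in_dim, hid_dim = 5, 3, 4                   # hypothetical sizes
xs = rng.random((steps, in_dim))
p_fw = init_params(in_dim, hid_dim, rng)
p_bw = init_params(in_dim, hid_dim, rng)
u = bilstm(xs, p_fw, p_bw, hid_dim)
print(u.shape)
```

The backward pass simply runs the same recurrence over the reversed sequence, so the second half of each $u_t$ summarises the inputs that follow step $t$.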
3) Stacked Bidirectional Unidirectional LSTM: Deep recurrent networks have demonstrated their ability to generate a higher level of representation from sequential input in previous studies [47]–[49]. The predictive power of a neural network can be enhanced by deepening the model structure, the effectiveness of which has been shown in many domains, such as speech processing [47], [48] and text recognition [49]. Therefore, to overcome the limited performance of a single LSTM or BLSTM architecture, this study adopts the SBULSTM proposed in [11] to learn the temporal dependencies in ridership data. In SBULSTM, the output of the BLSTM network is further fed to an LSTM layer to generate higher-level sequential representations. Theoretically, SBULSTM inherits the merits of both LSTM and BLSTM: on the one hand, it can capture both forward and backward temporal dependency, and on the other hand, it allows a higher level of representation of the ridership data. Nevertheless, it has not previously been combined with a spatial learning module, which limits its capability for making comprehensive spatial-temporal predictions.
Fig. 3. Architecture of the Proposed Graph Convolutional Stacked Bidirectional-LSTM Neural Network (GCN-SBULSTM)
D. Spatial-Temporal Prediction with GCN-SBULSTM
Previous studies commonly combined spatial and temporal
modules sequentially. For instance, the generated output from
CNN is fed into LSTM network in [27], [28]. However, the
original flow pattern may be distorted passing through complex
spatial operations, i.e. deep convolutions, as the generated
output from convolutional layers cannot fully represent the
pattern of raw metro ridership data [5].
Therefore, to preserve the effectiveness of the spatial and temporal modules as much as possible and integrate their results so that they complement each other, this study establishes a new deep learning architecture, in which a GCN module and an SBULSTM module are combined in parallel to make predictions for future time frames. The effectiveness of such a parallel structure in ridership prediction has been shown in a previous study [5], and this study extends it by using a dynamic graph learning approach instead of CNN. As shown in Figure 3, ridership data are first organized into two forms, dynamic graphs and time series. These two forms of data are then fed to the GCN and SBULSTM modules, respectively, to learn spatial and temporal dependencies. The outputs can be represented by $H_G = [h_{g1}, \cdots, h_{gk}]$ and $H_T = [h_{t1}, \cdots, h_{tp}]$, where $k$ and $p$ are the numbers of hidden units in the last layer of the GCN and SBULSTM modules, respectively. Finally, the flattened outputs of the two modules, $O_G = \mathrm{Flatten}(H_G)$ and $O_T = \mathrm{Flatten}(H_T)$, are concatenated, and a fully connected layer with a dropout mechanism is applied to obtain the prediction results, which can be formulated as follows:
$$\hat{X} = W_{st}(O_G \,\|\, O_T) + b_{st} \tag{19}$$
where $W_{st}$ and $b_{st}$ are the trainable weight and bias parameters for generating the final predicted results $\hat{X}$, and $\|$ is the concatenation operator.
Fig. 4. Shenzhen metro network.
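The parallel fusion of Eq. (19) amounts to flattening both module outputs, concatenating them, and applying one affine layer. The sketch below illustrates this with random stand-ins; the function name `fuse_and_predict`, all shapes, and the reshaping of the output into $m$ future steps by $s$ stations are assumptions, and the dropout mentioned in the text is omitted for brevity.

```python
import numpy as np

def fuse_and_predict(H_G, H_T, W_st, b_st):
    """X_hat = W_st (O_G || O_T) + b_st, with O = Flatten(H), as in Eq. (19)."""
    O_G = H_G.reshape(-1)                 # Flatten(H_G)
    O_T = H_T.reshape(-1)                 # Flatten(H_T)
    fused = np.concatenate([O_G, O_T])    # concatenation operator ||
    return W_st @ fused + b_st

rng = np.random.default_rng(0)
s, m = 4, 2                               # hypothetical: 4 stations, 2 future intervals
H_G = rng.random((s, 3))                  # stand-in for the GCN module output
H_T = rng.random((s, 5))                  # stand-in for the SBULSTM module output
W_st = rng.standard_normal((m * s, H_G.size + H_T.size))
b_st = np.zeros(m * s)

X_hat = fuse_and_predict(H_G, H_T, W_st, b_st).reshape(m, s)
print(X_hat.shape)
```

Since the two branches only meet at this final layer, neither module's features pass through the other's operations, which is the property the parallel design is meant to secure.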
IV. EXPERIMENTS
A. Data description
Three metro ridership datasets are used to validate the
effectiveness of the proposed GCN-SBULSTM: 1) a real-
world ridership dataset named SZMetro, which was collected
from the metro system in Shenzhen, China; 2) two public
ridership datasets shared in [22], respectively named HZMetro
and SHMetro, which are used for benchmark tests. The details
of these three datasets are summarized in Table I.
TABLE I
DATASETS SUMMARY
Dataset            SZMetro                                           HZMetro                SHMetro
City               Shenzhen                                          Hangzhou               Shanghai
# Stations         166                                               80                     288
Time interval      4 min                                             15 min                 15 min
# Samples per day  270                                               70                     70
Train timespan     17/01/2017 - 28/01/2017; 10/02/2017 - 17/02/2017  1/01/2019 - 1/18/2019  7/01/2016 - 8/31/2016
Test timespan      18/02/2017 - 22/02/2017                           1/21/2019 - 1/25/2019  9/10/2016 - 9/30/2016
SZMetro: This dataset was collected from Jan. 17, 2017 to
Feb. 22, 2017 based on the transaction records provided by the
metro system in Shenzhen, China. At the collection time, there
were 8 running metro lines with a total of 166 metro stations
for Shenzhen metro system, as shown in Figure 4. Each
record contains passengers’ inbound information, including the
transaction time and name of stations. Station-level ridership is
summarized based on a 4-minute time interval. As the service
time of the Shenzhen metro runs from 6 AM to midnight, 270
ridership samples per day are obtained. Note that the
timespan of training data in SZMetro is discontinuous
because the period from Jan. 28 to Feb. 9 corresponds to the
Spring Festival holiday in China, which leads to a significant decline
in ridership and very different dynamic patterns compared
to other dates. To avoid the influence of these special dates
and retain the generality of trained models, data from Jan.
28 to Feb. 9 are discarded in the experiments on SZMetro.
Consequently, ridership data from the first 20 days are used
for training, and the data from the last 5 days are used for
testing.
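The aggregation described above (inbound transactions counted per station in 4-minute bins between 6 AM and midnight, giving 270 samples per day) can be sketched as follows. The record layout, a `(timestamp, station_name)` pair with a `YYYY-MM-DD HH:MM:SS` timestamp, and the helper name `bin_inbound_records` are hypothetical; the actual format of the SZMetro transaction records is not given in the text.

```python
from datetime import datetime

import numpy as np

SERVICE_START_HOUR = 6   # service starts at 06:00 (from the text)
INTERVAL_MIN = 4         # 4-minute aggregation interval (from the text)
SAMPLES_PER_DAY = (24 - SERVICE_START_HOUR) * 60 // INTERVAL_MIN  # 18 h * 60 / 4 = 270


def bin_inbound_records(records, station_index):
    """Count inbound transactions of ONE day per 4-minute bin and station.

    records       : iterable of (timestamp_str, station_name) pairs (assumed layout).
    station_index : dict mapping station_name -> column index.
    Returns an integer array of shape (SAMPLES_PER_DAY, n_stations).
    """
    counts = np.zeros((SAMPLES_PER_DAY, len(station_index)), dtype=int)
    for ts, station in records:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        minutes = (t.hour - SERVICE_START_HOUR) * 60 + t.minute
        # Skip records outside the 06:00-24:00 service window.
        if 0 <= minutes < SAMPLES_PER_DAY * INTERVAL_MIN:
            counts[minutes // INTERVAL_MIN, station_index[station]] += 1
    return counts
```

Stacking the per-day arrays of the retained dates then yields the station-level ridership series used for training and testing.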
HZMetro and SHMetro: These two datasets were built
from the metro systems in Hangzhou and Shanghai,
respectively. Both were aggregated with a 15-minute
time interval, generating 70 samples per day. Notably,
since no station information is provided for HZMetro and
SHMetro, we cannot transform the metro network into an image,
so the state-of-the-art models containing a CNN module are
not tested on these two datasets. More information about
HZMetro and SHMetro can be found in [22].
B. Experiment design
Experiments include two main parts:
1) Test the performance of GCN-SBULSTM with respect
to different temporal scales (i.e., different input and output
length) and validate the effectiveness of different graphs used
in our model. This part of experiments is conducted on
SZMetro because its raw data are available, so that we can
easily reorganize the raw data for different prediction tasks.
Specifically, two tasks are designed on SZMetro to test the
performance of GCN-SBULSTM:
Task 1 (5 to 5), to forecast the next 5 samples based on
previous 5 samples,
Task 2 (10 to 10), to predict the next 10 samples using
previous 10 samples.
2) Run benchmark tests on the open datasets HZMetro and
SHMetro using their original input and output length (i.e., 4)
to further verify the superiority of GCN-SBULSTM.
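The "5 to 5", "10 to 10" and length-4 settings above all amount to cutting a ridership series into input/output windows. A minimal sketch under stated assumptions is given below: the helper name `make_windows` is hypothetical, a stride of one sample is assumed, and the series passed in is assumed to be one continuous span (how windows are handled at day boundaries is not specified in the text).

```python
import numpy as np


def make_windows(series, n_in, n_out):
    """Build sliding-window samples from a (T, N) ridership array.

    Returns X with shape (S, n_in, N) and Y with shape (S, n_out, N),
    where S = T - n_in - n_out + 1 and the stride is one sample (assumed).
    """
    series = np.asarray(series)
    n_samples = series.shape[0] - n_in - n_out + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window lengths")
    x = np.stack([series[s:s + n_in] for s in range(n_samples)])
    y = np.stack([series[s + n_in:s + n_in + n_out] for s in range(n_samples)])
    return x, y
```

For example, `make_windows(day, 5, 5)` corresponds to Task 1 and `make_windows(day, 10, 10)` to Task 2, where `day` is a hypothetical array holding the 270 samples of one service day.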
Fig. 5. The input image of CNN.
To demonstrate the advantages of GCN-SBULSTM, classic
deep learning architectures, including LSTM, CNN and GCN, and
advanced models, including DMVST-Net [28], CNN-LSTM
[5], SRCNs [27], SBULSTM [11], DCRNN [18], STGCN
[30], Graph-WaveNet [29] and PVCGN [22], are implemented
for comparison. In addition, an ablation test is performed to
analyze the effectiveness of the K-hop matrix used in
GCN-SBULSTM. Specifically, suffixes are used to distinguish
different ablated models: "w/o dist" indicates the model without
the distance graph; "w/o OD" denotes the model without
the population flow graph; "w/o K-hop" stands for the
model using only the traditional adjacency matrix in the GCN
module.
C. Computational environment and experimental setup
All experiments are implemented and run on a desktop
equipped with an Intel(R) Core(TM) CPU i9-10940X and an
NVIDIA GTX 2070i running Windows 10. The parameters for
each prediction model are carefully tuned to obtain the best
accuracy on the test dataset: most models are tested following the
settings in their original papers, while minor adjustments are
made to tunable parameters, such as the number of hidden
units and batch size, to enhance the accuracy as much as
possible.
To generate the traffic image for experiments on SZMetro,
the metro network map is divided by a 60 × 60 grid, which
follows the setting in [5]. Through the division, 5 pairs of
metro stations fall into the same cell, and one station of each pair is
reassigned to the nearest cell to avoid overlapping. The resultant
image input for CNN is exemplified in Figure 5. The value
of each cell is set to the average ridership during the trained
time frames of the corresponding metro station. The number
of hidden units of LSTM, as well as SBULSTM, is set to 1000
for SZMetro and SHMetro, and 600 for HZMetro; two stacked
GCN layers with 60 and 80 channels and a fully connected
layer with 10 hidden units are sequentially connected in the
GCN module.
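The grid rasterization described above can be sketched as follows. This is an illustrative reading rather than the authors' procedure: the function name `stations_to_image`, the min-max scaling of coordinates onto the grid, and the rule that a colliding station moves to the nearest free cell measured in grid-index distance are all assumptions.

```python
import numpy as np


def stations_to_image(coords, values, grid=60):
    """Place one value per station on a grid x grid image.

    coords : (N, 2) array of station x/y coordinates (any planar unit).
    values : length-N sequence, e.g. average ridership of each station.
    Returns the image and a dict mapping occupied cell -> station index.
    Assumes N <= grid * grid so that a free cell always exists.
    """
    coords = np.asarray(coords, dtype=float)
    mins = coords.min(axis=0)
    span = coords.max(axis=0) - mins
    span[span == 0] = 1.0  # guard against a degenerate axis
    cells = np.minimum((coords - mins) / span * grid, grid - 1).astype(int)

    image = np.zeros((grid, grid))
    taken = {}
    for i, (cx, cy) in enumerate(cells):
        cell = (int(cx), int(cy))
        if cell in taken:
            # Collision: move this station to the nearest unoccupied cell (assumed rule).
            free = [(x, y) for x in range(grid) for y in range(grid)
                    if (x, y) not in taken]
            cell = min(free, key=lambda c: (c[0] - cx) ** 2 + (c[1] - cy) ** 2)
        taken[cell] = i
        image[cell[1], cell[0]] = values[i]  # row = y index, column = x index
    return image, taken
```

The brute-force search for a free cell is adequate here because only a handful of collisions are reported for the 60 × 60 grid.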
As for optimizing the training process of GCN-SBULSTM,
the batch size is set to 32 and 64, respectively, for Task 1
and Task 2 on SZMetro, while the batch size is 8 and 64 for
HZMetro and SHMetro. Adam is selected as the optimizer
for training considering its good performance in preliminary
tests. The initial learning rate is set to 0.001 and the decay
ratio is 0.1. Early stopping is applied during training to avoid
overfitting.
TABLE II
RESULTS OF TASK 1 ON SZMETRO
Model                   MAE   RMSE   MAPE    Average time per epoch
LSTM                    8.49  15.02  30.31%  2s
CNN                     8.69  15.97  31.08%  1s
GCN                     8.65  15.83  30.55%  1s
DMVST-Net               8.34  14.85  29.80%  2s
CNN-LSTM                8.35  14.64  30.56%  2s
SRCNs                   8.36  15.68  28.15%  3s
SBULSTM                 8.27  15.13  27.72%  2s
DCRNN                   8.29  14.84  28.32%  23s
STGCN                   8.46  14.91  31.95%  4s
Graph-WaveNet           8.35  16.12  31.24%  12s
PVCGN                   8.24  14.05  30.01%  264s
GCN-SBULSTM             7.96  14.41  27.94%  3s
GCN-SBULSTM w/o OD      8.02  14.60  27.72%  3s
GCN-SBULSTM w/o dist    8.04  14.48  28.44%  3s
GCN-SBULSTM w/o K-hop   8.06  14.56  27.86%  3s
TABLE III
RESULTS OF TASK 2 ON SZMETRO
Model                   MAE   RMSE   MAPE    Average time per epoch
LSTM                    9.00  16.39  31.45%  2s
CNN                     9.74  18.80  33.33%  2s
GCN                     9.20  17.22  30.77%  1s
DMVST-Net               8.89  16.22  31.43%  3s
CNN-LSTM                8.78  15.83  31.34%  3s
SRCNs                   8.84  16.78  29.57%  3s
SBULSTM                 8.58  16.32  28.84%  3s
DCRNN                   8.56  15.47  28.23%  45s
STGCN                   8.77  15.97  33.94%  6s
Graph-WaveNet           8.75  16.56  29.68%  12s
PVCGN                   8.48  14.90  29.39%  530s
GCN-SBULSTM             8.36  15.62  28.38%  4s
GCN-SBULSTM w/o OD      8.42  15.87  28.47%  4s
GCN-SBULSTM w/o dist    8.44  15.86  28.87%  4s
GCN-SBULSTM w/o K-hop   8.46  15.99  28.55%  4s
This study evaluates the performance of each model us-
ing three common metrics, including Mean Absolute Error
(MAE), Mean Absolute Percentage Error (MAPE), and Root
Mean Square Error (RMSE), which are defined as follows:
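$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{Y}_i - Y_i \right| \qquad (20)$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{Y}_i - Y_i}{Y_i} \right| \qquad (21)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i - Y_i \right)^2} \qquad (22)$$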
where $n$ is the number of samples, $\hat{Y}_i$ is the predicted ridership
and $Y_i$ is the actual ridership. MAE is also adopted as the
loss function in the training process. Since MAPE
and RMSE are sensitive to small ground-truth values
and large error values, respectively, we take MAE, which is more robust to
outliers and reflects the actual error [50], as the main metric in
the following discussion.
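The three metrics in Equations (20) to (22) can be computed with a few lines of NumPy. The sketch below is illustrative: the function names are hypothetical, MAPE is returned as a fraction (multiply by 100 for the percentages reported in the tables), and samples with zero ground truth are excluded from MAPE, which is an assumption because the text does not state how such samples are handled.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error, Eq. (20)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, Eq. (21), as a fraction.

    Samples whose ground truth is zero are skipped (assumed convention).
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mask = y_true != 0
    return float(np.mean(np.abs((y_pred[mask] - y_true[mask]) / y_true[mask])))


def rmse(y_true, y_pred):
    """Root Mean Square Error, Eq. (22)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```

The division by $Y_i$ in MAPE is what makes it sensitive to small ground-truth values, while the squaring in RMSE amplifies large individual errors.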
V. RESULT ANALYSIS
A. Task 1 and 2 on SZMetro
The performance of different models on Task 1 and 2 is
summarized in Table II and Table III, respectively. Among all
tested models, CNN and GCN obtain the worst accuracies
with MAE values of 8.69/9.74 and 8.65/9.20 for Task 1
and 2, which indicates the limited effectiveness of adopting
only spatial dependency in ridership forecasting. However, the
better performance of GCN relative to CNN demonstrates its advantage in
capturing realistic spatial dependencies for ridership forecasting.
By integrating CNN and LSTM to capture both spatial
and temporal dependencies, DMVST-Net, CNN-LSTM, and
SRCNs achieve better and mutually similar performance,
reducing the MAE value to around 8.35 in Task 1 and 8.80
in Task 2. However, these models are easily affected by the
uncertainty of the size of the input image for the CNN module due
to CNN's difficulty in fully representing the topology of a
metro network. Thanks to the advantage of graph learning,
DCRNN and PVCGN further improve the accuracy with
an MAE value lower than 8.30 in Task 1 and 8.60 in
Task 2. Notably, even with a graph learning module, STGCN
and Graph-WaveNet only obtain results at a level similar
to CNN-based models, which may be explained by the
interference between spatial and temporal dependencies caused by
their sequential structure. Surprisingly, SBULSTM achieves
Fig. 6. Results of GCN-SBULSTM with different values of K on SZMetro
competitive performance compared with DCRNN with MAE
values of 8.27 in Task 1 and 8.58 in Task 2. Given the
limited accuracy of LSTM, the outstanding performance of
SBULSTM proves the effectiveness of Bidirectional-LSTM
architecture in ridership forecasting.
In comparison with the above-mentioned models, the proposed
GCN-SBULSTM achieves the best prediction accuracy
in terms of MAE, with the lowest values of 7.96 in
Task 1 and 8.36 in Task 2. GCN-SBULSTM is also the
only model that obtains an MAE lower than 8.00 in
Task 1 and 8.40 in Task 2. The results of the ablated models
further support the effectiveness of each graph used in
GCN-SBULSTM: compared with the full GCN-SBULSTM,
GCN-SBULSTM w/o OD obtains a lower accuracy, i.e.,
MAE of 8.02 and 8.42, which supports the significance of
incorporating the dynamic spatio-temporal relationship in the GCN
module; the accuracy further decreases in GCN-SBULSTM
w/o dist, of which the MAE is 8.04 and 8.44, indicating
the positive effect of the travel distance graph in the GCN module;
compared to the other ablated models, GCN-SBULSTM w/o
K-hop has the worst performance in the two tasks with MAE values
of 8.06 and 8.46, respectively. However, even though accuracy
decreases in these ablated models, they still outperform the
other tested models in terms of MAE, validating the general effectiveness of the
model design.
As shown in Table II and III, CNN and GCN are the most
efficient models amongst all tested models, while the proposed
GCN-SBULSTM achieves competitive training efficiency in
terms of its average training time. Specifically, GCN-SBULSTM
only requires 3s and 4s per epoch for Task 1 and 2, respectively, which is
only slightly longer than the basic architectures (LSTM,
CNN and GCN) and some advanced models (SBULSTM,
CNN-LSTM and DMVST-Net). In contrast, PVCGN,
the second-best model in terms of MAE in the two
tasks, is the least efficient model, taking 264s and 530s per
epoch for Task 1 and 2, which is over 10 times longer
than DCRNN (23s and 45s) and roughly 90 to 130 times longer than the proposed
GCN-SBULSTM. In summary, GCN-SBULSTM is
efficient given its high accuracy among the advanced
models, which benefits the process of parameter tuning and
its migration to different tasks.
To illustrate the influence of different values of K on the
accuracy, RMSE and MAE values with respect to different K
ranging from 1 to 10 are plotted in Figure 6 for Task 1 and
2. It can be seen that RMSE and MAE generally start with a
high value, then gradually decrease to their minimum at K = 6
and K = 7 for Task 1 and 2, respectively, and finally increase
as K becomes larger. Notably, the lines in Figure 6 are not
ideally smooth, which can be explained by the uncertainty
introduced by random factors during the training process,
such as parameter initialization and the dropout mechanism. The
general tendency shown in Figure 6 suggests that: 1) beyond
adjacent stations, the spatial dependencies among indirectly
connected stations also contribute, to some extent,
to building an effective prediction model; 2) an overestimated
K can even lead to negative effects.
B. Benchmark tests on HZMetro and SHMetro
There are some modifications in the setup of
GCN-SBULSTM in this section. Since no station and dynamic OD
information is provided in the original paper for HZMetro and
SHMetro, we use substitutes for the travel distance
graph and population graph in GCN-SBULSTM. For the travel
distance graph, we compute the number of hops between each
pair of stations using the Physical graph (i.e. adjacency graph)
and input the result to Equation 4 to generate a "fake" distance
weight matrix; as for the population graph, we calculate a
single overall population flow graph based on the Correlation
graph in [22]. Also, as the information about metro stations is
not provided for HZMetro and SHMetro, we cannot generate
the traffic images required in CNN-based models. Therefore,
CNN, SRCNs and DMVST-Net are not compared in
the benchmark tests on HZMetro and SHMetro.
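The hop counts used as a stand-in for travel distance can be obtained with a breadth-first search on the adjacency graph. The sketch below covers only that step; the mapping from hop counts to weights (Equation 4) is defined earlier in the paper and is not reproduced here. The names `hop_counts` and `within_k_hops` are hypothetical, and the K-hop mask shown is only one plausible reading of how a K-hop neighbourhood is selected.

```python
from collections import deque

import numpy as np


def hop_counts(adj):
    """Shortest-path hop count between every pair of nodes of an unweighted graph.

    adj : (N, N) array, non-zero where two stations are directly connected.
    Unreachable pairs are marked with -1.
    """
    adj = np.asarray(adj)
    n = adj.shape[0]
    hops = np.full((n, n), -1, dtype=int)
    for src in range(n):
        hops[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if hops[src, v] < 0:          # first visit gives the shortest hop count
                    hops[src, v] = hops[src, u] + 1
                    queue.append(v)
    return hops


def within_k_hops(hops, k):
    """Boolean mask of station pairs reachable in 1..k hops (assumed K-hop selection)."""
    hops = np.asarray(hops)
    return (hops >= 1) & (hops <= k)
```

For a metro network with a few hundred stations this per-source search is inexpensive, so the hop matrix can be computed once and reused for every candidate K.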
The performances of different models on HZMetro and
SHMetro are summarized in Table IV and V, respectively.
The general tendency is similar to the previous experimental
results on SZMetro. GCN is among the weakest models on
these two datasets as it only captures spatial dependency for
prediction. Surprisingly, STGCN and Graph-WaveNet become
even worse than LSTM in terms of MAE, especially for the prediction at 60 min,
further indicating the poor performance of the sequential structure
on ridership data with large time intervals, in which temporal
regularity is more significant and dominant than in data with
short time intervals. DCRNN obtains satisfactory accuracy
with an MAE of 23.24 for HZMetro and 23.34 for SHMetro at
the first time interval. However, as the prediction horizon increases, the
accuracy of DCRNN decreases markedly. PVCGN achieves
competitive accuracy, especially on HZMetro, and in terms of
RMSE, PVCGN is always the best one on HZMetro, indicating
its advantage in reducing outlying predictions.
TABLE IV
RESULTS FOR HZMETRO
Time   Metric  LSTM    GCN     SBULSTM  DCRNN   STGCN   Graph-WaveNet  PVCGN   GCN-SBULSTM (K=6)
15min  MAE     23.43   23.94   24.31    23.24   23.86   23.50          22.20   22.22
15min  RMSE    40.13   42.89   42.67    41.43   45.03   41.88          38.12   39.83
15min  MAPE    14.41%  14.74%  14.51%   13.65%  12.48%  13.77%         13.15%  13.16%
30min  MAE     24.38   25.99   24.75    25.78   26.07   24.75          23.13   22.84
30min  RMSE    42.33   47.06   43.73    43.23   49.16   43.70          40.00   41.08
30min  MAPE    15.54%  16.87%  15.04%   15.32%  13.72%  15.68%         13.87%  13.76%
45min  MAE     25.33   29.23   25.45    26.23   28.52   25.87          23.95   23.53
45min  RMSE    44.50   53.70   45.49    46.97   52.58   46.50          41.21   42.45
45min  MAPE    17.18%  20.06%  15.72%   16.01%  15.18%  16.77%         14.89%  14.61%
60min  MAE     26.74   33.44   26.46    27.15   31.47   27.85          24.55   24.58
60min  RMSE    47.90   62.39   47.07    48.56   59.74   48.69          42.26   44.48
60min  MAPE    19.88%  27.62%  17.42%   18.64%  16.95%  20.45%         16.35%  15.73%
TABLE V
RESULTS FOR SHMETRO
Time   Metric  LSTM    GCN     SBULSTM  DCRNN   STGCN   Graph-WaveNet  PVCGN   GCN-SBULSTM (K=8)
15min  MAE     23.50   24.21   23.16    23.34   23.84   23.75          22.85   22.75
15min  RMSE    47.08   49.20   45.31    47.24   47.18   47.73          45.47   46.09
15min  MAPE    20.23%  21.05%  17.40%   18.02%  18.71%  20.23%         16.95%  16.50%
30min  MAE     24.50   25.75   24.17    25.33   26.99   27.12          24.16   23.77
30min  RMSE    49.63   52.34   48.39    51.31   57.40   54.15          50.18   49.04
30min  MAPE    22.64%  24.26%  18.52%   19.12%  19.41%  21.42%         18.83%  17.62%
45min  MAE     25.59   28.64   25.38    27.65   30.81   29.23          25.45   25.02
45min  RMSE    53.35   57.48   53.67    57.21   67.61   60.10          54.84   52.89
45min  MAPE    24.39%  29.36%  20.07%   20.42%  20.46%  22.64%         18.83%  18.95%
60min  MAE     26.87   31.60   26.41    29.01   33.82   31.56          26.37   25.87
60min  RMSE    56.53   63.24   59.27    63.32   77.00   68.10          58.49   55.41
60min  MAPE    26.16%  34.25%  21.45%   21.52%  23.69%  24.92%         19.67%  20.12%
Compared with other models, the proposed
GCN-SBULSTM achieves the best accuracy in terms of MAE
at 30min and 45min on HZMetro (K=6) and obtains the
lowest MAE at every horizon on SHMetro (K=8), with
margins of about 0.1 to 0.5 over the second-best model.
RMSE values obtained by GCN-SBULSTM
on SHMetro are also improved and surpass those of PVCGN
in all cases except for the first interval. Given the lack
of station information and the dynamic changes of popula-
tion flow in this experiment, we believe the performance of
GCN-SBULSTM on HZMetro and SHMetro can be further
improved if necessary data are available.
C. Results analysis
1) Connotation of optimal K: According to the previous
experiments, a K-hop matrix is shown to preserve more
comprehensive spatial dependencies than the traditional adjacency matrix
and thus improves the performance of the GCN module.
However, an overestimated K is prone to having negative
effects on prediction accuracy. To explain this observation and
investigate the connotation of optimal K, statistical analyses
are performed on each experimental dataset.
As shown in Figure 7 (a) to (c), the number of station pairs
rapidly rises as K increases, while the average correlation
Fig. 7. Influence of K on station pairs and their average correlation. Panels (a) to (c): number of station pairs; panels (d) to (f): average correlation, annotated with K = 7, K = 9 and K = 6.
curve (Figure 7 (d) to (f)), which is computed as the average
absolute Pearson correlation coefficient among station pairs,
dramatically declines within the first few steps and finally
reaches a stable state. This indicates that the correlation of
ridership, either positive or negative, becomes less significant
between high-order-neighboured stations.
Given the above observation, an appropriate K is expected
to balance the number of station pairs and the significance
of the correlation between them. To this end, we
compute the elbow point of each curve in Figure 7 (d) to (f).
Mathematically, the elbow point is defined as the point with
maximum curvature on a curve [51]. In our cases, the elbow
Fig. 8. Violin plots of the relationship between event ϕ and average ridership, panels (a) and (b).
point refers to a cutoff K value beyond which adding
higher-order-neighboured stations does not result in better capture of spatial
correlation. It is found that the elbow point is 7, 9 and 6
for SZMetro, SHMetro and HZMetro, respectively, which is very close
to the optimal values in previous experiments, i.e. 6 and
7 for the two tasks on SZMetro, 8 for SHMetro, and 6 for
HZMetro. This observation largely explains the underlying
rationale of the optimal K in each prediction task. Also, the
elbow point of the average correlation curve can be used as an
important reference for selecting the optimal K when training
GCN-SBULSTM.
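The analysis above (average absolute Pearson correlation as a function of K, followed by a maximum-curvature elbow) can be sketched as follows. Several choices here are assumptions rather than the authors' procedure: station pairs are counted cumulatively (all pairs within 1 to K hops), curvature is estimated with finite differences via `np.gradient` on the raw, unnormalized curve, and the function names are hypothetical. The elbow definition in [51] may differ in detail, and rescaling either axis can change which point has maximum curvature.

```python
import numpy as np


def avg_abs_correlation(series, hops, max_k):
    """Number of station pairs and their mean |Pearson r| for K = 1..max_k.

    series : (T, N) ridership array (one column per station, non-constant columns).
    hops   : (N, N) hop-count matrix between stations.
    Pairs are cumulative: K includes every pair within 1..K hops (assumed).
    """
    corr = np.abs(np.corrcoef(np.asarray(series, dtype=float).T))
    iu = np.triu_indices(corr.shape[0], k=1)       # each unordered pair once
    pair_corr, pair_hops = corr[iu], np.asarray(hops)[iu]
    n_pairs, avg = [], []
    for k in range(1, max_k + 1):
        mask = (pair_hops >= 1) & (pair_hops <= k)
        n_pairs.append(int(mask.sum()))
        avg.append(float(pair_corr[mask].mean()) if mask.any() else float("nan"))
    return np.array(n_pairs), np.array(avg)


def elbow_point(ks, values):
    """K with maximum discrete curvature |y''| / (1 + y'^2)^(3/2)."""
    ks = np.asarray(ks, dtype=float)
    values = np.asarray(values, dtype=float)
    d1 = np.gradient(values, ks)                   # first derivative estimate
    d2 = np.gradient(d1, ks)                       # second derivative estimate
    curvature = np.abs(d2) / (1.0 + d1 ** 2) ** 1.5
    return int(ks[np.argmax(curvature)])
```

In practice the returned K would serve only as a starting point, consistent with the text's suggestion that the elbow is a reference for selecting K rather than a replacement for tuning.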
2) Visualization analysis: To further illustrate the
advantages of GCN-SBULSTM, a visualization analysis is
conducted on HZMetro and SHMetro by taking PVCGN as the
control method. Since ridership volume has been shown to be a
critical factor affecting prediction accuracy [22], we focus on
exploring the performance of GCN-SBULSTM for different
ridership volumes. We first define an event ϕ for each station,
where ϕ = 0 when PVCGN obtains a lower MAE than
GCN-SBULSTM; otherwise, ϕ = 1. A violin plot is used to depict
the distribution of ϕ as well as its relationship to the average
ridership. As shown in Figure 8, the violins with ϕ = 1 are
much "fatter" than those with ϕ = 0, which means that, for
most stations of SHMetro and HZMetro, GCN-SBULSTM
achieves an MAE no higher than that of PVCGN. Moreover, the dashed lines
inside the violins with ϕ = 1, which mark the quartiles of the
distribution, are generally lower than those with ϕ = 0,
indicating that GCN-SBULSTM is more suitable for
low-ridership stations.
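The event ϕ defined above reduces to a per-station comparison of two MAE values. A minimal sketch is given below; the function name `event_phi` and the `(T, N)` array layout (time steps by stations) are assumptions for illustration.

```python
import numpy as np


def event_phi(y_true, pred_pvcgn, pred_ours):
    """Per-station event: 0 if PVCGN has the strictly lower MAE, otherwise 1.

    All inputs are (T, N) arrays: T time steps, N stations (assumed layout).
    """
    y_true = np.asarray(y_true, dtype=float)
    mae_pvcgn = np.mean(np.abs(np.asarray(pred_pvcgn, dtype=float) - y_true), axis=0)
    mae_ours = np.mean(np.abs(np.asarray(pred_ours, dtype=float) - y_true), axis=0)
    return np.where(mae_pvcgn < mae_ours, 0, 1)
```

Note that, following the definition in the text, ties fall into ϕ = 1. Grouping each station's average ridership by the returned ϕ gives the two distributions that a violin plot would display.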
Three instances are further selected from SHMetro to
illustrate the performance of GCN-SBULSTM on different
ridership volumes. As shown in Figure 9 (a), GCN-SBULSTM
performs well in capturing the overall trend as well as nar-
row fluctuation in low ridership, while PVCGN is likely to
overestimate the ridership in many cases. As for the station
with high ridership in Figure 9 (b), GCN-SBULSTM produces
more accurate predictions in most cases but does not fully
capture the marked drastic fluctuation. In comparison, PVCGN
seems to be more sensitive to this kind of fluctuation, but
overestimation can still be easily observed. Furthermore, such
sensitivity of PVCGN may also be unreliable and introduce
uncertainty, such as the significant bias and missed fluctuation
marked in Figure 9 (c).
In summary, PVCGN seems to provide aggressive predictions,
which sometimes overreact to ridership fluctuation and are prone
to producing overestimated results. In contrast, the proposed
GCN-SBULSTM achieves a higher prediction accuracy than
PVCGN in most instances and can better handle the
fluctuations, especially for low-ridership stations.
VI. CONCLUSION AND DISCUSSION
Metro ridership forecasting is a fundamental issue in
modern public transportation management. Focusing on improving
the accuracy of metro ridership forecasting, this study proposes
GCN-SBULSTM, a novel deep learning network with a
parallel structure concatenating GCN and SBULSTM modules.
In the GCN module, a novel K-hop weight matrix, which
integrates adjacency, travel distance, and population flow, is
introduced to capture comprehensive spatial correlation within
a metro network. GCN-SBULSTM inherits the merits of both
GCN and SBULSTM, and the parallel structure helps preserve
most of the independence of spatial and temporal information, thus
reducing the uncertainty caused by their interference.
According to the results on three real-world datasets,
the proposed GCN-SBULSTM outperforms the state-of-the-art
models in terms of MAE while retaining high training efficiency.
The slightly poorer performance of the ablated models verifies
the effectiveness of using both physical and virtual graphs in
improving the overall accuracy. Additionally, in comparison
with CNN-based models, the higher accuracy obtained by
GCN-based models indicates the superiority of treating a traffic
network as a graph rather than a simple 2D image in
network-related traffic forecasting tasks. Last but not least, the relatively
lower accuracy of STGCN and Graph-WaveNet supports the
hypothesis that the parallel structure can preserve, to the greatest
extent, the integrity of both spatial and temporal dependencies
for better prediction.
Improvements can be made in future work. One issue is
to incorporate more factors, such as weather conditions and
holiday events, which may correlate with ridership patterns, to
enhance the prediction model. Apart from that, it is notable
that the proposed model is only applied to inbound
ridership prediction. As passengers' outbound preference highly
depends on time schedules and the functional zone where a
metro station is located, it is not sufficient to accurately forecast
outbound ridership using solely the number of previous trips
in a time-series form, especially for a short time interval.
However, provided accurate time schedules and other auxiliary
information that can support the diagnosis of passengers'
preference, the fundamental idea of GCN-SBULSTM is expected
to apply to outbound ridership prediction as well with further
improvement.
Fig. 9. Snapshot of three prediction instances, panels (a) to (c); horizontal axis: time interval, vertical axis: ridership. Station #277 and #12 are respectively of the lowest and highest average ridership in SHMetro; Station #7 is a
more general instance with moderate average ridership.
REFERENCES
[1] C. Ding, D. Wang, X. Ma, H. L. Sustainability, and undefined 2016,
“Predicting short-term subway ridership and prioritizing its influential
factors using gradient boosting decision trees,” mdpi.com. [Online].
Available: https://www.mdpi.com/2071-1050/8/11/1100
[2] S. Derrible and C. Kennedy, “Evaluating, Comparing, and Improving
Metro Networks: Application to Plans for Toronto, Canada,”
Transportation Research Record: Journal of the Transportation
Research Board, vol. 2146, no. 1, pp. 43–51, jan 2010. [Online].
Available: http://journals.sagepub.com/doi/10.3141/2146-06
[3] Y. Lv, Y. Duan, W. Kang, Z. L. I. T. on . . . , and undefined
2014, “Traffic flow prediction with big data: a deep learning
approach,” ieeexplore.ieee.org. [Online]. Available: https://ieeexplore.
ieee.org/abstract/document/6894591/
[4] J. Zhang, Y. Zheng, D. Qi, R. Li, and X. Yi, “Dnn-based
prediction model for spatio-temporal data,” in Proceedings of the
24th ACM SIGSPATIAL International Conference on Advances in
Geographic Information Systems, ser. SIGSPACIAL ’16. New
York, NY, USA: ACM, 2016, pp. 92:1–92:4. [Online]. Available:
http://doi.acm.org/10.1145/2996913.2997016
[5] X. Ma, J. Zhang, B. Du, C. Ding, and L. Sun, “Parallel architecture
of convolutional bi-directional lstm neural networks for network-wide
metro ridership prediction,” IEEE Transactions on Intelligent Trans-
portation Systems, vol. 20, no. 6, pp. 2278–2288, June 2019.
[6] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature,
vol. 521, no. 7553, pp. 436–444, may 2015. [Online]. Available:
http://www.nature.com/articles/nature14539
[7] S. Shekhar and B. M. Williams, “Adaptive seasonal time series
models for forecasting short-term traffic flow,” Transportation Research
Record, vol. 2024, no. 1, pp. 116–125, 2007. [Online]. Available:
https://doi.org/10.3141/2024-14
[8] X. Li, G. Pan, Z. Wu, G. Qi, S. Li, D. Zhang, W. Zhang, and Z. Wang,
“Prediction of urban human mobility using large-scale taxi traces and its
applications,” Frontiers of Computer Science, vol. 6, no. 1, pp. 111–121,
Feb 2012.
[9] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and
L. Damas, “Predicting taxi-passenger demand using streaming data,”
IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3,
pp. 1393–1402, 2013.
[10] R. Yu, Y. Li, C. Shahabi, U. Demiryurek, and Y. Liu, Deep Learning:
A Generic Approach for Extreme Condition Traffic Forecasting, pp.
777–785. [Online]. Available: https://epubs.siam.org/doi/abs/10.1137/1.
9781611974973.87
[11] Z. Cui, R. Ke, and Y. Wang, “Deep bidirectional and unidirectional
LSTM recurrent neural network for network-wide traffic speed
prediction,” CoRR, vol. abs/1801.02143, 2018. [Online]. Available:
http://arxiv.org/abs/1801.02143
[12] R. Fu, Z. Zhang, and L. Li, “Using lstm and gru neural network
methods for traffic flow prediction,” in 2016 31st Youth Academic Annual
Conference of Chinese Association of Automation (YAC). IEEE, 2016,
pp. 324–328.
[13] X. Cheng, R. Zhang, J. Zhou, and W. Xu, “Deeptransport: Learning
spatial-temporal dependency for traffic condition forecasting,” in 2018
International Joint Conference on Neural Networks (IJCNN). IEEE,
2018, pp. 1–8.
[14] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning
traffic as images: A deep convolutional neural network for large-scale
transportation network speed prediction,” Sensors, vol. 17, no. 4, 2017.
[Online]. Available: https://www.mdpi.com/1424-8220/17/4/818
[15] J. Zhang, Y. Zheng, and D. Qi, “Deep spatio-temporal residual
networks for citywide crowd flows prediction,” in Proceedings of
the Thirty-First AAAI Conference on Artificial Intelligence, ser.
AAAI’17. AAAI Press, 2017, pp. 1655–1661. [Online]. Available:
http://dl.acm.org/citation.cfm?id=3298239.3298479
[16] Z. Xie, W. Lv, S. Huang, Z. Lu, B. Du, and R. Huang, “Sequential graph
neural network for urban road traffic speed prediction,” IEEE Access,
vol. 8, pp. 63 349–63 358, 2019.
[17] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive
survey on graph neural networks,” arXiv preprint arXiv:1901.00596,
2019.
[18] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion Convolutional Recurrent
Neural Network: Data-Driven Traffic Forecasting,” arXiv:1707.01926
[cs, stat], Feb. 2018, arXiv: 1707.01926. [Online]. Available:
http://arxiv.org/abs/1707.01926
[19] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li,
“T-gcn: A temporal graph convolutional network for traffic prediction,”
IEEE Transactions on Intelligent Transportation Systems, 2019.
[20] G. Jin, Y. Cui, L. Zeng, H. Tang, Y. Feng, and J. Huang, “Urban ride-
hailing demand prediction with multiple spatio-temporal information fu-
sion network,” Transportation Research Part C: Emerging Technologies,
vol. 117, p. 102665, 2020.
[21] B. Lu, X. Gan, H. Jin, L. Fu, and H. Zhang, “Spatiotemporal adaptive
gated graph convolution network for urban traffic flow forecasting,” in
Proceedings of the 29th ACM International Conference on Information
& Knowledge Management, 2020, pp. 1025–1034.
[22] L. Liu, J. Chen, H. Wu, J. Zhen, G. Li, and L. Lin, “Physical-Virtual
Collaboration Modeling for Intra-and Inter-Station Metro Ridership
Prediction,” arXiv:2001.04889 [cs], Jun. 2020, arXiv: 2001.04889.
[Online]. Available: http://arxiv.org/abs/2001.04889
[23] Z. Cui, K. Henrickson, R. Ke, and Y. Wang, “Traffic Graph
Convolutional Recurrent Neural Network: A Deep Learning Framework
for Network-Scale Traffic Learning and Forecasting,” IEEE Transactions
on Intelligent Transportation Systems, pp. 1–12, 2019. [Online].
Available: https://ieeexplore.ieee.org/document/8917706/
[24] G. Jin, Y. Cui, L. Zeng, H. Tang, Y. Feng, and J. Huang, “Urban ride-
hailing demand prediction with multiple spatio-temporal information fu-
sion network,” Transportation Research Part C: Emerging Technologies,
vol. 117, p. 102665, 2020.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” in Advances in Neural Infor-
mation Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and
K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105.
[26] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face
recognition: a convolutional neural-network approach,” IEEE Trans.
Neural Networks, vol. 8, no. 1, pp. 98–113, 1997. [Online]. Available:
https://doi.org/10.1109/72.554195
[27] H. Yu, Z. Wu, S. Wang, Y. Wang, and X. Ma, “Spatiotemporal
recurrent convolutional networks for traffic prediction in transportation
networks,” Sensors, vol. 17, no. 7, 2017. [Online]. Available:
https://www.mdpi.com/1424-8220/17/7/1501
[28] H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, J. Ye, and Z. Li,
“Deep multi-view spatial-temporal network for taxi demand prediction,”
in AAAI, 2018.
[29] Z. Wu, S. Pan, G. Long, J. Jiang, and C. Zhang, “Graph WaveNet
for Deep Spatial-Temporal Graph Modeling,” arXiv:1906.00121
[cs, stat], May 2019, arXiv: 1906.00121. [Online]. Available:
http://arxiv.org/abs/1906.00121
[30] B. Yu, H. Yin, and Z. Zhu, “Spatio-Temporal Graph Convolutional
Networks: A Deep Learning Framework for Traffic Forecasting,”
Proceedings of the Twenty-Seventh International Joint Conference on
Artificial Intelligence, pp. 3634–3640, Jul. 2018, arXiv: 1709.04875.
[Online]. Available: http://arxiv.org/abs/1709.04875
[31] B. Du, X. Hu, L. Sun, J. Liu, Y. Qiao, and W. Lv, “Traffic demand
prediction based on dynamic transition convolutional neural network,”
IEEE Transactions on Intelligent Transportation Systems, 2020.
[32] T. N. Kipf and M. Welling, “Semi-supervised classification with graph
convolutional networks,” CoRR, vol. abs/1609.02907, 2016. [Online].
Available: http://arxiv.org/abs/1609.02907
[33] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and
locally connected networks on graphs,” in 2nd International Conference
on Learning Representations, ICLR 2014, Banff, AB, Canada, April
14-16, 2014, Conference Track Proceedings, 2014. [Online]. Available:
http://arxiv.org/abs/1312.6203
[34] M. Henaff, J. Bruna, and Y. LeCun, “Deep convolutional networks
on graph-structured data,” CoRR, vol. abs/1506.05163, 2015. [Online].
Available: http://arxiv.org/abs/1506.05163
[35] W. R. Tobler, “A computer movie simulating urban growth in the detroit
region,” Economic geography, vol. 46, no. sup1, pp. 234–240, 1970.
[36] X. Ma, J. Zhang, C. Ding, and Y. Wang, “A geographically and tempo-
rally weighted regression model to explore the spatiotemporal influence
of built environment on transit ridership,” Computers, Environment and
Urban Systems, vol. 70, pp. 113–124, 2018.
[37] H. Yang, X. Lu, C. Cherry, X. Liu, and Y. Li, “Spatial variations
in active mode trip volume at intersections: a local analysis utilizing
geographically weighted regression,” Journal of transport geography,
vol. 64, pp. 184–194, 2017.
[38] G. Kusano, Y. Hiraoka, and K. Fukumizu, “Persistence weighted Gaussian
kernel for topological data analysis,” in International Conference
on Machine Learning, 2016, pp. 2004–2013.
[39] S. Raveau, J. C. Muñoz, and L. De Grange, “A topological route choice
model for metro,” Transportation Research Part A: Policy and Practice,
vol. 45, no. 2, pp. 138–147, 2011.
[40] D. An, X. Tong, K. Liu, and E. H. Chan, “Understanding the impact of
built environment on metro ridership using open source in Shanghai,”
Cities, vol. 93, pp. 177–187, 2019.
[41] Y. Gong, Y. Lin, and Z. Duan, “Exploring the spatiotemporal structure
of dynamic urban space using metro smart card records,” Computers,
Environment and Urban Systems, vol. 64, pp. 169–183, Jul. 2017.
[Online]. Available: https://www.sciencedirect.com/science/article/pii/S0198971516301089
[42] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and
J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Transactions
on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232,
Oct. 2017.
[43] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[44] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,”
IEEE Transactions on Signal Processing, vol. 45, no. 11, pp.
2673–2681, Nov. 1997.
[45] A. Graves, S. Fernández, and J. Schmidhuber, “Bidirectional LSTM networks
for improved phoneme classification and recognition,” in Artificial
Neural Networks: Formal Models and Their Applications – ICANN 2005,
W. Duch, J. Kacprzyk, E. Oja, and S. Zadrożny, Eds. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2005, pp. 799–804.
[46] A. Graves and J. Schmidhuber, “Framewise phoneme classification
with bidirectional LSTM and other neural network architectures,” Neural
Networks, vol. 18, no. 5-6, pp. 602–610, 2005.
[47] A. Graves, A. Mohamed, and G. Hinton, “Speech recognition with deep
recurrent neural networks,” in 2013 IEEE International Conference on
Acoustics, Speech and Signal Processing, May 2013, pp. 6645–6649.
[48] A. Graves, N. Jaitly, and A. Mohamed, “Hybrid speech recognition with
deep bidirectional LSTM,” in 2013 IEEE Workshop on Automatic Speech
Recognition and Understanding, Dec. 2013, pp. 273–278.
[49] A. Ray, S. Rajeswar, and S. Chaudhury, “Text recognition using deep
BLSTM networks,” in 2015 Eighth International Conference on Advances
in Pattern Recognition (ICAPR), Jan. 2015, pp. 1–6.
[50] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean
absolute error (MAE)?–Arguments against avoiding RMSE in the literature,”
Geoscientific Model Development, vol. 7, no. 3, pp. 1247–1250, 2014.
[51] Q. Zhao, V. Hautamaki, and P. Fränti, “Knee point detection in BIC
for detecting the number of clusters,” in International Conference on
Advanced Concepts for Intelligent Vision Systems. Springer, 2008, pp.
664–673.
Pengfei Chen received the B.S., M.S., and Ph.D. degrees from
Wuhan University in 2012, 2015 and 2019, respectively. He also received a
joint Ph.D. degree from the Hong Kong Polytechnic University in 2020. He
is currently an Assistant Professor with the School of Geospatial Engineering
and Science, Sun Yat-Sen University, Guangdong, China. His research interests
include human mobility modeling, geospatial artificial intelligence and
spatial data uncertainty.
Xuandi Fu received the B.S. degree from the Hong Kong
Polytechnic University in 2017. She is currently pursuing the M.S. degree with
the Department of Electrical and Computer Engineering at Carnegie Mellon
University, USA. Her research interests include natural language processing,
graph convolutional neural networks, human mobility modeling and spatial
data analytics.
Xue Wang received the B.S. and M.S. degrees from Peking
University in 2012 and 2015, respectively, and the Ph.D. degree from the
Chinese University of Hong Kong in 2019. She is currently an
Assistant Professor with the School of Geospatial Engineering and Science,
Sun Yat-Sen University, Guangdong, China. Her research interests include
urban informatics and change detection.