A Graph Convolutional Stacked Bidirectional Unidirectional-LSTM Neural Network for Metro Ridership Prediction

March 2021
IEEE Transactions on Intelligent Transportation Systems PP(99):1-13

DOI:10.1109/TITS.2021.3065404

Authors:

Pengfei Chen

Sun Yat-Sen University

Xue Wang

Sun Yat-Sen University

Timely and precise metro ridership forecasting helps reveal real-time traffic demand, and is a crucial but challenging task in modern traffic management. Given the complex spatial correlation and temporal variation of riding behaviour in a metro system, deep learning algorithms have been widely applied owing to their superior performance in capturing spatio-temporal features. However, current deep learning models use regular convolutional operations, which can barely provide satisfactory accuracy because they either ignore the realistic topology of a traffic network or fail to capture representative spatio-temporal patterns. To further improve the accuracy of metro ridership prediction, this study proposes a parallel-structured deep learning model that consists of a Graph Convolution Network and a stacked Bidirectional unidirectional Long short-term Memory network (GCN-SBULSTM). The GCN module regards a metro network as a structured graph, and a K-hop matrix, which integrates the travel distance, population flow, and adjacency, is introduced to capture the dynamic spatial correlation among metro stations. The SBULSTM module considers both backward and forward states of ridership time series and can learn complex temporal features with stacked recurrent layers. Experiments are conducted on three real-life metro ridership datasets to demonstrate the effectiveness of the proposed model. Compared with state-of-the-art prediction models, GCN-SBULSTM performs better in multiple scenarios and substantially improves the efficiency of the training process.

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 1

Pengfei CHEN, Xuandi FU, Xue WANG

Index Terms—Deep learning model, trafﬁc prediction, spatio-

temporal dependency, origin-destination ﬂow, parallel structure.

I. INTRODUCTION

MULTI-SCALE precise traffic forecasting is one of the most fundamental and crucial tasks for urban transportation control and management. Within this field, metro ridership prediction has attracted increasing attention from both the academic community and the responsible authorities because of the vital position of the subway in the urban public transportation system [1], [2]. Making collaborative spatial-temporal predictions for metro ridership is challenging because of the complicated spatial structure of traffic networks, temporal variations, and the uncertainty inherent in human behaviour. Recently, driven by rapid advances in artificial intelligence and computation power, and by abundant traffic data made available through novel collection and storage techniques, deep learning approaches have swept through prediction-related research and promoted significant progress in traffic forecasting [3]–[5].

Manuscript received ... The research is not funded by a specific project grant. P. CHEN and X. FU contribute equally to this article. (Corresponding author: Pengfei CHEN)

P. CHEN and X. WANG are with the School of Geospatial Engineering and Science, Sun Yat-Sen University, Guangzhou 510275, Guangdong, China, and also with the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519082, Guangdong, China. (e-mail: chenpf9@mail.sysu.edu.cn, wangxue25@mail.sysu.edu.cn)

X. FU is with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA. (e-mail: xuandif@andrew.cmu.edu)

Deep learning methods have been reported to outperform

traditional statistical models in many applications, especially

in time series forecasting [6]. Typical statistical models, such

as auto-regressive integrated moving average (ARIMA) [7]

and its variants [8], [9], are commonly adopted for single time

series prediction, while they ignore the potential dependency

among multiple time series under relatively complex trafﬁc

conditions. In contrast, deep learning approaches employ multiple processing layers and allow the models to learn abstracted features and non-linear dependencies from large-scale traffic datasets, which makes deep learning methods a major solution in current traffic forecasting.

Given their well-acknowledged performance in time series forecasting, Recurrent Neural Networks (RNN) and their variants, such as Long Short-term Memory (LSTM) and Gated Recurrent Unit (GRU), are widely employed in mainstream studies on traffic forecasting [10]–[12]. However, RNN-based models exploit only the temporal features of travel behaviour and ignore the underlying spatial dependencies within a traffic network [13]. To capture the spatial dependencies in traffic data, a number of studies use Convolutional Neural Networks (CNN) to build prediction models, in which a traffic network is commonly transformed into an image based on its geographic locations [4], [14], [15]. However, CNN-based models only consider the absolute distance relationship among stations in 2D Euclidean space, while the non-Euclidean structural features of traffic networks, such as connectivity, are not fully learned. Also, because of the predefined image size, CNN-based models are prone to generating distorted spatial relationships, which limits their adaptation to the varying structure of real-world traffic networks [16].

Compared with CNN, the Graph Convolutional Network (GCN) provides a more feasible way to model spatial dependencies within a traffic network. Given the inherent graph structure of a traffic network, GCN is naturally capable of preserving realistic topology and capturing the dependencies between metro stations by aggregating nodes' information through graph convolution [17]. However, the effective construction of graphs and the integration of GCN with existing neural networks remain two open problems in current studies. For the first issue, most related works directly adopt physical topologies within a network, such as adjacency, to build graphs [18]–[20]. Nevertheless, given the latent spatio-temporal dependencies implied in traffic data, such as the travel distance and population flow between traffic sites, virtual graphs can be built from prior knowledge to improve the effectiveness of GCN [21], [22]. As for the integration of GCN, many related studies combine GCN and RNN models into a joint prediction model, so as to capture both spatial and temporal features for forecasting problems. For example, Cui et al. [23] designed an architecture that uses the output of multiple GCNs as the input of an LSTM for traffic speed prediction. Jin et al. [24] fused the outputs of a GCN and a variational auto-encoder model and fed the result into a Seq2seq GRU module to predict urban ride-hailing demand. However, the features extracted by such a sequential structure can be distorted when the convolution results are converted, which might lead to information loss and uncertain predictions [5].

Based on the aforementioned problems, we propose a

parallel GCN and Stacked Bidirectional Unidirectional LSTM

model (GCN-SBULSTM) for metro ridership forecasting. In

GCN-SBULSTM, both physical topology and virtual graphs,

including adjacency, travel distance and population ﬂow

among metro stations, are used to construct the GCN mod-

ule, and a K-hop weight matrix is introduced to adaptively

determine the extent of neighbor information to be considered

in each graph. The SBULSTM module is used to handle

temporal dependencies, which is capable of capturing long-

term dependencies by considering both the backward and

forward correlations in ridership time series. This architecture

is expected to inherit the merits from both GCN in extracting

realistic spatial dependencies and SBULSTM in capturing

temporal features, while reducing their interference using a

parallel instead of a sequential structure.

In summary, the main contributions of this study include:

• Propose a new deep learning architecture composed of two parallel modules that consider both spatial and temporal dependencies for metro ridership prediction;

• Design a novel K-hop weight matrix, which integrates adjacency, travel distance, and population flow among metro stations, to represent metro networks, and incorporate the matrix into the GCN module to enhance the extraction of realistic spatial dependency;

• Integrate stacked bidirectional recurrent layers into the model, which improves its ability to capture long-term context and generate a higher-level representation of sequence data.

II. RELATED WORK

A. Machine Learning for Trafﬁc Forecasting

In early machine learning problems for traffic forecasting, state data from different traffic sites are always organized by their collection timestamps as a batch of time series, and RNN-based models, such as GRU and LSTM, are widely used given their ability to remember important information about the sequential input in their internal memory. For

instance, Yu et al. [10] combined deep LSTM and stacked

autoencoder to capture both the temporal and static features in

trafﬁc data for trafﬁc ﬂow forecasting, and experimental results

on real-world data showed that their model can signiﬁcantly

improve the predictive performance especially under extreme

conditions, such as peak-hour and post-accident scenarios.

Considering the periodicity of metro riding behaviour given

the regularity in human’s daily activities, Cui et al. [11]

designed a stacked bidirectional and unidirectional LSTM

framework, which concerned both forward and backward de-

pendencies of trafﬁc data, for trafﬁc speed prediction over the

whole urban trafﬁc networks. Those studies have demonstrated

the superiority of RNN in extracting temporal features for traf-

ﬁc forecasting. However, it is challenging to use solely RNN-

based models to maintain the spatial features and topological

information in trafﬁc data, which limits their effectiveness in

practical applications [13].

Noticing the promising achievement of CNN in computer

vision [25], [26], many studies generalize CNN to learn the

spatial dependencies in Euclidean space. For example, Zhang

et al. [4] proposed a deep neural network to predict citywide

crowd ﬂow, in which multiple CNN layers were applied on

trafﬁc demand heatmaps to extract spatial features. This model

was further developed in [15] by being integrated with resid-

ual learning to capture large-scale spatial dependencies. By

sequentially connecting CNN and LSTM networks, Yu et al.

[27] proposed a spatio-temporal recurrent convolution network

(SRCNs) for trafﬁc speed forecasting. Yao et al. [28] devel-

oped a Deep Multi-View Spatial-Temporal Network (DMVST-

Net) for taxi demand prediction, which jointly concerned the

spatial, temporal, and semantic relations using LSTM, CNN

and graph embedding, respectively. To reduce the interference

of sequentially connected LSTM and CNN modules, Ma et

al. [5] proposed a parallel CNN-BLSTM framework for metro

ridership prediction. The results also proved that the paral-

lel structure could signiﬁcantly improve prediction accuracy.

However, these models inherit the drawback of CNN, which ignores the topology information within a traffic network, and this inevitably hampers their performance given the increasing complexity of traffic patterns [16].

B. Graph Convolution Networks

For the last few years, the emergence of GCN has refreshed the way of modelling traffic data. By treating a traffic network as a graph instead of the predefined image used in CNN, GCN can largely preserve the realistic topological information and thus benefits the extraction of comprehensive spatial features [17], [19]. Also, GCN can preserve the global structure of metro networks by conducting convolution on the whole structured graph, which theoretically outperforms CNN, as CNN can only capture neighbouring spatial patterns because of its limited kernel window size.

By combining with temporal dynamics, GCN-based models have made significant progress in traffic forecasting problems. For instance, Li et al. [18] proposed a diffusion convolutional recurrent neural network (DCRNN), in which the traffic flow was modelled as a diffusion process on a directed graph and spatial dependency was captured using bidirectional random walks on the graph. Wu et al. [29] developed a novel architecture named Graph-WaveNet, which adopted an adaptive adjacency matrix learned through an embedding technique to capture hidden spatial dependencies. Yu et al. [30] combined graph convolution and gated temporal convolution to capture precise spatio-temporal correlations for traffic speed forecasting, which also enhanced training efficiency with a reduced number of parameters. Lu et al. [21] used a dynamic weighted graph to model the road relationship and developed an adaptive graph gate convolution network for traffic flow prediction. Beyond the single physical topology of traffic networks, some domain knowledge has also been included in recent studies to guide graph construction. For example, Du et al. [31] extracted virtual stations using a density-peak based clustering method and developed a dynamic convolution neural network to predict traffic demands. Liu et al. [22] established a Physical-Virtual Collaboration Graph Network (PVCGN), which incorporates the connection among metro stations, ridership similarity, and inter-station passenger flow into a Graph Convolution Gated Recurrent Unit for spatio-temporal dependency learning. However, because of the large number of parameters in PVCGN, its efficiency is relatively low compared to other models.

III. METHODOLOGY

In this section, we formulate the learning problem of metro ridership forecasting and elaborate on the motivation and detailed steps for the construction and combination of the GCN and SBULSTM modules.

A. Metro ridership forecasting problem

Metro ridership forecasting is a fundamental spatio-temporal prediction problem given the spatial correlation and periodicity of people's daily riding behaviour. Ridership data of each metro station are commonly aggregated over a specific time interval, thus forming a batch of time series for further operation. In this study, our goal is to predict the ridership in the next $m$ time intervals given the historical data in the previous $n$ time intervals. Based on the observations from $s$ metro stations, the input data for our model can be expressed as a matrix $X$:

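\[
X = [X_{T-n}, X_{T-n+1}, \cdots, X_{T-1}]
= \begin{bmatrix}
x^{1}_{T-n} & x^{1}_{T-n+1} & \cdots & x^{1}_{T-1} \\
x^{2}_{T-n} & x^{2}_{T-n+1} & \cdots & x^{2}_{T-1} \\
\vdots & \vdots & \ddots & \vdots \\
x^{s}_{T-n} & x^{s}_{T-n+1} & \cdots & x^{s}_{T-1}
\end{bmatrix}
\tag{1}
\]

where $X_{T-i}$ encodes the ridership vector measured at the $i$th time interval before timestamp $T$, and $x^{j}$ corresponds to the ridership data of the $j$th station.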

In addition to the raw ridership data, the metro network can be represented by an undirected graph $G = (V, E)$, where $V$ and $E$ denote the set of stations and the set of lines in the network, respectively. $V$ encodes the features of nodes, which in this task refers to the ridership time series of each station. $E$ encloses all edges linking two nodes, of which the values can differ depending on the raw data. For example, in a modern metro system, riding behaviour is always recorded in smart card transaction logs, so some personal information, such as the card ID and travel path, can be used as valuable supporting information for $E$. For simplicity, we use $I$ to represent these additional data. Therefore, the forecasting problem can be formulated as learning a function $f$:

\[
[X_{T-n}, \cdots, X_{T-1}; G; I] \xrightarrow{f} [X_{T}, \cdots, X_{T+m-1}]
\tag{2}
\]

The resultant prediction is denoted by $\widehat{[X_{T}, \cdots, X_{T+m-1}]}$ in the rest of this paper, where each element is a vector of the $s$ stations' ridership at a future time step.
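The mapping in Eq. (2) is usually trained on sliding windows cut from one long ridership record. The following is a minimal sketch of that windowing, assuming a time-major NumPy array of shape (time steps, stations); the function name `make_windows` and the toy values are illustrative and do not come from the paper. Each history window is stored as an (n, s) array, i.e. the transpose of the matrix in Eq. (1).

```python
import numpy as np

def make_windows(ridership, n, m):
    """Cut a (num_steps, s) ridership array into (history, target) pairs.

    histories[i] has shape (n, s): X_{T-n}, ..., X_{T-1}
    targets[i]   has shape (m, s): X_T, ..., X_{T+m-1}
    """
    num_steps = ridership.shape[0]
    histories, targets = [], []
    # T is the first time step to be predicted in each sample.
    for T in range(n, num_steps - m + 1):
        histories.append(ridership[T - n:T])
        targets.append(ridership[T:T + m])
    return np.stack(histories), np.stack(targets)

# Toy data: 10 time intervals, s = 3 stations (arbitrary values).
ridership = np.arange(30, dtype=float).reshape(10, 3)
X_hist, X_future = make_windows(ridership, n=4, m=2)
print(X_hist.shape, X_future.shape)
```

With 10 intervals, n = 4 and m = 2, the first predicted step T can take five values, so five training samples are produced.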

B. Foundation of the Graph Convolution Network Module

Currently, several strategies have been investigated to build effective graphs for traffic forecasting. Commonly employed are the adjacency matrix [32] and the Laplacian matrix [33], [34]. GCN based on the Laplacian matrix incorporates spectral theory into graph convolution and is often called spectral graph convolution. The classic GCN encodes the adjacency relationship among nodes to represent arbitrarily structured graphs. It normally uses a binary-encoded adjacency matrix $A$ to denote the connectivity among nodes: if nodes $i$ and $j$ are directly connected in the metro network, $A_{ij} = 1$; otherwise $A_{ij} = 0$.

However, within collaborated spatial-temporal prediction,

spatial dependency should be considered dynamically, as it

could vary in different scenarios. For example, ridership of

distant stations may exhibit low correlation within a short

counting period, e.g., 10-minute interval, while the correlation

could signiﬁcantly increase with the length of a target predic-

tion interval due to the city-scale globality of passenger riding

behaviour. Therefore, simply applying a binary adjacency

matrix or predeﬁned stationary distance is not sufﬁcient to

handle complex scenarios.

To tackle the aforementioned problems, we initialize the GCN module by intuitively taking metro stations as nodes, and define three graphs, namely a travel distance graph, a population flow graph, and an adjacency graph, to weight the edges.

• Adjacency graph: The connection between metro stations is widely acknowledged to affect the relationship of their ridership [18], [30]. However, the traditional adjacency matrix mostly focuses on directly connected stations, i.e. the 1-order neighbouring stations, while the indirect connections among stations are ignored. Therefore, as shown in the first row of Figure 1, a k-hop adjacency matrix $A^k$ is adopted in this study to encode both the direct and the indirect adjacency relationships among metro stations. Given a constant $k$, each element $A^k_{ij}$ is 1 if stations $i$ and $j$ are neighbours of order at most $k$; otherwise, the element is set to 0. Mathematically:

\[
A^k_{ij} =
\begin{cases}
1, & \partial \le k \\
0, & \text{otherwise}
\end{cases}
\tag{3}
\]

where $\partial$ denotes the least number of steps from station $i$ to $j$.

[Figure 1 panels: (a) Adjacent graph; (b) Travel distance graph; (d) K-order neighbor matrix; (e) Travel distance matrix; (f) Series of OD matrix; (g) K-hop adjacency matrix; (h) Distance weight*; (i) Population flow weight. Processing steps shown: Selection, Gaussian kernel, K-hop, Normalization.]

Fig. 1. Illustration of the graphs constituting the K-hop weight matrix used in the GCN module (K=2). *Extremely small values (i.e. smaller than 0.005) are shown as zero.

• Travel distance graph: According to the First Law of Geography and results from previous studies, traffic behaviours that occur close to each other are likely to be related [35]–[37]. For instance, the ridership patterns of neighbouring stations along metro lines can be highly correlated, as passengers within a region may have similar daily travel patterns. Therefore, from the geographic point of view, we take the travel distance as an important factor in the initialization of the GCN module. An example is illustrated in the second row of Figure 1. Specifically, following the definition in [18], we calculate the distance weight matrix $D$ using a Gaussian kernel [38], where the element $D_{ij}$ is calculated as:

\[
D_{ij} = \exp\left(-\frac{\mathrm{dist}(v_i, v_j)^2}{\sigma^2}\right)
\tag{4}
\]

where $\mathrm{dist}(v_i, v_j)$ indicates the shortest travel distance along the metro network between stations $v_i$ and $v_j$, and $\sigma$ is the standard deviation of travel distances.

• Population flow graph: Population flow is a virtual connection between metro stations, which reflects the latent dependencies implied by the regularity of passengers' daily activity. A large population flow should indicate a relatively high dependency between metro stations [39], [40]. However, as the population flow varies over time, a series of population flow matrices is generated to represent the dependency dynamically. As shown in the third row of Figure 1, we extract the origin-destination (OD) flows between metro stations and generate the population flow matrix $F$ through normalization. Each element $F_{ij}$ of matrix $F$ is calculated as:

\[
F_{ij} = \frac{1}{2}\left(\frac{N_{ji}}{\sum_{k} N_{jk}} + \frac{N_{ij}}{\sum_{k} N_{ik}}\right)
\tag{5}
\]

where $N_{ij}$ is the number of passengers travelling from station $i$ to station $j$ during a specific timespan. In this work, this timespan is defined as the period from the earliest historic frame to the last one. In addition, considering the large number of stations in a common metro system, we set $N_{ij} = 0$ if $A^k_{ij} = 0$ to reduce the interference from distant stations and enlarge the weights of nearby stations with a large population exchange.
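The three graphs described in the bullets above can be sketched in a few lines of NumPy. This is an illustrative reading rather than the authors' implementation: the function names and the four-station line network are hypothetical, the diagonal of the k-hop matrix is set to 1 (a station is taken to be 0 hops from itself), the standard deviation in Eq. (4) is computed over all entries of the distance matrix, and the denominators of Eq. (5) are read as row sums of the OD matrix after the k-hop selection.

```python
import numpy as np

def k_hop_adjacency(adj, k):
    """Eq. (3): entry (i, j) is 1 when j is at most k hops from i."""
    s = adj.shape[0]
    reach = np.eye(s, dtype=int)   # assumption: a station is 0 hops from itself
    walks = np.eye(s, dtype=int)
    for _ in range(k):
        # Pairs joined by a walk that is one hop longer than in the previous round.
        walks = (walks @ adj > 0).astype(int)
        reach = np.maximum(reach, walks)
    return reach

def distance_weights(dist, sigma=None):
    """Eq. (4): Gaussian kernel on the shortest travel distances."""
    if sigma is None:
        sigma = dist.std()         # assumption: std over all matrix entries
    return np.exp(-(dist ** 2) / sigma ** 2)

def population_flow_weights(od, a_k):
    """Eq. (5), reading each denominator as a row sum of the OD matrix."""
    n = od * a_k                   # N_ij = 0 wherever A^k_ij = 0
    row_sum = n.sum(axis=1, keepdims=True)
    row_sum = np.where(row_sum == 0, 1.0, row_sum)   # avoid division by zero
    share = n / row_sum            # share[i, j] = N_ij / sum_k N_ik
    return 0.5 * (share + share.T)

# Hypothetical network: four stations on one line, 500 m apart.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
dist = 500.0 * np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
od = np.array([[0, 30, 10, 5],
               [20, 0, 40, 10],
               [10, 30, 0, 20],
               [5, 10, 25, 0]], dtype=float)

a2 = k_hop_adjacency(adj, k=2)
D = distance_weights(dist)
F = population_flow_weights(od, a2)
print(a2)
```

In this toy network the two terminal stations are three hops apart, so with k = 2 their entry in the k-hop matrix, and therefore in the population flow matrix, is zero.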

Finally, we define a k-hop weight matrix, $M^k$, which integrates the graphs of K-hop adjacency, travel distance, and population flow for dynamically capturing spatial dependency among stations. Mathematically:

\[
M^k = F \odot D \odot A^k
\tag{6}
\]

where $\odot$ stands for element-wise multiplication. The threshold $k$ should be treated as a hyperparameter of the model, thereby ensuring that the most significant spatial correlation among stations can be learned.

Based on the proposed K-hop matrix, the graph convolution can be defined as:

\[
h_g^{l+1} = g(h_g^{l}, M^k)
\tag{7}
\]

\[
g(h_g^{l}, M^k) = \mathrm{ReLU}(M^k h_g^{l} W_g^{l})
\tag{8}
\]

\[
\mathrm{ReLU}(x) = \max(0, x)
\tag{9}
\]

where $h_g^{l}$ and $h_g^{l+1}$ are the input graph generated by the former layer $l$ and the output graph at layer $l+1$, respectively. $W_g^{l}$ is the trainable weight matrix for generating output features at each layer. A non-linear activation function (ReLU) is employed after each convolutional layer before the features are forwarded to the next layer.

Fig. 2. Architecture of a Bidirectional Long-short Term Memory network and an LSTM Memory Cell.
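As a rough illustration of Eqs. (6) to (9), the sketch below builds a toy k-hop weight matrix by element-wise multiplication and pushes station features through two graph convolution layers. All matrices are hypothetical stand-ins (random flows, a simple distance decay, a hand-written 2-hop adjacency), the weights are random rather than trained, and the layer sizes are arbitrary.

```python
import numpy as np

def relu(x):
    """Eq. (9): element-wise max(0, x)."""
    return np.maximum(0.0, x)

def gcn_layer(h, m_k, w):
    """Eqs. (7)-(8): h^{l+1} = ReLU(M^k h^l W^l)."""
    return relu(m_k @ h @ w)

rng = np.random.default_rng(0)
s, in_feats, hidden = 4, 6, 8      # stations, input features, hidden units

# Toy stand-ins for the three graphs of a four-station line network.
a_k = np.array([[1, 1, 1, 0],
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [0, 1, 1, 1]], dtype=float)
hops = np.abs(np.subtract.outer(np.arange(s), np.arange(s)))
d = np.exp(-(hops ** 2) / 4.0)     # illustrative distance decay
r = rng.random((s, s))
f = 0.5 * (r + r.T)                # illustrative symmetric flow weights

m_k = f * d * a_k                  # Eq. (6): element-wise product

h0 = rng.random((s, in_feats))     # one feature row per station
w0 = rng.normal(size=(in_feats, hidden))
w1 = rng.normal(size=(hidden, hidden))

h1 = gcn_layer(h0, m_k, w0)
h2 = gcn_layer(h1, m_k, w1)
print(h2.shape)
```

Because the product in Eq. (6) is element-wise, any station pair excluded by the k-hop selection contributes nothing to the aggregation, whatever its distance or flow weight.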

C. Foundation of the Stacked Bidirectional Unidirectional

LSTM Module

From the temporal perspective, metro ridership variations

possess several special characteristics, including non-linearity,

periodicity and regularity [41]. Considering those features,

the Stacked Bidirectional Unidirectional LSTM (SBULSTM)

framework is adopted to learn the complex temporal pattern

from the historical ridership inputs and to make sequential

predictions [11]. The theoretical foundation and detailed steps

for SBULSTM are elaborated in the following content.

1) Long short-term Memory: The LSTM architecture is the basic unit in SBULSTM for capturing temporal features of metro ridership data. It has been widely acknowledged that LSTM outperforms other recurrent architectures in handling sequence-based tasks with long-term dependencies. Its sophisticated gated memory mechanism helps to avoid the gradient vanishing or exploding problems exhibited by traditional RNN [42]. As demonstrated in Figure 2, each LSTM cell contains three gates: the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$. The input gate determines the information to be preserved, the forget gate controls the portion to be abandoned, and the output gate decides the result to be generated [43]. The detailed procedures for calculating the three gates and the cell memory in each memory unit are as follows:

\[
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
\tag{10}
\]

\[
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
\tag{11}
\]

\[
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tag{12}
\]

\[
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
\tag{13}
\]

where $W_i$, $W_f$, $W_o$, $W_c$ and $U_i$, $U_f$, $U_o$, $U_c$ are the weight matrices and $b_i$, $b_f$, $b_o$ and $b_c$ are the bias vectors of the LSTM to be learned during training. $\sigma$ is the gate activation function, which normally indicates the sigmoid function. Based on those three gates, the cell output state $c_t$ and the hidden layer output $h_t$ of the current cell can be generated as follows:

\[
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
\tag{14}
\]

\[
h_t = o_t \odot \tanh(c_t)
\tag{15}
\]

where $\odot$ stands for element-wise multiplication, and $\tanh$ is the hyperbolic tangent function. Here, when taking the

ridership prediction problem as an example, only the last

element of the output vector

2) Bidirectional Long short-term Memory: A bidirectional LSTM network is used to capture the periodicity and regularity of metro ridership. A standard LSTM structure can only make use of forward dependencies and inevitably filters out useful information along its long gated memory chain. The bidirectional LSTM structure helps solve this problem by concatenating forward and backward LSTM layers [44]. It can employ hidden states from both directions, compensating for the information loss along the chain within LSTM. Therefore, bidirectional LSTM has a better capability for capturing long-term contextual dependencies in sequential prediction tasks and making more precise sequential predictions [45], [46]. Apart from that, the periodicity of the metro ridership pattern is another reason for including backward temporal dependency in the model. Unlike traffic incidents, wind speed or other randomly occurring features, metro traffic possesses strong periodicity and regularity. Using bidirectional information can enhance the ability to model the periodic pattern of metro ridership and to make comprehensive predictions.

The bidirectional LSTM network contains two parallel LSTM layers, one in each propagation direction, as shown in Figure 2.

\[
\overrightarrow{h}_t = \mathrm{LSTM}_{fw}(x_t, \overrightarrow{h}_{t-1})
\tag{16}
\]

\[
\overleftarrow{h}_t = \mathrm{LSTM}_{bw}(x_t, \overleftarrow{h}_{t+1})
\tag{17}
\]

$\mathrm{LSTM}_{fw}$ and $\mathrm{LSTM}_{bw}$ denote the forward and backward LSTM, respectively. $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the hidden states of the input temporal feature $x_t$ learned from bidirectional LSTM. The bidirectional hidden state $u_t$ for each input $x_t$ is obtained by concatenating the generated forward and backward hidden states:

\[
u_t = [\overrightarrow{h}_t, \overleftarrow{h}_t]
\tag{18}
\]
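To make Eqs. (10) to (18) concrete, the sketch below implements one LSTM cell step in NumPy and runs it over a short sequence in both directions before concatenating the hidden states. It is a forward pass only, with random untrained weights and zero initial states; the helper names (`init_params`, `lstm_step`, `run_lstm`) and the dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(input_dim, hidden_dim, rng):
    """Random W_*, U_*, b_* for the gates i, f, o and the candidate c."""
    p = {}
    for g in "ifoc":
        p["W" + g] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p["U" + g] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p["b" + g] = np.zeros(hidden_dim)
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    i_t = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])      # Eq. (10)
    f_t = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])      # Eq. (11)
    o_t = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])      # Eq. (12)
    c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # Eq. (13)
    c_t = f_t * c_prev + i_t * c_tilde                             # Eq. (14)
    h_t = o_t * np.tanh(c_t)                                       # Eq. (15)
    return h_t, c_t

def run_lstm(xs, p, hidden_dim, reverse=False):
    """Return one hidden state per time step, stored at the step's own index."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    hs = [None] * len(xs)
    for t in order:
        h, c = lstm_step(xs[t], h, c, p)
        hs[t] = h
    return np.stack(hs)

rng = np.random.default_rng(0)
n, s, hidden_dim = 5, 3, 4                 # time steps, stations, hidden units
xs = rng.random((n, s))                    # toy ridership sequence

p_fw = init_params(s, hidden_dim, rng)
p_bw = init_params(s, hidden_dim, rng)
h_fw = run_lstm(xs, p_fw, hidden_dim)                  # Eq. (16)
h_bw = run_lstm(xs, p_bw, hidden_dim, reverse=True)    # Eq. (17)
u = np.concatenate([h_fw, h_bw], axis=1)               # Eq. (18)
print(u.shape)
```

The backward pass starts from the last time step, so its hidden state at step t summarises steps t to n-1, while the forward state summarises steps 0 to t.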

3) Stacked Bidirectional Unidirectional LSTM: Deep recurrent networks have demonstrated their ability to generate a higher level of representation from sequential input in previous studies [47]–[49]. The prediction power of a neural network can be enhanced by deepening the model structure, the effectiveness of which has been shown in many domains, such as speech processing [47], [48] and text recognition [49]. Therefore, to break through the limited performance of a single LSTM or BLSTM architecture, this study adopts the SBULSTM proposed in [11] to learn the temporal dependencies in ridership data. In SBULSTM, the output of the BLSTM network is further fed to an LSTM layer to generate higher-level sequential representations. Theoretically, SBULSTM inherits the merits of both LSTM and BLSTM: on the one hand it can capture both forward and backward temporal dependency, and on the other hand it allows a higher level of representation of the ridership data. Nevertheless, it has not previously been combined with a spatial learning module, which limits its capability for making a comprehensive spatial-temporal prediction.
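The stacking described above, a bidirectional layer whose concatenated output feeds a unidirectional layer, can be sketched as follows. This is a forward-pass illustration with random weights, not the configuration used in the paper: the number of layers, the hidden size, and the fused gate layout inside the hypothetical `lstm_layer` helper are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(xs, hidden_dim, rng, reverse=False):
    """Run a randomly initialised LSTM over xs (steps, features).

    Returns the hidden state of every step, stored at the step's own index.
    """
    in_dim = xs.shape[1]
    # Fused weights: rows are stacked as [input gate, forget gate, output gate, candidate].
    W = rng.normal(scale=0.1, size=(4 * hidden_dim, in_dim))
    U = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim))
    b = np.zeros(4 * hidden_dim)
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    steps = range(len(xs))
    if reverse:
        steps = reversed(steps)
    hs = np.zeros((len(xs), hidden_dim))
    for t in steps:
        z = W @ xs[t] + U @ h + b
        i, f, o = (sigmoid(z[g * hidden_dim:(g + 1) * hidden_dim]) for g in range(3))
        c_tilde = np.tanh(z[3 * hidden_dim:])
        c = f * c + i * c_tilde
        h = o * np.tanh(c)
        hs[t] = h
    return hs

def sbulstm(xs, hidden_dim, rng):
    """Bidirectional layer followed by a unidirectional layer on its output."""
    fw = lstm_layer(xs, hidden_dim, rng)
    bw = lstm_layer(xs, hidden_dim, rng, reverse=True)
    u = np.concatenate([fw, bw], axis=1)      # (steps, 2 * hidden_dim)
    top = lstm_layer(u, hidden_dim, rng)      # (steps, hidden_dim)
    return u, top

rng = np.random.default_rng(0)
xs = rng.random((6, 3))                       # 6 time steps, 3 stations (toy values)
u, top = sbulstm(xs, hidden_dim=4, rng=rng)
print(u.shape, top.shape)
```

The top layer sees, at every step, a summary of both the past and the future of the input window, which is what allows the unidirectional layer to build on bidirectional context.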

Fig. 3. Architecture of the Proposed Graph Convolutional Stacked Bidirectional-LSTM Neural Network (GCN-SBULSTM)

D. Spatial-Temporal Prediction with GCN-SBULSTM

Previous studies commonly combined spatial and temporal

modules sequentially. For instance, the generated output from

CNN is fed into LSTM network in [27], [28]. However, the

original ﬂow pattern may be distorted passing through complex

spatial operations, i.e. deep convolutions, as the generated

output from convolutional layers cannot fully represent the

pattern of raw metro ridership data [5].

Therefore, to preserve the effectiveness of the spatial and temporal modules as much as possible and to integrate their results so that they complement each other, this study establishes a new deep learning architecture in which a GCN module and an SBULSTM module are combined in parallel to make predictions for future time frames. The effectiveness of such a parallel structure in ridership prediction was demonstrated in a previous study [5], and this study extends it by using a dynamic graph learning approach instead of a CNN. As shown in Fig. 3, ridership data are first organized into two forms, dynamic graphs and time series. These two forms of data are then fed to the GCN and SBULSTM modules, respectively, to learn spatial and temporal dependencies. The outputs can be represented by $H_G = [h_{g1}, \cdots, h_{gk}]$ and $H_T = [h_{t1}, \cdots, h_{tp}]$, where $k$ and $p$ are the numbers of hidden units in the last layer of the GCN and SBULSTM modules, respectively. Finally, the flattened outputs of the two modules, $O_G = \mathrm{Flatten}(H_G)$ and $O_T = \mathrm{Flatten}(H_T)$, are concatenated, and a fully connected layer with a dropout mechanism is applied to obtain the prediction results, which can be formulated as follows:

$$\hat{X} = W_{st}\,(O_G \,\|\, O_T) + b_{st} \tag{19}$$

Fig. 4. Shenzhen metro network.

where $W_{st}$ and $b_{st}$ are the trainable weight and bias parameters for generating the final predicted results $\hat{X}$, and $\|$ denotes the concatenation of $O_G$ and $O_T$.
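The fusion step in Equation (19) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `flatten` and `fuse`, the toy shapes, and the toy weights are assumptions made for illustration, and the dropout applied during training is omitted.

```python
from typing import List

Matrix = List[List[float]]


def flatten(h: Matrix) -> List[float]:
    """Flatten a 2-D module output row by row into a 1-D vector."""
    return [value for row in h for value in row]


def fuse(h_g: Matrix, h_t: Matrix, w_st: Matrix, b_st: List[float]) -> List[float]:
    """Compute X_hat = W_st (O_G || O_T) + b_st for one sample (dropout omitted)."""
    o = flatten(h_g) + flatten(h_t)  # list concatenation plays the role of ||
    if len(w_st) != len(b_st) or any(len(row) != len(o) for row in w_st):
        raise ValueError("W_st must have shape (len(b_st), len(O_G) + len(O_T))")
    return [sum(w * x for w, x in zip(row, o)) + b for row, b in zip(w_st, b_st)]


# Hypothetical toy values: 2 graph features and 3 temporal features, 2 outputs.
h_g = [[1.0, 2.0]]
h_t = [[0.5], [1.5], [2.5]]
w_st = [[1.0, 0.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 1.0, 1.0, 0.0]]
b_st = [0.1, -0.1]
x_hat = fuse(h_g, h_t, w_st, b_st)
```

Because the two module outputs are only joined at this final layer, neither module sees the other's intermediate features, which is the sense in which the structure is parallel rather than sequential.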

IV. EXPERIMENTS

A. Data description

Three metro ridership datasets are used to validate the

effectiveness of the proposed GCN-SBULSTM: 1) a real-

world ridership dataset named SZMetro, which was collected

from the metro system in Shenzhen, China; 2) two public

ridership datasets shared in [22], respectively named HZMetro

and SHMetro, which are used for benchmark tests. The details

of these three datasets are summarized in Table I.

TABLE I
DATASETS SUMMARY

| Dataset | SZMetro | HZMetro | SHMetro |
|---|---|---|---|
| City | Shenzhen | Hangzhou | Shanghai |
| # Stations | 166 | 80 | 288 |
| Time interval | 4 min | 15 min | 15 min |
| # Samples per day | 270 | 70 | 70 |
| Train timespan | 17/01/2017 - 28/01/2017; 10/02/2017 - 17/02/2017 | 1/01/2019 - 1/18/2019 | 7/01/2016 - 8/31/2016 |
| Test timespan | 18/02/2017 - 22/02/2017 | 1/21/2019 - 1/25/2019 | 9/10/2016 - 9/30/2016 |

SZMetro: This dataset was collected from Jan. 17, 2017 to Feb. 22, 2017 based on the transaction records provided by the metro system in Shenzhen, China. At the collection time, the Shenzhen metro system had 8 running lines with a total of 166 stations, as shown in Figure 4. Each record contains a passenger's inbound information, including the transaction time and the station name. Station-level ridership is aggregated over 4-minute intervals. As the Shenzhen metro operates from 6 AM to midnight, 270 ridership samples are obtained per day. Note that the timespan of the training data in SZMetro is discontinuous because the period from Jan. 28 to Feb. 9 corresponds to the Spring Festival holiday in China, which leads to a significant decline in ridership and to dynamic patterns that differ markedly from other dates. To avoid the influence of these special dates and retain the generality of the trained models, data from Jan. 28 to Feb. 9 are discarded in the experiments on SZMetro. Consequently, ridership data from the first 20 days are used for training, and data from the last 5 days are used for testing.
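The aggregation described for SZMetro (4-minute intervals over an 18-hour service day, giving 270 samples) can be sketched as follows. The record layout `(station, timestamp)`, the station name, and the helper names are hypothetical; the actual transaction schema is not described in the text.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

INTERVAL_MIN = 4        # aggregation interval stated in the text
SERVICE_START_HOUR = 6  # service runs from 6 AM to midnight
SLOTS_PER_DAY = (24 - SERVICE_START_HOUR) * 60 // INTERVAL_MIN  # 18 h * 60 / 4


def slot_index(ts: datetime) -> int:
    """Index of the 4-minute slot containing ts, counted from 6 AM; -1 before service."""
    minutes = (ts.hour - SERVICE_START_HOUR) * 60 + ts.minute
    return minutes // INTERVAL_MIN if minutes >= 0 else -1


def aggregate(records: Iterable[Tuple[str, datetime]]) -> Dict[str, List[int]]:
    """Count inbound transactions per station and slot for one day of records."""
    counts: Counter = Counter()
    for station, ts in records:
        slot = slot_index(ts)
        if slot >= 0:
            counts[(station, slot)] += 1
    series: Dict[str, List[int]] = {}
    for (station, slot), n in counts.items():
        series.setdefault(station, [0] * SLOTS_PER_DAY)[slot] = n
    return series


# Hypothetical records for a single station on the first collection day.
records = [
    ("StationA", datetime(2017, 1, 17, 6, 1)),
    ("StationA", datetime(2017, 1, 17, 6, 3)),
    ("StationA", datetime(2017, 1, 17, 6, 4)),
    ("StationA", datetime(2017, 1, 17, 5, 30)),  # before service, ignored
]
daily = aggregate(records)
```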

HZMetro and SHMetro: These two datasets were built from the metro systems in Hangzhou and Shanghai, respectively. Both are aggregated over a 15-minute time interval, generating 70 samples per day. Note that, since no station information is provided for HZMetro and SHMetro, we cannot transform the metro network into an image, so the state-of-the-art models containing a CNN module are not tested on these two datasets. More information about HZMetro and SHMetro can be found in [22].

B. Experiment design

Experiments include two main parts:

1) Test the performance of GCN-SBULSTM with respect to different temporal scales (i.e., different input and output lengths) and validate the effectiveness of the different graphs used in our model. This part of the experiments is conducted on SZMetro because its raw data are available, so the raw data can easily be reorganized for different prediction tasks. Specifically, two tasks are designed on SZMetro:

• Task 1 (5 to 5): forecast the next 5 samples based on the previous 5 samples;

• Task 2 (10 to 10): forecast the next 10 samples based on the previous 10 samples.

2) Run benchmark tests on the open datasets HZMetro and SHMetro using their original input and output length (i.e., 4) to further verify the superiority of GCN-SBULSTM.
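The "5 to 5" and "10 to 10" tasks amount to slicing each station's series into input and target windows. The sketch below shows one way to do this; the stride of one sample and the restriction of windows to a single day are assumptions, since the text does not state how windows are drawn.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], List[float]]


def make_samples(series: Sequence[float], n_in: int, n_out: int) -> List[Sample]:
    """Slice one station's ridership series into (input window, target window) pairs."""
    samples: List[Sample] = []
    for start in range(len(series) - n_in - n_out + 1):
        x = list(series[start:start + n_in])
        y = list(series[start + n_in:start + n_in + n_out])
        samples.append((x, y))
    return samples


# One hypothetical SZMetro day of 270 samples, using the slot index as a stand-in value.
day = list(range(270))
task1 = make_samples(day, n_in=5, n_out=5)    # Task 1 (5 to 5)
task2 = make_samples(day, n_in=10, n_out=10)  # Task 2 (10 to 10)
```

Under these assumptions a 270-sample day yields 261 windows for Task 1 and 251 for Task 2, so the longer task has slightly fewer training samples per day.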

Fig. 5. The input image of CNN.

To demonstrate the advantages of GCN-SBULSTM, classic deep learning architectures, including LSTM, CNN and GCN, and advanced models, including DMVST-Net [28], CNN-LSTM [5], SRCNs [27], SBULSTM [11], DCRNN [18], STGCN [30], Graph WaveNet [29] and PVCGN [22], are implemented for comparison. In addition, an ablation test is performed to analyze the effectiveness of the K-hop matrix used in GCN-SBULSTM. Suffixes are used to distinguish the ablated models: "w/o dist" indicates the model without the distance graph; "w/o OD" denotes the model without the population flow graph; "w/o K-hop" stands for the model using only the traditional adjacency matrix in the GCN module.

C. Computational environment and experimental setup

All experiments are run on a desktop equipped with an Intel(R) Core(TM) i9-10940X CPU and an NVIDIA GTX 2070i GPU running Windows 10. The parameters of each prediction model are carefully tuned to obtain the best accuracy on the test dataset: most models follow the settings in their original papers, with minor adjustments to tunable parameters, such as the number of hidden units and the batch size, to enhance the accuracy as much as possible.

To generate the traffic image for the experiments on SZMetro, the metro network map is divided by a 60 × 60 grid, following the setting in [5]. After the division, 5 pairs of metro stations fall into the same cell, and one station of each pair is reassigned to the nearest cell to avoid overlapping. The resulting image input for the CNN is exemplified in Figure 5. The value of each cell is set to the average ridership of the corresponding metro station over the training time frames. The number of hidden units of LSTM, as well as SBULSTM, is set to 1000 for SZMetro and SHMetro and 600 for HZMetro; in the GCN module, two stacked GCN layers with 60 and 80 channels and a fully connected layer with 10 hidden units are connected sequentially.
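The grid-based traffic image used for the CNN baselines can be sketched as below. This is only an illustration: the coordinate system, the bounding box, the station names, and the rule for choosing the "nearest" free cell (squared Euclidean distance in cell units, with the later station being moved) are assumptions, as the text does not specify them.

```python
from typing import Dict, List, Set, Tuple

GRID = 60  # grid size stated in the text (60 x 60)
Cell = Tuple[int, int]
Bounds = Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max


def to_cell(x: float, y: float, bounds: Bounds) -> Cell:
    """Map a coordinate inside the bounding box to a (row, col) grid cell."""
    x_min, y_min, x_max, y_max = bounds
    col = min(int((x - x_min) / (x_max - x_min) * GRID), GRID - 1)
    row = min(int((y - y_min) / (y_max - y_min) * GRID), GRID - 1)
    return row, col


def nearest_free(cell: Cell, occupied: Set[Cell]) -> Cell:
    """Closest unoccupied cell by squared Euclidean distance (assumed rule)."""
    r0, c0 = cell
    free = ((r, c) for r in range(GRID) for c in range(GRID) if (r, c) not in occupied)
    return min(free, key=lambda rc: (rc[0] - r0) ** 2 + (rc[1] - c0) ** 2)


def rasterize(stations: Dict[str, Tuple[float, float]],
              avg_ridership: Dict[str, float],
              bounds: Bounds) -> List[List[float]]:
    """Build a GRID x GRID image whose cells hold each station's average ridership."""
    image = [[0.0] * GRID for _ in range(GRID)]
    occupied: Set[Cell] = set()
    for name, (x, y) in stations.items():
        cell = to_cell(x, y, bounds)
        if cell in occupied:  # two stations share a cell: move the later one
            cell = nearest_free(cell, occupied)
        occupied.add(cell)
        image[cell[0]][cell[1]] = avg_ridership[name]
    return image


# Hypothetical stations; A and B collide in the same cell.
stations = {"A": (10.5, 20.5), "B": (10.6, 20.4), "C": (59.9, 0.0)}
ridership = {"A": 5.0, "B": 7.0, "C": 3.0}
image = rasterize(stations, ridership, bounds=(0.0, 0.0, 60.0, 60.0))
```

Most cells of such an image stay empty, which illustrates why an image representation carries little of the network topology compared with a graph.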

To optimize the training of GCN-SBULSTM, the batch size is set to 32 and 64 for Task 1 and Task 2 on SZMetro, respectively, and to 8 and 64 for HZMetro and SHMetro. Adam is selected as the optimizer given its good performance in preliminary tests. The initial learning rate is set to 0.001 and the decay ratio is 0.1. Early stopping is applied during training to avoid overfitting.

TABLE II
RESULTS OF TASK 1 ON SZMETRO

| Model | MAE | RMSE | MAPE | Average time per epoch |
|---|---|---|---|---|
| LSTM | 8.49 | 15.02 | 30.31% | 2s |
| CNN | 8.69 | 15.97 | 31.08% | 1s |
| GCN | 8.65 | 15.83 | 30.55% | 1s |
| DMVST-Net | 8.34 | 14.85 | 29.80% | 2s |
| CNN-LSTM | 8.35 | 14.64 | 30.56% | 2s |
| SRCNs | 8.36 | 15.68 | 28.15% | 3s |
| SBULSTM | 8.27 | 15.13 | 27.72% | 2s |
| DCRNN | 8.29 | 14.84 | 28.32% | 23s |
| STGCN | 8.46 | 14.91 | 31.95% | 4s |
| Graph-WaveNet | 8.35 | 16.12 | 31.24% | 12s |
| PVCGN | 8.24 | 14.05 | 30.01% | 264s |
| GCN-SBULSTM | 7.96 | 14.41 | 27.94% | 3s |
| GCN-SBULSTM w/o OD | 8.02 | 14.60 | 27.72% | 3s |
| GCN-SBULSTM w/o dist | 8.04 | 14.48 | 28.44% | 3s |
| GCN-SBULSTM w/o K-hop | 8.06 | 14.56 | 27.86% | 3s |

TABLE III
RESULTS OF TASK 2 ON SZMETRO

| Model | MAE | RMSE | MAPE | Average time per epoch |
|---|---|---|---|---|
| LSTM | 9.00 | 16.39 | 31.45% | 2s |
| CNN | 9.74 | 18.80 | 33.33% | 2s |
| GCN | 9.20 | 17.22 | 30.77% | 1s |
| DMVST-Net | 8.89 | 16.22 | 31.43% | 3s |
| CNN-LSTM | 8.78 | 15.83 | 31.34% | 3s |
| SRCNs | 8.84 | 16.78 | 29.57% | 3s |
| SBULSTM | 8.58 | 16.32 | 28.84% | 3s |
| DCRNN | 8.56 | 15.47 | 28.23% | 45s |
| STGCN | 8.77 | 15.97 | 33.94% | 6s |
| Graph-WaveNet | 8.75 | 16.56 | 29.68% | 12s |
| PVCGN | 8.48 | 14.90 | 29.39% | 530s |
| GCN-SBULSTM | 8.36 | 15.62 | 28.38% | 4s |
| GCN-SBULSTM w/o OD | 8.42 | 15.87 | 28.47% | 4s |
| GCN-SBULSTM w/o dist | 8.44 | 15.86 | 28.87% | 4s |
| GCN-SBULSTM w/o K-hop | 8.46 | 15.99 | 28.55% | 4s |

This study evaluates the performance of each model using three common metrics, Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE), which are defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{Y}_i - Y_i\right| \tag{20}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{Y}_i - Y_i}{Y_i}\right| \tag{21}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2} \tag{22}$$

where $n$ is the number of samples, $\hat{Y}_i$ is the predicted ridership and $Y_i$ is the actual ridership. MAE is also adopted as the loss function in the training process. Because MAPE and RMSE are sensitive to small ground-truth values and large error values, respectively, we take MAE, which is more robust to outliers and reflects the actual error [50], as the main metric in the following discussion.
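The three metrics can be computed with a few lines of plain Python, shown here as a minimal sketch. One choice in it is an assumption rather than something stated in the text: `mape` skips samples whose ground truth is zero, because the ratio is undefined there.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error as a fraction; zero ground truth is skipped (assumption)."""
    ratios = [abs((p - t) / t) for t, p in zip(y_true, y_pred) if t != 0]
    if not ratios:
        raise ValueError("MAPE is undefined when every ground-truth value is zero")
    return sum(ratios) / len(ratios)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical ridership values for three samples.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]
scores = (mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

In this toy case the error of 2 on a true value of 10 contributes twice as much to MAPE as the error of 4 on a true value of 40, while RMSE is dominated by the error of 4, which illustrates the sensitivities described above.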

V. RESULT ANALYSIS

A. Task 1 and 2 on SZMetro

The performance of different models on Task 1 and Task 2 is summarized in Table II and Table III, respectively. Among all tested models, CNN and GCN obtain the worst accuracy, with MAE values of 8.69/9.74 and 8.65/9.20 for Task 1 and 2, which indicates the limited effectiveness of relying only on spatial dependency in ridership forecasting. Nevertheless, the better performance of GCN relative to CNN demonstrates its advantage in capturing realistic spatial dependencies. By integrating CNN and LSTM to capture both spatial and temporal dependencies, DMVST-Net, CNN-LSTM, and SRCNs achieve better and mutually similar performance, reducing the MAE to around 8.35 in Task 1 and 8.80 in Task 2. However, these models are easily affected by the uncertainty in the size of the input image for the CNN module, owing to the difficulty CNNs have in fully representing the topology of a metro network. Thanks to the advantage of graph learning, DCRNN and PVCGN improve the accuracy further, with MAE values lower than 8.30 in Task 1 and 8.60 in Task 2. Notably, even with a graph learning module, STGCN and Graph-WaveNet only reach a level similar to that of the CNN-based models, which may be explained by the interference between spatial and temporal dependencies caused by their sequential structure. Surprisingly, SBULSTM achieves performance competitive with DCRNN, with MAE values of 8.27 in Task 1 and 8.58 in Task 2. Given the limited accuracy of LSTM, the outstanding performance of SBULSTM demonstrates the effectiveness of the bidirectional LSTM architecture in ridership forecasting.

Fig. 6. Results of GCN-SBULSTM with different values of K on SZMetro

In comparison with the above-mentioned models, the proposed GCN-SBULSTM achieves the best prediction accuracy in terms of MAE, with the lowest values of 7.96 in Task 1 and 8.36 in Task 2. GCN-SBULSTM is also the only model that obtains an MAE lower than 8.00 in Task 1 and 8.40 in Task 2. The results of the ablated models further confirm the effectiveness of each graph used in GCN-SBULSTM. Compared with the full model, GCN-SBULSTM w/o OD obtains a lower accuracy, with MAE of 8.02 and 8.42, which shows the value of incorporating the dynamic spatio-temporal relationship in the GCN module. The accuracy decreases further for GCN-SBULSTM w/o dist, whose MAE is 8.04 and 8.44, indicating the positive effect of the travel distance graph. Among the ablated models, GCN-SBULSTM w/o K-hop has the worst performance in both tasks, with MAE values of 8.06 and 8.46, respectively. Even though accuracy decreases in these ablated models, they still outperform the other tested models in terms of MAE, validating the general effectiveness of the model design.

As shown in Table II and III, CNN and GCN are the most efficient of all tested models, while the proposed GCN-SBULSTM achieves competitive training efficiency in terms of average training time. Specifically, GCN-SBULSTM requires only 3s and 4s per epoch for Task 1 and 2, respectively, which is slightly higher than the basic architectures (LSTM, CNN and GCN) and some advanced models (SBULSTM, CNN-LSTM, and DMVST-Net). In contrast, PVCGN, the second-best model in terms of MAE in both tasks, is the least efficient model, taking 264s and 530s per epoch for Task 1 and 2. This is over 10 times longer than DCRNN and roughly 100 times longer than the proposed GCN-SBULSTM. In summary, GCN-SBULSTM is highly efficient given its accuracy relative to other advanced models, which benefits parameter tuning and its migration to different tasks.

To illustrate the influence of K on the accuracy, RMSE and MAE values for K ranging from 1 to 10 are plotted in Figure 6 for Task 1 and 2. RMSE and MAE generally start at a high value, gradually decrease to their minimum at K = 6 and K = 7 for Task 1 and 2, respectively, and then increase as K becomes larger. Note that the lines in Figure 6 are not perfectly smooth, which can be explained by the uncertainty introduced by random factors during training, such as parameter initialization and the dropout mechanism. The general tendency shown in Figure 6 suggests that: 1) in addition to adjacent stations, the spatial dependencies among indirectly connected stations also contribute, to some extent, to building an effective prediction model; 2) an overestimated K can even have negative effects.

B. Benchmark tests on HZMetro and SHMetro

The setup of GCN-SBULSTM is modified slightly in this section. Since no station or dynamic OD information is provided in the original paper for HZMetro and SHMetro, we use substitutes for the travel distance graph and the population flow graph. For the travel distance graph, we compute the number of hops between each pair of stations using the Physical graph (i.e., the adjacency graph) and input the result to Equation 4 to generate a "fake" distance weight matrix. For the population flow graph, we calculate a single overall population flow graph based on the Correlation graph in [22]. In addition, because station information is not provided for HZMetro and SHMetro, we cannot generate the traffic images required by CNN-based models. Therefore, CNN, SRCNs and DMVST-Net are not compared in the benchmark tests on HZMetro and SHMetro.
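The hop counts used as a stand-in for travel distance can be obtained with a breadth-first search over the adjacency graph, as in the sketch below. Only the hop computation is shown: the weighting of Equation 4 is defined elsewhere in the paper and is not reproduced, and the helper `k_hop_mask`, including its exclusion of self-pairs, is an illustrative assumption.

```python
from collections import deque
from typing import List


def hop_counts(adj: List[List[int]]) -> List[List[int]]:
    """All-pairs hop counts on an unweighted graph via BFS; -1 marks unreachable pairs."""
    n = len(adj)
    hops = [[-1] * n for _ in range(n)]
    for source in range(n):
        hops[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and hops[source][v] == -1:
                    hops[source][v] = hops[source][u] + 1
                    queue.append(v)
    return hops


def k_hop_mask(hops: List[List[int]], k: int) -> List[List[int]]:
    """1 where two distinct stations are at most k hops apart, else 0 (illustrative)."""
    return [[1 if 0 < h <= k else 0 for h in row] for row in hops]


# Hypothetical line of four stations: 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
hops = hop_counts(adjacency)
mask_k2 = k_hop_mask(hops, k=2)
```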

The performance of different models on HZMetro and SHMetro is summarized in Table IV and V, respectively. The general tendency is similar to the previous experimental results on SZMetro. GCN is among the weakest models on these two datasets as it only captures spatial dependency for prediction. Surprisingly, STGCN and Graph-WaveNet become even worse than LSTM, especially for the prediction at 60 min, further indicating the poor performance of the sequential structure on ridership data with large time intervals, in which temporal regularity is more significant and dominant than in data with short time intervals. DCRNN obtains satisfactory accuracy, with MAE of 23.24 for HZMetro and 23.34 for SHMetro at the first time interval. However, its accuracy decreases markedly as the prediction horizon increases. PVCGN achieves competitive accuracy, especially on HZMetro, where it always has the lowest RMSE, indicating its advantage in reducing outlying predictions.

TABLE IV
RESULTS FOR HZMETRO

| Time | Metric | LSTM | GCN | SBULSTM | DCRNN | STGCN | Graph-WaveNet | PVCGN | GCN-SBULSTM (K=6) |
|---|---|---|---|---|---|---|---|---|---|
| 15min | MAE | 23.43 | 23.94 | 24.31 | 23.24 | 23.86 | 23.50 | 22.20 | 22.22 |
| 15min | RMSE | 40.13 | 42.89 | 42.67 | 41.43 | 45.03 | 41.88 | 38.12 | 39.83 |
| 15min | MAPE | 14.41% | 14.74% | 14.51% | 13.65% | 12.48% | 13.77% | 13.15% | 13.16% |
| 30min | MAE | 24.38 | 25.99 | 24.75 | 25.78 | 26.07 | 24.75 | 23.13 | 22.84 |
| 30min | RMSE | 42.33 | 47.06 | 43.73 | 43.23 | 49.16 | 43.70 | 40.00 | 41.08 |
| 30min | MAPE | 15.54% | 16.87% | 15.04% | 15.32% | 13.72% | 15.68% | 13.87% | 13.76% |
| 45min | MAE | 25.33 | 29.23 | 25.45 | 26.23 | 28.52 | 25.87 | 23.95 | 23.53 |
| 45min | RMSE | 44.50 | 53.70 | 45.49 | 46.97 | 52.58 | 46.50 | 41.21 | 42.45 |
| 45min | MAPE | 17.18% | 20.06% | 15.72% | 16.01% | 15.18% | 16.77% | 14.89% | 14.61% |
| 60min | MAE | 26.74 | 33.44 | 26.46 | 27.15 | 31.47 | 27.85 | 24.55 | 24.58 |
| 60min | RMSE | 47.90 | 62.39 | 47.07 | 48.56 | 59.74 | 48.69 | 42.26 | 44.48 |
| 60min | MAPE | 19.88% | 27.62% | 17.42% | 18.64% | 16.95% | 20.45% | 16.35% | 15.73% |

TABLE V
RESULTS FOR SHMETRO

| Time | Metric | LSTM | GCN | SBULSTM | DCRNN | STGCN | Graph-WaveNet | PVCGN | GCN-SBULSTM (K=8) |
|---|---|---|---|---|---|---|---|---|---|
| 15min | MAE | 23.50 | 24.21 | 23.16 | 23.34 | 23.84 | 23.75 | 22.85 | 22.75 |
| 15min | RMSE | 47.08 | 49.20 | 45.31 | 47.24 | 47.18 | 47.73 | 45.47 | 46.09 |
| 15min | MAPE | 20.23% | 21.05% | 17.40% | 18.02% | 18.71% | 20.23% | 16.95% | 16.50% |
| 30min | MAE | 24.50 | 25.75 | 24.17 | 25.33 | 26.99 | 27.12 | 24.16 | 23.77 |
| 30min | RMSE | 49.63 | 52.34 | 48.39 | 51.31 | 57.40 | 54.15 | 50.18 | 49.04 |
| 30min | MAPE | 22.64% | 24.26% | 18.52% | 19.12% | 19.41% | 21.42% | 18.83% | 17.62% |
| 45min | MAE | 25.59 | 28.64 | 25.38 | 27.65 | 30.81 | 29.23 | 25.45 | 25.02 |
| 45min | RMSE | 53.35 | 57.48 | 53.67 | 57.21 | 67.61 | 60.10 | 54.84 | 52.89 |
| 45min | MAPE | 24.39% | 29.36% | 20.07% | 20.42% | 20.46% | 22.64% | 18.83% | 18.95% |
| 60min | MAE | 26.87 | 31.60 | 26.41 | 29.01 | 33.82 | 31.56 | 26.37 | 25.87 |
| 60min | RMSE | 56.53 | 63.24 | 59.27 | 63.32 | 77.00 | 68.10 | 58.49 | 55.41 |
| 60min | MAPE | 26.16% | 34.25% | 21.45% | 21.52% | 23.69% | 24.92% | 19.67% | 20.12% |

Compared with the other models, the proposed GCN-SBULSTM achieves the lowest MAE at 30 min and 45 min on HZMetro (K=6) and the lowest MAE at every prediction horizon on SHMetro (K=8). The RMSE values obtained by GCN-SBULSTM on SHMetro are also lower than those of PVCGN in all cases except the first interval. Given the lack of station information and of the dynamic changes of population flow in this experiment, we believe the performance of GCN-SBULSTM on HZMetro and SHMetro can be further improved if the necessary data become available.

C. Results analysis

1) Connotation of optimal K: According to the previous experiments, a K-hop matrix preserves more comprehensive spatial dependencies than the traditional adjacency matrix and thus improves the performance of the GCN module. However, an overestimated K is prone to having negative effects on prediction accuracy. To explain this observation and investigate the connotation of the optimal K, statistical analyses are performed on each experimental dataset.

As shown in Figure 7 (a) to (c), the number of station pairs rises rapidly as K increases, while the average correlation curve (Figure 7 (d) to (f)), computed as the average absolute Pearson correlation coefficient among station pairs, declines sharply within the first few steps and finally reaches a stable state. This indicates that the correlation of ridership, either positive or negative, becomes less significant between high-order neighbouring stations.

Fig. 7. Influence of K on station pairs and their average correlation.

Given the above observation, an appropriate K is expected to balance the number of station pairs against the significance of the correlation between them. To that end, we compute the elbow point of each curve in Figure 7 (d) to (f). Mathematically, the elbow point is defined as the point with maximum curvature on a curve [51]. In our case, the elbow point refers to a cutoff value of K beyond which adding higher-order neighbouring stations does not result in a better capture of spatial correlation. The elbow point is found to be 7, 9 and 6 for SZMetro, SHMetro and HZMetro, respectively, which is very close to the optimal values in the previous experiments, i.e., 6 and 7 for the two tasks on SZMetro, 8 for SHMetro, and 6 for HZMetro. This observation largely explains the rationale behind the optimal K in each prediction task. In addition, the elbow point of the average correlation curve can serve as an important reference for selecting K when training GCN-SBULSTM.

Fig. 8. Violin plots of the relationship between event ϕ and average ridership.
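The maximum-curvature definition of the elbow point used in this subsection can be approximated on a sampled curve with finite differences. The sketch below is one possible discretization and not necessarily the procedure of [51]: rescaling both axes to [0, 1], the assumption of evenly spaced K values, and the toy correlation values are all choices made for illustration.

```python
from typing import Sequence


def elbow_point(ks: Sequence[float], values: Sequence[float]) -> float:
    """Return the k with the largest discrete curvature among interior points."""
    if len(ks) != len(values) or len(ks) < 3:
        raise ValueError("need at least three (k, value) pairs")
    # Rescale both axes to [0, 1] so the result does not depend on units (assumption).
    k_span = (ks[-1] - ks[0]) or 1.0
    v_min, v_max = min(values), max(values)
    v_span = (v_max - v_min) or 1.0
    x = [(k - ks[0]) / k_span for k in ks]
    y = [(v - v_min) / v_span for v in values]
    best_k, best_curvature = ks[1], -1.0
    for i in range(1, len(x) - 1):
        h = x[i + 1] - x[i]  # evenly spaced k values are assumed
        d1 = (y[i + 1] - y[i - 1]) / (2 * h)             # central first derivative
        d2 = (y[i + 1] - 2 * y[i] + y[i - 1]) / (h * h)  # second derivative
        curvature = abs(d2) / (1 + d1 * d1) ** 1.5
        if curvature > best_curvature:
            best_k, best_curvature = ks[i], curvature
    return best_k


# Hypothetical average-correlation curve: steep decline, then a plateau from K = 4.
k_values = [1, 2, 3, 4, 5, 6, 7]
avg_corr = [1.0, 0.7, 0.4, 0.1, 0.1, 0.1, 0.1]
elbow = elbow_point(k_values, avg_corr)
```

For this toy curve the only bend is where the decline meets the plateau, so the function returns K = 4; on real, noisy curves some smoothing before differencing would likely be needed.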

2) Visualization analysis: To further illustrate the advantages of GCN-SBULSTM, a visualization analysis is conducted on HZMetro and SHMetro, taking PVCGN as the control method. Since ridership volume has been shown to be a critical factor affecting prediction accuracy [22], we focus on the performance of GCN-SBULSTM at different ridership volumes. We first define an event ϕ for each station, where ϕ = 0 when PVCGN obtains a lower MAE than GCN-SBULSTM and ϕ = 1 otherwise. A violin plot is used to depict the distribution of ϕ as well as its relationship to the average ridership. As shown in Figure 8, the violins with ϕ = 1 are much 'fatter' than those with ϕ = 0, which means that GCN-SBULSTM achieves a lower MAE than PVCGN for most stations of SHMetro and HZMetro. Moreover, the dashed lines inside the violins with ϕ = 1, which mark the quartiles of the distribution, are generally lower than those with ϕ = 0, indicating that GCN-SBULSTM is more suitable for low-ridership stations.
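The event ϕ and the grouping behind the violin plots can be expressed compactly, as in the sketch below. The station identifiers and all MAE and ridership values are made up for illustration, and `statistics.quantiles` with its default method is only one way to obtain quartiles; the text does not say how the plotted quartiles were computed.

```python
from statistics import quantiles
from typing import Dict, List


def event_phi(mae_pvcgn: Dict[str, float], mae_ours: Dict[str, float]) -> Dict[str, int]:
    """phi = 0 when PVCGN has the lower MAE at a station, otherwise phi = 1."""
    return {s: 0 if mae_pvcgn[s] < mae_ours[s] else 1 for s in mae_ours}


def ridership_by_event(phi: Dict[str, int],
                       avg_ridership: Dict[str, float]) -> Dict[int, List[float]]:
    """Group each station's average ridership by its phi value."""
    groups: Dict[int, List[float]] = {0: [], 1: []}
    for station, value in phi.items():
        groups[value].append(avg_ridership[station])
    return groups


# Hypothetical per-station MAE values and average ridership.
mae_pvcgn = {"s1": 5.0, "s2": 9.0, "s3": 4.0, "s4": 7.0, "s5": 6.5}
mae_ours = {"s1": 4.5, "s2": 9.5, "s3": 3.8, "s4": 7.0, "s5": 6.0}
avg_ridership = {"s1": 120.0, "s2": 900.0, "s3": 80.0, "s4": 300.0, "s5": 150.0}

phi = event_phi(mae_pvcgn, mae_ours)
groups = ridership_by_event(phi, avg_ridership)
quartiles_phi1 = quantiles(groups[1], n=4)  # three cut points for the phi = 1 group
```

Note that, by the definition above, a tie in MAE is counted as ϕ = 1.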

Three instances are further selected from SHMetro to illustrate the performance of GCN-SBULSTM at different ridership volumes. As shown in Figure 9 (a), GCN-SBULSTM performs well in capturing both the overall trend and the narrow fluctuations of a low-ridership station, while PVCGN tends to overestimate the ridership in many cases. For the high-ridership station in Figure 9 (b), GCN-SBULSTM produces more accurate predictions in most cases but does not fully capture the marked drastic fluctuation. In comparison, PVCGN appears more sensitive to this kind of fluctuation, but overestimation can still be easily observed. Furthermore, this sensitivity of PVCGN may also misfire and introduce uncertainty, such as the significant bias and the missed fluctuation marked in Figure 9 (c).

In summary, PVCGN seems to provide an aggressive prediction, which sometimes overreacts to ridership fluctuations and is prone to producing overestimated results. In contrast, the proposed GCN-SBULSTM achieves a higher prediction accuracy than PVCGN in most instances and handles fluctuations better, especially for low-ridership stations.

VI. CONCLUSION AND DISCUSSION

Metro ridership forecasting is a fundamental issue in modern public transportation management. Focusing on improving the accuracy of metro ridership forecasting, this study proposes GCN-SBULSTM, a novel deep learning network with a parallel structure that concatenates GCN and SBULSTM modules. In the GCN module, a novel K-hop weight matrix, which integrates adjacency, travel distance, and population flow, is introduced to capture comprehensive spatial correlation within a metro network. GCN-SBULSTM inherits the merits of both GCN and SBULSTM, and the parallel structure helps preserve most of the independence of spatial and temporal information, thus reducing the uncertainty caused by their interference.

According to the results on three real-world datasets, the proposed GCN-SBULSTM outperforms the state-of-the-art models in terms of accuracy and training efficiency. The slightly poorer performance of the ablated models verifies the effectiveness of using both physical and virtual graphs in improving the overall accuracy. Additionally, in comparison with CNN-based models, the higher accuracy obtained by GCN-based models indicates the superiority of treating a traffic network as a graph rather than a simple 2D image in network-related traffic forecasting tasks. Last but not least, the relatively lower accuracy of STGCN and Graph-WaveNet supports the hypothesis that the parallel structure can preserve, to the greatest extent, the integrity of both spatial and temporal dependencies for better prediction.

Improvements can be made in future work. One issue is to incorporate more factors that may correlate with ridership patterns, such as weather conditions and holiday events, to enhance the prediction model. Apart from that, the proposed model is only applied to inbound ridership prediction. As passengers' outbound preference depends heavily on time schedules and on the functional zone in which a metro station is located, the number of previous trips in time-series form alone is not sufficient to forecast outbound ridership accurately, especially for a short time interval. However, given accurate time schedules and other auxiliary information that supports the diagnosis of passengers' preferences, the fundamental idea of GCN-SBULSTM is expected to apply to outbound ridership prediction as well, with further improvement.

Fig. 9. Snapshot of three prediction instances, panels (a) to (c), each plotting ridership against time interval. Station #277 and #12 are respectively of the lowest and highest average ridership in SHMetro; Station #7 is a more general instance with moderate average ridership.

REFERENCES

[1] C. Ding, D. Wang, X. Ma, and H. L., “Predicting short-term subway ridership and prioritizing its influential factors using gradient boosting decision trees,” Sustainability, 2016. [Online]. Available: https://www.mdpi.com/2071-1050/8/11/1100

[2] S. Derrible and C. Kennedy, “Evaluating, Comparing, and Improving

Metro Networks: Application to Plans for Toronto, Canada,”

Transportation Research Record: Journal of the Transportation

Research Board, vol. 2146, no. 1, pp. 43–51, jan 2010. [Online].

Available: http://journals.sagepub.com/doi/10.3141/2146-06

[3] Y. Lv, Y. Duan, W. Kang, and Z. L., “Traffic flow prediction with big data: a deep learning approach,” IEEE Transactions on . . . , 2014. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/6894591/

[4] J. Zhang, Y. Zheng, D. Qi, R. Li, and X. Yi, “Dnn-based

prediction model for spatio-temporal data,” in Proceedings of the

24th ACM SIGSPATIAL International Conference on Advances in

Geographic Information Systems, ser. SIGSPACIAL ’16. New

York, NY, USA: ACM, 2016, pp. 92:1–92:4. [Online]. Available:

http://doi.acm.org/10.1145/2996913.2997016

[5] X. Ma, J. Zhang, B. Du, C. Ding, and L. Sun, “Parallel architecture

of convolutional bi-directional lstm neural networks for network-wide

metro ridership prediction,” IEEE Transactions on Intelligent Trans-

portation Systems, vol. 20, no. 6, pp. 2278–2288, June 2019.

[6] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature,

vol. 521, no. 7553, pp. 436–444, may 2015. [Online]. Available:

http://www.nature.com/articles/nature14539

[7] S. Shekhar and B. M. Williams, “Adaptive seasonal time series

models for forecasting short-term trafﬁc ﬂow,” Transportation Research

Record, vol. 2024, no. 1, pp. 116–125, 2007. [Online]. Available:

https://doi.org/10.3141/2024-14

[8] X. Li, G. Pan, Z. Wu, G. Qi, S. Li, D. Zhang, W. Zhang, and Z. Wang,

“Prediction of urban human mobility using large-scale taxi traces and its

applications,” Frontiers of Computer Science, vol. 6, no. 1, pp. 111–121,

Feb 2012.

[9] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and

L. Damas, “Predicting taxi-passenger demand using streaming data,”

IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3,

pp. 1393–1402, 2013.

[10] R. Yu, Y. Li, C. Shahabi, U. Demiryurek, and Y. Liu, Deep Learning: A Generic Approach for Extreme Condition Traffic Forecasting, pp. 777–785. [Online]. Available: https://epubs.siam.org/doi/abs/10.1137/1.9781611974973.87

[11] Z. Cui, R. Ke, and Y. Wang, “Deep bidirectional and unidirectional

LSTM recurrent neural network for network-wide trafﬁc speed

prediction,” CoRR, vol. abs/1801.02143, 2018. [Online]. Available:

http://arxiv.org/abs/1801.02143

[12] R. Fu, Z. Zhang, and L. Li, “Using lstm and gru neural network

methods for trafﬁc ﬂow prediction,” in 2016 31st Youth Academic Annual

Conference of Chinese Association of Automation (YAC). IEEE, 2016,

pp. 324–328.

[13] X. Cheng, R. Zhang, J. Zhou, and W. Xu, “Deeptransport: Learning

spatial-temporal dependency for trafﬁc condition forecasting,” in 2018

International Joint Conference on Neural Networks (IJCNN). IEEE,

2018, pp. 1–8.

[14] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning

trafﬁc as images: A deep convolutional neural network for large-scale

transportation network speed prediction,” Sensors, vol. 17, no. 4, 2017.

[Online]. Available: https://www.mdpi.com/1424-8220/17/4/818

[15] J. Zhang, Y. Zheng, and D. Qi, “Deep spatio-temporal residual

networks for citywide crowd ﬂows prediction,” in Proceedings of

the Thirty-First AAAI Conference on Artiﬁcial Intelligence, ser.

AAAI’17. AAAI Press, 2017, pp. 1655–1661. [Online]. Available:

http://dl.acm.org/citation.cfm?id=3298239.3298479

[16] Z. Xie, W. Lv, S. Huang, Z. Lu, B. Du, and R. Huang, “Sequential graph

neural network for urban road trafﬁc speed prediction,” IEEE Access,

vol. 8, pp. 63 349–63 358, 2019.

[17] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehen-

sive survey on graph neural networks,” arXiv preprint arXiv:1901.00596,

2019.

[18] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion Convolutional Recurrent

Neural Network: Data-Driven Trafﬁc Forecasting,” arXiv:1707.01926

[cs, stat], Feb. 2018, arXiv: 1707.01926. [Online]. Available:

http://arxiv.org/abs/1707.01926

[19] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li,

“T-gcn: A temporal graph convolutional network for trafﬁc prediction,”

IEEE Transactions on Intelligent Transportation Systems, 2019.

[20] G. Jin, Y. Cui, L. Zeng, H. Tang, Y. Feng, and J. Huang, “Urban ride-

hailing demand prediction with multiple spatio-temporal information fu-

sion network,” Transportation Research Part C: Emerging Technologies,

vol. 117, p. 102665, 2020.

[21] B. Lu, X. Gan, H. Jin, L. Fu, and H. Zhang, “Spatiotemporal adaptive

gated graph convolution network for urban trafﬁc ﬂow forecasting,” in

Proceedings of the 29th ACM International Conference on Information

& Knowledge Management, 2020, pp. 1025–1034.

[22] L. Liu, J. Chen, H. Wu, J. Zhen, G. Li, and L. Lin, “Physical-Virtual

Collaboration Modeling for Intra-and Inter-Station Metro Ridership

Prediction,” arXiv:2001.04889 [cs], Jun. 2020, arXiv: 2001.04889.

[Online]. Available: http://arxiv.org/abs/2001.04889

[23] Z. Cui, K. Henrickson, R. Ke, and Y. Wang, “Trafﬁc Graph

Convolutional Recurrent Neural Network: A Deep Learning Framework

for Network-Scale Trafﬁc Learning and Forecasting,” IEEE Transactions

on Intelligent Transportation Systems, pp. 1–12, 2019. [Online].

Available: https://ieeexplore.ieee.org/document/8917706/

[24] G. Jin, Y. Cui, L. Zeng, H. Tang, Y. Feng, and J. Huang, “Urban ride-

hailing demand prediction with multiple spatio-temporal information fu-

sion network,” Transportation Research Part C: Emerging Technologies,

vol. 117, p. 102665, 2020.

[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classiﬁcation

with deep convolutional neural networks,” in Advances in Neural Infor-

mation Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and

K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105.

[26] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face

recognition: a convolutional neural-network approach,” IEEE Trans.

Neural Networks, vol. 8, no. 1, pp. 98–113, 1997. [Online]. Available:

https://doi.org/10.1109/72.554195

[27] H. Yu, Z. Wu, S. Wang, Y. Wang, and X. Ma, “Spatiotemporal

recurrent convolutional networks for trafﬁc prediction in transportation

networks,” Sensors, vol. 17, no. 7, 2017. [Online]. Available:

https://www.mdpi.com/1424-8220/17/7/1501


[28] H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, J. Ye, and Z. Li,

“Deep multi-view spatial-temporal network for taxi demand prediction,”

in AAAI, 2018.

[29] Z. Wu, S. Pan, G. Long, J. Jiang, and C. Zhang, “Graph WaveNet

for Deep Spatial-Temporal Graph Modeling,” arXiv:1906.00121

[cs, stat], May 2019, arXiv: 1906.00121. [Online]. Available:

http://arxiv.org/abs/1906.00121

[30] B. Yu, H. Yin, and Z. Zhu, “Spatio-Temporal Graph Convolutional

Networks: A Deep Learning Framework for Trafﬁc Forecasting,”

Proceedings of the Twenty-Seventh International Joint Conference on

Artiﬁcial Intelligence, pp. 3634–3640, Jul. 2018, arXiv: 1709.04875.

[Online]. Available: http://arxiv.org/abs/1709.04875

[31] B. Du, X. Hu, L. Sun, J. Liu, Y. Qiao, and W. Lv, “Traffic demand

prediction based on dynamic transition convolutional neural network,”

IEEE Transactions on Intelligent Transportation Systems, 2020.

[32] T. N. Kipf and M. Welling, “Semi-supervised classification with graph

convolutional networks,” CoRR, vol. abs/1609.02907, 2016. [Online].

Available: http://arxiv.org/abs/1609.02907

[33] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and

locally connected networks on graphs,” in 2nd International Conference

on Learning Representations, ICLR 2014, Banff, AB, Canada, April

14-16, 2014, Conference Track Proceedings, 2014. [Online]. Available:

http://arxiv.org/abs/1312.6203

[34] M. Henaff, J. Bruna, and Y. LeCun, “Deep convolutional networks

on graph-structured data,” CoRR, vol. abs/1506.05163, 2015. [Online].

Available: http://arxiv.org/abs/1506.05163

[35] W. R. Tobler, “A computer movie simulating urban growth in the Detroit

region,” Economic Geography, vol. 46, no. sup1, pp. 234–240, 1970.

[36] X. Ma, J. Zhang, C. Ding, and Y. Wang, “A geographically and temporally

weighted regression model to explore the spatiotemporal influence

of built environment on transit ridership,” Computers, Environment and

Urban Systems, vol. 70, pp. 113–124, 2018.

[37] H. Yang, X. Lu, C. Cherry, X. Liu, and Y. Li, “Spatial variations

in active mode trip volume at intersections: a local analysis utilizing

geographically weighted regression,” Journal of Transport Geography,

vol. 64, pp. 184–194, 2017.

[38] G. Kusano, Y. Hiraoka, and K. Fukumizu, “Persistence weighted gaus-

sian kernel for topological data analysis,” in International Conference

on Machine Learning, 2016, pp. 2004–2013.

[39] S. Raveau, J. C. Muñoz, and L. De Grange, “A topological route choice

model for metro,” Transportation Research Part A: Policy and Practice,

vol. 45, no. 2, pp. 138–147, 2011.

[40] D. An, X. Tong, K. Liu, and E. H. Chan, “Understanding the impact of

built environment on metro ridership using open source in Shanghai,”

Cities, vol. 93, pp. 177–187, 2019.

[41] Y. Gong, Y. Lin, and Z. Duan, “Exploring the spatiotemporal structure

of dynamic urban space using metro smart card records,” Computers,

Environment and Urban Systems, vol. 64, pp. 169–183, Jul. 2017.

[Online]. Available: https://www.sciencedirect.com/science/article/pii/S0198971516301089

[42] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and

J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Transactions

on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232,

Oct. 2017.

[43] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural

Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[44] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,”

IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681,

Nov. 1997.

[45] A. Graves, S. Fernández, and J. Schmidhuber, “Bidirectional LSTM networks

for improved phoneme classification and recognition,” in Artificial

Neural Networks: Formal Models and Their Applications – ICANN 2005,

W. Duch, J. Kacprzyk, E. Oja, and S. Zadrożny, Eds. Berlin, Heidelberg:

Springer Berlin Heidelberg, 2005, pp. 799–804.

[46] A. Graves and J. Schmidhuber, “Framewise phoneme classification

with bidirectional LSTM and other neural network architectures,” Neural

Networks, vol. 18, no. 5-6, pp. 602–610, 2005.

[47] A. Graves, A. Mohamed, and G. Hinton, “Speech recognition with deep

recurrent neural networks,” in 2013 IEEE International Conference on

Acoustics, Speech and Signal Processing, May 2013, pp. 6645–6649.

[48] A. Graves, N. Jaitly, and A. Mohamed, “Hybrid speech recognition with

deep bidirectional LSTM,” in 2013 IEEE Workshop on Automatic Speech

Recognition and Understanding, Dec. 2013, pp. 273–278.

[49] A. Ray, S. Rajeswar, and S. Chaudhury, “Text recognition using deep

BLSTM networks,” in 2015 Eighth International Conference on Advances

in Pattern Recognition (ICAPR), Jan. 2015, pp. 1–6.

[50] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean

absolute error (MAE)?–arguments against avoiding RMSE in the literature,”

Geoscientific Model Development, vol. 7, no. 3, pp. 1247–1250, 2014.

[51] Q. Zhao, V. Hautamäki, and P. Fränti, “Knee point detection in BIC

for detecting the number of clusters,” in International Conference on

Advanced Concepts for Intelligent Vision Systems. Springer, 2008, pp. 664–673.

Pengfei Chen received the B.S., M.S., and Ph.D. degrees from

Wuhan University in 2012, 2015 and 2019, respectively. He also received a

joint Ph.D. degree from the Hong Kong Polytechnic University in 2020. He

is currently an Assistant Professor with the School of Geospatial Engineering

and Science, Sun Yat-Sen University, Guangdong, China. His research interests

include human mobility modeling, geospatial artificial intelligence and

spatial data uncertainty.

Xuandi Fu received the B.S. degree from the Hong Kong

Polytechnic University in 2017. She is currently pursuing the M.S. degree with

the Department of Electrical and Computer Engineering at Carnegie Mellon

University, USA. Her research interests include natural language processing,

graph convolutional neural networks, human mobility modeling and spatial

data analytics.

Xue Wang received the B.S. and M.S. degrees from Peking

University in 2012 and 2015, respectively, and the Ph.D. degree from the

Chinese University of Hong Kong in 2019. She currently works as an

Assistant Professor with the School of Geospatial Engineering and Science,

Sun Yat-Sen University, Guangdong, China. Her research interests include

urban informatics and change detection.

Dynamic spatial‐temporal network for traffic forecasting based on joint latent space representation

Article

Full-text available

May 2024
IET INTELL TRANSP SY

In the era of data‐driven transportation development, traffic forecasting is crucial. Established studies either ignore the inherent spatial structure of the traffic network or ignore the global spatial correlation and may not capture the spatial relationships adequately. In this work, a Dynamic Spatial‐Temporal Network (DSTN) based on Joint Latent Space Representation (JLSR) is proposed for traffic forecasting. Specifically, in the spatial dimension, a JLSR network is developed by integrating graph convolution and spatial attention operations to model complex spatial dependencies. Since it can adaptively fuse the representation information of local topological space and global dynamic space, a more comprehensive spatial dependency can be captured. In the temporal dimension, a Stacked Bidirectional Unidirectional Gated Recurrent Unit (SBUGRU) network is developed, which captures long‐term temporal dependencies through both forward and backward computations and superimposed recurrent layers. On these bases, DSTN is developed in an encoder‐decoder framework and periodicity is flexibly modeled by embedding branches. The performance of DSTN is validated on two types of real‐world traffic flow datasets, and it improves over baselines.

An Overview Based on the Overall Architecture of Traffic Forecasting

Article

Full-text available

Mar 2024

With the exponential increase in the urban population, urban transportation systems are confronted with numerous challenges. Traffic congestion is common, traffic accidents happen frequently, and traffic environments are deteriorating. To alleviate these issues and improve the efficiency of urban transportation, accurate traffic forecasting is crucial. In this study, we aim to provide a comprehensive overview of the overall architecture of traffic forecasting, covering aspects such as traffic data analysis, traffic data modeling, and traffic forecasting applications. We begin by introducing existing traffic forecasting surveys and preliminaries. Next, we delve into traffic data analysis from traffic data collection, traffic data formats, and traffic data characteristics. Additionally, we summarize traffic data modeling from spatial representation, temporal representation, and spatio-temporal representation. Furthermore, we discuss the application of traffic forecasting, including traffic flow forecasting, traffic speed forecasting, traffic demand forecasting, and other hybrid traffic forecasting. To support future research in this field, we also provide information on open datasets, source resources, challenges, and potential research directions. As far as we know, this paper represents the first comprehensive survey that focuses specifically on the overall architecture of traffic forecasting.

MPSTAN: Metapopulation-Based Spatio–Temporal Attention Network for Epidemic Forecasting

Article

Full-text available

Mar 2024
Entropy

Accurate epidemic forecasting plays a vital role for governments to develop effective prevention measures for suppressing epidemics. Most of the present spatio–temporal models cannot provide a general framework for stable and accurate forecasting of epidemics with diverse evolutionary trends. Incorporating epidemiological domain knowledge ranging from single-patch to multi-patch into neural networks is expected to improve forecasting accuracy. However, relying solely on single-patch knowledge neglects inter-patch interactions, while constructing multi-patch knowledge is challenging without population mobility data. To address the aforementioned problems, we propose a novel hybrid model called metapopulation-based spatio–temporal attention network (MPSTAN). This model aims to improve the accuracy of epidemic forecasting by incorporating multi-patch epidemiological knowledge into a spatio–temporal model and adaptively defining inter-patch interactions. Moreover, we incorporate inter-patch epidemiological knowledge into both model construction and the loss function to help the model learn epidemic transmission dynamics. Extensive experiments conducted on two representative datasets with different epidemiological evolution trends demonstrate that our proposed model outperforms the baselines and provides more accurate and stable short- and long-term forecasting. We confirm the effectiveness of domain knowledge in the learning model and investigate the impact of different ways of integrating domain knowledge on forecasting. We observe that using domain knowledge in both model construction and the loss function leads to more efficient forecasting, and selecting appropriate domain knowledge can improve accuracy further.

Are Graphs and GCNs necessary for short-term metro ridership forecasting?

Article

Jun 2024
EXPERT SYST APPL

Real-time prediction of transit origin–destination flows during underground incidents

Article

Jun 2024

Backbone-based Dynamic Spatio-Temporal Graph Neural Network for epidemic forecasting

Article

May 2024
KNOWL-BASED SYST

Multi-Range Spatial-Temporal Attention Network for Traffic Flow Forecasting

Conference Paper

Mar 2024

Dynamic Multi-Scale Spatio-Temporal Graph ODE for Metro Ridership Prediction

Conference Paper

Mar 2024

Parallel framework of a multi-graph convolutional network and gated recurrent unit for spatial–temporal metro passenger flow prediction

Article

Apr 2024
EXPERT SYST APPL

Convolutional bidirectional GRU for dynamic functional connectivity classification in brain diseases diagnosis

Article

Mar 2024
KNOWL-BASED SYST

T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction

Article

Full-text available

Aug 2019

Accurate and real-time traffic forecasting plays an important role in the intelligent traffic system and is of great significance for urban traffic planning, traffic management, and traffic control. However, traffic forecasting has always been considered an "open" scientific issue, owing to the constraints of urban road network topological structure and the law of dynamic change with time. To capture the spatial and temporal dependences simultaneously, we propose a novel neural network-based traffic forecasting method, the temporal graph convolutional network (T-GCN) model, which is combined with the graph convolutional network (GCN) and the gated recurrent unit (GRU). Specifically, the GCN is used to learn complex topological structures for capturing spatial dependence and the gated recurrent unit is used to learn dynamic changes of traffic data for capturing temporal dependence. Then, the T-GCN model is employed to traffic forecasting based on the urban road network. Experiments demonstrate that our T-GCN model can obtain the spatio-temporal correlation from traffic data and the predictions outperform state-of-art baselines on real-world traffic datasets. Our tensorflow implementation of the T-GCN is available at https://github.com/lehaifeng/T-GCN.

Graph WaveNet for Deep Spatial-Temporal Graph Modeling

Conference Paper

Full-text available

Aug 2019

Spatial-temporal graph modeling is an important task to analyze the spatial relations and temporal trends of components in a system. Existing approaches mostly capture the spatial dependency on a fixed graph structure, assuming that the underlying relation between entities is pre-determined. However, the explicit graph structure (relation) does not necessarily reflect the true dependency and genuine relation may be missing due to the incomplete connections in the data. Furthermore, existing methods are ineffective at capturing the temporal trends as the RNNs or CNNs employed in these methods cannot capture long-range temporal sequences. To overcome these limitations, we propose in this paper a novel graph neural network architecture, Graph WaveNet, for spatial-temporal graph modeling. By developing a novel adaptive dependency matrix and learning it through node embedding, our model can precisely capture the hidden spatial dependency in the data. With a stacked dilated 1D convolution component whose receptive field grows exponentially as the number of layers increases, Graph WaveNet is able to handle very long sequences. These two components are integrated seamlessly in a unified framework and the whole framework is learned in an end-to-end manner. Experimental results on two public traffic network datasets, METR-LA and PEMS-BAY, demonstrate the superior performance of our algorithm.

Sequential Graph Neural Network for Urban Road Traffic Speed Prediction

Article

Full-text available

May 2019

Accurate speed predictions for urban roads are highly important for traffic monitoring and route planning, and also help relieve the pressure of traffic congestion. Many existing studies on traffic speed prediction are based on convolutional neural networks, and these have primarily focused on capturing the spatial proximity among different road segments. However, the real cause of the spread of traffic congestion is the connectivity of these road segments, rather than their spatial proximity. This makes it very challenging to improve the prediction accuracy. Using graph neural networks (GNNs), the connectivity of these road segments can be modeled as a graph in which the properties of road segments and the connections between them are embedded as the properties of the nodes and edges, respectively. This paper describes a novel approach that combines the advantages of sequence-to-sequence (Seq2Seq) models and GNNs. Specifically, the evolution of traffic conditions on road networks is modeled as a sequence of graphs. Thus, the proposed SeqGNN model represents both the inputs and outputs as graph sequences. Finally, extensive experiments using real-world datasets demonstrate the effectiveness of our approach and its advantages over state-of-the-art methods.

Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition

Chapter

Sep 2005

In this paper, we carry out two experiments on the TIMIT speech corpus with bidirectional and unidirectional Long Short Term Memory (LSTM) networks. In the first experiment (framewise phoneme classification) we find that bidirectional LSTM outperforms both unidirectional LSTM and conventional Recurrent Neural Networks (RNNs). In the second (phoneme recognition) we find that a hybrid BLSTM-HMM system improves on an equivalent traditional HMM system, as well as unidirectional LSTM-HMM.

Physical-Virtual Collaboration Modeling for Intra- and Inter-Station Metro Ridership Prediction

Article

Nov 2020

Due to the widespread applications in real-world scenarios, metro ridership prediction is a crucial but challenging task in intelligent transportation systems. However, conventional methods either ignore the topological information of metro systems or directly learn on physical topology, and cannot fully explore the patterns of ridership evolution. To address this problem, we model a metro system as graphs with various topologies and propose a unified Physical-Virtual Collaboration Graph Network (PVCGN), which can effectively learn the complex ridership patterns from the tailor-designed graphs. Specifically, a physical graph is directly built based on the realistic topology of the studied metro system, while a similarity graph and a correlation graph are built with virtual topologies under the guidance of the inter-station passenger flow similarity and correlation. These complementary graphs are incorporated into a Graph Convolution Gated Recurrent Unit (GC-GRU) for spatial-temporal representation learning. Further, a Fully-Connected Gated Recurrent Unit (FC-GRU) is also applied to capture the global evolution tendency. Finally, we develop a Seq2Seq model with GC-GRU and FC-GRU to forecast the future metro ridership sequentially. Extensive experiments on two large-scale benchmarks (e.g., Shanghai Metro and Hangzhou Metro) well demonstrate the superiority of our PVCGN for station-level metro ridership prediction. Moreover, we apply the proposed PVCGN to address the online origin-destination (OD) ridership prediction and the experiment results show the universality of our method. Our code and benchmarks are available at https://github.com/HCPLab-SYSU/PVCGN .

Spatiotemporal Adaptive Gated Graph Convolution Network for Urban Traffic Flow Forecasting

Conference Paper

Oct 2020

Urban ride-hailing demand prediction with multiple spatio-temporal information fusion network

Article

Aug 2020
TRANSPORT RES C-EMER

Urban ride-hailing demand prediction is a long-term but challenging task for online car-hailing system decision, taxi scheduling and intelligent transportation construction. Accurate urban ride-hailing demand prediction can improve vehicle utilization and scheduling, reduce waiting time and traffic congestion. Existing traffic flow prediction approaches mainly utilize region-based situation awareness image or station-based graph representation to capture traffic spatial dynamic while we observe that combination of situation awareness image and graph representation are also critical for accurate forecasting. In this paper, we propose the Multiple Spatio-Temporal Information Fusion Networks (MSTIF-Net), a novel deep learning approach to better fuse multiple situation awareness information and graph representations. MSTIF-Net model integrates structures of Graph Convolutional Neural Networks (GCN), Variational Auto-Encoders (VAE) and Sequence to Sequence Learning (Seq2seq) model to obtain the joint latent representation of urban ride-hailing situation that contain both Euclidean spatial features and non-Euclidean structural features, and capture the spatio-temporal dynamics. We evaluate the proposed model on two real-world large scale urban traffic datasets and the experimental studies demonstrate MSTIF-Net has achieved superior performance of urban ride-hailing demand prediction compared with some traditional state-of-art baseline models.

A Comprehensive Survey on Graph Neural Networks

Article

Mar 2020

Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications, where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on the existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this article, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNs, convolutional GNNs, graph autoencoders, and spatial-temporal GNNs. We further discuss the applications of GNNs across various domains and summarize the open-source codes, benchmark data sets, and model evaluation of GNNs. Finally, we propose potential research directions in this rapidly growing field.

Traffic Demand Prediction Based on Dynamic Transition Convolutional Neural Network

Article

Jan 2020

Precise traffic demand prediction could help government and enterprises make better management and operation decisions by providing them with data-driven insights. However, it is a nontrivial effort to design an effective traffic demand prediction method due to the spatial and temporal characteristics of traffic demand distributions, dynamics of human mobility, and impacts of multiple environmental factors. To handle these problems, a Dynamic Transition Convolutional Neural Network (DTCNN) is proposed for the purpose of precise traffic demand prediction. Particularly, a transition network is first constructed according to the citywide historical departure and arrival records, where the nodes are virtual stations discovered by a density-peak based clustering algorithm and the edges of two nodes correspond to transition flows of two stations. Then, a dynamic transition convolution unit is designed to model the spatial distributions of the traffic demands, and to capture the evolution of the demand dynamics. Last, a unifying learning framework is provided to incorporate the spatiotemporal states of the traffic demands with environmental factors. Experiments have been conducted on NYC taxi and bike-sharing data, and the results validate the effectiveness of the proposed method.

Understanding the impact of built environment on metro ridership using open source in Shanghai

Article

Oct 2019
CITIES

A growing body of research using the direct demand model has explored the impact of the built environment on transit ridership. However, empirical studies identified various significant factors in different cities with different datasets. This study adopts points-of-interest (POIs) data to identify the physical environmental factors affecting metro ridership in Shanghai. Independent variables in terms of the rail transit system, external connectivity, intermodal connection, and land use factors within 286 metro stations' catchment areas were selected. Principal component analysis (PCA) was used to group POIs into 6 components for dimensionality reduction. The results from ordinary least squares (OLS) regression analysis emphasize the dominating role of commercial land use and rail transit system factors, together with bus stops, tourist spots and healthcare factors, positively impact both weekday and weekend metro ridership; however, the effect of job-related land use is significant only on weekdays. Distinctively, the variable of intersection density is not positively associated with ridership as expected, revealing that street network measurements may not explain walking to rail transit in the citywide Shanghai context, so we suggest a new requirement: a multilevel-based walkability index in dense cities. The latter finding also implied that residences in central locations are less reliable than those in suburban locations. Finally, we conclude with strategies to encourage balanced trip demands other than simply increasing ridership, which has potential implications on urban planning and transit-oriented development (TOD) in China.
