Journal of Ambient Intelligence and Humanized Computing
https://doi.org/10.1007/s12652-020-02507-9
ORIGINAL RESEARCH
Predicting irregularities inarrival timesfortransit buses withrecurrent
neural networks using GPS coordinates andweather data
OmarAlam1 · AnshumanKush1· AliEmami2· ParisaPouladzadeh3
Received: 26 March 2020 / Accepted: 27 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Intelligent transportation systems (ITS) play an important role in the quality of life of citizens in any metropolitan city. Despite various policies and strategies incorporated to increase the reliability and quality of service, public transportation authorities continue to face criticism from commuters largely due to irregularities in bus arrival times, most notably manifested in early or late arrivals. Due to these irregularities, commuters may miss important appointments, wait too long at the bus stop, or arrive late for work. Therefore, accurate prediction models are needed to build better customer service solutions for transit systems, e.g. building accurate mobile apps for trip planning or sending bus delay/cancellation notifications. Prediction models will also help in developing better appointment scheduling systems for doctors, dentists, and other businesses to take into account transit bus delays for their clients. In this paper, we seek to predict the occurrence of arrival time irregularities by mining GPS coordinates of transit buses provided by the Toronto Transit Commission (TTC) along with hourly weather data, and using this data in machine learning models that we have developed. In our study, we compared the performance of a Long Short Term Memory Recurrent Neural Network (LSTM) model against four baseline models: an Artificial Neural Network (ANN), Support Vector Regression (SVR), Autoregressive Integrated Moving Average (ARIMA), and historical averages. We found that our LSTM model demonstrates the best prediction accuracy. The improved accuracy achieved by the LSTM model may lend itself to its ability to adjust and update the weights of neurons while accounting for long-term dependencies. In addition, we found that weather conditions play a significant role in improving the accuracy of our models. Therefore, we built a prediction model that combines an LSTM model with a Recurrent Neural Network (RNN) model that focuses on the weather condition. Our findings also reveal that in nearly 37% of scheduled arrival times, buses either arrive early or late by a margin of more than 5 min, suggesting room for improvement in the current strategies employed by transit authorities.
Keywords: Intelligent transportation systems · ITS · Traffic flow · Neural networks · GPS locations · Weather conditions
1 Introduction
The importance of modeling and predicting bus arrival times for public transit has long been recognized (Kumar et al. 2014). Throughout the past decade, much work has been done to explore means of achieving faster and more reliable transit systems (Hua et al. 2018). However, public transit authorities continue to face criticism from commuters due to discrepancies between a vehicle's scheduled and actual arrival times. These irregularities naturally have a negative impact on commuters' daily lives. Commuters may miss medical appointments, school events, or arrive late for
work. With the availability of large-scale pervasive data, e.g. GPS locations collected from buses, we believe that machine learning algorithms can help in predicting actual arrival times for public transit buses, and assist in strategies to overcome their discrepancies with scheduled times.

* Omar Alam (omaralam@trentu.ca); Anshuman Kush (anshumankush@trentu.ca); Ali Emami (ali.emami@mail.mcgill.ca); Parisa Pouladzadeh (Parisa.Pouladzadeh@flemingcollege.ca)
1 Trent University, Peterborough, Canada; 2 Mila/McGill University, Montreal, Canada; 3 Fleming College, Peterborough, Canada
This paper aims at modelling the irregularities in arrival times for public transit buses using historical bus arrival times, stop locations, bus schedules, and weather data. Irregularities can occur in one of two ways: leads (early arrival at a stop) and delays (late arrival at a stop). We focus on predicting irregularities for transit buses in the City of Toronto, where irregularities in bus arrival times are so common that the Toronto Transit Commission (TTC) issues notes for commuters who arrive late for work due to misleading schedule times (Star 2020).
To reduce irregularities in arrival times, transit authorities incorporate a variety of strategies to bridge the gap between actual and scheduled arrival times of buses. Among these strategies, the holding control strategy is found to be the most effective for regulating bus operations (Fu and Yang 2002). This strategy seeks to address the phenomenon called bus headway: a large, accumulated arrival lead or delay at a bus stop that results from a succession of leads or delays at previous stops. By holding an early-arriving bus, a bus headway can be mitigated and service reliability improved (Fu and Yang 2002). Another strategy is stop-skipping, which is particularly useful when buses are running late and behind schedule (Liu et al. 2013). Despite applying these strategies, transit services continue to face delays in their daily operations, which can be due to ongoing road construction, bus breakdowns, road accidents, or other day-to-day factors. Therefore, transit authorities seek to increase the quality of service by providing passengers with predicted arrival times at bus stops using algorithms that exploit transit data (Hua et al. 2018).
With computational power becoming cheaper and more accessible, it is increasingly feasible to use data-driven models to accurately predict arrival times by leveraging large volumes of data. These prediction models can assist in developing intelligent trip planning apps, improved scheduling systems for doctors and other businesses, and better urban planning strategies for city authorities.
In this paper, we propose a regression task to test the ability of machine learning algorithms to predict whether a bus at a given stop and time will be early, on time, or late, based on transit and weather data for the Toronto Transit Commission (TTC). The machine learning models that we experiment with include a traditional feed-forward artificial neural network (ANN) and a recurrent neural network (RNN) using long short-term memory (LSTM).
Our contribution can be summarized as follows:

– To our knowledge, this is the first work that investigates the impact of weather data on prediction accuracy for bus arrival times. We compare the prediction models with and without weather features. Previous work either avoided using weather data altogether, e.g. (Kumar et al. 2014), or did not find weather to be a useful feature for their prediction task (Patnaik et al. 2004).
– We used historical arrival times, weather data, and other input features for arrival time prediction for transit buses. We found that the LSTM model, a variant of recurrent neural network that uses long-term dependencies, yields the best predictive performance.
– We found that weather has a strong relationship with arrival time prediction models. In nearly half of our data, including weather improved the prediction accuracy by 48%. We also found that including the weather data significantly improves the accuracy when predicting bus arrival times at multiple future stops in a trip.
– Because of the importance of weather, we built a separate RNN model that focuses on the weather feature and combined its result with the result of the LSTM model. This combined hybrid model improved the prediction by more than 500%.
The rest of the paper is organized as follows. The next section discusses the related work. Section 3 discusses the data collection. Section 4 discusses the machine learning models that we used. Section 5 discusses the results, and Sect. 6 concludes the paper.
2 Related work
This section discusses related work on bus arrival prediction. In general, previous work used linear regression (LR) (Hua et al. 2018), non-parametric regression (NPR) (Chang et al. 2010; Balasubramanian and Rao 2015), or Kalman filters (KFT) (Shalaby and Farhan 2004).
Hua etal. (2018) use linear regression to predict bus
locations. Bus location data displays non-linear relation-
ships between its features. Therefore, data has to be con-
verted into a linear space to be used in conventional math-
ematical models such as linear regression. This requires a
significant amount of data pre-processing and be in turn,
costly and time-consuming. Kormáksson et. al. (2014) use
additive models (non linear regression models) to predict
bus arrival times using General Transit Feed Specification
(GTFS) data. GTFS data is standardized by Google, which
is used to provide schedules and geographic information to
Google Maps and other Google applications that show tran-
sit information. Regression models are easy to interpret and
fast to train. Shalaby and Farhan (2004) use very limited AVL (Automatic Vehicle Location) and APC (Automatic Passenger Counter) data with Kalman filters (KFT) to predict the arrival time for Toronto transit buses. Their data size is small (only 5 days of vehicle locations). In our study, we used 3.5 million data points collected over a period of four months; we use large datasets for predicting arrival times using machine learning algorithms. Wang et al. (2019) apply a multi-objective optimization technique to reduce the capacity allocation of subway systems based on different factors, e.g. number of passengers, headway, and number of available trains. Their objective is to reduce passenger wait time in the subway system. Similarly, passenger travel time was used to predict the arrival time in subway stations in a study done for the city of Beijing (Xu et al. 2020). We used several input features in our prediction models, e.g. past arrival time, day of the week, hour of the day, etc. Liu et al. (2019) studied the optimal combination of different input features in mass rapid transit (MRT) systems; however, they did not consider the weather in their study.
Kumar et al. (2014) compared Kalman filters (Kalman 1960) with artificial neural networks for bus arrival prediction in Chennai, India. A key finding of this experiment was that, with a large volume of data, artificial neural network models give better accuracy than mathematical models (linear regression and Kalman filters). Wang et al. (2009) use a Support Vector Machine (SVM) to model traffic conditions; they used bus arrival times and bus schedules as inputs to train their model. ANNs and kernelized SVMs have gained popularity for predicting travel time because of their ability to capture complex and non-linear relationships among features (Chien et al. 2002; Kumar et al. 2014; Jeong and Rilett 2004). In Hua et al. (2018), the performance of linear regression, artificial neural network, and support vector machine models was compared for predicting bus arrivals at a single stop using data from multiple routes. Linear regression's performance was poor due to non-linearity in the data; however, the performance of ANN and SVM was quite competitive. These approaches did not use recurrent neural networks in their predictions. Our work uses LSTM recurrent neural networks. Moreover, we use weather data in our prediction, which had not been incorporated in any way in the previous approaches.
To our knowledge, there has not been an abundance of work that uses weather data for predicting arrival times for public transit buses. Yang et al. (2016) use a combination of genetic algorithms and support vector machines along with weather conditions to predict bus arrival time. They did not use historical arrival times and did not explore recurrent neural networks in their study. Chen et al. (2004) used weather conditions and automatic passenger counting data with an ANN for bus arrival prediction for a New Jersey county. These two studies only relied on weather conditions (i.e. snow, rain, fog) in their models; we consider other weather attributes, such as visibility and temperature. Patnaik et al. (2004) used weather data as features for a bus arrival prediction model; however, their experiment failed to show improvement with weather data.
Ke et al. (2017) used a combination of CNN and LSTM recurrent neural networks along with weather data for forecasting short-term passenger demand for ride services. In contrast, we use weather data for a different problem, i.e. predicting arrival times of transit buses. Fu et al. (2016) compared the performance of a GRU (gated recurrent unit) model and an LSTM model on yet another prediction task concerning traffic flow.
3 Dataset collection
We used four datasets to build our models: (1) live Automatic Vehicle Location (AVL) data for Toronto Transit Commission (TTC) transit buses, collected every 20 s, (2) bus schedules and (3) bus stop locations retrieved from GTFS (General Transit Feed Specification) data, and (4) hourly weather data collected from a weather station near downtown Toronto. The AVL data comprises GPS locations for Toronto Transit Commission (TTC) buses. This data is publicly available through the NextBus API (Nextbus 2020). We collected more than 700,000 unique live GPS locations for transit buses on two routes, Route 28 and Route 8 (Fig. 1), in the City of Toronto over 3 months, from January 2018 to March 2018. Figure 2 presents
Fig. 1 GPS locations mapped to bus stop location data for TTC
routes. Markers are the GPS coordinates calculated for actual arrival
time at each stop. The top map depicts Route 28, while the bottom
map depicts Route 8
O.Alam et al.
1 3
an overview of our study. Table 1 summarizes the datasets that we used. After collecting the four datasets, we calculated the arrival time of a bus at each bus stop on the studied routes. Then, we calculated the difference between the actual arrival time of a bus at a stop and its scheduled arrival time at that stop. Based on this difference, we determined whether the bus had arrived early, on time, or was delayed. Then, we normalized the data from all four datasets and used it as input to our models.
3.1 Estimating actual arrival time
The TTC data does not, in fact, specify whether a bus has arrived at a stop. The actual arrival time of a bus at a stop is calculated using the distance between the GPS location of the bus and the bus stop location. This distance is calculated using the haversine formula (Veness 2018), a well-known formula for calculating the path distance between two points on the surface of the Earth, with a wide range of applications, e.g. (Chopde and Nichat 2013; Basyir et al. 2017; Ingole and Nichat 2013). The formula gives the distance between two points on a sphere using their latitudes and longitudes while ignoring hills:
In Equation1,
𝜑
is latitude,
𝛬
is longitude. In Eq.2, c is
the angular distance in radians. In Eq.3, R is Earth’s radius
(mean radius = 6371 km) and d is the distance between two
GPS locations in kilometer. Since the real time GPS location
data is collected for every 20 s, we may miss the exact time
when the bus actually arrives at the bus stop. Furthermore,
during a 20 s window, the bus could arrive at a bus stop and
start moving again. In that case, the recorded GPS location
of the bus could be further away from the bus stop.
To mitigate these issues, we identify the GPS location
where the distance between the bus and the bus stop is mini-
mal. We do this by checking whether the bus is close to
the bus stop, where closeness corresponds to the bus being
within 100 m from the stop.
a = sin²(Δφ/2) + cos φ₁ · cos φ₂ · sin²(Δλ/2)    (1)

c = 2 · atan2(√a, √(1 − a))    (2)

d = R · c    (3)
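As a concrete illustration, Eqs. (1)-(3) translate directly into code (a minimal sketch; the function name and sample arguments are our own):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two GPS points, following Eqs. (1)-(3)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    # Eq. (1): a = sin^2(dphi/2) + cos(phi1) * cos(phi2) * sin^2(dlam/2)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Eq. (2): angular distance in radians
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    # Eq. (3): distance in kilometres
    return radius_km * c
```

For example, one degree of longitude along the equator comes out to roughly 111 km with the mean Earth radius above.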
Fig. 2 An overview of bus
arrival prediction
Table 1 Datasets used for our study

Dataset                        Data points
TTC real-time data             700,000
GTFS bus stop schedule data    18,110
GTFS bus stop location data    24
Weather data                   3624
Algorithm 1: Calculating the Actual Arrival Time of a Bus

Input: GPSTime: reported time for the GPS location of the bus; ScheduledTime: scheduled arrival time of the bus at the stop
Output: ActualTime: actual arrival time of a bus at a stop

Let d = distance between the bus and the stop using the haversine distance equation
Let min = ∞
while ScheduledTime − 25 minutes ≤ GPSTime ≤ ScheduledTime + 25 minutes do
    Calculate d
    if d ≤ min then
        min = d
if min is within 100 m of the stop then
    ActualTime = GPSTime of the bus location with distance min
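Algorithm 1 can be sketched in Python as follows (a simplified illustration with names of our own; we assume each GPS fix has already been paired with its haversine distance to the stop):

```python
from datetime import timedelta

def actual_arrival_time(gps_fixes, scheduled_time, window_min=25, near_m=100.0):
    """gps_fixes: list of (gps_time, distance_to_stop_m) tuples.
    Returns the GPS time of the fix closest to the stop within a
    +/- 25 min window of the scheduled time, if that fix is within 100 m."""
    lo = scheduled_time - timedelta(minutes=window_min)
    hi = scheduled_time + timedelta(minutes=window_min)
    best_time, best_dist = None, float("inf")
    for gps_time, dist in gps_fixes:
        if lo <= gps_time <= hi and dist < best_dist:
            best_time, best_dist = gps_time, dist
    if best_dist <= near_m:   # bus considered "at" the stop
        return best_time
    return None               # bus never came close enough to the stop
```

Returning None captures the case where the bus never passes within 100 m of the stop during the window, so no actual arrival is recorded.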
Predicting irregularities inarrival timesfortransit buses withrecurrent neural networks…
1 3
Figure3 illustrates how we calculate the difference
between the actual and the scheduled arrival times of a
bus at a particular bus stop. Let
bt
denote the GPS location
of the bus, t denote the time when the bus was at location
bt
, and
St
denote the bus stop location. Since we capture
GPS locations for the bus every 20 s, we may encounter a
large number of GPS locations around a particular bus stop
during the scheduled arrival time for the bus.
To determine whether the bus arrives on time, we use
the GPS locations of the bus that are reported within a 50
min window from its scheduled arrival at the bus stop
St
(i.e. 25 min before and 25 min after the scheduled arrival
time). Then, we choose the closest bus location to the bus
stop within that time window, for example,
in Fig.3.
The next step is to check if that GPS location is within the
vicinity of the bus stop (i.e., we check if
is within 100
m distance from the bus stop). Algorithm1 summarizes
this process.
After estimating the actual arrival time of buses at a particular bus stop, we calculate the difference between the actual and scheduled arrival times. Equation (4) calculates the difference between the scheduled and actual arrival times; similarly, Eq. (5) calculates the difference between the scheduled and predicted arrival times:

Difference_actual = ScheduledArrivalTime − ActualArrivalTime    (4)

Difference_predicted = ScheduledArrivalTime − PredictedArrivalTime    (5)

If the difference is less than zero, the bus arrived late; if it is greater than zero, the bus arrived early.
After preprocessing the data, we conducted a preliminary analysis of the collected bus arrival data. We found that more than 37% of the time, buses on these routes were either delayed by more than 5 min or arrived early by more than 5 min (see Fig. 4). In some cases the delay exceeded 20 min. During the period of our study, the scheduled arrival times did not change, i.e., the schedules were not updated by TTC. Therefore, we can consider that our models predict the arrival times. However, we used the two formulas in Eqs. (4) and (5) for prediction because we were interested in the delays and early arrivals.
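Equations (4) and (5) reduce to a sign check on the difference; a small sketch (the function names are our own, and the 5 min irregularity margin follows the analysis above):

```python
def arrival_difference_min(scheduled_min, actual_min):
    """Eq. (4): positive means the bus arrived early, negative means late."""
    return scheduled_min - actual_min

def is_irregular(scheduled_min, actual_min, margin_min=5.0):
    """Irregular arrival: more than 5 min early or more than 5 min late."""
    return abs(arrival_difference_min(scheduled_min, actual_min)) > margin_min
```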
4 Machine learning models
This section discusses the machine learning models that we used for predicting the arrival time of buses on the selected routes. In particular, we use regression models to estimate the amount of time that a given bus deviates from its schedule. Given historical arrival times at a stop s, our models predict the next arrival time at stop s+1.
In our study, we use four baselines to which we compare
our model’s results: SVR, ANN, ARIMA and Historical
Average.
1. Support Vector Regression (SVR): SVR (Drucker et al. 1997) is an extension of the basic support vector machine (SVM) (Boser et al. 1992). In linear regression models, the error rate is minimized, whereas in SVR models, the error is fit within a certain threshold. The model that emerges from SVR is the hyperplane that fits the maximum number of data points within that threshold.
2. Artificial Neural Network (ANN): An ANN (Zhang and Qi 2005) is a network of interconnected neurons, inspired by studies of biological nervous systems (Zhang and Qi 2005; Tan et al. 2005). Neurons are simple information processing units. For time-series analysis, inputs to an ANN model are observations from previous time-steps and the output corresponds to the predicted observation at the next time-step (Zhang and Qi 2005). The information
Fig. 3 Calculating the actual arrival of a bus at a bus stop
Fig. 4 Distribution of the difference between actual and scheduled arrival times: 20% of the buses are delayed by more than 5 min and 17% of the buses arrive early by more than 5 min
O.Alam et al.
1 3
received from the input nodes is processed by hidden
layer units along with appropriate activation functions
to determine the output.
3. ARIMA: ARIMA stands for Autoregressive Integrated Moving Average. ARIMA is a mature, statistics-based time series prediction model. For time series data, ARIMA predicts future values entirely from the previous data points in the series.
4. Historical average: Historical averages are the mean arrival times for bus trips. They are used as a common reference point to compare the performance of different machine learning models.
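As an illustration, the historical-average baseline can be sketched as follows (the grouping key by stop and hour is our assumption for the sketch; the paper does not specify how trips are grouped):

```python
from collections import defaultdict

def historical_average(records):
    """records: list of (stop_id, hour, time_diff_min) observations.
    Predicts, for each (stop, hour) pair, the mean historical deviation."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stop_id, hour, diff in records:
        sums[(stop_id, hour)] += diff
        counts[(stop_id, hour)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```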
4.1 Long short-term memory (LSTM) recurrent neural networks
In Fig.5, we show in a single LSTM cell structure how
LSTM recurrent neural network maintains long term
dependencies.
The LSTM architecture contains series of connected
cells. Each LSTM cell consists of a special unit called a
memory block in the recurrent hidden layer. The memory
blocks have connections that provides necessary informa-
tion to maintain temporal state of the network. LSTM cell
has three gates: Input gate, Output gate and Forget gate.
Input gate control the flow of input information provided
to the LSTM cell. Output cell controls the output flow of
cell activations into the rest of the network. Unlike conven-
tional RNN, LSTM recurrent neural network has a separate
forget gate which makes it more suitable for time-series
analysis. The forget gate decides which information is rel-
evant for the prediction task and removes irrelevant infor-
mation. These gates together provides the overall memory
function for LSTM recurrent neural networks.
Following an iterative process, the LSTM model establishes a mapping between an input sequence and the irregularity in arrival time (output) from the training set. Below are the equations for the LSTM neural network:

Input gate:  i_t = σ(W_xi x_t + W_hi h_(t−1) + W_ci C_(t−1) + b_i)    (6)

Forget gate:  f_t = σ(W_xf x_t + W_hf h_(t−1) + W_cf C_(t−1) + b_f)    (7)

Cell state:  C_t = f_t C_(t−1) + i_t tanh(W_xC x_t + W_hC h_(t−1) + b_C)    (8)

Output gate:  o_t = σ(W_xo x_t + W_ho h_(t−1) + W_co C_t + b_o)    (9)

Hidden layer output:  h_t = o_t tanh(C_t)    (10)

At time interval t, σ is the element-wise sigmoid function 1/(1 + exp(−x)) and tanh is the hyperbolic tangent function (exp(x) − exp(−x))/(exp(x) + exp(−x)). i_t, f_t and o_t are the input, forget and output gate states respectively, and C_t is the cell state. x_t is the input, and b_i, b_f, b_o and b_C are the bias terms. W_xi, W_hi and W_ci are the weight matrices for the input gate; W_xf, W_hf and W_cf are the weight matrices for the forget gate; and W_xo, W_ho and W_co are the weight matrices for the output gate. W_hi, W_hf, W_hC and W_ho are the weight matrices connecting h_(t−1) to the three gates.
The current cell state C_t is generated by calculating the weighted sum of the previous cell state and the current cell input. The LSTM recurrent neural network has the ability to remove or add relevant information to the cell state because the cell state is adjusted by the input gate and the forget gate. The forget gate removes irrelevant information from the cell state: it uses h_(t−1) and x_t, and outputs a number between 0 and 1 for each entry of the previous cell state C_(t−1). If the number is zero, no information passes through the gate; if the number is one, all the information passes through the forget gate. Similarly, the input gate decides what new information will be stored in the cell state. The final output is based on the cell state of the LSTM network. As explained above, the current cell state depends on the previous cell state; therefore, the previous cell state is taken into consideration when updating the weights of the LSTM cell. This is how an LSTM cell is able to maintain long-term dependencies for predictions. LSTM recurrent neural networks have shown promising results on complex machine learning tasks (Sutskever et al. 2014).
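Equations (6)-(10) can be sketched as a single LSTM cell step in NumPy (a minimal illustration with placeholder random weights, not the trained model; the W_c terms are treated as element-wise peephole weights):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step, following Eqs. (6)-(10).
    W maps names like "xi" to weight matrices ("ci", "cf", "co" are
    peephole weight vectors applied element-wise); b holds bias vectors."""
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + W["ci"] * C_prev + b["i"])    # Eq. (6)
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + W["cf"] * C_prev + b["f"])    # Eq. (7)
    C_t = f_t * C_prev + i_t * np.tanh(W["xC"] @ x_t + W["hC"] @ h_prev + b["C"])  # Eq. (8)
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + W["co"] * C_t + b["o"])       # Eq. (9)
    h_t = o_t * np.tanh(C_t)                                                       # Eq. (10)
    return h_t, C_t
```

Iterating this step over a sequence, and feeding h_t and C_t back in, is what lets the cell carry long-term information forward.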
4.2 Recurrent neural networks for the weather feature
InputData =
[ T^1_(t0)  W^1_(t0)  T^1_(t1)  W^1_(t1)  W^1_(t2) ]
[    ⋮         ⋮         ⋮         ⋮         ⋮     ]
[ T^n_(t0)  W^n_(t0)  T^n_(t1)  W^n_(t1)  W^n_(t2) ]    (11)
Fig. 5 LSTM cell structure
Predicting irregularities inarrival timesfortransit buses withrecurrent neural networks…
1 3
Since weather conditions have a significant impact on the prediction results, we decided to create a Recurrent Neural Network (RNN) model that focuses on the weather feature. The output of this model is combined with that of the LSTM model discussed in the previous subsection to increase the accuracy of prediction. The RNN model takes as input the arrival times and weather readings at the current stop and at the previous stop to predict the arrival time at the next stop when the weather reading is known.

TargetData =
[ T^1_(t0) ]
[    ⋮     ]
[ T^n_(t0) ]    (12)
In this model, a window of three is chosen for arrival times T and weather readings W. The inputs shown in Eq. (11) are divided into three categories: X1 (previous bus stop), X2 (current bus stop), and X3 (next bus stop). These inputs are illustrated graphically in Fig. 6. X1 comprises T^1_(t0) and W^1_(t0) for n samples. Similarly, X2 comprises n arrival times and weather readings. The third input, X3, only takes weather readings. Equation (12) shows the predicted arrival times for the next bus stop. Figure 6 illustrates the architecture of our RNN model. Two hidden layers, h1 and h2, are set in the diagram with different matrix sizes. The inputs go to h1 with a batch size of 32. After processing in h1, the results are transferred to h2 in different formats. In h2, the result of processing X1 is sent to a 1×16 matrix and concatenated with the result of processing X2, which is sent to a 1×32 matrix. Similarly, the output of processing X3 goes to a 1×16 matrix in h2. Then all results of h2 are concatenated together. Finally, the sigmoid function is applied in the last layer, which provides the arrival time prediction.
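The branch-and-concatenate structure described above can be sketched shape-wise in NumPy. Only the 1×16 / 1×32 / 1×16 branch sizes, the concatenation, and the final sigmoid come from the text; the h1 width, the ReLU activations, and the random placeholder weights are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# X1: arrival time + weather at the previous stop, X2: arrival time +
# weather at the current stop, X3: weather reading at the next stop.
x1, x2, x3 = rng.random((1, 2)), rng.random((1, 2)), rng.random((1, 1))

# h1: first hidden layer (its width, 8, is chosen arbitrarily for the sketch)
h1_a = relu(x1 @ rng.random((2, 8)))
h1_b = relu(x2 @ rng.random((2, 8)))
h1_c = relu(x3 @ rng.random((1, 8)))

# h2: branch projections with the 1x16 / 1x32 / 1x16 sizes from the text
h2_a = relu(h1_a @ rng.random((8, 16)))
h2_b = relu(h1_b @ rng.random((8, 32)))
h2_c = relu(h1_c @ rng.random((8, 16)))

# concatenate all h2 results, then apply the sigmoid output layer
merged = np.concatenate([h2_a, h2_b, h2_c], axis=1)   # shape (1, 64)
y_hat = sigmoid(merged @ rng.random((64, 1)))          # normalized arrival time prediction
```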
4.3 Data preprocessing and normalization
Table2 summarizes the list of features used in our mod-
els. We have flattened the data, i.e., we augment the bus
trip travelling southbound with the next trip for the same
bus travelling northbound. The first feature (time diff) is
Fig. 6 RNN architecture
Table 2 Features used for model building

Feature name        Description
time diff           Difference between actual arrival time and scheduled arrival time; this is the variable which we are trying to predict
Tag                 Specifies the direction in which the bus is heading
Trip.ID             A unique number given to each trip
Stop sequence       Sequence numbers assigned, starting from 1, to each bus stop on the route
Distance traveled   Cumulative distance travelled by the bus to reach the bus stop
routeTag            A unique numeric code to identify the particular route on which the bus is traveling
Stop ID             A unique numeric code to identify a particular bus stop
Bus ID              A unique numeric code to identify a particular bus
Service class       Weekday, Saturday, or Sunday
Day of the week     Numerical number indicating the day of the week (1-Sunday, 2-Monday, etc.)
Hour                Numerical number indicating the hour of the day
Max temperature     Maximum temperature in the hour
Min temperature     Minimum temperature in the hour
Visibility          Visibility in km, i.e., how far the driver is able to see
Weather condition   Weather conditions: rain, snow, fog, or haze
O.Alam et al.
1 3
calculated using live GPS locations and the TTC schedule data, as discussed in the previous section. The last four features are obtained from the weather data; the rest of the features are obtained from the live bus stop location data.
Before a machine learning model is trained, all features are converted into a vector representation (e.g., the categorical features). There are two ways to convert a categorical feature into a vector representation: one-hot encoding and label encoding (Tan et al. 2005).
1. One-hot encoding: encodes a categorical feature as a one-hot numeric vector, i.e., it creates a binary column for each category, where only the entry at the position representing the category is assigned a 1 and the remaining entries are assigned 0, producing a sparse vector.
2. Label encoding: transforms categorical features into numerical features by assigning each category a unique number, which can be normalized before being used as input to a machine learning model.
We have two categorical features, tag and weather conditions. Tag was converted using one-hot encoding because it only has two categories (North and South). Weather conditions was converted using label encoding. The other features in our data do not require encoding because they are continuous variables.
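For illustration, the two encodings can be sketched as follows (the exact category values and label assignments are placeholders, not the ones used in our pipeline):

```python
# One-hot encoding for Tag, which has exactly two categories
TAG_CATEGORIES = ["North", "South"]

def one_hot_tag(tag):
    """Binary vector with a 1 at the position of the given category."""
    return [1 if tag == category else 0 for category in TAG_CATEGORIES]

# Label encoding for the weather condition (example label assignment)
WEATHER_LABELS = {"clear": 0, "rain": 1, "snow": 2, "fog": 3, "haze": 4}

def encode_weather(condition):
    """Map each weather condition to a unique integer."""
    return WEATHER_LABELS[condition]
```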
After converting all the features into a vector representation, the data was normalized using the following equation:

z_i = (x_i − min(x)) / (max(x) − min(x))    (13)

In Eq. (13), x_i is the i-th observation of a feature and z_i is the i-th normalized data point.
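Equation (13) is standard min-max scaling, which maps each feature into [0, 1]; a minimal sketch:

```python
import numpy as np

def min_max_normalize(x):
    """Eq. (13): scale the values of one feature into the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```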
4.4 Model training in LSTM
The input to each LSTM cell is a 3-dimensional (3D) matrix. The following briefly discusses each dimension:
1. Sample size: sample size refers to how many rows are given as input to the model. In this study we used a sample size of 32.
2. Time steps: a time step is one point of observation in the sample. The number of steps determines how many steps ahead in time the model will predict. We used one, two, three, and four time steps in our model.
3. Features: The detailed explanation of each feature used
is discussed in Table2. Our model uses the time diff fea-
ture as a dependent feature (output of the model), which
specifies the difference between scheduled arrival time and
actual arrival time of bus from previous time stamp. We
(13)
z
i=
x
i
min(x)
max(x)−min(x)
use 11 independent features as input to model, Trip.ID,
Tag, Stop.sequence, distance travelled, maximum temper-
ature, minimum temperature, visibility, hour, day of week,
service class, weather conditions as inputs to the model.
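The min–max scaling of Eq. 13 can be sketched as follows (an illustrative helper, not the authors' code):

```python
def min_max_normalize(values):
    """Eq. 13: z_i = (x_i - min(x)) / (max(x) - min(x)).

    Maps each observation of a feature into the range [0, 1].
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```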
A useful property of neural networks is that when the model adjusts its weights during training, it can reduce the effect of irrelevant features by assigning them low weights. These features can still have a small negative influence on the model, which can decrease its overall accuracy. Only the features which gave us the highest accuracy were used in the final model. We performed an ablation study by removing one feature at a time and calculating the error rate of the model. Among the features in Table 2, we found Stop ID and Bus ID to be insignificant to our model; therefore, we excluded them. The other features showed a significant impact on the accuracy of the model.
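The ablation procedure described above can be sketched as a simple loop (illustrative only; `error_fn` is a hypothetical callable standing in for training and evaluating the model on a feature subset):

```python
def ablation_errors(features, error_fn):
    """Remove one feature at a time and record the model error without it.

    Features whose removal barely changes the error (such as Stop ID and
    Bus ID above) are candidates for exclusion from the final model.
    """
    errors = {}
    for feature in features:
        subset = [f for f in features if f != feature]
        errors[feature] = error_fn(subset)
    return errors
```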
In our LSTM model architecture, we use 12 input neurons, which represents the number of features (11 independent features and 1 dependent feature) in the dataset used for modeling. The output layer has a single neuron, which specifies the difference between the predicted arrival and scheduled arrival times (i.e., delay or early arrival) for a bus at a stop. We tried different numbers of LSTM hidden layers and different numbers of LSTM cells within each layer. For the final model, we chose one hidden layer with 100 LSTM cells and the ReLU activation function (Goodfellow et al. 2016).
When the LSTM model starts training, a sequence of 3D samples (3D tuples) is given to the LSTM layer. The shape of a sequence is (32, 1, 12). This means that, in one iteration, the model runs 32 samples (the batch size) to predict 1 time step ahead, using 12 input features (the 11 independent input features discussed previously, plus the previous reading of the dependent feature, i.e. time diff). Across iterations, each LSTM cell carries a cell state and a forget gate. The forget gate controls how much of the current cell state is passed to the next cell, which allows the model to learn longer sequences.
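The role of the forget gate can be seen in a single LSTM time step, sketched here in NumPy under the standard LSTM equations (this is not the authors' implementation; with the configuration above, x would have 12 entries and h_prev/c_prev 100 entries each):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. The forget gate f decides how much of the
    previous cell state c_prev is carried forward, which is what lets
    the model retain information across long sequences."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b   # stacked gate pre-activations, shape (4n,)
    f = sigmoid(z[:n])           # forget gate
    i = sigmoid(z[n:2 * n])      # input gate
    o = sigmoid(z[2 * n:3 * n])  # output gate
    g = np.tanh(z[3 * n:])       # candidate cell state
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state (passed to the next step)
    return h, c
```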
When training neural networks, several decisions need
to be made regarding the choice of hyperparameters used
by the model. We chose the following hyperparameters
for our model:
1. Activation functions: non-linear mathematical functions used to combine the output of neurons at one layer for the next layer. They are important for a neural network model because they introduce non-linear properties into the model. We experimented with different activation functions, such as linear, sigmoid, and ReLU; for our final model we used the ReLU activation function.
2. Optimization algorithms: help to minimize (or maximize) an error function, and are used to compute the output in a way that is computationally less expensive and lets the model converge to a global minimum rather than a local minimum. We investigated the RMSprop and ADAM optimizers. For the final model we used the ADAM optimizer.
3. Epochs: specify how many full passes over the data set should be used during training. If we use too few epochs, we may underfit the model and not allow it to learn everything it can from the training data. If we use too many epochs, we may overfit the model, which introduces noise into the model.
4. Early stopping: a regularization method used to prevent the model from over-fitting. Early stopping removes the need to manually tune the number of epochs while training a model: when the error rate of the model stops decreasing, training is stopped automatically. Another regularization method is dropout (Srivastava et al. 2014); we found that early stopping works best for our model.
5. Batch size: the number of samples that are propagated through the network in one iteration. The batch size can be less than or equal to the total number of training samples. An advantage of a small batch size is that it requires less memory for training. A small batch size also reduces the overall training time required by the model, which is important when working with large datasets because it is not possible to fit all of the data into memory at once. However, if the batch size is too small, the model can be less accurate because we do not provide a sufficient number of samples, which leads to a less accurate estimate of the output.
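Early stopping (item 4) amounts to a small control loop around training; a minimal sketch, where `train_step` is a hypothetical callable returning the validation error for one epoch:

```python
def train_with_early_stopping(train_step, max_epochs=300, patience=5):
    """Stop training once the validation error has not improved for
    `patience` consecutive epochs, so the number of epochs does not
    have to be tuned by hand."""
    best_error, best_epoch, waited = float("inf"), 0, 0
    for epoch in range(max_epochs):
        error = train_step(epoch)
        if error < best_error:
            best_error, best_epoch, waited = error, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # error stopped decreasing
    return best_error, best_epoch
```

Deep learning frameworks provide the same behaviour as a built-in callback (e.g. Keras `EarlyStopping`).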
Table3 shows different configurations of LSTM model that
we tried in our experiments. For our final model, we used
one hidden layer with 100 cells and one dimensional output
representing the next arrival time. This configuration provides
the best performance for both Route 28 and Route 8.
5 Results/model performance
To measure the performance of our models, we calculated the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) on the testing data. All models were trained ten times, and the averages of the MAPE and RMSE error rates were taken as the final values for the models.
The equations for these performance measures are defined as follows:

MAPE = (1/n) Σ_{t=1}^{n} |y_t − x_t| / y_t

RMSE = √( (1/n) Σ_{t=1}^{n} (y_t − x_t)² )

where y_t is the actual value, x_t is the predicted value, and n is the number of samples. In our case, y_t is the difference between the scheduled arrival time and the actual arrival time, and x_t is the difference between the scheduled arrival time and the predicted arrival time.

Table 4 shows the MAPE and RMSE values of the different models for Route 28. The LSTM model substantially outperformed the other models, showing a sevenfold reduction in MAPE over the historical average. A possible reason that the LSTM model performs better than the other models is that it may account more directly for the long-term dependencies between the input and output features. The LSTM model was also the best performing model for Route 8, as shown in Table 5.

Table 3 Model tuning for LSTM
Bold values indicate the final values used in the experiments

Activation  Layers  Cells     Batch size  Route 28          Route 8
                                          RMSE    MAPE      RMSE    MAPE
ReLU        1       10        32          433.15  0.2       284.77  0.44
ReLU        1       50        32          427.87  0.14      277.97  0.45
ReLU        1       100       32          422.22  0.13      269.49  0.36
Linear      1       50        32          426.56  0.17      283.58  0.55
Linear      1       100       32          425.52  0.16      276.74  0.45
ReLU        1       100       64          426.24  0.23      275.76  0.41
Sigmoid     1       100       32          433.58  0.25      279.62  0.4
ReLU        3       40,80,40  32          427.68  0.31      283.56  0.54
ReLU        3       40,80,40  64          427.77  0.28      283.75  0.54
ReLU        2       40,40     64          431.50  0.2       279.32  0.5

Table 4 Comparison of different models for Route 28
Bold values indicate the final values used in the experiments

        Historical average  ARIMA   SVR     ANN     LSTM
MAPE    0.91                0.80    0.68    0.30    0.13
RMSE    477.87              432.69  428.79  427.33  422.2
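The two error measures can be computed directly; a minimal sketch (not the authors' code), with MAPE expressed as a fraction to match the values reported in Tables 4 and 5:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction rather than a percentage."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```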
We observe that the RMSE value of the LSTM model is not substantially lower than that of the baseline models. RMSE is sensitive to the large outlying errors that occurred in our data, and performs best when errors follow a normal distribution (Chai and Draxler 2014). Chai and Draxler (2014) suggest removing outliers that are larger than the other errors by several orders of magnitude. However, we did not need to remove outliers, i.e., extreme irregularities, because MAPE clearly showed that the LSTM model outperforms the other models, and the RMSE value of the LSTM model is lower than that of all the baseline models. In addition, we were interested in the impact of weather on extreme irregularities. In the next subsection, we investigate the performance of the LSTM model with and without the weather data.
5.1 Signicance oftheweather data
We investigated the impact of the weather data on the accuracy of our prediction models. When we ran our models with the weather data features (i.e., maximum temperature, minimum temperature, visibility, and weather conditions), we noticed a significant improvement in the results (see Table 6 for Route 28 and Table 7 for Route 8).
Figure7 compares the actual arrival time versus predicted
arrival time with and without using weather data for Route
28. The x-axis shows the ordered observations of bus arriv-
als at stops. As mentioned previously, we augment the bus
trip travelling on a direction with the next trip for the same
bus travelling the opposite direction. This means the x-axis
depicts the arrival of the bus at the first stop, followed by
its arrival at the next stop. When the bus arrives at the last
stop, it returns back on the same route. The next observation
after the last stop would be next arrival of the same bus at
the stop before the last stop. The y-axis is time in seconds.
It can be observed from the plot that the model created with
the weather data has better accuracy than the model that was
created without the weather data. In particular, we notice
that the model that was created using weather data was able
to capture extreme delays and early arrivals better than the
model that was created without the weather data. We notice
similar trend for Route 8 (see Fig.8).
Table 5 Comparison of different models for Route 8
Bold values indicate the final values used in the experiments

        Historical average  ARIMA   SVR     ANN     LSTM
MAPE    0.92                0.84    0.76    0.49    0.36
RMSE    292.38              286.64  279.01  278.69  269.49

Table 6 Comparison of models with and without weather data for Route 28
Bold values indicate the final values used in the experiments

        LSTM without weather data   LSTM with weather data
MAPE    0.21                        0.13
RMSE    427.02                      422.2

Table 7 Comparison of models with and without weather data for Route 8
Bold values indicate the final values used in the experiments

        LSTM without weather data   LSTM with weather data
MAPE    0.43                        0.36
RMSE    279.11                      269.49

Fig. 7 Model performance of the LSTM model with and without weather data on Route 28

Furthermore, we compared the results of the LSTM models for different portions of the data. We observed that for 16% of the data, the model trained with the weather data has much higher prediction accuracy than the model trained without it (see Table 8). The model accuracy improves with the weather data by 310% when we compare RMSE and by 282% when we compare MAPE. Table 8 clearly demonstrates that weather has a significant impact on the prediction accuracy for nearly half of the data (49%). We observed similar results for Route 8,
where weather had a higher impact (for nearly half of the data, the model accuracy improved by more than 150%, as shown in Table 9). The impact of weather decreases as we see more data points because additional factors may also contribute to bus arrival prediction, suggesting that weather has a complex non-linear relationship with bus arrival times. Examples of these factors are traffic conditions, construction zones, emergency vehicles, and the number of passengers, which we plan to explore in future work. We mitigate this issue by modelling weather and arrival times in a separate RNN model, as explained at the end of this section.
To further investigate how much impact each individual weather feature has on the model, we created three LSTM models, each removing one weather feature while keeping the others.
Fig. 8 Model performance of LSTM Model with and without weather data on Route 8
Table 8 Difference in RMSE and MAPE with and without the weather data for Route 28
Bold values indicate the final values used in the experiments

         RMSE                            MAPE
% Data   Weather   No weather   %       Weather   No weather   %
16%      10.75     44.12        310%    9         34.4         282%
33%      20.94     42.74        104%    20.3      39.5         95%
49%      34.27     47.56        39%     34.1      50.5         48%
66%      55.63     62.12        11%     47        63.2         34%
82%      116.13    117.72       1.3%    61.58     72.97        18%
Table 9 Difference in RMSE and MAPE with and without the weather data for Route 8
Bold values indicate the final values used in the experiments

         RMSE                               MAPE
% Data   Weather   No weather   %RMSE      Weather   No weather   %MAPE
16%      4.85      36.91        661%       4.5       34.2         660%
33%      9.25      35.86        288%       9.7       37           281%
49%      14.32     36.66        156%       16.36     41.5         154%
66%      22.98     37.49        63%        21.4      48.2         100%
82%      30.41     41.04        35%        39.47     62.08        57%
Table 10 Comparison of models with different weather features removed for Route 28
Bold values indicate the final values used in the experiments

        Without visibility   Without weather conditions   Without temperature   All weather features
MAPE    0.17                 0.15                         0.18                  0.13
RMSE    424.76               423.65                       425.08                422.2
O.Alam et al.
1 3
The first model removes visibility, the second removes weather conditions (rain, snow, haze, fog), and the third removes temperature. Table 10 compares the resulting LSTM models for Route 28. The MAPE value increases from 0.13 to 0.17 when we remove the visibility feature. Similarly, when we keep all the other features except the weather conditions, the MAPE value increases to 0.15. Removing temperature increases the MAPE value to 0.18. Similar observations were made for Route 8 (see Table 11). These results suggest that all of the weather features we use in our models are important for achieving better prediction accuracy.
5.2 Multi‑stop forecasting models
Apart from comparing different machine learning models, we also compared the accuracy of the LSTM model in predicting irregularities for multiple future stops in a trip (i.e., predicting the delays/early arrivals for the arrivals of the bus after its immediate next scheduled arrival).
We created four different models: s+1, s+2, s+3, and s+4. The first model was discussed throughout the paper and predicts one stop ahead in time (i.e., given the historical arrival times and weather data for stop s, it predicts the irregularities for the next scheduled bus arrival at the next stop, s+1). The second model predicts the irregularities for the bus arrival at stop s+2. Similarly, the third and fourth models predict irregularities for the bus arrivals at stops s+3 and s+4, respectively. Figures 9 and 10 show the comparison between the MAPE errors when predicting irregularities for multiple stops with and without the weather data.
It is clear from Figs. 9 and 10 that model performance decreases as we predict multiple future stops ahead in time. This is similar to the findings of Duan et al. (2016), Hua et al. (2018), and Kormáksson et al. (2014). However, we found that when the weather data was excluded (the dotted lines), the rate of decrease in prediction accuracy grows as we predict more future stops. This suggests that weather plays a significant role when predicting arrival times, or their irregularities, for multiple future stops.
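Constructing the training pairs for the s+k models described above can be sketched as follows (an illustrative helper, not the authors' code): each stop's feature row is paired with the irregularity observed k stops ahead.

```python
def make_multistop_dataset(feature_rows, irregularities, k):
    """Pair each stop's feature row with the delay/early-arrival value
    observed k stops ahead, the prediction target of the s+k model."""
    inputs = feature_rows[:len(feature_rows) - k]
    targets = irregularities[k:]
    return inputs, targets
```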
5.3 Modelling the weather features with an RNN model

Since the previous experiments clearly established that weather has a significant influence on the prediction results, we decided to use the weather features in a separate RNN model and combine its result with that of the LSTM model (which also included the weather features, as discussed previously). The final prediction is the average of the two models. The architecture of the RNN model was discussed in Sect. 4. Our motivation was to investigate whether we can improve the prediction accuracy by creating a model dedicated to the weather. We trained and tested the RNN model with different hyperparameters, and finally tuned them as follows:

learning rate = 0.001
training epochs = 300
batch size = 32
display step = 1
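The combination step is a plain average of the two models' outputs; a minimal sketch (illustrative only, with predictions given as arrays of schedule deviations in seconds):

```python
import numpy as np

def hybrid_predict(lstm_pred, weather_rnn_pred):
    """Final prediction = average of the LSTM model's output and the
    weather-focused RNN model's output for the same stops."""
    return (np.asarray(lstm_pred, float) + np.asarray(weather_rnn_pred, float)) / 2.0
```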
Table12 compares the performance of this model with the
LSTM model for route 28. The RMSE of our new hybrid
model showed improvement of 562.38% over the LSTM
model for route 28 for 82% of the data. For route 8, the
improvement was 873.85% as shown in Table13. We also
noticed that the accuracy does not decrease when we add
more data to the model, contrast to the findings in Sect.5.1.
This could be because the RNN model focuses on the
weather features, while the LSTM model includes other
Table 11 Comparison of models with different weather features removed for Route 8

        Without visibility   Without weather conditions   Without temperature   All weather features
MAPE    0.42                 0.40                         0.42                  0.36
RMSE    278.30               278.43                       281.51                269.49
Fig. 9 Prediction accuracy with and without weather features for multiple stops for Route 28
Fig. 10 Prediction accuracy with and without weather features for
multiple stops for Route 8
features along with the weather. In other words, for a small portion of the data, weather conditions played a significant role in improving the prediction results of the LSTM model. However, when a separate RNN model is used for the weather, its contribution to accuracy extends to larger segments of the data.
6 Conclusion
Nowadays, complex machine learning algorithms can be applied quickly over large datasets, thanks to advances in the area of big data analytics. This paper investigates different prediction models for irregularities in bus arrival times using machine learning algorithms. In particular, we built Long Short-Term Memory Recurrent Neural Network models to predict the next arrival time for a bus at a particular stop. Our prediction models use historical bus arrival data, i.e. real-time GPS locations for Toronto transit buses, bus schedules obtained from a Google API, and weather condition data obtained from a weather station in Toronto. Our analysis shows that Toronto transit buses experience significant irregularities in arrival times: in nearly 37% of cases, transit buses are either delayed or arrive early by more than 5 min, showing great room for improvement. To our knowledge, this is the first work to investigate the impact of weather on bus arrival prediction. We found that weather plays a significant role in improving prediction accuracy. Therefore, we built a prediction model that combines two machine learning models: an LSTM model that uses a range of input features, e.g. arrival times and hour of the day, and an RNN model that focuses on the weather features. We also investigated the prediction accuracy for multiple scheduled bus arrivals ahead in time using the weather data. In the future, we plan to collect more data in order to run our experiments over the entire year. Our current study covers the winter season and the beginning of the spring season in Toronto; we plan to extend it to cover all seasons. In addition, we plan to extend our work on bus arrival prediction by using machine learning algorithms with additional datasets, such as passenger counts and traffic conditions. Furthermore, we plan to use different RNN extensions, such as the Gated Recurrent Unit (GRU) (Cho et al. 2014; Che et al. 2016).
References
Balasubramanian P, Rao KR (2015) An adaptive long-term bus arrival time prediction model with cyclic variations. J Public Transport 18:1–18. https://doi.org/10.5038/2375-0901.18.1.6
Basyir M, Nasir M, Suryati S, Mellyssa W (2017) Determination of nearest emergency service office using haversine formula based on android platform. EMITTER Int J Eng Technol 5(2):270–278
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, ACM, New York, NY, USA, COLT '92, pp 144–152. https://doi.org/10.1145/130385.130401
Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci Model Dev 7(3):1247–1250. https://doi.org/10.5194/gmd-7-1247-2014
Chang H, Park D, Lee S, Lee H, Baek S (2010) Dynamic multi-interval bus travel time prediction using bus transit data. Transportmetrica 6(1):19–38
Che Z, Purushotham S, Cho K, Sontag D, Liu Y (2016) Recurrent neural networks for multivariate time series with missing values. Sci Rep. https://doi.org/10.1038/s41598-018-24271-9
Chen M, Liu X, Xia J, Chien SIJ (2004) A dynamic bus-arrival time prediction model based on APC data. Comput Aided Civ Infrastruct Eng 19:364–376. https://doi.org/10.1111/j.1467-8667.2004.00363.x
Chien SIJ, Ding Y, Wei C (2002) Dynamic bus arrival time prediction with artificial neural networks. J Transport Eng 128(5):429–438. https://doi.org/10.1061/(ASCE)0733-947X(2002)128:5(429)
Cho K, van Merriënboer B, Gülçehre Ç, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, pp 1724–1734. http://www.aclweb.org/anthology/D14-1179
Chopde NR, Nichat MK (2013) Landmark based shortest path detection by using A* and haversine formula. Int J Innov Res Comput Commun Eng 1(2):298–302
Drucker H, Burges CJC, Kaufman L, Smola AJ, Vapnik V (1997) Support vector regression machines. In: Mozer MC, Jordan MI, Petsche T (eds) Advances in neural information processing
Table 12 Difference in RMSE between the LSTM model and our LSTM+RNN (weather) model for Route 28
Bold values indicate the final values used in the experiments

% Data   LSTM+RNN RMSE   LSTM RMSE   %
49%      12.97           34.27       264.23%
66%      18.55           55.63       299.90%
82%      20.65           116.13      562.38%

Table 13 Difference in RMSE between the LSTM model and our LSTM+RNN (weather) model for Route 8
Bold values indicate the final values used in the experiments

% Data   LSTM+RNN RMSE   LSTM RMSE   %
49%      7.28            14.32       196.71%
66%      5.84            22.98       393.50%
82%      3.48            30.41       873.85%
systems 9, MIT Press, Cambridge, pp 155–161. http://papers.nips.cc/paper/1238-support-vector-regression-machines.pdf
Duan Y, Lv Y, Wang FY (2016) Travel time prediction with LSTM neural network. In: 2016 IEEE 19th international conference on intelligent transportation systems (ITSC), pp 1053–1058
Fu L, Yang X (2002) Design and implementation of bus-holding control strategies with real-time information. Transp Res Rec 1791(1):6–12
Fu R, Zhang Z, Li L (2016) Using LSTM and GRU neural network methods for traffic flow prediction, pp 324–328. https://doi.org/10.1109/YAC.2016.7804912
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. The MIT Press, Cambridge
Hua X, Wang W, Wang Y, Ren M (2018) Bus arrival time prediction using mixed multi-route arrival time data at previous stop. Transport 33(2):543–554
Ingole P, Nichat MMK (2013) Landmark based shortest path detection by using Dijkstra algorithm and haversine formula. Int J Eng Res Appl (IJERA) 3(3):162–165
Jeong R, Rilett R (2004) Bus arrival time prediction using artificial neural network model. In: Proceedings of the 7th international IEEE conference on intelligent transportation systems (IEEE Cat. No.04TH8749), pp 988–993. https://doi.org/10.1109/ITSC.2004.1399041
Kalman RE (1960) A new approach to linear filtering and prediction problems. Trans ASME J Basic Eng 82(Series D):35–45
Ke J, Zheng H, Yang HXC (2017) Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach. Transport Res Part C Emerg Technol. https://doi.org/10.1016/j.trc.2017.10.016
Kormáksson M, Barbosa L, Vieira MR, Zadrozny B (2014) Bus travel time predictions using additive models. In: 2014 IEEE international conference on data mining, pp 875–880. https://doi.org/10.1109/ICDM.2014.107
Kumar V, Kumar BA, Vanajakshi L, Subramanian SC (2014) Comparison of model based and machine learning approaches for bus arrival time prediction. Transportation Research Board 93rd annual meeting. http://docs.trb.org/prp/14-2518.pdf
Liu L, Chen RC, Zhao Q, Zhu S (2019) Applying a multistage of input feature combination to random forest for improving MRT passenger flow prediction. J Ambient Intell Hum Comput 10(11):4515–4532
Liu Z, Yan Y, Qu X, Zhang Y (2013) Bus stop-skipping scheme with random travel time. Transport Res Part C Emerg Technol 35:46–56. https://doi.org/10.1016/j.trc.2013.06.004
Nextbus. Nextbus public feed. https://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf. Accessed 2020
Patnaik J, Chien S, Bladikas A (2004) Estimation of bus arrival times using APC data. J Public Transp 7(1):1
Shalaby A, Farhan A (2004) Prediction model of bus arrival and departure times using AVL and APC data. J Public Transport 7(1):41–61. https://doi.org/10.5038/2375-0901.7.1.3
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958. http://dl.acm.org/citation.cfm?id=2627435.2670313
Star TT (2020) TTC gives notes for affected customers arriving late for work. https://www.thestar.com/news/gta/2017/12/01/late-for-work-the-ttc-can-give-you-a-note-for-that.html. Accessed 2020
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27, Curran Associates, Inc., pp 3104–3112. http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison-Wesley Longman Publishing Co., Inc., Boston
Veness C (2018) Movable type scripts: calculate distance, bearing and more between latitude/longitude points. https://www.movable-type.co.uk/scripts/latlong.html
Wang B, Huang J, Xu J (2019) Capacity optimization and allocation of an urban rail transit network based on multi-source data. J Ambient Intell Hum Comput 10(1):373–383
Wang J, Chen X, Guo S (2009) Bus travel time prediction model with v-support vector regression. In: 2009 12th international IEEE conference on intelligent transportation systems, pp 1–6
Xu J, Wu Y, Jia L, Qin Y (2020) A reckoning algorithm for the prediction of arriving passengers for subway station networks. J Ambient Intell Hum Comput 11(2):845–864
Yang M, Chen C, Wang L, Yan X, Zhou L (2016) Bus arrival time prediction using support vector machine with genetic algorithm. Neural Netw World 26:205–217. https://doi.org/10.14311/NNW.2016.26.011
Zhang P, Qi M (2005) Neural network forecasting for seasonal and trend time series. Eur J Oper Res 160:501–514. https://doi.org/10.1016/j.ejor.2003.08.037
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
... In terms of methodologies, with the increasing availability and complexity of data, more complex nonlinear models from conventional machine learning (Huang et al. 2021;Yu et al. 2018;Chen 2018), deep learning (Jin et al. 2022He et al. 2018;Petersen et al. 2019), and hybrid methods (Alam et al. 2021;Xie et al. 2021) have become increasingly popular. Compared to probabilistic and statistical models (Hans et al. 2015;Wepulanon et al. 2018), the above models provide more accurate predictions. ...
... Among these variables, the number of stops, distance, scheduled departure time, route identifier, week number, area type, and day of the week are ranked as the most important. Taking all of these factors into account is important to enhance the accuracy of bus state prediction (Alam et al. 2021). ...
Article
Full-text available
To effectively manage and control public transport operations, understanding the various factors that impact bus arrival delays is crucial. However, limited research has focused on a comprehensive analysis of bus delay factors, often relying on single-step delay prediction models that are unable to account for the heterogeneous impacts of spatiotemporal factors along the bus route. To analyze the heterogeneous impact of bus arrival delay factors, the paper proposes a set of regression equations conditional on the bus location. A seemingly unrelated regression equation (SURE) model is developed to estimate the regression coefficients, accounting for potential correlations between regression residuals caused by shared unobserved factors among equations. The model is validated using bus operations data from Stockholm, Sweden. The results highlight the importance of developing stop-specific bus arrival delay models to understand the heterogeneous impact of explanatory variables. The significant factors impacting bus arrival delays are primarily associated with bus operations, such as delays at consecutive upstream stops, dwell time, scheduled travel time, recurrent congestion, and current traffic conditions. Factors like the calendar and weather have significant but marginal impacts on arrival delays. The study suggests that different bus operating management strategies, such as schedule adjustments, route optimization, and real-time monitoring and control, should be tailored to the characteristics of stop sections since the impacts of these factors vary depending on the stop location.
... Larsen et al. [34] employed an NN to predict the travel times of buses using open real-time data derived from the Sao Paulo City bus fleet location, real-time traffic data, and traffic forecast from Google Maps. Alam et al. [35] used a Recurrent NN (RNN) architecture to predict the ETA irregularities by exploring live AVL data from buses, provided by the Toronto Transit Commission, along with schedules retrieved from GTFS and weather data. Chondrodima et al. [36] addressed the challenge of predicting public transport ETA using General Transit Feed Specification (GTFS) data. ...
Article
Full-text available
In maritime logistics, accurately predicting the Estimated Time of Arrival (ETA) of vessels is pivotal for optimizing port operations and the global supply chain. This study proposes a machine learning method for predicting ETA, drawing on historical Automatic Identification System (AIS) data spanning 2018 to 2020. The proposed framework includes a preprocessing module for extracting, transforming, and applying feature engineering to raw AIS data, alongside a modeling module that employs an XGBoost model to accurately estimate vessel travel times. The framework's efficacy was validated using AIS data from the Port of Houston, and the results indicate that the model can estimate travel times with a Mean Absolute Percentage Error (MAPE) of just 5%. Moreover, the model retains consistent accuracy in a simplified form, pointing towards the potential for reduced complexity and increased generalizability in maritime ETA predictions.
... The general conclusion of the above-outlined studies is that the use of machine learning schemes (remarkably ANNs and SVMs) noticeably improves the arrival time estimation of buses when compared to using the (simpler) average travel time between bus stops. Nevertheless, the proposed schemes usually employ either only historical pass-time data or real-time GPS information (Alam et al. 2021;Taparia and Brady 2021). It is important to remark again that the above-presented procedures have been developed mainly from the perspective of the transportation service provider which usually has access to all the available information concerning the service. ...
... Previous studies have utilized the delivery of forecasted results through notifications in various domains. For example, messages sent via notifications to feed animals and manage dairy cows [22], notifying farmers about preventive actions and disinfectant spraying in agriculture [32], weather forecasting notifications [25,32,67,68], malaria notifications [30,31,69,70], honeybee activity with an alarm [71], bus delay notifications [72], and air quality notifications [73][74][75] as predictive information or utilize notifications to collect preliminary data on air quality [76]. However, most researchers primarily focused on consent to forecasting, considering notifications merely as information regarding the forecasted results. ...
Article
Full-text available
In this paper, we propose a notification optimization method that provides multiple alternative times as reminders for a forecasted activity, with and without probabilistic considerations for the activity that needs to be completed and notified. It is important to consider various factors when sending notifications to people after obtaining the forecasted activity. We should not send notifications only when we have forecasted results, because future daily activities are unpredictable. Therefore, it is important to strike a balance between providing useful reminders and avoiding excessive interruptions, especially for forecasted activities with low probability. Our study investigates the impact of low-probability forecasted activities and optimizes the notification time with reinforcement learning. We also show the gaps between forecasted activities, which people can use for self-improvement in balancing important tasks, such as tasks completed as planned and additional tasks to be completed. For evaluation, we utilize two datasets: an existing dataset and data we collected in the field with the technology we have developed. The data collection covers 23 activities from six participants. To evaluate the effectiveness of these approaches, we assess the percentage of positive responses, the user response rate, and the response duration as performance criteria. Our proposed method provides a more effective way to optimize notifications: by incorporating the probability that an activity needs to be done and notified into the state, we achieve a better response rate than the baseline, with an advantage reaching 27.15%, and the other criteria are also improved by using probability.
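The reinforcement-learning formulation in the abstract above is richer than a bandit, but the core idea of learning which alternative notification time draws the best response can be sketched as an epsilon-greedy bandit; the candidate times and response probabilities below are invented for illustration:

```python
import random

# Epsilon-greedy bandit over candidate notification times. This is a
# simplified stand-in for the paper's RL method; TIMES and the response
# probabilities are invented for illustration only.
TIMES = ["-30min", "-15min", "at_event"]
TRUE_RESPONSE_PROB = {"-30min": 0.3, "-15min": 0.6, "at_event": 0.4}

def run_bandit(steps=10000, epsilon=0.1, seed=7):
    rng = random.Random(seed)
    counts = {t: 0 for t in TIMES}
    values = {t: 0.0 for t in TIMES}   # estimated response rate per time
    for _ in range(steps):
        if rng.random() < epsilon:
            chosen = rng.choice(TIMES)                    # explore
        else:
            chosen = max(TIMES, key=lambda t: values[t])  # exploit
        reward = 1.0 if rng.random() < TRUE_RESPONSE_PROB[chosen] else 0.0
        counts[chosen] += 1
        values[chosen] += (reward - values[chosen]) / counts[chosen]
    return max(TIMES, key=lambda t: values[t])

print(run_bandit())
```

After enough simulated interactions, the bandit settles on the time slot with the highest underlying response probability.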
... Since we could not find practical applications of machine learning in the same context as our study, we analyzed the use of ML approaches to solve similar problems in order to select the most suitable approach for our model. In this sense, the study of Alam et al. [34] tested the use of an Artificial Neural Network (ANN) to construct a flexible environment in transit buses, but their research did not address the classification of events and prediction of arrival behaviour. Gradient Boosting Machines have also been implemented to predict the estimated time of arrival in the aircraft industry, but the performance of the method was considered less efficient than traditional prediction methods in the study of [35]. ...
Conference Paper
In public transit (PT) planning, passenger demand forecasting is an important process to periodically update operation management and planning infrastructure in the future. In the past, many researchers considered passenger demand forecasting a fundamental need for transportation planning and developed forecasting models based on statistical methods and Artificial Intelligence (AI). To increase the precision of the model, spatial and temporal attributes that influence the passenger movement at the station level, corridor level, and network level, are need to be considered. Hence, in this study, a detailed literature review is carried out to understand the pros and cons of various methods used in passenger demand forecasting and how distinctively spatial and temporal attributes are used in the development of the models. External factors like weather and events are also considered by the researchers in the development of the model. In the end, what are the challenges in the PT passenger demand forecasting are discussed and directions for future research are given.
Article
Full-text available
In today’s fast-paced world, efficient and reliable public transportation systems are crucial for optimising time and reducing carbon dioxide emissions. However, developing countries face numerous challenges in their public transportation networks, including infrequent services, delays, inaccurate and unreliable arrival times, long waiting times, and limited real-time information available to users. GPS-based systems have been widely used for fleet management, but they can represent a significant infrastructure investment for smaller operators in developing countries. The accuracy of the GPS location can easily be affected by weather conditions, and GPS signals are susceptible to spoofing attacks. When the GPS device is faulty, the entire location trace becomes unavailable. This paper proposes the use of Internet-of-Things (IoT)-enabled Bluetooth Low Energy (BLE) systems as an alternative approach to fleet tracking for public bus services. The proposed approach offers simplicity and easy implementation for bus operators by deploying BLE proximity beacons on buses to track their journeys, with detection devices using Raspberry Pi (RPi) Zero strategically placed at terminals and selected stops. When a bus approaches and stops at a bus stop, the BLE advertisements emitted by the proximity beacons can be reliably detected by the RPi Zero. Experiment results show that the BLE signals can be detected at up to 20 m in range when the RPi Zero is placed inside a metal enclosure. The location of the bus is then sent to the cloud to estimate arrival times. A field trial of the proposed IoT-based BLE proximity sensing system involving two public bus services in the southern Malaysian cities of Johor Bahru, Iskandar Puteri and Kulai is presented. Based on the data collected, a bus arrival time estimation algorithm is designed. Our analysis shows that there was a 5–10 min reduction in journey time on public holidays as compared to a normal day.
Overall, the paper emphasises the importance of addressing public transportation challenges. It also describes the challenges, experience, and mitigation drawn from the deployment of this real-world use case, demonstrating the feasibility and reliability of IoT-based proximity sensing as an alternative approach to tracking public bus services.
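The arrival time estimation step can be illustrated with a minimal sketch (not the authors' algorithm): given the timestamp at which a bus's beacon was last detected, add historical mean segment travel times until the target stop is reached. The stop names and segment times below are hypothetical:

```python
# Illustrative sketch: estimate arrival time at a downstream stop from
# the timestamp of the last BLE beacon detection, using historical mean
# segment travel times. Stop names and times are hypothetical.
MEAN_SEGMENT_SECONDS = {("A", "B"): 240, ("B", "C"): 300, ("C", "D"): 180}

def estimate_arrival(last_detected_stop, detected_at, target_stop, route):
    """Add mean segment times from the last detected stop to the target."""
    eta = detected_at
    i = route.index(last_detected_stop)
    for prev, nxt in zip(route[i:], route[i + 1:]):
        eta += MEAN_SEGMENT_SECONDS[(prev, nxt)]
        if nxt == target_stop:
            return eta
    raise ValueError("target stop is not downstream of the detection")

route = ["A", "B", "C", "D"]
print(estimate_arrival("B", 1000, "D", route))  # 1000 + 300 + 180 → 1480
```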
Article
Accurate forecasting of bus travel time and its uncertainty is critical to the service quality and operation of transit systems: it can help passengers make informed decisions on departure time, route choice, and even transport mode choice, and it also supports transit operators on tasks such as crew/vehicle scheduling and timetabling. However, most existing approaches to bus travel time forecasting are based on deterministic models that provide only point estimates. To this end, we develop in this paper a Bayesian probabilistic model for forecasting bus travel time and estimated time of arrival (ETA). To characterize the strong dependencies/interactions between consecutive buses, we concatenate the link travel time vectors and the headway vector from a pair of two adjacent buses into a new augmented variable and model it with a mixture of constrained multivariate Gaussian distributions. This approach can naturally capture the interactions between adjacent buses (e.g., correlated speed and smooth variation of headway), handle missing values in data, and depict the multimodality in bus travel time distributions. Next, we assume different periods in a day share the same set of Gaussian components, and we use time-varying mixing coefficients to characterize the systematic temporal variations in bus operation. For model inference, we develop an efficient Markov chain Monte Carlo (MCMC) algorithm to obtain the posterior distributions of model parameters and make probabilistic forecasts. We test the proposed model using the data from two bus lines in Guangzhou, China. Results show that our approach significantly outperforms baseline models that overlook bus-to-bus interactions, in terms of both predictive means and distributions.
Besides forecasting, the parameters of the proposed model contain rich information for understanding/improving the bus service, for example, analyzing link travel time and headway correlation using covariance matrices and understanding time-varying patterns of bus fleet operation from the mixing coefficients. Funding: This research is supported in part by the Fonds de Recherche du Quebec-Societe et Culture (FRQSC) under the NSFC-FRQSC Research Program on Smart Cities and Big Data, the Canadian Statistical Sciences Institute (CANSSI) Collaborative Research Teams grants, and the Natural Sciences and Engineering Research Council (NSERC) of Canada. X. Chen acknowledges funding support from the China Scholarship Council (CSC). Supplemental Material: The e-companion is available at https://doi.org/10.1287/trsc.2022.0214 .
Article
Full-text available
The Emergency Reporting Application is an Android-based application that helps the community report emergency conditions. It allows users to choose and contact an emergency services office without having to supply their position and phone number. The emergency services office is selected automatically by the system, taking into account the distance between the complainant and each office: the selected office is the nearest one to the complainant, so that the delay before help arrives can be minimized. The application therefore requires GPS for recording and reporting positions, and SMS for delivering report messages. The distance between the position of the complainant and the position of each emergency services office, given as latitude and longitude, is computed using the Haversine formula, which takes the curvature of the earth into account. The emergency services offices include police offices and hospitals spread over 25 different districts. The complainant's position is compared against all emergency services offices to obtain the single nearest office. System testing measures the accuracy and delay of the system: the accuracy of the Haversine method was 100%, and the average system delay was 4.5 seconds.
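The Haversine computation described above can be sketched as follows; the office coordinates are hypothetical, and the nearest office is chosen by minimizing the great-circle distance:

```python
import math

# Haversine great-circle distance, as used to select the nearest
# emergency services office from latitude/longitude pairs.
EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical offices; pick the one nearest to the reporter.
offices = {"office_1": (-7.98, 112.63), "office_2": (-7.95, 112.61)}
reporter = (-7.96, 112.62)
nearest = min(offices, key=lambda k: haversine_km(*reporter, *offices[k]))
print(nearest)  # → office_2
```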
Article
Full-text available
Knowing the volume of arriving passengers (APs) is fundamental for optimizing their paths through subway stations and evacuating them under emergency conditions. To predict AP volume online, we first analyze arrival and departure parameters and discuss the relationships among various parameters to determine the train a passenger will most likely take. Interconnecting stations and transfer paths among stations are considered direct connections and direct transfer connections, respectively, to define and construct traveling route sets. Then, travel time chains (TTCs) of transfer and nontransfer passengers are constructed to illustrate the possible routes and time costs between the origin and destination (O/D) stations of passengers. Furthermore, based on TTCs, train capacities and the inbound and outbound times of passengers accessed from an automated fare collection system, we predict the AP volumes at specified stations using a stage-by-stage reckoning algorithm in real time. Finally, to validate the model and the algorithm, we estimate the AP volume for the Beijing Subway network.
Article
Full-text available
As one of the main public transport systems all over the world, mass rapid transit (MRT) is widely used in metropolitan areas. To meet increasing travel demand in the future, accurately predicting MRT passenger flow is becoming more and more urgent and crucial. This paper uses an experimental approach to objectively quantify and analyze the impacts of various combinations of traditional input features on the accuracy of MRT passenger flow prediction. We have built a series of passenger flow prediction models with different input features using a random forest approach. The features of passenger flow direction, temporal date, national holiday, lunar calendar date, previous average hourly passenger flow, and previous k-step hourly passenger flow and their trends are selected and applied in a multi-stage combination of input features. Typical encoding strategies for the input features are further discussed and implemented. Finally, the optimal combination of input features is proposed with a case study at Taipei Main Station. The experimental results show that the proposed optimal combination of input features and their appropriate encodings can help improve the accuracy of passenger flow prediction, not only on weekdays and weekends but also on national holidays.
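One common encoding strategy for hourly features, a cyclic sin/cos code, can be sketched as below; this is a generic illustration, not necessarily the specific codes chosen in the study:

```python
import math

# Cyclic (sin/cos) encoding for an hourly feature, a common way to let
# models treat hour 23 and hour 0 as neighbours. Generic sketch, not the
# study's exact encoding.

def encode_hour(hour):
    """Map hour in [0, 24) to a point on the unit circle."""
    angle = 2 * math.pi * hour / 24
    return (math.sin(angle), math.cos(angle))

s23, c23 = encode_hour(23)
s0, c0 = encode_hour(0)
# Hours 23 and 0 are close in the encoded space, unlike raw integers.
print(round((s23 - s0) ** 2 + (c23 - c0) ** 2, 3))  # → 0.068
```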
Article
Full-text available
This study establishes a multi-objective optimization model for the capacity allocation of an urban rail transit network based on multi-source data on the Beijing metro passenger flow. The model considers the operating costs of trains and the expenses related to the waiting time of transferring passengers. The model constraints include the distribution characteristics of passenger flow, headway, load factor, and available trains. The capacity allocation scheme for 16 railway lines was obtained by applying the model to the Beijing rail transit network and its passenger flow. We also analyzed the cases in which the frequency of Line 4 is reduced from 55 to 45 and the frequency of Line 10 is reduced from 60 to 52 when a short turn is adopted. In addition, when the upper limit of the load factor increased from 60 to 70%, the operational costs were reduced by 4.6%, while the total passenger waiting time increased by 1%. The transfer costs changed the capacity optimization and allocation scheme, and the proportion of the transfer cost among the total costs increased as the time value increased.
Article
Full-text available
Short-term passenger demand forecasting is of great importance to on-demand ride service platforms, which can incentivize vacant cars to move from over-supplied regions to over-demanded regions. However, the spatial, temporal, and exogenous dependences need to be considered simultaneously, which makes short-term passenger demand forecasting challenging. We propose a novel deep learning (DL) approach, named the fusion convolutional long short-term memory network (FCL-Net), to address these three dependences within one end-to-end learning architecture. The model is stacked and fused from multiple convolutional long short-term memory (LSTM) layers, standard LSTM layers, and convolutional layers. The fusion of convolutional techniques and the LSTM network enables the proposed DL approach to better capture the spatio-temporal characteristics and correlations of explanatory variables. A tailored spatially aggregated random forest is employed to rank the importance of the explanatory variables, and the ranking is then used for feature selection. The proposed DL approach is applied to the short-term forecasting of passenger demand on an on-demand ride service platform in Hangzhou, China. Experimental results, validated on real-world data provided by DiDi Chuxing, show that the FCL-Net achieves better predictive performance than traditional approaches, including both classical time-series prediction models and neural-network-based algorithms (e.g., artificial neural network and LSTM). This paper is one of the first DL studies to forecast the short-term passenger demand of an on-demand ride service platform by examining the spatio-temporal correlations.
Article
Full-text available
The primary objective of this paper is to develop models to predict bus arrival time at a target stop using actual multi-route bus arrival time data from the previous stop as inputs. In order to mix and fully utilize the multi-route bus arrival time data, a weighted average travel time and three Forgetting Factor Functions (FFFs) – F1, F2 and F3 – are introduced. Based on different combinations of input variables, five prediction models are proposed. Three widely used algorithms, i.e. Support Vector Machine (SVM), Artificial Neural Network (ANN) and Linear Regression (LR), are tested to find the best one for arrival time prediction. Bus location data for 11 road segments from Yichun (China), covering 12 bus stops and 16 routes, are collected to evaluate the performance of the proposed approaches. The results show that the newly introduced parameter, the weighted average travel time, can significantly improve the prediction accuracy: the prediction errors are reduced by around 20%. The algorithm comparison demonstrates that the SVM and ANN outperform the LR. The FFFs also affect the prediction errors: F1 is more suitable for the ANN algorithm, while F3 is better for the SVM and LR algorithms. Besides, the virtual road concept in this paper can slightly improve the prediction accuracy and halve the time cost of the predicted arrival time calculation. First published online 02 May 2017
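The weighted average travel time idea can be sketched with a simple exponential forgetting factor; the paper's F1–F3 functions are not reproduced here, so the lambda**age weighting below is a stand-in assumption:

```python
# Weighted average travel time with an exponential forgetting factor.
# The paper's F1-F3 functions are not specified here; this sketch assumes
# a simple forgetting**age weighting as a stand-in.

def weighted_average_travel_time(observations, forgetting=0.9):
    """observations: list of (age_in_steps, travel_time_seconds);
    the newest observation has age 0 and gets the largest weight."""
    weights = [forgetting ** age for age, _ in observations]
    total = sum(w * t for w, (_, t) in zip(weights, observations))
    return total / sum(weights)

# Travel times over a segment observed by buses of several routes.
obs = [(0, 300.0), (1, 320.0), (2, 280.0)]
print(round(weighted_average_travel_time(obs), 2))  # → 300.66
```

With `forgetting=1.0` all observations are weighted equally, recovering the plain historical average.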
Conference Paper
Full-text available
Accurate and real-time traffic flow prediction is important in Intelligent Transportation Systems (ITS), especially for traffic control. Existing models such as ARMA and ARIMA are mainly linear models and cannot describe the stochastic and nonlinear nature of traffic flow. In recent years, deep-learning-based methods have been applied as novel alternatives for traffic flow prediction. However, which kind of deep neural network is the most appropriate model for traffic flow prediction remains unresolved. In this paper, we use Long Short Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural network (NN) methods to predict short-term traffic flow, and experiments demonstrate that Recurrent Neural Network (RNN) based deep learning methods such as LSTM and GRU perform better than the autoregressive integrated moving average (ARIMA) model. To the best of our knowledge, this is the first time that GRU has been applied to traffic flow prediction.
Conference Paper
Travel time is one of the key concerns of travelers before starting a trip and also an important indicator of traffic conditions. However, travel time acquisition is delayed and the pattern of travel time is usually irregular. In this paper, we explore a deep learning model, the LSTM neural network, for travel time prediction. Using the travel time data provided by Highways England, we construct 66 series-prediction LSTM neural networks, one for each of the 66 links in the data set. Through model training and validation, we obtain the optimal structure within the setting range for each link. Then we predict multi-step-ahead travel times for each link on the test set. Evaluation results show that the 1-step-ahead travel time prediction error is relatively small: the median of the mean relative errors for the 66 links in the experiments is 7.0% on the test set. Deep learning models that consider sequence relations are promising for traffic series data prediction.
Article
Accurate prediction of bus arrival time is of great significance for improving passenger satisfaction and bus attractiveness. This paper presents a prediction model for bus arrival time based on a Support Vector Machine with a genetic algorithm (GA-SVM). The characteristics of the time period, the length of the road, the weather, the bus speed and the rate of road usage are adopted as input vectors for the Support Vector Machine (SVM), and a genetic algorithm is used to search for the best parameters. Finally, the data from Bus No. 249 in Shenyang, China are used to validate the model. The experimental results show that the forecasting model is superior to the traditional SVM model and the Artificial Neural Network (ANN) model on the same data and achieves higher accuracy, verifying the feasibility of the model for predicting bus arrival time.
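A minimal sketch of the GA-SVM search loop, assuming two hyperparameters (C, gamma); the fitness function below is a stand-in quadratic objective, whereas in GA-SVM it would be the (negated) cross-validated prediction error of the trained SVM:

```python
import random

# Minimal genetic-algorithm (GA) search over two SVM-style hyperparameters
# (C, gamma), sketching the GA-SVM idea. The fitness is a stand-in
# quadratic objective, not a real SVM cross-validation score.

def fitness(c, gamma):
    # Stand-in objective: pretend the best setting is C=10, gamma=0.1.
    return -((c - 10.0) ** 2 + (gamma - 0.1) ** 2)

def evolve(generations=40, pop_size=20, seed=42):
    rng = random.Random(seed)
    pop = [(rng.uniform(0.1, 100.0), rng.uniform(0.001, 1.0))
           for _ in range(pop_size)]
    initial_best = max(fitness(*ind) for ind in pop)
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(*ind), reverse=True)
        parents = pop[: pop_size // 2]             # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            (c1, g1), (c2, g2) = rng.sample(parents, 2)
            c = 0.5 * (c1 + c2) + rng.gauss(0, 1.0)     # crossover + mutation
            g = 0.5 * (g1 + g2) + rng.gauss(0, 0.01)
            children.append((max(c, 0.001), max(g, 0.0001)))
        pop = parents + children                   # elitism: parents survive
    best = max(pop, key=lambda ind: fitness(*ind))
    return best, initial_best

(best_c, best_gamma), initial_best = evolve()
print(fitness(best_c, best_gamma) >= initial_best)  # elitism guarantees True
```

Because the best individuals are always carried over, the best fitness found never decreases across generations.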