
Predicting Irregularities in Arrival Times for Transit Buses with Recurrent Neural Networks Using GPS Coordinates and Weather Data

July 2021
Journal of Ambient Intelligence and Humanized Computing 12(2)


Authors:

Trent University


https://doi.org/10.1007/s12652-020-02507-9

ORIGINAL RESEARCH

Predicting irregularities inarrival timesfortransit buses withrecurrent

neural networks using GPS coordinates andweather data

OmarAlam1 · AnshumanKush1· AliEmami2· ParisaPouladzadeh3

Received: 26 March 2020 / Accepted: 27 August 2020

Abstract

Intelligent transportation systems (ITS) play an important role in the quality of life of citizens in any metropolitan city. Despite various policies and strategies incorporated to increase the reliability and quality of service, public transportation authorities continue to face criticism from commuters, largely due to irregularities in bus arrival times, most notably manifested in early or late arrivals. Due to these irregularities, commuters may miss important appointments, wait too long at the bus stop, or arrive late for work. Therefore, accurate prediction models are needed to build better customer service solutions for transit systems, e.g. building accurate mobile apps for trip planning or sending bus delay/cancel notifications. Prediction models will also help in developing better appointment scheduling systems for doctors, dentists, and other businesses to take into account transit bus delays for their clients. In this paper, we seek to predict the occurrence of arrival time irregularities by mining GPS coordinates of transit buses provided by the Toronto Transit Commission (TTC) along with hourly weather data and using this data in machine learning models that we have developed. In our study, we compared the performance of a Long Short-Term Memory Recurrent Neural Network (LSTM) model against four baseline models: an Artificial Neural Network (ANN), Support Vector Regression (SVR), Autoregressive Integrated Moving Average (ARIMA) and historical averages. We found that our LSTM model demonstrates the best prediction accuracy. The improved accuracy achieved by the LSTM model may be attributable to its ability to adjust and update the weights of neurons while accounting for long-term dependencies. In addition, we found that weather conditions play a significant role in improving the accuracy of our models. Therefore, we built a prediction model that combines an LSTM model with a Recurrent Neural Network (RNN) model that focuses on the weather condition. Our findings also reveal that in nearly 37% of scheduled arrival times, buses arrive either early or late by a margin of more than 5 min, suggesting room for improvement in the current strategies employed by transit authorities.

Keywords Intelligent transportation systems · ITS · Traffic flow · Neural networks · GPS locations · Weather conditions

1 Introduction

The importance of modeling and predicting bus arrival times for public transit has long been recognized (Kumar et al. 2014). Throughout the past decade, much work has been done to explore means of achieving faster and more reliable transit systems (Hua et al. 2018). However, public transit authorities continue to face criticism from commuters due to discrepancies between a vehicle's scheduled and actual arrival times. These irregularities naturally have a negative impact on commuters' daily lives. Commuters may miss medical appointments or school events, or arrive late for work. With the availability of large-scale pervasive data, e.g. GPS locations collected from buses, we believe that machine learning algorithms can help in predicting actual arrival times for public transit buses, and assist in strategies to overcome their discrepancies with scheduled times.

* Omar Alam, omaralam@trentu.ca

Anshuman Kush, anshumankush@trentu.ca

Ali Emami, ali.emami@mail.mcgill.ca

Parisa Pouladzadeh, Parisa.Pouladzadeh@flemingcollege.ca

1 Trent University, Peterborough, Canada

2 Mila/McGill University, Montreal, Canada

3 Fleming College, Peterborough, Canada

This paper aims at modelling the irregularities in arrival times for public transit buses using historical bus arrival times, stop locations, bus schedules, and weather data. Irregularities occur in one of two ways: leads (early arrival at a stop) and delays (late arrival at a stop). We focused on predicting irregularities for transit buses in the City of Toronto, where irregularities in bus arrival times are so common that the Toronto Transit Commission (TTC) issues notes for commuters who arrive late for work due to misleading scheduled times (Star 2020).

To reduce irregularities in arrival times, transit authorities incorporate a variety of strategies to bridge the gap between actual and scheduled arrival times of buses. Among these strategies, the holding control strategy is found to be the most effective to regulate bus operations (Fu and Yang 2002). This strategy seeks to address the phenomenon called bus headway, which is a large, accumulated arrival lead or delay at a bus stop that results from a succession of leads or delays that occurred at previous stops. By holding an early-arriving bus, a bus headway can be mitigated and service reliability can be improved (Fu and Yang 2002). Another strategy is stop-skipping, which is particularly useful when buses are running late and behind their schedule (Liu et al. 2013). Despite applying these strategies, transit services continue to face delays in their daily operations, which could be due to ongoing road construction, bus breakdowns, road accidents, or other day-to-day factors. Therefore, transit authorities seek to increase the quality of service by providing passengers with predicted arrival times at a bus stop using algorithms that exploit transit data (Hua et al. 2018). With computational power becoming cheaper and more easily accessible, it is increasingly feasible to use data-driven models for accurately predicting arrival times by leveraging a large volume of data. These prediction models can assist in developing intelligent trip planning apps, improved scheduling systems for doctors and other businesses, and improved urban planning strategies for city authorities.

In this paper, we propose a regression task to test the ability of machine learning algorithms to predict whether a bus at a given stop and time will be early, on time, or late based on the transit and weather data for the Toronto Transit Commission (TTC). The machine learning models that we experiment with include a traditional feed-forward artificial neural network (ANN) and a recurrent neural network (RNN) using long short-term memory (LSTM).

Our contributions can be summarized as follows:

– To our knowledge, this is the first work that investigates the impact of weather data on prediction accuracy for bus arrival times. We compare the prediction models with and without weather features. Previous work either avoided using weather data altogether, e.g. (Kumar et al. 2014), or did not find weather to be a useful feature for their prediction task (Patnaik et al. 2004).

– We used historical arrival times, weather data, and other input features for arrival time prediction for transit buses. We found that the LSTM model, a variant of Recurrent Neural Network that uses long-term dependencies, yields the best predictive performance.

– We found that weather has a strong relationship with arrival time prediction models. In nearly half of our data, including weather improved the prediction accuracy by 48%. We also found that including the weather data significantly improves the accuracy when predicting bus arrival times at multiple future stops in a trip.

– Because of the importance of weather, we built a separate RNN model that focuses on the weather feature and combined its result with the result of the LSTM model. This combined hybrid model improved the prediction by more than 500%.

The rest of the paper is organized as follows. The next section discusses the related work. Section 3 discusses the data collection. Section 4 discusses the machine learning models that we used. Section 5 discusses the results and Sect. 6 concludes the paper.

2 Related work

This section discusses related work on bus arrival prediction. In general, previous work used linear regression (LR) (Hua et al. 2018), non-parametric regression (NPR) (Chang et al. 2010; Balasubramanian and Rao 2015), or Kalman filters (KFT) (Shalaby and Farhan 2004).

Hua etal. (2018) use linear regression to predict bus

locations. Bus location data displays non-linear relation-

ships between its features. Therefore, data has to be con-

verted into a linear space to be used in conventional math-

ematical models such as linear regression. This requires a

signiﬁcant amount of data pre-processing and be in turn,

costly and time-consuming. Kormáksson et. al. (2014) use

additive models (non linear regression models) to predict

bus arrival times using General Transit Feed Speciﬁcation

(GTFS) data. GTFS data is standardized by Google, which

is used to provide schedules and geographic information to

Google Maps and other Google applications that show tran-

sit information. Regression models are easy to interpret and

fast to train. Shalaby and Farhan (2004) use very limited

AVL (Automatic Vehicle Location) and APC (Automatic

Passenger Counter) data on Kalmann Filters (KFT) to pre-

dict the arrival time for Toronto Transit Buses. Their data

size is small (only 5 days of vehicle locations). In our study,

we used 3.5 million data points which were collected over a

Predicting irregularities inarrival timesfortransit buses withrecurrent neural networks…

1 3

period of four months. We use large datasets for predicting

arrival times using machine learning algorithms. Wang et.

al. (Wang etal. 2019) applies a multi-objective optimization

technique to reduce the capacity allocation of subway sys-

tems based on diﬀerent factors, e.g. number of passengers,

headway, number of available trains. Their objective is to

reduce the passenger wait time in the subway system. Simi-

larly, passenger travel time was used to predict the arrival

time in subway stations a in study done for the city of Bei-

jing (Xu etal. 2020). We used several input features in our

prediction models, e.g. past arrival time, day of the week,

hour of the day, etc. Liu etal. (2019) studied the optimal

combination of diﬀerent input features in mass rapid transit

(MRT) systems. However, they did not consider the weather

in their study.

Kumar et al. (2014) compared Kalman filters (Kalman 1960) with an artificial neural network for bus arrival prediction in Chennai, India. A key finding of this experiment was that, with a large volume of data, artificial neural network models give better accuracy than mathematical models (linear regression and Kalman filters). Wang et al. (2009) use a Support Vector Machine (SVM) to model traffic conditions. They used bus arrival times and bus schedules as inputs to train their model. ANN and kernelized SVM have gained popularity for predicting travel time because of their ability to model complex and non-linear relationships among features (Chien et al. 2002; Kumar et al. 2014; Jeong and Rilett 2004). In Hua et al. (2018), the performance of linear regression, artificial neural network and support vector machine models was compared for prediction of bus arrivals at a single stop using data from multiple routes. Linear regression performed poorly due to non-linearity in the data, whereas the performance of ANN and SVM was quite competitive. These approaches did not use recurrent neural networks in their predictions. Our work uses LSTM recurrent neural networks. Moreover, we use weather data in our prediction, which had not been incorporated in any way in the previous approaches.

To our knowledge, there has not been an abundance of work that uses weather data for predicting arrival times for public transit buses. Yang et al. (2016) use a combination of genetic algorithms and support vector machines along with weather conditions to predict bus arrival time. They did not use historical arrival times and did not explore recurrent neural networks in their study. Chen et al. (2004) used weather conditions and automatic passenger counting data with an ANN for bus arrival prediction for New Jersey county. The previous two studies only relied on weather conditions (i.e. snow, rain, fog) in their models. We consider other weather attributes, such as visibility and temperature. Patnaik et al. (2004) used weather data as features for a bus arrival prediction model; however, their experiment failed to show improvement with weather data.

Ke et al. (2017) used a combination of CNN and LSTM recurrent neural networks along with weather data for forecasting short-term passenger demand for ride services. In contrast, we use weather data for a different problem, i.e. to predict arrival times of transit buses. Fu et al. (2016) compared the performance of a Gated Recurrent Unit (GRU) model and an LSTM model on yet another prediction task concerning traffic flow prediction.

3 Dataset collection

We used four datasets to build our models: (1) live Automatic Vehicle Location (AVL) data for Toronto Transit Commission (TTC) transit buses, collected every 20 s, (2) bus schedules and (3) bus stop locations, both retrieved from GTFS (General Transit Feed Specification) data, and (4) hourly weather data collected from a weather station near downtown Toronto. The AVL data comprises GPS locations for TTC buses. This data is publicly available through the NextBus API (Nextbus 2020). We collected more than 700,000 unique live GPS locations for transit buses on two routes, Route 28 and Route 8 (Fig. 1), in the City of Toronto over 3 months, from January 2018 to March 2018. Figure 2 presents an overview of our study. Table 1 summarizes the datasets that we used in our study. After collecting the four datasets, we calculated the arrival time for a bus at each bus stop in the studied routes. Then, we calculated the difference between the actual arrival time of a bus at a stop and its scheduled arrival time at that stop. Based on this difference, we determined if the bus had arrived early, on time, or was delayed. Then, we normalized the data from all four datasets and used them as inputs to our models.

Fig. 1 GPS locations mapped to bus stop location data for TTC routes. Markers are the GPS coordinates calculated for actual arrival time at each stop. The top map depicts Route 28, while the bottom map depicts Route 8

3.1 Estimating actual arrival time

The TTC data does not, in fact, specify whether a bus has arrived at a stop. We therefore estimate the actual arrival time of a bus at a stop from the distance between the GPS location of the bus and the bus stop location. This distance is calculated using the haversine formula (Veness 2018), a well-known formula for the great-circle distance between two points on the surface of the earth that has a wide range of applications, e.g. (Chopde and Nichat 2013; Basyir et al. 2017; Ingole and Nichat 2013). The formula gives the distance between two points on a sphere using their latitudes and longitudes while ignoring hills:

$$a = \sin^2(\Delta\varphi/2) + \cos\varphi_1 \cdot \cos\varphi_2 \cdot \sin^2(\Delta\lambda/2) \qquad (1)$$

$$c = 2 \cdot \operatorname{atan2}\left(\sqrt{a}, \sqrt{1-a}\right) \qquad (2)$$

$$d = R \cdot c \qquad (3)$$

In Eq. 1, $\varphi$ is latitude and $\lambda$ is longitude. In Eq. 2, c is the angular distance in radians. In Eq. 3, R is Earth's radius (mean radius = 6371 km) and d is the distance between two GPS locations in kilometres. Since the real-time GPS location data is collected every 20 s, we may miss the exact time when the bus actually arrives at the bus stop. Furthermore, during a 20 s window, the bus could arrive at a bus stop and start moving again. In that case, the recorded GPS location of the bus could be further away from the bus stop.

To mitigate these issues, we identify the GPS location where the distance between the bus and the bus stop is minimal. We then check whether the bus is close to the bus stop at that location, where closeness corresponds to the bus being within 100 m of the stop.
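As an illustration of Eqs. 1 to 3, the following is a minimal Python sketch of the haversine distance. The function name `haversine_km` and the choice of degrees as input units are assumptions made for this example and do not come from the paper; the clamp on `a` is an added guard against floating-point rounding.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius R from Eq. 3


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)

    # Eq. 1: square of half the chord length between the two points
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    a = min(1.0, a)  # guard against rounding slightly above 1 (assumption, not in the paper)

    # Eq. 2: angular distance in radians
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    # Eq. 3: distance along the sphere
    return EARTH_RADIUS_KM * c
```

Multiplying the result by 1000 gives metres, which is the unit needed for the 100 m closeness check.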

Fig. 2 An overview of bus arrival prediction

Table 1 Datasets used for our study

Dataset                        Data points
TTC real time data             700,000
GTFS bus stop schedule data    18,110
GTFS bus stop location data    24
Weather data                   3624

Algorithm 1: Calculating the Actual Arrival Time of a Bus

Input: GPSTime: reported time for the GPS location of the bus; ScheduledTime: scheduled arrival time of the bus at the stop

Output: ActualTime: actual arrival time of a bus at a stop

Let d = distance between the bus and the stop using the haversine distance equation
Let min = ∞
while GPSTime ≥ ScheduledTime − 25 minutes AND GPSTime ≤ ScheduledTime + 25 minutes do
    Calculate d
    if d ≤ min then
        min = d
        if d is within 100 m of the stop then
            ActualTime = GPSTime for the bus at distance d
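Algorithm 1 can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the authors' implementation: the function names, the `(timestamp, lat, lon)` tuple layout for GPS reports, and returning `None` when no report falls within 100 m are all choices made for this example.

```python
import math
from datetime import datetime, timedelta


def haversine_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    a = min(1.0, a)
    return 6371000.0 * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def estimate_actual_arrival(gps_reports, stop, scheduled,
                            window=timedelta(minutes=25), radius_m=100.0):
    """Return the timestamp of the closest GPS report to the stop, or None.

    gps_reports: iterable of (timestamp, lat, lon) tuples (hypothetical layout).
    stop: (lat, lon) of the bus stop.
    Only reports within +/- window of the scheduled time are considered, and
    the closest one counts as an arrival only if it lies within radius_m.
    """
    best_time, best_dist = None, math.inf
    for timestamp, lat, lon in gps_reports:
        if abs(timestamp - scheduled) > window:
            continue  # outside the 50 min window around the schedule
        d = haversine_m(lat, lon, stop[0], stop[1])
        if d <= best_dist:
            best_dist, best_time = d, timestamp
    return best_time if best_dist <= radius_m else None
```

Tracking the minimum first and applying the 100 m check once at the end gives the same result as checking inside the loop, since only the closest report can become the arrival time.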

Predicting irregularities inarrival timesfortransit buses withrecurrent neural networks…

1 3

Figure3 illustrates how we calculate the diﬀerence

between the actual and the scheduled arrival times of a

bus at a particular bus stop. Let

denote the GPS location

of the bus, t denote the time when the bus was at location

, and

denote the bus stop location. Since we capture

GPS locations for the bus every 20 s, we may encounter a

large number of GPS locations around a particular bus stop

during the scheduled arrival time for the bus.

To determine whether the bus arrives on time, we use

the GPS locations of the bus that are reported within a 50

min window from its scheduled arrival at the bus stop

(i.e. 25 min before and 25 min after the scheduled arrival

time). Then, we choose the closest bus location to the bus

stop within that time window, for example,

bt5

in Fig.3.

The next step is to check if that GPS location is within the

vicinity of the bus stop (i.e., we check if

bt5

is within 100

m distance from the bus stop). Algorithm1 summarizes

this process.

After estimating the actual arrival time of buses at a particular bus stop, we calculate the difference between the actual and scheduled arrival times. Equation 4 calculates the time difference between the scheduled arrival time and the actual arrival time.

In a similar way, Eq. 5 calculates the difference between the scheduled arrival time and the predicted arrival time. If the difference is less than zero, the bus arrived late, and if the difference is greater than zero, the bus arrived early.

After preprocessing the data, we conducted a preliminary analysis on the collected bus arrival data. We found that more than 37% of the time, the buses on these routes were either delayed by more than 5 min or arrived early by more than 5 min (see Fig. 4). In some cases the delay was more than 20 min. During the period of our study, the scheduled arrival times did not change, i.e., the schedules did not get updated by TTC. Therefore, we can consider that our models predict the arrival times. However, we used the two formulas in Eqs. (4) and (5) for prediction because we were interested in the delays and early arrivals.

$$\text{Difference}_{\text{actual}} = \text{ScheduledArrivalTime} - \text{ActualArrivalTime} \qquad (4)$$

$$\text{Difference}_{\text{predicted}} = \text{ScheduledArrivalTime} - \text{PredictedArrivalTime} \qquad (5)$$
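The sign convention of Eqs. 4 and 5 can be made concrete with a short Python sketch. The function names are hypothetical, and the 5 min tolerance used to label an arrival as "on time" is an assumption borrowed from the 5 min margin used in the preliminary analysis; the paper does not state it as a formal class boundary.

```python
from datetime import datetime


def schedule_difference_min(scheduled, arrival):
    """Eq. 4 / Eq. 5: scheduled minus (actual or predicted) arrival, in minutes.

    Negative values mean the bus was late, positive values mean it was early.
    """
    return (scheduled - arrival).total_seconds() / 60.0


def classify_arrival(difference_min, tolerance_min=5.0):
    """Label a difference as 'late', 'early' or 'on time' (tolerance is an assumption)."""
    if difference_min < -tolerance_min:
        return "late"
    if difference_min > tolerance_min:
        return "early"
    return "on time"
```

For example, a bus scheduled for 08:00 that arrives at 08:07 has a difference of -7 min and is labelled late under this tolerance.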

4 Machine learning models

This section discusses the machine learning models that we used for predicting the arrival time of buses on selected routes. In particular, we use regression models to estimate the amount of time by which a given bus deviates from its schedule. Given historical arrival times at a stop s, our models predict the next arrival time at stop s+1.

In our study, we use four baselines to which we compare our model's results: SVR, ANN, ARIMA and Historical Average.

1. Support Vector Regression (SVR): SVR (Drucker et al. 1997) is an extension of the basic support vector machines (SVM) (Boser et al. 1992). In linear regression models, the error rate is minimized, whereas in SVR models, the error is fit within a certain threshold. The model that emerges from SVR is the hyperplane that contains the maximum number of data points within that threshold.

2. Artificial Neural Network (ANN): An ANN is a network of interconnected neurons, inspired by studies of biological nervous systems (Zhang and Qi 2005; Tan et al. 2005). Neurons are simple information processing units. For time-series analysis, inputs to an ANN model are observations from previous time-steps and the output corresponds to the predicted observation at the next time-step (Zhang and Qi 2005). The information received from the input nodes is processed by hidden layer units along with appropriate activation functions to determine the output.

Fig. 3 Calculating actual arrival of bus at a bus stop

Fig. 4 Distribution of the difference between actual arrival time and scheduled arrival time; 20% of the buses are delayed more than 5 min and 17% of the buses arrive early by more than 5 min

3. ARIMA: ARIMA stands for Autoregressive Integrated

Moving Average models. ARIMA is a mature time

series prediction model based on statistics. For time

series data, ARIMA predicts future values of the data

entirely based on the previous data points in the series.

4. Historical average: Historical averages are the mean

arrival time for bus trips. Historical averages are used as

a common reference point to compare the performance

of diﬀerent machine learning models.
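To make the simplest of the four baselines concrete, the sketch below computes a historical-average predictor in Python. Grouping the history by stop and hour of day, the record layout, and the fallback value for unseen groups are all assumptions made for illustration; the paper only states that historical averages are mean arrival times for bus trips.

```python
from collections import defaultdict


def fit_historical_average(records):
    """Average the observed schedule difference per (stop_id, hour) group.

    records: iterable of (stop_id, hour, time_diff_min) tuples (hypothetical layout).
    Returns a dict mapping (stop_id, hour) to the mean time difference.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for stop_id, hour, time_diff in records:
        entry = totals[(stop_id, hour)]
        entry[0] += time_diff
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def predict_historical_average(model, stop_id, hour, default=0.0):
    """Predict the mean difference for a group, or `default` if it was never seen."""
    return model.get((stop_id, hour), default)
```

A learned model is only useful if it beats this kind of lookup on held-out trips, which is why the historical average serves as a common reference point.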

4.1 Long short-term memory (LSTM) recurrent neural networks

Figure 5 shows the structure of a single LSTM cell and how an LSTM recurrent neural network maintains long-term dependencies.

The LSTM architecture contains a series of connected cells. Each LSTM cell consists of a special unit called a memory block in the recurrent hidden layer. The memory blocks have connections that provide the information necessary to maintain the temporal state of the network. An LSTM cell has three gates: the input gate, the output gate and the forget gate. The input gate controls the flow of input information provided to the LSTM cell. The output gate controls the output flow of cell activations into the rest of the network. Unlike a conventional RNN, an LSTM recurrent neural network has a separate forget gate, which makes it more suitable for time-series analysis. The forget gate decides which information is relevant for the prediction task and removes irrelevant information. Together, these gates provide the overall memory function for LSTM recurrent neural networks.
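The interplay of the three gates can be illustrated with a scalar Python sketch of a single LSTM step, following the gate equations given in this section (including the connections from the cell state into the gates). Real LSTM layers use weight matrices and vectors; the scalar weights, the dictionary keys and the function names here are simplifications assumed for this example.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def zero_weights():
    """Hypothetical helper: all scalar weights and biases set to zero."""
    keys = ["xi", "hi", "ci", "bi", "xf", "hf", "cf", "bf",
            "xc", "hc", "bc", "xo", "ho", "co", "bo"]
    return {k: 0.0 for k in keys}


def lstm_step(x_t, h_prev, c_prev, w):
    """One scalar LSTM step; returns the new hidden output and cell state."""
    # Input gate: how much of the new candidate enters the cell state
    i_t = sigmoid(w["xi"] * x_t + w["hi"] * h_prev + w["ci"] * c_prev + w["bi"])
    # Forget gate: how much of the previous cell state is kept
    f_t = sigmoid(w["xf"] * x_t + w["hf"] * h_prev + w["cf"] * c_prev + w["bf"])
    # Cell state: gated mix of the previous state and the new candidate
    c_t = f_t * c_prev + i_t * math.tanh(w["xc"] * x_t + w["hc"] * h_prev + w["bc"])
    # Output gate: how much of the cell state is exposed to the rest of the network
    o_t = sigmoid(w["xo"] * x_t + w["ho"] * h_prev + w["co"] * c_t + w["bo"])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t
```

With a strongly negative forget-gate bias the previous cell state is almost entirely discarded, while a strongly positive one carries it forward nearly unchanged, which is the mechanism behind the long-term memory described above.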

Following an iterative process, the LSTM model establishes a mapping between an input sequence and the irregularity in arrival time (output) from the training set. Below are the equations for the LSTM neural network:

$$\text{Input gate: } i_t = \alpha(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} C_{t-1} + b_i) \qquad (6)$$

$$\text{Forget gate: } f_t = \alpha(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} C_{t-1} + b_f) \qquad (7)$$

$$\text{Cell input: } C_t = f_t C_{t-1} + i_t \tanh(W_{xC} x_t + W_{hC} h_{t-1} + b_C) \qquad (8)$$

$$\text{Output gate: } o_t = \alpha(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} C_t + b_o) \qquad (9)$$

$$\text{Hidden layer output: } h_t = o_t \tanh(C_t) \qquad (10)$$

At time interval t, $\alpha$ is the element-wise sigmoid function $\frac{1}{1+\exp(-x)}$ and $\tanh$ represents the hyperbolic tangent function $\frac{\exp(x)-\exp(-x)}{\exp(x)+\exp(-x)}$. $i_t$, $f_t$ and $o_t$ are the input, forget and output gate states respectively, and $C_t$ is the cell input state. $x_t$ is the input and $b_i$, $b_f$, $b_C$ and $b_o$ are the bias terms. $W_{xi}$, $W_{hi}$ and $W_{ci}$ are the weight matrices for the input gate. $W_{xf}$, $W_{hf}$ and $W_{cf}$ are the weight matrices for the forget gate. $W_{xo}$, $W_{ho}$ and $W_{co}$ are the weight matrices corresponding to the output gate. $W_{hi}$, $W_{hf}$, $W_{hC}$ and $W_{ho}$ are the weight matrices connecting $h_{t-1}$ to the three gates and the cell input.

The current cell state $C_t$ is generated by calculating the weighted sum of the previous cell state and the current cell input.

The LSTM recurrent neural network has the ability to remove or add relevant information to the cell state, because the cell state is adjusted by the input gate and the forget gate. The forget gate layer removes the irrelevant information from the cell state. It uses $h_{t-1}$ and $x_t$, and outputs a number between 0 and 1 for each input in the sequence in the previous cell state $C_{t-1}$. If the number is zero, no information passes through the gate. If the number is one, all the information passes through the forget gate.

Similarly, the input gate decides what new information will be stored in the cell state. The final output is based on the cell state of the LSTM network. As explained above, the current cell state depends on the previous cell state. Therefore, the previous cell state is taken into consideration when updating the weights of the LSTM cell. This is how an LSTM cell is able to maintain long-term dependencies for predictions. LSTM recurrent neural networks have shown promising results in solving complex machine learning tasks (Sutskever et al. 2014).

Fig. 5 LSTM cell structure

4.2 Recurrent neural networks for the weather feature

Since weather condition has a significant impact on the prediction results, we decided to create a Recurrent Neural Network (RNN) model that focuses on the weather feature. The output of this model is combined with the LSTM model discussed in the previous subsection to increase the accuracy of prediction. The RNN model takes as input the arrival times and weather readings at the current stop and at the previous stop to predict the arrival time at the next stop when the weather reading is known.

(11) InputData: matrix of weather readings W and arrival times T per time step (surviving entries: t0W1, t0T1, t1W1, …)

(12) TargetData: column vector of arrival times for the next bus stop

In this model, a window of three is chosen for arrival times T and weather readings W. The inputs shown in Eq. 11 are divided into three categories: previous bus stop, current bus stop, and next bus stop. These inputs are illustrated graphically in Fig. 6. The previous-stop input comprises arrival times and weather readings for n samples. Similarly, the current-stop input comprises n arrival times and weather readings. The third input, for the next stop, only takes weather readings. Equation 12 shows the predicted arrival times for the next bus stop. Figure 6 illustrates the architecture of our RNN model. Two hidden layers are set in the diagram with different matrix sizes. The inputs go to the first hidden layer with a batch size of 32. After processing in the first hidden layer, the results are transferred to the second hidden layer in different formats. In the second hidden layer, the result of processing one input is sent to a matrix of size 1×16 and concatenated with the result of processing a second input, which is sent to a matrix of size 1×32. Similarly, the output of processing the remaining input goes to a matrix of size 1×16. Then all results of the second hidden layer are concatenated together. Finally, the sigmoid function is applied in the last layer, which provides the arrival time prediction.
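The three-part input of the weather model can be sketched as a windowing step over one trip. This is a hedged illustration only: the function name, the use of plain Python tuples, and the representation of each weather reading as a single number are assumptions, and the paper's own data layout (Eq. 11) may differ.

```python
def build_weather_windows(arrival_times, weather):
    """Build (inputs, target) pairs over consecutive stops of one trip.

    For each window of three stops the inputs are:
      previous stop: (arrival time, weather reading)
      current stop:  (arrival time, weather reading)
      next stop:     (weather reading,) only, since its arrival time is the target
    """
    if len(arrival_times) != len(weather):
        raise ValueError("arrival_times and weather must have the same length")
    samples = []
    for k in range(1, len(arrival_times) - 1):
        previous_stop = (arrival_times[k - 1], weather[k - 1])
        current_stop = (arrival_times[k], weather[k])
        next_stop = (weather[k + 1],)
        target = arrival_times[k + 1]
        samples.append(((previous_stop, current_stop, next_stop), target))
    return samples
```

Each sample keeps the three inputs separate so that a model could process them in separate branches before concatenating the results, as in the architecture described above.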

4.3 Data preprocessing andnormalization

Table2 summarizes the list of features used in our mod-

els. We have ﬂattened the data, i.e., we augment the bus

trip travelling southbound with the next trip for the same

bus travelling northbound. The ﬁrst feature (time diﬀ) is

Fig. 6 RNN architecture

Table 2 Features used for

model building Feature name Description

time diﬀ Diﬀerence between actual arrival time and scheduled arrival time. This is vari-

able which we are trying to predict

Tag Speciﬁes the direction on which bus is heading

Trip.ID A unique number given to each trip

Stop sequence Assigned sequence numbers starting from 1 to each bus stop in the route

Distance traveled Cumulative distance travelled by the bus to reach the bus stop

routeTag A unique numeric code to identify a particular route on which the bus is traveling

Stop ID A unique numeric code to identify a particular bus stop

Bus ID A unique numeric code to identify a particular bus

Service class Weekday, Saturday and Sunday

Day of the

week Numerical number indicating day of the week (1-Sunday, 2-Monday..etc)

Hour Numerical number indicating hour of the day

Max temperature Maximum temperature in the hour

Min temperature Minimum temperature in the hour

Visibility Visibility in Km, i.e., how far the driver is able to see

Weather condition Weather conditions: rain, snow, fog or haze

O.Alam et al.

1 3

calculated using live GPS locations and the TTC schedules

data as discussed in the previous section. The last four

features are obtained from the weather data. The rest of the

features are obtained from the live bus stop locations data.

Before a machine learning model is trained, all features are converted into a vector representation (e.g., the categorical features). There are two ways to convert a categorical feature into a vector representation: one-hot encoding and label encoding (Tan et al. 2005).

1. One-hot encoding: Encodes a categorical feature as a one-hot numeric vector, i.e., it creates a binary column for each category, where only the entry representing the observed category is assigned a 1 and the remaining entries are assigned 0, yielding a sparse vector.

2. Label encoding: Transforms a categorical feature into a numerical feature by assigning each category a unique number, which can be normalized before being used as an input to a machine learning model.

We have two categorical features, tag and weather conditions. Tag was converted using One-Hot Encoding because it only has two categories (North and South). Weather conditions were converted using Label Encoding. The other features in our data do not require encoding because they are continuous variables.
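As an illustration of the two encodings described above, the following minimal Python sketch encodes the Tag and weather condition features. The function names and the order of the weather categories (which fixes the label codes) are assumptions made for this example; they are not taken from the original implementation.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == category else 0 for category in categories]


def label_encode(value, categories):
    """Return the integer code (list position) of `value`."""
    return categories.index(value)


# Tag has two categories according to the text.
TAG_CATEGORIES = ["North", "South"]
# Assumption: this order is arbitrary; it only fixes the label codes.
WEATHER_CATEGORIES = ["rain", "snow", "fog", "haze"]

tag_vector = one_hot("South", TAG_CATEGORIES)            # [0, 1]
weather_code = label_encode("snow", WEATHER_CATEGORIES)  # 1
print(tag_vector, weather_code)
```

Label codes produced this way impose an arbitrary ordering on the categories, which is one reason they are usually normalized before being fed to a model.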

After converting all the features into a vector representation, the data was normalized using the following equation:

$$ z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{13} $$

In Eq. 13, $x_i$ is the ith observation of a feature and $z_i$ is the ith normalized data point.

4.4 Model training in LSTM

The input to each LSTM cell is a 3-dimensional (3D) array. Each dimension is briefly discussed below:

1. Sample size: the sample size refers to how many rows are given as input to the model. In this study we used a sample size of 32.

2. Time steps: a time step is one point of observation in the sample. The number of time steps determines how many steps ahead in time the model will predict. We used one, two, three, and four time steps in our model.

3. Features: each feature used is explained in Table 2. Our model uses the time diff feature as the dependent feature (the output of the model), which specifies the difference between the scheduled arrival time and the actual arrival time of the bus from the previous time stamp. We use 11 independent features as inputs to the model: Trip.ID, Tag, Stop.sequence, distance travelled, maximum temperature, minimum temperature, visibility, hour, day of week, service class, and weather conditions.
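The min-max normalization and the (samples, time steps, features) input layout described above can be sketched in plain Python as follows. This is an illustrative sketch only: the sliding-window construction, the handling of constant columns, and the toy values are assumptions, not details reported in the paper.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to 0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(rows, time_steps):
    """Group consecutive rows into overlapping windows.

    The result is a nested list shaped (samples, time_steps, features).
    """
    return [rows[i:i + time_steps] for i in range(len(rows) - time_steps + 1)]


# Toy data (invented): visibility in km and an already scaled second feature.
visibility_km = [2.0, 10.0, 6.0, 4.0]
scaled = min_max_normalize(visibility_km)
rows = [[v, h] for v, h in zip(scaled, [0.0, 0.5, 1.0, 0.5])]

windows = make_windows(rows, time_steps=2)
shape = (len(windows), len(windows[0]), len(windows[0][0]))
print(scaled, shape)
```

With the paper's settings, a batch passed to the network would have the shape (32, 1, 12) for the one-step model.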

A useful property of neural networks is that, when the model adjusts its weights during training, it can reduce the effect of irrelevant features by assigning them low weights. Such features can still have a small negative influence on the model, which can decrease its overall accuracy. Only the features that gave us the highest accuracy were used in the final model. We conducted an ablation study by removing one feature at a time and calculating the error rate of the model. Of the features in Table 2, we found Stop ID and Bus ID to be insignificant to our model and therefore excluded them. The other features showed a significant impact on the accuracy of the model.
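The leave-one-feature-out procedure described above can be sketched as a small loop. In this sketch `evaluate` is a hypothetical callable that would train a model on the given feature subset and return its error; the toy scoring function and its numbers are invented purely so that the example runs, and do not reproduce the paper's results.

```python
def ablation_study(feature_names, evaluate):
    """Return {removed_feature: change in error} for leave-one-out removal.

    A change close to zero suggests the removed feature is insignificant.
    """
    baseline = evaluate(feature_names)
    return {
        name: evaluate([f for f in feature_names if f != name]) - baseline
        for name in feature_names
    }


# Invented stand-in for model training: each feature lowers the error
# by a fixed amount. A real study would train and test a model here.
TOY_GAIN = {"hour": 0.05, "visibility": 0.04, "bus_id": 0.0}


def toy_evaluate(features):
    return 1.0 - sum(TOY_GAIN[f] for f in features)


deltas = ablation_study(list(TOY_GAIN), toy_evaluate)
insignificant = [name for name, delta in deltas.items() if abs(delta) < 1e-9]
print(insignificant)
```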

In our LSTM model architecture, we use 12 input neurons, which corresponds to the number of features (11 independent features and 1 dependent feature) in our dataset used for modeling. The output layer has 1 neuron, which gives the difference between the predicted arrival time and the scheduled arrival time (i.e., a delay or an early arrival) for a bus at a stop. We tried different numbers of LSTM hidden layers and different numbers of LSTM cells within each layer. For the final model, we chose 1 hidden layer with 100 LSTM cells and the 'ReLU' (Goodfellow et al. 2016) activation function.

When the LSTM model starts training, a batch of samples in the form of a 3D array (3D tuple) is given to the LSTM layer. The shape of this array is (32, 1, 12). This means that, in one iteration, the model processes 32 samples (the batch size) to predict 1 time step ahead, using 12 input features (the 11 independent input features discussed previously and the previous reading of the dependent feature, i.e., time diff). At each subsequent step, the LSTM cell carries forward its cell state, which is regulated by a forget gate. The forget gate controls how much of the current cell state is passed on to the next cell, thus ensuring that the model can learn longer sequences.
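The role of the forget gate can be illustrated with one step of a single-unit LSTM cell. The sketch below follows the textbook LSTM formulation with sigmoid gates and a tanh candidate; it is not the authors' implementation (their final model uses ReLU), and the weight layout and toy values are assumptions chosen to make the effect of the forget gate visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, weights):
    """One step of a single-unit LSTM cell.

    `weights` maps each gate name to a tuple (w_x, w_h, bias).
    Returns the new hidden state and the new cell state.
    """
    def pre_activation(gate):
        w_x, w_h, bias = weights[gate]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))
    input_gate = sigmoid(pre_activation("input"))
    output_gate = sigmoid(pre_activation("output"))
    candidate = math.tanh(pre_activation("candidate"))

    # The forget gate scales how much of the old cell state survives.
    c_new = forget * c_prev + input_gate * candidate
    h_new = output_gate * math.tanh(c_new)
    return h_new, c_new


def toy_weights(forget_bias):
    """All-zero weights except for the forget-gate bias (toy values)."""
    zero = (0.0, 0.0, 0.0)
    return {"forget": (0.0, 0.0, forget_bias), "input": zero,
            "output": zero, "candidate": zero}


_, c_kept = lstm_step(1.0, 0.0, 2.0, toy_weights(10.0))      # gate nearly open
_, c_dropped = lstm_step(1.0, 0.0, 2.0, toy_weights(-10.0))  # gate nearly closed
print(c_kept, c_dropped)
```

With the forget gate almost fully open the previous cell state of 2.0 is passed on nearly unchanged, whereas with the gate almost closed it is nearly erased.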

When training neural networks, several decisions need

to be made regarding the choice of hyperparameters used

by the model. We chose the following hyperparameters

for our model:

1. Activation functions: these are mathematical functions, typically non-linear, applied to the output of the neurons in one layer before it is passed to the next layer. They are important for a neural network model because they introduce non-linear properties to the model. We experimented with different activation functions, such as linear, sigmoid and ReLU; for our final model we used the ReLU activation function.

Predicting irregularities inarrival timesfortransit buses withrecurrent neural networks…

1 3

2. Optimization algorithms: these minimize (or maximize) an error function by iteratively updating the weights of the model. A good optimizer does so at a low computational cost and helps the model converge, ideally to the global minimum rather than to a local minimum. We investigated the RMSprop and ADAM optimizers. For the final model we used the ADAM optimizer.

3. Epochs: the number of epochs specifies how many full passes over the data set are used during training. If we use too few epochs, we may underfit the model and not allow it to learn everything it can from the training data. If we use too many epochs, we may overfit the model, which leads to the model fitting noise in the training data.

4. Early stopping: early stopping is a regularization method used to prevent the model from over-fitting. It removes the need to manually adjust the number of epochs while training a model: when the error rate of the model stops decreasing, training is stopped automatically. Another method for regularization is dropout (Srivastava et al. 2014); we found that early stopping works best for our model.

5. Batch size: the batch size is the number of samples that are propagated through the network in one iteration. A batch size can be less than or equal to the total number of training samples. An advantage of using a small batch size is that it requires less memory for training. A small batch size also reduces the overall training time required by the model, which is important when working with large datasets because it is not possible to fit all of the data into memory at once. However, if the batch size is too small, it can lead to less accurate models because we are not providing a sufficient number of samples to the model, which leads to a less accurate estimate of the output.
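The early stopping rule from item 4 can be sketched as follows. The `patience` parameter (how many non-improving epochs are tolerated before stopping) is an assumption of this sketch, since the paper does not state one, and the list of per-epoch errors is an invented stand-in for real training.

```python
def train_with_early_stopping(errors_per_epoch, patience=1):
    """Stop once the monitored error has not improved for `patience` epochs.

    `errors_per_epoch` stands in for the error measured after each epoch.
    Returns (number of epochs run, best error seen).
    """
    best = float("inf")
    stale_epochs = 0
    epochs_run = 0
    for error in errors_per_epoch:
        epochs_run += 1
        if error < best:
            best = error
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return epochs_run, best


# Invented error curve: it improves for three epochs, then stalls.
result = train_with_early_stopping([0.9, 0.5, 0.4, 0.41, 0.42, 0.3], patience=2)
print(result)
```

In this toy run training stops after the fifth epoch, so the later improvement to 0.3 is never reached; this is the usual trade-off when choosing the patience.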

Table3 shows diﬀerent conﬁgurations of LSTM model that

we tried in our experiments. For our ﬁnal model, we used

one hidden layer with 100 cells and one dimensional output

representing the next arrival time. This conﬁguration provides

the best performance for both Route 28 and Route 8.

5 Results/model performance

To measure the performance of our models, we calculate their Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) on the testing data. All models were trained ten times, and the averages of the MAPE and RMSE error rates were taken as the final values for the models.

These performance measures are defined as follows:

$$ \mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - x_t}{y_t}\right| $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - x_t)^2} $$

where $y_t$ is the actual value and $x_t$ is the predicted value. In our case, $y_t$ is the difference between the scheduled arrival time and the actual arrival time, $x_t$ is the difference between the scheduled arrival time and the predicted arrival time, and $n$ is the number of samples.

Table 4 shows the MAPE and RMSE values of the different models for Route 28. The LSTM model substantially outperformed the other models. It shows a 7-fold reduction in MAPE over the historical average (0.13 versus 0.91). A possible reason that the LSTM model performs better than the other models is that it may account more directly for the long-term dependencies between input and output features. The LSTM model was also the best performing model for Route 8, as shown in Table 5.

Table 3 Model tuning for LSTM
Bold values indicate the final values used in the experiments

Activation  Layers  Cells     Batch size  Route 28         Route 8
                                          RMSE     MAPE    RMSE     MAPE
ReLU        1       10        32          433.15   0.2     284.77   0.44
ReLU        1       50        32          427.87   0.14    277.97   0.45
ReLU        1       100       32          422.22   0.13    269.49   0.36
Linear      1       50        32          426.56   0.17    283.58   0.55
Linear      1       100       32          425.52   0.16    276.74   0.45
ReLU        1       100       64          426.24   0.23    275.76   0.41
Sigmoid     1       100       32          433.58   0.25    279.62   0.4
ReLU        3       40,80,40  32          427.68   0.31    283.56   0.54
ReLU        3       40,80,40  64          427.77   0.28    283.75   0.54
ReLU        2       40,40     64          431.50   0.2     279.32   0.5

Table 4 Comparison of different models for Route 28
Bold values indicate the final values used in the experiments

      Historical average  ARIMA   SVR     ANN     LSTM
MAPE  0.91                0.80    0.68    0.30    0.13
RMSE  477.87              432.69  428.79  427.33  422.2

We observe that the RMSE value of the LSTM model is not substantially lower than that of the baseline models. RMSE is sensitive to large outlying errors, which occurred in our data, and performs best when errors follow a normal distribution (Chai and Draxler 2014). Chai and Draxler (2014) suggest removing outliers that are larger than the other errors by several orders of magnitude. However, we did not need to remove outliers, i.e., extreme irregularities, because MAPE clearly showed that the LSTM model outperforms the other models, and the RMSE value of the LSTM model is lower than that of all baseline models. In addition, we were interested in the impact of weather on extreme irregularities. In the next subsection, we investigate the performance of the LSTM model with and without the weather data.

5.1 Significance of the weather data

We investigated the impact of the weather data on the accuracy of our prediction models. When we ran our models with the weather features (i.e., maximum temperature, minimum temperature, visibility and weather conditions), we noticed a significant improvement in the results (see Table 6 for Route 28 and Table 7 for Route 8).

Figure7 compares the actual arrival time versus predicted

arrival time with and without using weather data for Route

28. The x-axis shows the ordered observations of bus arriv-

als at stops. As mentioned previously, we augment the bus

trip travelling on a direction with the next trip for the same

bus travelling the opposite direction. This means the x-axis

depicts the arrival of the bus at the ﬁrst stop, followed by

its arrival at the next stop. When the bus arrives at the last

stop, it returns back on the same route. The next observation

after the last stop would be next arrival of the same bus at

the stop before the last stop. The y-axis is time in seconds.

It can be observed from the plot that the model created with

the weather data has better accuracy than the model that was

created without the weather data. In particular, we notice

that the model that was created using weather data was able

to capture extreme delays and early arrivals better than the

model that was created without the weather data. We notice

similar trend for Route 8 (see Fig.8).

Table 5 Comparison of different models for Route 8
Bold values indicate the final values used in the experiments

      Historical average  ARIMA   SVR     ANN     LSTM
MAPE  0.92                0.84    0.76    0.49    0.36
RMSE  292.38              286.64  279.01  278.69  269.49

Table 6 Comparison of models with and without weather data for Route 28
Bold values indicate the final values used in the experiments

      LSTM without weather data  LSTM with weather data
MAPE  0.21                       0.13
RMSE  427.02                     422.2

Table 7 Comparison of models with and without weather data for Route 8
Bold values indicate the final values used in the experiments

      LSTM without weather data  LSTM with weather data
MAPE  0.43                       0.36
RMSE  279.11                     269.49

Fig. 7 Model performance of LSTM Model with and without weather data on Route 28

Furthermore, we compared the results of the LSTM models for different portions of the data. We observed that for 16% of the data, the model with the weather data has much higher prediction accuracy than the model created without the weather data (see Table 8). For this portion, the model accuracy improves with weather by 310% when we compare RMSE (10.75 with weather versus 44.12 without) and by 282% when we compare MAPE (9 versus 34.4). Table 8 clearly demonstrates that weather has a significant impact on the prediction accuracy for nearly half of the data (49%). We observed similar results for Route 8, where weather had a higher impact (for nearly half of the data, the model accuracy improved by more than 150%, as shown in Table 9). The impact of weather decreases as we include more data points because additional factors may also contribute to bus arrival prediction, suggesting that weather has a complex non-linear relationship with bus arrival times. Examples of these factors are traffic conditions, construction zones, emergency vehicles, and the number of passengers, which we plan to explore in the future. However, we mitigate this issue by modelling weather and arrival times in a separate RNN model, as explained at the end of this section.
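The percentage columns of Tables 8 and 9 appear to be the relative increase of the error when weather data is left out, i.e. (error without weather minus error with weather) divided by the error with weather. This reading is inferred from the tabulated values rather than stated in the text; the short check below applies it to the 16% rows of both tables.

```python
def percent_increase(with_weather, without_weather):
    """Relative increase of the error when weather data is removed."""
    return (without_weather - with_weather) / with_weather * 100.0


# 16% row of Table 8 (Route 28).
rmse_route28 = percent_increase(10.75, 44.12)
mape_route28 = percent_increase(9.0, 34.4)
# 16% row of Table 9 (Route 8), RMSE columns.
rmse_route8 = percent_increase(4.85, 36.91)

print(round(rmse_route28), round(mape_route28), round(rmse_route8))
```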

Fig. 8 Model performance of LSTM Model with and without weather data on Route 8

Table 8 Difference in RMSE and MAPE with and without the weather data for Route 28
Bold values indicate the final values used in the experiments

% Data  RMSE                          MAPE
        Weather  No weather  %        Weather  No weather  %
16%     10.75    44.12       310%     9        34.4        282%
33%     20.94    42.74       104%     20.3     39.5        95%
49%     34.27    47.56       39%      34.1     50.5        48%
66%     55.63    62.12       11%      47       63.2        34%
82%     116.13   117.72      1.3%     61.58    72.97       18%

Table 9 Difference in RMSE and MAPE with and without the weather data for Route 8
Bold values indicate the final values used in the experiments

% Data  RMSE                          MAPE
        Weather  No weather  %RMSE    Weather  No weather  %MAPE
16%     4.85     36.91       661%     4.5      34.2        660%
33%     9.25     35.86       288%     9.7      37          281%
49%     14.32    36.66       156%     16.36    41.5        154%
66%     22.98    37.49       63%      21.4     48.2        100%
82%     30.41    41.04       35%      39.47    62.08       57%

To investigate further how much impact an individual weather feature has on the model, we created three LSTM models, each of which removes one weather feature and keeps the other features. The first model removes visibility, the second model removes weather conditions (rain, snow, haze, fog), and the third model removes temperature. Table 10 compares these LSTM models for Route 28. The MAPE value increases from 0.13 to 0.17 when we remove the visibility feature from the model. Similarly, when we keep all the other features except the weather conditions, the MAPE value increases to 0.15. Removing temperature increases the MAPE value to 0.18. Similar observations were found for Route 8 (see Table 11). These results suggest that all weather features that we use in our models are important to achieve better prediction accuracy.

Table 10 Comparison of models with different features for Route 28
Bold values indicate the final values used in the experiments

      Without visibility  Without weather conditions  Without temperature  All weather features
MAPE  0.17                0.15                        0.18                 0.13
RMSE  424.76              423.65                      425.08               422.2

5.2 Multi‑stop forecasting models

Apart from comparing diﬀerent machine learning models,

we also compared the accuracy of the LSTM model in pre-

dicting irregularities for multiple future stops in a trip (i.e.,

predicting the delay/early arrivals for the future arrivals of

the bus after its immediate next scheduled arrival).

We created 4 different models ($s+1$, $s+2$, $s+3$ and $s+4$). The first model was discussed throughout the paper and predicts one stop ahead in time (i.e., given the historical arrival times and weather data for stop $s$, it predicts the irregularities for the next scheduled bus arrival at the next stop $s+1$). The second model predicts the irregularities for the bus arrival at stop $s+2$. Similarly, the third and fourth models predict irregularities for the bus arrival at stops $s+3$ and $s+4$, respectively. Figures 9 and 10 compare the MAPE% errors when predicting irregularities for multiple stops with and without the weather data.
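One way to set up training data for the four models is to pair the observation at stop s with the target at stop s+k, for k = 1 to 4. The sketch below illustrates this pairing on an invented sequence of time differences for a single trip; the function name and the pairing scheme are assumptions, since the paper does not describe how the multi-stop targets were constructed.

```python
def make_horizon_pairs(time_diffs, k):
    """Pair the value observed at stop s with the target at stop s + k."""
    return [(time_diffs[s], time_diffs[s + k])
            for s in range(len(time_diffs) - k)]


# Invented time differences (seconds) at five consecutive stops of one trip.
trip = [30, 45, 60, 20, -10]

one_stop_ahead = make_horizon_pairs(trip, k=1)
two_stops_ahead = make_horizon_pairs(trip, k=2)
print(one_stop_ahead)
print(two_stops_ahead)
```

Larger horizons leave fewer usable pairs per trip, which is one practical reason why long-horizon models have less data to learn from.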

It is clear from Figs.9 and 10 that the model performance

decreases as we predict for multiple future stops ahead in

time. This is similar to the ﬁndings by (Duan etal. 2016),

(Hua etal. 2018) and (Kormáksson etal. 2014)). However,

we found that when weather data was excluded (the dotted

lines), the rate of decrease in prediction accuracy increases

as we predict for more future stops. This suggests that

weather plays a signiﬁcant role when predicting arrival times

or their irregularities for multiple future stops.

5.3 Modelling weather feature with RNN model

Since the previous experiments clearly established that weather has a significant influence on the prediction results, we decided to use the weather features in a separate RNN model and combine its result with that of the LSTM model (which also included the weather features, as discussed previously). The final prediction is the average of the two models. The architecture of the RNN model was discussed in Sect. 4. Our motivation was to investigate whether we can improve the prediction accuracy by creating a model dedicated to the weather. We trained and tested the RNN model with different hyperparameters and finally tuned them as follows:

– learning rate = 0.001

– training epochs = 300

– batch size = 32

– display step = 1
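Combining the two models by averaging their outputs, as described above, can be sketched in a few lines. The prediction values below are invented, and equal weighting is used because the text states that the final prediction is the average of the two models; the function name is an assumption of this sketch.

```python
def average_predictions(lstm_predictions, rnn_predictions):
    """Element-wise average of the predictions of the two models."""
    if len(lstm_predictions) != len(rnn_predictions):
        raise ValueError("both models must predict the same samples")
    return [(a + b) / 2.0 for a, b in zip(lstm_predictions, rnn_predictions)]


# Invented predicted time differences in seconds for two arrivals.
combined = average_predictions([120.0, -30.0], [100.0, -10.0])
print(combined)
```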

Table12 compares the performance of this model with the

LSTM model for route 28. The RMSE of our new hybrid

model showed improvement of 562.38% over the LSTM

model for route 28 for 82% of the data. For route 8, the

improvement was 873.85% as shown in Table13. We also

noticed that the accuracy does not decrease when we add

more data to the model, contrast to the ﬁndings in Sect.5.1.

This could be because the RNN model focuses on the

weather features, while the LSTM model includes other

Table 11 Comparison of models with diﬀerent weather features for

Route 8

Visibility Weather

conditions

Temperature All weather

features

MAPE 0.42 0.40 0.42 0.36

RMSE 278.30 278.43 281.51 269.49

Fig. 9 Prediction accuracy with and without weather features for mul-

tiple stops for Route 28

Fig. 10 Prediction accuracy with and without weather features for

multiple stops for Route 8

Predicting irregularities inarrival timesfortransit buses withrecurrent neural networks…

1 3

features along with the weather. In other words, in small

portion of the data, weather condition played a signiﬁcant

role in improving the prediction results in the LSTM model.

However, when a separate RNN model is used for weather,

its role to improve accuracy included larger segments of the

data.
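The percentages reported in Tables 12 and 13 appear to be the RMSE of the LSTM model expressed as a percentage of the RMSE of the hybrid model (LSTM RMSE divided by LSTM+RNN RMSE), which is a different convention from the relative increase that matches Tables 8 and 9. This is inferred from the tabulated numbers and is not stated by the authors; the check below reproduces the 82% rows to within rounding.

```python
def ratio_percent(hybrid_rmse, lstm_rmse):
    """LSTM error expressed as a percentage of the hybrid model's error."""
    return lstm_rmse / hybrid_rmse * 100.0


route28 = ratio_percent(20.65, 116.13)  # Table 12 reports 562.38%
route8 = ratio_percent(3.48, 30.41)     # Table 13 reports 873.85%
print(round(route28, 2), round(route8, 2))
```

Under this reading, the 82% row for Route 28 corresponds to a roughly 5.6-fold reduction in RMSE.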

6 Conclusion

Nowadays, complex machine learning algorithms can be applied quickly over large datasets, thanks to advances in the area of big data analytics. This paper investigates different prediction models for irregularities in bus arrival times, using machine learning algorithms. In particular, we built Long Short-Term Memory Recurrent Neural Network models to predict the next arrival time of a bus at a particular stop. Our prediction models use historical bus arrival data, i.e. real-time GPS locations of Toronto Transit buses, bus schedules obtained from a Google API, and weather condition data obtained from a weather station in Toronto. Our analysis shows that Toronto transit buses experience significant irregularities in arrival times. Nearly 37% of the time, transit buses are either delayed or arrive early by more than 5 min, showing great room for improvement. To our knowledge, this is the first work to investigate the impact of weather on bus arrival prediction. We found that weather plays a significant role in improving prediction accuracy. Therefore, we built a prediction model that combines two machine learning models: an LSTM model that focuses on a range of input features, e.g. arrival times and hour of the day, and an RNN model that focuses on the weather features. We also investigated the prediction accuracy for multiple scheduled arrivals of buses ahead in time using weather data. In the future, we plan to collect more data in order to run our experiments over the entire year. Our current study covers the winter season and the beginning of the spring season in Toronto. We plan to extend our study to cover all seasons. In addition, we plan to extend our work on bus arrival prediction by using machine learning algorithms with additional datasets, such as passenger counts and traffic conditions. Furthermore, we plan to use different RNN extensions, such as the Gated Recurrent Unit (GRU) (Cho et al. 2014; Che et al. 2016).

References

Balasubramanian P, Rao KR (2015) An adaptive long-term bus arrival time prediction model with cyclic variations. J Public Transport 18:1–18. https://doi.org/10.5038/2375-0901.18.1.6

Basyir M, Nasir M, Suryati S, Mellyssa W (2017) Determination of nearest emergency service office using haversine formula based on android platform. EMITTER Int J Eng Technol 5(2):270–278

Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, ACM, New York, NY, USA, COLT ’92, pp 144–152. https://doi.org/10.1145/130385.130401

Chai T, Draxler RR (2014) Root mean square error (rmse) or mean absolute error (mae)? arguments against avoiding rmse in the literature. Geosci Model Dev 7(3):1247–1250. https://doi.org/10.5194/gmd-7-1247-2014

Chang H, Park D, Lee S, Lee H, Baek S (2010) Dynamic multi-interval bus travel time prediction using bus transit data. Transportmetrica 6(1):19–38

Che Z, Purushotham S, Cho K, Sontag D, Liu Y (2016) Recurrent neural networks for multivariate time series with missing values. Sci Rep. https://doi.org/10.1038/s41598-018-24271-9

Chen M, Liu X, Xia J, Chien SIJ (2004) A dynamic bus-arrival time prediction model based on apc data. Comput Aided Civ Infrastruct Eng 19:364–376. https://doi.org/10.1111/j.1467-8667.2004.00363.x

Chien SIJ, Ding Y, Wei C (2002) Dynamic bus arrival time prediction with artificial neural networks. J Transport Eng 128(5):429–438. https://doi.org/10.1061/(ASCE)0733-947X(2002)128:5(429)

Cho K, van Merriënboer B, Gülçehre Ç, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using rnn encoder–decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, pp 1724–1734. http://www.aclweb.org/anthology/D14-1179

Chopde NR, Nichat MK (2013) Landmark based shortest path detection by using a* and haversine formula. Int J Innov Res Comput Commun Eng 1(2):298–302

Drucker H, Burges CJC, Kaufman L, Smola AJ, Vapnik V (1997) Support vector regression machines. In: Mozer MC, Jordan MI, Petsche T (eds) Advances in neural information processing systems 9, MIT Press, Cambridge, pp 155–161, http://papers.nips.cc/paper/1238-support-vector-regression-machines.pdf

Table 12 Difference in RMSE for the LSTM model and our LSTM+RNN (weather) model for Route 28
Bold values indicate the final values used in the experiments

% Data  RMSE
        LSTM+RNN  LSTM    %
49%     12.97     34.27   264.23%
66%     18.55     55.63   299.90%
82%     20.65     116.13  562.38%

Table 13 Difference in RMSE for the LSTM model and our LSTM+RNN (weather) model for Route 8
Bold values indicate the final values used in the experiments

% Data  RMSE
        LSTM+RNN  LSTM    %
49%     7.28      14.32   196.71%
66%     5.84      22.98   393.50%
82%     3.48      30.41   873.85%

Duan Y, Lv Y, Wang FY (2016) Travel time prediction with lstm neural network. In: 2016 IEEE 19th international conference on intelligent transportation systems (ITSC), pp 1053–1058

Fu L, Yang X (2002) Design and implementation of bus–holding control strategies with real-time information. Transp Res Rec 1791(1):6–12

Fu R, Zhang Z, Li L (2016) Using lstm and gru neural network methods for traffic flow prediction. pp 324–328. https://doi.org/10.1109/YAC.2016.7804912

Goodfellow I, Bengio Y, Courville A (2016) Deep learning. The MIT Press, Cambridge

Hua X, Wang W, Wang Y, Ren M (2018) Bus arrival time prediction using mixed multi-route arrival time data at previous stop. Transport 33(2):543–554

Ingole P, Nichat MMK (2013) Landmark based shortest path detection by using dijkestra algorithm and haversine formula. Int J Eng Res Appl (IJERA) 3(3):162–165

Jeong R, Rilett R (2004) Bus arrival time prediction using artificial neural network model. In: Proceedings. The 7th international IEEE conference on intelligent transportation systems (IEEE Cat. No.04TH8749), pp 988–993. https://doi.org/10.1109/ITSC.2004.1399041

Kalman RE (1960) A new approach to linear filtering and prediction problems. Trans ASME J Basic Eng 82(Series D):35–45

Ke J, Zheng H, Yang HXC (2017) Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach. Transport Res Part C Emerg Technol. https://doi.org/10.1016/j.trc.2017.10.016

Kormáksson M, Barbosa L, Vieira MR, Zadrozny B (2014) Bus travel time predictions using additive models. In: 2014 IEEE international conference on data mining, pp 875–880. https://doi.org/10.1109/ICDM.2014.107

Kumar V, Kumar BA, Vanajakshi L, Subramanian SC (2014) Comparison of model based and machine learning approaches for bus arrival time prediction. Transportation Research Board 93rd Annual Meeting. http://docs.trb.org/prp/14-2518.pdf

Liu L, Chen RC, Zhao Q, Zhu S (2019) Applying a multistage of input feature combination to random forest for improving mrt passenger flow prediction. J Ambient Intell Hum Comput 10(11):4515–4532

Liu Z, Yan Y, Qu X, Zhang Y (2013) Bus stop-skipping scheme with random travel time. Transport Res Part C Emerg Technol 35:46–56. https://doi.org/10.1016/j.trc.2013.06.004

Nextbus Nexbus public feed. https://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf. Accessed 2020

Patnaik J, Chien S, Bladikas A (2004) Estimation of bus arrival times using APC data. J Public Transp 7(1):1

Shalaby A, Farhan A (2004) Prediction model of bus arrival and departure times using avl and apc data. J Public Transport 7(1):41–61. https://doi.org/10.5038/2375-0901.7.1.3

Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958. http://dl.acm.org/citation.cfm?id=2627435.2670313

Star TT (2020) Ttc gives notes for affected customers arriving late for work. https://www.thestar.com/news/gta/2017/12/01/late-for-work-the-ttc-can-give-you-a-note-for-that.html. Accessed 2020

Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27, Curran Associates, Inc., pp 3104–3112. http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison-Wesley Longman Publishing Co., Inc., Boston

Veness C (2018) Movable type scripts: calculate distance, bearing and more between latitude/longitude points. URL: https://www.movable-type.co.uk/scripts/latlong.html

Wang B, Huang J, Xu J (2019) Capacity optimization and allocation of an urban rail transit network based on multi-source data. J Ambient Intell Hum Comput 10(1):373–383

Wang J, Chen X, Guo S (2009) Bus travel time prediction model with v-support vector regression. In: 2009 12th International IEEE conference on intelligent transportation systems, pp 1–6

Xu J, Wu Y, Jia L, Qin Y (2020) A reckoning algorithm for the prediction of arriving passengers for subway station networks. J Ambient Intell Hum Comput 11(2):845–864

Yang M, Chen C, Wang L, Yan X, Zhou L (2016) Bus arrival time prediction using support vector machine with genetic algorithm. Neural Netw World 26:205–217. https://doi.org/10.14311/NNW.2016.26.011

Zhang P, Qi M (2005) Neural network forecasting for seasonal and trend time series. Eur J Oper Res 160:501–514. https://doi.org/10.1016/j.ejor.2003.08.037

Publisher’s Note Springer Nature remains neutral with regard to

jurisdictional claims in published maps and institutional aﬃliations.

A preview of this full-text is provided by Springer Nature.

Learn more

Content available from Journal of Ambient Intelligence and Humanized Computing

This content is subject to copyright. Terms and conditions apply.

Real-time bus arrival delays analysis using seemingly unrelated regression model

Article

Full-text available

Jun 2024
TRANSPORTATION

To effectively manage and control public transport operations, understanding the various factors that impact bus arrival delays is crucial. However, limited research has focused on a comprehensive analysis of bus delay factors, often relying on single-step delay prediction models that are unable to account for the heterogeneous impacts of spatiotemporal factors along the bus route. To analyze the heterogeneous impact of bus arrival delay factors, the paper proposes a set of regression equations conditional on the bus location. A seemingly unrelated regression equation (SURE) model is developed to estimate the regression coefficients, accounting for potential correlations between regression residuals caused by shared unobserved factors among equations. The model is validated using bus operations data from Stockholm, Sweden. The results highlight the importance of developing stop-specific bus arrival delay models to understand the heterogeneous impact of explanatory variables. The significant factors impacting bus arrival delays are primarily associated with bus operations, such as delays at consecutive upstream stops, dwell time, scheduled travel time, recurrent congestion, and current traffic conditions. Factors like the calendar and weather have significant but marginal impacts on arrival delays. The study suggests that different bus operating management strategies, such as schedule adjustments, route optimization, and real-time monitoring and control, should be tailored to the characteristics of stop sections since the impacts of these factors vary depending on the stop location.

Citation: Automatic Identification System-Based Prediction of Tanker and Cargo Estimated Time of Arrival in Narrow Waterways

Article

Full-text available

Jan 2024

In maritime logistics, accurately predicting the Estimated Time of Arrival (ETA) of vessels is pivotal for optimizing port operations and the global supply chain. This study proposes a machine learning method for predicting ETA, drawing on historical Automatic Identification System (AIS) data spanning 2018 to 2020. The proposed framework includes a preprocessing module for extracting, transforming, and applying feature engineering to raw AIS data, alongside a modeling module that employs an XGBoost model to accurately estimate vessel travel times. The framework's efficacy was validated using AIS data from the Port of Houston, and the results indicate that the model can estimate travel times with a Mean Absolute Percentage Error (MAPE) of just 5%. Moreover, the model retains consistent accuracy in a simplified form, pointing towards the potential for reduced complexity and increased generalizability in maritime ETA predictions.

Improving bus arrival time predictors using only public transport API data

Article

Aug 2023

Optimizing Forecasted Activity Notifications with Reinforcement Learning

Article

Full-text available

Jul 2023
SENSORS-BASEL

In this paper, we propose the notification optimization method by providing multiple alternative times as a reminder for a forecasted activity with and without probabilistic considerations for the activity that needs to be completed and needs notification. It is important to consider various factors when sending notifications to people after obtaining the results of the forecasted activity. We should not send notifications only when we have forecasted results because future daily activities are unpredictable. Therefore, it is important to strike a balance between providing useful reminders and avoiding excessive interruptions, especially for low probabilities of forecasted activity. Our study investigates the impact of the low probability of forecasted activity and optimizes the notification time with reinforcement learning. We also show the gaps between forecasted activities that are useful for self-improvement by people for the balance of important tasks, such as tasks completed as planned and additional tasks to be completed. For evaluation, we utilize two datasets: the existing dataset and data we collected in the field with the technology we have developed. In the data collection, we have 23 activities from six participants. To evaluate the effectiveness of these approaches, we assess the percentage of positive responses, user response rate, and response duration as performance criteria. Our proposed method provides a more effective way to optimize notifications. By incorporating the probability level of activity that needs to be done and needs notification into the state, we achieve a better response rate than the baseline, with the advantage of reaching 27.15%, as well as than the other criteria, which are also improved by using probability.

Integration of Machine Learning and Simulation for dynamic rescheduling in Truck Appointment Systems

Article

Mar 2023
SIMUL MODEL PRACT TH

Analyzing Public Transit Schedule Deviations: A Case Study on Montreal Using Real-Time Data

Conference Paper

May 2024

Spatio-Temporal Factors Affecting Short-Term Public Transit Passenger Demand Prediction: A Review

Conference Paper

Jan 2024

In public transit (PT) planning, passenger demand forecasting is an important process for periodically updating operation management and planning future infrastructure. In the past, many researchers considered passenger demand forecasting a fundamental need for transportation planning and developed forecasting models based on statistical methods and Artificial Intelligence (AI). To increase the precision of a model, the spatial and temporal attributes that influence passenger movement at the station level, corridor level, and network level need to be considered. Hence, in this study, a detailed literature review is carried out to understand the pros and cons of the various methods used in passenger demand forecasting and how distinctively spatial and temporal attributes are used in the development of the models. External factors such as weather and events are also considered by researchers in the development of these models. Finally, the challenges in PT passenger demand forecasting are discussed and directions for future research are given.

Transforming urban mobility with internet of things: public bus fleet tracking using proximity-based bluetooth beacons

Article

Dec 2023

In today’s fast-paced world, efficient and reliable public transportation systems are crucial for optimising time and reducing carbon dioxide emissions. However, developing countries face numerous challenges in their public transportation networks, including infrequent services, delays, inaccurate and unreliable arrival times, long waiting times, and limited real-time information available to users. GPS-based systems have been widely used for fleet management, but they can be a significant infrastructure investment for smaller operators in developing countries. The accuracy of the GPS location can be easily affected by weather conditions, and GPS signals are susceptible to spoofing attacks. When the GPS device is faulty, the entire location trace will be unavailable. This paper proposes the use of Internet-of-Things (IoT)-enabled Bluetooth Low Energy (BLE) systems as an alternative approach to fleet tracking for public bus services. The proposed approach offers simplicity and easy implementation for bus operators by deploying BLE proximity beacons on buses to track their journeys, with detection devices using Raspberry Pi (RPi) Zero strategically placed at terminals and selected stops. When the bus approaches and stops at a bus stop, the BLE advertisements emitted by the proximity beacons can be reliably detected by the RPi Zero. Experiment results show that the BLE signals can be detected up to 20 m in range when the RPi Zero is placed inside a metal enclosure. The location of the bus is then sent to the cloud to estimate the arrival times. A field trial of the proposed IoT-based BLE proximity sensing system involving two public bus services in southern Malaysian cities, namely, Johor Bahru, Iskandar Puteri and Kulai, is presented. Based on the data collected, a bus arrival time estimation algorithm is designed. Our analysis shows that there was a 5–10 min reduction in journey time on public holidays as compared to a normal day.
Overall, the paper emphasises the importance of addressing public transportation challenges. It also describes the challenges, experience, and mitigation drawn from the deployment of this real-world use case, demonstrating the feasibility and reliability of IoT-based proximity sensing as an alternative approach to tracking public bus services.

Probabilistic Forecasting of Bus Travel Time with a Bayesian Gaussian Mixture Model

Article

Oct 2023

Accurate forecasting of bus travel time and its uncertainty is critical to service quality and operation of transit systems: it can help passengers make informed decisions on departure time, route choice, and even transport mode choice, and it also supports transit operators on tasks such as crew/vehicle scheduling and timetabling. However, most existing approaches in bus travel time forecasting are based on deterministic models that provide only point estimation. To this end, we develop in this paper a Bayesian probabilistic model for forecasting bus travel time and estimated time of arrival (ETA). To characterize the strong dependencies/interactions between consecutive buses, we concatenate the link travel time vectors and the headway vector from a pair of two adjacent buses as a new augmented variable and model it with a mixture of constrained multivariate Gaussian distributions. This approach can naturally capture the interactions between adjacent buses (e.g., correlated speed and smooth variation of headway), handle missing values in data, and depict the multimodality in bus travel time distributions. Next, we assume different periods in a day share the same set of Gaussian components, and we use time-varying mixing coefficients to characterize the systematic temporal variations in bus operation. For model inference, we develop an efficient Markov chain Monte Carlo (MCMC) algorithm to obtain the posterior distributions of model parameters and make probabilistic forecasting. We test the proposed model using the data from two bus lines in Guangzhou, China. Results show that our approach significantly outperforms baseline models that overlook bus-to-bus interactions, in terms of both predictive means and distributions.
Besides forecasting, the parameters of the proposed model contain rich information for understanding/improving the bus service, for example, analyzing link travel time and headway correlation using covariance matrices and understanding time-varying patterns of bus fleet operation from the mixing coefficients. Funding: This research is supported in part by the Fonds de Recherche du Quebec-Societe et Culture (FRQSC) under the NSFC-FRQSC Research Program on Smart Cities and Big Data, the Canadian Statistical Sciences Institute (CANSSI) Collaborative Research Teams grants, and the Natural Sciences and Engineering Research Council (NSERC) of Canada. X. Chen acknowledges funding support from the China Scholarship Council (CSC). Supplemental Material: The e-companion is available at https://doi.org/10.1287/trsc.2022.0214 .

Prediction Model of Ship Arrival Time using Neural Network and Kalman Filter

Conference Paper

Feb 2023

Determination of Nearest Emergency Service Office using Haversine Formula Based on Android Platform

Article

Jan 2018

The Emergency Reporting Application is an Android-based application that helps the community report emergency conditions. It allows users to choose and contact an emergency services office without needing to know their own position or the office's phone number. The emergency services office is selected automatically by the system, taking into account the distance between the complainant and each office. The selected office is the one nearest to the complainant, so that the delay before assistance arrives can be minimized. The proposed application therefore requires a GPS feature for recording and reporting the position, and SMS for delivering the report message. The distance between the position of the complainant and the position of the emergency service office, both given as latitude and longitude data, is calculated using the Haversine formula, which takes into account the curvature of the earth. Emergency service offices include police and hospital offices spread over 25 different districts. The reporter's position was then compared with all selected emergency service offices to obtain the single nearest office. The system was tested by measuring its accuracy and delay. The test results show 100% accuracy using the Haversine method and an average system delay of 4.5 seconds.
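
The nearest-office selection described in this abstract can be illustrated with a minimal Python sketch of the Haversine great-circle distance. This is not the cited application's code: the function names, the office tuples, and the Earth-radius constant are assumptions chosen for illustration only.

```python
import math

# Mean Earth radius in kilometres (an assumed constant; the abstract does not state one).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_office(reporter, offices):
    """Return the (name, lat, lon) office closest to reporter = (lat, lon).

    The office tuples are hypothetical; a real system would load them from its own data.
    """
    return min(offices, key=lambda o: haversine_km(reporter[0], reporter[1], o[1], o[2]))
```

Calling `nearest_office((lat, lon), offices)` with a list of `(name, lat, lon)` tuples returns the closest entry. A linear scan like this is adequate for a few dozen offices; a spatial index would only matter for much larger sets.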

A reckoning algorithm for the prediction of arriving passengers for subway station networks

Article

Feb 2020

Knowing the volume of arriving passengers (APs) is fundamental for optimizing their paths through subway stations and evacuating them under emergency conditions. To predict AP volume online, we first analyze arrival and departure parameters and discuss the relationships among various parameters to determine the train a passenger will most likely take. Interconnecting stations and transfer paths among stations are considered direct connections and direct transfer connections, respectively, to define and construct traveling route sets. Then, travel time chains (TTCs) of transfer and nontransfer passengers are constructed to illustrate the possible routes and time costs between the origin and destination (O/D) stations of passengers. Furthermore, based on TTCs, train capacities and the inbound and outbound times of passengers accessed from an automated fare collection system, we predict the AP volumes at specified stations using a stage-by-stage reckoning algorithm in real time. Finally, to validate the model and the algorithm, we estimate the AP volume for the Beijing Subway network.

Applying a multistage of input feature combination to random forest for improving MRT passenger flow prediction

Article

Nov 2019

As one of the main public transport systems all over the world, mass rapid transit (MRT) is widely used in metropolitan areas. To meet increasing future travel demand, accurately predicting MRT passenger flow is becoming more and more urgent and crucial. This paper takes an experimental approach to objectively quantify and analyze the impacts of various combinations of traditional input features, with the aim of improving the accuracy of MRT passenger flow prediction. We have built a series of passenger flow prediction models with different input features using a random forest approach. The features of passenger flow direction, temporal date, national holiday, lunar calendar date, previous average hourly passenger flow, and previous k-step hourly passenger flow and their trends are selected and applied in a multi-stage combination of input features. Typical encoding strategies for the input features are further discussed and implemented. Finally, the optimal combination of input features is proposed with a case study at Taipei Main Station. The experimental results show that the proposed optimal combination of input features and their appropriate codes can help improve the accuracy of passenger flow prediction, not only on weekdays and weekends, but also on national holidays.

Capacity optimization and allocation of an urban rail transit network based on multi-source data

Article

Jan 2019

This study establishes a multi-objective optimization model for the capacity allocation of an urban rail transit network based on multi-source data on the Beijing metro passenger flow. The model considers the operating costs of trains and the expenses related to the waiting time of transferring passengers. The model constraints include the distribution characteristics of passenger flow, headway, load factor, and available trains. The capacity allocation scheme for 16 railway lines was obtained by adopting a model of the Beijing rail transit network and its passenger flow. We also analyzed the frequency at which Line 4 is reduced from 55 to 45 and the frequency at which Line 10 is reduced from 60 to 52 if a short turn is adopted. In addition, when the upper limit of the load factor increased from 60 to 70%, the operational costs were reduced by 4.6%, while the total passenger waiting time increased by 1%. The transfer costs changed the capacity optimization and allocation scheme, and the proportion of the transfer cost among the total costs increased when the time value increased.

Short-Term Forecasting of Passenger Demand under On-Demand Ride Services: A Spatio-Temporal Deep Learning Approach

Article

Jun 2017

Short-term passenger demand forecasting is of great importance to the on-demand ride service platform, which can incentivize vacant cars moving from over-supply regions to over-demand regions. The spatial dependences, temporal dependences, and exogenous dependences need to be considered simultaneously, however, which makes short-term passenger demand forecasting challenging. We propose a novel deep learning (DL) approach, named the fusion convolutional long short-term memory network (FCL-Net), to address these three dependences within one end-to-end learning architecture. The model is stacked and fused by multiple convolutional long short-term memory (LSTM) layers, standard LSTM layers, and convolutional layers. The fusion of convolutional techniques and the LSTM network enables the proposed DL approach to better capture the spatio-temporal characteristics and correlations of explanatory variables. A tailored spatially aggregated random forest is employed to rank the importance of the explanatory variables. The ranking is then used for feature selection. The proposed DL approach is applied to the short-term forecasting of passenger demand under an on-demand ride service platform in Hangzhou, China. Experimental results, validated on real-world data provided by DiDi Chuxing, show that the FCL-Net achieves better predictive performance than traditional approaches including both classical time-series prediction models and neural network based algorithms (e.g., artificial neural network and LSTM). This paper is one of the first DL studies to forecast the short-term passenger demand of an on-demand ride service platform by examining the spatio-temporal correlations.

Bus arrival time prediction using mixed multi-route arrival time data at previous stop

Article

May 2017

The primary objective of this paper is to develop models to predict bus arrival time at a target stop using actual multi-route bus arrival time data from the previous stop as inputs. In order to mix and fully utilize the multi-route bus arrival time data, the weighted average travel time and three Forgetting Factor Functions (FFFs) – F1, F2 and F3 – are introduced. Based on different combinations of input variables, five prediction models are proposed. Three widely used algorithms, i.e. Support Vector Machine (SVM), Artificial Neural Network (ANN) and Linear Regression (LR), are tested to find the best for arrival time prediction. Bus location data of 11 road segments from Yichun (China), covering 12 bus stops and 16 routes, are collected to evaluate the performance of the proposed approaches. The results show that the newly introduced parameter, the weighted average travel time, can significantly improve the prediction accuracy: the prediction errors are reduced by around 20%. The algorithm comparison demonstrates that the SVM and ANN outperform the LR. The FFFs can also affect the prediction errors: F1 is more suitable for the ANN algorithm, while F3 is better for the SVM and LR algorithms. In addition, the virtual road concept in this paper can slightly improve the prediction accuracy and halve the time cost of calculating the predicted arrival time. First published online 02 May 2017

Using LSTM and GRU neural network methods for traffic flow prediction

Conference Paper

Nov 2016

Accurate and real-time traffic flow prediction is important in Intelligent Transportation Systems (ITS), especially for traffic control. Existing models such as ARMA and ARIMA are mainly linear models and cannot describe the stochastic and nonlinear nature of traffic flow. In recent years, deep-learning-based methods have been applied as novel alternatives for traffic flow prediction. However, which kind of deep neural network is the most appropriate model for traffic flow prediction remains an open question. In this paper, we use Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU) neural network (NN) methods to predict short-term traffic flow, and experiments demonstrate that Recurrent Neural Network (RNN) based deep learning methods such as LSTM and GRU perform better than the autoregressive integrated moving average (ARIMA) model. To the best of our knowledge, this is the first time that GRU has been applied to traffic flow prediction.

Support vector regression machines

Article

Jan 1997
Adv Neural Inform Process Syst

Travel time prediction with LSTM neural network

Conference Paper

Nov 2016

Travel time is one of the key concerns among travelers before starting a trip and also an important indicator of traffic conditions. However, travel time acquisition is time delayed and the pattern of travel time is usually irregular. In this paper, we explore a deep learning model, the LSTM neural network model, for travel time prediction. By employing the travel time data provided by Highways England, we construct 66 series prediction LSTM neural networks for the 66 links in the data set. Through model training and validation, we obtain the optimal structure within the setting range for each link. Then we predict multi-step ahead travel times for each link on the test set. Evaluation results show that the 1-step ahead travel time prediction error is relatively small: the median of the mean relative error for the 66 links in the experiments is 7.0% on the test set. Deep learning models that consider sequence relations are promising for traffic series data prediction.

Bus Arrival Time Prediction using Support Vector Machine with Genetic Algorithm

Article

Jan 2016

Accurate prediction of bus arrival time is of great significance for improving passenger satisfaction and the attractiveness of bus services. This paper presents a prediction model of bus arrival time based on a Support Vector Machine with genetic algorithm (GA-SVM). The character of the time period, the length of road, the weather, the bus speed and the rate of road usage are adopted as input vectors in the Support Vector Machine (SVM), and a genetic algorithm search is used to find the best parameters. Finally, data from Bus No. 249 in Shenyang, China are used to check the model. The experimental results show that, on the same data, the forecasting model is more accurate than the traditional SVM model and the Artificial Neural Network (ANN) model, which verifies the feasibility of the model for predicting bus arrival time.
