Smart Cities Symposium Prague 2022
978-1-6654-7923-3/22/$31.00 ©2022 IEEE
Abstract— Bike-sharing services provide easy access to
environmentally friendly mobility and reduce congestion in urban
areas. Increasing demand requires service planning based on the
behavior of bike-sharing users. Three time series models, Seasonal
Auto-Regressive Integrated Moving Average (SARIMA), Artificial
Neural Network (ANN), and Exponential Smoothing, were applied to
about four and a half years of bike-sharing trip data. The results
show that weekends attract more trips than weekdays and that summer
is the season with the highest demand. The best-fitting SARIMA model
has a seasonal component and a three-day moving-average lag. SARIMA
and ANN provide better predictions than the Exponential Smoothing
model. The COVID-19 and post-COVID-19 periods are similar in their
lag structure and in the days of the week with the most trips. This
study helps decision-makers forecast bike-sharing trips.
Index Terms—ARIMA, ANN, Bike Sharing, Forecasting, Time
Series, COVID-19.
I. INTRODUCTION
In the 1960s, the first bike-sharing service (BSS) was
introduced in Amsterdam, the Netherlands, in response to a
growing preference for using rather than owning bicycles. Since
then, bike-sharing systems have spread rapidly around the
world [1, 2, 3].
In recent years, there has been a rise in interest in eco-friendly
transportation due to increasing traffic congestion and air
pollution [4]. Particularly, bike-sharing services provide an
environmentally friendly alternative in urban areas [1, 5].
Furthermore, bike-sharing has a positive impact on the
economy, transportation, and health, and improves rider safety
by boosting driver awareness [6].
BSSs have also become popular with tourists, who can use them
to explore and visit new locations [7] at low cost [2]. Xu et
al. [8] found higher demand for shared bikes and for tourist
attractions on weekends.
This research explores the bike-sharing demand trend over about
four and a half years (October 2017 to March 2022) in Budapest,
Hungary, using the time series model SARIMA (Seasonal
Auto-Regressive Integrated Moving Average). The city was chosen
because its BSS is well designed and offers several ticket types
and plans. The bicycle infrastructure in Budapest also has good
network coverage [9, 10].
The structure of the paper is as follows: after a brief literature
review, the methodology is described in Section III. The results of
the analysis are presented and interpreted in Section IV.
The research was supported by Budapest University of Technology and
Economics, as well as by the company “Donley Republic” for bike-sharing.
Ahmed Jaber (e-mail: ahjaber6@edu.bme.hu), Bálint Csonka (e-mail:
csonka.balint@kjk.bme.hu) are with the Department of Transportation
Technology and Economics, Faculty of Transportation Engineering and
II. LITERATURE REVIEW
Several review papers [1, 11] have investigated the factors
affecting bike-sharing demand, such as weather, the built
environment [12], land use, public transportation, spatial aspects
[13], socio-demographics [14], temporal factors, and safety. Several
approaches have been applied to model bike-sharing demand, such as
Poisson regression, negative binomial regression, random forest,
and ARIMA (Autoregressive Integrated Moving Average). This research
relies on the SARIMA method.
ARIMA has been applied to bike-sharing in several fields and with
several objectives. Cho et al. [15] determined the main component
of the adjacency matrix and the node feature matrix to connect
public transportation modes to BSS. Their results imply that the bus
service has a stronger connection to bike-sharing than the subway.
ARIMA was also used by Azimi et al. [16] to determine how the built
environment affects bike ridership in Houston. According to their
findings, the influence of active stations on weekend daily average
ridership is two times greater than on weekday ridership and weekly
average ridership. Temperature and wind speed had no significant
effect on daily average trip counts, whereas precipitation had a
negative impact on daily average ridership. Other studies [17, 18,
19] compared ARIMA with machine learning models for short-term
demand in small areas. The results showed that ARIMA is a good
predictor for bike-sharing schemes.
Feng et al. [20] estimated bike availability at bike stations
based on historical data. ARIMA and Markov queueing were found to
be good predictors of bike availability. Dias et al. [21]
forecasted the status of the BSS stations in Barcelona using
ARIMA. The authors employed publicly available data, such as
weather forecast information, to categorize the state of the
stations using the Random Forest algorithm. Yoon et al. [22]
proposed a personal journey adviser that helps BSS users navigate
cities. The authors modeled the trip-time behavior of real mobile
bikers using ARIMA.
Vehicle Engineering, Budapest University of Technology and Economics,
H1111 Budapest, Műegyetem rkp. 3., Hungary and Janos Juhász (e-mail:
juhasz.janos@emk.bme.hu) is with the Faculty of Civil Engineering,
Budapest University of Technology and Economics, H1111 Budapest,
Műegyetem rkp. 3., Hungary.
Long Term Time Series Prediction of Bike
Sharing Trips: A Case Study of Budapest City
Ahmed JABER, Bálint CSONKA, and Janos JUHÁSZ
2022 Smart City Symposium Prague (SCSP) | 978-1-6654-7923-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/SCSP54748.2022.9792540
Authorized licensed use limited to: BME OMIKK. Downloaded on June 14,2022 at 06:33:24 UTC from IEEE Xplore. Restrictions apply.
Regarding Artificial Neural Network (ANN) models in the
bike-sharing field, Liu et al. [23] developed an ANN model for
predicting bike-sharing demand at several station locations in
New York City, US, achieving an accuracy of 85.2%. Ma et al. [24]
predicted short-run bike-sharing demand at the station level using
historical weather data, users’ personal information, and land-use
data. Similarly, Thu et al. [25] compared the weighted KNN with
ANNs in New York City. Their experimental results show that the
ANNs perform better, indicating that multi-source data can be used
to predict bike pick-up demand with high accuracy.
III. METHODOLOGY
ARIMA models forecast time series, taking periodicity and
seasonality into account, based on equally spaced univariate time
series data, transfer function data, and intervention data. Because
ARIMA is the most representative model in the time-series domain,
many studies in the transportation field have used it as a
baseline. In this study, we applied SARIMA with a weekly seasonal
period to account for differences between the days of the week that
may cause bias in regression models.
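SARIMA (p,d,q)(P,D,Q) is expressed in equation (1):
$$\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,y_t = c + \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \tag{1}$$
Where
$y_t$ dependent variable, which is the daily bike trips;
$\varepsilon_t$ error term, with $B$ the backshift operator ($B y_t = y_{t-1}$) and $s$ the seasonal period;
$c$ constant term;
$\phi$, $\Phi$ coefficients in AR (Auto-Regressive), with $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\Phi_P(B^s)$ defined likewise in $B^s$;
$\theta$, $\Theta$ coefficients in MA (Moving Average), with $\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$ and $\Theta_Q(B^s)$ defined likewise in $B^s$;
p the order of the AR;
P the order of the AR-Seasonal;
q the order of the MA;
Q the order of the MA-Seasonal;
d non-seasonal difference;
D seasonal difference.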
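To make the roles of d and D concrete, the following minimal sketch applies a non-seasonal difference (1 − B) and a seasonal difference (1 − B^s) to a short series. The function name `difference`, the toy numbers, and the choice s = 7 (weekly seasonality of daily data) are assumptions made for illustration; they are not taken from the software used in this study.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag) to `series` `order` times and return the shortened list."""
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# Toy daily series (hypothetical): a linear trend plus a repeating weekly pattern.
weekly_pattern = [0, 2, 4, 3, 3, 8, 5]
trips = [10 * t + weekly_pattern[t % 7] for t in range(21)]

d1 = difference(trips, lag=1)  # d = 1: turns the linear trend into a constant level
D1 = difference(trips, lag=7)  # D = 1 with s = 7: cancels the weekly pattern

print(d1[:7])
print(D1)
```

In this toy case the seasonal difference leaves a constant series, because each value is compared with the same weekday one week earlier.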
Moreover, a conventional artificial neural network (ANN) is
applied in this research. It is one of the most widely used
artificial intelligence algorithms for modeling time-series data in
transportation [26, 27]. ANNs are fully connected systems comprised
of a network of interconnected artificial neurons, generally
arranged in layers. The input layer presents the input data to the
network and communicates with one or more hidden layers via
weighted connections. The hidden layers are linked to the output
layer in the same way. ANNs are widely used as an analytical tool
to solve prediction problems in various research fields [25].
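The layered structure described above can be sketched as a single forward pass through a small multilayer perceptron with one input and one output. The function name `mlp_forward`, the tanh hidden activation with identity output, and all weight values are illustrative assumptions; a trained network would learn its weights from the trip data.

```python
import math


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One-input, one-output MLP: tanh hidden layer followed by an identity output."""
    hidden = [math.tanh(w * x + b) for w, b in zip(hidden_weights, hidden_biases)]
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias


# Hypothetical weights for a network with three hidden neurons.
prediction = mlp_forward(
    x=0.5,
    hidden_weights=[0.8, -0.4, 0.1],
    hidden_biases=[0.0, 0.2, -0.1],
    output_weights=[1.5, -0.7, 0.3],
    output_bias=0.05,
)
print(round(prediction, 4))
```

Each hidden neuron receives the input through a weighted connection, and the output neuron combines the hidden activations through a second set of weights.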
Additionally, as a time-series decomposition method, an
Exponential Smoothing (ETS) model predicts future values from
existing values that follow a seasonal pattern. The calculation is
based on triple exponential smoothing, which combines overall
smoothing, trend smoothing, and seasonal smoothing.
The function calculates the results by multiplying the
seasonality (S) by the trend component (T), which is a longer-term
regular pattern of the time series that differs from the weekly
recurring (7 days) seasonality pattern. After regression analysis,
the trend is calculated by accumulating the least-squares
regression coefficients [28].
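As an illustration of triple exponential smoothing, the sketch below implements a textbook Holt-Winters recursion with multiplicative seasonality. It is not necessarily the exact variant used by the software in this study: the function name, the initialization from the first two seasons, the parameter names `alpha`, `beta`, `gamma`, and the toy data are all assumptions.

```python
def holt_winters_multiplicative(y, season_length, alpha, beta, gamma, horizon):
    """Triple exponential smoothing with multiplicative seasonality.

    Returns `horizon` forecasts following the end of `y`.
    Needs at least two full seasons of data to set the initial values.
    """
    m = season_length
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initial level: mean of the first season. Initial trend: average
    # per-step change between the means of the first two seasons.
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    # Initial seasonal indices: first-season values relative to their mean.
    seasonal = [value / first for value in y[:m]]

    for t in range(m, len(y)):
        s_old = seasonal[t % m]
        previous_level = level
        level = alpha * (y[t] / s_old) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y[t] / level) + (1 - gamma) * s_old

    n = len(y)
    # Forecast = (level + h * trend) times the seasonal index of the target step.
    return [(level + h * trend) * seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Toy series with a repeating three-step pattern and no trend (hypothetical).
history = [80, 100, 120] * 3
forecast = holt_winters_multiplicative(history, 3, alpha=0.3, beta=0.1,
                                       gamma=0.1, horizon=3)
print([round(value, 2) for value in forecast])
```

For daily trip data with a weekly pattern, `season_length` would be 7; the smoothing values above are placeholders rather than the fitted parameters.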
IV. RESULTS AND DISCUSSION
The dataset consists of daily trips from October 2017 to
March 2022 in Budapest, Hungary. The number of trips in the
bike-sharing system rose over this period: it was 289% higher at
the end of the period than at the beginning, indicating increasing
demand for this transportation mode.
Temporally, the summer season has the largest share of
bike-sharing use, with approximately 40% of the trips across the
year. Specifically, July and August have the most bike-sharing
trips, which is expected given the good weather and the larger
number of tourists in the city. Saturdays draw the most trips,
while Mondays draw the fewest. In line with that, weekends attract
nearly one third of total trips. Figures 1 and 2 present the
average daily bike-sharing trips by day of the week and by month.
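The day-of-week statements can be checked with a few lines of arithmetic on the average values shown as data labels in Fig. 1. This is a minimal sketch that assumes every day of the week occurs equally often in the dataset; the variable names are illustrative.

```python
# Average daily trips by day of the week, as read from the data labels of Fig. 1.
avg_trips = {"Mon": 377, "Tue": 384, "Wed": 399, "Thu": 396,
             "Fri": 395, "Sat": 413, "Sun": 396}

total = sum(avg_trips.values())
weekend_share = (avg_trips["Sat"] + avg_trips["Sun"]) / total
busiest = max(avg_trips, key=avg_trips.get)
quietest = min(avg_trips, key=avg_trips.get)

print(f"weekend share: {weekend_share:.1%}")
print(f"busiest: {busiest}, quietest: {quietest}")
```

Under that assumption the weekend share comes to about 29.3%, with Saturday the busiest day and Monday the quietest.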
Fig. 1. Average daily trips per day of the week (data labels: Mon 377, Tue 384, Wed 399, Thu 396, Fri 395, Sat 413, Sun 396)
Fig. 2. Average daily trips per month (x-axis: month, Jan to Dec; y-axis: average daily trips, 0 to 700)
For the ARIMA analysis, an automatic function was used to find the
model that fits the data best. The function evaluates the candidate
models and selects the best one according to the normalized
Bayesian information criterion (NBIC). The Ljung-Box (LB) test was
used to determine model suitability: models with an LB significance
value of more than 0.05 were considered suitable [29]. Table I
shows the R-squared, the normalized BIC, and the Ljung-Box
significance value.
TABLE I
MODEL FIT STATISTICS
R-squared 0.964
Normalized BIC 8.741
Ljung-Box 0.183
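The Ljung-Box check can be sketched as follows. The code computes only the test statistic Q from a list of residuals; converting Q into a significance value requires the chi-square distribution, which statistical packages provide. The function names and the toy residuals are assumptions for illustration.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of `x` at a positive lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k) for k = 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical residuals that alternate in sign, i.e. strongly autocorrelated.
residuals = [1.0, -1.0] * 10
q = ljung_box_q(residuals, max_lag=1)

CHI2_95_DF1 = 3.841  # 5% critical value of the chi-square distribution, 1 d.o.f.
print(q, q > CHI2_95_DF1)
```

A Q above the critical value corresponds to a significance value below 0.05, so such residuals would mark a model as unsuitable under the criterion used here. For a fitted ARIMA model, the degrees of freedom are usually reduced by the number of estimated parameters.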
Accordingly, the best-fit combination of (p,d,q)(P,D,Q) is
(0,1,3)(1,0,1). The actual and predicted data are plotted in
Figure 3.
Fig. 3. Actual and Predicted Data
The resulting SARIMA combination (0,1,3)(1,0,1) means that the
equation contains no non-seasonal Auto-Regressive (AR) parameter
and that the Moving Average (MA) component depends on three lags.
The estimates of these parameters are shown in Table II.
TABLE II
ARIMA MODEL PARAMETERS
| Parameter | Estimate | Significance |
|---|---|---|
| Difference | 1 | 0.000 |
| MA Lag 1 | 0.543 | 0.000 |
| MA Lag 2 | 0.192 | 0.000 |
| MA Lag 3 | 0.096 | 0.000 |
| AR, Seasonal Lag 1 | 0.073 | 0.017 |
| MA, Seasonal | 0.919 | 0.000 |
This means that the forecast of bike-sharing trips for a given
day depends on the three preceding days.
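The final equation of the SARIMA model that is used for
forecasting is represented as follows:
$$(1 - 0.073B^{7})(1 - B)\,y_t = (1 - 0.543B - 0.192B^{2} - 0.096B^{3})(1 - 0.919B^{7})\,\varepsilon_t \tag{2}$$
where $y_t$ is the number of daily trips, $B$ is the backshift operator, and $\varepsilon_t$ is the error term. The equation is written with the estimates of Table II, a seasonal period of 7 days, and moving-average coefficients that enter with a negative sign.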
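To see which past forecast errors enter the fitted model, the non-seasonal and seasonal moving-average polynomials can be multiplied out. The sketch below uses the estimates of Table II and assumes a seasonal period of 7 days and the sign convention 1 − θ₁B − …; the sign convention and the helper name `multiply_lag_polynomials` are assumptions, not details confirmed by the model output.

```python
def multiply_lag_polynomials(a, b):
    """Multiply two polynomials in the backshift operator B.

    Each polynomial is a list of coefficients; index k holds the coefficient of B^k.
    """
    product = [0.0] * (len(a) + len(b) - 1)
    for i, coef_a in enumerate(a):
        for j, coef_b in enumerate(b):
            product[i + j] += coef_a * coef_b
    return product


# Table II estimates, written as 1 - theta_1 B - ... (sign convention assumed).
ma_nonseasonal = [1.0, -0.543, -0.192, -0.096]
ma_seasonal = [1.0] + [0.0] * 6 + [-0.919]  # 1 - 0.919 B^7 (period 7 assumed)

combined = multiply_lag_polynomials(ma_nonseasonal, ma_seasonal)
for lag, coef in enumerate(combined):
    if coef != 0.0:
        print(f"B^{lag}: {coef:+.4f}")
```

Under these assumptions the expanded polynomial has non-zero terms at lags 1 to 3 and at lags 7 to 10, so the seasonal term carries information from the same weekday one week earlier in addition to the three preceding days.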
To assess the accuracy of the model, we calculated the Root
Mean Square Error (RMSE) and the Coefficient of Variation (CV)
as performance indicators, as shown in equations (3) and (4),
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted
value, and n is the total number of cases (days).
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{3}$$
$$\mathrm{CV} = \frac{\mathrm{RMSE}}{\mathrm{Mean}} \tag{4}$$
In the SARIMA model, RMSE is 74.06, which is acceptable
as the CV is 18.7% [30].
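Equations (3) and (4) translate directly into code. The sketch below is a minimal illustration; the function names and the four-day sample values are hypothetical and are not part of the dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, equation (3)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def coefficient_of_variation(actual, predicted):
    """RMSE divided by the mean of the actual values, equation (4)."""
    return rmse(actual, predicted) / (sum(actual) / len(actual))


# Hypothetical observations and forecasts for four days.
actual = [380, 400, 420, 400]
predicted = [390, 390, 430, 410]
print(rmse(actual, predicted))                      # 10.0
print(coefficient_of_variation(actual, predicted))  # 0.025
```

Read the other way round, the reported RMSE of 74.06 and CV of 18.7% imply a mean of roughly 396 trips per day (74.06 / 0.187).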
V. COMPARISON TO ETS AND ANN MODELS
The model fit statistics for ETS are as follows: R-squared
equals 0.958 and the normalized BIC is 8.963, while the Ljung-Box
significance value is above the 0.05 threshold, so the model is
considered suitable. The automatic function selected an alpha of
0.296 and a delta of 0.091.
For the estimation accuracy, the RMSE of the exponential
smoothing model is 87.93, so the coefficient of variation equals
22.2%. Both values are higher than those of SARIMA (74.06 and
18.7%). As a result, SARIMA is better than ETS at explaining this
bike-sharing data in Budapest.
Several multilayer perceptron (MLP) configurations were tested to
get the best results for ANN. The best five networks, which differ
in the number of neurons in the hidden layer, the hidden activation
function, and the output activation function, were then compared to
find the optimal ANN structure. The ANN model is a feedforward
neural network; in the MLP 1-h-1 notation of Table III, the network
has one input, h neurons in the hidden layer, and one output. The
activation functions used are hyperbolic tangent sigmoid (T),
identity (I), logistic (L), and exponential (E).
The performance results for the best five networks are shown
in Table III.
TABLE III
ANN MODELS PERFORMANCE
| Network | Hidden neurons | Hidden activation function | Output activation function | R² (%) | RMSE |
|---|---|---|---|---|---|
| MLP 1-8-1 | 8 | E | L | 96.9 | 84.7 |
| MLP 1-5-1 | 5 | L | T | 96.9 | 84.9 |
| MLP 1-7-1 | 7 | E | L | 96.8 | 85.4 |
| MLP 1-6-1 | 6 | L | T | 96.8 | 85.2 |
| MLP 1-6-1 | 6 | E | E | 96.6 | 87.9 |
In terms of R-squared, SARIMA (0.964) and the ANN models (up to
0.969) are approximately the same, with a marginally better fit for
ANN. In terms of RMSE, SARIMA (74.06) performs better than the best
ANN (84.7).
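The comparison can be summarized by collecting the reported statistics in one place. In the sketch below the numbers are the ones reported in the text and in Table III, with the ANN R-squared converted from a percentage to a fraction; the dictionary layout and labels are illustrative.

```python
# Reported fit statistics per model: (R-squared, RMSE).
reported = {
    "SARIMA (0,1,3)(1,0,1)": (0.964, 74.06),
    "ETS": (0.958, 87.93),
    "ANN MLP 1-8-1": (0.969, 84.7),
}

best_r2 = max(reported, key=lambda name: reported[name][0])
best_rmse = min(reported, key=lambda name: reported[name][1])

print("highest R-squared:", best_r2)
print("lowest RMSE:", best_rmse)
```

The two criteria point to different models, which is why both statistics are reported side by side.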
VI. COVID PANDEMIC EFFECTS ON BIKE-SHARING TRIPS
In this section, an analysis of the trend through different
periods is conducted, highlighting the main differences in
temporal behavior. The dataset is divided into the following
periods:
- Before COVID-19: from 1-10-2017 to 29-2-2020
- During the pandemic: from 1-03-2020 to 31-05-2021
- After COVID-19: from 1-06-2021 to 15-03-2022
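The period split above can be expressed as a small lookup on the trip date. This is a sketch using the boundaries listed above (given as day-month-year in the text); the labels and the function name `period_of` are assumptions.

```python
from datetime import date

# Period boundaries as listed above.
PERIODS = [
    ("before", date(2017, 10, 1), date(2020, 2, 29)),
    ("during", date(2020, 3, 1), date(2021, 5, 31)),
    ("after", date(2021, 6, 1), date(2022, 3, 15)),
]


def period_of(day):
    """Return the period label for a date, or None if it lies outside the dataset."""
    for label, start, end in PERIODS:
        if start <= day <= end:
            return label
    return None


for label, start, end in PERIODS:
    print(label, (end - start).days + 1, "days")
```

Printing the number of days per period makes the unequal period lengths explicit, which matters when monthly or weekly averages are compared across periods.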
The monthly distribution is not a reliable basis for comparison,
because the periods differ in length and do not each cover a full
calendar year. Nevertheless, the months with the most bike-sharing
trips varied: July and August before COVID-19, April and May
during the pandemic, and January and October after the COVID-19
period.
Before the pandemic, weekends had the most bike-sharing trips,
followed by Friday. During and after the pandemic, bike-sharing
use was highest on Friday, followed by Wednesday and Thursday.
Table IV shows the average daily bike-sharing trips in the three
periods.
TABLE IV
AVERAGE DAILY BIKE-SHARING TRIPS AMONG PERIODS
Day Before During After
Monday 84.9 402.3 938.0
Tuesday 85.1 424.4 985.8
Wednesday 83.6 442.7 1003.3
Thursday 82.4 435.4 1020.9
Friday 94.2 453.5 1072.6
Saturday 110.3 422.6 924.5
Sunday 101.0 379.7 923.1
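The shift in weekly behavior can be read directly from Table IV. The sketch below finds the peak day and the weekend share for each period; it assumes that each day of the week occurs equally often within a period, and the variable names are illustrative.

```python
# Average daily trips from Table IV: (before, during, after).
TABLE_IV = {
    "Monday": (84.9, 402.3, 938.0),
    "Tuesday": (85.1, 424.4, 985.8),
    "Wednesday": (83.6, 442.7, 1003.3),
    "Thursday": (82.4, 435.4, 1020.9),
    "Friday": (94.2, 453.5, 1072.6),
    "Saturday": (110.3, 422.6, 924.5),
    "Sunday": (101.0, 379.7, 923.1),
}
PERIOD_LABELS = ("before", "during", "after")

peak_day = {}
weekend_share = {}
for index, period in enumerate(PERIOD_LABELS):
    values = {day: row[index] for day, row in TABLE_IV.items()}
    peak_day[period] = max(values, key=values.get)
    weekend_share[period] = (
        (values["Saturday"] + values["Sunday"]) / sum(values.values())
    )
    print(f"{period}: peak {peak_day[period]}, "
          f"weekend share {weekend_share[period]:.1%}")
```

Under that assumption the weekend share falls from roughly one third before the pandemic to about 27% during and after it, while the peak day moves from Saturday to Friday.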
SARIMA models were fitted to investigate the temporal trend
characteristics in the three periods. For the pre-COVID period,
the best-fit combination of (p,d,q)(P,D,Q) is (1,0,6)(0,1,1).
There is a connection to the day before and to six days before,
while R-squared is 71.9% and RMSE equals 38.2. For the pandemic
period, the best-fit combination is (0,1,3)(1,0,1), the same as
for the overall trend. There is a connection to the previous day
and to three days earlier, while R-squared is 80.2% and RMSE
equals 81.7. For the post-pandemic period, the best-fit
combination is (1,0,2)(1,0,1), with a connection to the day before
and to two days before, while R-squared is 67.5% and RMSE equals
135.4; see Table V.
TABLE V
ARIMA MODEL PARAMETERS FOR THE THREE PERIODS
| Period | Parameter | Estimate | Significance |
|---|---|---|---|
| Pre-COVID | AR Lag 1 | 0.611 | 0.000 |
| Pre-COVID | MA Lag 6 | -0.139 | 0.000 |
| Pre-COVID | Seasonal Difference | 1 | 0.000 |
| Pre-COVID | MA, Seasonal | 0.767 | 0.000 |
| COVID | Difference | 1 | 0.000 |
| COVID | MA Lag 1 | 0.677 | 0.000 |
| COVID | MA Lag 3 | 0.164 | 0.000 |
| COVID | AR, Seasonal | 0.989 | 0.000 |
| COVID | MA, Seasonal | 0.951 | 0.000 |
| After-COVID | Constant | 976.2 | 0.000 |
| After-COVID | AR Lag 1 | 0.904 | 0.000 |
| After-COVID | MA Lag 1 | 0.344 | 0.000 |
| After-COVID | MA Lag 2 | 0.199 | 0.005 |
| After-COVID | AR, Seasonal | 0.998 | 0.000 |
| After-COVID | MA, Seasonal | 0.976 | 0.000 |
VII. CONCLUSION
In this research, several time series models were examined and
applied to reveal the temporal trend of bike-sharing usage over
a long period (about four and a half years). The Seasonal
Auto-Regressive Integrated Moving Average (SARIMA), Exponential
Smoothing (ETS), and Artificial Neural Network (ANN) models were
used. In general, Saturdays, Sundays, and the summer period have
the most bike-sharing trips. SARIMA and ANN fitted the data
acceptably and comparably well, and better than ETS. The moving
average component has a lag of three days. Regarding the COVID-19
period, bike-sharing usage increased during and after the
pandemic. The main differences between the periods are as follows.
The number of bike-sharing trips was highest on weekends before
the pandemic and highest on Fridays and midweek days during and
after it. Also, the number of bike-sharing trips before the
pandemic was affected by the trips 6 days earlier (lag = 6), while
during and after the pandemic the lag was reduced to three and two
days, respectively. These findings help decision-makers and
researchers predict bike-sharing trips more effectively.
REFERENCES
[1]
E. Eren and V. E. Uz, "A review on bike-sharing: The factors affecting
bike-sharing demand," Sustainable Cities and Society, vol. 54, no.
March 2020, p. 101882, 2020.
[2]
X. Wei, S. Luo and Y. Nie, "Diffusion behavior in a docked bike-
sharing system," Transportation Research Part C: Emerging
Technologies, vol. 107, no. October 2019, pp. 510-524, 2019.
[3]
A. Jaber, J. Juhász and B. Csonka, "An Analysis of Factors Affecting
the Severity of Cycling Crashes Using Binary Regression Model,"
Sustainability, vol. 13, no. 12, p. 6945, 2021.
[4]
R. Desta, D. Tesfaye and J. Tóth, "Microscopic Traffic Characterization
of Light Rail Transit Systems at Level Crossings," Advances in Civil
Engineering, vol. 2021, pp. 1-11, 2021.
[5]
S. Cai, X. Long, L. Li, H. Liang, Q. Wang and X. Ding, "Determinants
of intention and behavior of low carbon commuting through bicycle-
sharing in China," Journal of Cleaner Production, vol. 212, no. March
2019, pp. 602-609, 2019.
[6]
E. Murphy and J. Usher, "The Role of Bicycle-sharing in the City:
Analysis of the Irish Experience," International Journal of Sustainable
Transportation, vol. 9, no. 2, pp. 116-125, 2012.
[7]
M. Kabak, M. Erbaş, C. Çetinkaya and E. Özceylan, "A GIS-based
MCDM approach for the evaluation of bike-share stations," Journal of
Cleaner Production, vol. 201, no. November 2018, pp. 49-60, 2018.
[8]
Y. Xu, D. Chen, X. Zhang, W. Tu, Y. Chen, Y. Shen and C. Ratti,
"Unravel the landscape and pulses of cycling activities from a dockless
bike-sharing system," Computers, Environment and Urban Systems,
vol. 75, no. May 2019, pp. 184-203, 2019.
[9]
S. Nagy and C. Csiszár, "Assessment Methods for Comparing Shared
Mobility and Conventional Transportation Modes in Urban Areas,"
Periodica Polytechnica Social and Management Sciences, 2022.
[10]
D. SILVA, D. FÖLDES and C. CSISZÁR, "The effect of modal shift to
micromobility upon the parking demand," in Smart City Symposium
Prague (SCSP), 2021, 2021.
[11]
E. Fishman, "Bikeshare: A Review of Recent Literature," Transport
Reviews: A Transnational Transdisciplinary Journal, pp. 1-22, 2015.
[12]
E. Fishman, S. Washington, N. Haworth and A. Mazzei, "Barriers to
bikesharing: an analysis from Melbourne and Brisbane," Journal of
Transport Geography, vol. 41, no. December 2014, pp. 325-337, 2014.
[13]
D. Fuller, L. Gauvin, Y. Kestens, M. Daniel, M. Fournier, P. Morency
and L. Drouin, "Use of a New Public Bicycle Share Program in
Montreal, Canada," American Journal of Preventive Medicine, vol. 41,
no. 1, pp. 80-83, 2011.
[14]
D. Buck, R. Buehler, P. Happ, B. Rawls, P. Chung and N. Borecki,
"Are Bikeshare Users Different from Regular Cyclists?: A First Look at
Short-Term Users, Annual Members, and Area Cyclists in the
Washington, D.C., Region," Transportation Research Record: Journal
of the Transportation Research Board, vol. 2387, no. 1, pp. 112-119,
2013.
[15]
J. H. Cho, S. W. Ham and D. K. Kim, "Enhancing the Accuracy of
Peak Hourly Demand in Bike-Sharing Systems using a Graph
Convolutional Network with Public Transit Usage Data,"
Transportation Research Record: Journal of the Transportation
Research Board, 2021.
[16]
M. Azimi, L. Zhou and Y. Qi, "Exploring the Impact of Infrastructure
on Bike Sharing System Performance in Houston City," Center for
Advanced Multimodal Mobility Solutions and Education, Charlotte,
2021.
[17]
B. Wang, H. L. Vu, I. Kim and C. Cai, "Short-term traffic flow
prediction in bike-sharing networks," Journal of Intelligent
Transportation Systems, 2021.
[18]
Y. Yang, A. Heppenstall, A. Turner and A. Comber, "Using graph
structural information about flows to enhance short-term demand
prediction in bike-sharing systems," Computers, Environment and
Urban Systems, vol. 83, no. September 2020, p. 101521, 2020.
[19]
T. S. Kim, W. K. Lee and S. Y. Sohn, "Graph convolutional network
approach applied to predict hourly bike-sharing demands considering
spatial, temporal, and global effects," PLOS ONE, vol. 14, no. 9, 2019.
[20]
C. Feng, J. Hillston and D. Reijsbergen, "Moment-based availability
prediction for bike-sharing systems," Performance Evaluation, vol.
117, no. December 2017, pp. 58-74, 2017.
[21]
G. M. Dias, B. Bellalta and S. Oechsner, "Predicting Occupancy Trends
in Barcelona’s Bicycle Service Stations Using Open Data," in 2015 SAI
Intelligent Systems Conference (IntelliSys), London, UK, 2015.
[22]
J. W. Yoon, F. Pinelli and F. Calabrese, "Cityride: a predictive bike
sharing journey advisor," in International Conference on Mobile Data
Management, Bengaluru, India, 2012.
[23]
J. Liu, Q. Li, M. Qu, W. Chen, J. Yang, H. Xiong, H. Zhong and Y. Fu,
"Station Site Optimization in Bike Sharing Systems," in IEEE
International Conference on Data Mining, 2015.
[24]
X. Ma, Y. Yin, Y. Jin, M. He and M. Zhu, "Short-Term Prediction of
Bike-Sharing Demand Using Multi-Source Data: A Spatial-Temporal
Graph Attentional LSTM Approach," Applied Sciences, vol. 12, no. 3,
p. 1161, 2022.
[25]
N. T. H. Thu, L. T. Thanh, C. T. P. Dung, N. Linh-Trung and H. V. Le,
"Multi-source data analysis for bike sharing systems," in 2017
International Conference on Advanced Technologies for
Communications (ATC), 2017, 2017.
[26]
V. Albuquerque, M. S. Dias and F. Bacao, "Machine Learning
Approaches to Bike-Sharing Systems: A Systematic Literature
Review," ISPRS, vol. 10, no. 2, p. 62, 2021.
[27]
M. He, X. Ma and Y. Jin, "Station Importance Evaluation in Dynamic
Bike-Sharing Rebalancing Optimization Using an Entropy-Based
TOPSIS Approach," IEEE Access, vol. 9, pp. 38119-38131, 2021.
[28]
A. Jaber and J. Juhász, "Measuring and Forecasting of Passengers
Modal Split Through Road Accidents Statistical Data," in Intelligent
Solutions for Cities and Mobility of the Future. TSTP 2021. Lecture
Notes in Networks and Systems, 2021.
[29]
M. M. Kifle, T. T. Teklemariam, A. M. Teweldeberhan, E. H.
Tesfamariam, A. K. Andegiorgish and E. A. Kidane, "Malaria Risk
Stratification and Modeling the Effect of Rainfall on Malaria Incidence
in Eritrea," Journal of Environmental and Public Health, vol. 2019, p.
11, 2019.
[30]
L. Al-Hyari and M. Kassai, "Development and Experimental Validation
of TRNSYS Simulation Model for Heat Wheel Operated in Air
Handling Unit," Energies, vol. 13, p. 4957, 2020.
[31]
D. Földes and C. Csiszár, "Personalised information services for
bikers," Int. J. Applied Management Science, vol. 10, no. 1, 2018.