
Supply level planning for shared e-scooters considering spatiotemporal heteroscedastic demand

Transportation Research Interdisciplinary Perspectives 23 (2024) 101019
Available online 7 February 2024
2590-1982/© 2024 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Narith Saum a,b, Mongkut Piantanakulchai a,*, Satoshi Sugiura b
a School of Civil Engineering and Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand
b Division of Engineering and Policy for Sustainable Environment, Hokkaido University, Hokkaido 060-0808, Japan
ARTICLE INFO
Keywords:
Box Cox Transformation
Deep Learning
Machine Learning
SGARCH
Shared E-Scooters
Supply Planning
ABSTRACT
Accurate demand forecasting is key to the success of mobility service businesses, especially shared electric (e-)
scooters, given their volatile demand, high operational costs, and strict regulations. The heteroscedasticity of
transportation demand is usually overlooked even though it is very important for designing efficient supply
management. This study proposed a supply planning framework considering heteroscedasticity in the hourly e-scooter
demand. Three shared e-scooter datasets (Austin TX, Minneapolis MN, and Thammasat TH) were examined to
extract temporal patterns. These features were used as inputs for the demand prediction models, including
machine learning and deep learning models. Then, the squared residuals were subjected to variance prediction,
including constant or daily variance and variance predicted by Autoregressive Conditional Heteroscedasticity
(ARCH). Finally, the outputs of these models were combined to determine the supply level. Four supply level
models (with constant, daily, Seasonal Generalized ARCH or SGARCH, and Box Cox variances) were compared
based on the Mean Oversupply (MO) metric. As a result, demand prediction models with Box Cox transformed
data possibly provide higher prediction accuracy than those with original or normalized data, specifically in terms of Mean
Absolute Error (MAE). Supply level models with Box Cox variance had the lowest MO at lower percentages of
served demand, whereas those with SGARCH variance had lower MO at higher percentages of served demand. At
95 % served demand, considering heteroscedastic demand in supply level planning could reduce oversupply by
26.22 %. From a policy perspective, operators could use our framework to minimize the demand uncertainty for
daily operation, along with other potential policies such as customer incentives and hybrid real-time and periodic
rebalancing.
1. Introduction
Shared electric scooters (e-scooters) have many advantages
compared to the existing shared bikes for their ease of registration,
parking and pick-up convenience (as a dockless mode), and a relaxed
riding experience. Consequently, e-scooter sharing services have gained
so much popularity in many big cities worldwide, where they help
mitigate several urban transportation problems such as congestion,
limited parking space, air pollution, and unsystematic public transit
connectivity. The history of e-scooter development and adoption to
sharing services, regulations, social perception, and advantages/disad-
vantages of this transportation mode was summarized by Saum and
Piantanakulchai (2019). The spatiotemporal comparison between
shared e-scooters and shared bikes was also examined (McKenzie, 2019;
Zhu et al., 2020). McKenzie (2019) studied these two shared modes in
Washington D.C., finding that station-based shared bikes were primarily
used for commuting while shared e-scooters were more commonly used
for leisure, recreation, or tourism. In Singapore, Zhu et al. (2020)
compared the usage patterns of these two shared modes. They concluded
that the usage pattern of shared e-scooters was spatially compact and
denser than shared bikes, although shared e-scooters required higher
costs for rebalancing and charging. In addition, they examined the
correlation between the hourly trip starts (and trip ends) with rainfall
and air temperature. Shared e-scooters were found to be time-saving
during rush hours compared to ride-hailing services (McKenzie, 2020).
According to the previous studies on shared e-scooters, the opera-
tional planning of this mode is more challenging compared to other
transportation modes for some reasons. First, the demand of shared e-
scooters is highly volatile due to their trip characteristics and purposes.
This is because shared e-scooters are typically used for short-range trips
* Corresponding author.
E-mail addresses: saumnarith@gmail.com (N. Saum), mongkut@siit.tu.ac.th (M. Piantanakulchai), sugiura@eng.hokudai.ac.jp (S. Sugiura).
https://doi.org/10.1016/j.trip.2024.101019
Received 28 June 2022; Received in revised form 17 January 2024; Accepted 25 January 2024
of around 1.8 km or 14 min; otherwise, it would no longer be time and
cost-saving (McKenzie, 2020; Smith and Schwieterman, 2018). With a
higher fee than shared bikes, shared e-scooters are not preferable for
commuting trips but for leisure and tourism activities. These irregular
trip purposes and dockless policy lead to inaccurate demand prediction
and require a highly satisfying service level. Second, there are several
important regulations for operators, including registration fee per e-
scooter, limited number of e-scooters per operator, distribution regula-
tion, and response to any spot with excessive e-scooters (Blickstein et al.,
2019). Third, an e-scooter is a lightweight vehicle powered by batteries,
so it requires intensive maintenance (Zhu et al., 2020) and has a
particularly short service life (Moreau et al., 2020). Lastly, charging and rebalancing
operations can produce more emissions than the replaced trips, which
may damage its environmentally friendly reputation (Chester, 2018;
Moreau et al., 2020; Severengiz et al., 2020).
To deal with these challenging problems, proper operational plan-
ning for shared e-scooters is necessary to maximize their positive im-
pacts on urban mobility. Operational planning of shared e-scooters
consists of two main parts: trip forecasting and route optimization for
distribution and rebalancing. However, this study focused on the first
part with the aim of extracting the spatiotemporal patterns from his-
torical trip data. From previous studies in Section 2, many robust pre-
diction models were proposed to forecast the transportation demand,
particularly shared bikes and e-scooters, but most of them only focused
on the accuracy performance. For this reason, the heteroscedasticity of
transportation demand was disregarded; hence, the information from
historical data was not effectively explored. Furthermore, the variance
of heteroscedastic data is not constant, so supply planning must consider
this variation. In other words, the inventory or supply level partly de-
pends on the residuals of the demand prediction model and the heter-
oscedasticity of the data, so variance analysis is required to achieve a
more effective supply level estimation. This can be accomplished by
developing a conditional variance model, such as SGARCH, or utilizing
data transformation techniques like Box Cox transformation.
This study provided three potential contributions to the field of
shared e-scooters and supply level planning. First, the spatiotemporal
patterns of shared e-scooter demand were revealed based on three
different datasets: Thammasat University (Thailand), Minneapolis
(Minnesota), and Austin (Texas). Second, the heteroscedasticity of
shared e-scooter demand was accounted for in designing the supply
level, whereas the Mean Oversupply (MO) metric was proposed to
compare the efficiency at a specific percentage of served demand. Lastly,
the advantages and disadvantages of Box Cox transformation were
revealed, including its impact on demand prediction accuracy and
supply level planning.
This paper is organized into six sections. Section 1 describes the
general background of shared e-scooters, the research gap, and the
objectives of this study. Section 2 outlines the recent studies on
spatiotemporal prediction models in the field of transportation and variance
analysis. The research framework and mathematical expressions are
clarified in Section 3. Section 4 outlines the data preparation and
featuring, while Section 5 reports the demand prediction, variance
prediction, and supply planning. Lastly, Section 6 summarizes the
findings and future studies.
2. Related work
As stated previously, the operational planning for shared e-scooters
is challenging for various reasons, such as the volatile demand, high
operating costs, and strict regulations. Safety stock is typically utilized to
tolerate demand volatility, whereas higher demand variation requires
greater safety stock or inventory (King, 2011). Consequently, opera-
tional costs for shared e-scooters are high due to unproductive e-scooters
(i.e., low usage per e-scooter per day), battery degradation, recharging
costs and emissions (Masoud et al., 2019), and maintenance costs (Zhu
et al., 2020). To improve vehicle equitability, shared e-scooter operators
are advised to distribute e-scooters to specific regions, such as low-
income, minority, and other disadvantaged communities (Clewlow
et al., 2018). However, this objective is difficult to achieve when there is
a limited number of shared e-scooters. To improve operational planning
efficiency for shared e-scooters, previous studies related to demand
uncertainty, including demand prediction models and volatility
analysis, were reviewed in this study.
The prediction of transportation demand is a time-series problem in
which the demand changes over time (daily, weekly, and seasonal
trends). Nonetheless, it is generally considered to be less volatile than
some time-series problems, such as the stock market. As a result, the
heteroscedasticity of transportation demand is mostly ignored, which
means some vital information for operational planning is lost.
Confidence intervals and inventory levels, consisting of the expected demand
and predicted variance, are crucial for supply management. However,
numerous studies have investigated the first term, the expected demand, and many models
were proposed, including statistical regression models, machine
learning algorithms, and deep learning models.
One of the most popular models for time-series data is the Autore-
gressive Integrated Moving Average (ARIMA), while its extension for
seasonal datasets is Seasonal-ARIMA (SARIMA), and other extensions
can be found in StataCorp (2013). Normality and stationarity are
required for ARIMA, but Box Cox transformation can cope with the first
requirement (Rusyana et al., 2016). One of the most accurate models in
machine learning is Random Forest (RF) regression. This model could
have a comparable result with some deep learning architectures (Wang
and Kim, 2018). Sharing the tree-ensemble concept of RF, XGBoost fits the data
based on gradient-boosted decision trees, which effectively reduces the
training time (Chen and Guestrin, 2016).
Recently, Deep Learning has gained popularity for its promising
performance over statistical regression and machine learning models,
especially with the availability of powerful computational devices. The
most basic deep learning models are Artificial Neural Networks (ANNs),
which connect the nodes (artificial neurons) of one layer with those of
another layer through an activation function. To deal with the limitation of
ANNs on sequential data, Recurrent Neural Networks (RNNs) were
proposed by modifying the conventional perceptron to include the
outputs from the previous state, called the recurrent cell (Yu et al.,
2019). This recurrent cell was later extended with several gates inside
(forget gate, input gate, and output gate), called Long Short-Term
Memory Neural Networks (LSTM NNs), to improve the performance and
eliminate the limitations of RNNs, including vanishing and exploding
gradient (Hochreiter and Schmidhuber, 1997). Cho et al. (2014) com-
bined the forget gate and input gate into a single update gate to reduce
the number of trainable parameters of LSTM NNs, called Gated Recur-
rent Units (GRUs). Several other extensions of RNNs can be found in (Yu
et al., 2019).
These popular machine learning models and deep learning models
have been widely applied in transportation demand prediction for their
advantage of learning temporal patterns. For example, Xu et al. (2018)
used LSTM NNs to predict the spatiotemporal demand of dockless shared
bikes in Nanjing (Jiangsu), China. Similarly, Wang and Kim (2018)
employed RF, LSTM NNs, and GRUs to predict station-level bike avail-
ability in Suzhou (China), and these models yielded almost the same
performance. This architecture of LSTM NNs was also used to predict
multi-step bike availability (Liu et al., 2019). Likewise, Gradient
Boosting Regression Trees (GBRT), alongside four other baseline models
(ARIMA, RF, ANNs, and LSTM NNs), were employed to forecast station-
level shared-car rentals in Shanghai (Wang et al., 2021). To improve the
performance of standard LSTM NNs, Le Quy et al. (2019) proposed
Neighborhood-Augmented LSTM NNs that include the historical data of
neighboring regions to forecast taxi-passenger demand in Porto,
Portugal. Zhang et al. (2020) co-predicted the taxi pick-up and drop-off
demands using a multi-task learning model consisting of three parallel
LSTM layers. The nonlinear Granger causality test was employed to
enhance the spatiotemporal feature selection of LSTM NNs for short-
term taxi demand forecasting (Luo et al., 2021). Li et al. (2021) proposed
a Spatial-Temporal Memory Network (STMN) to predict the hourly de-
mand of bike-sharing in four different cities: Singapore, Taipei, Chicago,
and New York. Xu et al. (2020) examined a novel Multi-Block Hybrid
(MBH) model in predicting the supply-demand of the bike-sharing
system in Shanghai. A recent study utilized various tree-based models,
such as DT, RF, ExtraTree, XGBoost, CatBoost, and LightGBM, to forecast
the nationwide census-level population inflow in the USA (Hu et al.,
2023).
Several machine learning and deep learning models were also
employed and adapted to predict the spatiotemporal demand of shared
e-scooters. For instance, He and Shin (2020) proposed a deep learning
model called graph capsule neural networks (GCScoot) to predict the
spatiotemporal trip flow of shared e-scooters in three cities: Austin TX,
Louisville KY, and Minneapolis MN. Ham et al. (2021) proposed an
Encoder-Recurrent neural network-Decoder (ERD) framework to
predict served and unmet demand of shared e-scooters operated in
Gwangjin district, Seoul, South Korea. Similarly, Khan et al. (2022)
developed a bagging ensemble of XGBoost, RF, and Extra Tree regressors
to forecast the daily demand of e-scooter sharing on Jeju Island, South
Korea. The shared e-scooter demand in Austin TX, and Louisville KY was
predicted using a deep learning model called 3D-CloST, and then the
surplus and shortfall e-scooters were relocated by dedicated workers
using a simple greedy strategy (Tolomei et al., 2021).
On the other hand, volatility or variance analysis has been
predominantly studied in the econometric field, where it could provide
more valuable information to support decision-making. Autoregressive
Conditional Heteroscedasticity (ARCH) is a statistical regression model
used to predict future variance or volatility (StataCorp, 2013). ARCH has
two different models, ARCH in variance and ARCH in mean (ARCH-M).
ARCH has only the squared residuals from the previous lags as the in-
dependent variables, while the Generalized ARCH (GARCH) also in-
cludes the past variances. Many extensions of GARCH have been
proposed, such as Power ARCH, Threshold ARCH, Exponential ARCH,
etc. Several models of GARCH were applied to predict the return rate of
the daily closing price of the Shanghai and Shenzhen 300 Index (Wu,
2011). Similarly, Ti et al. (2019) employed ARCH-M (ARMA-GARCH
and ARMA-TARCH) to forecast the volatility of traditional and sus-
tainable stock indices from the FTSE4Good index series family.
Furthermore, ARMA-GARCH, SARIMA-GARCH, and SARIMA-SGARCH
were trained to predict the precipitation index (Zhang et al., 2019),
daily peak electricity demand (Sigauke and Chikobvu, 2011), and
internet traffic (Kim, 2011), respectively. ARCH-M could slightly
improve the prediction accuracy of ARIMA, but it may struggle with the
convergence criteria and training time. Recently, GARCH was combined
with some deep learning models to predict the price volatility of main
metals like Gold, Silver, and Copper (Hu et al., 2020; Kristjanpoller and
Hernández, 2017).
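As a compact reminder of the distinction drawn above between ARCH and GARCH, the textbook forms of the two variance equations can be written as follows. The orders q and p and the coefficient symbols α and β are generic notation assumed here for illustration; they are not taken from the cited studies.

$$\text{ARCH}(q):\quad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2$$
$$\text{GARCH}(p,q):\quad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$

Here ε denotes the residual of the mean model and σ² the conditional variance, so GARCH differs from ARCH only by the additional sum over past variances.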
Throughout the literature review, two research gaps were observed.
First, many machine learning models and comprehensive deep learning
architectures were proposed to forecast the transportation demand, but
none examined the residuals. These studies focused only on the accuracy
performance of demand prediction models, but they did not consider the
supply level planning, which partly depends on the demand variation.
Second, no attempts have been made to apply the ARCH model to forecast
the conditional variance of transportation demand. Therefore, this
research aims to fill these gaps by bridging these two methodologies to
develop a practical supply level planning framework for the new
transportation mode, shared dockless e-scooters.
3. Methodology
3.1. Research framework
To achieve the purposes of this study, the research framework was
separated into five steps, including data preparation, data
transformation, demand prediction, variance prediction, and supply
level design (see Fig. 1). The first step involved collecting, encoding, and
featuring shared e-scooter data, weather attributes, annual events,
public holidays, day of the week, and time of the day. Based on the
literature review, the data were mostly normalized between 0 and 1 to
be trainable with some specic activation functions, but training using
the original scale was also found. Since Box Cox transformation could
improve the prediction accuracy and minimize the heteroscedasticity
effect (Saum et al., 2020), it was added as another data transformation
option in the second step.
In the third step, several machine learning and deep learning models
were developed to predict the hourly demand of shared e-scooters, while
their hyperparameters were optimized using Grid Search and Bayesian
Optimization. The demand prediction models included Seasonal
Autoregressive Integrated Moving Average with exogenous variables
(SARIMAX), Random Forest (RF), Extreme Gradient Boosting
(XGBoost), Fully Connected Neural Networks (FCNNs), Recurrent Neu-
ral Networks (RNNs), and Gated Recurrent Units (GRUs). The main
objective of the performance comparison between Box Cox and original/
normalized data was to show that, in contrast to original or normalized
data, the residuals of the prediction models using Box Cox transformed
data had no ARCH effects. Subsequently, the residuals of these models
were used to forecast the future variance.
Since Box Cox transformation can remove the heteroscedasticity
(Rusyana et al., 2016; Saum et al., 2020), its variance is constant. On the
other hand, the most accurate models between original and normalized
data were chosen for variance analysis under three cases: constant
variance, daily variance, and predicted variance by SGARCH. Therefore,
the variance of the Box Cox transformed data has only one model
(Constant Variance), and that of original or normalized data has three
models (Constant Variance, Daily Variance, and SGARCH Variance). For
variance analysis and supply level design, we only investigated three
models (SARIMAX, XGBoost, and GRUs) because XGBoost and RF have
comparable prediction results, and GRUs have similar performance to
FCNNs and RNNs (see Table 4). Finally, the predicted demand (step 3)
and predicted variance (step 4) were used to design the Supply Level in
step 5. In this step, the Mean Oversupply (MO) metric was proposed to
compare the performance of the four supply level models at a specific
percentage of served demand ranging from 70 % to 98 %. It is worth
noting that the proposed framework was primarily designed for one-step
(i.e., one hour) ahead prediction. However, it could be further extended
for multi-step ahead prediction in future studies or practical
applications.
Fig. 1. Research framework.
3.2. Data transformation
This paper investigated three types of data transformation: original
scale, normalized scale, and Box Cox transformation. Normalization is a
popular transformation technique that likely improves prediction per-
formance. It has several formulations for different purposes, including
changing the input data to have the same scale (min-max normalization
or 0-1 scale) or similar distribution (mean and Z-score normalization).
Min-max normalization in Eq. (1) was employed to transform inputs and
outputs so that deep learning models could have output activation
functions like Tanh and Sigmoid. The formulation of min-max
normalization was given as follows:
$$x_t^{norm} = \frac{x_t - \min(x_t)}{\max(x_t) - \min(x_t)} \tag{1}$$
where $x_t^{norm}$ is the normalized scale of the variable $x_t$ at time interval $t$,
including the e-scooter demand and exogenous variables.
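The scaling in Eq. (1) and its inverse can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the sample demand values are hypothetical, and the minimum and maximum are assumed to be taken from the series being scaled.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1] following Eq. (1); also return (min, max)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for a constant series")
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def min_max_denormalize(normalized, bounds):
    """Invert the scaling so predictions can be read on the original scale."""
    lo, hi = bounds
    return [v * (hi - lo) + lo for v in normalized]


hourly_demand = [0, 3, 12, 7, 20]            # hypothetical hourly pick-ups
scaled, bounds = min_max_normalize(hourly_demand)
print(scaled)                                 # values between 0 and 1
print(min_max_denormalize(scaled, bounds))    # back on the original scale
```

Keeping the bounds alongside the scaled series is a design choice of this sketch: predictions made on the normalized scale have to be mapped back with the same minimum and maximum before errors or supply levels are computed.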
Box Cox is a powered monotonic transformation that stabilizes the
variance, minimizes skewness, and makes the data more Gaussian-like
based on the likelihood maximization technique. This transformation
requires the input data to be strictly positive, while the generalized form
supports both positive and negative data and improves the normality and
symmetry, called Yeo-Johnson Transformation (Yeo and Johnson, 2000).
The expressions of Box Cox transformation and log-likelihood function
are as follows:
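$$x_{t,r}^{BC} = \begin{cases} \lambda_r^{-1}\left[(x_{t,r}+1)^{\lambda_r} - 1\right] & \text{if } \lambda_r \neq 0,\ x_{t,r} \geq 0 \\ \ln(x_{t,r}+1) & \text{if } \lambda_r = 0,\ x_{t,r} \geq 0 \\ -\left[(-x_{t,r}+1)^{2-\lambda_r} - 1\right]/(2-\lambda_r) & \text{if } \lambda_r \neq 2,\ x_{t,r} < 0 \\ -\ln(-x_{t,r}+1) & \text{if } \lambda_r = 2,\ x_{t,r} < 0 \end{cases} \tag{2}$$
where $\theta_r = (\lambda_r, \mu_r, \sigma_r^2)$, $X_r = (x_{1,r}, x_{2,r}, x_{3,r}, \ldots, x_{T,r})$ and $x_{t,r}^{BC} \sim N(\mu_r, \sigma_r^2)$.
Therefore, the best estimators of $\mu_r$ and $\sigma_r^2$ could be computed by maximizing
the log-likelihood function at any fixed value of $\lambda_r$ as follows:
$$\hat{\mu}_r(\lambda_r) = \frac{1}{T}\sum_{t=1}^{T} x_{t,r}^{BC} \tag{4}$$
$$\hat{\sigma}_r^2(\lambda_r) = \frac{1}{T}\sum_{t=1}^{T}\left(x_{t,r}^{BC} - \hat{\mu}_r(\lambda_r)\right)^2 \tag{5}$$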
Therefore, $\hat{\theta}_r = \left(\hat{\lambda}_r, \hat{\mu}_r(\hat{\lambda}_r), \hat{\sigma}_r^2(\hat{\lambda}_r)\right)$ could be obtained by maximizing
the log-likelihood function in Eq. (3). $x_{t,r}^{BC}$ is the Box Cox scale of
e-scooter demand $x_{t,r}$ at time $t$ and region $r$. $\lambda_r$ is the parameter of Box Cox
transformation for region $r$. This means that e-scooter demand was
transformed by Box Cox independently for each region, while other exogenous
variables were not transformed. Both input and output were transformed;
otherwise, the residuals would not be homoscedastic. Since the hourly
demand of shared e-scooters is a nonnegative variable, the
transformation falls into the first case of Eq. (2). However, this equation has
a maximum requirement, i.e., the maximum value of $x_{t,r}^{BC}$ must be less
than $-1/\lambda_r$; otherwise, it cannot be converted back. In other words, the
predicted transformed demand $\hat{x}_{t,r}^{BC}$, including the supply level, must
follow this requirement, specifically when $\lambda_r < 0$.
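The nonnegative branch of Eq. (2), its inverse with the upper bound discussed above, and the choice of λ by likelihood maximization can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the sample demand values, and the simple grid search over λ are hypothetical and are not the estimation routine used in the study. The profile log-likelihood substitutes the estimators of Eqs. (4) and (5) into Eq. (3).

```python
import math


def box_cox_shifted(x, lam):
    """First two cases of Eq. (2), valid for nonnegative demand x."""
    if x < 0:
        raise ValueError("this sketch only covers nonnegative demand")
    if lam == 0:
        return math.log(x + 1.0)
    return ((x + 1.0) ** lam - 1.0) / lam


def inverse_box_cox_shifted(y, lam):
    """Map a transformed value back to the demand scale."""
    if lam == 0:
        return math.exp(y) - 1.0
    base = lam * y + 1.0
    if base <= 0:
        # For lam < 0 this is the bound y < -1/lam discussed in the text.
        raise ValueError("transformed value cannot be converted back")
    return base ** (1.0 / lam) - 1.0


def profile_log_likelihood(xs, lam):
    """Eq. (3) with mu and sigma^2 replaced by the estimators of Eqs. (4)-(5)."""
    t = len(xs)
    ys = [box_cox_shifted(x, lam) for x in xs]
    mu = sum(ys) / t
    var = sum((y - mu) ** 2 for y in ys) / t
    jacobian = (lam - 1.0) * sum(math.log(x + 1.0) for x in xs)
    return (-0.5 * t * math.log(2 * math.pi)
            - 0.5 * t * math.log(var) - 0.5 * t + jacobian)


demand = [0, 1, 1, 2, 3, 5, 8, 13, 21, 40]      # hypothetical hourly pick-ups
grid = [i / 20 for i in range(-20, 41)]         # candidate lambdas in [-1, 2]
best_lam = max(grid, key=lambda lam: profile_log_likelihood(demand, lam))
print(best_lam)
```

The explicit check in the inverse makes the bound visible: a predicted value (or supply level) on the transformed scale that reaches −1/λ for a negative λ has no counterpart on the demand scale and has to be capped or rejected before back-transformation.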
3.3. Demand prediction
The GRU model is a popular deep learning model in the family of
recurrent neural networks. This architecture has only two gates: reset
gate and update gate. Thus, it requires a shorter training time than LSTM
NNs (Wang and Kim, 2018) with comparable performance (Kumar et al.,
2018). For this reason, GRUs are more suitable for hyperparameter
tuning than LSTM NNs, particularly when many parameters need to be
optimized. The learning process of standard GRUs (Yu et al., 2019) could
be expressed as follows:
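$$r_t = \mathrm{sigmoid}(W_{rh} h_{t-1} + W_{rx} x_t + b_r) \tag{6}$$
$$z_t = \mathrm{sigmoid}(W_{zh} h_{t-1} + W_{zx} x_t + b_z) \tag{7}$$
$$\tilde{h}_t = \tanh\left(W_{\tilde{h}h}(r_t \odot h_{t-1}) + W_{\tilde{h}x} x_t + b_{\tilde{h}}\right) \tag{8}$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{9}$$
$$\mathrm{sigmoid}(x) = 1/(1 + e^{-x}) \tag{10}$$
$$\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x}) \tag{11}$$
where $\odot$ denotes pointwise multiplication of two matrices, called the Hadamard
product. $W$ and $b$ are trainable weight matrices and bias vectors,
respectively. $r_t$ is the reset gate, and $z_t$ is the update gate. In this case, the GRU
output $h_t$ at time $t$ is a linear interpolation between the previous output
$h_{t-1}$ and the candidate output $\tilde{h}_t$.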
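A single GRU update following Eqs. (6)-(9) can be sketched in plain Python to make the gate arithmetic concrete. This is only an illustration: the helper names, the dictionary keys, and the weight values are hypothetical, a hidden size of 2 and an input size of 1 are assumed, and a real model would use a deep learning library with trained weights.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def add(*vectors):
    """Element-wise sum of several vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def hadamard(u, v):
    """Pointwise (Hadamard) product."""
    return [a * b for a, b in zip(u, v)]


def gru_step(x_t, h_prev, p):
    """One GRU update; p holds the weights and biases of Eqs. (6)-(8)."""
    r_t = sigmoid(add(matvec(p["W_rh"], h_prev), matvec(p["W_rx"], x_t), p["b_r"]))
    z_t = sigmoid(add(matvec(p["W_zh"], h_prev), matvec(p["W_zx"], x_t), p["b_z"]))
    h_cand = tanh(add(matvec(p["W_hh"], hadamard(r_t, h_prev)),
                      matvec(p["W_hx"], x_t), p["b_h"]))
    one_minus_z = [1.0 - z for z in z_t]
    # Eq. (9): interpolate between the previous state and the candidate state.
    return add(hadamard(one_minus_z, h_prev), hadamard(z_t, h_cand))


# Hypothetical (untrained) parameters: hidden size 2, input size 1.
params = {
    "W_rh": [[0.1, -0.2], [0.0, 0.3]], "W_rx": [[0.5], [-0.4]], "b_r": [0.0, 0.1],
    "W_zh": [[0.2, 0.1], [-0.1, 0.2]], "W_zx": [[0.3], [0.6]], "b_z": [0.0, 0.0],
    "W_hh": [[0.4, 0.0], [0.1, -0.3]], "W_hx": [[0.7], [0.2]], "b_h": [0.0, -0.1],
}
h1 = gru_step([0.8], [0.0, 0.0], params)
print(h1)
```

Applying `gru_step` repeatedly over a sequence of hourly inputs, feeding each output back as `h_prev`, gives the recurrent behaviour described in the text.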
The introduction of other benchmark models (RNNs, FCNNs,
XGBoost, Random Forest, and SARIMAX) was provided in Section 5.1,
along with the hyperparameter setting. These demand prediction
models were compared based on two popular metrics, including Mean
Absolute Error (MAE) and Root Mean Squared Error (RMSE) as follows:
$$MAE = \frac{1}{RT}\sum_{r=1}^{R}\sum_{t=1}^{T}\left|D_{(t,r)} - \hat{D}_{(t,r)}\right| \tag{12}$$
$$RMSE = \sqrt{\frac{1}{RT}\sum_{r=1}^{R}\sum_{t=1}^{T}\left(D_{(t,r)} - \hat{D}_{(t,r)}\right)^2} \tag{13}$$
where $D_{(t,r)}$ and $\hat{D}_{(t,r)}$ are the actual hourly demand and predicted
demand of shared e-scooters at the time $t$ of the region $r$, respectively.
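The two metrics in Eqs. (12) and (13) can be sketched as below. The nested-list layout (one list of hourly values per region, all of equal length) and the sample numbers are assumptions made for illustration.

```python
import math


def mae(actual, predicted):
    """Eq. (12): mean absolute error over all regions and hours."""
    errors = [abs(d - d_hat)
              for region_a, region_p in zip(actual, predicted)
              for d, d_hat in zip(region_a, region_p)]
    return sum(errors) / len(errors)


def rmse(actual, predicted):
    """Eq. (13): root mean squared error over all regions and hours."""
    errors = [(d - d_hat) ** 2
              for region_a, region_p in zip(actual, predicted)
              for d, d_hat in zip(region_a, region_p)]
    return math.sqrt(sum(errors) / len(errors))


actual = [[4, 0, 7], [2, 5, 1]]        # hypothetical demand, 2 regions x 3 hours
predicted = [[3, 1, 9], [2, 4, 1]]     # hypothetical predictions
print(mae(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares the errors before averaging, it penalizes the single two-unit miss in this example more heavily than MAE does, which is why the two metrics can rank models differently.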
3.4. Variance prediction
$$ll_T(\theta_r \mid X_r) = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log\sigma_r^2 - \frac{1}{2\sigma_r^2}\sum_{t=1}^{T}\left(x_{t,r}^{BC} - \mu_r\right)^2 + (\lambda_r - 1)\sum_{t=1}^{T}\mathrm{sign}(x_{t,r})\log\left(|x_{t,r}| + 1\right) \tag{3}$$
As we know, the actual data could not be accurately predicted
(i.e., $y = \hat{y} + \sigma(X)\varepsilon$) as we assumed no related error in the observed
data. In this case, $\hat{y} = E(y|X)$, $E(\varepsilon|X) = 0$, $Var(\varepsilon|X) = E(\varepsilon^2|X) - E^2(\varepsilon|X) = 1$,
$Var(y|X) = \sigma^2(X) > 0$, and $X$ and $\varepsilon$ are independent. Homoscedasticity
refers to the condition that the variance is constant; otherwise, the data are
heteroscedastic. Model diagnostics are necessary for probabilistic-based
models, starting with assumptions such as data stationary, data distri-
bution, and homoscedasticity. However, this step was mostly neglected
in machine learning and deep learning. In the case of heteroscedastic
data, the variance can be formulated as the function of the random
variables X. In the time-series problem, the autocorrelation of squared
residuals and the Lagrange Multiplier (ARCH-LM) test were mainly
employed to confirm the heteroscedasticity of the residuals. The
variance was usually formulated as a function of the previous variance and squared
error. This conditional variance stands on the idea that the periods of
high and low variance are grouped together (StataCorp, 2013). At this
point, there are two possible options: to allow or not to allow the
conditional variance to influence the conditional mean. Simultaneous
prediction (i.e., including conditional variance into the conditional mean)
may have a nonconvex objective function, and it is computationally
expensive since there are more parameters to be estimated, especially
for hyperparameter tuning. Therefore, this study chose to predict the
expected mean and the conditional variance separately (i.e., ignore the
conditional variance on the expected mean). This technique has a few
advantages, such as uncomplicated model formulation (univariate
variance model) and easier hyperparameter tuning for both demand and
variance prediction. The disadvantage, however, is forgoing the possible
accuracy improvement from including the conditional variance in the
demand prediction model. For instance, Trapero et al. (2019) employed
ARIMA to predict the demand and GARCH to predict the variance for
safety stock estimation. Similarly, the residuals of demand prediction
from Section 3.3 were used to train the variance models. Three variance
models were formulated, including constant variance in Eq. (14), daily
seasonal variance in Eq. (15), and predicted variance by SGARCH in Eq.
(16) as given below:
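$$\sigma^2_{con}(r) = \frac{1}{T}\sum_{t=1}^{T}\varepsilon^2_{(t,r)} \tag{14}$$
$$\sigma^2_{seas}(t,r) = \frac{1}{N}\left[\varepsilon^2_{(t-24,r)} + \varepsilon^2_{(t-2 \cdot 24,r)} + \cdots + \varepsilon^2_{(t-N \cdot 24,r)}\right] \tag{15}$$
$$\sigma^2_{SGARCH}(t,r) = a_0 + a_1\varepsilon^2_{(t-1,r)} + a_2\varepsilon^2_{(t-2,r)} + a_3\varepsilon^2_{(t-24,r)} + b_1\sigma^2_{(t-1,r)} + b_2\sigma^2_{(t-2,r)} + b_3\sigma^2_{(t-24,r)} \tag{16}$$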
where the constant variance of the region $r$, $\sigma^2_{con}(r)$ in Eq. (14), is simply
the average squared residuals of the predicted demand in that region.
Similarly, the seasonal variance $\sigma^2_{seas}(t,r)$ in Eq. (15) is the average
squared residuals of the predicted demand at the same hour of the day. $N$
is the total number of days. The average of seasonal variance
theoretically equals the constant variance, but it is mostly slightly smaller
because the mean of evaluation residuals tends to differ from zero,
$\sigma^2_{con}(r) = \frac{1}{24}\sum_{t=1}^{24}\sigma^2_{seas}(t,r) + \frac{1}{24}\sum_{t=1}^{24}\left(E[\varepsilon_{seas}(t,r)] - E[\varepsilon(r)]\right)^2$. The
constant and daily seasonal variances were calculated based on the training
dataset. Lastly, $\sigma^2_{SGARCH}(t,r)$ in Eq. (16) is the predicted variance by
SGARCH, which was trained independently for each region. SGARCH was
trained with maximum log-likelihood estimation (StataCorp, 2013). A
daily seasonal pattern (S = 24) was employed, while the insignificant
parameters (at the 95 % level) in this equation would be dropped.
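The three variance models of Eqs. (14)-(16) can be sketched for a single region as follows. This is a simplified illustration under stated assumptions: the function names, the residual series, and the SGARCH coefficients are hypothetical, the coefficients are supplied by hand rather than estimated by maximum likelihood as in the study, and the first 24 hours of the SGARCH recursion are seeded with an assumed initial variance.

```python
def constant_variance(residuals):
    """Eq. (14): mean squared residual of one region."""
    return sum(e * e for e in residuals) / len(residuals)


def seasonal_variance(residuals, season=24):
    """Eq. (15): mean squared residual for each hour of the day."""
    buckets = [[] for _ in range(season)]
    for t, e in enumerate(residuals):
        buckets[t % season].append(e * e)
    return [sum(b) / len(b) for b in buckets]


def sgarch_variance(residuals, a, b, init_var, season=24):
    """Eq. (16) as a recursion; a = (a0, a1, a2, a3), b = (b1, b2, b3)."""
    a0, a1, a2, a3 = a
    b1, b2, b3 = b
    sq = [e * e for e in residuals]
    var = [init_var] * season      # assumption: warm-up hours use init_var
    for t in range(season, len(residuals) + 1):
        var.append(a0
                   + a1 * sq[t - 1] + a2 * sq[t - 2] + a3 * sq[t - season]
                   + b1 * var[t - 1] + b2 * var[t - 2] + b3 * var[t - season])
    return var                     # var[t] is the one-step-ahead variance for hour t


# Hypothetical residuals: two days, calmer mornings and more volatile afternoons.
residuals = [1.0 if t % 24 < 12 else -3.0 for t in range(48)]
sigma2_con = constant_variance(residuals)
sigma2_seas = seasonal_variance(residuals)
sigma2_sgarch = sgarch_variance(residuals, a=(0.2, 0.1, 0.05, 0.3),
                                b=(0.2, 0.05, 0.1), init_var=sigma2_con)
print(sigma2_con, sigma2_seas[0], sigma2_seas[23], sigma2_sgarch[-1])
```

The last element of the SGARCH list is the forecast for the hour immediately after the residual series ends, which matches the one-step-ahead setting of the framework.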
3.5. Supply planning
As previously stated, shared dockless e-scooters face many chal-
lenges in daily operation such as short-range trips with unforeseeable
trip purposes, high operational costs, unproductive e-scooters, emissions
from rebalancing, and strict regulations. For station-based shared bikes,
the number of trips is mostly limited by the number of docks in the
station, but dockless shared e-scooters likely have a wider range of de-
mand (higher volatility). Thus, the operators must use rebalancing
strategies to balance unserved demand (or shortages) and operational
constraints. Rebalancing (or relocating) refers to the process of regularly
distributing (or collecting) the e-scooters to starving (or from excessive)
regions according to the target inventory level or supply level. Under
various operational constraints, the operator can choose between peri-
odic rebalancing (based on historical data) or real-time rebalancing.
Periodic rebalancing cost depends on the frequency, several times per
day or during peak hours. For real-time rebalancing, the rebalancing
vehicle may visit only a few locations if the number of available e-
scooters falls below (or higher than) the limited threshold value. Real-
time rebalancing can respond to unusual demand on time that can be
tracked through indicators such as local events or fairs, app login ac-
tivities, number of new registrations, and distribution of active users.
However, this strategy requires higher operational costs since the staff
must standby for rebalancing calls. Therefore, it is less popular in
practical operations (Shui and Szeto, 2020).
As shown in Fig. 1, the proposed framework in this study aims to
assist the periodic rebalancing by extracting all helpful information from
the historical data to forecast the future demand and variance for
designing effective supply levels or inventory levels. The research
defines the supply level as the total supply, which includes supplies from
the operator's rebalancing, drop-offs, and available e-scooters around
the area. It is noted that operators aiming to rebalance e-scooters must
take into account variations in drop-off demand, stock level, and lead
time, all of which may be approximated using models in this study. In
our approach, we estimate the total supply from the demand side (pick-
up demand). This study focuses on determining the level of overall de-
mand prior to rebalancing; hence, the total supply is derived from the
demand side by using only pick-up data. In other words, the term
"supply level" (referring to inventory or order-up-to level) in this study
has the same formulation as a confidence interval, namely the sum of
predicted pick-up demand (from Section 3.3) and safety stock (based on the
predicted variance from Section 3.4). The comparison of supply level
models was examined to reveal the effectiveness of accounting for the
heteroscedasticity of shared e-scooter demand for operational planning.
Safety stock is the inventory held to prevent stockouts caused by
fluctuating demand, forecast inaccuracy, and supply lead time (King, 2011).
For station-based shared bikes, the supply level is designed according to
the target service level (or probability of a shortage event) for both
pick-up and drop-off trips (King, 2011; O'Mahony, 2015). In the case of
dockless shared e-scooters, the user is free to finish the trip anywhere,
so the service level of drop-off trips can be ignored. Since different
supply level models have different service levels and backorder levels, the
curves of deviation from the target cycle service level and backorder level
(by scaled safety stock) are usually employed to compare supply level or
inventory level models (Trapero et al., 2019). However, this study compared
the supply level models at the same percentage of served demand (see
Fig. 2), so they could be compared using a single metric, Mean Oversupply.
The expressions of Supply Level ($S_{(t,r)}$), Served Demand ($SD_{(t,r)}$),
Percentage of Served Demand ($P$), Oversupply ($O_{(t,r)}$), and Mean
Oversupply ($MO$) are defined as follows:
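$$S_{(t,r)} = \hat{D}_{(t,r)} + d \cdot \hat{\sigma}_{(t,r)} \tag{17}$$
$$SD_{(t,r)} = \min\left(D_{(t,r)},\, S_{(t,r)}\right) \tag{18}$$
$$P = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T} SD_{(t,r)}}{\sum_{r=1}^{R} \sum_{t=1}^{T} D_{(t,r)}} \tag{19}$$
$$O_{(t,r)} = \max\left(S_{(t,r)} - D_{(t,r)};\, 0\right) \tag{20}$$
$$MO = \frac{1}{RT} \sum_{r=1}^{R} \sum_{t=1}^{T} O_{(t,r)} \tag{21}$$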
where Eq. (17) gives the supply level $S_{(t,r)}$ as the sum of the
predicted hourly demand $\hat{D}_{(t,r)}$ and the predicted safety stock at
time interval $t$ in region $r$. The safety stock in this equation is the
product of the predicted standard deviation $\hat{\sigma}_{(t,r)}$ and the
safety stock parameter $d$, which is a function of the target service level
($Z_{score}$), lead time, and time increment (King, 2011; Seo, 2020;
Trapero et al., 2019). The predicted standard deviation
$\hat{\sigma}_{(t,r)}$ is the square root of the predicted variance
described in Section 3.4, which may be the constant variance, the daily
seasonal variance, or the conditional variance from SGARCH. Lead time and
time increment depend on the rebalancing frequency, so these two parameters
are constant across the supply level models and were therefore set to
unity. In this case, $d$ and $Z_{score}$ are equivalent, but the safety
stock parameter $d$ was manually adjusted to reach the target percentage of
served demand (see Fig. 2).
Served demand in Eq. (18) is the minimum of the actual demand $D_{(t,r)}$
and the supply level $S_{(t,r)}$. If the supply level is smaller than the
actual demand, some demand is unserved (i.e., $SD_{(t,r)} = S_{(t,r)}$).
Conversely, if the supply level is higher than the actual demand, there is
some oversupply, as in Eq. (20) (i.e., $SD_{(t,r)} = D_{(t,r)}$ and
$O_{(t,r)} = S_{(t,r)} - D_{(t,r)} \geq 0$). The percentage of served
demand in Eq. (19) is the ratio of the total expected served demand to the
total actual demand. Since the total actual demand equals the sum of served
and unserved demand, the percentage of unserved demand equals one minus the
percentage of served demand ($1 - P$). Therefore, a supply level model is
considered efficient if it has the smallest mean oversupply in Eq. (21)
while retaining the same percentage of served (or unserved) demand. At a
specific percentage of served demand, the safety stock parameter may take a
different value for each supply level model (see Fig. 2). $R$ and $T$ are
the total numbers of regions and time intervals, respectively.
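Eqs. (17) to (21) can be evaluated directly once the actual demand, the predicted demand, and the predicted standard deviation are available. The following is a minimal Python sketch under stated assumptions: the names `supply_metrics` and `calibrate_d`, the nested-list layout (one inner list per region), the toy numbers, and the bisection search are illustrative only. The study itself adjusted d manually, so the bisection is a stand-in for that manual step.

```python
from typing import Dict, Sequence

Grid = Sequence[Sequence[float]]  # outer index: region r, inner index: time t


def supply_metrics(actual: Grid, predicted: Grid, sigma: Grid, d: float) -> Dict[str, float]:
    """Evaluate Eqs. (17)-(21) for a given safety stock parameter d."""
    served = oversupply = total_actual = 0.0
    cells = 0
    for act_r, pred_r, sig_r in zip(actual, predicted, sigma):
        for a, p, s in zip(act_r, pred_r, sig_r):
            supply = p + d * s                  # Eq. (17): supply level
            served += min(a, supply)            # Eq. (18): served demand
            oversupply += max(supply - a, 0.0)  # Eq. (20): oversupply
            total_actual += a
            cells += 1
    return {
        "P": served / total_actual,  # Eq. (19): percentage of served demand
        "MO": oversupply / cells,    # Eq. (21): cells equals R * T
    }


def calibrate_d(actual: Grid, predicted: Grid, sigma: Grid, target_p: float,
                lo: float = 0.0, hi: float = 10.0, tol: float = 1e-6) -> float:
    """Hypothetical helper: bisection for the smallest d with P >= target_p.

    Assumes P(hi) >= target_p; P is non-decreasing in d when sigma >= 0.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if supply_metrics(actual, predicted, sigma, mid)["P"] < target_p:
            lo = mid
        else:
            hi = mid
    return hi


# Toy data: one region, four hourly intervals (made-up numbers).
actual = [[10.0, 20.0, 5.0, 15.0]]
predicted = [[12.0, 15.0, 6.0, 15.0]]
sigma = [[2.0, 2.0, 2.0, 2.0]]

metrics = supply_metrics(actual, predicted, sigma, d=1.0)
d_star = calibrate_d(actual, predicted, sigma, target_p=0.98)
```

Comparing two variance models at the same target P then amounts to calibrating d for each model separately and comparing the resulting MO values.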
Data preparation and featuring
Three datasets were employed to examine the effectiveness of the proposed
framework and to compare temporal patterns. The Thammasat data were
obtained from Neuron Mobility (https://www.rideneuron.com), the operator of
shared e-scooters on the Thammasat University Rangsit Campus (Thailand).
The other two datasets were retrieved from the open data websites of
Austin, Texas (https://austintexas.gov/sharedmobility) and Minneapolis,
Minnesota (https://opendata.minneapolismn.gov), US.
Due to data limitations, previous studies have commonly used historical
trip data to evaluate their proposed methodologies (He and Shin, 2020; Le
Quy et al., 2019; Li et al., 2021; Xu et al., 2018; Zhang et al., 2020).
Similarly, this study examined the proposed framework using the observed
demand (historical trip data). The data obtained from Neuron Mobility and
all available open data websites are the observed demand (the actual
services provided). To acquire the potential demand (i.e., trips that might
have occurred had e-scooters been available), one may need access to data
on users' activities in the mobile application (Ham et al., 2021), such as
users' requests or searches for e-scooters nearby. Such data are commonly
unavailable unless provided by the operators. However, there are cases
where the observed data can stand in for the potential demand, for example,
when the e-scooters are not fully utilized at most stations (in the case of
Thammasat University, the data were collected during the first few months,
when the demand for e-scooters was still low). In this case, the observed
demand equals the potential demand. Whenever observed demand data are used
instead of potential demand data, operators need to increase the safety
stock to cover the higher uncertainty in demand variation. Potential demand
data should be used in planning whenever they are available.
For Austin, we removed the trips from the first several months because the
service then operated mainly in the Downtown area. Abnormal trips were
removed using several criteria: trip duration (less than 30 s or more than
2 h), trip distance (less than 20 m or more than 10 km), and date (outside
the final date range). As a result, the total numbers of hourly samples
from Thammasat, Minneapolis, and Austin were 2,352 (24 × 98 days), 4,704
(24 × 196 days), and 13,680 (24 × 570 days), respectively, over the periods
listed in Table 1.
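The cleaning criteria above translate into a simple record filter. The sketch below is illustrative only: the field names (`duration_s`, `distance_m`, `date`), the helper names, and the example trips are assumptions, while the thresholds and the Thammasat date range are those quoted in the text and Table 1.

```python
from datetime import date
from typing import Dict, List

# Thresholds quoted in the text; field names below are hypothetical.
MIN_DURATION_S, MAX_DURATION_S = 30, 2 * 60 * 60   # 30 s to 2 h
MIN_DISTANCE_M, MAX_DISTANCE_M = 20, 10_000        # 20 m to 10 km


def is_valid_trip(trip: Dict, start: date, end: date) -> bool:
    """Return True if a trip passes the duration, distance, and date filters."""
    return (
        MIN_DURATION_S <= trip["duration_s"] <= MAX_DURATION_S
        and MIN_DISTANCE_M <= trip["distance_m"] <= MAX_DISTANCE_M
        and start <= trip["date"] <= end
    )


def count_hourly_samples(start: date, end: date) -> int:
    """Number of hourly intervals between two dates, both ends included."""
    return 24 * ((end - start).days + 1)


# Made-up example trips.
trips: List[Dict] = [
    {"duration_s": 600, "distance_m": 1300, "date": date(2019, 2, 1)},    # kept
    {"duration_s": 12, "distance_m": 1300, "date": date(2019, 2, 1)},     # too short in time
    {"duration_s": 600, "distance_m": 15000, "date": date(2019, 2, 1)},   # too long in distance
    {"duration_s": 600, "distance_m": 1300, "date": date(2019, 5, 2)},    # outside date range
]

start, end = date(2019, 1, 23), date(2019, 4, 30)  # Thammasat period
kept = [t for t in trips if is_valid_trip(t, start, end)]
n_samples = count_hourly_samples(start, end)
```

For the Thammasat period, `count_hourly_samples` reproduces the 2,352 hourly samples (24 × 98 days) reported above.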
In this study, the term "demand" refers to the total number of pick-up
trips during a specific time interval (1 h) in a given region. The
Thammasat Rangsit Campus has an area of only 3.21 km², so we predicted its
overall demand as a single region. In Austin, the data come from census
tracts, which have an average area of 2.05 km². Shared e-scooters were
operated in more than 50 census tracts of the Austin metropolitan area, but
we selected only the top 30 census tracts, those with an average hourly
demand of more than one trip. There was a significant difference in demand
between the Downtown and other census tracts, with average hourly demands
of around 208 and 10, respectively (see Fig. 3). The trip locations in the
Minneapolis data were recorded by street name, so the street center was
used as the trip's coordinates. To diversify the spatial clustering, the
K-means algorithm was employed to group the trips in Minneapolis. The
optimal number of spatial clusters according to the Elbow method was 15.
The average area of these clusters was therefore about 10 km², but the inner
Fig. 2. Flowchart of supply level models comparison.
Table 1
Datasets information.
Description             Thammasat (TH)   Minneapolis (MN)   Austin (TX)
Start Date              23-Jan-19        14-May-19          1-Aug-18
End Date                30-Apr-19        25-Nov-19          21-Feb-20
# Days                  98               196                570
# Trips                 29,132           913,781            8,689,720
# Time Intervals (T)    2,352            4,704              13,680
# Regions (R)           1                15                 30
Trip Distance (km)      1.3              1.7                1.5
Trip Duration (min)     11.6             12.7               10.4
clusters with denser trips were several times smaller than those on the
outskirts.
The findings of Wang et al. (2021) demonstrate that time-varying variables
significantly impact prediction performance (with an aggregated relative
importance of 88 %), surpassing the influence of static variables such as
the built environment and socioeconomic factors. The impact of time-varying
variables could be even more pronounced for shorter prediction intervals
such as those in our study. Conversely, static variables might prove
essential for longer intervals (e.g., daily predictions); a comprehensive
set is detailed in Hu and Xiong (2023). Therefore, our research focuses on
collecting time-varying variables as predictive inputs, encompassing
weather attributes, public holidays, and local fairs and festivals.
Similar to shared bikes, e-scooter ridership is also affected by weather
conditions. Weather Underground is a global weather network providing a
variety of weather attributes at one-hour intervals
(https://www.wunderground.com). We collected seven weather attributes for
training: temperature, precipitation, wind speed, humidity, wind gust,
pressure, and dew point. Linear interpolation was employed to fill in the
missing values.
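As an illustration of the gap-filling step, the following sketch interpolates linearly between the nearest known hourly readings. The function name `interpolate_linear` and the sample temperatures are hypothetical, and leaving leading or trailing gaps unfilled is an assumption of this sketch, since the text does not say how such gaps were handled.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps by linear interpolation between known neighbours.

    Gaps before the first or after the last known value are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Made-up hourly temperatures with missing readings.
temperature = [20.0, None, None, 26.0, None, 25.0]
filled_temperature = interpolate_linear(temperature)
```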
Fig. 4 shows some abnormal patterns of shared e-scooter demand. We found
that high demand was correlated with some annual festivals or fairs. In
Austin, those special annual events were the Annual SXSW, Pecan Street
Festival, H-E-B Austin Symphony, and City Limits Music Festival. Likewise,
Minneapolis had high ridership during annual events such as OpenStreets,
Pride Festival Parade, Stone Arch Bridge Festival, Uptown Art Fair, and
State Fair Festival. The special promotion during the season Market event
at Thammasat University also led to very high demand. Public holidays also
affected ridership, especially in the Thammasat dataset, where ridership
dropped sharply when most students did not come to campus. Besides the
daily and weekly patterns, ridership of shared e-scooters also had seasonal
patterns: in general, demand was very high during summer but relatively low
during winter. For riders' safety, the operators in Minneapolis had to
suspend operations during winter, and ridership gradually dropped as the
snow season approached. Moreover, the operators of shared e-scooters were
advised to suspend operations during the US president's state visit to
Minneapolis on Oct. 10, 2019. Therefore, these attributes were recorded as
binary variables for the demand prediction models, including annual
festivals or events, public holidays, hour of the day, day of the week, day
of the month, and temporary ban (in Minneapolis). As a short-range mode,
shared e-scooters had average trip distances of 1.3, 1.5, and 1.7 km in
Thammasat, Austin, and Minneapolis, respectively, and riders spent around
11.6, 10.4, and 12.7 min per trip. Based on the average fee in Saum and
Piantanakulchai (2019), the revenue from each trip in these three cities
was about 1.75, 2.56, and 2.91 US dollars, respectively. These fares were
relatively higher than those for shared bikes, which could be one reason
why shared e-scooters were not favored for commuting trips.
As shown in Fig. 5, ridership in Austin and Minneapolis followed a very
similar pattern from Monday to Thursday, while demand increased gradually
on Friday and sharply on Saturday during both the afternoon and evening.
This was likely because people used e-scooters for leisure activities after
the working week. On Sunday, demand in Austin resembled that of the
weekdays but was relatively high in the afternoon. In Minneapolis, by
contrast, Sunday ridership was very similar to the weekdays but very low in
the evening compared to other days of the week. In addition, a small
morning peak distinguished weekdays from weekends. This means that shared
e-scooters were also used as
Fig. 3. Average hourly demand (# trips/hour) in each census tract of Austin, TX.
a commuting mode, but the share was still low. During public holidays,
demand in Austin and Minneapolis was lower than on an ordinary day but
slightly higher than on most weekdays in the afternoon. In both cities,
demand increased strongly on annual festival days, especially in Austin,
where demand was about twice that of regular days. For the Thammasat
University dataset, demand on weekends was relatively low compared to
weekdays, while demand on Friday afternoon was lower than on other
weekdays. This pattern showed the correlation between e-scooter demand and
the presence of students and staff on campus. In general, demand on Tuesday
was higher than on the other days of the week. Moreover, ridership was also
correlated with student activities, i.e., demand increased from the early
morning until the afternoon. As in the previous two datasets, ridership in
Thammasat was considerably high during the annual events. In summary,
across all datasets, the demand for shared e-scooters had a significant
weekly pattern (especially between weekdays and weekends), relatively low
demand on public holidays, and surprisingly high demand during annual
festivals or events.
Based on the demand patterns explained above, the inputs for the demand
prediction models were selected as summarized in Table 2. Since there were
both daily and weekly seasonal patterns, the lookback length for demand
prediction ranged from 24 to 168 h (24 h × 7 days). Table 2 shows that the
Box Cox transformation (# Trips BC) substantially reduces hourly demand
volatility compared to the original scale (# Trips). The three historical
average of overall demand inputs (HAO weekly, holiday, and event) are very
important for SARIMAX, as it
Fig. 4. Hourly demand of shared e-scooters in Austin TX (top), Thammasat University TH (bottom left), and Minneapolis MN (bottom right).
Fig. 5. Average hourly demand of shared e-scooters by day of the week, public holiday, and annual festival or event.
cannot practically accept these exogenous variables in binary form. A ban
was imposed only in Minneapolis, so this feature was included in its
prediction models as a binary variable.
To assess model generalization, each dataset was split into two parts: a
training part (in sample) and a testing part (out of sample). The first
75 % of the dataset was used for model training, and the rest was used for
testing (see Fig. 4). As shown in Fig. 4, the volatility is low when demand
is flat but proportionally higher when demand is high. Similarly, there are
many leisure activities, festivals, and fairs of all sizes during the
summer, so the demand for shared e-scooters is then both high and volatile.
Therefore, 75 % of the training part was randomly selected for model
training, and the remaining 25 % was used for model evaluation. A random
split was chosen because it requires far less computational time than
K-fold cross-validation (especially in hyperparameter tuning), and because
it lets the models learn explanatory variables (such as events, holidays,
and the ban) that occurred on specific dates and might be missed by a
conventional time-series split. However, the Thammasat dataset was
relatively small, so all of its data were used for model training and
evaluation.
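The two-stage split described above (chronological hold-out for testing, then a random split of the in-sample part) could be sketched as follows. The function name `split_dataset`, the index-based representation, and the random seed are assumptions made for illustration, not details reported in the study.

```python
import random
from typing import List, Tuple


def split_dataset(n_samples: int, test_share: float = 0.25,
                  eval_share: float = 0.25, seed: int = 0
                  ) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, evaluation, test) index lists.

    The last `test_share` of the samples (in time order) is held out for
    testing; the remaining in-sample part is split at random.
    """
    cut = int(n_samples * (1 - test_share))
    in_sample = list(range(cut))
    test_idx = list(range(cut, n_samples))

    rng = random.Random(seed)      # fixed seed is an assumption, for repeatability
    rng.shuffle(in_sample)
    n_eval = int(len(in_sample) * eval_share)
    eval_idx = sorted(in_sample[:n_eval])
    train_idx = sorted(in_sample[n_eval:])
    return train_idx, eval_idx, test_idx


# Austin has 13,680 hourly intervals (Table 1).
train_idx, eval_idx, test_idx = split_dataset(13_680)
```

Because the evaluation indices are drawn at random from the whole in-sample period, hours from rare dated events can appear in both the training and evaluation sets, which is the motivation the text gives for preferring this over a purely chronological split.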
Demand and variance prediction
Demand prediction
The formulation of the GRU cell was described in Section 3.3. The
configuration of the GRUs is shown in Fig. 6. It comprises an input layer
with GRU nodes, one dropout layer, a group of hidden layers with GRU nodes,
and an output layer with conventional neurons. As Fig. 6 indicates, all
inputs were arranged sequentially before entering the input layer, and the
outputs of this layer were dropped at a specified rate to improve learning
performance with smoother steps. The hidden layers were set to have the
same activation function and number of nodes. This study addresses
spatiotemporal dependencies in shared e-scooter demand through two distinct
approaches, in which the models were trained either spatially independently
or spatially combined. Consequently, the output layer of the GRUs has one
neuron for spatially independent training and multiple neurons for
spatially combined training. Both architectures were examined, and the
better result was selected. Each training approach has advantages and
disadvantages. Spatially independent training allows each model to reach
its optimal learning curve freely, but it may lose vital information from
neighboring regions. In contrast, a model with multiple spatial outputs
shares correlated information across regions to improve prediction
performance, but its optimum must be a compromise across regions.
In the first approach, prediction models were trained with spatial
independence, meaning one model for each zone. In this scenario,
spatiotemporal dependencies were primarily addressed by including various
external features in the input layer and by optimizing the lookback length.
In other words, the inputs consisted of the historical demand specific to
each zone and other external variables, while the models had only a single
output. This configuration is similar to that proposed by Yang et al.
(2023) but with a few differences, such as multi-step prediction and an
additional attention layer. In contrast, the second approach was trained
with spatial combination (i.e., one model for all zones). Inputs included
all spatial demands and external variables for a specific lookback length,
while outputs comprised all spatial demands. Spatiotemporal dependencies in
this approach were managed through the weights and biases of the prediction
models. A similar architecture was proposed to predict short-term taxi
demand in New York City, except for an additional feature selection
mechanism based on a nonlinear Granger causality test (Luo et al., 2021).
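The difference between the two training approaches shows up in the shape of the input windows and targets. The sketch below is a simplified illustration: the function name `make_windows` and the toy demand grid are hypothetical, one-step-ahead prediction is assumed, and the external variables (weather, holidays, events) are left out for brevity.

```python
from typing import List, Sequence, Tuple


def make_windows(demand: Sequence[Sequence[float]], lookback: int,
                 regions: Sequence[int]
                 ) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Build (inputs, targets) from a time-by-region demand grid.

    Each input holds the previous `lookback` hours of the selected regions;
    each target is the next hour's demand for the same regions.
    """
    inputs, targets = [], []
    for t in range(lookback, len(demand)):
        window = [[demand[s][r] for r in regions] for s in range(t - lookback, t)]
        inputs.append(window)
        targets.append([demand[t][r] for r in regions])
    return inputs, targets


# Toy grid: 30 hours by 3 regions (made-up values).
demand = [[float(t + 10 * r) for r in range(3)] for t in range(30)]

# Spatially independent training: one model per zone, single output.
x_ind, y_ind = make_windows(demand, lookback=24, regions=[0])

# Spatially combined training: one model for all zones, one output per zone.
x_all, y_all = make_windows(demand, lookback=24, regions=[0, 1, 2])
```

With a lookback of 24, each sample is a 24 × 1 window with one target in the independent case and a 24 × 3 window with three targets in the combined case.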
Although deep learning models can outperform conventional probabilistic
models or machine learning algorithms, they also require time-
Table 2
Description of inputs for demand prediction models.
Inputs           Thammasat (TH)     Minneapolis (MN)    Austin (TX)
# Trips          11.59 ± 11.65      12.50 ± 22.39       16.91 ± 59.98
# Trips BC*      2.72 ± 1.55        1.65 ± 1.73         1.88 ± 2.56
Temperature      30.46 ± 3.31       15.96 ± 9.33        19.70 ± 9.37
Dew point        23.48 ± 2.89       9.29 ± 8.81         13.24 ± 8.84
Humidity         68.28 ± 16.05      66.89 ± 16.22       69.96 ± 20.27
Wind speed       12.15 ± 5.00       14.26 ± 7.78        13.01 ± 9.23
Wind gust        0.06 ± 2.19        7.66 ± 16.82        4.95 ± 13.46
Pressure         1010.24 ± 2.78     983.87 ± 6.31       996.93 ± 13.38
Precipitation    0.12 ± 0.46        0.16 ± 1.09         0.10 ± 1.03
HAO** weekly     11.88 ± 8.78       192 ± 159.68        527.44 ± 369.44
HAO** holiday    11.63 ± 7.68       157.98 ± 141.91     409.31 ± 282.48
HAO** event      25.38 ± 18.14      253.53 ± 220.21     1082.10 ± 793.52
Hour of day      0–1                0–1                 0–1
Day of week      0–1                0–1                 0–1
Day of month     1–31               1–31                1–31
Holiday          0–1                0–1                 0–1
Event            0–1                0–1                 0–1
Ban              –                  0–1                 –
* BC: Box Cox scale.
** HAO: Historical Average of Overall demand.
Fig. 6. The proposed architecture of GRUs model.
consuming hyperparameter optimization. Hyperparameter Optimization (HPO)
refers to optimizing settings such as the number of GRU nodes in the input
layer, the dropout rate, and the number of hidden layers. Many techniques
can be employed at this stage, including grid search, random search, and
automatic optimization algorithms (Bayesian Optimization, Tree-structured
Parzen Estimator, genetic algorithm, etc.). Bayesian Optimization (BO) is a
popular sequential optimization technique for expensive problems,
especially the HPO of deep learning models. BO's two critical components
are the surrogate function (Gaussian Processes) and the acquisition
function (Upper Confidence Bound), which play an essential role in
balancing exploration and exploitation. Keras Tuner (O'Malley et al.,
2019), a Python package for HPO that includes a Bayesian optimization
tuner, was employed to optimize the configurations of the GRUs, which were
run on top of Keras and TensorFlow in Jupyter Notebook. All BO parameters
were left at their defaults, except that the objective function was the
validation loss, the number of initial points was 10, and the maximum
number of iterations was 80. The number of epochs was tuned using early
stopping with a patience of 10 and a maximum of 150 epochs.
In this study, BO was employed to optimize nine crucial hyperparameters of
the GRUs: lookback length, activation of the input layer, number of GRU
nodes in the input layer, dropout rate, number of hidden layers, number of
GRU nodes in each hidden layer, activation of the hidden layers, activation
of the output layer, and batch size (see Table 3). The HPO was split into
several sequential steps for a few reasons: the training time caused by the
lookback length and the number of hidden layers, local optima,
non-converging iterations, and exploding iterations (i.e., the loss
function becomes infinite). First, deep learning models with only one
hidden layer were optimized independently for different lookback lengths
(24, 48, …, 168) to find the optimal lookback length. For each lookback
length, BO with the above settings searched for the minimum validation loss
by changing the number of nodes per layer, the dropout rate, the activation
function of each layer, and the batch size. After the optimal lookback
length was found, the deep learning configurations were reoptimized to
allow for a higher number of hidden layers. Three activation functions were
considered (ReLU, Tanh, and Sigmoid), while the dropout rate ranged between
0.00 and 0.40 in steps of 0.01. The number of nodes per layer ranged
between 10 and 500 in steps of 10. The batch size ranged from 4 to 1000.
Other parameters of the GRUs were left at their default values, including
the optimizer (Adam) and learning rate (0.001), with Mean Squared Error
(MSE) as the loss function.
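To make the nine-dimensional search space and the two-stage procedure concrete, the sketch below draws candidate configurations with plain random sampling. This is not Bayesian Optimization and does not use Keras Tuner; it only illustrates the value ranges stated above. The function name `sample_configuration`, the dictionary keys, the upper bound of 5 hidden layers (as read from Table 3), and the lookback of 72 used in the second stage are assumptions for illustration.

```python
import random
from typing import Dict, List

ACTIVATIONS = ["relu", "tanh", "sigmoid"]


def sample_configuration(rng: random.Random, lookback: int,
                         max_hidden_layers: int) -> Dict:
    """Draw one candidate from the nine-hyperparameter search space."""
    return {
        "lookback": lookback,
        "input_activation": rng.choice(ACTIVATIONS),
        "input_nodes": rng.randrange(10, 501, 10),      # 10, 20, ..., 500
        "dropout_rate": rng.randrange(0, 41) / 100,     # 0.00 to 0.40, step 0.01
        "hidden_layers": rng.randint(1, max_hidden_layers),
        "hidden_nodes": rng.randrange(10, 501, 10),     # shared by all hidden layers
        "hidden_activation": rng.choice(ACTIVATIONS),   # shared by all hidden layers
        "output_activation": rng.choice(ACTIVATIONS),
        "batch_size": rng.randint(4, 1000),
    }


rng = random.Random(42)

# Stage 1: one hidden layer, one search per lookback length (24, 48, ..., 168).
stage_one: List[Dict] = [
    sample_configuration(rng, lookback, max_hidden_layers=1)
    for lookback in range(24, 169, 24)
]

# Stage 2: lookback fixed (72 is a placeholder), deeper networks allowed.
stage_two = sample_configuration(rng, lookback=72, max_hidden_layers=5)
```

In the study, each candidate would be scored by its validation loss and the next candidate proposed by the BO acquisition function rather than drawn at random.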
The prediction performance of the GRUs was compared with that of five
benchmark models: SARIMAX, RF, XGBoost, FCNNs, and RNNs. The historical
average (HA) was also included to show the impact of the Box Cox
transformation on RMSE and MAE. The five benchmark models were also
optimized, using BO (for FCNNs and RNNs) or grid search (for SARIMAX, RF,
and XGBoost), as shown in Table 3. The SARIMAX, RF, and XGBoost models were
trained independently for each zone, while FCNNs and RNNs had the same
configurations as the GRUs. The difference between training and validation
loss was kept to around 15 % to control overfitting, especially for RF and
XGBoost. A short introduction to these five baseline models is given as
follows:
SARIMAX: a popular statistical regression model that assumes a linear
correlation between future demand and explanatory variables, including past
observations, residuals, and exogenous variables. As mentioned above, three
exogenous variables (see Fig. 5) were included in SARIMAX: the hourly
average demand by day of the week, public holiday, and event. SARIMAX was
trained using the statistical program STATA, since it was more convenient
for out-of-bag evaluation. As shown in Table 3, six parameters of SARIMAX
were optimized by grid search: the degree of differencing (d), the degree
of seasonal differencing (D), the seasonal (P) and non-seasonal (p)
autoregressive lag polynomials, and the seasonal (Q) and non-seasonal (q)
moving average lag polynomials. All parameters of this model, including
those of the exogenous variables, had to be statistically significant at
the 95 % level, and the model with the smallest RMSE was selected.
RF: a powerful machine learning algorithm, introduced by Breiman (2001),
that handles high-dimensional data while requiring only a small amount of
data and training time. RF aggregates the predictions of many random trees,
each built from randomly selected inputs or combinations of inputs
(bootstrapped sampling). RF was tuned for three important hyperparameters:
lookback length (24–168), the number of trees in the forest (10–500), and
the maximum depth of the tree (0–15). Random Forest was trained with the
Python library Scikit-learn.
XGBoost: one of the most popular machine learning algorithms for regression
and classification problems, based on gradient-boosted decision trees. This
approach can handle massive data using a sparsity-aware splitting
algorithm, a cache-aware algorithm, and distributed memory computing
technology (Chen and Guestrin, 2016). In this paper, three hyperparameters
of this model were tuned using the XGBoost Python module: lookback length,
the number of gradient-boosted trees, and the maximum depth of the tree.
FCNNs: the most basic architecture of artificial neural networks (ANNs), in
which all the nodes in one layer are connected to the nodes in the next
layer. FCNNs are widely applied to classification and regression
Table 3
Description of hyperparameter optimization for demand prediction models.
Model                 Parameters                    Value range             Tuning
GRUs, RNNs, FCNNs     Lookback length               24, 48, …, 168          Bayesian Optimization
                      Activation input layer        ReLU, Tanh, Sigmoid
                      # Nodes input layer           10, 20, 30, …, 500
                      Dropout rate                  0.0–0.40
                      # Hidden layer                1–5
                      Activation of hidden layer    ReLU, Tanh, Sigmoid
                      # Nodes in hidden layer       10, 20, 30, …, 500
                      Activation output layer       ReLU, Tanh, Sigmoid
                      Batch size                    4–1000
XGBoost               Lookback length               24, 48, …, 168          Grid Search
                      # Gradient-boosted trees      10, 15, 20, …, 500
                      Max-depth of tree             0, 1, 2, …, 10
Random Forest         Lookback length               24, 48, 72, …, 168      Grid Search
                      # Trees in forest             10, 15, 20, …, 300
                      Max-depth of tree             0, 1, 2, …, 15
SARIMAX(p, d, q) ×    p                             0–5                     Grid Search
(P, D, Q, 24)         d                             0–2
                      q                             0–5
                      P                             0–2
                      D                             0–2
                      Q                             0–2
problems in transportation engineering. In this study, the configuration
was the same as that of the GRUs (Fig. 6) and was optimized by BO. As a
benchmark model, the FCNNs were limited to at most 2 hidden layers and 100
nodes per layer, the standard architecture (Liu et al., 2019; Wang and Kim,
2018).
RNNs: a popular deep learning approach for time-series datasets, owing to
their ability to learn temporal sequences and their long-range
dependencies. As with FCNNs, the configuration of the RNNs was set as shown
in Fig. 6 and optimized by Bayesian Optimization, while limiting the number
of hidden layers and nodes per layer to at most 2 and 100, respectively.
According to our findings, the sigmoid activation function required nearly
twice as many epochs as the Tanh or ReLU functions but had a smoother
learning curve. The Austin data needed to be trained spatially
independently, while the Minneapolis data were trained with multiple
spatial outputs. Overall, training the GRUs with the original data yielded
better results than training them with the normalized data, particularly in
terms of model generalization. The reason was that the optimal
architectures for normalized data used the Tanh or Sigmoid activation
function, which learned the training data effectively but overfitted,
especially when compared to the benchmark models. Consistent with the
findings of Saum et al. (2020), the Box Cox transformation led to simpler
models than the original scale. This was observed in the SARIMAX models, in
which the exogenous variables were mostly insignificant. In addition, the
optimal GRU model for the Austin data had two hidden layers for the
original scale but only one for the Box Cox transformed data. Therefore,
the Box Cox transformation is suitable for deep learning, as it can reduce
the training time, especially during hyperparameter tuning.
Table 4 compares the performance on Box Cox transformed data with that on
original or normalized data (the Thammasat dataset has no testing data).
For original or normalized data, deep learning could improve prediction
performance in both RMSE and MAE, which strongly depend on the number of
tuned hyperparameters. The Box Cox transformed data showed similar patterns
for the Austin and Thammasat datasets, but not for Minneapolis. As a
generalized logarithmic transformation, Box Cox pulled the abnormal demand
(outliers) closer to the mean value, leading to simpler models and improved
accuracy (especially in the MAE metric). This characteristic can be seen in
Table 2, where the ratios of the mean and the standard deviation of demand
between the original and Box Cox scales were about 7 and 15, respectively.
In addition, Table 4 shows that the RMSE of the Historical Average (HA) was
lower in the original scale than in the Box Cox scale, whereas the opposite
held for MAE. The effect of the Box Cox transformation on demand volatility
is the reason why the deep learning models for the Minneapolis dataset
performed even worse than SARIMAX: the transformation made the temporal
information from neighboring regions unnecessary. This meant that the Box
Cox transformed data of Minneapolis should be trained spatially
independently. Therefore, the Box Cox transformation is desirable for
training datasets with many abnormal values or fewer exogenous variables.
In summary, across all datasets, the Box Cox transformation reduced the
RMSE and MAE metrics by 0.14 % and 5.36 %, respectively. This accuracy
improvement may not be very significant, but the transformation is
acceptable for its ease of implementation and its handling of outliers.
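The compressing effect of the transformation on outliers can be illustrated with the standard one-parameter Box Cox formula. In the sketch below, the +1 shift (to keep zero-demand hours strictly positive), the value of lambda, the sample counts, and the helper names are all assumptions chosen for illustration; they are not the settings reported in the study.

```python
import math
from statistics import mean, pstdev
from typing import List


def box_cox(x: float, lam: float) -> float:
    """One-parameter Box Cox transform; reduces to log(x) when lam is 0."""
    if x <= 0:
        raise ValueError("Box Cox requires strictly positive input")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inverse_box_cox(y: float, lam: float) -> float:
    """Map a transformed value back to the original scale."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)


def coefficient_of_variation(values: List[float]) -> float:
    """Standard deviation relative to the mean, a scale-free volatility measure."""
    return pstdev(values) / mean(values)


SHIFT = 1.0  # assumption: shift so that zero-demand hours stay positive
LAM = 0.0    # assumption: illustrative lambda (log transform)

hourly_demand = [0.0, 5.0, 10.0, 200.0]  # made-up counts with one outlier
transformed = [box_cox(x + SHIFT, LAM) for x in hourly_demand]
restored = [inverse_box_cox(y, LAM) - SHIFT for y in transformed]

cv_original = coefficient_of_variation(hourly_demand)
cv_box_cox = coefficient_of_variation(transformed)
```

On this toy series the outlier dominates the original scale, while the transformed series has a markedly lower coefficient of variation, in line with the pattern described for Table 2.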
SARIMAX had better prediction accuracy on the testing dataset than the
other models because the testing data fell in the low-demand season.
However, this regression model had limited performance during the
high-demand season (such as summer), while the GRUs performed accurately on
both the training and testing datasets. Fig. 7 compares the e-scooter
demand predicted by the GRUs with the original and Box Cox scales for the
Downtown census tract in Austin, Texas. Both models performed very well in
learning the hourly demand of shared e-scooters, although their predictions
differed somewhat, especially during peak demand. Overall, both models
predicted the nighttime (low) demand correctly but performed poorly during
the afternoon and evening, when demand and volatility are high.
Table 4
Performance comparison based on RMSE and MAE.
                                            Original or Normalized Data                 Box Cox Transformed Data
Dataset                 Model               RMSE-Eval.  RMSE-Test  MAE-Eval.  MAE-Test  RMSE-Eval.  RMSE-Test  MAE-Eval.  MAE-Test
Thammasat Thailand      GRUs                5.27        –          3.41       –         5.18        –          3.37       –
                        RNNs                5.52        –          3.75       –         4.91        –          3.40       –
                        FCNNs               5.52        –          3.76       –         5.00        –          3.46       –
                        XGBoost             5.21        –          3.64       –         5.17        –          3.46       –
                        Random forest       5.30        –          3.72       –         5.31        –          3.56       –
                        SARIMAX             5.47        –          3.82       –         5.26        –          3.62       –
                        Historical average  11.65       –          8.69       –         12.24       –          8.32       –
Minneapolis Minnesota   GRUs                6.96        6.34       3.58       2.89      7.18        6.92       3.67       2.99
                        RNNs                7.07        6.25       3.49       2.85      7.75        6.82       3.80       2.95
                        FCNNs               7.53        6.48       4.06       3.08      8.38        7.30       3.83       3.33
                        XGBoost             7.44        6.44       4.04       3.37      7.16        6.04       3.66       2.73
                        Random forest       7.34        6.39       3.85       3.21      7.35        6.20       3.76       2.81
                        SARIMAX             7.79        6.24       4.08       3.07      7.72        6.03       3.85       2.64
                        Historical average  21.21       16.83      12.97      11.08     23.80       17.39      12.28      8.55
Austin Texas            GRUs                11.24       11.28      4.15       3.70      11.20       11.01      4.00       3.54
                        RNNs                11.34       11.58      4.21       3.73      11.52       11.96      4.10       3.60
                        FCNNs               11.47       11.22      4.16       3.67      12.50       11.86      4.21       3.72
                        XGBoost             11.29       11.83      4.29       3.89      13.06       11.75      4.29       3.70
                        Random forest       12.18       12.00      4.31       3.92      12.54       12.21      4.26       3.77
                        SARIMAX             12.30       11.13      4.58       3.83      12.60       11.23      4.40       3.54
                        Historical average  50.49       36.44      14.82      12.53     53.95       37.30      14.34      10.90
Fig. 7. Demand prediction by GRUs with original and Box Cox scale for the
Downtown in Austin, TX.
Fig. 8. Daily scatter plot and histogram of GRUs' residuals for Downtown Census in Austin, TX: (top) original data and (bottom) Box Cox transformed data.
Fig. 9. Variance prediction for residuals of GRUs with original scale data of Downtown Census in Austin, TX.
Variance prediction and supply planning
Extracting all useful information from historical data is crucial for the
daily operational planning of dockless shared e-scooters, so that resources
are managed properly and the related operating costs are minimized. Even
though demand prediction models can forecast future demand at
state-of-the-art performance, some uncertainties still arise from both the
prediction models and the related errors in historical data. Safety stock
is commonly employed to cover these uncertainties, depending on the demand
variation in Eq. (17). For this reason, variance analysis is necessary for
designing an efficient supply level. Two characteristics of the residuals
are essential for supply level design: their distribution and their
heteroscedasticity. The residuals of a forecasting model commonly follow a
normal distribution or a Student's t-distribution. This characteristic is
important for choosing the confidence level parameter ($Z_{score}$) or the
cover rate (the share of data lying within the confidence interval bound).
As explained in Section 3.4, heteroscedasticity refers to the temporal
pattern in the variance of the residuals; the inventory thus should be
designed proportionally.
Fig. 8 shows the daily scatter plot and histogram of the GRUs' residuals on the original and Box Cox scales. On the original scale, the distribution had fatter tails than the normal distribution, so Student's t-distribution was more appropriate for these residuals. For GRUs on the Box Cox scale, the residuals had only slightly fat tails, which were practically negligible. The daily scatter plot showed a clear daily pattern in the residuals of GRUs on the original scale, whereas the residuals were almost constant on the Box Cox scale. To confirm the heteroscedasticity of the residuals, the ARCH-LM test was performed. As a result, we could reject the null hypothesis (no ARCH effects), as the p-value was less than 5 % for both the original and Box Cox scales. However, the coefficients of the SGARCH model on the Box Cox scale were relatively small, so the ARCH effects could be statistically ignored (Saum et al., 2020; StataCorp, 2013).
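The ARCH-LM check described above can be illustrated with a minimal sketch. It assumes a single lag, since the lag order used in the study is not stated here; the function name `arch_lm_lag1` and the toy residual series are hypothetical. The statistic is n·R² from regressing the squared residuals on their lagged values, compared with a chi-square distribution with one degree of freedom.

```python
import math

def arch_lm_lag1(residuals):
    """Engle's ARCH-LM test with one lag (illustrative, not the paper's code).

    Regresses e_t^2 on e_{t-1}^2 and returns (LM statistic, p-value),
    where LM = n * R^2 is compared with a chi-square(1) distribution.
    """
    sq = [e * e for e in residuals]
    y, x = sq[1:], sq[:-1]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0, 1.0  # constant squared residuals: nothing to detect
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r_squared = (sxy * sxy) / (sxx * syy)  # R^2 of a one-regressor fit
    lm = n * r_squared
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) survival function
    return lm, p_value

# Toy residuals (assumption): small errors in hours 0-11, large in hours 12-23.
toy = [(1.0 if t % 24 < 12 else 5.0) * (1 if t % 2 == 0 else -1)
       for t in range(240)]
lm_stat, p_val = arch_lm_lag1(toy)
```

A p-value below 0.05 on such a series rejects the null hypothesis of no ARCH effects, which mirrors the decision rule applied to the GRUs' residuals.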
In the case of GRUs with the original scale in Fig. 8, the 97.5 % upper Confidence Interval (CI) with constant standard deviation had a cover rate (or service level) of 96.79 %. The main problem was not this slight difference but the distribution of the residuals above the upper CI. In the first half of the day (hours 0–11), only 0.56 % of the residuals lay above the upper CI, while 2.65 % lay above it in the second half (hours 12–23). This meant that the upper CI with constant standard deviation had an excellent cover rate in the first half but a poor cover rate in the second half. On the other hand, the 97.5 % upper CI with daily standard deviation had an overall cover rate of 96.8 %, while the outliers (residuals lying above the upper CI) were 1.65 % and 1.55 % for the first and second halves, respectively. At this cover rate, the percentage of served demand was 99.24 % and 99.36 % for the upper CI with constant and daily standard deviation, respectively. The upper CI with constant standard deviation had a supply ratio (i.e., the ratio of total supply to total actual demand) of 145 %, while that of the upper CI with daily standard deviation was only 139.4 %. In other words, despite having the same cover rate, the upper CI (or supply level) with daily standard deviation had a lower inventory (lower operational cost) but a higher percentage of served demand (higher trip revenue) than the upper CI with constant standard deviation.
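The three quantities compared above (cover rate, percentage of served demand, and supply ratio) can be computed as in the following sketch. It rests on stated assumptions: the supply level is taken as the prediction plus z times a standard deviation, served demand in an hour is taken as the smaller of actual demand and supply, and all function names and numbers are hypothetical rather than taken from the paper's code or datasets.

```python
from statistics import pstdev

Z_975 = 1.96  # 97.5 % quantile of the standard normal (assumed CI level)

def hourly_std(residuals, hours):
    """Standard deviation of residuals grouped by hour of day (0-23)."""
    groups = {}
    for e, h in zip(residuals, hours):
        groups.setdefault(h, []).append(e)
    return {h: pstdev(v) for h, v in groups.items()}

def supply_level(predicted, sigmas, z=Z_975):
    """Upper bound used as supply: prediction plus z times the std."""
    return [max(0.0, p + z * s) for p, s in zip(predicted, sigmas)]

def evaluate(actual, supply):
    """Cover rate, served-demand share, and supply ratio for one series."""
    covered = sum(1 for a, s in zip(actual, supply) if s >= a)
    served = sum(min(a, s) for a, s in zip(actual, supply))  # assumed definition
    return {
        "cover_rate": covered / len(actual),
        "served_share": served / sum(actual),
        "supply_ratio": sum(supply) / sum(actual),
    }

# Hypothetical numbers, not taken from the paper's datasets.
actual = [10, 20, 30, 40]
supply = [12, 18, 30, 50]
metrics = evaluate(actual, supply)
```

Passing one constant standard deviation for every hour, or the per-hour values from `hourly_std`, into `supply_level` gives the two variants compared in the text. Two supply series can share a cover rate yet differ in served share and supply ratio, which is the effect reported above.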
As shown in Fig. 8, the residuals on the original scale still had a seasonal pattern, and the ARCH-LM test also confirmed the presence of ARCH effects. Therefore, the Seasonal GARCH model in Eq. (16) was trained to extract further temporal variance patterns. Fig. 9 compares the variance prediction models (Constant, Daily Seasonal, and SGARCH) for the absolute residuals of GRUs with the original scale for the Downtown Census in Austin, Texas. The graph shows that the constant variance or mean squared error (Constant_STD) approach performs poorly, as it cannot capture the conditional variance. The daily seasonal variance (Daily_STD) captures the daily volatility pattern to some extent, but it is not flexible enough for long-term demand. On the other hand, the variance predicted by SGARCH (SGARCH_STD) adapts well to the conditional variance. However, it has one main disadvantage: once a considerable error occurs, SGARCH carries it over to the next seasonal step.
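The carry-over of a large error to the next seasonal step can be seen in a small recursion. The form below is an assumed, simplified seasonal GARCH specification chosen for illustration; it is not claimed to match Eq. (16), and the coefficients and the toy residual series are hypothetical.

```python
def seasonal_garch_variance(residuals, omega, alpha, beta, alpha_s, season=24):
    """Illustrative seasonal GARCH recursion (an assumed form, not Eq. (16)):

        var[t] = omega + alpha * e[t-1]^2 + beta * var[t-1]
                       + alpha_s * e[t-season]^2
    """
    n = len(residuals)
    sq = [e * e for e in residuals]
    start = sum(sq) / n  # initialize with the mean squared residual
    var = [start] * n
    for t in range(season, n):
        var[t] = (omega + alpha * sq[t - 1] + beta * var[t - 1]
                  + alpha_s * sq[t - season])
    return var

# Toy series (assumption): unit residuals with one large error at hour 30.
resid = [1.0] * 80
resid[30] = 10.0
var = seasonal_garch_variance(resid, omega=0.2, alpha=0.1, beta=0.5, alpha_s=0.2)
```

In this toy run the predicted variance jumps at hour 31 (the ordinary ARCH term) and again at hour 54, one seasonal period after the shock, even though the residual at hour 53 is unremarkable.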
Fig. 10 compares the four supply level models of GRUs at 98 % served demand. Supply levels with constant variance had a high oversupply for nighttime demand but failed to meet the afternoon demand. On the other hand, supply levels with Box Cox variance performed very well, except for some peak points caused by the logarithmic inversion effect. Supply levels with daily and SGARCH variance had a similar pattern, but SGARCH variance better allocated the uncertainty in the long-term demand. In summary, variance analysis was necessary for the original or normalized data, but the constant variance was sufficient for the Box Cox transformed data. For original or normalized data, three types of variance were examined: constant, daily, and variance predicted by SGARCH. Four supply level models were compared for each demand prediction model: three combined the demand prediction model on the original or normalized scale with one of the three variance models (constant variance, daily variance, and variance predicted by SGARCH), and the fourth combined the demand prediction model on Box Cox transformed data with constant variance. The supply level designs were compared for three demand prediction models: SARIMAX, XGBoost, and GRUs. These were popular prediction models in the probabilistic, machine learning, and deep learning families, respectively.
Fig. 10. Comparison of supply level models of GRUs at 98 % served demand (cover rate of around 90 %) of Downtown Census in Austin, TX.
As mentioned above, the confidence interval was unsuitable for daily operational planning, as it did not account for the intensity of the residuals, specifically for a heteroscedastic dataset. Moreover, different CI models tended to have different inventory levels (operational cost) and expected served demand (trip revenue) even though they had the same cover rate. So, this study chose to compare the supply level models at the same percentage of served demand (the same number of served or unserved demand) in order to compare the amount of oversupply (as MO). To achieve the same percentage of served demand, the safety stock parameter (d) was adjusted independently for each supply level model following the flowchart in Fig. 2. In practice, this parameter should be set according to the desired service level (as with the Z-score) or adjusted until the supply level reaches the maximum number of e-scooters.
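Adjusting the safety stock parameter until a target percentage of served demand is reached can be sketched as a one-dimensional search. The sketch below swaps in a plain bisection for the flowchart in Fig. 2 and assumes that supply is the prediction plus d times the standard deviation, that served demand is the smaller of demand and supply, and that mean oversupply is the average positive excess of supply over demand. All names and numbers are hypothetical.

```python
def supply_for(predicted, sigmas, d):
    """Supply level for a given safety stock parameter d (may be negative)."""
    return [max(0.0, p + d * s) for p, s in zip(predicted, sigmas)]

def served_share(actual, supply):
    """Share of total demand that the supply can serve (assumed definition)."""
    return sum(min(a, s) for a, s in zip(actual, supply)) / sum(actual)

def mean_oversupply(actual, supply):
    """Average hourly excess of supply over actual demand (assumed MO)."""
    return sum(max(s - a, 0.0) for a, s in zip(actual, supply)) / len(actual)

def calibrate_d(actual, predicted, sigmas, target, lo=-5.0, hi=5.0, iters=60):
    """Bisection for the smallest d in [lo, hi] that reaches the target share.

    Assumes the target is reachable at d = hi; served share never decreases
    as d grows, so bisection applies.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if served_share(actual, supply_for(predicted, sigmas, mid)) >= target:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical toy data, not taken from the paper.
actual = [8, 10, 12, 14]
predicted = [10.0, 10.0, 10.0, 10.0]
sigmas = [2.0, 2.0, 2.0, 2.0]
d = calibrate_d(actual, predicted, sigmas, target=0.95)
mo = mean_oversupply(actual, supply_for(predicted, sigmas, d))
```

Running the calibration separately for each variance model puts all supply levels at the same served-demand percentage, so that their mean oversupply values become directly comparable.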
Mean oversupply (MO) in Table 5 was compared on the training data for the Thammasat dataset, while the other two datasets were compared on the testing data. According to the Thammasat result, Box Cox transformation provided an efficient supply level (the lowest mean oversupply), except for XGBoost. Moreover, the SARIMAX model on Box Cox transformed data had an MO comparable with the worst case of GRUs (constant variance), and an even better one at a higher percentage of served demand (95 % and above). Overall, the MO of GRUs was smaller than that of SARIMAX, which showed the importance of demand prediction performance. The difference between the worst and best cases of the GRUs' supply level models increased significantly with the percentage of served demand, up to about one e-scooter at 98 % served demand. In other words, the operators had an average hourly oversupply of 8 e-scooters to achieve 98 % served demand using the supply level model with constant variance, but they could reduce the oversupply to around 7 e-scooters per hour by using the supply level with Box Cox variance. At this reduction rate, the operator could save up to 30 e-scooters for 10 spatial regions at a 3-hour rebalancing cycle (i.e., remove 30 e-scooters from the rebalancing operation).
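The saving quoted above follows from simple arithmetic, reproduced here as a short sketch. It assumes that MO is an hourly figure per region, so that the saving scales with the number of regions and the length of the rebalancing cycle; the helper name `scooters_saved` is hypothetical, and the two MO values are the GRUs entries at 98 % served demand for Thammasat in Table 5.

```python
def scooters_saved(mo_reduction, regions, cycle_hours):
    """Scooters removed from one rebalancing cycle across all regions."""
    return mo_reduction * regions * cycle_hours

# MO of GRUs at 98 % served demand for Thammasat (Table 5).
mo_constant, mo_box_cox = 8.069, 7.091
saved = scooters_saved(mo_constant - mo_box_cox, regions=10, cycle_hours=3)
```

The result is about 29.3 e-scooters, which the text rounds to 30.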
Table 5
Mean oversupply comparison for four supply planning models (mean oversupply by percentage of served demand).
| Dataset | Demand Model | Variance Model | 70 % | 75 % | 80 % | 85 % | 90 % | 95 % | 98 % |
|---|---|---|---|---|---|---|---|---|---|
| Thammasat, Thailand | GRUs | Constant Variance | 0.615 | 0.835 | 1.179 | 1.710 | 2.707 | 4.880 | 8.069 |
| Thammasat, Thailand | GRUs | Daily Variance | 0.586 | 0.816 | 1.160 | 1.704 | 2.629 | 4.490 | 7.130 |
| Thammasat, Thailand | GRUs | SGARCH Variance | 0.591 | 0.815 | 1.181 | 1.704 | 2.605 | 4.458 | 7.259 |
| Thammasat, Thailand | GRUs | Box Cox Variance | 0.465 | 0.695 | 1.069 | 1.631 | 2.557 | 4.330 | 7.091 |
| Thammasat, Thailand | XGBoost | Constant Variance | 0.546 | 0.774 | 1.106 | 1.667 | 2.754 | 4.928 | 8.184 |
| Thammasat, Thailand | XGBoost | Daily Variance | 0.546 | 0.793 | 1.138 | 1.670 | 2.601 | 4.441 | 7.075 |
| Thammasat, Thailand | XGBoost | SGARCH Variance | 0.577 | 0.814 | 1.147 | 1.668 | 2.566 | 4.313 | 6.909 |
| Thammasat, Thailand | XGBoost | Box Cox Variance | 0.527 | 0.773 | 1.132 | 1.672 | 2.572 | 4.385 | 6.951 |
| Thammasat, Thailand | SARIMAX | Constant Variance | 0.745 | 1.029 | 1.414 | 2.029 | 3.063 | 5.304 | 8.974 |
| Thammasat, Thailand | SARIMAX | Daily Variance | 0.772 | 1.046 | 1.418 | 2.021 | 2.941 | 4.779 | 7.563 |
| Thammasat, Thailand | SARIMAX | SGARCH Variance | 0.721 | 0.992 | 1.401 | 2.030 | 3.022 | 5.022 | 7.831 |
| Thammasat, Thailand | SARIMAX | Box Cox Variance | 0.631 | 0.897 | 1.282 | 1.875 | 2.813 | 4.603 | 7.304 |
| Minneapolis, Minnesota | GRUs | Constant Variance | 0.582 | 0.803 | 1.142 | 1.980 | 3.562 | 6.894 | 11.845 |
| Minneapolis, Minnesota | GRUs | Daily Variance | 0.597 | 0.821 | 1.155 | 1.676 | 2.700 | 4.916 | 8.202 |
| Minneapolis, Minnesota | GRUs | SGARCH Variance | 0.594 | 0.829 | 1.155 | 1.653 | 2.516 | 4.412 | 7.504 |
| Minneapolis, Minnesota | GRUs | Box Cox Variance | 0.573 | 0.815 | 1.200 | 1.823 | 2.948 | 5.868 | 11.484 |
| Minneapolis, Minnesota | XGBoost | Constant Variance | 0.538 | 0.785 | 1.185 | 1.849 | 3.428 | 6.892 | 12.078 |
| Minneapolis, Minnesota | XGBoost | Daily Variance | 0.589 | 0.855 | 1.242 | 1.891 | 3.024 | 5.379 | 8.844 |
| Minneapolis, Minnesota | XGBoost | SGARCH Variance | 0.704 | 0.976 | 1.351 | 1.918 | 2.907 | 4.912 | 8.212 |
| Minneapolis, Minnesota | XGBoost | Box Cox Variance | 0.439 | 0.639 | 0.942 | 1.437 | 2.364 | 4.682 | 8.880 |
| Minneapolis, Minnesota | SARIMAX | Constant Variance | 0.538 | 0.778 | 1.129 | 1.695 | 2.994 | 5.977 | 10.696 |
| Minneapolis, Minnesota | SARIMAX | Daily Variance | 0.571 | 0.806 | 1.149 | 1.706 | 2.629 | 4.702 | 7.844 |
| Minneapolis, Minnesota | SARIMAX | SGARCH Variance | 0.618 | 0.857 | 1.203 | 1.717 | 2.529 | 4.210 | 6.957 |
| Minneapolis, Minnesota | SARIMAX | Box Cox Variance | 0.414 | 0.602 | 0.891 | 1.371 | 2.254 | 4.333 | 8.407 |
| Austin, Texas | GRUs | Constant Variance | 0.528 | 0.740 | 1.054 | 1.538 | 2.577 | 5.457 | 11.131 |
| Austin, Texas | GRUs | Daily Variance | 0.484 | 0.686 | 1.000 | 1.520 | 2.538 | 5.003 | 9.784 |
| Austin, Texas | GRUs | SGARCH Variance | 0.492 | 0.709 | 1.030 | 1.537 | 2.485 | 4.683 | 8.380 |
| Austin, Texas | GRUs | Box Cox Variance | 0.305 | 0.502 | 0.822 | 1.356 | 2.329 | 4.699 | 9.094 |
| Austin, Texas | XGBoost | Constant Variance | 0.533 | 0.769 | 1.124 | 1.665 | 2.682 | 5.608 | 11.766 |
| Austin, Texas | XGBoost | Daily Variance | 0.495 | 0.720 | 1.065 | 1.629 | 2.679 | 5.241 | 10.259 |
| Austin, Texas | XGBoost | SGARCH Variance | 0.540 | 0.786 | 1.144 | 1.681 | 2.642 | 4.958 | 8.886 |
| Austin, Texas | XGBoost | Box Cox Variance | 0.349 | 0.573 | 0.932 | 1.523 | 2.627 | 5.081 | 9.521 |
| Austin, Texas | SARIMAX | Constant Variance | 0.546 | 0.771 | 1.105 | 1.623 | 2.616 | 5.305 | 10.768 |
| Austin, Texas | SARIMAX | Daily Variance | 0.543 | 0.765 | 1.097 | 1.623 | 2.563 | 4.780 | 9.038 |
| Austin, Texas | SARIMAX | SGARCH Variance | 0.496 | 0.728 | 1.077 | 1.619 | 2.548 | 4.583 | 7.992 |
| Austin, Texas | SARIMAX | Box Cox Variance | 0.330 | 0.530 | 0.849 | 1.382 | 2.338 | 4.469 | 8.375 |
Note: Box Cox Variance refers to the supply planning model made of a demand prediction model with Box Cox transformed data and the constant variance. Constant, Daily, and SGARCH Variance refer to supply planning models made of a demand prediction model with original or normalized data and the constant, daily, and predicted SGARCH variances, respectively.
In the comparison on the Minneapolis dataset, the trend of the MO metric was similar to that of accuracy performance, i.e., SARIMAX and XGBoost performed well with Box Cox transformation, but GRUs had a small MO with the original scale. At a low percentage of served demand (or d < 0), the supply levels differed only slightly in MO, but the differences became significant at a high percentage of served demand. This dataset also showed the limitation of Box Cox transformation in the form of a high MO at 98 % served demand. The reason was the effect of the exponential transformation, specifically when the value of lambda (λ) is close to 1. Therefore, the maximum value of the designed supply level, S(t,r), should be carefully limited for Box Cox transformed data. SARIMAX had the lowest MO in this dataset because SARIMAX had better demand prediction on the testing dataset. However, GRUs would likely perform better in both demand prediction accuracy and mean oversupply during high-demand seasons like summer.
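The need for a ceiling on Box Cox supply levels can be illustrated with the standard one-parameter Box Cox transform and its inverse. This is a sketch under stated assumptions: demand is strictly positive, the margin is a constant added on the transformed scale, and the value of lambda, the margin, and the ceiling are hypothetical and chosen only to keep the arithmetic simple. The study's exact transform and parameters may differ.

```python
import math

def box_cox(x, lam):
    """One-parameter Box Cox transform for x > 0."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def inv_box_cox(y, lam):
    """Inverse transform back to the original demand scale."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)

def capped_supply(pred_transformed, margin, lam, ceiling):
    """Add a constant margin on the transformed scale, invert, then cap."""
    return min(inv_box_cox(pred_transformed + margin, lam), ceiling)

LAM = 0.25  # hypothetical lambda, chosen only to keep the numbers simple
low = inv_box_cox(box_cox(16.0, LAM) + 1.0, LAM) - 16.0   # margin at demand 16
high = inv_box_cox(box_cox(81.0, LAM) + 1.0, LAM) - 81.0  # margin at demand 81
capped = capped_supply(box_cox(81.0, LAM), 1.0, LAM, ceiling=100.0)
```

The same margin on the transformed scale becomes roughly 9.6 e-scooters at a predicted demand of 16 but about 30.6 at a predicted demand of 81, so the back-transformed supply level grows quickly at peaks unless it is capped.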
Box Cox transformation resulted in a small MO on the Austin dataset for up to 90 % served demand. This method had the same problem as in Minneapolis at a higher percentage of served demand, but the predicted variance performed very well. At 95 % served demand, the reduction in the MO of GRUs between constant variance and predicted variance was around one, which meant that the operators could save around 30 e-scooters hourly (or 720 e-scooters in the daily rebalancing operation). This reduction could increase significantly if the number of regions was higher and the rebalancing period was longer, e.g., a reduction of 50 e-scooters per hour for 50 regions, or around 100 e-scooters for the same spatial size with a 2-hour rebalancing cycle. Similar to Minneapolis, SARIMAX's MO was smaller than that of GRUs and XGBoost due to the seasonal pattern of e-scooter demand.
In conclusion, across all datasets, accounting for conditional variance in supply level design could reduce the oversupply by around 26.22 % at 95 % served demand (or a shortage of 5 %). In other words, the proposed framework, which combines demand prediction with variance analysis, can minimize demand uncertainty, resulting in more efficient operational planning. Therefore, operators could implement this framework for the daily operation of shared e-scooters, specifically periodic (or tactical) rebalancing, along with other potential strategies implemented for shared bikes (Shui and Szeto, 2020). Customer incentives can be integrated with our framework, since this strategy can encourage customers to pick up and drop off e-scooters at desired locations. Our framework is suitable for periodic rebalancing; hence, real-time rebalancing operations can be added to respond to spontaneous demand spikes.
Conclusion and future work
This research paper proposes a practical framework for designing efficient supply planning for the heteroscedastic demand of shared dockless e-scooters. Several popular deep learning and machine learning models are applied to forecast the hourly demand, while their residuals are subjected to variance analysis. Three different datasets of dockless shared e-scooters (Austin TX, Minneapolis MN, and Thammasat TH) are employed to evaluate the effectiveness of the proposed approach. The numerical results show that demand prediction models (especially deep learning models) can achieve state-of-the-art performance, but the residuals are not white noise. Therefore, the supply for such heteroscedastic demand can be planned by using a variance stabilizing transformation (Box Cox) or variance analysis (daily seasonal variance or variance predicted by SGARCH). Seasonal variance (daily pattern) effectively reduces oversupply but is ineffective for residual patterns over longer time scales, particularly yearly patterns. However, the conditional variance model (SGARCH) could overcome this limitation. Another interesting method is the variance stabilizing Box Cox transformation. Being easy to implement, this transformation possibly improves the performance of demand prediction models (particularly the MAE). In addition, it can also remove heteroscedasticity, deal with outliers, and provide efficient supply level planning at a lower percentage of served demand. Nonetheless, the limitations of this transformation technique are a possible reduction in RMSE accuracy and the need for a maximum value when converting some expected demands and supply levels back to the original scale. In other words, a proper ceiling value is required for supply levels with Box Cox transformation at a higher percentage of served demand.
The conclusion drawn from this result is that demand prediction, even with deep learning, was insufficient for the operational planning of shared e-scooters, which have a high maintenance cost, short service life, irregular demand patterns, and strict regulations. However, the demand uncertainty of this shared mobility can be minimized by combining demand prediction and variance analysis. Thus, the proposed framework can be utilized to increase the efficiency of the daily operation of shared e-scooters or integrated with other strategies, including customer incentives and real-time rebalancing.
As predictions of demand and variance contribute to the deployment of e-scooters, it is expected that this combination can also increase the efficiency of operational planning, which can be a direction for future research. Including the conditional variance in demand prediction models is also a promising technique for future work, as it could improve the prediction accuracy and promptly provide the expected demand and variance. Other prospective studies could include: evaluating the proposed framework with potential demand data and for multi-step ahead prediction; examining the residuals of recent state-of-the-art deep learning models, such as Graph Neural Networks (GNNs); designing supply levels for net trip flow; and exploring other types of variance stabilizing methods that can deal with the limitations of Box Cox transformation.
CRediT authorship contribution statement
Narith Saum: Data curation, Methodology, Software, Writing - original draft, Writing - review & editing. Mongkut Piantanakulchai: Data curation, Investigation, Project administration, Writing - review & editing. Satoshi Sugiura: Conceptualization, Formal analysis, Validation, Writing - review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
The authors do not have permission to share data.
Acknowledgments
This work was supported by the AUN/SEED-Net Collaborative Edu-
cation Program (CEP) between Sirindhorn International Institute of
Technology, Thammasat University, and Hokkaido University. More-
over, the authors would like to thank Neuron Mobility for providing the
ridership data of shared e-scooters in Thammasat University Rangsit
Campus, Thailand.
References
Blickstein, S.G., Brown, C., Yang, S., 2019. E-scooter programs: current state of practice
in US cities. Rutgers University. https://doi.org/10.7282/t3-xc8e-tz93.
Breiman, L., 2001. Random Forests. Machine Learning 45 (1), 5–32.
Chen, T., Guestrin, C., 2016. XGBoost: A Scalable Tree Boosting System. In:
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining. Aug. 2016.
Chester, M., 2018. The Electric Scooter Fallacy: Just Because They're Electric Doesn't
Mean They're Green. Chester Energy and Policy.
https://www.chesterenergyandpolicy.com/blog/electric-scooter-fallacy-green.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H.,
Bengio, Y., 2014. Learning phrase representations using RNN encoder-decoder for
statistical machine translation. Retrieved from http://arxiv.org/abs/1406.1078.
Clewlow, R., Foti, F., Shepard-Ohta, T., 2018. Measuring Equitable Access to New
Mobility: A Case Study of Shared Bikes and Electric Scooters. A Populus Report. Nov.
2018. https://research.populus.ai/reports/Populus_MeasuringAccess_2018-Nov.pdf.
Ham, S.W., Cho, J.-H., Park, S., Kim, D.-K., 2021. Spatiotemporal Demand Prediction
Model for E-Scooter Sharing Services with Latent Feature and Deep Learning.
Transp. Res. Rec. 2675 (11), 34–43.
He, S., Shin, K.G., 2020. Dynamic Flow Distribution Prediction for Urban Dockless
E-Scooter Sharing Reconfiguration. In: Proceedings of The Web Conference. Apr.
2020.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8),
1735–1780.
Hu, Y., Ni, J., Wen, L., 2020. A hybrid deep learning approach by integrating LSTM-ANN
networks with GARCH model for copper price volatility prediction. Physica A Stat.
Mech. Appl. 557, 124907.
Hu, S., Xiong, C., Chen, P., Schonfeld, P., 2023. Examining nonlinearity in population
inflow estimation using big data: An empirical comparison of explainable machine
learning models. Transp. Res. A, Policy Pract. 174, 103743.
Hu, S., Xiong, C., 2023. High-dimensional population inflow time series forecasting via
an interpretable hierarchical transformer. Transp. Res. C, Emerg. Technol. 146,
103962.
Khan, P.W., Park, S.-J., Lee, S.-J., Byun, Y.-C., 2022. Electric Kickboard Demand
Prediction in Spatiotemporal Dimension Using Clustering-Aided Bagging Regressor.
J. Adv. Transp. 2022, 8062932.
Kim, S., 2011. Forecasting internet traffic by using seasonal GARCH models. J. Commun.
Netw. 13 (6), 621–624.
King, P.L., 2011. Crack the code: Understanding safety stock and mastering its equations.
APICS Magazine 21 (2011), 33–36.
Kristjanpoller, W., Hernández, E., 2017. Volatility of main metals forecasted by a hybrid
ANN-GARCH model with regressors. Expert Syst. Appl. 84, 290–300.
Kumar, S., Hussain, L., Banarjee, S., Reza, M., 2018. Energy Load Forecasting using Deep
Learning Approach-LSTM and GRU in Spark Cluster. In: Presented at the 2018 Fifth
International Conference on Emerging Applications of Information Technology
(EAIT). Jan. 2018.
Le Quy, T., Nejdl, W., Spiliopoulou, M., Ntoutsi, E., 2019. A Neighborhood-Augmented
LSTM Model for Taxi-Passenger Demand Prediction. Presented at the International
Workshop on Multiple-Aspect Analysis of Semantic Trajectories. Sep. 2019.
Li, X., Xu, Y., Chen, Q., Wang, L., Zhang, X., Shi, W., 2021. Short-Term Forecast of
Bicycle Usage in Bike Sharing Systems: A Spatial-Temporal Memory Network. IEEE
Trans. Intell. Transp. Syst. 1–12.
Liu, X., Gherbi, A., Li, W., Cheriet, M., 2019. Multi features and multi-time steps LSTM
based methodology for bike sharing availability prediction. Procedia Comput. Sci.
155, 394–401.
Luo, H., Cai, J., Zhang, K., Xie, R., Zheng, L., 2021. A multi-task deep learning model for
short-term taxi demand forecasting considering spatiotemporal dependences.
J. Traffic Transp. Eng. 8 (1), 83–94.
Masoud, M., Elhenawy, M., Almannaa, M.H., Liu, S.Q., Glaser, S., Rakotonirainy, A.,
2019. Heuristic approaches to solve e-scooter assignment problem. IEEE Access 7,
175093–175105.
McKenzie, G., 2019. Spatiotemporal comparative analysis of scooter-share and
bike-share usage patterns in Washington, D.C. J. Transp. Geogr. 78, 19–28.
McKenzie, G., 2020. Urban mobility in the sharing economy: A spatiotemporal
comparison of shared mobility services. Comput. Environ. Urban Syst. 79, 101418.
Moreau, H., de Jamblinne de Meux, L., Zeller, V., D'Ans, P., Ruwet, C., Achten, W.M.,
2020. Dockless E-Scooter: A Green Solution for Mobility? Comparative Case Study
between Dockless E-Scooters, Displaced Transport, and Personal E-Scooters.
Sustainability 12 (5), 1803.
O'Mahony, E.D., 2015. Smarter tools for (Citi) bike sharing. Cornell University. Ph.D.
dissertation.
O'Malley, T., Bursztein, E., Long, J., Chollet, F., Jin, H., Invernizzi, L., et al., 2019. Keras
documentation, Keras Tuner. Retrieved from
https://github.com/keras-team/keras-tuner.
Rusyana, A., Nurhasanah, Marzuki, Flancia, M., 2016. SARIMA model for forecasting
foreign tourists at the Kualanamu International Airport. In: Proceedings of the
12th International Conference on Mathematics, Statistics, and Their Applications
(ICMSA). Oct. 2016.
Saum, N., Piantanakulchai, M., 2019. A Review on an Emerging New Mode of Transport:
The Shared Dockless Electric Scooter. In: Proceedings of the Eastern Asia Society
for Transportation Studies. Sri Lanka. Sept. 2019.
Saum, N., Sugiura, S., Piantanakulchai, M., 2020. Short-Term Demand and Volatility
Prediction of Shared Micro-Mobility: a case study of e-scooter in Thammasat
University. In: Presented at the 2020 Forum on Integrated and Sustainable
Transportation Systems (FISTS). Nov. 2020.
Seo, Y.-H., 2020. A Dynamic Rebalancing Strategy in Public Bicycle Sharing Systems
Based on Real Time Dynamic Programming and Reinforcement Learning. Seoul
National University, South Korea. Ph.D. dissertation.
Severengiz, S., Finke, S., Schelte, N., Forrister, H., 2020. Assessing the Environmental
Impact of Novel Mobility Services using Shared Electric Scooters as an Example.
Procedia Manuf. 43, 80–87.
Shui, C.S., Szeto, W.Y., 2020. A review of bicycle-sharing service planning problems.
Transp. Res. C, Emerg. Technol. 117, 102648.
Sigauke, C., Chikobvu, D., 2011. Prediction of daily peak electricity demand in South
Africa using volatility forecasting models. Energy Econ. 33 (5), 882–888.
Smith, C.S., Schwieterman, J.P., 2018. E-scooter scenarios: evaluating the potential
mobility benefits of shared dockless scooters in Chicago. DePaul University,
Chaddick Institute for Metropolitan Development.
StataCorp., 2013. Stata Time-Series Reference Manual. Stata Press, College Station, Texas.
Ti, A., Du, Z., Zhang, W., 2019. Analysis on the Volatility of Sustainable Stock Index and
Traditional Stock Index Based on GARCH Model. In: Presented at the 2019
International Conference on Economic Management and Model Engineering
(ICEMME). Dec. 2019.
Tolomei, L., Fiorini, S., Ciociola, A., Vassio, L., Giordano, D., Mellia, M., 2021. Benefits of
Relocation on E-scooter Sharing - a Data-Informed Approach. In: Presented at the
2021 IEEE International Intelligent Transportation Systems Conference (ITSC). Sept.
2021.
Trapero, J.R., Cardós, M., Kourentzes, N., 2019. Empirical safety stock estimation based
on kernel and GARCH models. Omega 84, 199–211.
Wang, T., Hu, S., Jiang, Y., 2021. Predicting shared-car use and examining nonlinear
effects using gradient boosting regression trees. Int. J. Sustain. Transp. 15 (12),
893–907.
Wang, B., Kim, I., 2018. Short-term prediction for bike-sharing service using machine
learning. Transp. Res. Procedia 34, 171–178.
Wu, Y., 2011. The Simulation Study of Shanghai and Shenzhen 300 Index By Garch
Models. In: Proceedings of the 2011 International Conference on Information
Management, Innovation Management and Industrial Engineering, pp. 30–33.
Xu, C., Ji, J., Liu, P., 2018. The station-free sharing bike demand forecasting with a deep
learning approach and large-scale datasets. Transp. Res. C, Emerg. Technol. 95,
47–60.
Xu, M., Liu, H., Yang, H., 2020. A Deep Learning Based Multi-Block Hybrid Model for
Bike-Sharing Supply-Demand Prediction. IEEE Access 8, 85826–85838.
Yang, Y., Gao, P., Sun, Z., Wang, H., Lu, M., Liu, Y., Hu, J., 2023. Multistep ahead
prediction of temperature and humidity in solar greenhouse based on FAM-LSTM
model. Comput. Electron Agric. 213, 108261.
Yeo, I.K., Johnson, R.A., 2000. A new family of power transformations to improve
normality or symmetry. Biometrika 87 (4), 954–959.
Yu, Y., Si, X., Hu, C., Zhang, J., 2019. A review of recurrent neural networks: LSTM cells
and network architectures. Neural Comput. 31 (7), 1235–1270.
Zhang, G., Ali, S., Wang, X., Wang, G., Pan, Z., Zhang, J., 2019. SPI-based drought
simulation and prediction using ARMA-GARCH model. Appl. Math. Comput. 355,
96–107.
Zhang, C., Zhu, F., Wang, X., Sun, L., Tang, H., Lv, Y., 2020. Taxi Demand Prediction
Using Parallel Multi-Task Learning Model. IEEE Trans. Intell. Transp. Syst. 23 (2),
794–803.
Zhu, R., Zhang, X., Kondor, D., Santi, P., Ratti, C., 2020. Understanding spatio-temporal
heterogeneity of bike-sharing and scooter-sharing mobility. Comput. Environ. Urban
Syst. 81, 101483.
N. Saum et al.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Demand for electric kickboards is increasing specifically in tourist-centric regions worldwide. In order to gain a competitive edge and to provide quality service to customers, it is essential to properly deploy rental electric kickboards (e-kickboards) at the time and place customers want. However, it is necessary to study how to divide the region to predict electric mobility demand by region. Therefore, this study is made to more accurately predict future demand based on past regional customers’ electric mobility demand data. We have proposed a novel electric kickboard demand prediction in spatiotemporal dimension using clustering-aided bagging regressor. We have used electric kickboard usage data from a Jeju, South Korea-based company. As a result of the experiment, it was found that the accuracy before using clustering-based bagging regressor and when the region was divided by the clustering method, the performance was improved, and we have achieved a regression score R 2 of 93.42 using our proposed approach. We have compared our proposed approach with other state-of-the-art models, and we have also compared our model with different other combinations of bagging regressors. This study can be helpful for companies to meet the user’s demand for a better quality of service.
Article
Full-text available
Bike-sharing systems have made notable contributions to cities by providing green and sustainable mobility service to users. Over the years, many studies have been conducted to understand or anticipate the usage of these systems, with the hope to inform their future developments. One important task is to accurately predict usage patterns of the systems. Although many deep learning algorithms have been developed in recent years to support travel demand forecast, they have mainly been used to predict traffic volume or speed on roadways. Few studies have applied them to bike-sharing systems. Moreover, these studies usually focus on one single dataset or study area. The effectiveness and robustness of the prediction algorithms are not systematically evaluated. In this study, we propose a Spatial-Temporal Memory Network (STMN) to predict short-term usage of bicycles in bike-sharing systems. The framework employs Convolutional Long Short-Term Memory models and a feature engineering technique to capture the spatial-temporal dependencies in historical data for the prediction task. Four testing sites are used to evaluate the model. These four sites include two station-based systems (Chicago and New York) and two dockless bike-sharing systems (Singapore and New Taipei City). By assessing STMN with several baseline models, we find that STMN achieves the best overall performance in all the four cities. The model also achieves superior performance in urban areas with varying levels of bicycle usage and during peak periods when demand is high. The findings suggest the reliability of STMN in predicting bicycle usage for different types of bike-sharing systems.
Article
Full-text available
The electric scooter (e-scooter) sharing service has attracted significant attention because of its extensive usage and eco-friendliness. Since e-scooters are mostly accessed by foot, the presence of e-scooters within walking distance has a crucial effect on the service quality. Therefore, to maintain appropriate service quality, relocation strategies are often used to properly distribute e-scooters within service areas. There are extensive literatures on demand forecasting for an efficient relocation. However, the study of the relocation of small-scale spatial units within walking distance level is still inadequate because of the sparsity of demand data. This research aims to establish an effective methodology for predicting the demand for e-scooters in high spatial resolution. A new grid-based spatial setting was created with the usage data. The model in the methodology predicts not only the identified demand but also the unmet demand to increase practicality. A convolutional autoencoder is used to obtain the latent feature that can reduce the problem of representing sparse data. An encoder–recurrent neural network–decoder (ERD) framework with a convolutional autoencoder resulted in a huge improvement in predicting spatiotemporal events. This new ERD framework shows enhanced prediction performance, reducing the mean squared error loss to 0.00036 from 0.00679 compared with the baseline long short-term memory model. This methodological strategy has its significance in that it can solve any prediction issue with spatiotemporal data, even those with sparse data problems.
Article
Full-text available
Flexible drop-off and pick-up (one-way) carsharing programs provide users with high levels of convenience but meanwhile incurs spatiotemporal imbalances in shared-cars distribution. Predicting shared-car use helps recognize system imbalances beforehand while identifying determinants related to shared-car use helps operators efficiently implement relocation strategies. In this study, a gradient boosting regression model (GBRT) is employed to predict shared-car use at a station level, and partial dependence plots (PDPs) are employed to examine nonlinear relationships between shared-car use and various predictors. Results show: (1) GBRTs predict shared-car use with a high level of accuracy (MSE: 1.1069–1.1648). (2) PDPs present highly consistent results with relationships derived from the traditional statistical model; (3) Time-varying variables account for 89.30%–86.84% importance in shared-cars use prediction, suggesting these variables can greatly enhance prediction accuracy; (4) Other variables like built environment, station attributes, and socioeconomic features, also account for some importance and can enhance prediction accuracy. Findings help carsharing operators accurately predict the station-level shared-car use and optimally identify the best locations for stations, and thus maintain the operational efficiency of carsharing programs.
Accurate, real-time taxi demand prediction can help managers pre-allocate taxi resources in cities, which helps drivers find passengers quickly and reduces passengers' waiting time. Most existing studies focus on mining the spatial-temporal characteristics of taxi demand distributions but do not model the correlations between taxi pick-up demand and drop-off demand from the perspective of multi-task learning. In this article, we propose a multi-task learning model containing three parallel LSTM layers to co-predict taxi pick-up and drop-off demands, and compare the performance of single-demand prediction with that of co-prediction of the two demands. Experimental results on real-world datasets demonstrate that pick-up demand and drop-off demand depend on each other and confirm the effectiveness of the proposed co-prediction methods.
Mobile device location data (MDLD) contain population-representative, fine-grained travel demand information, creating opportunities to validate established relations between travel demand and underlying factors from a big data perspective. Using the nationwide census block group (CBG)-level population inflow derived from MDLD as the proxy of travel demand, this study examines its relations with various factors including socioeconomics, demographics, land use, and CBG attributes. A host of tree-based machine learning (ML) models and interpretation techniques (feature importance, partial dependence plot (PDP), accumulated local effect (ALE), SHapley Additive exPlanations (SHAP)) are extensively compared to determine the best model architecture and justify interpretation robustness. Empirical results show that: (1) Boosting trees perform the best among all models, followed by bagging trees, single trees, and linear regressions. (2) Feature importance holds consistently among different tree-based models but is influenced by measures of importance and hyperparameter settings. (3) Pronounced nonlinearities, threshold effects, and interaction effects are observed in relations between population inflow and most of its determinants. (4) Compared with PDP, ALE and SHAP plots are more reliable in the presence of outliers, feature dependency, and local heterogeneity. Taken together, the techniques introduced in this study can either be integrated into customary travel demand models to enhance model accuracy or serve as interpretation tools that offer a comprehensive understanding of intricate relations.
Mobile device location data (MDLD) are emerging data sources in the transportation domain that contain large-scale, fine-grained information on population inflow. However, few studies have built forecasting models based on large-scale MDLD-based population inflow time series. This task is challenging due to complex nonlinear temporal dynamics, high-dimensional time series structure (i.e. multiple time series with multi-shape inputs and outputs), and non-negligible impacts from various external factors. To address these challenges, this study introduces a deep learning framework, the Interpretable Hierarchical Transformer (IHTF), for nationwide county-level population inflow time series forecasting and interpretation. A variety of cutting-edge deep learning techniques are fused, including the variable selection network to incorporate external effects, the gated residual network to handle nonlinearity, and the transformer architecture to learn temporal dynamics. Different interior parameters, such as variable selection weight and temporal attention weight, are extracted to explain patterns learned by the framework. Numerical experiments show that IHTF outperforms extensive baseline models in forecasting accuracy. In addition, the feature importance generated by IHTF is similar to that of the tree-based model LightGBM but exhibits a more even distribution; the most important static variables are point-of-interest (POI) count, county location, median household income, and percentage of accommodation and food services. Moreover, the attention weights demonstrate that IHTF can automatically learn the seasonality from time series. Taken together, this framework can serve as a reliable travel demand forecasting component in the transportation planning process that allows travel demand to be modeled continuously instead of by snapshot.