
Supply level planning for shared e-scooters considering spatiotemporal heteroscedastic demand

January 2024
Transportation Research Interdisciplinary Perspectives 23(1):101019


DOI:10.1016/j.trip.2024.101019

License
CC BY 4.0

Authors:

Narith Saum

Chulalongkorn University

Mongkut Piantanakulchai

Sirindhorn International Institute of Technology (SIIT)

Satoshi Sugiura

Hokkaido University


Transportation Research Interdisciplinary Perspectives 23 (2024) 101019

Available online 7 February 2024

Supply level planning for shared e-scooters considering spatiotemporal heteroscedastic demand

Narith Saum, Mongkut Piantanakulchai, Satoshi Sugiura

School of Civil Engineering and Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand

Division of Engineering and Policy for Sustainable Environment, Hokkaido University, Hokkaido 060-0808, Japan

ARTICLE INFO

Keywords:

Box Cox Transformation

Deep Learning

Machine Learning

SGARCH

Shared E-Scooters

Supply Planning

ABSTRACT

Accurate demand forecasting is a key success factor for mobility service businesses, especially shared electric (e-)scooters, given their volatile demand, high operational costs, and strict regulations. The heteroscedasticity of transportation demand is usually overlooked even though it is very important for designing efficient supply management. This study proposed a supply planning framework considering heteroscedasticity in the hourly e-scooter

demand. Three shared e-scooter datasets (Austin TX, Minneapolis MN, and Thammasat TH) were examined to

extract temporal patterns. These features were used as inputs for the demand prediction models, including

machine learning and deep learning models. Then, the squared residuals were subjected to variance prediction,

including constant or daily variance and variance predicted by Autoregressive Conditional Heteroscedasticity

(ARCH). Finally, the outputs of these models were combined to determine the supply level. Four supply level

models (with constant, daily, Seasonal Generalized ARCH or SGARCH, and Box Cox variances) were compared

based on the Mean Oversupply (MO) metric. As a result, demand prediction models with Box Cox transformed

data possibly provide higher prediction accuracy than those with original or normalized data, specifically in terms of Mean

Absolute Error (MAE). Supply level models with Box Cox variance had the lowest MO at lower percentages of

served demand, whereas those with SGARCH variance had lower MO at higher percentages of served demand. At

95 % served demand, considering heteroscedastic demand in supply level planning could reduce oversupply by

26.22 %. From a policy perspective, operators could use our framework to minimize the demand uncertainty for

daily operation, along with other potential policies such as customer incentives and hybrid real-time and periodic

rebalancing.

Introduction

Shared electric scooters (e-scooters) have many advantages

compared to the existing shared bikes for their ease of registration,

parking and pick-up convenience (as a dockless mode), and a relaxed

riding experience. Consequently, e-scooter sharing services have gained

so much popularity in many big cities worldwide, where they help

mitigate several urban transportation problems such as congestion,

limited parking space, air pollution, and unsystematic public transit

connectivity. The history of e-scooter development and adoption to

sharing services, regulations, social perception, and advantages/disad-

vantages of this transportation mode was summarized by Saum and

Piantanakulchai (2019). The spatiotemporal comparison between

shared e-scooters and shared bikes was also examined (McKenzie, 2019;

Zhu et al., 2020). McKenzie (2019) studied these two shared modes in

Washington D.C., finding that station-based shared bikes were primarily

used for commuting while shared e-scooters were more commonly used

for leisure, recreation, or tourism. In Singapore, Zhu et al. (2020)

compared the usage patterns of these two shared modes. They concluded

that the usage pattern of shared e-scooters was spatially compact and

denser than shared bikes, although shared e-scooters required higher

costs for rebalancing and charging. In addition, they examined the

correlation between the hourly trip starts (and trip ends) with rainfall

and air temperature. Shared e-scooters were found to be time-saving

during rush hours compared to ride-hailing services (McKenzie, 2020).

According to the previous studies on shared e-scooters, the opera-

tional planning of this mode is more challenging compared to other

transportation modes for several reasons. First, the demand for shared e-scooters is highly volatile due to their trip characteristics and purposes.

This is because shared e-scooters are typically used for short-range trips

* Corresponding author.

E-mail addresses: saumnarith@gmail.com (N. Saum), mongkut@siit.tu.ac.th (M. Piantanakulchai), sugiura@eng.hokudai.ac.jp (S. Sugiura).

Contents lists available at ScienceDirect

Transportation Research Interdisciplinary Perspectives

journal homepage: www.sciencedirect.com/journal/transportation-

research-interdisciplinary-perspectives

https://doi.org/10.1016/j.trip.2024.101019

Received 28 June 2022; Received in revised form 17 January 2024; Accepted 25 January 2024


of around 1.8 km or 14 min; otherwise, it would no longer be time and

cost-saving (McKenzie, 2020; Smith and Schwieterman, 2018). With a

higher fee than shared bikes, shared e-scooters are not preferable for

commuting trips but for leisure and tourism activities. These irregular

trip purposes and dockless policy lead to inaccurate demand prediction

and require a highly satisfying service level. Second, there are several

important regulations for operators, including registration fee per e-

scooter, limited number of e-scooters per operator, distribution regula-

tion, and response to any spot with excessive e-scooters (Blickstein et al.,

2019). Third, an e-scooter is a lightweight vehicle powered by batteries,

so it requires intensive maintenance (Zhu et al., 2020) and, in particular, has a short service life (Moreau et al., 2020). Lastly, charging and rebalancing

operations can produce more emissions than the replaced trips, which

may technically damage its environmentally friendly reputation (Ches-

ter, 2018; Moreau et al., 2020; Severengiz et al., 2020).

To deal with these challenging problems, proper operational plan-

ning for shared e-scooters is necessary to maximize their positive im-

pacts on urban mobility. Operational planning of shared e-scooters

consists of two main parts: trip forecasting and route optimization for

distribution and rebalancing. However, this study focused on the first

part with the aim of extracting the spatiotemporal patterns from his-

torical trip data. From previous studies in Section 2, many robust pre-

diction models were proposed to forecast the transportation demand,

particularly shared bikes and e-scooters, but most of them only focused

on the accuracy performance. For this reason, the heteroscedasticity of

transportation demand was disregarded; hence, the information from

historical data was not effectively explored. Furthermore, the variance

of heteroscedastic data is not constant, so supply planning must consider

this variation. In other words, the inventory or supply level partly de-

pends on the residuals of the demand prediction model and the heter-

oscedasticity of the data, so variance analysis is required to achieve a

more effective supply level estimation. This can be accomplished by

developing a conditional variance model, such as SGARCH, or utilizing

data transformation techniques like Box Cox transformation.

This study provided three potential contributions to the field of

shared e-scooters and supply level planning. First, the spatiotemporal

patterns of shared e-scooter demand were revealed based on three

different datasets: Thammasat University (Thailand), Minneapolis

(Minnesota), and Austin (Texas). Second, the heteroscedasticity of

shared e-scooter demand was accounted for in designing the supply

level, whereas the Mean Oversupply (MO) metric was proposed to

compare the efficiency at a specific percentage of served demand. Lastly,

the advantages and disadvantages of Box Cox transformation were

revealed, including its impact on demand prediction accuracy and

supply level planning.

This paper is organized into six sections. Section 1 describes the general background of shared e-scooters, the research gap, and the objectives of this study. Section 2 outlines the recent studies on spatiotemporal prediction models in the field of transportation and variance analysis. The research framework and mathematical expressions are clarified in Section 3. Section 4 outlines the data preparation and featuring, while Section 5 reports the demand prediction, variance prediction, and supply planning. Lastly, Section 6 summarizes the findings and future studies.

Related work

As stated previously, the operational planning for shared e-scooters

is challenging for various reasons, such as the volatile demand, high

operating costs, and strict regulations. Safety stock is typically utilized to

tolerate demand volatility, whereas higher demand variation requires

greater safety stock or inventory (King, 2011). Consequently, opera-

tional costs for shared e-scooters are high due to unproductive e-scooters

(i.e., low usage per e-scooter per day), battery degradation, recharging

costs and emissions (Masoud et al., 2019), and maintenance costs (Zhu

et al., 2020). To improve vehicle equitability, shared e-scooter operators

are advised to distribute e-scooters to specific regions, such as low-income, minority, and other disadvantaged communities (Clewlow et al., 2018). However, this objective is difficult to achieve when there is a limited number of shared e-scooters. To improve operational planning efficiency for shared e-scooters, previous studies related to demand

uncertainty, including demand prediction models and volatility anal-

ysis, were reviewed in this study.

The prediction of transportation demand is a time-series problem in

which the demand changes over time (daily, weekly, and seasonal

trends). Nonetheless, it is generally considered to be less volatile than

some time-series problems, such as the stock market. As a result, the

heteroscedasticity of transportation demand is mostly ignored, which

means some vital information for operational planning was lost. Confidence intervals and inventory levels, consisting of the expected demand and predicted variance, are crucial for supply management. However, numerous studies have investigated the first term, and many models

were proposed, including statistical regression models, machine

learning algorithms, and deep learning models.

One of the most popular models for time-series data is the Autore-

gressive Integrated Moving Average (ARIMA), while its extension for

seasonal datasets is Seasonal-ARIMA (SARIMA), and other extensions

can be found in StataCorp (2013). Normality and stationarity are

required for ARIMA, but Box Cox transformation can cope with the first

requirement (Rusyana et al., 2016). One of the most accurate models in

machine learning is Random Forest (RF) regression. This model could

have a comparable result with some deep learning architectures (Wang

and Kim, 2018). With the same concept of RF, XGBoost fits the data

based on gradient-boosted decision trees, which effectively reduces the

training time (Chen and Guestrin, 2016).

Recently, Deep Learning has gained popularity for its promising

performance over statistical regression and machine learning models,

especially with the availability of powerful computational devices. The

most basic deep learning models are Artificial Neural Networks (ANNs), which connect a set of nodes or artificial neurons of one layer with

another layer using an activation function. To deal with the limitation of

ANNs on sequential data, Recurrent Neural Networks (RNNs) were

proposed by modifying the conventional perceptron to include the

outputs from the previous state, called the recurrent cell (Yu et al.,

2019). This recurrent cell was later extended with several gates inside

(forget gate, input gate, and output gate), called Long Short-Term

Memory Neural Networks (LSTM NNs), to improve the performance and

eliminate the limitations of RNNs, including vanishing and exploding

gradient (Hochreiter and Schmidhuber, 1997). Cho et al. (2014) com-

bined the forget gate and input gate into a single update gate to reduce

the number of trainable parameters of LSTM NNs, called Gated Recurrent Units (GRUs). Several other extensions of RNNs can be found in Yu et al. (2019).

These popular machine learning models and deep learning models

have been widely applied in transportation demand prediction for their

advantage of learning temporal patterns. For example, Xu et al. (2018)

used LSTM NNs to predict the spatiotemporal demand of dockless shared

bikes in Nanjing (Jiangsu), China. Similarly, Wang and Kim (2018)

employed RF, LSTM NNs, and GRUs to predict station-level bike avail-

ability in Suzhou (China), and these models yielded almost the same

performance. This architecture of LSTM NNs was also used to predict

multi-step bike availability (Liu et al., 2019). Likewise, Gradient

Boosting Regression Trees (GBRT), alongside four other baseline models

(ARIMA, RF, ANNs, and LSTM NNs), were employed to forecast station-

level shared-car rentals in Shanghai (Wang et al., 2021). To improve the

performance of standard LSTM NNs, Le Quy et al. (2019) proposed

Neighborhood-Augmented LSTM NNs that include the historical data of

neighboring regions to forecast taxi-passenger demand in Porto,

Portugal. Zhang et al. (2020) co-predicted the taxi pick-up and drop-off

demands using a multi-task learning model consisting of three parallel

LSTM layers. The nonlinear Granger causality test was employed to

enhance the spatiotemporal feature selection of LSTM NNs for short-term taxi demand forecasting (Luo et al., 2021). Li et al. (2021) proposed

a Spatial-Temporal Memory Network (STMN) to predict the hourly de-

mand of bike-sharing in four different cities: Singapore, Taipei, Chicago,

and New York. Xu et al. (2020) examined a novel Multi-Block Hybrid

(MBH) model in predicting the supply–demand of the bike-sharing

system in Shanghai. A recent study utilized various tree-based models,

such as DT, RF, ExtraTree, XGBoost, CatBoost, and LightGBM, to forecast

the nationwide census-level population inflow in the USA (Hu et al.,

2023).

Several machine learning and deep learning models were also

employed and adapted to predict the spatiotemporal demand of shared

e-scooters. For instance, He and Shin (2020) proposed a deep learning

model called graph capsule neural networks (GCScoot) to predict the

spatiotemporal trip flow of shared e-scooters in three cities: Austin TX,

Louisville KY, and Minneapolis MN. Ham et al. (2021) proposed an

Encoder–Recurrent neural network–Decoder (ERD) framework to pre-

dict served and unmet demand of shared e-scooters operated in

Gwangjin district, Seoul, South Korea. Similarly, Khan et al. (2022)

developed a bagging ensemble of XGBoost, RF, and Extra Tree regressors

to forecast the daily demand of e-scooter sharing on Jeju Island, South

Korea. The shared e-scooter demand in Austin TX, and Louisville KY was

predicted using a deep learning model called 3D-CloST, and then the

surplus and shortfall e-scooters were relocated by dedicated workers

using a simple greedy strategy (Tolomei et al., 2021).

On the other hand, volatility or variance analysis has been predominantly studied in the econometric field, where it could provide

more valuable information to support decision-making. Autoregressive

Conditional Heteroscedasticity (ARCH) is a statistical regression model

used to predict future variance or volatility (StataCorp, 2013). ARCH has

two different models, ARCH in variance and ARCH in mean (ARCH-M).

ARCH has only the squared residuals from the previous lags as the in-

dependent variables, while the Generalized ARCH (GARCH) also in-

cludes the past variances. Many extensions of GARCH have been

proposed, such as Power ARCH, Threshold ARCH, Exponential ARCH,

etc. Several models of GARCH were applied to predict the return rate of

the daily closing price of the Shanghai and Shenzhen 300 Index (Wu,

2011). Similarly, Ti et al. (2019) employed ARCH-M (ARMA-GARCH

and ARMA-TARCH) to forecast the volatility of traditional and sus-

tainable stock indices from the FTSE4Good index series family.

Furthermore, ARMA-GARCH, SARIMA-GARCH, and SARIMA-SGARCH

were trained to predict the precipitation index (Zhang et al., 2019),

daily peak electricity demand (Sigauke and Chikobvu, 2011), and

internet traffic (Kim, 2011), respectively. ARCH-M could slightly

improve the prediction accuracy of ARIMA, but it may struggle with the

convergence criteria and training time. Recently, GARCH was combined

with some deep learning models to predict the price volatility of main

metals like Gold, Silver, and Copper (Hu et al., 2020; Kristjanpoller and Hernández, 2017).

Throughout the literature review, two research gaps were observed.

First, many machine learning models and comprehensive deep learning

architectures were proposed to forecast the transportation demand, but

none examined the residuals. These studies focused only on the accuracy

performance of demand prediction models, but they did not consider the

supply level planning, which partly depends on the demand variation.

Second, no attempts have been made to apply the ARCH model to forecast the conditional variance of transportation demand. Therefore, this research aims to fill these gaps by bridging these two methodologies to

develop a practical supply level planning framework for the new

transportation mode, shared dockless e-scooters.

Methodology

Research framework

To achieve the purposes of this study, the research framework was separated into five steps, including data preparation, data transformation, demand prediction, variance prediction, and supply level design (see Fig. 1). The first step involved collecting, encoding, and featuring shared e-scooter data, weather attributes, annual events, public holidays, day of the week, and time of the day. Based on the literature review, the data were mostly normalized between 0 and 1 to be trainable with some specific activation functions, but training using

the original scale was also found. Since Box Cox transformation could

improve the prediction accuracy and minimize the heteroscedasticity

effect (Saum et al., 2020), it was added as another data transformation

option in the second step.

In the third step, several machine learning and deep learning models

were developed to predict the hourly demand of shared e-scooters, while

their hyperparameters were optimized using Grid Search and Bayesian

Optimization. The demand prediction models included Seasonal

Autoregressive Integrated Moving Average with exogenous variables

(SARIMAX), Random Forest (RF), Extreme Gradient Boosting

(XGBoost), Fully Connected Neural Networks (FCNNs), Recurrent Neu-

ral Networks (RNNs), and Gated Recurrent Units (GRUs). The main

objective of the performance comparison between Box Cox and original/

normalized data was to show that, in contrast to original or normalized

data, the residuals of the prediction models using Box Cox transformed

data had no ARCH effects. Subsequently, the residuals of these models

were used to forecast the future variance.

Since Box Cox transformation can remove the heteroscedasticity

(Rusyana et al., 2016; Saum et al., 2020), its variance is constant. On the

other hand, the most accurate models between original and normalized

data were chosen for variance analysis under three cases: constant

variance, daily variance, and predicted variance by SGARCH. Therefore,

the variance of the Box Cox transformed data has only one model

(Constant Variance), and that of original or normalized data has three

models (Constant Variance, Daily Variance, and SGARCH Variance). For

variance analysis and supply level design, we only investigated three

models (SARIMAX, XGBoost, and GRUs) because XGBoost and RF have

comparable prediction results, and GRUs have similar performance to

FCNNs and RNNs (see Table 4). Finally, the predicted demand (step 3)

and predicted variance (step 4) were used to design the Supply Level in step 5. In this step, the Mean Oversupply (MO) metric was proposed to compare the performance of the four supply level models at a specific percentage of served demand ranging from 70 % to 98 %. It's worth noting that the proposed framework was primarily designed for one-step (i.e., one hour) ahead prediction. However, it could be further extended for multi-step ahead prediction in future studies or practical implications.

Fig. 1. Research framework.

Data transformation

This paper investigated three types of data transformation: original

scale, normalized scale, and Box Cox transformation. Normalization is a

popular transformation technique that likely improves prediction per-

formance. It has several formulations for different purposes, including

changing the input data to have the same scale (min–max normalization

or 0–1 scale) or similar distribution (mean and Z-score normalization).

Min-Max normalization in Eq. (1) was employed to transform inputs and

outputs so that deep learning models could have output activation

functions like Tanh and Sigmoid. The formulation of min–max

normalization was given as follows:

$$x^{norm}_t = \frac{x_t - \min(x_t)}{\max(x_t) - \min(x_t)} \tag{1}$$

where $x^{norm}_t$ is the normalized scale of the variable ($x_t$) at time interval $t$, including the e-scooter demand and exogenous variables.
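As a minimal sketch of Eq. (1), the snippet below scales a short series to the 0–1 range and maps it back. The function names and the `hourly_trips` values are hypothetical and only serve as an illustration; in practice the minimum and maximum would presumably be taken from the training data.

```python
def min_max_normalize(values):
    """Scale a sequence to the 0-1 range following Eq. (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("Eq. (1) is undefined for a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def min_max_denormalize(scaled, lo, hi):
    """Map normalized values back to the original scale."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical hourly pick-up counts, used only for illustration.
hourly_trips = [0, 3, 12, 7, 1]
scaled, lo, hi = min_max_normalize(hourly_trips)
restored = min_max_denormalize(scaled, lo, hi)
```

The inverse mapping matters because predictions made on the normalized scale have to be converted back before they can be compared with actual demand.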

Box Cox is a powered monotonic transformation that stabilizes the

variance, minimizes skewness, and makes the data more Gaussian-like

based on the likelihood maximization technique. This transformation

requires the input data to be strictly positive, while the generalized form

supports both positive and negative data and improves the normality and

symmetry, called Yeo-Johnson Transformation (Yeo and Johnson, 2000).

The expressions of Box Cox transformation and log-likelihood function

are as follows:

$$x^{BC}_{t,r} = \begin{cases}
\lambda_r^{-1}\left[(x_{t,r}+1)^{\lambda_r} - 1\right] & \text{if } \lambda_r \neq 0,\ x_{t,r} \geq 0 \\
\ln(x_{t,r}+1) & \text{if } \lambda_r = 0,\ x_{t,r} \geq 0 \\
-\left[(-x_{t,r}+1)^{2-\lambda_r} - 1\right]/(2-\lambda_r) & \text{if } \lambda_r \neq 2,\ x_{t,r} < 0 \\
-\ln(-x_{t,r}+1) & \text{if } \lambda_r = 2,\ x_{t,r} < 0
\end{cases} \tag{2}$$

where $\theta_r = (\lambda_r, \mu_r, \sigma_r^2)'$, $X_r = (x_{1,r}, x_{2,r}, x_{3,r}, \cdots, x_{T,r})'$ and $x^{BC}_{t,r} \sim \mathbb{N}(\mu_r, \sigma_r^2)$. Therefore, the best estimators of $\mu_r$ and $\sigma_r^2$ could be computed by maximizing the log-likelihood function at any fixed value of $\lambda_r$ as follows:

$$\hat{\mu}_r(\lambda_r) = \frac{1}{T}\sum_{t=1}^{T} x^{BC}_{t,r} \tag{4}$$

$$\hat{\sigma}_r^2(\lambda_r) = \frac{1}{T}\sum_{t=1}^{T}\left(x^{BC}_{t,r} - \hat{\mu}_r(\lambda_r)\right)^2 \tag{5}$$

Therefore, $\hat{\theta}_r = (\hat{\lambda}_r, \hat{\mu}_r(\hat{\lambda}_r), \hat{\sigma}_r^2(\hat{\lambda}_r))'$ could be obtained by maximizing the log-likelihood function in Eq. (3). $x^{BC}_{t,r}$ is the Box Cox scale of e-scooter demand $x_{t,r}$ at time $t$ and region $r$. $\lambda_r$ is the parameter of Box Cox transformation for region $r$. This means that e-scooter demand was transformed by Box Cox independently for each region, while other exogenous variables were not transformed. Both input and output were transformed; otherwise, the residuals were not homoscedastic. Since the hourly demand of shared e-scooters is a nonnegative variable, the transformation falls into the first case of Eq. (2). However, this equation has a maximum requirement, i.e., the maximum value of $x^{BC}_{t,r}$ must be less than $-1/\lambda$; otherwise, it cannot be converted back. In other words, the predicted transformed demand $\hat{x}^{BC}_{t,r}$, including the supply level, must follow this requirement, specifically when $\lambda < 0$.
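The transformation for nonnegative demand, its inverse, and the estimation of $\lambda_r$ can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the grid search over $\lambda$ (standing in for a numerical optimizer), and the `demand` values are all hypothetical. The profile log-likelihood substitutes Eqs. (4) and (5) into Eq. (3).

```python
import math


def box_cox(x, lam):
    """First two cases of Eq. (2), valid for nonnegative x."""
    if x < 0:
        raise ValueError("demand must be nonnegative")
    if lam == 0:
        return math.log(x + 1.0)
    return ((x + 1.0) ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Invert box_cox; when lam < 0 this requires y < -1/lam."""
    if lam == 0:
        return math.exp(y) - 1.0
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("transformed value is outside the invertible range")
    return base ** (1.0 / lam) - 1.0


def profile_log_likelihood(xs, lam):
    """Eq. (3) with mu and sigma^2 replaced by Eqs. (4) and (5)."""
    T = len(xs)
    ys = [box_cox(x, lam) for x in xs]
    mu = sum(ys) / T                              # Eq. (4)
    var = sum((y - mu) ** 2 for y in ys) / T      # Eq. (5)
    jacobian = (lam - 1.0) * sum(math.log(x + 1.0) for x in xs)
    # The squared-error term of Eq. (3) reduces to -T/2 at the estimates.
    return -0.5 * T * math.log(2 * math.pi) - 0.5 * T * math.log(var) - 0.5 * T + jacobian


def fit_lambda(xs, grid=None):
    """Pick lambda by grid search (an assumed stand-in for an optimizer)."""
    grid = grid or [i / 100.0 for i in range(-200, 201)]
    return max(grid, key=lambda lam: profile_log_likelihood(xs, lam))


# Hypothetical hourly demand for one region.
demand = [0, 1, 2, 4, 9, 20, 45]
lam_hat = fit_lambda(demand)
transformed = [box_cox(x, lam_hat) for x in demand]
```

The `ValueError` in `inverse_box_cox` corresponds to the maximum requirement described above: for a negative $\lambda$, any predicted value at or beyond $-1/\lambda$ has no counterpart on the original scale.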

Demand prediction

The GRU model is a popular deep learning model in the family of

recurrent neural networks. This architecture has only two gates: reset

gate and update gate. Thus, it requires a shorter training time than LSTM

NNs (Wang and Kim, 2018) with comparable performance (Kumar et al.,

2018). For this reason, GRUs are more suitable for hyperparameter

tuning than LSTM NNs, particularly when many parameters need to be

optimized. The learning process of standard GRUs (Yu et al., 2019) could

be expressed as follows:

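$$r_t = \mathrm{sigmoid}(W_{rh} h_{t-1} + W_{rx} x_t + b_r) \tag{6}$$

$$z_t = \mathrm{sigmoid}(W_{zh} h_{t-1} + W_{zx} x_t + b_z) \tag{7}$$

$$\tilde{h}_t = \tanh\left(W_{\tilde{h}h}(r_t \circ h_{t-1}) + W_{\tilde{h}x} x_t + b_{\tilde{h}}\right) \tag{8}$$

$$h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t \tag{9}$$

$$\mathrm{sigmoid}(x) = 1/(1 + e^{-x}) \tag{10}$$

$$\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x}) \tag{11}$$

where $\circ$ denotes pointwise multiplication of two matrices, called the Hadamard product. $W$ and $b$ are trainable weight matrices and bias vectors, respectively. $r_t$ is the reset gate, and $z_t$ is the update gate. In this case, the GRUs' output $h_t$ at time $t$ is a linear interpolation between the previous output $h_{t-1}$ and the candidate output $\tilde{h}_t$.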
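To make Eqs. (6)–(9) concrete, a single GRU update can be written with plain Python lists. This is a sketch only: the parameter dictionary keys, the helper names, and the all-zero weights are hypothetical choices for illustration, and a real model would rely on a deep learning library with trained weights.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W_h, h, W_x, x, b):
    """Row-wise W_h @ h + W_x @ x + b for plain Python lists."""
    return [
        sum(w * v for w, v in zip(row_h, h)) + sum(w * v for w, v in zip(row_x, x)) + bias
        for row_h, row_x, bias in zip(W_h, W_x, b)
    ]


def gru_step(h_prev, x, p):
    """One GRU update; p is a dict of weight matrices and bias vectors."""
    r = [sigmoid(v) for v in affine(p["W_rh"], h_prev, p["W_rx"], x, p["b_r"])]  # Eq. (6)
    z = [sigmoid(v) for v in affine(p["W_zh"], h_prev, p["W_zx"], x, p["b_z"])]  # Eq. (7)
    gated = [ri * hi for ri, hi in zip(r, h_prev)]                               # r_t o h_{t-1}
    h_cand = [math.tanh(v) for v in affine(p["W_hh"], gated, p["W_hx"], x, p["b_h"])]  # Eq. (8)
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]  # Eq. (9)


def zero_params(n_hidden, n_input):
    """All-zero parameters (hypothetical), so both gates evaluate to 0.5."""
    W_h = [[0.0] * n_hidden for _ in range(n_hidden)]
    W_x = [[0.0] * n_input for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    return {"W_rh": W_h, "W_rx": W_x, "b_r": b,
            "W_zh": W_h, "W_zx": W_x, "b_z": b,
            "W_hh": W_h, "W_hx": W_x, "b_h": b}


h_next = gru_step([0.4, -0.2], [1.0], zero_params(2, 1))
```

With zero weights the candidate output is zero and the update gate is 0.5, so the new state is simply half of the previous one, which illustrates the interpolation in Eq. (9).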

The introduction of other benchmark models (RNNs, FCNNs,

XGBoost, Random Forest, and SARIMAX) was provided in Section 5.1,

along with the hyperparameter setting. These demand prediction

models were compared based on two popular metrics, including Mean

Absolute Error (MAE) and Root Mean Squared Error (RMSE) as follows:

$$MAE = \frac{1}{RT}\sum_{r=1}^{R}\sum_{t=1}^{T}\left|D(t,r) - \hat{D}(t,r)\right| \tag{12}$$

$$RMSE = \sqrt{\frac{1}{RT}\sum_{r=1}^{R}\sum_{t=1}^{T}\left(D(t,r) - \hat{D}(t,r)\right)^2} \tag{13}$$

where $D(t,r)$ and $\hat{D}(t,r)$ are the actual hourly demand and predicted demand of shared e-scooters at the time $t$ of the region $r$, respectively.
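The two metrics can be computed as in the short sketch below, where demand is stored as one list per region. The function names and the sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Eq. (12): mean absolute error over all regions and hours."""
    errors = [abs(a - p)
              for row_a, row_p in zip(actual, predicted)
              for a, p in zip(row_a, row_p)]
    return sum(errors) / len(errors)


def rmse(actual, predicted):
    """Eq. (13): root mean squared error over all regions and hours."""
    squared = [(a - p) ** 2
               for row_a, row_p in zip(actual, predicted)
               for a, p in zip(row_a, row_p)]
    return math.sqrt(sum(squared) / len(squared))


# Two regions, three hours each (illustrative values only).
actual = [[3, 5, 2], [0, 1, 4]]
predicted = [[2.5, 6.0, 2.0], [1.0, 1.0, 3.0]]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

Because RMSE squares the errors, it penalizes a few large misses more heavily than MAE does, so the two metrics can rank models differently.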

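The log-likelihood function of the Box Cox transformation, Eq. (3), is:

$$ll_T(\theta_r | X_r) = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log\sigma_r^2 - \frac{1}{2\sigma_r^2}\sum_{t=1}^{T}\left(x^{BC}_{t,r} - \mu_r\right)^2 + (\lambda_r - 1)\sum_{t=1}^{T}\mathrm{sign}(x_{t,r})\log(|x_{t,r}| + 1) \tag{3}$$

Variance prediction

As we know, the actual data could not be accurately predicted (i.e., $y = \hat{y} + \sigma(X)\varepsilon$) as we assumed no related error in the observed data. In this case, $\hat{y} = E(y|X)$, $E(\varepsilon|X) = 0$, $\mathrm{Var}(\varepsilon|X) = E(\varepsilon^2|X) - E^2(\varepsilon|X) = 1$, $\mathrm{Var}(y|X) = \sigma^2(X) > 0$, and $X$ and $\varepsilon$ are independent. Homoscedasticity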

refers to the condition that the variance is constant, otherwise hetero-

scedasticity. Model diagnostics are necessary for probabilistic-based

models, starting with assumptions such as data stationarity, data distribution, and homoscedasticity. However, this step was mostly neglected

in machine learning and deep learning. In the case of heteroscedastic

data, the variance can be formulated as the function of the random

variables X. In the time-series problem, the autocorrelation of squared

residuals and the Lagrange Multiplier (ARCH-LM) test were mainly

employed to confirm the heteroscedasticity of the residuals. The variance was usually formulated as a function of the previous variances and squared errors. This conditional variance stands on the idea that the periods of

high and low variance are grouped together (StataCorp, 2013). At this

point, there are two possible options: to allow or not to allow the conditional variance to influence the conditional mean. Simultaneous prediction (i.e., including conditional variance into the conditional mean)

may have a nonconvex objective function, and it is computationally

expensive since there are more parameters to be estimated, especially

for hyperparameter tuning. Therefore, this study chose to predict the

expected mean and the conditional variance separately (i.e., ignore the

conditional variance on the expected mean). This technique has a few

advantages, such as uncomplicated model formulation (univariate

variance model) and easier hyperparameter tuning for both demand and

variance prediction. The disadvantage, however, is forgoing the possible accuracy improvement from including the conditional variance in the demand prediction model. For instance, Trapero et al. (2019) employed

ARIMA to predict the demand and GARCH to predict the variance for

safety stock estimation. Similarly, the residuals of demand prediction

from Section 3.3 were used to train the variance models. Three variance

models were formulated, including constant variance in Eq. (14), daily

seasonal variance in Eq. (15), and predicted variance by SGARCH in Eq.

(16) as given below:

$$\sigma^2_{con}(r) = \frac{1}{T}\sum_{t=1}^{T}\varepsilon^2(t,r) \tag{14}$$

$$\sigma^2_{seas}(t,r) = \frac{1}{N}\left[\varepsilon^2(t-24,r) + \varepsilon^2(t-2 \cdot 24,r) + \cdots + \varepsilon^2(t-N \cdot 24,r)\right] \tag{15}$$

$$\sigma^2_{SGARCH}(t,r) = a_0 + a_1\varepsilon^2(t-1,r) + a_2\varepsilon^2(t-2,r) + a_3\varepsilon^2(t-24,r) + b_1\sigma^2_{SGARCH}(t-1,r) + b_2\sigma^2_{SGARCH}(t-2,r) + b_3\sigma^2_{SGARCH}(t-24,r) \tag{16}$$

where the constant variance of the region $r$, $\sigma^2_{con}(r)$ in Eq. (14), is simply the average squared residuals of the predicted demand in that region. Similarly, the seasonal variance $\sigma^2_{seas}(t,r)$ in Eq. (15) is the average squared residuals of the predicted demand at the same hour of the day. $N$ is the total number of days. The average of seasonal variance theoretically equals the constant variance, but it is mostly slightly smaller because the mean of evaluation residuals tends to differ from zero, $\sigma^2_{con}(r) = \frac{1}{24}\sum_{t=1}^{24}\sigma^2_{seas}(t,r) + \frac{1}{24}\sum_{t=1}^{24}\left(E[\varepsilon_{seas}(t,r)] - E[\varepsilon(r)]\right)^2$. The constant and daily seasonal variances were calculated based on the training dataset. Lastly, $\sigma^2_{SGARCH}(t,r)$ in Eq. (16) is the predicted variance by SGARCH, which was trained independently for each region. SGARCH was trained with maximum log-likelihood estimation (StataCorp, 2013). A daily seasonal pattern ($S = 24$) was employed, while the parameters in this equation that were insignificant at the 95 % confidence level would be dropped.
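The three variance models can be sketched for a single region as below. This is an illustration under assumptions: the function names, the coefficient values `a` and `b`, the warm-up value `init_var`, and the synthetic `residuals` are hypothetical, and the SGARCH coefficients are supplied directly rather than estimated by maximum likelihood.

```python
def constant_variance(residuals):
    """Eq. (14): mean squared residual over the training period."""
    return sum(e * e for e in residuals) / len(residuals)


def seasonal_variance(residuals, season=24):
    """Eq. (15): mean squared residual for each hour of the day.

    Needs at least one full season of residuals.
    """
    buckets = [[] for _ in range(season)]
    for t, e in enumerate(residuals):
        buckets[t % season].append(e * e)
    return [sum(b) / len(b) for b in buckets]


def sgarch_variance(residuals, a, b, init_var, lags=(1, 2, 24)):
    """Eq. (16) recursion with given coefficients.

    a = (a0, a1, a2, a3) weights the squared residuals, b = (b1, b2, b3)
    weights the past variances. The first max(lags) values are set to
    init_var (an assumed warm-up), and the last element returned is the
    one-step-ahead variance forecast.
    """
    max_lag = max(lags)
    var = [init_var] * max_lag
    for t in range(max_lag, len(residuals) + 1):
        v = a[0]
        for a_i, b_i, lag in zip(a[1:], b, lags):
            v += a_i * residuals[t - lag] ** 2 + b_i * var[t - lag]
        var.append(v)
    return var


# Three days of synthetic hourly residuals whose size varies by hour.
residuals = [((-1) ** t) * (1 + (t % 24) / 12) for t in range(72)]
con = constant_variance(residuals)
seas = seasonal_variance(residuals)
path = sgarch_variance(residuals, a=(0.1, 0.2, 0.1, 0.1), b=(0.2, 0.1, 0.1), init_var=con)
```

In this synthetic example the hourly variances differ widely across the day while the constant variance is a single number, which is the contrast the three models are meant to capture.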

Supply planning

As previously stated, shared dockless e-scooters face many chal-

lenges in daily operation such as short-range trips with unforeseeable

trip purposes, high operational costs, unproductive e-scooters, emissions

from rebalancing, and strict regulations. For station-based shared bikes,

the number of trips is mostly limited by the number of docks in the

station, but dockless shared e-scooters likely have a wider range of de-

mand (higher volatility). Thus, the operators must use rebalancing

strategies to balance unserved demand (or shortages) and operational

constraints. Rebalancing (or relocating) refers to the process of regularly

distributing (or collecting) the e-scooters to starving (or from excessive)

regions according to the target inventory level or supply level. Under

various operational constraints, the operator can choose between peri-

odic rebalancing (based on historical data) or real-time rebalancing.

Periodic rebalancing cost depends on the frequency, several times per

day or during peak hours. For real-time rebalancing, the rebalancing

vehicle may visit only a few locations if the number of available e-scooters falls below (or rises above) the threshold value. Real-time rebalancing can respond in time to unusual demand that can be

tracked through indicators such as local events or fairs, app login ac-

tivities, number of new registrations, and distribution of active users.

However, this strategy requires higher operational costs since the staff

must standby for rebalancing calls. Therefore, it is less popular in

practical operations (Shui and Szeto, 2020).

As shown in Fig. 1, the proposed framework in this study aims to

assist the periodic rebalancing by extracting all helpful information from

the historical data to forecast the future demand and variance for

designing effective supply levels or inventory levels. The research defines the supply level as the total supply, which includes supplies from

the operator’s rebalancing, drop-offs, and available e-scooters around

the area. It is noted that operators aiming to rebalance e-scooters must

take into account variations in drop-off demand, stock level, and lead

time, all of which may be approximated using models in this study. In

our approach, we estimate the total supply from the demand side (pick-

up demand). This study focuses on determining the level of overall de-

mand prior to rebalancing; hence, the total supply is derived from the

demand side by using only pick-up data. In other words, the term

“supply level” (referring to inventory or order-up-to level) in this study

has the same formulation as a confidence interval bound, namely the sum of predicted

pick-up demand (from Section 3.3) and safety stock (based on the

predicted variance from Section 3.4). The comparison of supply level

models was examined to reveal the effectiveness of accounting for the

heteroscedasticity of shared e-scooter demand for operational planning.

Safety stock is the inventory to prevent stockout caused by fluctuating

demand, forecast inaccuracy, and supply lead time (King, 2011).

For station-based shared bikes, the supply level is designed according to

the target service level (or probability of shortage event) for both pick-

up and drop-off trips (King, 2011; O’Mahony, 2015). In the case of

dockless shared e-scooters, the user has more freedom to finish the trip

anywhere, so the service level of drop-off trips can be ignored. Since

different supply level models have different service levels and backorder

levels, the curves of deviation from the target cycle service level and

backorder level (by scaled safety stock) are usually employed to

compare the supply level or inventory level models (Trapero et al.,

2019). However, this study compared the supply level models at the

same percentage of served demand (see Fig. 2); we thus could compare

them using only one metric, Mean Oversupply. The expressions of

Supply Level ($S_{(t,r)}$), Served Demand ($SD_{(t,r)}$), Percentage of Served Demand ($P$), Oversupply ($O_{(t,r)}$) and Mean Oversupply ($MO$) are defined as

below:

$$S_{(t,r)} = \hat{D}_{(t,r)} + d \cdot \hat{\sigma}_{(t,r)} \tag{17}$$

$$SD_{(t,r)} = \min\left(D_{(t,r)},\, S_{(t,r)}\right) \tag{18}$$

$$P = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T} SD_{(t,r)}}{\sum_{r=1}^{R} \sum_{t=1}^{T} D_{(t,r)}} \tag{19}$$

$$O_{(t,r)} = \max\left(S_{(t,r)} - D_{(t,r)};\, 0\right) \tag{20}$$

$$MO = \frac{1}{RT} \sum_{r=1}^{R} \sum_{t=1}^{T} O_{(t,r)} \tag{21}$$

where Eq. (17) shows the Supply Level, $S_{(t,r)}$, as the sum of the predicted hourly demand $\hat{D}_{(t,r)}$ and the predicted safety stock for time interval $t$ in region $r$. Safety stock in this equation is the product of the predicted standard deviation $\hat{\sigma}_{(t,r)}$ and the safety stock parameter $d$, which is a function of the target service level $Z_{score}$, lead time, and time increment (King, 2011; Seo, 2020; Trapero et al., 2019). The predicted standard deviation $\hat{\sigma}_{(t,r)}$ is the square root of the predicted variance, as shown in Section 3.4, which may be the constant, daily seasonal, or conditional variance by SGARCH. Lead time and time increment depend on rebalancing frequency, so these two parameters are constant across different supply level models. For this reason, these two parameters were set to unity. In this case, $d$ and $Z_{score}$ are equivalent, but the safety stock parameter ($d$) was manually adjusted to reach the target percentage of served demand (see Fig. 2).

Served demand in Eq. (18) is the minimum of actual demand $D_{(t,r)}$ and the supply level $S_{(t,r)}$. If the supply level is smaller than the actual demand, there is some unserved demand (i.e., $SD_{(t,r)} = S_{(t,r)}$). On the contrary, if the supply level is higher than the actual demand, there is some oversupply, as in Eq. (20) (i.e., $SD_{(t,r)} = D_{(t,r)}$ and $O_{(t,r)} = S_{(t,r)} - D_{(t,r)} \geq 0$). The percentage of served demand in Eq. (19) refers to the ratio of the total expected served demand to the total actual demand. Since the total actual demand equals the sum of served and unserved demand, the percentage of unserved demand equals one minus the percentage of served demand ($1 - P$). Therefore, a supply level model is considered efficient if it has the smallest mean oversupply in Eq. (21) while retaining the same percentage of served (or unserved) demand. At a specific percentage of served demand, the safety stock parameter of each supply level model might take a different value (see Fig. 2). $R$ and $T$ are the total number of regions and time intervals, respectively.
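As an illustration only, the metrics in Eqs. (17) to (21) can be sketched in a few lines of Python. The function names, the toy numbers, and the bisection search for the safety stock parameter are assumptions made for this sketch; the study itself reports that the parameter was adjusted manually until the target percentage of served demand was reached.

```python
def supply_metrics(actual, predicted, sigma, d):
    """Evaluate Eqs. (17)-(21) for one safety stock parameter d.

    actual, predicted, sigma: equal-length lists holding one value per
    (time interval, region) pair, flattened in any consistent order.
    """
    supply = [p + d * s for p, s in zip(predicted, sigma)]       # Eq. (17)
    served = [min(a, s) for a, s in zip(actual, supply)]         # Eq. (18)
    p_served = sum(served) / sum(actual)                         # Eq. (19)
    over = [max(s - a, 0.0) for a, s in zip(actual, supply)]     # Eq. (20)
    mean_over = sum(over) / len(over)                            # Eq. (21)
    return p_served, mean_over


def calibrate_d(actual, predicted, sigma, target, lo=0.0, hi=10.0, tol=1e-6):
    """Bisection on d (an assumed stand-in for the manual adjustment).

    It relies on P being non-decreasing in d when all sigma values are >= 0.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if supply_metrics(actual, predicted, sigma, mid)[0] < target:
            lo = mid
        else:
            hi = mid
    return hi


# Made-up values for four (t, r) pairs.
actual = [10, 4, 0, 7]
predicted = [8.0, 5.0, 1.0, 6.0]
sigma = [2.0, 1.0, 0.5, 1.0]

p0, mo0 = supply_metrics(actual, predicted, sigma, d=0.0)
d95 = calibrate_d(actual, predicted, sigma, target=0.95)
```

With every supply level model calibrated to the same percentage of served demand in this way, the models can be ranked by mean oversupply alone.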

Data preparation and featuring

Three different datasets were employed to examine the effectiveness

of the proposed framework and compare the temporal pattern. We obtained

one dataset from Neuron Mobility (https://www.rideneuron.com), the

operator of shared e-scooters in Thammasat University Rangsit Campus

(Thailand). The other two datasets were retrieved from the open data

websites of Austin, Texas (https://austintexas.gov/sharedmobility)

and Minneapolis, Minnesota (https://opendata.minneapolismn.gov), US.

Due to data limitations, historical trip data were commonly used to

evaluate the proposed methodologies in previous studies (He and Shin,

2020; Le Quy et al., 2019; Li et al., 2021; Xu et al., 2018; Zhang et al.,

2020). Similarly, this study examined the proposed framework based on

the observed demand data (or historical trip data). The data obtained by

Neuron Mobility and all available open data websites are the observed

demand (the actual services provided). To acquire the potential demand

(i.e., trips that might occur if there were available e-scooters), one may

need to access the data of the user’s activities on the mobile application

(Ham et al., 2021), such as the user’s requests or searches for potential e-

scooters nearby. These kinds of data were commonly unavailable unless

provided by the operators. However, there were also cases where the

observed data could be used, for example, when the e-scooters were not

fully utilized at most stations (in the case of Thammasat University, the

data were collected during the first few months, when the demand of

e-scooters was still low). In this case, the observed demand equals the

potential demand. The operators needed to increase the safety stock to

cover a higher uncertainty of demand variation whenever the observed

demand data was used instead of the potential demand data. It is noted

that the potential demand data should be used in the planning when

they are available.

We removed the trips during the first several months for Austin

because it mainly operated in the Downtown area. The abnormal trips

were removed using several criteria such as trip duration (less than 30 s

or more than 2 h), trip distance (less than 20 m or more than 10 km), and

date (out of final date boundary). As a result, the total number of samples

from Thammasat, Minneapolis, and Austin were 2,352 (24 x 98

days), 4,704 (24 x 196 days), and 13,680 (24 x 570 days), respectively,

during the date mentioned in Table 1.
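The duration and distance criteria above can be written as a simple filter. The following is a minimal sketch; the record layout (duration in seconds, distance in metres) and the sample trips are hypothetical, and the date-boundary check is omitted.

```python
def is_valid_trip(duration_s, distance_m):
    """Keep trips lasting 30 s to 2 h and covering 20 m to 10 km."""
    return 30 <= duration_s <= 2 * 3600 and 20 <= distance_m <= 10_000


# (duration in seconds, distance in metres); made-up records
trips = [
    (12, 150),      # too short in time
    (600, 1300),    # kept
    (9000, 2500),   # longer than 2 h
    (420, 5),       # shorter than 20 m
    (900, 12000),   # longer than 10 km
    (1800, 10000),  # kept (on the boundary)
]
clean = [t for t in trips if is_valid_trip(*t)]
```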

In this study, the term “demand” refers to the total pick-up trips

during a specific time interval (1 h) and region. Thammasat Rangsit

Campus has an area of only 3.21 km², so we attempted to predict overall

demand. In Austin, the data come from each census tract, which has an

average area of 2.05 km². Shared e-scooters were operated in more than

50 census tracts of the Austin metropolitan area, but we selected only

the top 30 census tracts with an average hourly demand of more than

one trip. There was a significant difference in the demand between the

Downtown and other census tracts, with an average hourly demand of

around 208 and 10, respectively, see Fig. 3. The trip locations of Min-

neapolis data were recorded using the street name. The street center thus

was used as the trip’s coordinates. To diversify the spatial clustering, the

K-means algorithm was employed to group the trips in Minneapolis. The

optimal number of spatial clusters according to the Elbow method was 15. Therefore,

the average area of these clusters was about 10 km², but the inner

clusters with denser trips were several times smaller than the outskirt

area.

Fig. 2. Flowchart of supply level models comparison.

Table 1

Dataset’s information.

| Description | Thammasat (TH) | Minneapolis (MN) | Austin (TX) |
|---|---|---|---|
| Start Date | 23-Jan-19 | 14-May-19 | 1-Aug-18 |
| End Date | 30-Apr-19 | 25-Nov-19 | 21-Feb-20 |
| # Days | 98 | 196 | 570 |
| # Trips | 29,132 | 913,781 | 8,689,720 |
| # Time Intervals (T) | 2,352 | 4,704 | 13,680 |
| # Regions (R) | 1 | 15 | 30 |
| Avg. Trip Distance (km) | 1.3 | 1.7 | 1.5 |
| Avg. Trip Duration (min) | 11.6 | 12.7 | 10.4 |
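The clustering step for Minneapolis described above (K-means with the number of clusters chosen by the Elbow method) can be illustrated with a small, dependency-free sketch. The coordinates, the deterministic initialization, and the use of plain Euclidean distance are assumptions for this example and are not taken from the study.

```python
def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans_inertia(points, k, iterations=50):
    """Run Lloyd's K-means and return the within-cluster sum of squares.

    Centers start from k evenly spaced input points so the run is repeatable.
    """
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Hypothetical trip coordinates forming three well-separated groups.
trips = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
inertia = {k: kmeans_inertia(trips, k) for k in (1, 2, 3, 4)}
```

In this toy case the inertia drops steeply up to three clusters and only marginally afterwards, which is the "elbow" that the method looks for.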

The findings by Wang et al. (2021) demonstrate that the time-

varying variables significantly impact prediction performance (with

aggregated relative importance metrics of 88 %), surpassing the influence

of other static variables such as the built environment and socio-

economic factors. The impact of these time-varying variables could be

even more pronounced, especially for shorter prediction time intervals

like our study. Conversely, static variables might prove essential for

longer time intervals (e.g., daily predictions), with a comprehensive set

detailed in Hu and Xiong (2023). Therefore, our research focuses on

collecting time-varying variables as predictive inputs, encompassing

weather attributes, public holidays, and local fairs and festivals.

Similar to shared bikes, e-scooter ridership is also affected by

weather conditions. Weather Underground is a global weather network

providing a variety of weather attributes at one-hour intervals

(https://www.wunderground.com). We obtained seven weather attributes for

training: temperature, precipitation, wind speed, humidity, wind

gust, pressure, and dew point. Linear interpolation was employed to fill

in the missing values.
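A minimal sketch of the gap-filling step is given below. It assumes that missing hourly readings are marked as `None` and that gaps at the very start or end of the series, which have a neighbour on one side only, are left unfilled; both conventions are assumptions of this example.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between known neighbours.

    Leading or trailing gaps have no neighbour on one side and stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Made-up hourly temperatures with missing readings.
temperature = [30.0, None, None, 33.0, None, 31.0]
filled_temperature = interpolate_missing(temperature)
```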

Fig. 4 shows some abnormal patterns of shared e-scooter demand.

We found that high demand was correlated with some annual festivals or

fairs. In Austin, those special annual events were the Annual SXSW,

Pecan Street Festival, H-E-B Austin Symphony, and City Limits Music

Festival. Likewise, Minneapolis had high ridership during annual events

such as OpenStreets, Pride Festival Parade, Stone Arch Bridge Festival,

Uptown Art Fair, and State Fair Festival. The special promotion during

the season Market event at Thammasat University also led to very high

demand. Public holidays also affected ridership, especially in the

Thammasat dataset, where ridership dropped sharply when most

students did not come to school. Besides the daily and weekly patterns,

ridership of shared e-scooters also had seasonal patterns. In general, we

could see that the demand was very high during summer but relatively

low during winter. For the riders’ safety, the operators in Minneapolis

had to stop the operation during winter, while ridership

gradually dropped as the snow season approached. Moreover, the

operators of shared e-scooters were advised to postpone the operation

during the US president’s state visit to Minneapolis on Oct. 10, 2019.

Therefore, these attributes were recorded as binary variables for de-

mand prediction models, including annual festivals or events, public

holidays, hour of the day, day of the week, day of the month, and

temporary ban (in Minneapolis). As a short-range mode, Thammasat,

Austin, and Minneapolis’s average trip distances were 1.3, 1.5, and 1.7

km, respectively. The riders spent around 11.6, 10.4, and 12.7 min on e-

scooters. Based on the average fee in (Saum and Piantanakulchai, 2019),

the revenue from each trip in these three cities was about 1.75, 2.56, and

2.91 US dollars, respectively. These fares were relatively higher than

those for shared bikes, which could be one reason why shared e-scooters

were not favored for commuting trips.
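The calendar-based inputs listed above can be encoded as follows. This is a sketch under assumptions: the function name and feature order are hypothetical, hour of the day and day of the week are one-hot encoded (consistent with the 0–1 ranges in Table 2), and day of the month is kept as an integer from 1 to 31 as in Table 2.

```python
from datetime import date, datetime


def time_features(ts, holidays=frozenset(), events=frozenset(), bans=frozenset()):
    """Return 24 hour flags + 7 weekday flags + day of month + 3 binary flags."""
    hour = [1 if h == ts.hour else 0 for h in range(24)]
    weekday = [1 if d == ts.weekday() else 0 for d in range(7)]  # Monday = 0
    day = ts.date()
    flags = [int(day in holidays), int(day in events), int(day in bans)]
    return hour + weekday + [ts.day] + flags


# Temporary ban in Minneapolis on Oct. 10, 2019 (a Thursday), 5 pm interval.
features = time_features(datetime(2019, 10, 10, 17), bans={date(2019, 10, 10)})
```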

As shown in Fig. 5, the ridership in Austin and Minneapolis had a

very similar pattern from Monday to Thursday, while the demand

increased gradually and sharply during both the afternoon and evening

on Friday and Saturday, respectively. This was simply because people

used e-scooters for other relaxing activities after a tiring week. On

Sunday, the demand in Austin was just like other weekdays but rela-

tively high in the afternoon. A bit different from Austin, the ridership on

this day in Minneapolis was very similar to the weekdays but very low in

the evening compared to other days of the week. In addition, we could

also observe the difference between weekdays and weekends for a small

peak in the morning. This means that shared e-scooters were also used as

Fig. 3. Average hourly demand (# trips/hour) in each census tract of Austin, TX.


a commuting mode, but the ratio was still low. During the public holi-

days, the demand in Austin and Minneapolis is lower than on an ordi-

nary day but slightly higher than most weekdays in the afternoon. For

both cities, the demand strongly increased on the annual festival days,

especially in Austin, where the demand was about twice the regular

days. For Thammasat University dataset, the demand on weekends was

relatively low compared to weekdays, while the demand on Friday af-

ternoon was lower than on other weekdays. This pattern showed the

correlation between e-scooter demand and the presence of students and

staff on campus. In general, the demand on Tuesday was higher than on

the other days of the week. Moreover, the ridership was also correlated

with student activities, i.e., the demand increased from the early

morning until the afternoon. Like the previous two datasets, the

ridership in Thammasat was considerably high during the annual events.

In summary from all datasets, the demand of shared e-scooters had a

significant weekly pattern, especially between weekdays and weekends,

relatively low demand on public holidays, and surprisingly high demand

on annual festivals or events.

From the demand patterns explained above, the inputs for demand

prediction models were selected accordingly, as summarized in Table 2.

Since there were both daily and weekly seasonal patterns, the lookback

length for demand prediction would range from 24 to 168 (24 x 7 days).

Table 2 shows that Box Cox transformation (# Trips BC) significantly

reduces hourly demand volatility compared to the original scale (#

Trips). The three inputs, the historical average of overall demand (HAO

of weekly, holiday, and event), are very important for SARIMAX as it

Fig. 4. Hourly demand of shared e-scooters in Austin TX (top), Thammasat University TH (bottom left), and Minneapolis MN (bottom right).

Fig. 5. Average hourly demand of shared e-scooters by day of the week, public holiday, and annual festival or event.


impractically accepts the binary value of these exogenous variables. Ban

was imposed only in Minneapolis, so this feature was included in the

prediction models as the binary variable.
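To make the notion of lookback length concrete, the sketch below turns an hourly series into input windows and one-step-ahead targets. The one-step-ahead target and the function name are assumptions for illustration; exogenous inputs such as weather or event flags would be appended to each window in the same way.

```python
def make_windows(series, lookback):
    """Pair each run of `lookback` hourly values with the value that follows."""
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(series[end - lookback:end])
        targets.append(series[end])
    return inputs, targets


hourly = list(range(200))                 # placeholder hourly demand
X, y = make_windows(hourly, lookback=168)  # 168 = 24 x 7, one full week
```

A longer lookback gives the model a full weekly cycle to learn from, at the cost of fewer training samples and longer training time.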

To ensure model generalization, each dataset was split into

two parts: training part (in sample) and testing part (out of sample). The

first 75 % of the dataset was used for model training, and the rest was for

testing (see Fig. 4). As shown in Fig. 4, the volatility is low when demand

flattens, but the volatility is proportionally higher at high demand.

Similarly, there are many relaxing big or small activities, festivals, and

fairs during the summer, so the demand for shared e-scooters is also high

and fluctuates. Therefore, 75 % of the training part was randomly

selected for model training, and another 25 % was for model evaluation.

Random split was chosen because it requires far less computational time

compared to K-Folding (especially in hyperparameter tuning), and it

could learn some explanatory variables (such as events, holidays, and

ban) that happened on a specic date that might not be included in the

model if using the conventional time-series split. However, the Tham-

masat Dataset seemed relatively small, so all the data were used for

model training and evaluation.
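The two-stage split described above can be sketched as below: a chronological hold-out for testing, followed by a random split of the in-sample part into training and evaluation subsets. The function name, the fixed seed, and the rounding of subset sizes are assumptions of this sketch.

```python
import random


def split_dataset(n_samples, test_share=0.25, eval_share=0.25, seed=0):
    """Return index lists (train, evaluation, test).

    The last `test_share` of the timeline is held out as the test set; the
    remaining samples are shuffled and divided into training and evaluation.
    """
    n_test = round(n_samples * test_share)
    in_sample = list(range(n_samples - n_test))
    test = list(range(n_samples - n_test, n_samples))
    random.Random(seed).shuffle(in_sample)
    n_eval = round(len(in_sample) * eval_share)
    return sorted(in_sample[n_eval:]), sorted(in_sample[:n_eval]), test


train, evaluation, test = split_dataset(13_680)  # Austin: 24 x 570 hourly samples
```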

Demand and variance prediction

Demand prediction

The formulation of GRU cell was described in Section 3.3. The

configuration of GRUs is shown in Fig. 6, comprising the input layer with

GRU nodes, one dropout layer, a group of hidden layers with GRU nodes,

and the output layer with conventional neurons. Fig. 6 means all inputs

were sequentially arranged before proceeding to the input layer, while

the outputs from this layer were dropped at some specic rate to in-

crease the learning performance with smoother steps. The hidden layers

were set to have the same activation function and the number of nodes.

This study addresses spatiotemporal dependencies in shared e-scooter

demands through two distinct approaches, where the models were

trained spatially independently or combined. Consequently, the output

layer of GRUs in this study has one or multiple neurons for the training

with spatial independence or spatial combination, respectively. Both

spatially independent and spatially combined architectures were

examined, while the best result was selected. These two training ap-

proaches have their advantages and disadvantages. Spatially indepen-

dent training allows the models to reach the optimal learning curve

freely, but it may lose some vital information from neighboring regions.

On the contrary, the model with multiple spatial outputs shares the

correlated information across regions to improve the prediction per-

formance, but the optimal results must be leveraged.

In the first approach, prediction models were trained with spatial

independence, meaning one model for each zone. In this scenario,

spatiotemporal dependencies were primarily addressed by including

various external features in the input layer and the optimized lookback

length. In other words, inputs in this approach consisted of historical

demands specific to each zone and other external variables, while the

models had only a single output. This configuration is similar to that

proposed by Yang et al. (2023) but with a few differences, like multi-step

prediction and an additional attention layer. In contrast, the second

approach was trained with spatial combination (i.e., one model for all

zones). Inputs included all spatial demands and external variables for a

specic lookback length, while outputs comprised all spatial demands.

Spatiotemporal dependencies in this approach were managed through

the weights and biases of the prediction models. A similar architecture

was proposed to predict short-term taxi demands in New York City,

except for an additional feature selection mechanism based on a

nonlinear Granger causality test (Luo et al., 2021).

Even though deep learning models could outperform conventional probabilistic

models or machine learning algorithms, they also require time-

Table 2

Description of inputs for demand prediction models.

| Inputs | Thammasat (TH) | Minneapolis (MN) | Austin (TX) |
|---|---|---|---|
| # Trips | 11.59 ± 11.65 | 12.50 ± 22.39 | 16.91 ± 59.98 |
| # Trips BC* | 2.72 ± 1.55 | 1.65 ± 1.73 | 1.88 ± 2.56 |
| Temperature | 30.46 ± 3.31 | 15.96 ± 9.33 | 19.70 ± 9.37 |
| Dew point | 23.48 ± 2.89 | 9.29 ± 8.81 | 13.24 ± 8.84 |
| Humidity | 68.28 ± 16.05 | 66.89 ± 16.22 | 69.96 ± 20.27 |
| Wind speed | 12.15 ± 5.00 | 14.26 ± 7.78 | 13.01 ± 9.23 |
| Wind gust | 0.06 ± 2.19 | 7.66 ± 16.82 | 4.95 ± 13.46 |
| Pressure | 1010.24 ± 2.78 | 983.87 ± 6.31 | 996.93 ± 13.38 |
| Precipitation | 0.12 ± 0.46 | 0.16 ± 1.09 | 0.10 ± 1.03 |
| HAO weekly | 11.88 ± 8.78 | 192 ± 159.68 | 527.44 ± 369.44 |
| HAO holiday | 11.63 ± 7.68 | 157.98 ± 141.91 | 409.31 ± 282.48 |
| HAO event | 25.38 ± 18.14 | 253.53 ± 220.21 | 1082.10 ± 793.52 |
| Hour of day | 0–1 | 0–1 | 0–1 |
| Day of week | 0–1 | 0–1 | 0–1 |
| Day of month | 1–31 | 1–31 | 1–31 |
| Holiday | 0–1 | 0–1 | 0–1 |
| Event | 0–1 | 0–1 | 0–1 |
| Ban | – | 0–1 | – |

BC: Box Cox scale.

HAO: Historical Average of Overall demand.

Fig. 6. The proposed architecture of GRUs model.

consuming hyperparameter optimization. Hyperparameter Optimiza-

tion (HPO) refers to the approach of optimizing the number of GRU

nodes for the input layer, dropout rate, number of hidden layers, etc.

Many techniques were employed in this stage, including grid search,

random search, and automatic optimization algorithms (Bayesian

Optimization, Tree-structured Parzen Estimator, genetic algorithm,

etc.). Bayesian Optimization (BO) is a popular sequential optimization

technique for expensive problems, especially HPO of deep learning

models. BO’s two critical components are the surrogate function

(Gaussian Processes) and the acquisition function (Upper Confidence

Bound), which play an essential role in balancing exploration and

exploitation. Keras Tuner (O’Malley et al., 2019), a Python package for

HPO based on Bayesian optimization, was employed to optimize the

configurations of the GRUs, which were run on top of Keras and TensorFlow

on Jupyter Notebook. All parameters of BO were set as defaults

while the objective function was validation loss, the number of initial

points was 10, and the maximum iterations were 80. The number of

epochs was tuned using the early stopping criteria with the patience

value of 10 and the maximum number of epochs of 150.
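The early stopping rule mentioned above (patience of 10, at most 150 epochs) can be illustrated without any deep learning library. In this sketch, `val_losses` is a stand-in for the validation loss that a real training loop would compute after each epoch; the function is an assumption for illustration and does not reproduce every option of the Keras callback.

```python
def train_with_early_stopping(val_losses, patience=10, max_epochs=150):
    """Return (best_epoch, best_loss, epochs_run) under a patience rule."""
    best_epoch, best_loss, epochs_run = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss, epochs_run


# Made-up curve: the loss improves for 20 epochs, then stagnates.
val_losses = [1 / e for e in range(1, 21)] + [0.06] * 50
result = train_with_early_stopping(val_losses)
```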

In this study, BO was employed to optimize nine crucial hyper-

parameters of GRUs, including lookback length, activation of input

layer, number of GRU nodes in input layer, dropout rate, number of

hidden layers, number of GRU nodes in hidden layer, activation of

hidden layer, activation of output layer, and batch size (see Table 3). The

HPO was split into several sequential steps for a few reasons: training

time caused by lookback length and the number of hidden layers, local

optima, non-convergence iteration, and exploding iteration (i.e., loss

function becomes infinite). First, deep learning models with only one

hidden layer were optimized independently for different lookback

lengths (24, 48, …, 168) to find the optimal lookback length. In each

case of lookback length, BO with the above settings searched the mini-

mum validation loss by changing the number of nodes per layer, dropout

rate, the activation function of each layer, and batch size. After finding

the optimal lookback length, deep learning configurations were reoptimized

to account for a higher number of hidden layers. Three activation

functions were considered, namely ReLU, Tanh, and Sigmoid, while the

dropout rate was between 0.00 and 0.40 with the step of 0.01. The

number of nodes per layer was between 10 and 500 at the grid of 10. The

batch size had a range of 4–1000. Other parameters of GRUs were set as

the default value, including optimizer (Adam), learning rate (0.001),

and loss function as Mean Squared Error (MSE).
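For orientation, the nine-dimensional search space described above can be written down explicitly. The sketch below encodes the ranges reported in the text and Table 3 and draws one configuration uniformly at random. Note that the random draw is only a stand-in to show the shape of a candidate configuration; the study used Bayesian Optimization through Keras Tuner, which is not reproduced here. The dictionary keys are hypothetical names.

```python
import random

ACTIVATIONS = ["relu", "tanh", "sigmoid"]

SEARCH_SPACE = {
    "lookback": list(range(24, 169, 24)),                  # 24, 48, ..., 168
    "input_activation": ACTIVATIONS,
    "input_nodes": list(range(10, 501, 10)),               # 10, 20, ..., 500
    "dropout": [round(0.01 * i, 2) for i in range(41)],    # 0.00 to 0.40, step 0.01
    "hidden_layers": list(range(1, 6)),                    # 1 to 5
    "hidden_nodes": list(range(10, 501, 10)),
    "hidden_activation": ACTIVATIONS,
    "output_activation": ACTIVATIONS,
    "batch_size": list(range(4, 1001)),                    # 4 to 1000
}


def sample_config(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


rng = random.Random(42)
config = sample_config(rng)
```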

The prediction performance of GRUs was compared to the other five

benchmark models, including SARIMAX, RF, XGBoost, FCNNs, and

RNNs. The historical average (HA) was also included to show the impact

of Box Cox transformation on RMSE and MAE. The other five demand

prediction models were also optimized using BO (for FCNNs and RNNs)

and grid search (for SARIMAX, RF, and XGBoost), as shown in Table 3.

The SARIMAX, RF, and XGBoost models were trained independently for

each zone, while FCNNs and RNNs had the same configurations as GRUs.

The difference between training and validation loss was set to around

15 % to control the overfitting problem, especially for RF and XGBoost. A

short introduction of these five baseline models is given as follows:

SARIMAX: is a popular statistical regression model assuming a linear correlation

between future demand and explanatory variables, including

past observations, residuals, and exogenous variables. As mentioned

above, three exogenous variables (see Fig. 5) were included in SARIMAX:

the hourly average demand by day of the week, public holiday,

and event. SARIMAX was trained using a statistical program, STATA,

since it was more convenient for out-of-bag evaluation. As shown in

Table 3, six parameters of SARIMAX were optimized by grid search,

including degree of differencing (d), deseasonalizing degree (D), sea-

sonal (P)/non-seasonal (p) autoregressive lag polynomial, and seasonal

(Q)/non-seasonal (q) moving average lag polynomial. All parameters,

including the exogenous variables of this model, must be statistically

significant at the 95 % confidence level, while the model with the smallest RMSE was

selected.

RF: is a powerful machine learning algorithm dealing with high-

dimensional data while requiring just a small amount of data and

training time, introduced by Breiman (2001). RF leverages the results

from many random trees’ predictions, while numerous trees are built

from randomly selected inputs or combinations of inputs (bootstrapped

sampling). RF was tuned for three important hyperparameters,

including lookback length (24–168), the number of trees in the forest

(10–500), and the maximum depth of the tree (0–15). Random Forest

was trained by a Python library, Scikit-learn.

XGBoost: is one of the most popular machine learning algorithms for

regression and classification problems based on gradient-boosted decision

trees. This approach could handle massive data using sparsity-

aware splitting algorithm, cache-aware algorithm, and distributed

memory computing technology (Chen and Guestrin, 2016). In this

paper, three hyperparameters of this model were tuned, such as look-

back length, the number of gradient-boosted trees, and the maximum

depth of the tree using the XGBoost python module.

FCNNs: are the most basic architecture of artificial neural networks

(ANNs), where all the nodes in one layer are connected to the nodes in

the next layer. FCNNs are widely applied to classification and regression

Table 3

Description of hyperparameter optimization for demand prediction models.

| Model | Parameters | Value range | Tuning |
|---|---|---|---|
| GRUs, RNNs, FCNNs | Lookback length | 24, 48, …, 168 | Bayesian Optimization |
| | Activation input layer | Relu, Tanh, Sigmoid | |
| | # Nodes input layer | 10, 20, 30, …, 500 | |
| | Dropout rate | 0.0–0.40 | |
| | # Hidden layer | 1–5 | |
| | Activation of hidden layer | Relu, Tanh, Sigmoid | |
| | # Nodes in hidden layer | 10, 20, 30, …, 500 | |
| | Activation output layer | Relu, Tanh, Sigmoid | |
| | Batch size | 4–1000 | |
| XGBoost | Lookback length | 24, 48, …, 168 | Grid Search |
| | # Gradient-boosted trees | 10, 15, 20, …, 500 | |
| | Max-depth of tree | 0, 1, 2, …, 10 | |
| Random Forest | Lookback length | 24, 48, 72, …, 168 | Grid Search |
| | # Trees in forest | 10, 15, 20, …, 300 | |
| | Max-depth of tree | 0, 1, 2, …, 15 | |
| SARIMAX(p, d, q) × (P, D, Q, 24) | p | 0–5 | Grid Search |
| | d | 0–2 | |
| | q | 0–5 | |
| | P | 0–2 | |
| | D | 0–2 | |
| | Q | 0–2 | |

problems in transportation engineering. In this study, the configuration

was the same as GRUs (Fig. 6) and was optimized by BO. As a benchmark

model, the number of hidden layers and nodes per layer of FCNNs was

set to at most 2 and 100, respectively, the standard architecture (Liu

et al., 2019; Wang and Kim, 2018).

RNNs: are a popular deep learning approach for time-series datasets

for their ability to accurately learn the temporal sequences and their

long-range dependencies. Similar to FCNNs, the configuration of RNNs

was set as shown in Fig. 6 and optimized by Bayesian Optimization while

limiting the number of hidden layers and nodes per layer to at most 2

and 100, respectively.
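As a small illustration of the grid search used for SARIMAX among the baselines above, the candidate orders implied by the ranges in Table 3 can be enumerated as follows. The tuple layout is an assumption chosen for readability; the study itself fitted SARIMAX in STATA, and the fitting and significance checks are not shown.

```python
from itertools import product

# Ranges from Table 3: p, q in 0-5; d, P, D, Q in 0-2; seasonal period of 24 h.
ORDER_RANGES = {
    "p": range(6), "d": range(3), "q": range(6),
    "P": range(3), "D": range(3), "Q": range(3),
}
SEASON = 24


def sarimax_grid():
    """Yield ((p, d, q), (P, D, Q, 24)) for every combination in the grid."""
    for p, d, q, P, D, Q in product(*ORDER_RANGES.values()):
        yield (p, d, q), (P, D, Q, SEASON)


candidates = list(sarimax_grid())
```

Each candidate would then be fitted, screened for statistically significant parameters, and ranked by RMSE, which explains why an exhaustive grid of this size remains tractable only for a comparatively cheap model.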

According to our findings, the sigmoid activation function required

nearly twice as many epochs as the Tanh or ReLU functions but had a

smoother learning curve. Austin data needed to be trained spatially

independently, while Minneapolis data were trained as multiple spatial

outputs. Overall, training GRUs with the original data yielded better

results than training them with the normalized data, particularly in

terms of model generalization. The reason was that the optimal archi-

tectures of normalized data had Tanh or Sigmoid activation function,

which effectively learned the training data and fell into the overfitting

problem, especially when compared to the benchmark models. Similar to the

findings in Saum et al. (2020), models with Box Cox transformation were simpler

than those on the original scale. This was observed from SARIMAX

models in which the exogenous variables were mostly insignificant. In

addition, the optimal GRUs model of Austin data had two hidden layers

for the original scale but only one for Box Cox transformed data.

Therefore, Box Cox transformation is suitable for deep learning as it

could reduce the training time, especially during hyperparameter

tuning.

Table 4 shows the performance comparison between Box Cox

transformed data and original or normalized data (Thammasat Dataset

does not have testing data). For original or normalized data, deep

learning could improve the prediction performance of both RMSE and

MAE, which strongly depend on the number of tuned hyperparameters.

Box Cox transformation also had similar patterns for Austin and

Thammasat datasets, but not for Minneapolis. As a generalized logarithmic

transformation, Box Cox exponentially transformed the abnormal de-

mand (outliers) closer to the mean value, leading to a simpler model and

accuracy improvement (especially MAE metric). This characteristic can

be found in Table 2, as the mean and standard deviation ratio of demand

between the original and Box Cox scales was about 7 and 15, respec-

tively. In addition, Table 4 also shows that the RMSE of Historical

Average (HA) in the original scale was lower than that of Box Cox scale,

but the MAE was not. The effect of Box Cox transformation on demand

volatility is the reason why deep learning models of the Minneapolis

dataset had even worse performance than that of SARIMAX. The logic

here was Box Cox transformation made the temporal information from

neighbor regions unnecessary. This meant that the Box Cox transformed

data of Minneapolis should be trained spatially independently. There-

fore, Box Cox transformation is desirable for training datasets with a

high abnormality or fewer exogenous variables. In summary from all

datasets, Box Cox transformation reduced the RMSE and MAE metrics by

0.14 % and 5.36 %, respectively. This accuracy improvement by Box Cox

transformation may not be very significant, but it is acceptable for ease

of implementation and dealing with outliers.
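The compressing effect of the Box Cox transformation on outliers, discussed above, can be checked with a short sketch. The shift of one trip (so that hours with zero demand remain valid) and the values of the power parameter used here are assumptions for illustration; they are not the settings reported in the study.

```python
import math


def box_cox(x, lam, shift=1.0):
    """Box Cox transform of x + shift; lam = 0 gives the natural logarithm."""
    z = x + shift
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam


def inverse_box_cox(y, lam, shift=1.0):
    """Map a transformed value back to the original demand scale."""
    z = math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)
    return z - shift


typical, outlier = 10, 208              # made-up hourly demands
ratio_original = outlier / typical
ratio_box_cox = box_cox(outlier, 0) / box_cox(typical, 0)
recovered = inverse_box_cox(box_cox(outlier, 0.3), 0.3)
```

On the original scale the outlier is about 21 times the typical value, whereas after the transformation the ratio falls to roughly 2, which is why squared-error training becomes less dominated by peak hours. Predictions made on the transformed scale must be mapped back with the inverse before RMSE and MAE are compared.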

SARIMAX had better prediction accuracy on the testing dataset than

other models because the testing data was during the low-demand sea-

son. However, this regression model had limited performance during the

high-demand season (like summer), while GRUs achieved precise per-

formance for both training and testing datasets. Fig. 7 shows the com-

parison of e-scooter demand prediction by GRUs with the original and

Box Cox scale for the Downtown census in Austin, Texas. These two

models performed very well in learning the hourly demand of shared e-

scooters, even though they produced somewhat different predictions,

especially during peak demand. Overall, both models correctly predict

the nighttime demand (low demand) but perform poorly during the

afternoon and evening as demand and volatility are high.

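Table 4

Performance comparison based on RMSE and MAE. "Orig." columns refer to original or normalized data; "Box Cox" columns refer to Box Cox transformed data.

| Dataset | Model | RMSE-Eval. (Orig.) | RMSE-Test (Orig.) | MAE-Eval. (Orig.) | MAE-Test (Orig.) | RMSE-Eval. (Box Cox) | RMSE-Test (Box Cox) | MAE-Eval. (Box Cox) | MAE-Test (Box Cox) |
|---|---|---|---|---|---|---|---|---|---|
| Thammasat, Thailand | GRUs | 5.27 | – | 3.41 | – | 5.18 | – | 3.37 | – |
| | RNNs | 5.52 | – | 3.75 | – | 4.91 | – | 3.40 | – |
| | FCNNs | 5.52 | – | 3.76 | – | 5.00 | – | 3.46 | – |
| | XGBoost | 5.21 | – | 3.64 | – | 5.17 | – | 3.46 | – |
| | Random forest | 5.30 | – | 3.72 | – | 5.31 | – | 3.56 | – |
| | SARIMAX | 5.47 | – | 3.82 | – | 5.26 | – | 3.62 | – |
| | Historical average | 11.65 | – | 8.69 | – | 12.24 | – | 8.32 | – |
| Minneapolis, Minnesota | GRUs | 6.96 | 6.34 | 3.58 | 2.89 | 7.18 | 6.92 | 3.67 | 2.99 |
| | RNNs | 7.07 | 6.25 | 3.49 | 2.85 | 7.75 | 6.82 | 3.80 | 2.95 |
| | FCNNs | 7.53 | 6.48 | 4.06 | 3.08 | 8.38 | 7.30 | 3.83 | 3.33 |
| | XGBoost | 7.44 | 6.44 | 4.04 | 3.37 | 7.16 | 6.04 | 3.66 | 2.73 |
| | Random forest | 7.34 | 6.39 | 3.85 | 3.21 | 7.35 | 6.20 | 3.76 | 2.81 |
| | SARIMAX | 7.79 | 6.24 | 4.08 | 3.07 | 7.72 | 6.03 | 3.85 | 2.64 |
| | Historical average | 21.21 | 16.83 | 12.97 | 11.08 | 23.80 | 17.39 | 12.28 | 8.55 |
| Austin, Texas | GRUs | 11.24 | 11.28 | 4.15 | 3.70 | 11.20 | 11.01 | 4.00 | 3.54 |
| | RNNs | 11.34 | 11.58 | 4.21 | 3.73 | 11.52 | 11.96 | 4.10 | 3.60 |
| | FCNNs | 11.47 | 11.22 | 4.16 | 3.67 | 12.50 | 11.86 | 4.21 | 3.72 |
| | XGBoost | 11.29 | 11.83 | 4.29 | 3.89 | 13.06 | 11.75 | 4.29 | 3.70 |
| | Random forest | 12.18 | 12.00 | 4.31 | 3.92 | 12.54 | 12.21 | 4.26 | 3.77 |
| | SARIMAX | 12.30 | 11.13 | 4.58 | 3.83 | 12.60 | 11.23 | 4.40 | 3.54 |
| | Historical average | 50.49 | 36.44 | 14.82 | 12.53 | 53.95 | 37.30 | 14.34 | 10.90 |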

Fig. 7. Demand prediction by GRUs with original and Box Cox scale for the Downtown in Austin, TX.


Fig. 8. Daily scatter plot and histogram of GRUs’ residuals for Downtown Census in Austin, TX: (top) original data and (bottom) Box Cox transformed data.

Fig. 9. Variance prediction for residuals of GRUs with original scale data of Downtown Census in Austin, TX.


Variance prediction and supply planning

Extracting all useful information from historical data is crucial for the daily operational planning of dockless shared e-scooters, so that resources are managed properly and the related operating costs are minimized. Even if demand prediction models forecast future demand at state-of-the-art performance, some uncertainties still arise from both the prediction models and the errors in the historical data. Safety stock is commonly employed to cover these uncertainties, and it depends on the demand variation in Eq. (17). For this reason, variance analysis is necessary for designing an efficient supply level. Two characteristics of the residuals are essential for supply level design: their distribution and their heteroscedasticity. The residuals of a forecasting model commonly follow a normal distribution or a Student’s t-distribution. This characteristic is important for choosing the confidence level parameter (Zscore) or the cover rate (the share of data points lying within the confidence interval bound). As explained in Section 3.4, heteroscedasticity refers to the temporal pattern in the variability of the residuals; the inventory should therefore be designed in proportion to it.

Fig. 8 shows the daily scatter plot and histogram of the GRUs’ residuals on the original and Box Cox scales. On the original scale, the distribution had fatter tails than the normal distribution, so the Student’s t-distribution was more appropriate for these residuals. For GRUs on the Box Cox scale, the residuals had only slightly fat tails, which were practically negligible. The daily scatter plot shows a clear daily pattern in the residuals of GRUs on the original scale, whereas the spread is almost constant on the Box Cox scale. To confirm the heteroscedasticity of the residuals, the ARCH-LM test was performed. We could reject the null hypothesis (no ARCH effects), as the p-value was less than 5 % for both the original and Box Cox scales. However, the coefficients of the SGARCH model on the Box Cox scale were relatively small, so the ARCH effects could statistically be ignored (Saum et al., 2020; StataCorp, 2013).
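The ARCH-LM check described above can be illustrated with a minimal sketch. It assumes a single lag and synthetic residuals, both chosen here only for illustration; the lag order and software used in the study are not stated in this section. The function name `arch_lm_test_lag1` is hypothetical. The test regresses the squared residuals on their lagged values and compares n·R² with a chi-square distribution.

```python
import math
import random

def arch_lm_test_lag1(residuals):
    """Engle's ARCH-LM test with a single lag (illustrative simplification).

    Regresses e_t^2 on a constant and e_{t-1}^2. Under the null of no ARCH
    effects, LM = n * R^2 is asymptotically chi-square with 1 degree of freedom.
    Returns (lm_statistic, p_value).
    """
    sq = [e * e for e in residuals]
    y, x = sq[1:], sq[:-1]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r_squared = (sxy * sxy) / (sxx * syy)      # R^2 of a one-regressor regression
    lm = n * r_squared
    p_value = math.erfc(math.sqrt(lm / 2.0))   # chi-square(1) survival function
    return lm, p_value

# Synthetic residuals (not the paper's data): the noise scale is larger in hours 12-23.
rng = random.Random(0)
hetero = [rng.gauss(0.0, 1.0 + 4.0 * (12 <= (t % 24) < 24)) for t in range(24 * 200)]
homo = [rng.gauss(0.0, 1.0) for _ in range(24 * 200)]

lm_het, p_het = arch_lm_test_lag1(hetero)
lm_hom, p_hom = arch_lm_test_lag1(homo)
print(f"time-varying scale: LM={lm_het:.1f}, p={p_het:.4f}")
print(f"constant scale:     LM={lm_hom:.1f}, p={p_hom:.4f}")
```

A small p-value for the first series means the null of no ARCH effects is rejected, which is the situation reported for the GRUs' residuals. In practice, a library routine with several lags would normally be preferred over this single-lag version.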

For GRUs on the original scale in Fig. 8, the 97.5 % upper Confidence Interval (CI) with constant standard deviation had a cover rate (or service level) of 96.79 %. The slight shortfall was not the main problem; the problem was how the residuals above the upper CI were distributed over the day. In the first half of the day (hours 0–11), only 0.56 % of the residuals lay above the upper CI, while 2.65 % lay above it in the second half (hours 12–23). The upper CI with constant standard deviation therefore had an excellent cover rate in the first half but a poor cover rate in the second half. On the other hand, the 97.5 % upper CI with daily standard deviation had an overall cover rate of 96.8 %, with outliers (residuals above the upper CI) of 1.65 % and 1.55 % in the first and second halves, respectively. At these cover rates, the percentage of served demand was 99.24 % and 99.36 % for the upper CI with constant and daily standard deviation, respectively. The upper CI with constant standard deviation had a supply ratio (i.e., the ratio of total supply to total actual demand) of 145 %, whereas that of the upper CI with daily standard deviation was only 139.4 %. In other words, at practically the same cover rate, the upper CI (or supply level) with daily standard deviation required lower inventory (lower operational cost) yet served a higher percentage of demand (higher trip revenue) than the upper CI with constant standard deviation.
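The three quantities compared above (cover rate, percentage of served demand, and supply ratio) can be computed as in the following sketch. It assumes the supply level is the forecast plus a z-multiple of the standard deviation, which is a simplification and not necessarily the exact form of Eq. (17). The hourly data are synthetic, and the helper name `supply_metrics` is hypothetical.

```python
import random
import statistics

Z_975 = statistics.NormalDist().inv_cdf(0.975)  # one-sided 97.5 % normal quantile

def supply_metrics(demand, forecast, sigma):
    """Return (cover_rate, served_share, supply_ratio) for supply = forecast + z * sigma."""
    supply = [max(f + Z_975 * s, 0.0) for f, s in zip(forecast, sigma)]
    cover = sum(x <= s for x, s in zip(demand, supply)) / len(demand)
    served = sum(min(x, s) for x, s in zip(demand, supply)) / sum(demand)
    ratio = sum(supply) / sum(demand)
    return cover, served, ratio

# Synthetic hourly data (not the paper's data): demand and noise both peak in hours 12-23.
rng = random.Random(1)
hours = [t % 24 for t in range(24 * 365)]
forecast = [5.0 + 40.0 * (12 <= h < 24) for h in hours]
noise_sd = [1.0 + 9.0 * (12 <= h < 24) for h in hours]
demand = [max(f + rng.gauss(0.0, s), 0.0) for f, s in zip(forecast, noise_sd)]
resid = [x - f for x, f in zip(demand, forecast)]

# Constant standard deviation: one value for every hour.
const_sd = statistics.pstdev(resid)

# Daily standard deviation: one value per hour of the day.
hour_sd = {h: statistics.pstdev([r for hh, r in zip(hours, resid) if hh == h])
           for h in range(24)}

sigmas = {"constant": [const_sd] * len(resid), "daily": [hour_sd[h] for h in hours]}
results = {}
for name, sigma in sigmas.items():
    results[name] = supply_metrics(demand, forecast, sigma)
    cover, served, ratio = results[name]
    print(f"{name:8s} cover={cover:.3f} served={served:.4f} supply_ratio={ratio:.3f}")
```

On data of this kind, the hour-of-day standard deviation tends to serve more demand with less total supply, which mirrors the direction of the comparison reported for Fig. 8.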

As shown in Fig. 8, the residuals on the original scale still had a seasonal pattern, and the ARCH-LM test confirmed the presence of ARCH effects. Therefore, the Seasonal GARCH in Eq. (16) was trained to further extract the temporal variance patterns. Fig. 9 compares the variance prediction models (Constant, Daily Seasonal, and SGARCH) against the absolute residuals of GRUs on the original scale for the Downtown Census in Austin, Texas. The graph shows that the constant variance, or mean squared error (Constant_STD), approach performs poorly, as it cannot capture the conditional variance. The daily seasonal variance (Daily_STD) captures the daily volatility pattern to some extent, but it is not flexible enough for long-term demand. On the other hand, the variance predicted by SGARCH (SGARCH_STD) adapts well to the conditional variance. However, it has one main disadvantage: once a considerable error occurs, SGARCH carries it over to the next seasonal step.
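The carry-over effect noted above can be seen in a small sketch of a seasonal GARCH recursion. Eq. (16) is not reproduced in this section, so the recursion below (one ordinary lag plus one seasonal lag of 24 hours) and all coefficient values are assumptions made for illustration, not the fitted model from the study.

```python
def sgarch_variance(residuals, omega, alpha, beta, alpha_s, beta_s, season=24):
    """Conditional variance from an assumed seasonal GARCH(1,1) recursion:

    sigma2[t] = omega + alpha * e[t-1]^2 + beta * sigma2[t-1]
                      + alpha_s * e[t-season]^2 + beta_s * sigma2[t-season]
    """
    n = len(residuals)
    uncond = omega / (1.0 - alpha - beta - alpha_s - beta_s)  # long-run variance
    sigma2 = [uncond] * n
    for t in range(1, n):
        e_prev = residuals[t - 1] ** 2
        # Until one full season of history exists, use the long-run variance instead.
        e_seas = residuals[t - season] ** 2 if t >= season else uncond
        s_seas = sigma2[t - season] if t >= season else uncond
        sigma2[t] = (omega + alpha * e_prev + beta * sigma2[t - 1]
                     + alpha_s * e_seas + beta_s * s_seas)
    return sigma2

# Residuals of size 1 everywhere, except one large error at hour 30 (illustrative only).
resid = [1.0] * 96
resid[30] = 10.0
var = sgarch_variance(resid, omega=0.2, alpha=0.1, beta=0.3, alpha_s=0.3, beta_s=0.1)

print(f"hour 29, before the large error:  {var[29]:.2f}")
print(f"hour 31, next step:               {var[31]:.2f}")
print(f"hour 50, between the two effects: {var[50]:.2f}")
print(f"hour 54, next seasonal step:      {var[54]:.2f}")
```

The predicted variance jumps right after the large error, decays, and then jumps again 24 hours later through the seasonal term, even though no new large error occurred there.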

Fig. 10 compares the four supply level models of GRUs at 98 % served demand. Supply levels with constant variance produced high oversupply at nighttime but failed to meet the afternoon demand. On the other hand, supply levels with Box Cox variance performed very well, except at some peak points caused by the logarithmic inversion effect. Supply levels with daily and SGARCH variance had a similar pattern, but SGARCH variance better allocated the uncertainty in the long-term demand. In summary, variance analysis was necessary for the original or normalized data, but constant variance was sufficient for the Box Cox transformed data. For original or normalized data, three types of variance were examined: constant, daily, and the variance predicted by SGARCH. Four supply level models were compared for each demand

Fig. 10. Comparison of supply level models of GRUs at 98% served demand (cover rate of around 90%) of Downtown Census in Austin, TX.


prediction model: three paired the demand prediction model on the original or normalized scale with one of three variance models (constant variance, daily variance, or the variance predicted by SGARCH), and the fourth paired the demand prediction model on Box Cox transformed data with constant variance. The supply level designs were compared for three demand prediction models: SARIMAX, XGBoost, and GRUs. These are popular representatives of probabilistic, machine learning, and deep learning models, respectively.

As mentioned above, the confidence interval was unsuitable for daily operational planning, as it did not account for the intensity of the residuals, specifically for a heteroscedastic dataset. Moreover, different CI models tended to yield different inventory levels (operational cost) and expected served demand (trip revenue) even when they had the same cover rate. This study therefore compared the supply level models at the same percentage of served demand (the same amount of served or unserved demand) in order to compare their oversupply, measured as mean oversupply (MO). To achieve the same percentage of served demand, the safety stock parameter (d) was adjusted independently for each supply level model following the flowchart in Fig. 2. In practice, this parameter should be set according to the desired service level (as with Zscore) or adjusted until the supply level reaches the maximum number of e-scooters.
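The calibration of d to a common percentage of served demand can be sketched as follows. The flowchart in Fig. 2 is not reproduced in this section, so the bisection search used here is a stand-in, not the authors' procedure. The supply form (forecast plus d times the standard deviation), the synthetic data, and the helper names are likewise assumptions.

```python
import random
import statistics

def supply_level(forecast, sigma, d):
    """Assumed supply form: forecast + d * sigma, floored at zero."""
    return [max(f + d * s, 0.0) for f, s in zip(forecast, sigma)]

def served_share(demand, supply):
    """Share of total demand that the supply can serve."""
    return sum(min(x, s) for x, s in zip(demand, supply)) / sum(demand)

def mean_oversupply(demand, supply):
    """Average number of unused e-scooters per hour (MO)."""
    return sum(max(s - x, 0.0) for x, s in zip(demand, supply)) / len(demand)

def calibrate_d(demand, forecast, sigma, target, lo=-5.0, hi=10.0, iters=60):
    """Bisection for the smallest d in [lo, hi] whose served share reaches the target.

    Works because the served share never decreases as d grows (sigma >= 0).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if served_share(demand, supply_level(forecast, sigma, mid)) >= target:
            hi = mid
        else:
            lo = mid
    return hi

# Synthetic hourly data (not the paper's data): demand and noise both peak in hours 12-23.
rng = random.Random(2)
hours = [t % 24 for t in range(24 * 365)]
forecast = [5.0 + 40.0 * (12 <= h < 24) for h in hours]
noise_sd = [1.0 + 9.0 * (12 <= h < 24) for h in hours]
demand = [max(f + rng.gauss(0.0, s), 0.0) for f, s in zip(forecast, noise_sd)]
resid = [x - f for x, f in zip(demand, forecast)]

const_sd = statistics.pstdev(resid)
hour_sd = {h: statistics.pstdev([r for hh, r in zip(hours, resid) if hh == h])
           for h in range(24)}
variance_models = {"constant": [const_sd] * len(resid),
                   "daily": [hour_sd[h] for h in hours]}

TARGET = 0.98  # percentage of served demand to match across models
results = {}
for name, sigma in variance_models.items():
    d = calibrate_d(demand, forecast, sigma, TARGET)
    supply = supply_level(forecast, sigma, d)
    results[name] = (d, served_share(demand, supply), mean_oversupply(demand, supply))
    print(f"{name:8s} d={d:.3f} served={results[name][1]:.4f} MO={results[name][2]:.3f}")
```

Once both models serve the same share of demand, their MO values are directly comparable, which is the basis of the comparison in Table 5.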

Mean oversupply (MO) in Table 5 was compared on the training data for the Thammasat dataset, while the other two datasets were compared on the testing data. According to the Thammasat results, Box Cox transformation provided the most efficient supply level (the lowest mean oversupply), except for XGBoost. Moreover, the SARIMAX model with Box Cox transformed data had an MO comparable to the worst case of GRUs (constant variance), and an even lower MO at higher percentages of served demand (95 % and above). Overall, the MO of GRUs was smaller than that of SARIMAX, which shows the importance of demand prediction performance. The gap between the worst and best cases of the GRUs’ supply level models widened as the percentage of served demand increased, reaching about one e-scooter at 98 % served demand. In other words, operators would carry an average hourly oversupply of 8 e-scooters to achieve 98 % served demand with the constant-variance supply level, but could reduce the oversupply to around 7 e-scooters per hour with the Box Cox supply level. At this reduction rate, the operator could save up to 30 e-scooters for 10 spatial regions at a 3-hour rebalancing cycle (i.e., remove 30 e-scooters from the rebalancing operation).
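The saving quoted above can be written out as a short calculation. It assumes that the hourly reduction in mean oversupply applies equally to every region and every hour of the rebalancing cycle. The symbols ΔN (e-scooters saved), ΔMO (reduction in mean oversupply), R (number of regions), and T (cycle length in hours) are notation introduced here for illustration. The two MO values are the Thammasat GRUs entries for constant and Box Cox variance at 98 % served demand in Table 5.

```latex
\Delta N = \Delta\mathrm{MO} \times R \times T
         = (8.069 - 7.091) \times 10 \times 3
         \approx 29.3
         \approx 30 \ \text{e-scooters per rebalancing cycle}
```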

In the comparison on the Minneapolis dataset, the trend in MO was similar to that in accuracy, i.e., SARIMAX and XGBoost performed well with Box Cox transformation, whereas GRUs had a small MO on the original scale. At a low percentage of served demand (or d<0), the supply levels differed only slightly in MO, but the differences grew significantly at a high percentage of served demand. This dataset also exposed a limitation of Box Cox transformation: a high MO at 98 % served demand. The reason was the effect of the exponential (inverse) transformation, specifically when the value of lambda (λ) is close to −1. Therefore, the maximum value of the designed supply level, S(t,r), should be carefully limited for Box Cox transformed data. SARIMAX had the lowest MO in this dataset because it had better demand prediction on the testing dataset. However, GRUs likely perform better in

both demand prediction accuracy and mean oversupply during high-

Table 5

Mean oversupply comparison for four supply planning models. Percentage columns give the mean oversupply by percentage of served demand.

| Dataset | Demand Model | Variance Model | 70 % | 75 % | 80 % | 85 % | 90 % | 95 % | 98 % |
|---|---|---|---|---|---|---|---|---|---|
| Thammasat, Thailand | GRUs | Constant Variance | 0.615 | 0.835 | 1.179 | 1.710 | 2.707 | 4.880 | 8.069 |
| | | Daily Variance | 0.586 | 0.816 | 1.160 | 1.704 | 2.629 | 4.490 | 7.130 |
| | | SGARCH Variance | 0.591 | 0.815 | 1.181 | 1.704 | 2.605 | 4.458 | 7.259 |
| | | Box Cox Variance | 0.465 | 0.695 | 1.069 | 1.631 | 2.557 | 4.330 | 7.091 |
| | XGBoost | Constant Variance | 0.546 | 0.774 | 1.106 | 1.667 | 2.754 | 4.928 | 8.184 |
| | | Daily Variance | 0.546 | 0.793 | 1.138 | 1.670 | 2.601 | 4.441 | 7.075 |
| | | SGARCH Variance | 0.577 | 0.814 | 1.147 | 1.668 | 2.566 | 4.313 | 6.909 |
| | | Box Cox Variance | 0.527 | 0.773 | 1.132 | 1.672 | 2.572 | 4.385 | 6.951 |
| | SARIMAX | Constant Variance | 0.745 | 1.029 | 1.414 | 2.029 | 3.063 | 5.304 | 8.974 |
| | | Daily Variance | 0.772 | 1.046 | 1.418 | 2.021 | 2.941 | 4.779 | 7.563 |
| | | SGARCH Variance | 0.721 | 0.992 | 1.401 | 2.030 | 3.022 | 5.022 | 7.831 |
| | | Box Cox Variance | 0.631 | 0.897 | 1.282 | 1.875 | 2.813 | 4.603 | 7.304 |
| Minneapolis, Minnesota | GRUs | Constant Variance | 0.582 | 0.803 | 1.142 | 1.980 | 3.562 | 6.894 | 11.845 |
| | | Daily Variance | 0.597 | 0.821 | 1.155 | 1.676 | 2.700 | 4.916 | 8.202 |
| | | SGARCH Variance | 0.594 | 0.829 | 1.155 | 1.653 | 2.516 | 4.412 | 7.504 |
| | | Box Cox Variance | 0.573 | 0.815 | 1.200 | 1.823 | 2.948 | 5.868 | 11.484 |
| | XGBoost | Constant Variance | 0.538 | 0.785 | 1.185 | 1.849 | 3.428 | 6.892 | 12.078 |
| | | Daily Variance | 0.589 | 0.855 | 1.242 | 1.891 | 3.024 | 5.379 | 8.844 |
| | | SGARCH Variance | 0.704 | 0.976 | 1.351 | 1.918 | 2.907 | 4.912 | 8.212 |
| | | Box Cox Variance | 0.439 | 0.639 | 0.942 | 1.437 | 2.364 | 4.682 | 8.880 |
| | SARIMAX | Constant Variance | 0.538 | 0.778 | 1.129 | 1.695 | 2.994 | 5.977 | 10.696 |
| | | Daily Variance | 0.571 | 0.806 | 1.149 | 1.706 | 2.629 | 4.702 | 7.844 |
| | | SGARCH Variance | 0.618 | 0.857 | 1.203 | 1.717 | 2.529 | 4.210 | 6.957 |
| | | Box Cox Variance | 0.414 | 0.602 | 0.891 | 1.371 | 2.254 | 4.333 | 8.407 |
| Austin, Texas | GRUs | Constant Variance | 0.528 | 0.740 | 1.054 | 1.538 | 2.577 | 5.457 | 11.131 |
| | | Daily Variance | 0.484 | 0.686 | 1.000 | 1.520 | 2.538 | 5.003 | 9.784 |
| | | SGARCH Variance | 0.492 | 0.709 | 1.030 | 1.537 | 2.485 | 4.683 | 8.380 |
| | | Box Cox Variance | 0.305 | 0.502 | 0.822 | 1.356 | 2.329 | 4.699 | 9.094 |
| | XGBoost | Constant Variance | 0.533 | 0.769 | 1.124 | 1.665 | 2.682 | 5.608 | 11.766 |
| | | Daily Variance | 0.495 | 0.720 | 1.065 | 1.629 | 2.679 | 5.241 | 10.259 |
| | | SGARCH Variance | 0.540 | 0.786 | 1.144 | 1.681 | 2.642 | 4.958 | 8.886 |
| | | Box Cox Variance | 0.349 | 0.573 | 0.932 | 1.523 | 2.627 | 5.081 | 9.521 |
| | SARIMAX | Constant Variance | 0.546 | 0.771 | 1.105 | 1.623 | 2.616 | 5.305 | 10.768 |
| | | Daily Variance | 0.543 | 0.765 | 1.097 | 1.623 | 2.563 | 4.780 | 9.038 |
| | | SGARCH Variance | 0.496 | 0.728 | 1.077 | 1.619 | 2.548 | 4.583 | 7.992 |
| | | Box Cox Variance | 0.330 | 0.530 | 0.849 | 1.382 | 2.338 | 4.469 | 8.375 |

Note: Box Cox Variance refers to the supply planning model made of a demand prediction model with Box Cox transformed data and the constant variance. Constant, Daily, and SGARCH Variance refer to supply planning models made of a demand prediction model with original or normalized data and the constant, daily, and predicted SGARCH variances, respectively.


demand seasons like summer.
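The need to cap the supply level S(t,r) for Box Cox transformed data, noted above for the Minneapolis dataset, can be illustrated with a minimal sketch. It assumes the standard one-parameter Box Cox transform and that safety stock is added on the transformed scale before converting back; the study may have used a different variant. The values of λ, the forecast, the transformed-scale standard deviation, and the ceiling are all made up for illustration.

```python
import math

def boxcox(y, lam):
    """Standard one-parameter Box Cox transform of a positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def inv_boxcox_capped(z, lam, ceiling):
    """Inverse Box Cox transform, capped at `ceiling` (e.g. available e-scooters)."""
    if lam == 0:
        return min(math.exp(z), ceiling)
    base = lam * z + 1.0
    if base <= 0.0:
        # z lies outside the transform's range: for negative lambda the inverse
        # diverges to infinity, for positive lambda it corresponds to zero.
        return ceiling if lam < 0 else 0.0
    return min(base ** (1.0 / lam), ceiling)

lam = -0.9        # assumed value close to -1
forecast = 50.0   # assumed predicted hourly demand on the original scale
sigma_z = 0.02    # assumed residual standard deviation on the transformed scale
ceiling = 200.0   # assumed cap on the supply level

z = boxcox(forecast, lam)
supply = {d: inv_boxcox_capped(z + d * sigma_z, lam, ceiling) for d in (0.0, 1.0, 2.0)}
for d, s in supply.items():
    print(f"d={d:.0f}: supply level = {s:.1f}")
```

With λ near −1 the transformed scale is bounded above, so a small increase in d moves the supply level from the forecast to far above it, and then past the point where the inverse is defined. The ceiling keeps the converted supply level finite in that region.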

Box Cox transformation resulted in a small MO on the Austin dataset for up to 90 % served demand. This method had the same problem as in Minneapolis at higher percentages of served demand, whereas the predicted variance performed very well. At 95 % served demand, the MO of GRUs dropped by around one between constant variance and predicted variance, which meant that operators could save around 30 e-scooters per hour (or 720 e-scooters in the daily rebalancing operation). This reduction could increase significantly if the number of regions were higher and the rebalancing period longer, e.g., a reduction of 50 e-scooters per hour for 50 regions, or around 100 e-scooters for the same spatial size with a 2-hour rebalancing cycle. As in Minneapolis, SARIMAX’s MO was smaller than that of GRUs and XGBoost due to the seasonal pattern of e-scooter demand.

Across all datasets, accounting for conditional variance in supply level design could reduce the oversupply by around 26.22 % at 95 % served demand (i.e., a shortage of 5 %). In other words, the proposed framework, which combines demand prediction with variance analysis, can minimize demand uncertainty, resulting in more efficient operational planning. Operators could therefore implement this framework for the daily operation of shared e-scooters, specifically periodic (or tactical) rebalancing, along with other potential strategies implemented for shared bikes (Shui and Szeto, 2020). Customer incentives can be integrated with our framework, since this strategy encourages customers to pick up and drop off e-scooters at desired locations. Because our framework targets periodic rebalancing, real-time rebalancing operations can be added to respond to spontaneous demand spikes.

Conclusion and future work

This research paper proposes a practical framework for designing efficient supply planning for the heteroscedastic demand of shared dockless e-scooters. Several popular deep learning and machine learning models are applied to forecast the hourly demand, while their residuals are subjected to variance analysis. Three different datasets of dockless shared e-scooters (Austin TX, Minneapolis MN, and Thammasat TH) are employed to evaluate the effectiveness of the proposed approach. The numerical results show that demand prediction models (especially deep learning models) can achieve state-of-the-art performance, but the residuals are not white noise. Therefore, the supply for such heteroscedastic demand can be planned using a variance stabilizing transformation (Box Cox) or variance analysis (daily seasonal variance or variance predicted by SGARCH). Seasonal variance (daily pattern) effectively reduces oversupply but is ineffective for residual patterns over longer horizons, particularly yearly patterns. The conditional variance model (SGARCH), however, can overcome this limitation. Another interesting method is the variance stabilizing Box Cox transformation. Being easy to implement, this transformation may improve the performance of demand prediction models (particularly the MAE). In addition, it can remove heteroscedasticity, deal with outliers, and provide efficient supply level planning at a lower percentage of served demand. Nonetheless, the limitations of this transformation are a possible loss of RMSE accuracy and the need for a maximum limit when converting some expected demands and supply levels back to the original scale. In other words, a proper ceiling value is required for supply levels with Box Cox transformation at a higher percentage of served demand.

The conclusion drawn from these results is that demand prediction alone, even with deep learning, is insufficient for the operational planning of shared e-scooters, which have a high maintenance cost, short service life, irregular demand patterns, and strict regulations. However, the demand uncertainty of this shared mobility mode can be minimized by combining demand prediction and variance analysis. Thus, the proposed framework can be used to increase the efficiency of the daily operation of shared e-scooters, or integrated with other strategies, including customer incentives and real-time rebalancing.

As predictions of demand and variance contribute to the deployment of e-scooters, this combination is expected to also increase the efficiency of operational planning, which can be a direction for future research. Including the conditional variance in demand prediction models is also a promising technique for future work, as it could improve the prediction accuracy and promptly provide both the expected demand and the variance. Other prospective studies could include: evaluating the proposed framework with potential demand data and for multi-step-ahead prediction; examining the residuals of recent state-of-the-art deep learning models, such as Graph Neural Networks (GNNs); supply level design for net trip flow; and other types of variance stabilizing methods that can deal with the limitations of Box Cox transformation.

CRediT authorship contribution statement

Narith Saum: Data curation, Methodology, Software, Writing –

original draft, Writing – review & editing. Mongkut Piantanakulchai:

Data curation, Investigation, Project administration, Writing – review &

editing. Satoshi Sugiura: Conceptualization, Formal analysis, Valida-

tion, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

This work was supported by the AUN/SEED-Net Collaborative Edu-

cation Program (CEP) between Sirindhorn International Institute of

Technology, Thammasat University, and Hokkaido University. More-

over, the authors would like to thank Neuron Mobility for providing the

ridership data of shared e-scooters in Thammasat University Rangsit

Campus, Thailand.

References

Blickstein, S.G., Brown, C., Yang, S., 2019. E-scooter programs: current state of practice

in US cities. Rutgers University. https://doi.org/10.7282/t3-xc8e-tz93.

Breiman, L., 2001. Random Forests. Machine Learning 45 (1), 5–32.

Chen, T., Guestrin, C., 2016. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug. 2016.

Chester, M., 2018. The Electric Scooter Fallacy: Just Because They’re Electric Doesn’t Mean They’re Green. Chester Energy and Policy. https://www.chesterenergyandpolicy.com/blog/electric-scooter-fallacy-green.

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. Retrieved from http://arxiv.org/abs/1406.1078.

Clewlow, R., Foti, F., Shepard-Ohta, T., 2018. Measuring Equitable Access to New

Mobility: A Case Study of Shared Bikes and Electric Scooters. A Populus Report. Nov.

2018. https://research.populus.ai/reports/Populus_MeasuringAccess_2018-Nov.pdf.

Ham, S.W., Cho, J.-H., Park, S., Kim, D.-K., 2021. Spatiotemporal Demand Prediction

Model for E-Scooter Sharing Services with Latent Feature and Deep Learning.

Transp. Res. Rec. 2675 (11), 34–43.

He, S., Shin, K.G., 2020. Dynamic Flow Distribution Prediction for Urban Dockless E-Scooter Sharing Reconfiguration. In: Proceedings of The Web Conference. Apr. 2020.

Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8),

1735–1780.

Hu, Y., Ni, J., Wen, L., 2020. A hybrid deep learning approach by integrating LSTM-ANN

networks with GARCH model for copper price volatility prediction. Physica A Stat.

Mech. Appl. 557, 124907.

Hu, S., Xiong, C., Chen, P., Schonfeld, P., 2023. Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models. Transp. Res. A, Policy Pract. 174, 103743.


Hu, S., Xiong, C., 2023. High-dimensional population inflow time series forecasting via an interpretable hierarchical transformer. Transp. Res. C, Emerg. Technol. 146, 103962.

Khan, P.W., Park, S.-J., Lee, S.-J., Byun, Y.-C., 2022. Electric Kickboard Demand

Prediction in Spatiotemporal Dimension Using Clustering-Aided Bagging Regressor.

J. Adv. Transp. 2022, 8062932.

Kim, S., 2011. Forecasting internet traffic by using seasonal GARCH models. J. Commun. Netw. 13 (6), 621–624.

King, P.L., 2011. Crack the code: Understanding safety stock and mastering its equations.

APICS Magazine 21 (2011), 33–36.

Kristjanpoller, W., Hernández, E., 2017. Volatility of main metals forecasted by a hybrid ANN-GARCH model with regressors. Expert Syst. Appl. 84, 290–300.

Kumar, S., Hussain, L., Banarjee, S., Reza, M., 2018. Energy Load Forecasting using Deep

Learning Approach-LSTM and GRU in Spark Cluster. In: Presented at the 2018 Fifth

International Conference on Emerging Applications of Information Technology

(EAIT). Jan. 2018.

Le Quy, T., Nejdl, W., Spiliopoulou, M., Ntoutsi, E., 2019. A Neighborhood-Augmented

LSTM Model for Taxi-Passenger Demand Prediction. Presented at the International

Workshop on Multiple-Aspect Analysis of Semantic Trajectories. Sep. 2019.

Li, X., Xu, Y., Chen, Q., Wang, L., Zhang, X., Shi, W., 2021. Short-Term Forecast of

Bicycle Usage in Bike Sharing Systems: A Spatial-Temporal Memory Network. IEEE

Trans. Intell. Transp. Syst. 1–12.

Liu, X., Gherbi, A., Li, W., Cheriet, M., 2019. Multi features and multi-time steps LSTM

based methodology for bike sharing availability prediction. Procedia Comput. Sci.

155, 394–401.

Luo, H., Cai, J., Zhang, K., Xie, R., Zheng, L., 2021. A multi-task deep learning model for short-term taxi demand forecasting considering spatiotemporal dependences. J. Traffic Transp. Eng. 8 (1), 83–94.

Masoud, M., Elhenawy, M., Almannaa, M.H., Liu, S.Q., Glaser, S., Rakotonirainy, A.,

2019. Heuristic approaches to solve e-scooter assignment problem. IEEE Access 7,

175093–175105.

McKenzie, G., 2019. Spatiotemporal comparative analysis of scooter-share and bike-

share usage patterns in Washington, D.C. J. Transp. Geogr. 78, 19–28.

McKenzie, G., 2020. Urban mobility in the sharing economy: A spatiotemporal

comparison of shared mobility services. Comput. Environ. Urban Syst. 79, 101418.

Moreau, H., de Jamblinne de Meux, L., Zeller, V., D’Ans, P., Ruwet, C., Achten, W.M.,

2020. Dockless E-Scooter: A Green Solution for Mobility? Comparative Case Study

between Dockless E-Scooters, Displaced Transport, and Personal E-Scooters.

Sustainability 12 (5), 1803.

O’Mahony, E.D., 2015. Smarter tools for (Citi) bike sharing. Cornell University. Ph.D.

dissertation.

O’Malley, T., Bursztein, E., Long, J., Chollet, F., Jin, H., Invernizzi, L., et al., 2019. Keras documentation, Keras Tuner. Retrieved from https://github.com/keras-team/keras-tuner.

Rusyana, A., Nurhasanah, Marzuki, Flancia, M., 2016. SARIMA model for forecasting foreign tourists at the Kualanamu International Airport. In: Proceedings of the 12th International Conference on Mathematics, Statistics, and Their Applications (ICMSA). Oct. 2016.

Saum, N., Piantanakulchai, M., 2019. A Review on an Emerging New Mode of Transport: The Shared Dockless Electric Scooter. In: Proceedings of the Eastern Asia Society for Transportation Studies. Sri Lanka. Sept. 2019.

Saum, N., Sugiura, S., Piantanakulchai, M., 2020. Short-Term Demand and Volatility

Prediction of Shared Micro-Mobility: a case study of e-scooter in Thammasat

University. In: Presented at the 2020 Forum on Integrated and Sustainable

Transportation Systems (FISTS). Nov. 2020.

Seo, Y.-H., 2020. A Dynamic Rebalancing Strategy in Public Bicycle Sharing Systems

Based on Real Time Dynamic Programming and Reinforcement Learning. Seoul

National University, South Korea. Ph.D. dissertation.

Severengiz, S., Finke, S., Schelte, N., Forrister, H., 2020. Assessing the Environmental

Impact of Novel Mobility Services using Shared Electric Scooters as an Example.

Procedia Manuf. 43, 80–87.

Shui, C.S., Szeto, W.Y., 2020. A review of bicycle-sharing service planning problems.

Transp. Res. C, Emerg. Technol. 117, 102648.

Sigauke, C., Chikobvu, D., 2011. Prediction of daily peak electricity demand in South

Africa using volatility forecasting models. Energy Econ. 33 (5), 882–888.

Smith, C.S., Schwieterman, J.P., 2018. E-scooter scenarios: evaluating the potential mobility benefits of shared dockless scooters in Chicago. DePaul University, Chaddick Institute for Metropolitan Development.

StataCorp., 2013. Stata Time-Series Reference Manual. Stata Press, College Station, Texas.

Ti, A., Du, Z., Zhang, W., 2019. Analysis on the Volatility of Sustainable Stock Index and

Traditional Stock Index Based on GARCH Model. In: Presented at the 2019

International Conference on Economic Management and Model Engineering

(ICEMME). Dec. 2019.

Tolomei, L., Fiorini, S., Ciociola, A., Vassio, L., Giordano, D., Mellia, M., 2021. Benefits of Relocation on E-scooter Sharing - a Data-Informed Approach. In: Presented at the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). Sept. 2021.

Trapero, J.R., Cardós, M., Kourentzes, N., 2019. Empirical safety stock estimation based on kernel and GARCH models. Omega 84, 199–211.

Wang, T., Hu, S., Jiang, Y., 2021. Predicting shared-car use and examining nonlinear

effects using gradient boosting regression trees. Int. J. Sustain. Transp. 15 (12),

893–907.

Wang, B., Kim, I., 2018. Short-term prediction for bike-sharing service using machine

learning. Transp. Res. Procedia 34, 171–178.

Wu, Y., 2011. The Simulation Study of Shanghai and Shenzhen 300 Index By Garch Models. In: Proceedings of the 2011 International Conference on Information Management, Innovation Management and Industrial Engineering, pp. 30–33.

Xu, C., Ji, J., Liu, P., 2018. The station-free sharing bike demand forecasting with a deep

learning approach and large-scale datasets. Transp. Res. C, Emerg. Technol. 95,

47–60.

Xu, M., Liu, H., Yang, H., 2020. A Deep Learning Based Multi-Block Hybrid Model for

Bike-Sharing Supply-Demand Prediction. IEEE Access 8, 85826–85838.

Yang, Y., Gao, P., Sun, Z., Wang, H., Lu, M., Liu, Y., Hu, J., 2023. Multistep ahead

prediction of temperature and humidity in solar greenhouse based on FAM-LSTM

model. Comput. Electron Agric. 213, 108261.

Yeo, I.K., Johnson, R.A., 2000. A new family of power transformations to improve

normality or symmetry. Biometrika 87 (4), 954–959.

Yu, Y., Si, X., Hu, C., Zhang, J., 2019. A review of recurrent neural networks: LSTM cells

and network architectures. Neural Comput. 31 (7), 1235–1270.

Zhang, G., Ali, S., Wang, X., Wang, G., Pan, Z., Zhang, J., 2019. SPI-based drought

simulation and prediction using ARMA-GARCH model. Appl. Math. Comput. 355,

96–107.

Zhang, C., Zhu, F., Wang, X., Sun, L., Tang, H., Lv, Y., 2020. Taxi Demand Prediction

Using Parallel Multi-Task Learning Model. IEEE Trans. Intell. Transp. Syst. 23 (2),

794–803.

Zhu, R., Zhang, X., Kondor, D., Santi, P., Ratti, C., 2020. Understanding spatio-temporal

heterogeneity of bike-sharing and scooter-sharing mobility. Comput. Environ. Urban

Syst. 81, 101483.

N. Saum et al.

ResearchGate has not been able to resolve any citations for this publication.

Electric Kickboard Demand Prediction in Spatiotemporal Dimension Using Clustering-Aided Bagging Regressor

Article

Full-text available

Aug 2022
J ADV TRANSPORT

Demand for electric kickboards is increasing specifically in tourist-centric regions worldwide. In order to gain a competitive edge and to provide quality service to customers, it is essential to properly deploy rental electric kickboards (e-kickboards) at the time and place customers want. However, it is necessary to study how to divide the region to predict electric mobility demand by region. Therefore, this study is made to more accurately predict future demand based on past regional customers’ electric mobility demand data. We have proposed a novel electric kickboard demand prediction in spatiotemporal dimension using clustering-aided bagging regressor. We have used electric kickboard usage data from a Jeju, South Korea-based company. As a result of the experiment, it was found that the accuracy before using clustering-based bagging regressor and when the region was divided by the clustering method, the performance was improved, and we have achieved a regression score R 2 of 93.42 using our proposed approach. We have compared our proposed approach with other state-of-the-art models, and we have also compared our model with different other combinations of bagging regressors. This study can be helpful for companies to meet the user’s demand for a better quality of service.

Short-Term Forecast of Bicycle Usage in Bike Sharing Systems: A Spatial-Temporal Memory Network

Article

Full-text available

Jul 2021

Bike-sharing systems have made notable contributions to cities by providing green and sustainable mobility service to users. Over the years, many studies have been conducted to understand or anticipate the usage of these systems, with the hope to inform their future developments. One important task is to accurately predict usage patterns of the systems. Although many deep learning algorithms have been developed in recent years to support travel demand forecast, they have mainly been used to predict traffic volume or speed on roadways. Few studies have applied them to bike-sharing systems. Moreover, these studies usually focus on one single dataset or study area. The effectiveness and robustness of the prediction algorithms are not systematically evaluated. In this study, we propose a Spatial-Temporal Memory Network (STMN) to predict short-term usage of bicycles in bike-sharing systems. The framework employs Convolutional Long Short-Term Memory models and a feature engineering technique to capture the spatial-temporal dependencies in historical data for the prediction task. Four testing sites are used to evaluate the model. These four sites include two station-based systems (Chicago and New York) and two dockless bike-sharing systems (Singapore and New Taipei City). By assessing STMN with several baseline models, we find that STMN achieves the best overall performance in all the four cities. The model also achieves superior performance in urban areas with varying levels of bicycle usage and during peak periods when demand is high. The findings suggest the reliability of STMN in predicting bicycle usage for different types of bike-sharing systems.

Spatiotemporal Demand Prediction Model for E-Scooter Sharing Services with Latent Feature and Deep Learning

Article

Apr 2021

The electric scooter (e-scooter) sharing service has attracted significant attention because of its extensive usage and eco-friendliness. Since e-scooters are mostly accessed on foot, the presence of e-scooters within walking distance has a crucial effect on service quality. Therefore, to maintain appropriate service quality, relocation strategies are often used to properly distribute e-scooters within service areas. There is an extensive literature on demand forecasting for efficient relocation. However, relocation at small-scale spatial units, at the walking-distance level, remains inadequately studied because of the sparsity of demand data. This research aims to establish an effective methodology for predicting the demand for e-scooters at high spatial resolution. A new grid-based spatial setting was created with the usage data. To increase practicality, the model predicts not only the identified demand but also the unmet demand. A convolutional autoencoder is used to obtain a latent feature that reduces the problem of representing sparse data. An encoder–recurrent neural network–decoder (ERD) framework with a convolutional autoencoder yielded a large improvement in predicting spatiotemporal events. This new ERD framework shows enhanced prediction performance, reducing the mean squared error loss from 0.00679 for the baseline long short-term memory model to 0.00036. This methodological strategy is significant in that it can be applied to any prediction problem with spatiotemporal data, even those with sparse data problems.

Predicting shared-car use and examining nonlinear effects using gradient boosting regression trees

Article

Oct 2020

Flexible drop-off and pick-up (one-way) carsharing programs provide users with high levels of convenience but also incur spatiotemporal imbalances in the distribution of shared cars. Predicting shared-car use helps recognize system imbalances beforehand, while identifying determinants of shared-car use helps operators implement relocation strategies efficiently. In this study, a gradient boosting regression tree (GBRT) model is employed to predict shared-car use at the station level, and partial dependence plots (PDPs) are employed to examine nonlinear relationships between shared-car use and various predictors. Results show: (1) GBRTs predict shared-car use with a high level of accuracy (MSE: 1.1069–1.1648); (2) PDPs present results highly consistent with relationships derived from the traditional statistical model; (3) time-varying variables account for 86.84%–89.30% of the importance in shared-car use prediction, suggesting these variables can greatly enhance prediction accuracy; (4) other variables, such as built environment, station attributes, and socioeconomic features, also account for some importance and can enhance prediction accuracy. The findings help carsharing operators accurately predict station-level shared-car use and identify the best locations for stations, and thus maintain the operational efficiency of carsharing programs.

Taxi Demand Prediction Using Parallel Multi-Task Learning Model

Article

Aug 2020

Accurate and real-time taxi demand prediction can help managers pre-allocate taxi resources in cities, which helps drivers find passengers quickly and reduces passengers' waiting time. Most existing studies focus on mining the spatial-temporal characteristics of taxi demand distributions, but do not model the correlations between taxi pick-up demand and drop-off demand from the perspective of multi-task learning. In this article, we propose a multi-task learning model containing three parallel LSTM layers to co-predict taxi pick-up and drop-off demands, and we compare the performance of the single-demand prediction methodology with that of the two-demand co-prediction methodology. Experimental results on real-world datasets demonstrate that the pick-up demand and the drop-off demand do depend on each other, and confirm the effectiveness of the proposed co-prediction methods.

Multistep ahead prediction of temperature and humidity in solar greenhouse based on FAM-LSTM model

Article

Oct 2023
COMPUT ELECTRON AGR

Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models

Article

Jun 2023
TRANSPORT RES A-POL

Mobile device location data (MDLD) contain population-representative, fine-grained travel demand information, facilitating opportunities to validate established relations between travel demand and underlying factors from a big data perspective. Using the nationwide census block group (CBG)-level population inflow derived from MDLD as the proxy of travel demand, this study examines its relations with various factors including socioeconomics, demographics, land use, and CBG attributes. A host of tree-based machine learning (ML) models and interpretation techniques (feature importance, partial dependence plot (PDP), accumulated local effect (ALE), and SHapley Additive exPlanations (SHAP)) are extensively compared to determine the best model architecture and justify interpretation robustness. Empirical results show that: (1) Boosting trees perform the best among all models, followed by bagging trees, single trees, and linear regressions. (2) Feature importance holds consistently among different tree-based models but is influenced by measures of importance and hyperparameter settings. (3) Pronounced nonlinearities, threshold effects, and interaction effects are observed in relations between population inflow and most of its determinants. (4) Compared with PDP, ALE and SHAP plots are more reliable in the presence of outliers, feature dependency, and local heterogeneity. Taken together, the techniques introduced in this study can either be integrated into customary travel demand models to enhance model accuracy or serve as interpretation tools that offer a comprehensive understanding of intricate relations.

High-Dimensional Population Flow Time Series Forecasting Via an Interpretable Hierarchical Transformer

Article

Jan 2023
TRANSPORT RES C-EMER

Mobile device location data (MDLD) are emerging data sources in the transportation domain that contain large-scale, fine-grained information on population inflow. However, few studies have built forecasting models based on large-scale MDLD-based population inflow time series. This task is challenging due to complex nonlinear temporal dynamics, high-dimensional time series structure (i.e., multiple time series with multi-shape inputs and outputs), and non-negligible impacts from various external factors. To address these challenges, this study introduces a deep learning framework, the Interpretable Hierarchical Transformer (IHTF), for nationwide county-level population inflow time series forecasting and interpretation. A variety of cutting-edge deep learning techniques are fused, including the variable selection network to incorporate external effects, the gated residual network to handle nonlinearity, and the transformer architecture to learn temporal dynamics. Different interior parameters, such as variable selection weight and temporal attention weight, are extracted to explain patterns learned by the framework. Numerical experiments show that IHTF outperforms extensive baseline models in forecasting accuracy. In addition, the feature importance generated by IHTF is similar to that of the tree-based model LightGBM but exhibits a more even distribution; point-of-interest (POI) count, county location, median household income, and percentage of accommodation and food services are the most important static variables. Moreover, the attention weights demonstrate that IHTF can automatically learn seasonality from the time series. Taken together, this framework can serve as a reliable travel demand forecasting component in the transportation planning process that allows travel demand to be modeled continuously instead of by snapshot.

Benefits of Relocation on E-scooter Sharing - a Data-Informed Approach

Conference Paper