Rebalancing Shared Mobility-on-Demand Systems:
a Reinforcement Learning Approach
Jian Wen, Jinhua Zhao and Patrick Jaillet
Department of Civil and Environmental Engineering
Department of Urban Studies and Planning
Department of Electrical Engineering and Computer Science
Laboratory for Information and Decision Systems, Operations Research Center
Massachusetts Institute of Technology, Cambridge, MA 02139-4307
Emails: wenj@mit.edu, jinhua@mit.edu, jaillet@mit.edu
Abstract—Shared mobility-on-demand systems hold great promise for making urban transportation efficient and affordable. However, due to operational challenges among others, many mobility applications still remain niche products. This paper addresses the rebalancing needs that are critical for effective fleet management in order to offset the inevitable imbalance between vehicle supply and travel demand. Specifically, we propose a reinforcement learning approach which adopts a deep Q network and adaptively moves idle vehicles to regain balance. This model-free approach takes a very different perspective from the state-of-the-art network-based methods and is able to cope with large-scale shared systems in real time with partial or full data availability. We apply this approach to an agent-based simulator and test it on a London case study. Results show that the proposed method outperforms the local anticipatory method, reducing the fleet size by 14% while inducing little extra vehicle distance traveled. Its performance is close to that of the optimal solution, yet its computational speed is 2.5 times faster. Collectively, the paper concludes that the proposed rebalancing approach is effective under various demand scenarios and will benefit both travelers and operators if implemented in a shared mobility-on-demand system.
Index Terms—rebalancing, mobility-on-demand, reinforcement
learning, deep Q network, ride-sharing, agent-based simulation.
I. INTRODUCTION
Despite ongoing debates regarding regulation and societal impact, the shared mobility-on-demand (shared MoD) system is widely seen as one of the most promising approaches to reforming urban transportation. On the one hand, on-demand mobility connects more people with timely and flexible service. On the other hand, sharing, which takes a variety of forms including car-sharing and ride-sharing, reduces travel costs and enables more affordable trips.
Researchers have been seeking solutions to optimize service design and improve operational performance. These efforts have led to advanced assignment algorithms [1], [2], dynamic prediction tools [3], [4], system design evaluation [5]–[7], dynamic pricing [8], [9] and the pursuit of full autonomy. Rebalancing (used interchangeably with repositioning and relocating), which consists of redistributing idle vehicles to restore the demand-supply balance, has also recently joined this list [10]–[15].
Stemming from the spatial and temporal mismatch between demand (trips, also referred to as travelers or requests) and supply (vehicles), the rebalancing problem often arises in transportation systems such as car rental [16], [17] and public bike sharing [18], [19]. Rebalancing MoD systems is an emerging topic and distinguishes itself from the above for the following reasons: (1) allowing one-way trips adds asymmetry to the demand-supply interaction and necessitates rebalancing; (2) on-demand travelers are often impatient with long waits, and systems without timely rebalancing will suffer massive customer loss; and (3) MoD systems provide door-to-door service, making the decision space continuous. If rides are shared, measuring the supply availability becomes even more critical, since a partially occupied vehicle is still (conditionally) available to new requests as long as the dispatching constraints are satisfied.
Existing works adopting network-based optimization approaches are usually computationally demanding. In addition, as far as the authors are aware, none of them targets door-to-door systems, and ride-sharing has also been omitted for the sake of simplicity. However, door-to-door service and ride-sharing are precisely the two key elements that ensure the connectivity and affordability of the MoD service. For this reason, this paper incorporates both missing elements into the shared MoD system and proposes a reinforcement learning approach that applies a deep Q network (DQN) for fast and effective solutions. The contribution of the paper is therefore twofold: (1) it transitions from the station-based discrete space to a continuous space to enable door-to-door service, and adds ride-sharing features to the model; (2) the proposed DQN-based approach for rebalancing shared MoD systems is model-free, real-time, demand-predictive and computationally scalable, and performs well in terms of both level of service and operational cost.
The rest of the paper is organized as follows. Section II reviews some of the most relevant works in the literature and identifies the research areas to which we can contribute. Section III develops the reinforcement learning approach and presents two benchmark policies for comparison: the optimal rebalancing problem (ORP) and simple anticipatory rebalancing (SAR). In Section IV, we simulate and compare the effectiveness of the methods under different demand scenarios and map sizes. A case study in London then demonstrates the algorithmic performance in a realistic urban setting. Finally, Section V draws conclusions and points out directions for future work.
II. LITERATURE REVIEW
Within the realm of urban transportation, existing research on rebalancing has largely focused on car rental systems [16], [17] and public bike sharing systems [18], [19]. Rebalancing MoD systems, by contrast, is a relatively new topic. Early MoD works often draw on the experience of their car rental and bike sharing counterparts and adopt network-based optimization approaches.
[10] is among the first. Based on a fluid model, this paper proposes an optimal rebalancing model and simulates it with a 12-station autonomous MoD (AMoD) system. In this system, every station reaches an equilibrium in which there are excess vehicles and no waiting customers. However, under the influence of its car rental predecessors, the proposed method is limited to simplified station-based networks. In addition, it focuses only on the ideal equilibrium and does not address the stochastic fluctuations of the demand-supply interplay. In continuation of this work, [11] transforms the model into an analytical guideline for fleet sizing in conceptual AMoD systems and validates it in a Singapore case study. This strategic work still remains at the static level and provides little insight into real-time operation.
[12] extends the idea of the fluid model and presents a queueing-theoretical approach within the framework of Jackson networks. Much of that paper is devoted to proving that, as a closed Jackson network, the system is most efficient when inward and outward vehicle flows (including rebalancing flows) are equal at each station. The solution to an offline optimal rebalancing problem is given, and the paper argues that, by taking only the current information at a specific time point, the problem can be adapted to online applications. A case study in New York City with around 8000 non-shared vehicles demonstrates the effectiveness of the method. Unfortunately, trips in the simulation still have to be clustered to fit the station-based model.
[13] tests both offline and online policies with an agent-based simulation platform using Singapore travel data as input. The results show that about 28% and 23% fewer vehicles are required to guarantee the same service rate when offline and online rebalancing are in use, respectively. Moreover, the online policy outperforms the offline one by reducing the average wait time from 11 minutes to 9. Using a similar approach, [14] tackles the rebalancing issue from the perspective of fleet operators. It quantifies the operational cost as a function of fleet size, customer walk-aways and utilization rate, and reveals that rebalancing can reduce the cost significantly.
The problem formulation in this paper extends the online model in [12] in order to make it compatible with both door-to-door service and ride-sharing. It also introduces a probabilistic objective function to describe the stochasticity in request arrivals. In the next section, we first give the formulation of the generic shared MoD system. Multiple solutions are provided thereafter, including the proposed DQN approach and ORP/SAR as two benchmarks.
III. METHODOLOGY
Consider a shared MoD system covering a predefined service area. For operational purposes, the area is divided into $S$ zones $\mathcal{S} = \{1, 2, \ldots, S\}$, which are mutually exclusive and collectively exhaustive. The fleet $\mathcal{V}$ consists of $V$ shareable vehicles of capacity $K$. Travelers with origins and destinations in $\mathcal{S}$ send requests and queue up. Travelers are then assigned to vehicles dynamically by the central dispatcher on a first-come-first-serve basis, and no traveler will “walk away” despite possibly long waits.

A vehicle that has been assigned to travelers is “in service”. Otherwise, it is “idle” and available to be rebalanced. We assume that the online rebalancing algorithm is run every $\Delta T$ time units and that one run is triggered at time $t = T$. The period of study is therefore $[T, T + \Delta T]$. We also assume that, over the period, the number of incoming requests $A_i$ in zone $i$ follows a Poisson process with predicted arrival intensity $\lambda_i$, that is, $A_i \sim \mathrm{Poisson}(\lambda_i \Delta T)$. Knowing both the demand distribution and the vehicle status, the objective of rebalancing is to maximize service availability while limiting the rebalancing cost. Service availability is evaluated by the average wait time of requests emerging within the period, where “wait time” is defined as the time difference between a traveler sending out the request and being picked up by a vehicle; operational cost is represented by the total vehicle rebalancing distance traveled, i.e., the total distance covered by all vehicles in the fleet due to rebalancing.
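As a quick illustration of this demand model, the sketch below samples one period of zone-level arrivals. The zone count and arrival intensities are hypothetical placeholders; the 150-second period matches the setting used later in Section IV.

```python
import numpy as np

rng = np.random.default_rng(0)

S = 25                  # number of zones (hypothetical)
delta_t = 150 / 3600    # rebalancing period in hours (150 s, as in Section IV)
lam = rng.uniform(2.0, 6.0, size=S)  # predicted arrival intensities (trips/h, hypothetical)

# A_i ~ Poisson(lambda_i * Delta T): requests arriving in each zone over the period
arrivals = rng.poisson(lam * delta_t)
```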
A. Optimal Rebalancing Problem
The decision variables in the rebalancing problem are $r_{i,j}$ for $i, j \in \mathcal{S}$. $r_{i,j}$ represents the number of rebalancing vehicles sent from zone $i$ to zone $j$ at $t = T$, i.e., the beginning of the period. In particular, if $i = j$, it represents the number of idle vehicles in zone $i$ that remain unmoved during the period. The decision variables satisfy $\sum_{j \in \mathcal{S}} r_{i,j} = r_i$, in which $r_i$ is the number of idle vehicles available to be rebalanced in zone $i$ at $t = T$.
The cost of rebalancing one vehicle from $i$ to $j$ is denoted $c_{i,j}$. In this paper, $c_{i,j}$ is defined as:

$$c_{i,j} = \begin{cases} c \, d_{i,j} & \text{if } j \text{ is accessible from } i \text{ within } \Delta T & \text{(1a)} \\ \bar{c} & \text{otherwise} & \text{(1b)} \end{cases}$$

In equation (1a), $d_{i,j}$ is the distance from $i$ to $j$ and the cost is proportional to the distance with a multiplier $c$; in equation (1b), the cost is set to a large constant $\bar{c}$ when $j$ is too far from $i$. This discourages long rebalancing moves and guarantees that all rebalanced vehicles can arrive at their destinations before the period ends.
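A minimal sketch of the cost definition in equation (1), assuming Euclidean distances between zone centroids as in the abstract-map experiments of Section IV; `c_unit` and `c_bar` are illustrative parameter names, not values from the paper.

```python
import numpy as np

def rebalancing_costs(centroids, speed_kmh, delta_t_h, c_unit=1.0, c_bar=1e6):
    """Cost matrix c[i, j] per equation (1): proportional to distance when
    zone j is reachable from zone i within one period, a large constant otherwise."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)        # d_{i,j}, Euclidean (km)
    reachable = dist <= speed_kmh * delta_t_h   # accessible within Delta T
    return np.where(reachable, c_unit * dist, c_bar)
```

With the Section IV settings (21.6 km/h, 150-second period), a vehicle can cover 0.9 km per period, so only nearby cells receive a finite cost.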
We assume that both in-service and idle vehicles follow the shortest routes when picking up/dropping off travelers and when rebalancing. We also assume that the planned routes are not influenced by the incoming requests over the period and that travel time is deterministic. Consequently, the level of supply at $t = T + \Delta T$, i.e., the end of the period, can be measured by the availability of both idle and in-service vehicles at that time. Note that a vehicle is classified as “idle” or “in-service” only according to its status before rebalancing starts.
The availability of idle vehicles at $t = T + \Delta T$ is measured by $r'_j$, the number of idle vehicles that zone $j$ will receive at the end: $r'_j = \sum_{i \in \mathcal{S}} r_{i,j}$. An in-service vehicle may also become available (if all on-board passengers have been dropped off) or partially available (if it still has passenger(s) on board but admits ride-sharing) at the end. Similarly, we define $s'_j$ as the availability of in-service vehicles in zone $j$ at $t = T + \Delta T$. $s'_j$ can be expressed as:

$$s'_j = \sum_{v \in \mathcal{V}} I'_j(v) \, w(l'_v), \quad j \in \mathcal{S} \quad (2)$$

$I'_j(v)$ is the indicator function that equals 1 if $v$ is in zone $j$ at $t = T + \Delta T$ and 0 otherwise. $w(l'_v)$ is the load-availability factor, defined as:

$$w(l'_v) = \frac{\hat{p}(l'_v)}{\hat{p}(0)}, \quad \text{for } 0 \le l'_v \le K \quad (3)$$

$l'_v$ is the load of $v$ at $t = T + \Delta T$. In a shared MoD system with fixed settings, $\hat{p}(l'_v)$ is the empirical probability, based on simulation results, that a vehicle of load $l'_v$ will be assigned to a new request. Similarly, $\hat{p}(0)$ is the empirical probability that an empty vehicle will be assigned to a new request. For a vehicle of capacity 4, we have $w(0) = 1.0$ and $w(\cdot) = 0.4, 0.2, 0.1, 0.0$ when the load is 1, 2, 3, 4, respectively. The estimated total supply $v'_j$ at $t = T + \Delta T$ is therefore:

$$v'_j = \left\lfloor r'_j + s'_j \right\rfloor, \quad j \in \mathcal{S} \quad (4)$$
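A minimal sketch of equations (2)–(4), assuming each in-service vehicle's end-of-period zone and load are already known from its planned route; the input arrays are illustrative, and the $w(\cdot)$ values are the capacity-4 factors given above.

```python
import numpy as np

# Load-availability factors w(l) for a capacity-4 vehicle, as given in the paper
W = np.array([1.0, 0.4, 0.2, 0.1, 0.0])

def estimated_supply(r_matrix, end_zone, end_load, n_zones):
    """Total supply v'_j per equations (2)-(4).

    r_matrix : (S, S) rebalancing decisions r_{i,j}
    end_zone : zone of each in-service vehicle at t = T + Delta T
    end_load : load of each in-service vehicle at t = T + Delta T
    """
    r_prime = r_matrix.sum(axis=0)                  # idle supply r'_j per zone
    s_prime = np.zeros(n_zones)
    for j, l in zip(end_zone, end_load):            # eq. (2): accumulate w(l'_v)
        s_prime[j] += W[l]
    return np.floor(r_prime + s_prime).astype(int)  # eq. (4)
```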
The objective of maximizing the areawide service availability is translated into maximizing the total expected number of requests $b$ that can be served by vehicles from the same zone at the end of the period. $b = \sum_{j \in \mathcal{S}} b_j(v'_j)$, where $b_j(v'_j)$ represents the expected number of served requests in zone $j$ during the period, knowing that $v'_j$ vehicles are in supply:

$$b_j(v'_j) = \sum_{k=0}^{\infty} \min(k, v'_j) \, P(A_j = k) \quad (5)$$
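Equation (5) can be evaluated in closed form: since $\min(k, v') = k$ for $k < v'$ and $v'$ otherwise, $b_j(v') = \sum_{k=0}^{v'-1} k \, P(A_j = k) + v' \, P(A_j \ge v')$. A sketch using `scipy.stats.poisson`:

```python
from scipy.stats import poisson

def expected_served(v_prime, lam, delta_t):
    """Expected number of served requests in one zone, eq. (5),
    with A ~ Poisson(lam * delta_t) and v_prime vehicles in supply."""
    mu = lam * delta_t
    b = sum(k * poisson.pmf(k, mu) for k in range(v_prime))
    b += v_prime * poisson.sf(v_prime - 1, mu)  # v' * P(A >= v')
    return b
```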
Based on the above definitions and assumptions, the optimal rebalancing problem (ORP) is:

$$\max_{r_{i,j}} \; \sum_{j \in \mathcal{S}} b_j(v'_j) - \sum_{i,j \in \mathcal{S}} c_{i,j} \, r_{i,j} \quad (6)$$
$$\text{s.t.} \quad \sum_{j \in \mathcal{S}} r_{i,j} = r_i, \;\; \forall i \in \mathcal{S}; \qquad r_{i,j} \in \mathbb{N}, \;\; \forall i, j \in \mathcal{S}$$

Problem (6) is a Mixed Integer Nonlinear Programming (MINLP) problem. When the problem size is large, solving an MINLP is extremely computationally burdensome; in this paper, we use a combination of incremental-optimal and branch-and-bound methods to reach the optimum.
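To give a flavor of the incremental component, the sketch below assigns idle vehicles one move at a time by the largest net marginal gain of objective (6), reusing `expected_served` from the sketch above and assuming `v0` already counts each idle vehicle at its current zone (so a move shifts one unit of supply from $i$ to $j$). This is only a heuristic reading of the incremental step, not the paper's full incremental-optimal plus branch-and-bound procedure.

```python
import numpy as np

def greedy_rebalance(r_idle, v0, cost, lam, delta_t):
    """Greedy one-vehicle-at-a-time moves for problem (6).
    Returns the list of (origin, destination) moves chosen."""
    def marginal(j, up):           # change in b_j from adding/removing one vehicle
        return expected_served(v0[j] + up, lam[j], delta_t) \
             - expected_served(v0[j], lam[j], delta_t)
    moves = []
    while True:
        best_gain, best_move = 0.0, None
        for i in np.flatnonzero(r_idle):
            for j in range(len(v0)):
                if j == i:
                    continue
                gain = marginal(j, +1) + marginal(i, -1) - cost[i, j]
                if gain > best_gain:
                    best_gain, best_move = gain, (i, j)
        if best_move is None:      # no remaining move improves the objective
            break
        i, j = best_move
        r_idle[i] -= 1
        v0[i] -= 1
        v0[j] += 1
        moves.append(best_move)
    return moves
```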
ORP serves as the “best practice” in benchmarking.
B. Simple Anticipatory Rebalancing
Despite the high quality of its solutions, the wide use of optimal rebalancing policies is still constrained by limited computational capacity. Large-scale applications and simulations therefore often opt for locally executable algorithms which decentralize the decision-making and solve the problem heuristically. Bounding the problem to a small area not only reduces the complexity of real-time data management and vehicle operation, but also naturally caps the induced rebalancing distance without explicitly modeling the cost. One representative example is [15], which gives an intuitive method for local rebalancing: if a zone's supply exceeds its expected demand or vice versa, the system pushes or pulls idle vehicles to or from adjacent zones. The paper then justifies this empirical method through simulation.

We extend this method and develop here a simple anticipatory rebalancing (SAR) approach: a vehicle is informed of the demand distribution in its neighboring area, and the probability that it moves to a zone is proportional to the number of predicted requests in that zone. Under this policy, a vehicle can rebalance itself using only local knowledge, avoiding extra workload on the central dispatcher. Decisions can be made in parallel.
SAR serves as the “average line” in benchmarking.
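A minimal sketch of the SAR rule: the move probability for each cell in a vehicle's 5×5 neighborhood (the neighborhood shape used in Section IV) is proportional to the predicted number of requests there. The center cell represents staying put.

```python
import numpy as np

rng = np.random.default_rng()

def sar_action(predicted_requests):
    """Sample a destination cell in the vehicle's neighborhood.

    predicted_requests : (5, 5) array of lambda_j * Delta T around the vehicle.
    Returns (row, col) of the chosen cell; the center (2, 2) means no move.
    """
    p = predicted_requests.ravel().astype(float)
    if p.sum() == 0:
        return (2, 2)                  # no predicted demand nearby: stay
    p /= p.sum()                       # probability proportional to demand
    idx = rng.choice(p.size, p=p)
    return divmod(idx, predicted_requests.shape[1])
```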
C. Deep Q Network
Solving the rebalancing problem requires deliberate modeling of the shared MoD system. However, such delicate models are barely solvable and hardly transferable from system to system, which unfortunately discourages their use in practice.

Reinforcement learning provides a very different approach to tackling the rebalancing problem, since it is model-free and requires little adjustment of the generic architecture. Recent advances in deep Q networks [20], [21] have also made it possible to handle the delay between actions and rewards as well as sequences of highly correlated states. Research has demonstrated that DQN is able to master difficult control policies, including traffic control and taxi dispatching [22], [23].
In this paper, the neural network is trained with a variant of the Q-learning algorithm [20]. The neighboring area of a specific vehicle constitutes the reinforcement learning environment. The distribution of idle vehicles and in-service vehicles, together with the predicted demand around it (i.e., $r_j$, $s'_j$, $\lambda_j$ for all $j$ in its neighborhood), builds up the state. Based on an action-value function, the DQN agent returns the policy by simply selecting the action with the highest value from the set {noop, ne, e, se, s, sw, w, nw, n}: “noop” indicates no rebalancing operation, “ne” indicates rebalancing to the northeast adjacent zone, and so on. The vehicle then executes the action and receives the reward.
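A sketch of the action-value network and ε-greedy action selection in PyTorch, assuming the state is the three 5×5 neighborhood maps ($r_j$, $s'_j$, $\lambda_j$) flattened into one vector. The three-layer architecture follows Section IV, but the layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

ACTIONS = ["noop", "ne", "e", "se", "s", "sw", "w", "nw", "n"]

class QNet(nn.Module):
    """Three-layer action-value network: state -> one Q-value per action."""
    def __init__(self, state_dim=3 * 5 * 5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, len(ACTIONS)),
        )

    def forward(self, state):
        return self.net(state)

def act(qnet, state, epsilon=0.0):
    """Epsilon-greedy selection over the nine rebalancing moves."""
    if torch.rand(1).item() < epsilon:
        return torch.randint(len(ACTIONS), (1,)).item()
    with torch.no_grad():
        return qnet(state).argmax().item()
```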
Fig. 1. (1a) Illustration of the discretization: points represent vehicles; lines (dashed lines) originating from points represent planned routes (rebalancing routes); dotted grids represent the discretized cells for both ORP (10×10) and SAR/DQN (5×5 with the red vehicle as center). (1b) Average reward per episode during one typical training.

The reward is evaluated under the following rules: (1) if the vehicle is assigned to traveler(s) during the rebalancing period, we compare the environment to the one without rebalancing and use the saving in wait time as the reward; the episode terminates, and the system moves on until the vehicle becomes idle again; (2) if the vehicle remains idle during the rebalancing period, we use a penalty as the reward to discourage this rebalancing action, and the episode continues. The rewarded episodes are stored in a replay memory, and we update the action-value function using samples drawn from this pool. This action-value function is then used to guide the rebalancing actions of all idle vehicles in the online algorithm.
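A sketch of the replay-memory update, reusing `qnet` from the sketch above and the hyperparameters reported in Section IV (replay size 2000, batch size 32, Adam with learning rate 0.001); the discount factor is an assumption, since the paper does not report one.

```python
import random
from collections import deque
import torch
import torch.nn.functional as F

memory = deque(maxlen=2000)   # replay memory of the 2000 most recent transitions
optimizer = torch.optim.Adam(qnet.parameters(), lr=0.001)
GAMMA = 0.95                  # discount factor (assumed, not reported in the paper)

def train_step(batch_size=32):
    """One Q-learning update on a minibatch sampled from the replay memory.
    Each stored transition is a tuple of tensors (s, a, r, s2, done)."""
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)
    s, a, r, s2, done = map(torch.stack, zip(*batch))
    q = qnet(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():     # bootstrapped target from the next state
        target = r + GAMMA * qnet(s2).max(dim=1).values * (1 - done)
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```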
The architecture and parameterization of the DQN are described in the next section, where we compare the effectiveness of DQN with ORP and SAR and cast all three methods into a case study.
IV. SIMULATION AND CASE STUDY
The simulator in this section is built upon an agent-based modeling platform; details can be found in [7]. It evaluates the performance of the shared MoD system with a series of indicators, among which those of interest here are: (1) wait time, representing the service availability from the traveler's perspective; (2) total/per-vehicle rebalancing distance traveled, indicating the rebalancing cost from the operator's perspective; and (3) computational time, implying its feasibility for large-scale real-time applications.
A. Benchmarking
For the sake of training efficiency, we begin the test with an abstract 5km×5km map with no road network. Requests are drawn from a list of OD pairs with an arrival rate of 100 trips/h. 20 vehicles of capacity 4 form the fleet, moving straight to travelers based on Euclidean distance at a constant speed of 21.6 km/h. Despite its simplified description of traffic, this representation is sufficient to evaluate the effectiveness of the algorithms.

As shown in Figure 1a, ORP discretizes the entire map into 10×10 fixed cells of 0.5km×0.5km. As for the local algorithms, each vehicle in both SAR and DQN has knowledge of a neighboring area of 2.5km×2.5km, which is centered at the location of the vehicle and discretized into 5×5 moving cells of the same size. For each of the rebalancing methods, the simulation runs 50 times with a simulation time of 3 hours.
Rebalancing is performed every 150 seconds. The DQN is trained beforehand with a three-layer neural network for 6000 steps, using an ε-greedy behavior policy and a replay memory of the 2000 most recent steps, sampled in batches of size 32. The learning rate is set to 0.001 with no decay for the Adam optimizer, and the penalty for empty rebalancing is -5. Figure 1b shows how the average reward evolves during one typical training run. The best DQN in terms of shortest average wait time is used in the following analysis.

Fig. 2. Comparison of rebalancing methods: the balanced scenario.
We distinguish three demand scenarios: (1) the balanced scenario, in which the trip ODs are uniformly distributed on the map at random; (2) the imbalanced scenario, in which the trip origins are concentrated in two production areas and the destinations in two attraction areas; and (3) the first-mile scenario, in which trip origins are uniformly distributed while the destinations are fixed to one point (e.g., an access station). The level of imbalance increases from scenario 1 to scenario 3, as the distributions of origins (where requests are sent, i.e., demand) and destinations (where vehicles become idle, i.e., supply) mismatch each other to an ever greater extent. The necessity of rebalancing is therefore expected to increase accordingly.
1) Performance Comparison: Figure 2 compares the performance of the rebalancing methods presented in Section III under the balanced scenario. When ORP is applied, the average traveler wait time over 50 three-hour runs is 146.2 seconds, a substantial improvement over the 170.6 seconds (+16.7%) of the no-rebalancing case shown in the rightmost box. With the same system settings, SAR scores 155.6 (+6.4%) and DQN scores 150.1 (+2.7%). This indicates that DQN is superior to its local counterpart SAR in terms of service accessibility, yet it falls behind ORP. The suboptimality of DQN can be explained by its design of individual rewarding: without coordination with other vehicles during training, the agent (vehicle) tends to overreact to the imbalance. It is also noticed that both ORP and DQN produce smaller wait time variance; SAR, in contrast, behaves in a rather random manner and its results are more dispersed. When it comes to vehicle distance traveled, the simulation shows that all rebalancing methods increase the average vehicle distance traveled by around 30% to 35%. The rebalancing distance can be controlled by adjusting $c_{i,j}$ in ORP and the size of the neighboring area in SAR and DQN.
2) Various Demand Patterns: Table I shows how rebalancing responds to different demand patterns.
TABLE I
COMPARISON OF AVERAGE WAIT TIMES ACROSS SCENARIOS

              ORP     SAR       DQN       No Rebl
Scenario 1    146.2   155.6     150.1     170.6
(balanced)            (+6.4%)   (+2.7%)   (+16.7%)
Scenario 2    138.5   172.5     155.8     211.4
(imbalanced)          (+24.5%)  (+12.5%)  (+52.6%)
Scenario 3    129.2   161.9     151.6     232.8
(first-mile)          (+25.3%)  (+17.3%)  (+80.2%)

*Wait times (in seconds) on top; changes relative to ORP (in percentage) below.
TABLE II
COMPARISON OF AVERAGE WAIT TIMES AND COMPUTATIONAL TIMES ACROSS MAP SIZES

              ORP       SAR       DQN       No Rebl
Map 1         146.2     155.6     150.1     170.6
(5km×5km)               (+6.4%)   (+2.7%)   (+16.7%)
              0.033     0.027     0.034
Map 2         147.9     164.3     154.4     176.3
(10km×10km)             (+11.1%)  (+4.4%)   (+19.2%)
              1.893     0.682     0.822
Map 3         147.2     182.4     158.2     240.7
(20km×20km)             (+23.9%)  (+7.5%)   (+63.5%)
              259.185   42.976    41.209

*Wait times (in seconds) on top; changes relative to ORP (in percentage) in the middle; computational times (in seconds) at the bottom.
Fig. 3. Fleet Sizing under Different Rebalancing Policies.

As the level of imbalance increases from scenario 1 to scenario 3, the wait time with no rebalancing soars from 170.6 to 232.8 seconds. If ORP is in action, the system surprisingly performs better when productions/attractions are more unevenly distributed, owing to the fact that an agglomerated trip distribution reduces the routing difficulties in ride-sharing. The performance of DQN still resides between ORP and SAR. However, limited to their local knowledge, SAR and DQN gradually lose their competitiveness when the demand-supply imbalance is significant only at the areawide level.
3) Scalability: We enlarge the map from 5km×5km to 10km×10km and 20km×20km and increase the demand intensity proportionally. The fleet size also grows from 20 to 125 and 810 to maintain the same level of service under ORP. As shown in Table II, the wait time without rebalancing rising from 170.6 to 240.7 seconds shows a manifest need for vehicle rebalancing when the map is large. DQN stays very close to the optimal solution, demonstrating its robust performance across different map sizes.
The computational times are also shown in Table II. Each value represents the average running time for one rebalancing solution on a machine with a 2.7 GHz Intel Core i5 and 8 GB of memory. ORP is undoubtedly the most computationally demanding: its computational time increases drastically as the map expands, since its complexity is proportional to the product of the fleet size and the number of cells in the area. This might raise a challenge as the scale of application grows. SAR and DQN, in contrast, run much faster when the area is large and could be further distributed and computed in parallel. This structure evidently facilitates application to large-scale networks, especially when autonomous vehicles are used.
B. Case Study: London Shared MoD
Now we test our rebalancing methods on a 150 km² road network in Orpington, London.

According to travel data from the past 10 years, residents within this area make over 40,000 trips during the morning peak hours of a typical workday. The demand intensity for shared MoD is estimated at 1145 trips/h, which accounts for around 12% of the motorized trips [7]. In this series of simulations, we adopt this demand pool and assume the request time window to be 8 minutes (a traveler walks away if the wait time exceeds this value) and the maximum detour factor to be 1.5 (ride-sharing is acceptable only if the actual travel time is less than 1.5 times the shortest travel time). We further assume that the system does not reject travelers unless the above constraints cannot be satisfied, and that the “walk-away rate” should not exceed 10% to ensure service availability.
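A small sketch of the dispatching feasibility check implied by these assumptions (8-minute time window, detour factor 1.5); the function and argument names are illustrative.

```python
MAX_WAIT_S = 8 * 60   # request time window: 8 minutes
MAX_DETOUR = 1.5      # maximum detour factor for ride-sharing

def assignment_feasible(wait_s, actual_travel_s, shortest_travel_s):
    """True if serving this request keeps both case-study constraints satisfied."""
    return (wait_s <= MAX_WAIT_S
            and actual_travel_s <= MAX_DETOUR * shortest_travel_s)
```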
As shown in Figure 3, the average wait time decreases as the fleet size grows. If the operator promises an average wait time of less than 3 minutes, about 230 vehicles must be put in service when no policy is adopted to deal with the imbalance of origins and destinations (morning trips usually feature “home to work” or “suburb to downtown” asymmetry). If it instead chooses ORP, SAR or DQN to rebalance during operation, the fleet size can be reduced to 96, 122 and 105, respectively. The computational time for one rebalancing solution is on average 118.8, 55.1 and 42.4 seconds, respectively.
Since vehicles are used much more efficiently under DQN, the average vehicle distance traveled also increases, from around 14 km per hour to around 29 km, of which 23 km are for serving travelers and 6 km for rebalancing. However, because of ride-sharing, the total distance traveled by the fleet does not grow as much (3538 km vs. 3220 km), although rebalancing induces around 30% more empty miles.
V. CONCLUSION AND FUTURE WORK
This paper develops a DQN-based reinforcement learning approach for rebalancing shared mobility-on-demand systems and demonstrates its effectiveness on an agent-based simulation platform. Results show that DQN performs effectively, reducing travelers' wait time while limiting the distance traveled by vehicles. The London case study further demonstrates that the proposed approach is robust in a realistic setting. Although its performance is still second to the network-based optimum, the model-free DQN has revealed its potential in a field originally dominated by operations research models, particularly when scales are large and stochasticity hinders the quality of the optimal formulation.
The very next step of this work is to customize the reinforcement learning architecture to better serve shared MoD systems. Several directions may be worth following up:
1) moving from the discrete action space to a continuous one, to avoid discretizing rebalancing actions;
2) extending Q-learning to multi-agent systems to correct the overreaction observed in training;
3) redefining the reward to represent different performance metrics from the perspectives of both operators and travelers.
Modeling traffic and customer behavior in a stochastic manner may also be of interest to get closer to the real world.
ACKNOWLEDGMENT
The authors would like to thank Transport for London for providing the trip data and road network used in this study. We also want to express our gratitude to Neema Nassir, Yuxin Leo Chen, Han Qiu and many other members of the MIT JTL Mobility Lab and MIT Transit Lab for their comments and suggestions during this study.
REFERENCES
[1] J. Alonso-Mora, S. Samaranayake, A. Wallar, E. Frazzoli, and D. Rus,
“On-demand high-capacity ride-sharing via dynamic trip-vehicle assign-
ment,” Proceedings of the National Academy of Sciences, p. 201611675,
2017.
[2] A. Prorok and V. Kumar, “Privacy-preserving vehicle assignment for
mobility-on-demand systems,” arXiv preprint arXiv:1703.04738, 2017.
[3] J. Chen, K. H. Low, Y. Yao, and P. Jaillet, “Gaussian process decentral-
ized data fusion and active sensing for spatiotemporal traffic modeling
and prediction in mobility-on-demand systems,” IEEE Transactions on
Automation Science and Engineering, vol. 12, no. 3, pp. 901–921, 2015.
[4] J. Miller, A. Hasfura, S.-Y. Liu, and J. P. How, “Dynamic arrival
rate estimation for campus mobility on demand network graphs,” in
Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International
Conference on. IEEE, 2016, pp. 2285–2292.
[5] Y. Shen, H. Zhang, and J. Zhao, “Embedding autonomous vehicle
sharing in public transit system: An example of last-mile problem,” Tech.
Rep., 2017.
[6] D. J. Fagnant and K. M. Kockelman, “Dynamic ride-sharing and fleet sizing for a system of shared autonomous vehicles in Austin, Texas,” Transportation, pp. 1–16, 2016.
[7] J. Wen, Y. Chen, N. Nassir, and J. Zhao, “Designing and simulating the integrated autonomous vehicle + public transport service in London,” 2017, working paper.
[8] M. K. Chen and M. Sheldon, “Dynamic pricing in a labor market: Surge
pricing and flexible work on the uber platform.” in EC, 2016, p. 455.
[9] H. Qiu, R. Li, and J. Zhao, “Dynamic pricing in shared mobility on demand service and its social impacts.” Elsevier, 2017.
[10] M. Pavone, S. L. Smith, E. Frazzoli, and D. Rus, “Robotic load balancing
for mobility-on-demand systems,” The International Journal of Robotics
Research, vol. 31, no. 7, pp. 839–854, 2012.
[11] K. Spieser, K. Treleaven, R. Zhang, E. Frazzoli, D. Morton, and M. Pavone, “Toward a systematic approach to the design and evaluation of automated mobility-on-demand systems: A case study in Singapore,” in Road Vehicle Automation. Springer, 2014, pp. 229–245.
[12] R. Zhang and M. Pavone, “Control of robotic mobility-on-demand systems: a queueing-theoretical perspective,” The International Journal of Robotics Research, vol. 35, no. 1-3, pp. 186–203, 2016.
[13] K. A. Marczuk, H. S. Soh, C. M. Azevedo, D.-H. Lee, and E. Frazzoli,
“Simulation framework for rebalancing of autonomous mobility on
demand systems,” in MATEC Web of Conferences, vol. 81. EDP
Sciences, 2016, p. 01005.
[14] K. Spieser, S. Samaranayake, W. Gruel, and E. Frazzoli, “Shared-vehicle mobility-on-demand systems: a fleet operator’s guide to rebalancing empty vehicles,” in Transportation Research Board 96th Annual Meeting, 2016.
[15] D. J. Fagnant and K. M. Kockelman, “Dynamic ride-sharing and
optimal fleet sizing for a system of shared autonomous vehicles,” in
Transportation Research Board 94th Annual Meeting, no. 15-1962,
2015.
[16] B. Boyacı, K. G. Zografos, and N. Geroliminis, “An optimization frame-
work for the development of efficient one-way car-sharing systems,”
European Journal of Operational Research, vol. 240, no. 3, pp. 718–
733, 2015.
[17] G. Alfian, J. Rhee, M. F. Ijaz, M. Syafrudin, and N. L. Fitriyani,
“Performance analysis of a forecasting relocation model for one-way
carsharing,” Applied Sciences, vol. 7, no. 6, p. 598, 2017.
[18] M. Dell’Amico, E. Hadjicostantinou, M. Iori, and S. Novellani, “The
bike sharing rebalancing problem: Mathematical formulations and
benchmark instances,” Omega, vol. 45, pp. 7–19, 2014.
[19] S. Ghosh, P. Varakantham, Y. Adulyasak, and P. Jaillet, “Dynamic
repositioning to reduce lost demand in bike sharing systems,” Journal
of Artificial Intelligence Research, vol. 58, pp. 387–430, 2017.
[20] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
[21] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning
with double q-learning.” in AAAI, 2016, pp. 2094–2100.
[22] E. Walraven, M. T. Spaan, and B. Bakker, “Traffic flow optimization: A
reinforcement learning approach,” Engineering Applications of Artificial
Intelligence, vol. 52, pp. 203–212, 2016.
[23] M. Han, P. Senellart, S. Bressan, and H. Wu, “Routing an autonomous
taxi with reinforcement learning,” in Proceedings of the 25th ACM In-
ternational on Conference on Information and Knowledge Management.
ACM, 2016, pp. 2421–2424.
Singapore's vision of a Smart Nation encompasses the development of effective and efficient means of transportation. The government's target is to leverage new technologies to create services for a demand-driven intelligent transportation model including personal vehicles, public transport, and taxis. Singapore's government is strongly encouraging and supporting research and development of technologies for autonomous vehicles in general and autonomous taxis in particular. The design and implementation of intelligent routing algorithms is one of the keys to the deployment of autonomous taxis. In this paper we demonstrate that a reinforcement learning algorithm of the Q-learning family, based on a customized exploration and exploitation strategy, is able to learn optimal actions for the routing autonomous taxis in a real scenario at the scale of the city of Singapore with pick-up and drop-off events for a fleet of one thousand taxis.