Rebalancing Shared Mobility-on-Demand Systems:
a Reinforcement Learning Approach
Jian Wen∗, Jinhua Zhao†and Patrick Jaillet‡
∗Department of Civil and Environmental Engineering
†Department of Urban Studies and Planning
‡Department of Electrical Engineering and Computer Science
Laboratory for Information and Decision Systems, Operations Research Center
Massachusetts Institute of Technology, Cambridge, MA 02139-4307
Emails: wenj@mit.edu, jinhua@mit.edu, jaillet@mit.edu
Abstract—Shared mobility-on-demand systems have very
promising prospects in making urban transportation efficient and
affordable. However, due to operational challenges, among others,
many mobility applications remain niche products. This
paper addresses rebalancing needs that are critical for effective
fleet management in order to offset the inevitable imbalance of
vehicle supply and travel demand. Specifically, we propose a
reinforcement learning approach which adopts a deep Q network
and adaptively moves idle vehicles to regain balance. This
innovative model-free approach takes a very different perspective
from the state-of-the-art network-based methods and is able to
cope with large-scale shared systems in real time with partial
or full data availability. We apply this approach to an agent-
based simulator and test it on a London case study. Results show
that the proposed method outperforms the local anticipatory
method, reducing the fleet size by 14% while inducing little
extra vehicle distance traveled. The performance is close to that of
the optimal solution, yet the computational speed is 2.5 times faster.
Collectively, the paper concludes that the proposed rebalancing
approach is effective under various demand scenarios and will
benefit both travelers and operators if implemented in a shared
mobility-on-demand system.
Index Terms—rebalancing, mobility-on-demand, reinforcement
learning, deep Q network, ride-sharing, agent-based simulation.
I. INTRODUCTION
Despite the many debates regarding regulation
and societal impact, the shared mobility-on-demand (shared
MoD) system is believed to be one of the most promising
approaches to reforming urban transportation. On the one hand,
on-demand mobility is able to connect more people with
timely and flexible service. On the other hand, sharing, which takes
a variety of forms including car-sharing and ride-sharing, helps
reduce travel costs and enables more affordable trips.
Researchers have been seeking solutions to optimize the
service design and improve its operational performance. The
efforts have led to advanced assignment algorithms [1], [2],
dynamic prediction tools [3], [4], system design evaluation
[5]–[7], dynamic pricing [8], [9] and the pursuit of full auton-
omy. Rebalancing (or interchangeably repositioning, relocat-
ing), which consists of distributing idle vehicles to regain the
demand-supply balance, also appears among the list recently
[10]–[15].
Stemming from the spatial and temporal mismatch between
demand (trips, also referred to as travelers, requests) and
supply (vehicles), the rebalancing problem can often be seen in
transportation systems like car rental [16], [17] and public bike
sharing [18], [19]. Rebalancing MoD systems is an emerging
topic and distinguishes itself from the above for the following
reasons: (1) allowing one-way trips adds on asymmetry to
the demand-supply interaction and necessitates rebalancing;
(2) on-demand travelers are often impatient with long wait
and systems without timely rebalancing will suffer massive
customer loss; and (3) MoD systems provide door-to-door
service, making the decision space continuous. If rides are
shared, measuring the supply availability becomes even more
critical, since a partially-occupied vehicle is still (condition-
ally) available to new requests as long as the dispatching
constraints are satisfied.
Existing works adopting network-based optimization ap-
proaches are usually computationally demanding. In addition,
as far as the authors are aware, none of them has targeted
door-to-door systems, and ride-sharing has also been
omitted for the sake of simplicity. However, door-to-door
and ride-sharing are indeed two key elements that ensure the
connectivity and affordability of the MoD service. For this
matter, this paper will incorporate both of the missing elements
into the shared MoD system and propose a reinforcement
learning approach that applies a deep Q network (DQN) for
fast and effective solutions. The contribution of the paper is
therefore twofold: (1) it makes the transition from station-
based discrete space to continuous space to enable door-to-
door service and adds ride-sharing features to the model; (2)
the proposed DQN-based approach for rebalancing shared
MoD systems is model-free, real-time, demand predictive,
computationally scalable and has good performance in terms
of both level of service and operational cost.
The rest of the paper is organized as follows. Section II
will review some of the most relevant works in the literature
and identify the research areas to which we can contribute.
Section III will develop the reinforcement learning approach
and present two benchmark policies for comparison:
the optimal rebalancing problem (ORP) and the simple antic-
ipatory rebalancing (SAR). In Section IV, we will simulate
and compare the effectiveness of the methods under different
demand scenarios and map sizes. A case study in London will
then demonstrate the algorithmic performance in the realistic
urban setting. Finally, Section V will draw conclusions and
point out directions for future work.
II. LITERATURE REVIEW
Within the realm of urban transportation, the existing re-
search works on rebalancing have been largely focused on car
rental systems [16], [17] and public bike sharing systems [18],
[19]. Rebalancing MoD systems, on the contrary, is a relatively
new topic. Early MoD works often draw on the experience of
the car rental and bike sharing counterparts and adopt network-
based optimization approaches.
Reference [10] is among the first. Based on the fluid model, this
paper proposes an optimal rebalancing model and simulates it
with a 12-station autonomous MoD (AMoD) system. In this
system, every station reaches an equilibrium so that there are
excess vehicles and no waiting customers. However, influenced
by its car rental predecessors, the proposed method is
limited to simplified station-based networks. In addition,
it focuses only on the ideal equilibrium and does not touch
upon the stochastic fluctuations of the demand-supply interplay.
In continuation of this work, [11] transforms the model into
an analytical guideline for fleet sizing in conceptual AMoD
systems and validates it in a Singapore case study. This
strategic work still remains at the static level and provides
little insight into real-time operation.
Reference [12] extends the idea of the fluid model and presents a
queueing-theoretical approach within the framework of Jack-
son networks. Many efforts in this paper have been made to
prove that, as a closed Jackson network, the system is most
efficient when inward and outward vehicle flows (including
rebalancing flows) are equal at each station. The solution
to an offline optimal rebalancing problem is given, and the
paper argues that, by taking only current information at a
specific time point, the problem can be adapted to online
applications. A case study in New York City with around
8000 non-shared vehicles demonstrates the effectiveness of the
method. Unfortunately, trips in the simulation still have to be
clustered to fit into the station-based model.
Reference [13] tests both offline and online policies with an agent-
based simulation platform using Singapore travel data as input.
The results show that about 28% and 23% fewer vehicles are
required to guarantee the same service rate when offline and
online rebalancing are in use, respectively. Moreover, the online
policy outperforms the offline one by reducing the average
wait time from 11 minutes to 9. Using a similar approach,
[14] tackles the rebalancing issues from the perspective of the
fleet operators. It quantifies the operational cost as a function
of fleet size, customer walk-aways, and the utilization rate, and
reveals that rebalancing can reduce the cost significantly.
The problem formulation in this paper extends the online
model in [12] in order to make it compatible with both
door-to-door service and ride-sharing. It also introduces a
probabilistic objective function to describe the stochasticity
in request arrivals. In the next section, we will first give
the formulation of the generic shared MoD system. Multiple
solutions are provided thereafter, including the proposed DQN
approach and ORP/SAR as two benchmarks.
III. METHODOLOGY
Consider a shared MoD system covering a predefined service area. For operational purposes, the area is divided into $S$ zones $\mathcal{S} = \{1, 2, \ldots, S\}$, which are mutually exclusive and collectively exhaustive. The fleet $\mathcal{V}$ consists of $V$ shareable vehicles of capacity $K$. Travelers with origins and destinations in $\mathcal{S}$ send requests and queue up. Travelers are then assigned to vehicles dynamically by the central dispatcher on a first-come-first-served basis, and no traveler will "walk away" despite a possibly long wait.
A vehicle that has been assigned to travelers is "in service". Otherwise, it is "idle" and available to be rebalanced. We assume that the online rebalancing algorithm is run every $\Delta T$ time units and that one run is triggered at time $t = T$. The period of study is therefore $[T, T + \Delta T]$. We also assume that, over the period, the number of incoming requests $A_i$ in zone $i$ follows a Poisson process with predicted arrival intensity $\lambda_i$, that is, $A_i \sim \mathrm{Poisson}(\lambda_i \Delta T)$. Knowing both the demand distribution and the vehicle status, the objective of rebalancing is to maximize service availability while limiting the rebalancing cost. Service availability is evaluated by the average wait time of requests emerging within the period, where "wait time" is defined as the time between a traveler sending out the request and being picked up by a vehicle; operational cost is represented by the total vehicle rebalancing distance traveled, that is, the total distance covered by all vehicles in the fleet due to rebalancing.
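As a concrete illustration of the demand model (a minimal sketch, not the paper's simulator; the zone count and intensities below are illustrative), per-zone request counts for one period can be sampled as $A_i \sim \mathrm{Poisson}(\lambda_i \Delta T)$:

    import numpy as np

    rng = np.random.default_rng(0)

    S = 100                                 # number of zones; illustrative
    dT = 150.0                              # rebalancing period in seconds (as in Section IV)
    lam = rng.uniform(0.001, 0.01, size=S)  # predicted arrival intensities (requests/s); illustrative

    # A_i ~ Poisson(lambda_i * dT): requests arriving in zone i during [T, T + dT]
    arrivals = rng.poisson(lam * dT)
    print(arrivals[:5], arrivals.sum())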
A. Optimal Rebalancing Problem
The decision variables in the rebalancing problem are $r_{i,j}$ for $i, j \in \mathcal{S}$. $r_{i,j}$ represents the number of rebalancing vehicles sent from zone $i$ to zone $j$ at $t = T$, i.e., the beginning of the period. In particular, if $i = j$, it represents the number of idle vehicles in zone $i$ that remain unmoved during the period. The decision variables satisfy $\sum_{j \in \mathcal{S}} r_{i,j} = r_i$, in which $r_i$ is the number of idle vehicles available to be rebalanced in zone $i$ at $t = T$.
The cost of rebalancing one vehicle from $i$ to $j$ is denoted $c_{i,j}$. In this paper, $c_{i,j}$ is defined as:

$$c_{i,j} = \begin{cases} c\,d_{i,j} & \text{if } j \text{ is accessible from } i \text{ within } \Delta T \quad \text{(1a)} \\ \bar{c} & \text{otherwise} \quad \text{(1b)} \end{cases}$$

In equation (1a), $d_{i,j}$ is the distance from $i$ to $j$ and the cost is proportional to the distance with multiplier $c$; in equation (1b), the cost is set to a large constant $\bar{c}$ when $j$ is too far from $i$. This discourages long rebalancing trips and guarantees that all rebalanced vehicles can arrive at their destinations before the period ends.
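A direct transcription of the cost definition (1) reads as follows (a sketch; the multiplier and penalty values are illustrative):

    def rebalancing_cost(d_ij, reachable_within_dT, c=1.0, c_bar=1e6):
        """Equation (1): cost proportional to distance if zone j is reachable
        from zone i within dT (1a); a large constant otherwise (1b).
        c and c_bar are illustrative values."""
        return c * d_ij if reachable_within_dT else c_bar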
We assume that both in-service and idle vehicles follow the shortest routes when picking up/dropping off travelers and rebalancing. We also assume that the planned routes are not influenced by the incoming requests over the period and that travel time is deterministic. Consequently, the level of supply at $t = T + \Delta T$, i.e., the end of the period, can be measured by the availability of both idle and in-service vehicles at that time. Note that a vehicle is classified as "idle" or "in-service" only according to its status before rebalancing starts.
The availability of idle vehicles at $t = T + \Delta T$ is measured by $r'_j$, the number of idle vehicles that zone $j$ will receive at the end: $r'_j \triangleq \sum_{i \in \mathcal{S}} r_{i,j}$. An in-service vehicle may also become available (if all on-board passengers have been dropped off) or partially available (if it still has passenger(s) on board but admits ride-sharing) at the end. Similarly, we define $s'_j$ as the availability of in-service vehicles in zone $j$ at $t = T + \Delta T$. $s'_j$ can be expressed as:

$$s'_j = \sum_{v \in \mathcal{V}} I'_j(v)\,w(l'_v), \quad \forall j \in \mathcal{S} \quad \text{(2)}$$

$I'_j(v)$ is the indicator function that equals 1 if $v$ is in zone $j$ at $t = T + \Delta T$ and 0 otherwise. $w(l'_v)$ is the load-availability factor, defined as:

$$w(l'_v) = \frac{\hat{p}(l'_v)}{\hat{p}(0)}, \quad \text{for } 0 \le l'_v \le K \quad \text{(3)}$$

$l'_v$ is the load of $v$ at $t = T + \Delta T$. In a shared MoD system with fixed settings, $\hat{p}(l'_v)$ is the empirical probability that a vehicle of load $l'_v$ will be assigned to a new request, based on simulation results. Similarly, $\hat{p}(0)$ is the empirical probability that an empty vehicle will be assigned to a new request. For a vehicle of capacity 4, we have $w(0) = 1.0$ and $w(\cdot) = 0.4, 0.2, 0.1, 0.0$ when the load is 1, 2, 3, 4, respectively. The estimated total supply $v'_j$ at $t = T + \Delta T$ is therefore:

$$v'_j = \lfloor r'_j + s'_j \rfloor, \quad \forall j \in \mathcal{S} \quad \text{(4)}$$
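A minimal sketch of the supply estimate in equations (2)-(4), assuming vehicle zones and loads at $t = T + \Delta T$ have already been propagated along the planned routes (the vehicle states and $r'_j$ values below are illustrative):

    import math

    # Load-availability factors for capacity K = 4, from the empirical
    # probabilities reported in the paper: w(l) = p_hat(l) / p_hat(0).
    W = {0: 1.0, 1: 0.4, 2: 0.2, 3: 0.1, 4: 0.0}

    zones = range(10)                              # zone ids; illustrative
    in_service = [(3, 1), (3, 2), (7, 0), (7, 4)]  # (zone, load) at t = T + dT; illustrative
    r_prime = {3: 2, 5: 1}                         # idle vehicles received per zone; illustrative

    # Equation (2): availability of in-service vehicles per zone.
    s_prime = {j: 0.0 for j in zones}
    for zone, load in in_service:
        s_prime[zone] += W[load]

    # Equation (4): estimated total supply, floored to an integer.
    v_prime = {j: math.floor(r_prime.get(j, 0) + s_prime[j]) for j in zones}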
The objective of maximizing the areawide service availability is translated into maximizing the total expected number of requests $b$ that can be served by vehicles from the same zone at the end of the period. $b = \sum_{j \in \mathcal{S}} b_j(v'_j)$, and $b_j(v'_j)$ represents the expected number of served requests in zone $j$ during $\Delta T$, knowing that $v'_j$ vehicles are in supply:

$$b_j(v'_j) = \sum_{k=0}^{\infty} \min(k, v'_j)\,P(A_j = k) \quad \text{(5)}$$

Based on the definitions and assumptions, the optimal rebalancing problem (ORP) is as follows:

$$\max_{r_{i,j}} \; \sum_{j \in \mathcal{S}} b_j(v'_j) - \sum_{i,j \in \mathcal{S}} c_{i,j}\,r_{i,j} \quad \text{(6)}$$
$$\text{s.t.} \quad \sum_{j \in \mathcal{S}} r_{i,j} = r_i, \; \forall i \in \mathcal{S}; \qquad r_{i,j} \in \mathbb{N}, \; \forall i, j \in \mathcal{S}$$

Problem (6) is a mixed-integer nonlinear programming (MINLP) problem. When the problem size is large, solving an MINLP is extremely computationally burdensome; in this paper, we use a combination of incremental-optimal and branch-and-bound methods to reach the optimum.
ORP serves as the “best practice” in benchmarking.
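The incremental-optimal and branch-and-bound combination used here is not spelled out; as a sketch under that caveat, the objective of (6) can at least be evaluated for any candidate rebalancing matrix using equation (5), with the Poisson sum truncated at a large $k$:

    import numpy as np
    from scipy.stats import poisson

    def expected_served(v_j, mu, k_max=200):
        """Equation (5): E[min(A_j, v_j)] with A_j ~ Poisson(mu); the sum is
        truncated at k_max (illustrative truncation point)."""
        k = np.arange(k_max + 1)
        return float(np.sum(np.minimum(k, v_j) * poisson.pmf(k, mu)))

    def orp_objective(r, cost, lam, dT, s_prime):
        """Objective of (6) for a candidate S x S integer matrix r, where
        r[i, j] is the number of vehicles rebalanced from zone i to zone j.
        cost[i, i] is zero since d_{i,i} = 0."""
        r_prime = r.sum(axis=0)                # idle supply received per zone
        v_prime = np.floor(r_prime + s_prime)  # equation (4)
        benefit = sum(expected_served(v, mu * dT) for v, mu in zip(v_prime, lam))
        return benefit - float(np.sum(cost * r))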
B. Simple Anticipatory Rebalancing
Despite the good quality of their solutions, the wide use of
optimal rebalancing policies is still constrained by limited
computational capacity. Large-scale applications and simu-
lations often opt for locally executable algorithms that de-
centralize the decision-making and solve the problem heuris-
tically. Bounding the problem to a small area not
only reduces the complexity of real-time data management
and vehicle operation, but also naturally caps the induced
rebalancing distance without explicitly modeling the cost.
One representative example is [15], which gives an intuitive
method for local rebalancing: if a zone’s supply exceeds its
expected demand or vice versa, the system pushes or pulls idle
vehicles to or from adjacent zones. The paper then justifies
this empirical method through simulation.
We extend this method and develop here a simple antici-
patory rebalancing (SAR) approach: a vehicle is informed of
the demand distribution in its neighboring area; the probability
that it moves to a zone is proportional to the number of pre-
dicted requests in that zone. Under this policy, a vehicle can
rebalance itself using its local knowledge and avoid increasing
the workload on the central dispatcher. Decisions can
be made in parallel.
SAR serves as the “average line” in benchmarking.
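A minimal sketch of the SAR policy for one vehicle, assuming the predicted request counts over its 5×5 neighborhood are given (the demand values below are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def sar_move(predicted_demand, rng):
        """Pick a neighborhood cell with probability proportional to its
        predicted number of requests; return (row, col) in the grid, or
        None to stay put when no demand is predicted nearby."""
        p = np.asarray(predicted_demand, dtype=float).ravel()
        if p.sum() == 0.0:
            return None
        cell = rng.choice(p.size, p=p / p.sum())
        return np.unravel_index(cell, predicted_demand.shape)

    demand = rng.integers(0, 5, size=(5, 5))   # illustrative predicted requests per cell
    print(sar_move(demand, rng))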
C. Deep Q Network
Solving the rebalancing problem requires deliberately modeling
the shared MoD system. However, delicate models
are barely solvable and hardly transferable from system to system,
which unfortunately discourages their use in practice.
Reinforcement learning provides a very different approach
to tackling the rebalancing problem since it is model-free and
requires little adjustment of the generic architecture. Recent
advances in deep Q networks [20], [21] have also made it
possible to handle the delay between actions and rewards as
well as the sequences of highly correlated states. Research
works have demonstrated that DQN has the ability to master
difficult control policies including traffic controls and taxi
dispatching [22], [23].
In this paper, the neural network is trained with a variant of the Q-learning algorithm [20]. The neighboring area of a specific vehicle constitutes the reinforcement learning environment. The distribution of idle vehicles and in-service vehicles, together with the predicted demand around it (i.e., $r_j$, $s'_j$, $\lambda_j$ for all $j$ in its neighborhood), builds up the state. Based on an action-value function, the DQN agent returns the policy by simply selecting the action with the highest value from the set {noop, ne, e, se, s, sw, w, nw, n}. "noop" indicates no rebalancing operation; "ne" indicates rebalancing to the northeast adjacent zone, and so on. The vehicle then executes the action and receives the reward.
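The paper specifies a three-layer network over a 5×5 neighborhood state and nine actions, but not the exact architecture; the PyTorch sketch below (layer widths are our assumption) only illustrates the shape of the state-to-action mapping:

    import torch
    import torch.nn as nn

    ACTIONS = ["noop", "ne", "e", "se", "s", "sw", "w", "nw", "n"]

    # State: three 5x5 channels around the vehicle -- idle vehicles r_j,
    # in-service availability s'_j, and predicted demand lambda_j.
    q_net = nn.Sequential(
        nn.Flatten(),                  # (batch, 3, 5, 5) -> (batch, 75)
        nn.Linear(3 * 5 * 5, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, len(ACTIONS)),   # one Q-value per rebalancing action
    )

    state = torch.randn(1, 3, 5, 5)    # illustrative state tensor
    with torch.no_grad():
        action = ACTIONS[q_net(state).argmax(dim=1).item()]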
Fig. 1. (1a) Illustration of the discretization: points represent vehicles; lines (dashed lines) originating from points represent planned routes (rebalancing routes); dotted grids represent the discretized cells for both ORP (10×10) and SAR/DQN (5×5 with the red vehicle as center). (1b) Average reward per episode during one typical training run.

The reward is evaluated under the following rules: (1) if the vehicle is assigned to traveler(s) during the rebalancing period, we compare the environment to the one without rebalancing and use the saving in wait time as the reward; the episode terminates, and the system moves on until the vehicle becomes idle again; (2) if the vehicle remains idle during the rebalancing period, we use a penalty as the reward to discourage this rebalancing action; the episode continues.
The rewarded episodes are stored in a replay memory, and we update the action-value function using samples drawn from this pool. The action-value function is then used to guide the rebalancing actions of all idle vehicles in the online algorithm.
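A sketch of the replay-memory update described above, reusing q_net from the previous sketch and following the standard one-step Q-learning target of [20]; the replay size and learning rate match Section IV, while the discount factor is our assumption:

    import random
    from collections import deque

    import torch
    import torch.nn.functional as F

    GAMMA = 0.99                 # discount factor; not reported in the paper
    memory = deque(maxlen=2000)  # 2000 most recent steps, as in Section IV
    optimizer = torch.optim.Adam(q_net.parameters(), lr=0.001)  # lr as in Section IV

    def replay_update(batch_size=32):
        """One Q-learning step on a minibatch of transitions
        (state, action_idx, reward, next_state, done), stored as tensors."""
        if len(memory) < batch_size:
            return
        s, a, r, s2, done = zip(*random.sample(memory, batch_size))
        s, s2 = torch.stack(s), torch.stack(s2)
        a, r = torch.tensor(a), torch.tensor(r, dtype=torch.float32)
        done = torch.tensor(done, dtype=torch.float32)
        with torch.no_grad():
            # Rule (1) terminates the episode: no bootstrapped future value.
            target = r + GAMMA * (1.0 - done) * q_net(s2).max(dim=1).values
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = F.smooth_l1_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()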
The architecture and parameterization of the DQN are described
in the next section, where we compare the effectiveness of
DQN with ORP and SAR and apply them to a case study.
IV. SIMULATION AND CASE STUDY
The simulator in this section is built upon an agent-based
modeling platform; details can be found in [7]. It evaluates
the performance of the shared MoD system with the aid of a
series of indicators, among which those of interest are:
(1) wait time, representing service availability from the
traveler's perspective; (2) total/per-vehicle rebalancing distance
traveled, indicating the rebalancing cost from the operator's
perspective; and (3) computational time, indicating its feasibility
for large-scale real-time applications.
A. Benchmarking
For the sake of training efficiency, we begin the tests with an
abstract 5km×5km map with no road network. Requests are
drawn from a list of OD pairs with an arrival rate of 100 trips/h.
20 vehicles of capacity 4 form the fleet, moving straight to
travelers based on Euclidean distance at a constant speed of
21.6 km/h. Despite its simplified description of traffic,
this representation is sufficient to evaluate the effectiveness of
the algorithms.
As shown in Figure 1a, ORP discretizes the entire map
into 10×10 fixed cells of 0.5km×0.5km. As for the local
algorithms, each vehicle in both SAR and DQN has knowledge
of a neighboring area of 2.5km×2.5km, which is centered at
the location of the vehicle and discretized into 5×5 moving
cells of the same size. For each of the rebalancing methods,
the simulation runs 50 times with a simulation time of 3 hours.
Rebalancing is performed every 150 seconds. The DQN is
trained beforehand as a three-layer neural network for 6000
steps, using an ε-greedy behavior policy and a replay memory of
the 2000 most recent steps, sampled in batches of size 32. The
learning rate is set to 0.001 with no decay for the Adam optimizer,
and the penalty for empty rebalancing is -5. Figure 1b shows how
the average reward evolves during one typical training run. The
best DQN in terms of the shortest average wait time is used in
the following analysis.

Fig. 2. Comparison of rebalancing methods: the balanced scenario.
We distinguish three demand scenarios: (1) the balanced
scenario, in which trip ODs are uniformly distributed on
the map at random; (2) the imbalanced scenario, in which
trip origins are concentrated in two production areas and
destinations in two attraction areas; and (3) the first-mile
scenario, in which trip origins are uniformly distributed while
the destinations are fixed to one point (e.g., an access station).
The level of imbalance increases from scenario 1 to 3 as the
distributions of origins (where requests are sent, i.e., demand)
and destinations (where vehicles become idle, i.e., supply)
mismatch each other to a greater and greater extent. The
necessity of rebalancing is therefore expected to increase.
1) Performance Comparison: Figure 2 compares the performance
of the rebalancing methods presented in Section III under the
balanced scenario. When ORP is applied, the average traveler
wait time over 50 three-hour runs is 146.2 seconds, a great
improvement over the 170.6 seconds (+16.7% relative to ORP)
in the case with no rebalancing policy, shown in the rightmost
box. With the same system settings, SAR scores 155.6 seconds
(+6.4%) and DQN 150.1 seconds (+2.7%). This indicates that
DQN outperforms its local counterpart SAR with regard to
service accessibility, yet it falls behind ORP. The suboptimality
of DQN can be explained by its design of individual rewarding:
without coordination with other vehicles during training, the
agent (vehicle) tends to overreact to the imbalance. It is also
noticeable that both ORP and DQN produce smaller wait time
variance; SAR, in contrast, behaves in a rather random manner
and its results are more dispersed. As for vehicle distance
traveled, the simulation shows that all rebalancing methods
increase the average vehicle distance traveled by around 30%
to 35%. The rebalancing distance can be controlled by adjusting
$c_{i,j}$ in ORP and the size of the neighboring area in SAR
and DQN.
2) Various Demand Patterns: Table I shows how rebalancing
responds to different demand patterns.
TABLE I
COMPARISON OF AVERAGE WAIT TIMES ACROSS SCENARIOS

                ORP    SAR       DQN       No Rebl
Scenario 1      146.2  155.6     150.1     170.6
(balanced)             (+6.4%)   (+2.7%)   (+16.7%)
Scenario 2      138.5  172.5     155.8     211.4
(imbalanced)           (+24.5%)  (+12.5%)  (+52.6%)
Scenario 3      129.2  161.9     151.6     232.8
(first-mile)           (+25.3%)  (+17.3%)  (+80.2%)

*Wait times (in seconds) on top; changes relative to ORP (in percent) below.
TABLE II
COMPARISON OF AVERAGE WAIT TIMES AND COMPUTATIONAL TIMES ACROSS MAP SIZES

                ORP      SAR       DQN       No Rebl
Map 1           146.2    155.6     150.1     170.6
(5km×5km)                (+6.4%)   (+2.7%)   (+16.7%)
                0.033    0.027     0.034
Map 2           147.9    164.3     154.4     176.3
(10km×10km)              (+11.1%)  (+4.4%)   (+19.2%)
                1.893    0.682     0.822
Map 3           147.2    182.4     158.2     240.7
(20km×20km)              (+23.9%)  (+7.5%)   (+63.5%)
                259.185  42.976    41.209

*Wait times (in seconds) on top; changes relative to ORP (in percent) in the middle; computational times (in seconds) on the bottom.
Fig. 3. Fleet Sizing under Different Rebalancing Policies.
As the level of imbalance increases from scenario 1 to scenario 3,
the wait time with no rebalancing soars from 170.6 to 232.8
seconds. When ORP is in action, the system performs surprisingly
better when productions/attractions are more unevenly distributed,
owing to the fact that an agglomerated trip distribution reduces
the routing difficulties in ride-sharing. The performance of
DQN still lies between ORP and SAR. However, limited to
local knowledge, SAR and DQN gradually lose their
competitiveness when the demand-supply imbalance is
significant only at the areawide level.
3) Scalability: We enlarge the map from 5km×5km to
10km×10km and 20km×20km and augment the demand
intensity proportionally. The fleet size also increases from
20 to 125 and 810 to maintain the same level of service
under ORP. As shown in Table II, the increase in wait time
from 170.6 to 240.7 seconds without rebalancing shows a
manifest need for vehicle rebalancing when the map is large.
DQN stays very close to the optimal solution, demonstrating
its robust performance across different map sizes.
The computational times are also shown in Table II. Each
value represents the average running time for one rebalancing
solution on a machine with a 2.7 GHz Intel Core i5 and 8
GB of memory. ORP is undoubtedly the most computationally
demanding: its computational time increases drastically
as the map expands, since its complexity is proportional to the
product of the fleet size and the number of cells in the area.
This may raise a challenge as the scale of application
grows. SAR and DQN, in contrast, run much faster when
the area is large and can be further distributed and computed
in parallel. This structure evidently facilitates application
to large-scale networks, especially when autonomous vehicles
are used.
B. Case Study: London Shared MoD
Now we test our rebalancing methods on a 150 km² road
network in Orpington, London.
According to travel data from the past 10 years, residents
within this area make over 40,000 trips during the morning
peak hours of a typical workday. The demand intensity for
shared MoD is estimated to be 1145 trips/h, which accounts
for around 12% of the motorized trips [7]. In this series
of simulations, we adopt this demand pool and assume the
request time window to be 8 minutes (a traveler walks away if
the wait time exceeds this value) and the maximum detour factor
to be 1.5 (ride-sharing is acceptable only if the actual travel
time is less than 1.5 times the shortest travel time). We
further assume that the system does not reject travelers unless
the above constraints cannot be satisfied, and that the "walk-away
rate" should not exceed 10% to ensure service availability.
As shown in Figure 3, the average wait time decreases as
the fleet size grows. If the operator promises an average wait
time of less than 3 minutes, about 230 vehicles must be
put in service when no policy is adopted to deal with the
imbalance of origins and destinations (morning trips usually
feature "home to work" or "suburb to downtown" asymmetry).
If it chooses ORP, SAR or DQN to rebalance during operation,
the fleet size can be reduced to 96, 122, and 105 vehicles,
respectively. The computational time for one rebalancing
solution is on average 118.8, 55.1, and 42.4 seconds, respectively.
Since vehicles are used much more efficiently under
DQN, the average vehicle distance traveled also increases, from
around 14 km per hour to around 29 km per hour, of which 23 km
are for serving travelers and 6 km for rebalancing. However, because
of ride-sharing, the total distance traveled by the fleet does not
grow as much (∼3538 km vs. ∼3220 km), although rebalancing
induces around 30% more empty miles.
V. CONCLUSION AND FUTURE WORK
This paper develops a DQN-based reinforcement learning
approach for rebalancing the shared mobility-on-demand sys-
tem and proves its effectiveness through an agent-based sim-
ulation platform. Results show that DQN performs effectively
by reducing the wait time of travelers and limiting the distance
traveled by vehicles. The London case study also demonstrates
that the proposed approach is robust in a realistic setting.
Although its performance is still second to the network-based
optimum, the model-free DQN has revealed its potential in
this field, originally dominated by operations research models,
particularly when scales are large and stochasticity hinders the
quality of the optimal formulation.
The very next step of this work is to customize the rein-
forcement learning architecture to better serve shared MoD
systems. Several directions may be worth following up:
1) moving from a discrete action space to a continuous one to
avoid discretizing rebalancing actions;
2) extending Q-learning to multi-agent systems to correct
overreaction;
3) redefining the reward to represent different performance
metrics from the perspectives of both operators and
travelers.
Modeling traffic and customer behavior in a stochastic manner
may also be of interest, to get closer to the real world.
ACKNOWLEDGMENT
The authors would like to thank Transport for London for
providing trip data and road network used in this study. We
also want to express our gratitude to Neema Nassir, Yuxin
Leo Chen, Han Qiu and many other members from MIT JTL
Mobility Lab and MIT Transit Lab for their comments and
suggestions during this study.
REFERENCES
[1] J. Alonso-Mora, S. Samaranayake, A. Wallar, E. Frazzoli, and D. Rus,
“On-demand high-capacity ride-sharing via dynamic trip-vehicle assign-
ment,” Proceedings of the National Academy of Sciences, p. 201611675,
2017.
[2] A. Prorok and V. Kumar, “Privacy-preserving vehicle assignment for
mobility-on-demand systems,” arXiv preprint arXiv:1703.04738, 2017.
[3] J. Chen, K. H. Low, Y. Yao, and P. Jaillet, “Gaussian process decentral-
ized data fusion and active sensing for spatiotemporal traffic modeling
and prediction in mobility-on-demand systems,” IEEE Transactions on
Automation Science and Engineering, vol. 12, no. 3, pp. 901–921, 2015.
[4] J. Miller, A. Hasfura, S.-Y. Liu, and J. P. How, “Dynamic arrival
rate estimation for campus mobility on demand network graphs,” in
Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International
Conference on. IEEE, 2016, pp. 2285–2292.
[5] Y. Shen, H. Zhang, and J. Zhao, “Embedding autonomous vehicle
sharing in public transit system: An example of last-mile problem,” Tech.
Rep., 2017.
[6] D. J. Fagnant and K. M. Kockelman, “Dynamic ride-sharing and fleet
sizing for a system of shared autonomous vehicles in Austin, Texas,”
Transportation, pp. 1–16, 2016.
[7] J. Wen, Y. Chen, N. Nassir, and J. Zhao, “Designing and simulating
the integrated autonomous vehicle + public transport service in London,”
2017, working paper.
[8] M. K. Chen and M. Sheldon, “Dynamic pricing in a labor market: Surge
pricing and flexible work on the Uber platform,” in EC, 2016, p. 455.
[9] H. Qiu, R. Li, and J. Zhao, “Dynamic pricing in shared mobility on
demand service and its social impacts,” Elsevier, 2017.
[10] M. Pavone, S. L. Smith, E. Frazzoli, and D. Rus, “Robotic load balancing
for mobility-on-demand systems,” The International Journal of Robotics
Research, vol. 31, no. 7, pp. 839–854, 2012.
[11] K. Spieser, K. Treleaven, R. Zhang, E. Frazzoli, D. Morton, and
M. Pavone, “Toward a systematic approach to the design and evaluation
of automated mobility-on-demand systems: A case study in Singapore,”
in Road Vehicle Automation. Springer, 2014, pp. 229–245.
[12] R. Zhang and M. Pavone, “Control of robotic mobility-on-demand
systems: a queueing-theoretical perspective,” The International Journal
of Robotics Research, vol. 35, no. 1-3, pp. 186–203, 2016.
[13] K. A. Marczuk, H. S. Soh, C. M. Azevedo, D.-H. Lee, and E. Frazzoli,
“Simulation framework for rebalancing of autonomous mobility on
demand systems,” in MATEC Web of Conferences, vol. 81. EDP
Sciences, 2016, p. 01005.
[14] K. Spieser, S. Samaranayake, W. Gruel, and E. Frazzoli, “Shared-vehicle
mobility-on-demand systems: A fleet operator’s guide to rebalancing
empty vehicles,” in Transportation Research Board 96th Annual Meet-
ing, 2016.
[15] D. J. Fagnant and K. M. Kockelman, “Dynamic ride-sharing and
optimal fleet sizing for a system of shared autonomous vehicles,” in
Transportation Research Board 94th Annual Meeting, no. 15-1962,
2015.
[16] B. Boyacı, K. G. Zografos, and N. Geroliminis, “An optimization frame-
work for the development of efficient one-way car-sharing systems,”
European Journal of Operational Research, vol. 240, no. 3, pp. 718–
733, 2015.
[17] G. Alfian, J. Rhee, M. F. Ijaz, M. Syafrudin, and N. L. Fitriyani,
“Performance analysis of a forecasting relocation model for one-way
carsharing,” Applied Sciences, vol. 7, no. 6, p. 598, 2017.
[18] M. Dell’Amico, E. Hadjicostantinou, M. Iori, and S. Novellani, “The
bike sharing rebalancing problem: Mathematical formulations and
benchmark instances,” Omega, vol. 45, pp. 7–19, 2014.
[19] S. Ghosh, P. Varakantham, Y. Adulyasak, and P. Jaillet, “Dynamic
repositioning to reduce lost demand in bike sharing systems,” Journal
of Artificial Intelligence Research, vol. 58, pp. 387–430, 2017.
[20] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wier-
stra, and M. Riedmiller, “Playing Atari with deep reinforcement learn-
ing,” arXiv preprint arXiv:1312.5602, 2013.
[21] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning
with double Q-learning,” in AAAI, 2016, pp. 2094–2100.
[22] E. Walraven, M. T. Spaan, and B. Bakker, “Traffic flow optimization: A
reinforcement learning approach,” Engineering Applications of Artificial
Intelligence, vol. 52, pp. 203–212, 2016.
[23] M. Han, P. Senellart, S. Bressan, and H. Wu, “Routing an autonomous
taxi with reinforcement learning,” in Proceedings of the 25th ACM In-
ternational on Conference on Information and Knowledge Management.
ACM, 2016, pp. 2421–2424.