To share or not to share? Revealing determinants of individuals' willingness to share rides through a big data approach
Guan Huang (a), Ting Lian (b), A.G.O. Yeh (a, c), Zhan Zhao (a, d, e, *)
(a) Department of Urban Planning and Design, The University of Hong Kong, Hong Kong SAR
(b) Department of Geography, The University of Hong Kong, Hong Kong SAR
(c) Centre of Urban Studies and Urban Planning, The University of Hong Kong, Hong Kong SAR
(d) Urban Systems Institute, The University of Hong Kong, Hong Kong SAR
(e) Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong SAR
* Corresponding author (zhanzhao@hku.hk)
Preprint submitted to Elsevier, December 1, 2023
Abstract
Ridesplitting has been widely recognised as a promising mobility mode for sustainable transportation, but its success largely depends on a sufficient number of passengers who are willing to share their rides. To uncover the determinants of willingness to share (WTS), prior studies have typically relied on either individual-level survey-based methods or aggregate-level data-driven methods. To combine the former's strength in capturing individual choice preferences with the latter's advantage in utilising available multi-source big data, this study proposes a big data approach to modelling individual choices between the solo and shared options for each trip. To reconstruct the choice process, we leverage large-scale real-world trip records and propose a learning framework that not only retrieves the trip time and fare of the chosen option (solo or shared) but also imputes the likely time and fare of the alternative option. These reconstructed trip attributes are then integrated with sociodemographic, built environment and traffic features from other data sources. Finally, all these variables are fed into a random coefficient logit model to reveal passengers' ridesplitting preferences. Through a case study of Manhattan, New York City, we reveal the spatiotemporal pattern of WTS and its determinants. Results show that WTS varies greatly across space and time. The time-fare trade-off is identified as the most essential factor, with the value of time revealed to be about $28-36/h. WTS decreases with longer trip distance, commuting time and distance to the urban centre, lower road speed, and higher speed fluctuation, bus station density and crime density, but increases with a higher proportion of middle-class, female and young residents and of residential land use, and with more metro stations. The proposed methodology can be used to explain and monitor WTS in a cost-effective way, complementing traditional survey-based methods to better design and promote ridesplitting services.
Keywords: Shared mobility, Ridesplitting, Willingness to share, Discrete choice modelling, Spatiotemporal analysis
1. Introduction
Ridesplitting is one of the popular mobility services provided by transportation network companies (TNCs), such as Uber Pool, Lyft Line/Shared, and DiDi Express Carpool. Through an app-based platform, the ridesplitting service matches passengers travelling in similar directions in real time and serves their combined routes with a single for-hire vehicle. Because vehicles are shared, passengers enjoy a lower fare but experience some travel delay, and the driver may receive more income (Shaheen et al., 2015, 2016; Li et al., 2019). Meanwhile, given the increased vehicle utilisation, the vehicle fleet size, vehicle kilometres travelled, and related traffic congestion and emissions can potentially be reduced (He and Chen, 2021; Shaheen, 2020). For instance, Santi et al. (2014) indicated a 40% reduction in the required fleet size through shared rides, and Li et al. (2021a) found that ridesplitting can reduce emissions by 30% compared to the solo ride service. Therefore, ridesplitting is regarded as one of the promising pathways to sustainable transportation (Sperling, 2018).
The operation and expected outcomes of the ridesplitting service rely heavily on successful matching between passengers. Since higher ridesplitting demand increases the matching probability and reduces undesired detours (Ke et al., 2021), a comprehensive understanding of the determinants of individuals' willingness to share (WTS) is of great academic and practical value. To identify the factors affecting WTS, two types of methods are commonly adopted: individual-level survey-based analysis and aggregate-level data-driven analysis. Individual-level methods either directly elicit personal attitudes and preferences towards ridesplitting through questionnaires and estimate the latent relationships with models such as the structural equation model (Li et al., 2021b; Wang et al., 2019), or collect choice data through stated-preference (SP) surveys with predefined scenarios and explain the stated choices with discrete choice models (Huang et al., 2019; Alonso-González et al., 2021; Lavieri and Bhat, 2019). This allows researchers to identify and quantify essential determinants of WTS, particularly trip attributes such as the fare, detour, and presence of strangers. Aggregate-level methods are mostly based on big data provided by TNCs, such as trip order data or trajectory data. In this case, WTS is usually measured as the ratio between the number of shared trips and total trips, aggregated at the community or census tract level (Xu et al., 2021; Huang et al., 2021). By mining spatial and temporal variations, aggregate analysis can reveal the contributions of local sociodemographic profiles (income, ethnicity, age, gender, etc.) and built environment characteristics (access to public transit, urban/rural setting, etc.).
Despite these efforts, our understanding of WTS remains incomplete due to several limitations. On the one hand, individual-level methods allow us to unravel the essential trade-off between trip time and fare as well as other detailed personal preferences. However, they require careful sampling strategies, well-designed questionnaires, and detailed survey processes, which are costly and difficult to scale. Furthermore, the hypothetical nature of questionnaires and SP experiments may lead to biased results (Whitehead et al., 2008). On the other hand, aggregate-level methods can utilise large-scale real-world data to uncover revealed preferences, examine their spatiotemporal variability, and incorporate diverse factors across different data sources. However, data aggregation can mask the variation in specific trip attributes (trip length, travel time, fare, etc.) that are likely to affect trip-specific WTS.
To fill this gap, this study combines the strengths of both methods and proposes a data-driven approach that captures both the important time-fare trade-off at the individual level and the related sociodemographic and built environment factors at the aggregate level, to comprehensively uncover the determinants of WTS from a spatiotemporal perspective. Specifically, we assume that, for each trip, the passenger chooses between requesting a solo and a shared ride based on the trip time and fare provided by the TNC, their personal attributes, and the general spatiotemporal context. As illustrated in Fig. 1, when making a trip request, passengers are typically presented with the estimated time and fare for both the solo (UberX/Lyft) and shared (Pool/Shared) options. Thus, if all of this information were available, the choice could be well modelled by the presented time and fare of both options, the sociodemographic information of the passenger, and the spatiotemporal context of the ride. However, in the available trip data, only the time and fare of the chosen option are recorded, while those of the alternative option are missing. To address this issue, we leverage the available big data to impute the alternative trip time and fare based on other known trip attributes (origin, destination, request timestamp, etc.). Furthermore, the individual-level sociodemographic information of the passenger is unknown, but it can be reasonably approximated using aggregate-level sociodemographic profiles (e.g., age, gender, income, ethnicity) of the trip origin and destination zones. Finally, these observed and imputed trip attributes are integrated with sociodemographic and built environment features, and a discrete choice model is used to reveal the determinants of WTS. The methodology is demonstrated through a case study using real-world TNC trip data from Manhattan, New York City (NYC). This study integrates and advances prior findings that focus on either individual-level or aggregate-level determinants. Based on the results, several implications for better promoting and developing ridesplitting services are proposed.
The remainder of this paper is organised as follows. In Section 2, we review related literature regarding the determinants of WTS and discrete choice models. Section 3 introduces the data and methods. The empirical results and findings are presented in Section 4, and the conclusion is drawn in Section 5.
Figure 1: Example trip request interfaces of Uber and Lyft
2. Literature Review
In this section, we review existing studies on the determinants of WTS in terms of their methods and conclusions.
2.1. Potential determinants of WTS
Based on the unit of analysis, previous studies on the determinants of WTS can be categorised into two types: individual-level and aggregate-level studies. Both provide rich insights into the potential determinants from different perspectives.
Individual-level studies aim to uncover WTS determinants related to individual cognitive attitudes or choice behaviour. Attitudinal studies show that the perceived values (utilitarian, hedonic, and social) and risks (safety, conflict, performance, and privacy) associated with the ridesplitting service are the main factors affecting adoption behaviour (Li et al., 2021b; Wang et al., 2019). Among these factors, utility-based factors, such as fare savings and efficiency improvement, are identified as essential determinants of the choice between shared and solo trips, though other non-utilitarian attitudes may also be influential. Choice behaviour studies typically adopt utility-based discrete choice models to investigate the specific effects of different WTS determinants. Many related studies agree that the most important factor is the trade-off between trip time and fare, with ridesplitting usually associated with a lower fare but longer travel time (Cahyo et al., 2019; Huang et al., 2019; Lavieri and Bhat, 2019; Alonso-González et al., 2021). Additionally, the safety concern associated with the presence of a stranger is represented as a ridesplitting-specific constant in the utility function. Beyond trip attributes, individual socio-economic characteristics (e.g. age, gender, ethnicity, income, and car ownership) and other contextual factors (e.g. public transit convenience, safety, and trip purpose) are also found to influence choice behaviour (Soltani et al., 2021; Lavieri and Bhat, 2019).
Aggregate-level studies focus on WTS determinants that can be aggregated to certain spatiotemporal units. For example, both Xu et al. (2021) and Abkarian et al. (2022) analysed millions of trip records in Chicago at the community level and found a range of sociodemographic variables, such as ethnicity, income, education, and neighbourhood density, to be influential on WTS. Using DiDi Chuxing data in Chengdu, China, Tu et al. (2021) investigated the influence of built environment factors on WTS and found distance to the city centre, land use diversity and road density to be the key influencing factors. Based on the same data, Huang et al. (2021) found significant spatiotemporal heterogeneity in WTS and further identified the influence of the distance to railway hubs, travel demand, and accessibility to public transit on WTS. The data, methods and conclusions of these related studies are summarised in Table 1.
Both individual-level and aggregate-level studies have their strengths and weaknesses. On the one hand, individual-level studies identify the strong influence of the time-fare trade-off on WTS. However, these survey-based studies are typically limited by the sample size and representativeness of the respondents. Moreover, they can only uncover people's stated preferences in hypothetical scenarios, which could differ from their revealed preferences that better reflect actual behaviour. On the other hand, aggregate-level studies are mainly based on big trip data, which overcomes the sample size and hypothetical scenario limitations of individual-level studies. However, aggregating trips masks the actual heterogeneity across trips (e.g. in fare and time), which can lead to an underestimated influence of the time-fare trade-off.
This study assumes that the choice of ridesplitting is determined by both individual-level and aggregate-level factors. Based on large-scale real-world TNC trip data, we not only retrieve the trip attributes of the chosen option (solo or shared) but also impute the trip time and fare of the alternative. These attributes are then combined with sociodemographic and built environment features from multi-source data to reconstruct the choice scenarios. Finally, by adopting a discrete choice modelling framework, we investigate the effects of various determinants on WTS to provide a comprehensive understanding.
Table 1: Summary of related literature
Study | Study level | Data source | Method | Determinants of WTS / Findings
Li et al. (2021b) | Individual | Questionnaire (848 respondents) | Structural equation model | Incentives and management issues, perceived benefit, perceived usefulness, and education increase WTS; public transport, advanced age, and being female decrease WTS.
Wang et al. (2019) | Individual | Questionnaire (378 respondents) | Structural equation model | Perceived utilitarian, hedonic, and social values increase WTS; perceived privacy, performance, security, and conflict risks decrease WTS.
Cahyo et al. (2019) | Individual | Stated-preference survey (120 respondents) | Binary logit | Male passengers are more sensitive to time; more fare saving and less travel delay increase WTS; under the current situation, females are more likely to choose ridesplitting.
Huang et al. (2019) | Individual | Stated-preference survey (102 respondents) | Binary logit | Travel time, cost, safety, reliability, comfort, social contact, and identity promotion are the main considerations; VOT of ridesplitting is about CNY ¥62.6/h.
Soltani et al. (2021) | Individual | Stated-preference survey (422 respondents) | Multinomial logit | Population density, housing values, education, income, casual job, younger age and access to smartphones increase WTS; safety concerns, advanced age, digital illiteracy and suburban living decrease WTS.
Lavieri and Bhat (2019) | Individual | Stated-preference survey (1607 respondents) | Generalised heterogeneous data model | Passengers are less sensitive to the presence of strangers during commute trips than during leisure trips; travel delay may be a greater factor for WTS than the presence of a stranger; VOTs of ridesplitting are $28.77/h and $23.27/h for work and leisure purposes.
Alonso-González et al. (2021) | Individual | Stated-preference survey (1077 respondents) | Mixed logit; random coefficient logit; latent class logit | There are non-traders for ridesplitting; people may vary in their WTS; WTS depends primarily on the time-fare trade-off; VOTs of ridesplitting range from €7.78 to €26.25/h.
Xu et al. (2021) | Aggregate | Chicago trip data (2 million) | Random forest | Relationships between aggregate-level variables and WTS are nonlinear; the most important determinants of WTS include ethnicity, household income, education level, trip distance, and neighbourhood density.
Abkarian et al. (2022) | Aggregate | Chicago trip data (110 million) | Random forest; extra trees regression; XGBoost | Travel time and cost reliability are more important than travel time and cost savings for WTS; White people and those with a bachelor's degree have lower WTS.
Tu et al. (2021) | Aggregate | DiDi Chuxing trip data | Gradient boosting decision tree | Distance to city centre, land use diversity and road density are the key influencing factors of WTS.
Huang et al. (2021) | Aggregate | DiDi Chuxing trip data (6 million) | Spatial lag regression model | WTS is spatiotemporally heterogeneous; WTS exhibits spatial dependency; the distance to the urban centre and railway stations, travel demand, public transit accessibility, land use mixture, and trip purpose are the most relevant factors for WTS.
2.2. Discrete choice analysis of ridesplitting service
Discrete choice modelling is a powerful tool commonly used in the travel behaviour literature. Under random utility maximisation theory, discrete choice models assume that the passenger chooses the alternative that maximises their utility (Train, 2003). The utility function is determined by a series of factors. For the ridesplitting service, the choice between the shared and solo options can be viewed as a binary decision process. The utility function for this decision has been found to be influenced by trip-level factors, such as the trip time and fare of both options and safety concerns about sharing with strangers, as well as individual-level factors, such as the age, gender, ethnicity, income, and public transit convenience of the passenger. In terms of modelling methods, ridesplitting-related studies typically used fixed coefficient logit models to consider trip-level factors, individual-level factors (Soltani et al., 2021), or both (Huang et al., 2019; Alonso-González et al., 2021). Nevertheless, logit models with fixed coefficients have limitations, as they cannot account for unobserved preference heterogeneity among individuals, which may lead to biased estimation (Train and Weeks, 2005). To address this issue, Alonso-González et al. (2021) used a random coefficient logit model and a latent class choice model to control for heterogeneous preferences. Their results indicated that the random coefficient logit model outperformed the fixed coefficient logit model in modelling WTS. In this study, we test both the fixed and random coefficient logit models but focus on the latter.
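To make the fixed versus random coefficient distinction concrete, the sketch below computes the binary logit probability of choosing the shared option, and a random coefficient counterpart in which that probability is averaged over simulated draws of the time coefficient. It is a minimal illustration, not the model estimated in this study: the utility specification (a shared-option constant plus linear time and fare terms), the normal distribution of the time coefficient, and all coefficient and trip values are hypothetical assumptions.

```python
import math
import random

def prob_shared(beta_time, beta_fare, asc_shared, t_solo, f_solo, t_shared, f_shared):
    """Fixed coefficient binary logit: P(shared) = 1 / (1 + exp(-(V_shared - V_solo)))."""
    v_solo = beta_time * t_solo + beta_fare * f_solo
    # asc_shared is a hypothetical shared-option constant (e.g. discomfort with strangers).
    v_shared = asc_shared + beta_time * t_shared + beta_fare * f_shared
    return 1.0 / (1.0 + math.exp(-(v_shared - v_solo)))

def prob_shared_random_time(mean_time, sd_time, beta_fare, asc_shared, trip,
                            draws=5000, seed=0):
    """Random coefficient logit: simulate the probability by averaging the logit
    probability over draws of beta_time ~ Normal(mean_time, sd_time) (assumed)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        beta_time = rng.gauss(mean_time, sd_time)
        total += prob_shared(beta_time, beta_fare, asc_shared, *trip)
    return total / draws

# Hypothetical trip: (solo minutes, solo $, shared minutes, shared $).
trip = (15.0, 20.0, 20.0, 14.0)
p_fixed = prob_shared(-0.1, -0.2, -0.5, *trip)
p_random = prob_shared_random_time(-0.1, 0.05, -0.2, -0.5, trip)
print(round(p_fixed, 4))  # 0.5498
```

Setting `sd_time` to 0 collapses the simulated probability to the fixed coefficient one, which shows that the fixed coefficient logit is the special case without preference heterogeneity.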
With the estimated model, the value of time (VOT) of passengers can be derived from the coefficients of the time and fare factors (Train and Weeks, 2005; Frei et al., 2017). Prior studies found that the VOT of ridesplitting was about CNY ¥62.6/h (around $9.11/h) in Huaian, China (Huang et al., 2019), €7.78 to €26.25/h (around $8.4 to $28.35/h) in the Netherlands (Alonso-González et al., 2021), and $28.77/h and $23.27/h for commuting and leisure trips, respectively, in Dallas (Lavieri and Bhat, 2019). Apart from variation across trip purposes, Small (2012) noted that VOT varies over time, being higher at peak times and lower at off-peak times. While these previous findings were mainly derived from survey data, this study adopts a big data approach that leverages large-scale TNC trip records for VOT estimation, and the results are compared with those of prior studies.
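For a utility that is linear in time and fare, the VOT is the ratio of the two coefficients. The sketch below assumes, hypothetically, that time enters the utility in minutes and fare in dollars, so the ratio is multiplied by 60 to express VOT in $/h; the coefficient values are illustrative only and are not estimates from this study.

```python
def value_of_time_per_hour(beta_time_per_min: float, beta_fare_per_dollar: float) -> float:
    """VOT in $/h = 60 * beta_time / beta_fare, with time in minutes and fare in dollars.

    Both coefficients are normally negative (more time or money lowers utility),
    so their ratio is positive."""
    if beta_fare_per_dollar == 0:
        raise ValueError("fare coefficient must be non-zero")
    return 60.0 * beta_time_per_min / beta_fare_per_dollar

# Hypothetical coefficients: -0.1 utility per minute, -0.2 utility per dollar.
print(round(value_of_time_per_hour(-0.1, -0.2), 2))  # 30.0
```

With a random time coefficient and a fixed fare coefficient, the same ratio applied to each simulated draw yields a distribution of VOT across passengers rather than a single value.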
3. Data and Methods
In this section, we first introduce the data sources and how they are processed. Then, we present the methods for trip attribute imputation and discrete choice modelling.
3.1. Data
The primary data source of this study is the TNC trip data of New York City provided by the NYC Taxi and Limousine Commission (TLC) (https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page), which contains trip records from several TNCs. The data structure is presented in Table 2. To protect user privacy, the specific origins and destinations (ODs) of all trips are aggregated to the taxi zone level. The shared request flag and shared match flag indicate, respectively, whether a trip was requested to be shared and whether it was actually shared. The actual trip fare, distance and time are recorded only for the chosen option. This means that, for a shared ride, the reported trip fare is the discounted fare, typically lower than that of the solo option, and the reported trip time includes any sharing delay, so it is typically longer than the solo option would have taken. The congestion surcharge is a fee charged by the NYC government for trips entering the part of Manhattan south of 96th Street, to alleviate congestion (Parrott and Reich, 2018). It is $2.75 for solo rides and $0.75 for shared rides, to encourage the adoption of ridesplitting. Given the large data volume and potential pricing or operational differences between TNCs, we focus only on Lyft's trip data in Manhattan from February 1 to October 31, 2019. As shown in Fig. 2, there are 63 taxi zones within Manhattan. The other data sources used in this study include NYC's sociodemographic and built environment data, which are used to extract more comprehensive explanatory features. The details of these auxiliary data are introduced later in Table 3.
Table 2: Data structure of trip data
Field | Type | Example | Description
Request datetime | Datetime | 2019-02-01 00:01:26 | Time when the user submitted the request
Pickup datetime | Datetime | 2019-02-01 00:05:18 | Time when the user was picked up
Drop-off datetime | Datetime | 2019-02-01 00:14:57 | Time when the user was dropped off
PULocationID | Int | 247 | ID of pick-up zone
DOLocationID | Int | 251 | ID of drop-off zone
Shared request flag | Boolean | Y | Whether this is a ridesplitting request
Shared match flag | Boolean | N | Whether this request was successfully matched
Trip miles | Float | 2.45 | Trip distance (miles)
Trip time | Int | 579 | Trip duration (seconds)
Base passenger fare | Float | 9.35 | Passenger payment for the mobility service ($)
Congestion surcharge | Float | 2.75 | Congestion surcharge ($)
Figure 2: Study area and taxi units
3.2. Methods
The overall study framework is summarised in Fig. 3 and consists of four stages. In the first stage, the trip data are cleaned and key features are extracted. Then, based on the observed trip time and fare of the chosen option for each trip, we impute and validate the potential time and fare of the unobserved alternative in Stages II and III, respectively. Lastly, in Stage IV, we combine the trip attributes with other contextual features to explain ridesplitting choice behaviour through a discrete choice model. In Fig. 3, labels in black denote existing or extracted variables from TNC data, and those in red indicate imputed variables. Blue arrows represent the operations of the different methods. In the remainder of this section, each part is introduced in detail.
Figure 3: Methodological framework
3.2.1. Data processing and feature extraction
Prior to feature extraction, data cleaning is conducted based on the following definitions and criteria:
Definition 1 (Wait time): The wait time is the difference between the request time and the pick-up time.
Definition 2 (Reference fare ratio): The reference fare ratio is the ratio between the fare actually paid by the passenger and the reference fare, where the latter is the fare calculated using the official distance and time rates provided by Lyft (https://www.lyft.com/pricing/BKN). Although the actual fare paid may differ from the reference fare due to dynamic pricing strategies (Castillo et al., 2017), it is unlikely to deviate too far from it. Therefore, the reference fare ratio should fall within a reasonable range. Using the reference fare ratio, trips with an exceptional fare, or those requested through particular services (e.g. luxury vehicles), can be identified and filtered out as far as possible, so that these unusual cases do not bias the results. In 2019, Lyft's rates for the regular solo option were $1.46/mile and $0.66/minute, with no base fare. Thus, the reference fare ratio can be calculated as
\[ \text{Reference fare ratio} = \frac{\text{Base passenger fare}}{1.46 \times \text{trip miles} + 0.66 \times \text{trip time}/60} \tag{1} \]
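Eq. (1) can be checked against the example record in Table 2. The sketch below hard-codes the 2019 rates quoted above ($1.46/mile, $0.66/minute, no base fare) and converts trip time from seconds to minutes, as in the equation; the function and constant names are illustrative.

```python
RATE_PER_MILE = 1.46    # Lyft 2019 regular solo rate, $/mile (as quoted in the text)
RATE_PER_MINUTE = 0.66  # Lyft 2019 regular solo rate, $/minute (no base fare)

def reference_fare_ratio(base_passenger_fare: float, trip_miles: float,
                         trip_time_s: float) -> float:
    """Eq. (1): base passenger fare divided by the distance-and-time reference fare."""
    reference_fare = RATE_PER_MILE * trip_miles + RATE_PER_MINUTE * trip_time_s / 60
    if reference_fare <= 0:
        raise ValueError("reference fare must be positive")
    return base_passenger_fare / reference_fare

# Example record from Table 2: $9.35 fare, 2.45 miles, 579 seconds.
print(round(reference_fare_ratio(9.35, 2.45, 579), 3))  # 0.94
```

Here the reference fare is 1.46 × 2.45 + 0.66 × 9.65 ≈ $9.95, so a ratio close to 1 indicates a fare near the standard rate card.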
Based on the available data and the above definitions, trip records with the following characteristics are regarded as exceptional cases and removed: (1) trip time less than 1 minute or greater than 2 hours; (2) trip speed less than 2 mph or greater than 40 mph (Ke et al., 2021); (3) drop-off time earlier than pick-up time, or pick-up time earlier than request time; (4) outlier wait time; and (5) outlier reference fare ratio. For conditions (4) and (5), outliers are identified using the 1.5 interquartile range (IQR) rule. In total, exceptional cases account for 6.6% of the data, and the remaining 19.8 million trips are used for further analysis.
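The five cleaning rules can be sketched as a single filter. This is an illustrative reconstruction under assumptions the text does not state: the IQR fences for rules (4) and (5) are computed over the trips that survive rules (1)-(3), quartiles use the default method of Python's `statistics.quantiles`, and the `Trip` record and `make_trip` helper are hypothetical.

```python
import statistics
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Trip:  # hypothetical record holding the Table 2 fields used for cleaning
    request: datetime
    pickup: datetime
    dropoff: datetime
    trip_miles: float
    trip_time_s: float
    base_fare: float

    @property
    def wait_s(self) -> float:  # Definition 1
        return (self.pickup - self.request).total_seconds()

    @property
    def fare_ratio(self) -> float:  # Eq. (1)
        return self.base_fare / (1.46 * self.trip_miles + 0.66 * self.trip_time_s / 60)

def iqr_fences(values, k=1.5):
    """Lower and upper fences of the 1.5 IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def passes_basic_rules(t: Trip) -> bool:
    if not 60 <= t.trip_time_s <= 7200:           # rule (1): 1 minute to 2 hours
        return False
    speed_mph = t.trip_miles / (t.trip_time_s / 3600)
    if not 2 <= speed_mph <= 40:                  # rule (2): 2 to 40 mph
        return False
    return t.request <= t.pickup <= t.dropoff     # rule (3): consistent timestamps

def clean_trips(trips):
    basic = [t for t in trips if passes_basic_rules(t)]
    wait_lo, wait_hi = iqr_fences([t.wait_s for t in basic])
    ratio_lo, ratio_hi = iqr_fences([t.fare_ratio for t in basic])
    return [t for t in basic
            if wait_lo <= t.wait_s <= wait_hi             # rule (4)
            and ratio_lo <= t.fare_ratio <= ratio_hi]     # rule (5)

# Hypothetical sample: eight trips (one with an extreme wait) plus a 30-second trip.
start = datetime(2019, 2, 1, 8, 0, 0)

def make_trip(wait_s, fare, trip_time_s=579, miles=2.45):
    pickup = start + timedelta(seconds=wait_s)
    return Trip(start, pickup, pickup + timedelta(seconds=trip_time_s),
                miles, trip_time_s, fare)

waits = (180, 200, 220, 240, 260, 280, 300, 3600)
fares = (9.0, 9.2, 9.35, 9.4, 9.5, 9.6, 9.7, 9.8)
sample = [make_trip(w, f) for w, f in zip(waits, fares)]
sample.append(make_trip(240, 3.0, trip_time_s=30, miles=0.1))  # fails rule (1)
kept = clean_trips(sample)
print(len(kept), max(t.wait_s for t in kept))  # 7 300.0
```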
After data cleaning, we categorise the trips into the following types:
Definition 3 (Solo trip): A solo trip is a trip requested as a solo ride, i.e. the shared request flag equals “N”.
Definition 4 (Unmatched trip): An unmatched trip is a trip requested as a shared ride that failed to be matched with another passenger, i.e. the shared request flag equals “Y” but the shared match flag equals “N”.
Definition 5 (Matched trip): A matched trip is a trip requested as a shared ride that was successfully matched with another passenger, i.e. both the shared request flag and the shared match flag equal “Y”.
Definitions 3-5 enumerate all possible trip choices and outcomes; unmatched and matched trips can also be collectively called shared trips. Unmatched and matched trips are differentiated here because they can be leveraged to impute the trip time and cost of the alternative option. Compared to a solo trip, the alternative shared trip is expected to have a lower (discounted) fare regardless of the matching result, because of the guaranteed fare policy (https://help.lyft.com/hc/en/all/articles/115013080888-Shared-ride-pricing), but its potential travel delay depends on whether it is actually matched. Therefore, while a matched trip is likely to have a longer (delayed) travel time, we can reasonably assume an unmatched trip to have a travel time similar to that of the solo option. In summary, an unmatched trip is likely to have a fare similar to a matched trip but different from a solo trip, and a travel time similar to a solo trip but different from a matched trip. These similarities and differences between solo, matched and unmatched trips make it possible to disentangle the variability of trip time and cost, and ultimately to impute their values for the alternative option. An overview of the trip statistics after data cleaning is provided in Section 4.1.
To conduct the trip time and fare imputation, the following variables are first defined and extracted:
Definition 6 (O demand): For each trip, its O demand is the number of departure trips in its origin zone during the hour of its request time.
Definition 7 (O supply): For each trip, its O supply is the number of arrival trips in its origin zone during the hour of its request time.
Definition 8 (D demand): For each trip, its D demand is the number of departure trips in its destination zone during the hour of its drop-off time.
Definition 9 (D supply): For each trip, its D supply is the number of arrival trips in its destination zone during the hour of its drop-off time.
Definition 10 (W demand): For each trip, its W demand is the number of departure trips in the whole of Manhattan during the hour of its request time.
Definition 11 (W supply): For each trip, its W supply is the number of arrival trips in the whole of Manhattan during the hour of its drop-off time.
Metrics 6-11 measure demand and approximate supply. For each trip, the demand-supply balance at its origin and destination can be broadly described by comparing departure and arrival trips at the origin, at the destination, and across the whole of Manhattan. These metrics are used for fare imputation because fares are influenced by dynamic pricing schemes that react to instantaneous imbalances between real-time demand and supply (Chen et al., 2015; Wang and Yang, 2019). For example, a trip starting at a place with low supply and high demand, or going to a place with high supply and low demand, might be charged more due to the potential cost of dispatching and repositioning, and vice versa.
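Definitions 6-11 reduce to counting trips by zone and clock hour. The sketch below is one possible implementation, assuming that departures are bucketed by request hour and arrivals by drop-off hour (the text does not say which timestamp defines a departure), that the input contains only Manhattan trips, and that each trip is a plain dictionary with hypothetical keys `pu`, `do`, `request` and `dropoff`.

```python
from collections import Counter
from datetime import datetime

def hour_of(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its clock hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def demand_supply_features(trips):
    """O/D/W demand and supply per trip (Definitions 6-11)."""
    departures = Counter((t["pu"], hour_of(t["request"])) for t in trips)  # by zone
    arrivals = Counter((t["do"], hour_of(t["dropoff"])) for t in trips)    # by zone
    all_departures = Counter(hour_of(t["request"]) for t in trips)          # whole area
    all_arrivals = Counter(hour_of(t["dropoff"]) for t in trips)
    features = []
    for t in trips:
        req_h, drop_h = hour_of(t["request"]), hour_of(t["dropoff"])
        features.append({
            "O_demand": departures[(t["pu"], req_h)],
            "O_supply": arrivals[(t["pu"], req_h)],
            "D_demand": departures[(t["do"], drop_h)],
            "D_supply": arrivals[(t["do"], drop_h)],
            "W_demand": all_departures[req_h],
            "W_supply": all_arrivals[drop_h],
        })
    return features

trips = [
    {"pu": 247, "do": 251, "request": datetime(2019, 2, 1, 8, 5),
     "dropoff": datetime(2019, 2, 1, 8, 20)},
    {"pu": 247, "do": 100, "request": datetime(2019, 2, 1, 8, 40),
     "dropoff": datetime(2019, 2, 1, 9, 5)},
    {"pu": 251, "do": 247, "request": datetime(2019, 2, 1, 8, 10),
     "dropoff": datetime(2019, 2, 1, 8, 30)},
]
feats = demand_supply_features(trips)
print(feats[0])
# {'O_demand': 2, 'O_supply': 1, 'D_demand': 1, 'D_supply': 1, 'W_demand': 3, 'W_supply': 2}
```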
Definition12 (R demand): For each trip, its R demand measures the number of shared250
trips that have the same OD zone with it during the same 15 minutes of its request time.251
Definition13 (M demand): For each trip, its M demand measures the number of matched252
trips that have the same OD zone with it during the same 15 minutes of its request time.253
Definition14 (Match probability): For each trip, its match probability is calculated by254
the ratio between M demand and R demand, which reflects the probability for this trip255
to be successfully matched at its request time.256
Metrics 12-14 measure ridesplitting-related variables. Sufficient shared demand is a prerequisite for a high match probability, while match probability is one of the most essential variables for TNCs when providing an estimated time (Wang et al., 2021). These variables are used for time imputation. Unlike the demand-supply balance metrics, metrics 12-14 are defined on the same OD pair and a shorter time interval, because the trips identified this way are the candidates most likely to be matched with the target trip.
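Definitions 12-14 can be sketched as a grouped count over (origin zone, destination zone, 15-minute request bin). The field names `origin_zone`, `dest_zone`, `request_time`, `is_shared` and `is_matched` are hypothetical, and returning `None` when a bin contains no shared request is an assumption, since the ratio is undefined there.

```python
from collections import Counter
from datetime import datetime


def quarter_hour_key(ts: datetime) -> datetime:
    """Floor a timestamp to its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def add_sharing_metrics(trips):
    """Attach R_demand, M_demand and match probability (Definitions 12-14) to each trip."""
    def key(t):
        return (t["origin_zone"], t["dest_zone"], quarter_hour_key(t["request_time"]))

    shared = Counter(key(t) for t in trips if t["is_shared"])
    matched = Counter(key(t) for t in trips if t["is_shared"] and t["is_matched"])
    for t in trips:
        k = key(t)
        t["R_demand"] = shared[k]
        t["M_demand"] = matched[k]
        # The ratio is undefined when no shared request exists in the bin (assumption: None).
        t["match_probability"] = matched[k] / shared[k] if shared[k] else None
    return trips
```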
3.2.2. Trip time imputation
When passengers make a choice about ridesplitting, they need to evaluate the time difference between the solo and shared options, including both the in-vehicle travel time (IVTT) and the wait time (WT). While IVTT can be affected by the potential detour for ridesplitting, the difference in WT is likely the result of different operational strategies used by TNCs. Therefore, the time imputation for the alternative option needs to consider IVTT and WT separately.
IVTT is usually the main part of the total travel time. For a matched ridesplitting trip, IVTT is expected to be longer than for unmatched or solo trips, and we assume the difference is a function of the detour distance needed to pick up or drop off additional passengers. To estimate the possible detour of each trip, we use an empirical law of ridesplitting detours in Manhattan proposed by Ke et al. (2021). Following this law, the ratio between the detour distance and the direct distance is calculated as
$$\text{Detour ratio} = \frac{1}{\alpha_R \times N + \beta_R} \tag{2}$$
where N is the shared demand, i.e. R_demand in our study, and α_R and β_R are empirical parameters associated with each searching radius R in a given set. To obtain the detour ratio, a two-step procedure is adopted. First, to determine the most probable searching radius R for each trip, we use another equation from the same study:
$$\text{Match probability} = 1 - \xi_R \times \exp(\gamma_R \times N) \tag{3}$$
where ξ_R and γ_R are also empirical parameters related to R. In Eq. (3), N and the match probability are already calculated by Definitions 12 and 14, so we can infer R by maximum likelihood estimation. Plugging the estimated R back into Eq. (2) then determines α_R and β_R, from which the likely detour ratio of each trip is calculated.
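The two-step logic can be sketched as follows, under several loudly stated assumptions. The candidate radii and their (α_R, β_R, ξ_R, γ_R) values are hypothetical placeholders, not the values of Ke et al. (2021); the maximum likelihood step is interpreted here as a binomial likelihood of observing M_demand matches out of R_demand shared requests; and Eq. (3) is used in the form reconstructed above, with γ_R negative so the probability rises with demand.

```python
import math

# Hypothetical placeholder parameters per searching radius R (metres); illustrative only,
# real values would have to come from Ke et al. (2021).
PARAMS = {
    200: {"alpha": 0.05, "beta": 4.0, "xi": 0.95, "gamma": -0.05},
    400: {"alpha": 0.04, "beta": 3.0, "xi": 0.90, "gamma": -0.10},
    800: {"alpha": 0.03, "beta": 2.0, "xi": 0.85, "gamma": -0.20},
}


def match_probability(n, p):
    """Eq. (3) as reconstructed: 1 - xi_R * exp(gamma_R * N)."""
    return 1.0 - p["xi"] * math.exp(p["gamma"] * n)


def log_likelihood(n_requests, n_matched, prob, eps=1e-9):
    """Binomial log-likelihood (without the constant) of n_matched out of n_requests."""
    prob = min(max(prob, eps), 1.0 - eps)
    return n_matched * math.log(prob) + (n_requests - n_matched) * math.log(1.0 - prob)


def infer_radius(r_demand, m_demand, params=PARAMS):
    """Step 1: pick the radius whose Eq. (3) curve best explains M_demand out of R_demand."""
    return max(params, key=lambda r: log_likelihood(
        r_demand, m_demand, match_probability(r_demand, params[r])))


def detour_ratio(r_demand, m_demand, params=PARAMS):
    """Step 2: plug the inferred radius into Eq. (2)."""
    r = infer_radius(r_demand, m_demand, params)
    p = params[r]
    return r, 1.0 / (p["alpha"] * r_demand + p["beta"])
```

With these placeholder values, a bin with 7 of 10 shared requests matched selects the 400 m curve and gives a detour ratio of 1/3.4. When R_demand is 0 every candidate has zero log-likelihood and the first radius is returned, so such trips may need separate handling.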
As depicted in Fig. 3, the recorded trip distance and IVTT of a solo (or unmatched) trip contain no detour, so we can impute the unobserved trip distance and IVTT of the corresponding matched ridesplitting trip from the estimated detour ratio, assuming the detour increases distance and time by the same percentage. Conversely, for a matched trip, the distance and IVTT of the alternative solo option can be imputed by discounting the effect of the estimated detour.
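A minimal sketch of this imputation, assuming the detour ratio is the extra (detour) distance divided by the direct distance, so that the detoured value equals the direct value multiplied by (1 + ratio), and that IVTT scales by the same factor as distance:

```python
def impute_matched_from_solo(distance, ivtt, ratio):
    """Solo or unmatched trip -> likely matched trip: inflate distance and IVTT by the detour."""
    factor = 1.0 + ratio  # assumption: ratio = detour distance / direct distance
    return distance * factor, ivtt * factor


def impute_solo_from_matched(distance, ivtt, ratio):
    """Matched trip -> likely solo trip: discount the estimated detour."""
    factor = 1.0 + ratio
    return distance / factor, ivtt / factor
```

For example, a 4-mile, 20-minute solo trip with an estimated detour ratio of 0.25 would be imputed as a 5-mile, 25-minute matched trip, and the inverse function maps it back.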
Because of differences in demand and vehicle assignment strategies, WT can differ between solo and shared trips (Chen et al., 2017); specifically, shared WT is usually longer than solo WT (Li et al., 2019). For WT imputation, we compare the average WT of solo and shared trips to calculate an expansion factor between the two. The WT of the alternative option is then imputed as the observed WT multiplied by the expansion factor for a solo trip, or divided by it for a shared trip.
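The expansion-factor rule is simple enough to state directly in code. This is a sketch; the function names are illustrative.

```python
from statistics import mean


def wt_expansion_factor(solo_wts, shared_wts):
    """Ratio of the average shared WT to the average solo WT."""
    return mean(shared_wts) / mean(solo_wts)


def impute_alternative_wt(observed_wt, chose_shared, factor):
    """Solo trip: multiply by the factor; shared trip: divide by it."""
    return observed_wt / factor if chose_shared else observed_wt * factor
```

As an arithmetic illustration only, plugging in the mean WT values reported later in Table 4 (6.81 minutes for shared and 4.08 minutes for solo trips) would give a factor of roughly 1.67.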
The trip time imputation results are validated before the next stage is executed; the validation is introduced in Section 3.2.4.
3.2.3. Trip fare imputation
For traditional taxi services, the trip fare is typically specified as a linear function of the served distance and time plus a base charge, which can be expressed as
$$\text{Fare} = r_d \times \text{trip mile} + r_t \times \text{trip time} + \text{base} \tag{4}$$
where r_d is a fixed or segmented distance fare rate, r_t the time fare rate, and base the base fare. However, for TNC services, because of dynamic pricing strategies, the trip fare is determined by a plethora of factors and therefore varies more (Chen et al., 2015; Wang and Yang, 2019). In this study, we assume that the TNC trip fare still follows the form of Eq. (4), but r_d, r_t, and base are dynamic for each trip and affected by various factors, including trip distance, time, speed, wait time, and the demand-supply balance between the trip origin and destination.
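To make Eq. (4) concrete, the sketch below implements the taxi-style fare with a segmented distance rate. All rates in the usage example are hypothetical; for TNC trips the point of the following paragraph is precisely that r_d, r_t and base vary from trip to trip, which a fixed function like this cannot capture.

```python
import math


def distance_charge(miles, segments):
    """Segmented distance charge.

    `segments` is [(upper_mile, rate_per_mile), ...] in increasing order, with math.inf
    as the last upper bound; a fixed rate r_d is simply [(math.inf, r_d)].
    """
    charge, lower = 0.0, 0.0
    for upper, rate in segments:
        if miles <= lower:
            break
        charge += (min(miles, upper) - lower) * rate
        lower = upper
    return charge


def taxi_fare(miles, minutes, segments, r_t, base):
    """Eq. (4): distance charge + r_t * trip time + base fare."""
    return distance_charge(miles, segments) + r_t * minutes + base
```

For instance, with a hypothetical $2.00/mile rate for the first 2 miles and $1.00/mile afterwards, a 5-mile trip incurs a $7.00 distance charge.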
To capture potentially nonlinear and interactive effects among these variables, we develop a fully connected deep neural network (DNN) model that recovers the pricing scheme of TNC trips for fare prediction, as shown in Fig. 4. Two model variants are trained separately on solo and matched trips, mapping the solo (without detour) or matched (with detour) distance and time to the solo or shared fare, respectively. For the solo trip pricing model, 80% of the solo trips are used for training and the remaining 20% for testing. Input variables include O_demand, O_supply, D_demand, D_supply, W_demand, W_supply, wait_time, speed, percentile_of_reference_fare_ratio, trip_distance, and IVTT. As depicted in Fig. 3, the trained model can then be used to predict the alternative solo fare of each observed shared trip, which first requires imputing the solo trip time of that shared trip using the methods introduced in Section 3.2.2. Similarly, the shared trip pricing model is trained and tested on observed shared trips and used to impute the unobserved shared fare of each observed solo trip. Unmatched trips, given their particular characteristics, are reserved for model testing. Specifically, for each unmatched trip, we estimate the fare from the imputed detour distance and time using the shared pricing model, and compare the result with its ground-truth shared fare for validation. The specific validation process is introduced in Section 3.2.4. Regarding the DNN architecture, we use 3 hidden layers, the Log-Cosh loss function, and the RMSprop optimiser with a learning rate of 0.01 and a batch size of 512. The number of training epochs is set to 100, and early stopping is used to prevent overfitting.
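The training setup can be sketched in plain NumPy to show how the stated pieces fit together: three hidden layers, the Log-Cosh loss, RMSprop with a learning rate of 0.01, batches of 512, up to 100 epochs, and early stopping. Everything else is an assumption not given in the text: the hidden widths (64, 64, 32), ReLU activations, He initialisation, RMSprop's decay 0.9 and epsilon 1e-7, and an early-stopping patience of 10 epochs. In practice a framework such as Keras or PyTorch would normally be used, and standardising the inputs is advisable.

```python
import numpy as np


def log_cosh(err):
    """Element-wise log(cosh(err)), computed stably as logaddexp(err, -err) - log(2)."""
    return np.logaddexp(err, -err) - np.log(2.0)


def init_params(n_in, hidden=(64, 64, 32), seed=0):
    """He-initialised weights for 3 hidden layers plus a linear output (widths are assumptions)."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, 1]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Return the input plus every layer's output; ReLU on hidden layers, linear output."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(z if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts


def predict(params, x):
    return forward(params, x)[-1][:, 0]


def train_fare_model(x_tr, y_tr, x_val, y_val, lr=0.01, batch=512, epochs=100,
                     patience=10, rho=0.9, eps=1e-7, seed=0):
    """Mini-batch RMSprop on the mean Log-Cosh loss with early stopping on validation loss."""
    params = init_params(x_tr.shape[1], seed=seed)
    cache = [(np.zeros_like(w), np.zeros_like(b)) for w, b in params]
    rng = np.random.default_rng(seed)
    best_loss, best_params, stale = np.inf, params, 0
    for _ in range(epochs):
        order = rng.permutation(len(x_tr))
        for start in range(0, len(order), batch):
            idx = order[start:start + batch]
            acts = forward(params, x_tr[idx])
            # d(mean log-cosh)/d(prediction) = tanh(prediction - target) / batch size
            grad = np.tanh(acts[-1][:, 0] - y_tr[idx])[:, None] / len(idx)
            grads = [None] * len(params)
            for i in reversed(range(len(params))):
                w, _ = params[i]
                grads[i] = (acts[i].T @ grad, grad.sum(axis=0))
                if i > 0:
                    grad = (grad @ w.T) * (acts[i] > 0)  # back-propagate through ReLU
            new_params, new_cache = [], []
            for (w, b), (cw, cb), (gw, gb) in zip(params, cache, grads):
                cw = rho * cw + (1 - rho) * gw ** 2  # RMSprop moving average of squared gradients
                cb = rho * cb + (1 - rho) * gb ** 2
                new_params.append((w - lr * gw / (np.sqrt(cw) + eps),
                                   b - lr * gb / (np.sqrt(cb) + eps)))
                new_cache.append((cw, cb))
            params, cache = new_params, new_cache
        val_loss = log_cosh(predict(params, x_val) - y_val).mean()
        if val_loss < best_loss:
            best_loss, best_params, stale = val_loss, params, 0
        else:
            stale += 1
            if stale >= patience:  # early stopping: keep the best weights seen so far
                break
    return best_params, best_loss
```

Log-Cosh behaves like a squared error for small residuals and like an absolute error for large ones, which is why its gradient is the bounded tanh of the residual.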
It is worth noting that the proposed DNN model cannot be used to predict the fare before the trip starts; it should only be used to impute the trip fare afterwards from partially observed information. Essentially, the trained model serves as the likely pricing scheme learned from the data. Therefore, some posterior variables of each trip, e.g. speed, wait_time, and percentile_of_reference_fare_ratio, can be used to improve the fitting performance. The validation of these methods is discussed in Section 3.2.4.
Figure 4: Illustration of deep neural networks for trip fare prediction
3.2.4. Validation of imputation results
Validating the imputed trip time and fare is challenging without detailed information about the trip request interface seen by passengers (see Fig. 1 for an example), which TNCs typically reserve for internal use owing to privacy concerns. In this section, we discuss ways to validate the imputation results indirectly.
First, given the different characteristics of solo and shared trips, we examine whether the probability distributions of the imputed trip distance, time, and fare match the corresponding empirical distributions observed in the data. For example, if the imputed solo fare of shared trips (the probable fare had the passenger chosen the solo option) and the observed solo fare have similar probability distributions, this supports the validity of the imputation. Specifically, two metrics, Pearson's r and the Jensen–Shannon divergence (JSD), are used to measure the similarity between the imputed and observed distributions. Pearson's r (the Pearson correlation coefficient) is the most widely used metric of (linear) correlation between two continuous variables. JSD is a measure of distribution similarity widely applied in physics, bioinformatics, and machine learning (Chen and Liu, 2021). With base-2 logarithms, JSD ranges from 0 to 1, with a smaller JSD indicating higher similarity. Pearson's r and JSD are calculated as follows:
$$r = \frac{\sum_{i=1}^{N} (P^e_i - \bar{P}^e)(P^h_i - \bar{P}^h)}{\sqrt{\sum_{i=1}^{N} (P^e_i - \bar{P}^e)^2}\,\sqrt{\sum_{i=1}^{N} (P^h_i - \bar{P}^h)^2}} \tag{5}$$
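$$\text{JSD} = 0.5 \times \left( \sum_{i=1}^{N} P^e_i \times \log \frac{P^e_i}{0.5 \times (P^e_i + P^h_i)} + \sum_{i=1}^{N} P^h_i \times \log \frac{P^h_i}{0.5 \times (P^e_i + P^h_i)} \right) \tag{6}$$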
where P^e_i and P^h_i are the probabilities of the imputed and observed values, respectively, falling in the i-th smallest interval, and the barred terms in Eq. (5) denote their means over the N intervals. In this study, the interval widths for WT, IVTT, distance, and fare are 0.1 minute, 0.5 minute, 0.1 mile, and $0.5, respectively.
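A minimal sketch of Eqs. (5) and (6) applied to binned samples. It assumes both samples are histogrammed on a shared grid of fixed-width bins spanning the contiguous range from the smallest to the largest occupied bin (the text does not state how the interval range is chosen), and it uses base-2 logarithms so that JSD lies in [0, 1].

```python
import math
from collections import Counter


def binned_probabilities(imputed, observed, width):
    """Histogram both samples on a shared grid of bins of the given width and
    return two aligned probability vectors over the contiguous bin range."""
    ce = Counter(math.floor(v / width) for v in imputed)
    ch = Counter(math.floor(v / width) for v in observed)
    lo, hi = min(min(ce), min(ch)), max(max(ce), max(ch))
    p_e = [ce[k] / len(imputed) for k in range(lo, hi + 1)]
    p_h = [ch[k] / len(observed) for k in range(lo, hi + 1)]
    return p_e, p_h


def pearson_r(a, b):
    """Eq. (5)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a)) * math.sqrt(sum((y - mb) ** 2 for y in b))
    if norm == 0:
        raise ValueError("Pearson's r is undefined for a constant vector")
    return cov / norm


def jsd(p, q):
    """Eq. (6) with base-2 logarithms; empty bins contribute zero."""
    total = 0.0
    for x, y in zip(p, q):
        m = 0.5 * (x + y)
        if x > 0:
            total += x * math.log2(x / m)
        if y > 0:
            total += y * math.log2(y / m)
    return 0.5 * total
```

For the fare comparison, for instance, `binned_probabilities(imputed_fares, observed_fares, 0.5)` would produce the two vectors passed to `pearson_r` and `jsd` (the variable names are illustrative).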
Second, since the fare imputation models are trained to predict the observed fare, they should at least perform very well on test data from the same option. Three widely used metrics, the coefficient of determination (R²), root mean square error (RMSE), and mean absolute percentage error (MAPE), are employed for evaluation:
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (fare^g_i - fare^e_i)^2}{\sum_{i=1}^{N} (fare^g_i - \overline{fare}^g)^2} \tag{7}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} (fare^e_i - fare^g_i)^2}{N}} \tag{8}$$
$$MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{fare^e_i - fare^g_i}{fare^g_i} \right| \tag{9}$$
where fare^e_i is the imputed fare for trip i in the test set, and fare^g_i is its ground-truth fare.
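Eqs. (7)-(9) translate directly into a small helper. This is a sketch with illustrative names; MAPE is returned as a fraction, so 0.010 corresponds to 1%.

```python
import math


def fare_metrics(pred, truth):
    """Eqs. (7)-(9): R^2, RMSE and MAPE of imputed fares against ground-truth fares."""
    n = len(truth)
    mean_truth = sum(truth) / n
    ss_res = sum((g - e) ** 2 for e, g in zip(pred, truth))
    ss_tot = sum((g - mean_truth) ** 2 for g in truth)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": sum(abs((e - g) / g) for e, g in zip(pred, truth)) / n,
    }
```

For example, predictions of $11 and $19 against true fares of $10 and $20 give R² = 0.96, RMSE = $1.00 and MAPE = 7.5%.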
Third, we exploit the aforementioned difference between matched and unmatched trips: both enjoy a fare discount for ridesplitting, but only the former incur detours. Therefore, the trip distance and time of unmatched trips are similar to those of solo trips, whereas their fare should be similar to that of matched trips; an unmatched trip can thus be regarded as a solo trip with a discounted fare. To use this property for validation, we first infer the detoured distance and time the unmatched trip would have had if it had been successfully matched, and then predict the discounted fare using the trained shared trip pricing model. If the fare imputation is valid, the predicted shared fare should be close to the observed ground-truth “shared” fare. Fig. 5 illustrates the corresponding validation process. Specifically, R², RMSE, and MAPE are used to compare the estimated and ground-truth discounted fares of unmatched trips.
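The unmatched-trip check can be sketched as a small pipeline. The trip fields (`distance`, `ivtt`, `fare`) are hypothetical; `detour_ratio_fn` and `shared_pricing_model` stand in for the detour estimate and the trained shared pricing model; and the detour is again assumed to inflate distance and IVTT by (1 + ratio). The observed fare is removed from the features so the model never sees the ground truth.

```python
import math


def validate_with_unmatched(unmatched_trips, detour_ratio_fn, shared_pricing_model):
    """Predict the discounted fare of unmatched trips as if they had been matched.

    unmatched_trips: dicts with observed 'distance', 'ivtt' and 'fare' (the discounted
    fare actually paid) plus any other model features.
    detour_ratio_fn(trip) -> estimated detour ratio.
    shared_pricing_model(features) -> predicted shared fare.
    Returns (RMSE, MAPE) of the predictions against the observed fares.
    """
    errors, pct_errors = [], []
    for trip in unmatched_trips:
        ratio = detour_ratio_fn(trip)
        # Drop the ground-truth fare from the inputs, then apply the assumed detour.
        features = {k: v for k, v in trip.items() if k != "fare"}
        features["distance"] = trip["distance"] * (1 + ratio)
        features["ivtt"] = trip["ivtt"] * (1 + ratio)
        err = shared_pricing_model(features) - trip["fare"]
        errors.append(err)
        pct_errors.append(abs(err / trip["fare"]))
    n = len(errors)
    return math.sqrt(sum(e * e for e in errors) / n), sum(pct_errors) / n
```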
Figure 5: Illustration of validation by unmatched trips
3.2.5. Discrete choice modelling
With the imputed results, each trip in the data has two sets of time and fare, one for the solo option and the other for the shared option. This mirrors the trip request interface (Fig. 1) shown to passengers when they choose between the solo and shared options. Meanwhile, other contextual factors, including those related to sociodemographics, built environment and traffic conditions, can also influence passengers' WTS. These factors are associated with each trip through its OD zone IDs and request timestamp. The explanatory variables tested in this study are summarised in Table 3. It is worth noting that the distance variable in the model is the intrinsic distance of a trip without ridesplitting. Specifically, for a solo or an unmatched trip, the distance is directly observed in the data (no detour), whereas for a matched trip, the distance of its alternative solo option needs to be imputed. For contextual variables available only at the zone level, we consider their effects at both the trip origin and destination. Fig. 6 illustrates how these variables are combined, using a solo trip as an example.
Figure 6: Example choice scenario with different features
Naturally, the choice between solo and shared rides can be modelled through discrete choice analysis on factors such as the trip fare, trip time and other contextual variables. Under a random utility maximization framework, the systematic parts of the utility functions for the two options are given as
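$$
\begin{aligned}
U_{solo} &= \beta^{solo}_f \times Fare_{solo} + \beta_w \times WT_{solo} + \beta^{solo}_{iv} \times IVTT_{solo} + \beta_o \times Other \\
U_{shared} &= \beta^{shared}_f \times Fare_{shared} + \beta_w \times WT_{shared} + \beta^{shared}_{iv} \times IVTT_{shared} + \beta_o \times Other + ASC_{shared}
\end{aligned} \tag{10}
$$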
In this study, we account for the possibility that passengers choosing the solo and shared options may have different socioeconomic backgrounds and hence different cost sensitivities. Therefore, the disutilities of solo and shared fares are modelled separately (i.e. β_f^solo vs β_f^shared). Additionally, as passengers may perceive WT and IVTT differently (Geržinič et al., 2023; Mohring et al., 1987), we also distinguish WT and IVTT in the utility functions (i.e. β_w vs β_iv). Since sharing a ride with a stranger may be perceived as uncomfortable, we further model the IVTT preferences for solo and shared trips separately (i.e. β_iv^solo vs β_iv^shared).
In Eq. (10), Fare_solo, WT_solo, and IVTT_solo versus Fare_shared, WT_shared, and IVTT_shared represent the time-fare trade-off between the two options. Note that, because of the aforementioned congestion surcharge policy in Manhattan below 96th Street, Fare_solo is the trip fare plus a congestion surcharge of $2.75, while Fare_shared includes only a $0.75 surcharge. If the chosen option incurs no congestion surcharge, the surcharge of its alternative option is also set to 0. ASC_shared is the ridesplitting-specific constant, and Other denotes the sociodemographic, built environment and traffic-related contextual variables.
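A minimal sketch of the surcharge rule and of the binary logit probability implied by Eq. (10). The coefficient values are hypothetical and are not the estimates in Table 7. In a binary logit only the utility difference matters, so a β_o × Other term that is identical in both utilities would cancel; this sketch therefore adds the contextual term to the shared utility only, which is an assumption about how the contextual coefficients are identified.

```python
import math

# Hypothetical coefficients for illustration only; they are NOT the Table 7 estimates.
BETA = {"f_solo": -0.30, "f_shared": -0.35, "w": -0.10,
        "iv_solo": -0.08, "iv_shared": -0.09, "asc_shared": 0.5}
SOLO_SURCHARGE, SHARED_SURCHARGE = 2.75, 0.75  # congestion surcharges below 96th Street ($)


def add_congestion_surcharge(fare_solo, fare_shared, chosen_option_surcharged):
    """Add both surcharges only if the chosen option was surcharged; otherwise add none."""
    if not chosen_option_surcharged:
        return fare_solo, fare_shared
    return fare_solo + SOLO_SURCHARGE, fare_shared + SHARED_SURCHARGE


def prob_shared(solo, shared, other_utility=0.0, beta=BETA):
    """Binary logit P(shared) from the systematic utilities of Eq. (10).

    `solo` and `shared` are (fare, wt, ivtt) tuples; the contextual term is added to
    the shared utility only (assumption, see lead-in).
    """
    fare_s, wt_s, iv_s = solo
    fare_r, wt_r, iv_r = shared
    u_solo = beta["f_solo"] * fare_s + beta["w"] * wt_s + beta["iv_solo"] * iv_s
    u_shared = (beta["f_shared"] * fare_r + beta["w"] * wt_r + beta["iv_shared"] * iv_r
                + other_utility + beta["asc_shared"])
    return 1.0 / (1.0 + math.exp(u_solo - u_shared))
```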
Table 3: Potential explanatory variables for ridesplitting choice

| Category | Variable | Aggregation level | Description | Data source |
|---|---|---|---|---|
| Time-fare trade-offs | Solo fare | Trip | Trip fare for the solo option | TNC trip data + imputation |
| | Shared fare | Trip | Trip fare for the shared option | TNC trip data + imputation |
| | Total solo time | Trip | Travel time for the solo option, including IVTT and WT | TNC trip data + imputation |
| | Total shared time | Trip | Travel time for the shared option, including IVTT and WT | TNC trip data + imputation |
| Traffic | Distance | Trip | Solo distance for the trip | TNC trip data |
| | Demand | Zone + hour | Number of pick-up trips | TNC trip data |
| | Speed | Zone-to-zone + hour | Average travel speed between a pair of zones during an hour | TNC trip data |
| | Unreliability | Zone-to-zone + hour | Standard deviation of travel speed between a pair of zones during an hour | TNC trip data |
| | Match probability | Zone-to-zone + hour | Probability of a shared trip being matched | TNC trip data |
| | Commuting time | Zone | Average commuting time of a zone | NYC census |
| | NoVehicle rate | Zone | Percentage of households with no car ownership | NYC census |
| Sociodemographic | Median income | Zone | Median household income | NYC census |
| | Female rate | Zone | Percentage of female population | NYC census |
| | Young rate | Zone | Percentage of population below 35 | NYC census |
| | White rate | Zone | Percentage of white population | NYC census |
| | Ethnicity mixture | Zone | Shannon entropy of ethnicity composition | NYC census |
| | Crime density | Zone | Density of felony, misdemeanour, and violation crime | NYC Police Department |
| Built environment | Residential area rate | Zone | Percentage of residential land use | NYC Open Data |
| | Land use mixture | Zone | Shannon entropy of land use | NYC Open Data |
| | Bus | Zone | Number of bus stops in the zone | NYC Open Data |
| | Metro | Zone | Number of metro stations in the zone | NYC Open Data |
| | Distance to center | Zone | Distance to the urban center | NYC Open Data |

In the discrete choice framework, fixed and random coefficient logit models are both commonly used to estimate the model coefficients. In our study, the fixed coefficient logit model assumes that all passengers have a homogeneous preference for trip fare and time, while the random coefficient logit model allows for heterogeneous preferences among individuals. Specifically, in the random coefficient logit model, the fare- and time-related coefficients in the utility functions (β_f^solo, β_f^shared, β_w, β_iv^solo, and β_iv^shared) are assumed to follow normal distributions (Alonso-González et al., 2021). The probability of an individual choosing the shared option is calculated as
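$$P_{shared} = E(P^{\beta}_{shared}) = \int_{\beta^{solo}_f} \cdots \int_{\beta^{shared}_{iv}} P^{\beta}_{shared}\, f(\beta \mid \theta)\, d\beta^{solo}_f \cdots d\beta^{shared}_{iv} \tag{11}$$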
where P^β_shared is the probability of choosing the shared option under the random coefficients β, θ represents the set of parameters of the distribution of β, and f(β|θ) is the probability density function of this distribution. With the estimated fare and time coefficients, the VOT can be calculated as (Frei et al., 2017):
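$$VOT = E\left[\frac{\beta_{time}}{\beta_{fare}}\right] \approx \frac{E[\beta_{time}]}{E[\beta_{fare}]} + \frac{Var[\beta_{fare}] \times E[\beta_{time}]}{E^3[\beta_{fare}]} \tag{12}$$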
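Eq. (11) has no closed form, so a common approach is to approximate it by simulation: draw β from its normal distributions, compute the logit probability for each draw, and average. The sketch below does this with the standard library and also evaluates the approximation in Eq. (12), converted from $/minute to $/hour. The dictionary keys, draw count and seed are illustrative assumptions, and each draw of β_w is shared by both options, as in Eq. (10).

```python
import math
import random


def simulated_p_shared(x_solo, x_shared, means, stds, asc=0.0, draws=2000, seed=0):
    """Monte Carlo approximation of Eq. (11).

    x_solo / x_shared map coefficient names to attribute values, e.g.
    {"f_solo": fare, "w": wt, "iv_solo": ivtt}; means / stds give the normal
    distribution of each coefficient (missing std = fixed coefficient).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        beta = {k: rng.gauss(means[k], stds.get(k, 0.0)) for k in means}
        u_solo = sum(beta[k] * v for k, v in x_solo.items())
        u_shared = sum(beta[k] * v for k, v in x_shared.items()) + asc
        total += 1.0 / (1.0 + math.exp(u_solo - u_shared))
    return total / draws


def vot_per_hour(mean_time, mean_fare, var_fare=0.0):
    """Eq. (12): second-order approximation of E[beta_time / beta_fare], in $/h."""
    vot_per_min = mean_time / mean_fare + var_fare * mean_time / mean_fare ** 3
    return 60.0 * vot_per_min
```

As an arithmetic illustration only, plugging the midnight means of β_w (-2.022) and β_f^solo (-3.368) from Table 7 into `vot_per_hour`, with the fare variance set to zero because the fare standard deviations are insignificant, gives roughly $36/h.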
Two more procedures are conducted prior to modelling. First, when choosing between the solo and shared options, some passengers may not make a deliberate choice because of unfamiliarity with the shared service, reluctance to share, or fixed habits (Gargiulo et al., 2015; Moody et al., 2019); such passengers are defined as non-traders (Alonso-González et al., 2021). To identify them, we consider, for each trip, the cost-effectiveness ratio between the fare saving and the time delay of the shared option relative to the solo option, (Fare_solo − Fare_shared)/(Time_shared − Time_solo). Intuitively, a higher ratio indicates that choosing the shared option is more cost-effective. In Alonso-González et al. (2021), trips that have such a ratio more than €15/h of the expected average passenger VOT (€15/h) but still choose the solo option, or a ratio less than €10/h of the VOT but still choose the shared option, are identified as non-traders. In our study, based on prior studies of VOT in Manhattan (He et al., 2020; Goldszmidt et al., 2020), we expect the reasonable range of Manhattan VOT to be about $30/h ± $15/h. Thus, solo trips with a ratio greater than $45/h and shared trips with a ratio less than $15/h are identified as non-trade trips and excluded from the choice model. Second, since solo and shared trip volumes are imbalanced, with shared trips in the minority, we weight shared trips by the ratio of solo to shared trip volumes to prevent the model from focusing on solo trips. Common binary classification metrics, including McFadden's pseudo R², precision, recall, and F1 score, are used to evaluate model performance. McFadden's pseudo R² is calculated as in Eq. (13), where LL_model is the log-likelihood of the proposed model and LL_0 is the log-likelihood of the model with only a constant.
$$\text{Pseudo } R^2 = 1 - \frac{LL_{model}}{LL_0} \tag{13}$$
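The pre-modelling filters and evaluation metrics can be sketched as follows. The $15/h and $45/h thresholds come from the text; times are assumed to be in minutes and converted to hours, and the handling of trips where the shared option is not slower (returning infinity if it also saves money, otherwise NaN so the trip is never flagged) is an assumption the text does not address.

```python
import math


def cost_effectiveness(fare_solo, fare_shared, time_solo_min, time_shared_min):
    """Fare saving per hour of extra travel time ($/h) when choosing shared over solo."""
    saving = fare_solo - fare_shared
    delay_h = (time_shared_min - time_solo_min) / 60.0
    if delay_h <= 0:
        # Assumption: no time penalty -> infinitely cost-effective if it saves money,
        # otherwise NaN (comparisons with NaN are False, so the trip is kept).
        return math.inf if saving > 0 else math.nan
    return saving / delay_h


def is_non_trader(ratio, chose_shared, low=15.0, high=45.0):
    """Solo choosers with ratio > $45/h, or shared choosers with ratio < $15/h."""
    return ratio < low if chose_shared else ratio > high


def shared_weight(n_solo, n_shared):
    """Weight applied to every shared trip to offset class imbalance."""
    return n_solo / n_shared


def pseudo_r2(ll_model, ll_null):
    """Eq. (13): McFadden's pseudo R^2."""
    return 1.0 - ll_model / ll_null


def precision_recall_f1(y_true, y_pred):
    """Binary metrics with 'shared' (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, saving $8 for 16 extra minutes gives a ratio of $30/h, inside the expected VOT range, so the trip is kept whichever option was chosen.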
4. Results
In this section, we first present an overview of the data and the spatiotemporal patterns of WTS at the aggregate level. Then, we show the performance and validation of the proposed method and the findings on the determinants of WTS.
4.1. Data exploration and WTS spatiotemporal patterns
After data cleaning, 28.89% of Lyft trips within Manhattan are found to choose the shared option, with 58.35% of them successfully matched. Table 4 shows the summary statistics of these trips. The WT of solo trips is mostly less than 5 minutes, whereas for shared trips it is longer and mostly within 10 minutes. IVTT is much longer than WT: the ratio of mean IVTT to mean WT is about 5.2 for solo trips and 3.8 for matched trips. The average IVTT of matched trips is 25.94 minutes, compared with 21.28 minutes for solo and unmatched trips. As expected, matched trips are longer in distance than solo trips. Trip distances mostly fall within 7.5 miles (12 km), which is similar to prior findings (Li et al., 2019). Intuitively, shared trips have a lower fare than solo trips: the average shared fare is $13.17, about 58% of the average solo fare ($22.69).
Table 4: Statistical information of trips

| Variable | Trip type | Q1 | Median | Mean | Q3 |
|---|---|---|---|---|---|
| WT (minute) | Solo trips | 2.45 | 3.62 | 4.08 | 5.25 |
| | Shared trips | 4.3 | 6.25 | 6.81 | 8.75 |
| Fare ($) | Solo trips | 11.57 | 18.2 | 22.69 | 28.48 |
| | Shared trips | 6.09 | 10.45 | 13.17 | 17.28 |
| IVTT (minute) | Solo & unmatched trips | 11.32 | 17.98 | 21.28 | 27.33 |
| | Matched trips | 15.47 | 23.00 | 25.94 | 33.32 |
| Distance (mile) | Solo & unmatched trips | 1.69 | 3.39 | 5.29 | 6.89 |
| | Matched trips | 2.43 | 4.17 | 5.65 | 7.23 |

To give a general overview of WTS in Manhattan, its spatiotemporal patterns are calculated by dividing the number of shared trips (both matched and unmatched) by the total number of trips; the results are shown in Fig. 7. To contextualise the spatial distribution across the 63 zones, several larger neighbourhood areas are marked with descriptive names and golden borders for easier reference. For both origins and destinations, the areas with the highest WTS are Downtown East and Midtown East. WTS in Uptown West and Uptown East is also considerable, followed by Midtown West and Harlem. WTS in Midtown Core is relatively low compared with nearby areas, while North Harlem has the lowest WTS. A significant clustering effect can also be observed, consistent with a prior finding (Huang et al., 2021). To describe the temporal distribution, WTS is calculated hourly. Based on the variation shown in Fig. 7(c), the day can be broadly divided into four periods: midnight (22:00-5:00), morning (6:00-10:00), afternoon (11:00-15:00), and evening (16:00-21:00). WTS is highest at midnight: it keeps increasing from 22:00 and peaks at 3:00. In the morning, especially during the morning rush hours, WTS reaches its lowest value of the day, with about 24% of passengers opting for ridesplitting. From the afternoon to the evening, WTS increases slowly and steadily. Two reasons may explain this temporal pattern. On the one hand, daytime trips (especially in the rush period) are more often related to work, while night-time trips are more often for leisure; the lower VOT of leisure trips leads to higher adoption of ridesplitting (Lavieri and Bhat, 2019; Small, 2012). On the other hand, although safety concerns at night are often regarded as a barrier to ridesplitting, the presence of a stranger has been found to be less important than the time-fare trade-off (Lavieri and Bhat, 2019). In addition, for those who do need mobility services in Manhattan at night, ridesplitting may still be perceived as safer than public transit (Ding et al., 2022; García et al., 2022; LaGrange et al., 1992).
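The aggregate WTS used in Fig. 7 is a share of shared requests within each group. A minimal sketch follows, assuming each trip carries hypothetical `is_shared`, zone and request-hour fields, with the four periods encoded from the hour bins listed above.

```python
from collections import defaultdict


def period_of(hour):
    """Map a request hour (0-23) to the four periods used in Section 4.1."""
    if hour >= 22 or hour <= 5:
        return "midnight"
    if hour <= 10:
        return "morning"
    if hour <= 15:
        return "afternoon"
    return "evening"


def wts_by(trips, key_fn):
    """WTS per group = shared trips (matched + unmatched) / all trips in the group."""
    shared, total = defaultdict(int), defaultdict(int)
    for t in trips:
        k = key_fn(t)
        total[k] += 1
        shared[k] += int(t["is_shared"])
    return {k: shared[k] / total[k] for k in total}
```

Grouping by origin zone, destination zone, request hour or `period_of(hour)` only changes the `key_fn` argument, e.g. `wts_by(trips, lambda t: t["origin_zone"])`.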
Figure 7: Spatiotemporal pattern of WTS at aggregate level
4.2. Validation of imputed trip time and fare
4.2.1. Validation based on distributional similarity
We first validate the imputed trip time and fare based on their distributional similarity with the observed values. The probability distributions of the observed and imputed values are presented in Fig. 8. For example, in the left subfigure of Fig. 8(a), the red area represents the WT distribution of observed solo trips, the blue area the WT distribution of observed shared trips, and the black curve the probability distribution of the imputed shared WT of the observed solo trips (i.e. had they been shared). The red arrow pointing from the red area to the black curve represents the process of using observed solo WT to impute shared WT. The black curve closely fits the blue area, suggesting that the imputed shared WT values are realistic. The specific similarity metrics are shown in Table 5.
For WT imputation, a difference is assumed to exist between solo and shared trips owing to their different matching and dispatching strategies. Similarly, for fare imputation, a discount exists between solo and shared trips. Therefore, for these two variables, distributional similarity is measured between solo and shared trips. For IVTT and distance imputation, however, since the detour affects only matched trips, solo and unmatched trips are combined for comparison with matched trips. Both Fig. 8 and Table 5 indicate excellent similarity between the distributions of imputed and observed trips. WT and IVTT show the best similarity, followed by distance. The distributional similarity of the imputed fare is slightly poorer, which may be due to cumulative errors from the WT, IVTT, and distance imputation, as well as the complexity of the dynamic pricing mechanism. Nevertheless, its Pearson's r values still reach 0.903 and 0.970, with JSD values of 0.024 and 0.010.
Overall, for all trip attributes, the imputed trip time and fare distributions are quite similar to the observed distributions, with Pearson's r values all close to 1 and JSD values close to 0. Although the similarity-based validation can only be done at the aggregate level rather than at the individual level (which is impossible with the available data), it supports the conclusion that the proposed imputation methods produce reasonable and realistic results.
Table 5: Validation results based on distributional similarity

| Variable | Comparison | Pearson's r | JSD |
|---|---|---|---|
| WT | Solo trips (imputed) vs. shared trips (observed) | 0.991 | 0.002 |
| | Shared trips (imputed) vs. solo trips (observed) | 0.995 | 0.002 |
| IVTT | Solo & unmatched trips (imputed) vs. matched trips (observed) | 0.997 | 0.004 |
| | Matched trips (imputed) vs. solo & unmatched trips (observed) | 0.995 | 0.005 |
| Distance | Solo & unmatched trips (imputed) vs. matched trips (observed) | 0.978 | 0.011 |
| | Matched trips (imputed) vs. solo & unmatched trips (observed) | 0.976 | 0.011 |
| Fare | Solo trips (imputed) vs. shared trips (observed) | 0.903 | 0.024 |
| | Shared trips (imputed) vs. solo trips (observed) | 0.970 | 0.010 |
Figure 8: Probability distribution of observed and imputed values
4.2.2. Validation based on model prediction
DNNs are adopted to learn the pricing rules for trip fare imputation, so a good fit on the test data (from the chosen option) is a prerequisite. The model evaluation results on the test set are presented in Table 6. The absolute errors (RMSE) are below $0.4, the relative errors (MAPE) are at most 1%, and the test R² reaches 0.999. The excellent performance of the proposed DNN model on the chosen option provides some confidence in the fare imputation for the alternative option.
Table 6: Performance evaluation of DNN for trip fare prediction

| Model | RMSE ($) | MAPE | R² |
|---|---|---|---|
| Solo pricing scheme | 0.369 | 0.010 | 0.999 |
| Shared pricing scheme | 0.149 | 0.008 | 0.999 |
In addition, leveraging the aforementioned differences between matched and unmatched trips, we compare the observed and imputed fares of unmatched trips for validation. Compared with the previous validation methods, this approach is more direct and convincing, since the observed discounted fare of an unmatched trip provides the ground truth for validating the imputation. The RMSE, MAPE, and R² of the imputed fare for unmatched trips are $0.792, 4.44%, and 0.992, respectively. This performance is slightly poorer than that on the test data, likely because imperfect time imputation leads to small deviations in the fare estimation. However, the accuracy is still considerably high. All of the above evaluation results support the validity of our proposed imputation methods, allowing us to reconstruct the trip attributes of the unobserved alternative with reasonable confidence and paving the way for discrete choice analysis.
4.3. Random coefficient logit model results
Informed by the finding that WTS varies significantly across the day, we build four choice models for the midnight, morning, afternoon and evening periods, respectively, using the logit model with random coefficients. Table 7 presents the results of the choice models for choosing the ridesplitting service (Y = 1) over the solo option (Y = 0).⁴ As expected, the negative values of all fare- and time-related coefficients suggest that an increase in fare or time greatly reduces passengers' willingness to choose that option. Of the five random coefficients, only that of WT shows significant heterogeneity among individuals. In terms of fare-related coefficients, β_f^shared is more negative than β_f^solo, indicating higher price sensitivity for shared trips. This is likely because passengers who prefer the shared option are generally more price-sensitive and tend to prioritise fare savings. With regard to time-related coefficients, β_iv^shared is more negative than β_iv^solo. The ratios between the two in-vehicle time coefficients are about 1.070 and 1.103 at midnight and in the evening, and 1.065 and 1.069 in the daytime. This is consistent with the common experience that a unit of time spent in a shared vehicle is generally less desirable because of the crowded in-vehicle space or safety/privacy concerns, especially at night. Meanwhile, β_w is similar to the in-vehicle time coefficients. This differs from prior findings that waiting time weighs about 2-3 times as much as in-vehicle time (Geržinič et al., 2023; Mohring et al., 1987). We attribute this "atypical" finding to the trade-off process of the ridesplitting service. As shown in Fig. 1, the information provided for the solo and shared options often reveals only the total trip time, combining waiting and in-vehicle time. Thus, although we model WT and IVTT separately, passengers might not be able to distinguish the two and may in reality weigh them equally.
Based on the estimated fare and time coefficients, we calculate the VOT. The estimated VOT for the midnight, morning, afternoon, and evening periods is listed in Table 8. Across periods, the VOT of both WT and IVTT is highest in the morning and at midnight, and lowest in the evening. This is consistent with the intuition that passengers are most time-sensitive during the morning rush hours (for going to work) (Small, 2012) and at midnight (when limited transport services are available), and with the prior finding that the VOT of commuting trips (mostly in the morning) is higher than that of leisure trips (mostly in the evening) (Lavieri and Bhat, 2019). Also, the VOT of the shared option is slightly lower than that of the solo option, which matches the expectation that ridesplitting users
this study is consistent with the VOT regarding the shared autonomous vehicle in several US550
4The fixed and random coefficient logit model are both adopted in this study to solve the discrete choice
process. For conciseness, only the result of the random coefficient logit model is reported in this section. The
complete comparison between the two models is provided in Appendix A
20
Table 7: Summary of logit model results across four time periods
Variables Midnight Morning Afternoon Evening
Coef z-test Coef z-test Coef z-test Coef z-test
Time-fare trade-offs
Fare-solo-mean ($) -3.368** -131.3 -3.625** -76.8 -4.209** -81.2 -3.397** -81.2
Fare-shared-mean ($) -3.654** -127.2 -3.940** -76.1 -4.664** -80.3 -3.808** -82.2
WT-mean (minute) -2.022** -113.5 -2.031** -69.0 -2.209** -72.5 -1.779** -75.5
IVTT-solo-mean (minute) -1.981** -71.1 -2.020** -55.4 -2.256** -56.7 -1.622** -66.9
IVTT-shared-mean (minute) -2.120** -88.8 -2.151** -64.9 -2.410** -67.6 -1.790** -74.2
Fare-solo-std ($) / 0.1 / 0.0 / 0.8 / 0.1
Fare-shared-std ($) / 0.1 / 0.0 / 0.1 / 0.0
WT-std (minute) 0.453** 59.9 0.175** 9.8 0.353** 29.4 0.158** 8.4
IVTT-solo-std (minute) / 0.0 / 0.2 / 0.8 / 0.2
IVTT-shared-std (minute) / 0.0 / 0.2 / 0.4 / 0.1
Traffic
Trip distance (mile) -0.868** -43.5 -0.750** -24.4 -0.913** -25.4 -0.771** -37.7
Demand (10 trips) 0.029** 22.3 0.069** 10.8 0.127** 22.8 0.017** 10.3
Speed (mph) 0.312** 47.7 0.150** 16.1 0.381** 26.3 0.263** 35.9
Unreliability (mph) -0.468** -29.1 -0.222** -10.4 -0.834** -27.2 -0.505** -30.2
Commuting time-O (minute) -0.057** -9.9 -0.033** -4.5 -0.047** -5.8 -0.040** -9.4
Commuting time-D (minute) -0.040** -7.4 -0.044** -5.8 -0.047** -5.9 -0.025** -6.0
NoVehicle rate-O (%) / 0.8 -0.027** -7.0 -0.018** -4.2 / -0.2
NoVehicle rate-D (%) / -0.3 -0.010* -2.1 -0.015** -3.4 / -1.4
Sociodemographic
Median income²-O ($10,000) -0.029** -8.4 -0.020** -4.5 -0.010* -2.2 -0.012** -4.9
Median income²-D ($10,000) -0.034** -9.9 -0.010* -2.6 / 0.1 -0.020** -8.0
Median income-O ($10,000) 0.452** 8.4 0.375** 5.4 0.293** 4.0 0.189** 4.8
Median income-D ($10,000) 0.557** 10.4 0.243** 3.5 / 0.6 0.326** 8.1
Female rate-O (%) 0.068** 13.8 0.067** 10.9 0.081** 11.4 0.041** 11.1
Female rate-D (%) 0.083** 18.7 0.045** 7.0 0.061** 8.7 0.044** 12.5
Young rate-O (%) 0.018** 7.1 0.024** 7.3 0.018** 4.9 0.011** 6.1
Young rate-D (%) 0.019** 7.6 0.020** 5.8 0.020** 5.5 0.019** 10.3
White rate-O (%) -0.024** -11.4 -0.035** -13.1 -0.035** -11.5 -0.017** -10.4
White rate-D (%) -0.030** -14.9 -0.026** -9.0 -0.035** -11.5 -0.025** -15.6
Ethnicity mixture-O / 0.4 / 1.8 0.003** 2.8 / 0.4
Ethnicity mixture-D 0.007** 9.8 / -0.2 0.003** 3.0 0.002** 4.9
Crime density-O (100/mile²) -0.013** -7.6 -0.010** -4.2 -0.008** -3.2 -0.005** -4.0
Crime density-D (100/mile²) -0.022** -13.3 -0.013** -6.0 -0.016** -6.4 -0.015** -11.7
Built environment
Residential area ratio-O (%) 0.423** 7.5 0.475** 6.5 0.445** 5.4 0.461** 11.3
Residential area ratio-D (%) 0.733** 12.9 / -0.3 0.368** 4.6 0.426** 10.6
Land use mixture-O 1.613** 13.2 0.390* 2.3 0.517** 2.9 0.351** 4.1
Land use mixture-D 0.280* 2.3 / 0.2 / 0.7 0.255** 2.9
Bus-O (count) -0.005** -3.0 -0.008** -3.8 -0.012** -5.2 -0.002* -2.0
Bus-D (count) -0.008** -4.9 / -0.9 / -1.5 -0.006** -4.7
Metro-O (count) 0.032** 4.2 / -1.5 / -1.8 0.020** 4.0
Metro-D (count) / 1.4 0.020* 2.2 0.030** 2.9 0.015** 2.8
Distance to center-O (mile) -0.213** -10.9 -0.166** -5.8 -0.121** -4.0 -0.136** -9.4
Distance to center-D (mile) -0.151** -7.8 -0.161** -6.4 -0.251** -8.5 -0.147** -10.0
Constant -9.090** -14.6 -1.620* -2.2 -2.519** -3.0 -4.298** -10.0
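* and ** denote statistical significance at the 95% and 99% levels, respectively; / denotes an insignificant coefficient (only its z-statistic is shown); -O and -D denote the origin and destination zones; WT: waiting time; IVTT: in-vehicle travel time.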
metropolitan areas (Zhong et al., 2020), and slightly higher than those reported in some prior survey-based studies. For example, the VOT was found to be around $28.77/h for commuting trips and $23.27/h for leisure trips in Dallas, TX (Lavieri and Bhat, 2019), and between €7.78 and €26.25/h ($8.56 to $28.88/h at the 2020 exchange rate) in Dutch cities (Alonso-González et al., 2021). The difference can stem from two sources. First, the higher income in Manhattan generally leads to a higher VOT among passengers. Second, passengers may behave differently in real choices than in hypothetical SP experiments: when facing actual choice scenarios, they may show less fare sensitivity and lower tolerance of delay.
Table 8: Summary of the estimated value of time ($/h)
Midnight Morning Afternoon Evening
WT-solo 36.0 33.6 31.5 31.4
WT-shared 33.2 30.9 28.4 28.0
IVTT-solo 35.3 33.4 32.2 28.7
IVTT-shared 34.8 32.8 31.0 28.2
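The values in Table 8 can be reproduced as the ratio of each time coefficient to the corresponding fare coefficient in Table 7, converted from $/min to $/h. The sketch below assumes this ratio-of-means definition (the fare standard deviations are insignificant, so the ratio of means should be a close approximation); the dictionary layout and function names are illustrative. Because the published coefficients are rounded to three decimals, the results agree with Table 8 only to within about 0.1 $/h.

```python
# Mean coefficients from Table 7 (random coefficient logit model).
# Assumption: VOT ($/h) = 60 * beta_time / beta_fare, using the mean coefficients.
COEFS = {
    "Midnight":  {"fare_solo": -3.368, "fare_shared": -3.654, "wt": -2.022, "ivtt_solo": -1.981, "ivtt_shared": -2.120},
    "Morning":   {"fare_solo": -3.625, "fare_shared": -3.940, "wt": -2.031, "ivtt_solo": -2.020, "ivtt_shared": -2.151},
    "Afternoon": {"fare_solo": -4.209, "fare_shared": -4.664, "wt": -2.209, "ivtt_solo": -2.256, "ivtt_shared": -2.410},
    "Evening":   {"fare_solo": -3.397, "fare_shared": -3.808, "wt": -1.779, "ivtt_solo": -1.622, "ivtt_shared": -1.790},
}


def value_of_time(beta_time: float, beta_fare: float) -> float:
    """VOT in $/h from a time coefficient (per minute) and a fare coefficient (per dollar)."""
    return 60.0 * beta_time / beta_fare


def vot_table(coefs):
    """Compute the four VOT rows of Table 8 for every time period."""
    table = {}
    for period, b in coefs.items():
        table[period] = {
            # The single WT coefficient is divided by each option's own fare coefficient.
            "WT-solo": value_of_time(b["wt"], b["fare_solo"]),
            "WT-shared": value_of_time(b["wt"], b["fare_shared"]),
            "IVTT-solo": value_of_time(b["ivtt_solo"], b["fare_solo"]),
            "IVTT-shared": value_of_time(b["ivtt_shared"], b["fare_shared"]),
        }
    return table


if __name__ == "__main__":
    for period, row in vot_table(COEFS).items():
        print(period, {k: round(v, 1) for k, v in row.items()})
```

The shared-option VOTs come out lower than the solo ones in every period because the shared fare coefficient is larger in magnitude while the time coefficients differ less.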
In terms of traffic-related variables, trip distance shows a consistently negative effect on WTS, i.e. passengers are more willing to choose the ridesplitting service for shorter trips, which is opposite to the previous aggregate-level finding (Xu et al., 2021). This may be because passengers on long trips are more sensitive to potential detours, or because they expect shorter trips to have a lower chance of being matched in reality, so that they can simply enjoy a discounted fare without the detour. In our data, the match probability is 67.7% for longer-than-average trips but only 53.6% for shorter trips, which is consistent with the finding of Zhu et al. (2020) based on a theoretical model.
In general, travel demand is found to slightly encourage the choice of ridesplitting, which may reflect passengers’ attempts to overcome the difficulty of hailing a vehicle when demand is high (Huang et al., 2021). Higher road speed increases WTS, whereas speed unreliability decreases it. Both findings are intuitive: a higher speed lessens the travel delay caused by detours, whereas speed variability is likely to raise concerns about travel time unreliability, echoing the unpunctuality issue cited as one of the top concerns about ridesplitting in prior studies (Wang et al., 2019). The influence of commuting time on WTS is significant and consistent across the four periods, with areas of long commuting time associated with lower WTS. Generally, every additional minute of commuting time decreases the odds of ridesplitting by about 4%. Household vehicle availability at the zone level is less influential: the NoVehicle rate is insignificant at midnight and in the evening and only weakly negative in the morning and afternoon.
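In a logit model, a coefficient β on a variable x means that raising x by one unit multiplies the odds of choosing ridesplitting by exp(β), all else equal. The sketch below applies this to the commuting-time coefficients in Table 7, assuming each variable enters the shared option's utility (i.e. the utility difference) linearly; the helper name is illustrative. It gives decreases of roughly 2.5-5.5% per extra minute, about 4% on average. The same conversion underlies the crime-density effects discussed next (e.g. exp(-0.022) - 1 ≈ -2.2% at the destination at midnight).

```python
import math

# Commuting-time coefficients (per minute) from Table 7; "O" = origin zone, "D" = destination zone.
COMMUTING_TIME = {
    "Midnight": {"O": -0.057, "D": -0.040},
    "Morning": {"O": -0.033, "D": -0.044},
    "Afternoon": {"O": -0.047, "D": -0.047},
    "Evening": {"O": -0.040, "D": -0.025},
}


def odds_change_pct(beta: float, delta: float = 1.0) -> float:
    """Percentage change in the odds of ridesplitting when a variable rises by `delta` units, all else equal."""
    return (math.exp(beta * delta) - 1.0) * 100.0


if __name__ == "__main__":
    changes = []
    for period, zones in COMMUTING_TIME.items():
        for zone, beta in zones.items():
            pct = odds_change_pct(beta)
            changes.append(pct)
            print(f"{period:<9} {zone}: {pct:+.2f}% per extra minute")
    print(f"average: {sum(changes) / len(changes):+.2f}%")
```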
Regarding the sociodemographic variables, both median income and its squared term are considered. Median income has a positive coefficient, whereas its squared term has a negative one, which indicates that the influence of income on WTS is non-linear: WTS first increases with income and then decreases. The turning point is at around $80,000-90,000 of annual income, which is approximately the middle-class income of NYC residents. Trips from and to zones with a higher proportion of female and young residents have a higher WTS. Although these zonal variables cannot directly capture the sociodemographic characteristics of the individual decision-maker, they still reflect general trends. The findings that female and young passengers are more willing to share rides are also consistent with prior studies (Cahyo et al., 2019; Soltani et al., 2021; Lavieri and Bhat, 2019). In terms of ethnic composition, the proportion of the white population has a negative influence on WTS, indicating that communities with a larger share of white residents generally have lower WTS. Crime density, as a proxy for safety risk, is also found to reduce WTS. Its influence is most statistically pronounced at the destination at midnight and in the evening: every additional 100 crimes/mile² in the destination zone decreases the odds of ridesplitting by about 2.17% and 1.49% in the midnight and evening periods, respectively.
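With a utility contribution of β1·I + β2·I² for median income I (in $10,000), the contribution peaks at I* = -β1 / (2β2) whenever β2 < 0. The sketch below applies this to the Table 7 coefficients (the afternoon destination terms are insignificant and omitted); names are illustrative. It gives turning points of roughly $78,000-94,000 for most period-zone combinations, but noticeably higher values for morning destinations (about $121,500) and afternoon origins (about $146,500), where the squared term is weaker.

```python
# Linear and squared median-income coefficients from Table 7 (income measured in $10,000).
INCOME_TERMS = {
    ("Midnight", "O"): (0.452, -0.029),
    ("Midnight", "D"): (0.557, -0.034),
    ("Morning", "O"): (0.375, -0.020),
    ("Morning", "D"): (0.243, -0.010),
    ("Afternoon", "O"): (0.293, -0.010),
    ("Evening", "O"): (0.189, -0.012),
    ("Evening", "D"): (0.326, -0.020),
}


def turning_point(beta_linear: float, beta_squared: float) -> float:
    """Annual income (in $) at which beta_linear*I + beta_squared*I**2 is maximised."""
    if beta_squared >= 0:
        raise ValueError("a maximum exists only when the squared term is negative")
    return -beta_linear / (2.0 * beta_squared) * 10_000


if __name__ == "__main__":
    for (period, zone), (b1, b2) in INCOME_TERMS.items():
        print(f"{period:<9} {zone}: ${turning_point(b1, b2):,.0f}")
```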
Regarding the built environment, the proportion of residential area in origin and destination zones is found to have a strong positive effect on WTS, indicating that ridesplitting trips may mostly start and/or end in residential areas. Meanwhile, a high land use mixture is associated with higher adoption of ridesplitting, consistent with prior findings (Alemi et al., 2018; Yang et al., 2021). The number of bus stations in a zone has a slightly negative influence on WTS, while the number of metro stations slightly increases WTS. In this study, the urban centre is defined as the Times Square and Grand Central area of Manhattan, and trips from or to zones further away from the centre are found to be less likely to be shared (Brown, 2020; Soltani et al., 2021).
4.4. Model comparison
The model goodness-of-fit is shown in Table 9. To compare the relative contribution of individual-level trip attributes and aggregate-level contextual factors (e.g. overall demand, sociodemographic characteristics, and built environment), we also estimate models with only one group of variables and compare their performance with that of the complete model. For all time periods, the complete model achieves the highest pseudo R² and recall, while its precision and F1 score are nearly identical to those of the model with only trip attributes (which even has slightly higher precision). The model with only trip attributes thus retains most of the explanatory power of the complete model, suggesting that the trade-off between trip time and fare is the most influential determinant of WTS, consistent with prior findings (Alonso-González et al., 2021). In comparison, the aggregate-level variables are less important but still moderately influential, with a pseudo R² of about 0.16-0.20 when used alone. Incorporating these variables not only improves model fit but also provides a more comprehensive understanding of the diverse determinants of WTS.
Table 9: Evaluation of model goodness-of-fit
Metric / Model               Midnight  Morning  Afternoon  Evening
McFadden’s pseudo R²
  Complete                   0.684     0.820    0.827      0.771
  Trip attributes only       0.659     0.813    0.816      0.760
  Contextual factors only    0.191     0.196    0.187      0.161
Precision
  Complete                   0.977     0.988    0.989      0.985
  Trip attributes only       0.978     0.991    0.991      0.988
  Contextual factors only    0.857     0.814    0.837      0.824
Recall
  Complete                   0.892     0.941    0.944      0.927
  Trip attributes only       0.889     0.940    0.941      0.923
  Contextual factors only    0.713     0.760    0.759      0.731
F1 score
  Complete                   0.932     0.964    0.966      0.955
  Trip attributes only       0.931     0.965    0.966      0.955
  Contextual factors only    0.779     0.786    0.796      0.775
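For reference, the metrics in Table 9 follow standard definitions: McFadden’s pseudo R² = 1 - LL(model) / LL(null), and precision, recall, and F1 compare predicted with observed choices. The sketch below computes them on hypothetical toy predictions. The treatment of the shared option as the positive class, the 0.5 cut-off for turning probabilities into predicted choices, and the equal-probability null model are all assumptions, since this section does not state the exact conventions used.

```python
import math
from typing import Sequence, Tuple

# Hypothetical toy data: 1 = shared option chosen (assumed positive class), with predicted P(shared).
Y = [1, 1, 0, 0, 1]
P = [0.9, 0.4, 0.2, 0.6, 0.8]


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Binary log-likelihood of observed choices y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y: Sequence[int], p: Sequence[float], p_null: float = 0.5) -> float:
    """1 - LL(model) / LL(null); the null model assigns p_null to every trip (assumption)."""
    return 1.0 - log_likelihood(y, p) / log_likelihood(y, [p_null] * len(y))


def precision_recall_f1(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> Tuple[float, float, float]:
    """Classification metrics after labelling a trip as shared when p >= threshold (assumed cut-off)."""
    pred = [1 if pi >= threshold else 0 for pi in p]
    tp = sum(1 for yi, ki in zip(y, pred) if yi == 1 and ki == 1)
    fp = sum(1 for yi, ki in zip(y, pred) if yi == 0 and ki == 1)
    fn = sum(1 for yi, ki in zip(y, pred) if yi == 1 and ki == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(f"pseudo R2: {mcfadden_r2(Y, P):.3f}")
    print("precision, recall, F1:", [round(m, 3) for m in precision_recall_f1(Y, P)])
```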
Beyond the interpretability of the model results, we further demonstrate the model’s superior predictive performance compared with aggregate-level baselines. Unlike our proposed model, which analyses each individual choice scenario, aggregate-level models require trip attributes aggregated at the origin-destination-time period level and directly predict WTS as the ratio between the number of shared trips and all trips. Both linear regression and XGBoost are used to learn the aggregate-level relationships: the former is a representative linear model, and the latter is a powerful machine learning method for capturing non-linear relationships. In contrast, our model predicts the choice of each individual, and the choice probabilities for the shared option can be aggregated at the origin-destination-time period level and compared with the actual WTS. R², RMSE, and MAPE are adopted for model evaluation and comparison, with the results summarised in Table 10. Among all models, linear regression performs the worst, while XGBoost performs considerably better, indicating that the relationships between the determinants and WTS are likely to be non-linear. The RMSE values of the linear regression and XGBoost models are similar to those reported in a prior aggregate-level study conducted in Chicago (Xu et al., 2021). Our proposed model achieves the best performance in predicting WTS, with R² exceeding 0.9 in all periods and RMSE and MAPE below 0.05 in all periods except midnight. Both previous research (Alonso-González et al., 2021; Lavieri and Bhat, 2019) and our findings (Table 7) underscore the significance of individual time-fare trade-offs in mode choice. Aggregate-level models apply average values and thus obscure individual variations in trip time and fare, whereas our approach incorporates these variations and models the trade-off process more effectively. These results therefore suggest that individual-level trip attributes contribute to the improved predictive performance.
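The aggregation step described above can be sketched as follows: individual predicted probabilities of choosing the shared option are averaged within each origin-destination-time period cell to give a predicted WTS, which is then compared with the observed share of shared trips in that cell. The record layout, cell labels, and toy numbers are hypothetical. R² is computed here as the coefficient of determination (1 - SS_res / SS_tot), which is an assumption about the exact definition used, and MAPE as the mean of |error| / observed, which requires every observed WTS to be non-zero.

```python
import math
from collections import defaultdict

# Hypothetical trip records: (cell, chose_shared, predicted P(shared)); a cell stands for an
# origin-destination-time period combination (labels here are placeholders).
TRIPS = [
    ("A", 1, 0.3), ("A", 0, 0.2), ("A", 0, 0.2), ("A", 0, 0.1),
    ("B", 1, 0.6), ("B", 0, 0.4),
    ("C", 1, 0.9), ("C", 1, 0.8), ("C", 1, 0.7), ("C", 0, 0.6),
]


def aggregate_wts(trips):
    """Return {cell: (observed WTS, predicted WTS)}: observed = shared trips / all trips,
    predicted = mean individual probability of choosing the shared option."""
    n = defaultdict(int)
    n_shared = defaultdict(int)
    prob_sum = defaultdict(float)
    for cell, chose_shared, p_shared in trips:
        n[cell] += 1
        n_shared[cell] += int(chose_shared)
        prob_sum[cell] += p_shared
    return {cell: (n_shared[cell] / n[cell], prob_sum[cell] / n[cell]) for cell in n}


def r2_rmse_mape(pairs):
    """R^2, RMSE, and MAPE over (observed, predicted) pairs; MAPE assumes observed > 0."""
    actual = [a for a, _ in pairs]
    pred = [p for _, p in pairs]
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / len(actual))
    mape = sum(abs(a - p) / a for a, p in zip(actual, pred)) / len(actual)
    return r2, rmse, mape


if __name__ == "__main__":
    cells = aggregate_wts(TRIPS)
    r2, rmse, mape = r2_rmse_mape(list(cells.values()))
    print(f"R2={r2:.3f}  RMSE={rmse:.3f}  MAPE={mape:.3f}")
```

Averaging probabilities rather than counting predicted labels keeps the aggregated prediction on the same continuous scale as the observed ratio, which is what the aggregate-level baselines predict directly.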
In summary, the comparison demonstrates that our proposed model can not only provide meaningful insights into the determinants of WTS but also accurately predict WTS.
Table 10: Comparison of model ability to predict WTS
Metric / Model               Midnight  Morning  Afternoon  Evening
R²
  Proposed model             0.903     0.981    0.980      0.968
  XGBoost                    0.865     0.793    0.875      0.900
  Linear regression          0.749     0.697    0.735      0.744
RMSE
  Proposed model             0.063     0.030    0.029      0.036
  XGBoost                    0.081     0.103    0.081      0.068
  Linear regression          0.113     0.127    0.116      0.110
MAPE
  Proposed model             0.072     0.041    0.035      0.042
  XGBoost                    0.116     0.166    0.128      0.095
  Linear regression          0.198     0.304    0.229      0.210
5. Conclusion
Having a comprehensive understanding of the determinants of individuals’ willingness to share rides is key to promoting ridesplitting services. Prior studies have explored this question either through individual-level, survey-based SP experiments or through aggregate-level, data-driven regression analysis. However, both types of methods have limitations, such as the costly and hypothetical nature of the former and the tendency of the latter to mask the influence of individual-level trip attributes. To fill these gaps, this study introduces an individual-level big data approach to comprehensively modelling the determinants of WTS. Specifically, we propose to utilise the observed trip fare and time of the chosen option, as well as the relationships between solo and shared trips, to impute (and validate) the likely trip fare and time of the unobserved alternative option. These individual-level trip attributes are then integrated with aggregate-level contextual factors (e.g. socioeconomic characteristics, built environment and traffic features) by combining multiple data sources. Finally, a series of discrete choice models are estimated to explain the effects of different variables on the choice of ridesplitting. The detailed process of data preparation, imputation, validation, and modelling is tested using real-world TNC data from Manhattan, NYC, but the methods are general and can be adapted to different cities and data sources.
For trip time and fare imputation, we estimate the detour and discount ratios separately using a prior empirical law (Ke et al., 2021) and a DNN based on known TNC pricing schemes. The imputation results are further validated through several indirect methods, including distribution similarity analysis, DNN prediction performance evaluation, and comparison between matched and unmatched trips. With the imputed trip fare and time, we can then model the individual choice between solo and shared rides using a discrete choice model. Using Manhattan as a case study, the model results reveal considerable spatial and temporal heterogeneity in WTS. Spatially, WTS is highest in the Downtown and Midtown East areas, while the Midtown Core and remote North Harlem areas are associated with the lowest WTS. Temporally, WTS is highest at midnight and lowest in the morning, especially during the morning rush hour, followed by a steady increase later in the afternoon and evening. The time-fare trade-off is found to have the strongest impact on WTS. The VOT associated with the ridesplitting choice is estimated at about $35/h, $33/h, $31/h, and $28/h during the midnight, morning, afternoon, and evening periods, respectively. In addition, aggregate-level contextual factors also play a role. Relatively shorter trips have higher WTS. A higher average road speed lessens the influence of detours and increases WTS, but speed variability can increase travel time unreliability and reduce the probability of choosing ridesplitting. A longer commuting time, higher crime density, larger proportion of the white population, longer distance to the urban centre, and more bus stations are found to reduce WTS, whereas a higher proportion of middle-class, female and young residents and more mixed land use can encourage the adoption of ridesplitting. Beyond its interpretability, our model also predicts WTS more accurately than the aggregate-level baselines.
On the basis of these findings, several implications can be drawn for the development of ridesplitting services. First, since the time-fare trade-off plays the most fundamental role in choice behaviour, the most practical way for TNCs to promote ridesplitting is to lower the shared fare (or increase the solo fare) and limit the travel delay (or detour). For example, higher discounts, periodic vouchers, and low-carbon rewards for ridesplitting trips can be adopted (Li et al., 2023). Additionally, TNCs might consider making ridesplitting the default option, where appropriate, to increase its adoption (He et al., 2023). For local governments, the attractiveness of ridesplitting services can be enhanced by enlarging fare savings through differential congestion surcharges, tolling fees, or other government taxes, without interfering with the pricing strategies of TNCs. Meanwhile, assembly stations or HOV lanes could be planned and designed to reduce the potential detour for shared trips, as some studies have suggested (Séjourné et al., 2018; Sperling, 2018). Apart from these strategies targeting trip fare and time, auxiliary policies can be designed to provide a more attractive environment for ridesplitting, which may include planning more mixed land use, reducing commuting time and urban crime, and increasing traffic speed and travel time reliability. However, the potential competitive relationship between ridesplitting and public transit should be considered to avoid an undesired modal shift away from the latter. Meanwhile, since female and young passengers are shown to have higher WTS, both TNCs and local governments should take precautions to ensure their safety and alleviate their risk concerns.
There are also some limitations to be addressed in future research. First, due to data limitations, we cannot directly validate the imputed fare and time at the individual trip level. Instead, we only show how indirect methods of validation can provide reasonable confidence in the imputation results. A more direct validation would require a more detailed data source capturing the actual trip fare and time presented to passengers for both the solo and shared options. Second, some sociodemographic factors, such as gender, income, and ethnicity, are only evaluated at the aggregate level. As mentioned earlier, data aggregation may mask the actual variation of these factors and lead to underestimated effects. Third, owing to the model specification adopted, potential non-linearity in the influence of most variables is not examined in this study. In future research, more sophisticated machine learning models and explainable AI techniques can be used to uncover and interpret the potentially nonlinear and interactive effects of different influencing factors on WTS.
Acknowledgements
This research is supported by the National Natural Science Foundation of China (NSFC) Young Scientists Fund (42201502), Joint Programming Initiative Urban Europe and NSFC (71961137003), Guangdong Science and Technology Department (2020B12120300), and Chan To-Haan Endowed Professorship Fund and Seed Fund for Basic Research for New Staff (104006019) at The University of Hong Kong.
References
Abkarian, H., Chen, Y., and Mahmassani, H. S. (2022). Understanding ridesplitting behavior with interpretable machine learning models using Chicago transportation network company data. Transportation Research Record, 2676, 83–99.
Alemi, F., Circella, G., Handy, S., and Mokhtarian, P. (2018). What influences travelers to use Uber? Exploring the factors affecting the adoption of on-demand ride services in California. Travel Behaviour and Society, 13, 88–104.
Alonso-González, M. J., Cats, O., van Oort, N., Hoogendoorn-Lanser, S., and Hoogendoorn, S. (2021). What are the determinants of the willingness to share rides in pooled on-demand services? Transportation, 48, 1733–1765.
Brown, A. E. (2020). Who and where rideshares? Rideshare travel and use in Los Angeles. Transportation Research Part A: Policy and Practice, 136, 120–134.
Cahyo, A., Burhan, H. et al. (2019). Mode choice model analysis between ridesourcing and ridesplitting service in DKI Jakarta. In MATEC Web of Conferences (Vol. 270, p. 03013). EDP Sciences.
Castillo, J. C., Knoepfle, D., and Weyl, G. (2017). Surge pricing solves the wild goose chase. In Proceedings of the 2017 ACM Conference on Economics and Computation (pp. 241–242).
Chen, L., Mislove, A., and Wilson, C. (2015). Peeking beneath the hood of Uber. In Proceedings of the 2015 Internet Measurement Conference (pp. 495–508).
Chen, X. M., Zahiri, M., and Zhang, S. (2017). Understanding ridesplitting behavior of on-demand ride services: An ensemble learning approach. Transportation Research Part C: Emerging Technologies, 76, 51–70.
Chen, Z., and Liu, X. C. (2021). Statistical distance-based travel-time reliability measurement for freeway bottleneck identification and ranking. Transportation Research Record, 2675, 424–438.
Ding, H., Loukaitou-Sideris, A., and Wasserman, J. L. (2022). Homelessness on public transit: A review of problems and responses. Transport Reviews, 42, 134–156.
Frei, C., Hyland, M., and Mahmassani, H. S. (2017). Flexing service schedules: Assessing the potential for demand-adaptive hybrid transit via a stated preference approach. Transportation Research Part C: Emerging Technologies, 76, 71–89.
García, I., Albelson, M., Puczkowskyj, N., Khan, S. M., and Fagundo-Ojeda, K. (2022). Harassment of low-income women on transit: A photovoice project in Oregon and Utah. Transportation Research Part D: Transport and Environment, 112, 103466.
Gargiulo, E., Giannantonio, R., Guercio, E., Borean, C., and Zenezini, G. (2015). Dynamic ride sharing service: Are users ready to adopt it? Procedia Manufacturing, 3, 777–784.
Geržinič, N., Cats, O., van Oort, N., Hoogendoorn-Lanser, S., Bierlaire, M., and Hoogendoorn, S. (2023). An instance-based learning approach for evaluating the perception of ride-hailing waiting time variability. arXiv preprint arXiv:2301.04982.
Goldszmidt, A., List, J. A., Metcalfe, R. D., Muir, I., Smith, V. K., and Wang, J. (2020). The Value of Time in the United States: Estimates from Nationwide Natural Field Experiments. Technical Report, National Bureau of Economic Research.
He, B. Y., Zhou, J., Ma, Z., Chow, J. Y., and Ozbay, K. (2020). Evaluation of city-scale built environment policies in New York City with an emerging-mobility-accessible synthetic population. Transportation Research Part A: Policy and Practice, 141, 444–467.
He, G., Pan, Y., Park, A., Sawada, Y., and Tan, E. S. (2023). Reducing single-use cutlery with green nudges: Evidence from China’s food-delivery industry. Science, 381, eadd9884.
He, Z., and Chen, P. (2021). Shared mobility: Characteristics, impacts, and improvements.
Huang, G., Qiao, S., and Yeh, A. G.-O. (2021). Spatiotemporally heterogeneous willingness to ridesplitting and its relationship with the built environment: A case study in Chengdu, China. Transportation Research Part C: Emerging Technologies, 133, 103425.
Huang, K., Liu, Z., Kim, I., Zhang, Y., and Zhu, T. (2019). Analysis of the influencing factors of carpooling schemes. IEEE Intelligent Transportation Systems Magazine, 11, 200–208.
Ke, J., Zheng, Z., Yang, H., and Ye, J. (2021). Data-driven analysis on matching probability, routing distance and detour distance in ride-pooling services. Transportation Research Part C: Emerging Technologies, 124, 102922.
LaGrange, R. L., Ferraro, K. F., and Supancic, M. (1992). Perceived risk and fear of crime: Role of social and physical incivilities. Journal of Research in Crime and Delinquency, 29, 311–334.
Lavieri, P. S., and Bhat, C. R. (2019). Modeling individuals’ willingness to share trips with strangers in an autonomous vehicle future. Transportation Research Part A: Policy and Practice, 124, 242–261.
Li, W., Pu, Z., Li, Y., and Ban, X. J. (2019). Characterization of ridesplitting based on observed data: A case study of Chengdu, China. Transportation Research Part C: Emerging Technologies, 100, 330–353.
Li, W., Pu, Z., Li, Y., and Tu, M. (2021a). How does ridesplitting reduce emissions from ridesourcing? A spatiotemporal analysis in Chengdu, China. Transportation Research Part D: Transport and Environment, 95, 102885.
Li, X., Feng, F., Wang, W., Cheng, C., Wang, T., and Tang, P. (2021b). Structure analysis of factors influencing the preference of ridesplitting. Journal of Advanced Transportation, 2021.
Li, X., Zhang, Y., Yang, Z., Zhu, Y., Li, C., and Li, W. (2023). Modeling choice behaviors for ridesplitting under a carbon credit scheme. Sustainability, 15, 12241.
Mohring, H., Schroeter, J., and Wiboonchutikula, P. (1987). The values of waiting time, travel time, and a seat on a bus. The RAND Journal of Economics, 40–56.
Moody, J., Middleton, S., and Zhao, J. (2019). Rider-to-rider discriminatory attitudes and ridesharing behavior. Transportation Research Part F: Traffic Psychology and Behaviour, 62, 258–273.
Parrott, J. A., and Reich, M. (2018). An earnings standard for New York City’s app-based drivers. New York: The New School, Center for New York City Affairs.
Santi, P., Resta, G., Szell, M., Sobolevsky, S., Strogatz, S. H., and Ratti, C. (2014). Quantifying the benefits of vehicle pooling with shareability networks. Proceedings of the National Academy of Sciences, 111, 13290–13294.
Séjourné, T., Samaranayake, S., and Banerjee, S. (2018). The price of fragmentation in mobility-on-demand services. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2, 1–26.
Shaheen, S. (2020). Going my way? The evolution of shared ride and pooling services. Transfers Magazine.
Shaheen, S., Chan, N., Bansal, A., and Cohen, A. (2015). Shared mobility: Definitions, industry developments, and early understanding. Transportation Sustainability Research Center, Innovative Mobility Research.
Shaheen, S., Cohen, A., Zohdy, I. et al. (2016). Shared mobility: Current practices and guiding principles. Technical Report, Federal Highway Administration, United States.
Small, K. A. (2012). Valuation of travel time. Economics of Transportation, 1, 2–14.
Soltani, A., Allan, A., Khalaj, F., Pojani, D., and Mehdizadeh, M. (2021). Ridesharing in Adelaide: Segmentation of users. Journal of Transport Geography, 92, 103030.
Sperling, D. (2018). Three revolutions: Steering automated, shared, and electric vehicles to a better future. Island Press.
Train, K., and Weeks, M. (2005). Discrete choice models in preference space and willingness-to-pay space. Springer.
Train, K. E. (2003). Discrete choice methods with simulation. Cambridge University Press.
Tu, M., Li, W., Orfila, O., Li, Y., and Gruyer, D. (2021). Exploring nonlinear effects of the built environment on ridesplitting: Evidence from Chengdu. Transportation Research Part D: Transport and Environment, 93, 102776.
Wang, H., and Yang, H. (2019). Ridesourcing systems: A framework and review. Transportation Research Part B: Methodological, 129, 122–155.
Wang, J., Wang, X., Yang, S., Yang, H., Zhang, X., and Gao, Z. (2021). Predicting the matching probability and the expected ride/shared distance for each dynamic ridepooling order: A mathematical modeling approach. Transportation Research Part B: Methodological, 154, 125–146.
Wang, Y., Gu, J., Wang, S., and Wang, J. (2019). Understanding consumers’ willingness to use ride-sharing services: The roles of perceived value and perceived risk. Transportation Research Part C: Emerging Technologies, 105, 504–519.
Whitehead, J. C., Pattanayak, S. K., Van Houtven, G. L., and Gelso, B. R. (2008). Combining revealed and stated preference data to estimate the nonmarket value of ecological services: An assessment of the state of the science. Journal of Economic Surveys, 22, 872–908.
Xu, Y., Yan, X., Liu, X., and Zhao, X. (2021). Identifying key factors associated with ridesplitting adoption rate and modeling their nonlinear relationships. Transportation Research Part A: Policy and Practice, 144, 170–188.
Yang, H., Liang, Y., and Yang, L. (2021). Equitable? Exploring ridesourcing waiting time and its determinants. Transportation Research Part D: Transport and Environment, 93, 102774.
Zhong, H., Li, W., Burris, M. W., Talebpour, A., and Sinha, K. C. (2020). Will autonomous vehicles change auto commuters’ value of travel time? Transportation Research Part D: Transport and Environment, 83, 102303.
Zhu, Z., Qin, X., Ke, J., Zheng, Z., and Yang, H. (2020). Analysis of multi-modal commute behavior with feeding and competing ridesplitting services. Transportation Research Part A: Policy and Practice, 132, 713–727.
Appendix A. Comparison between the fixed and random coefficient models
The results of the fixed and random coefficient models are shown in Table A.11. In terms of model goodness-of-fit, the fixed and random coefficient models exhibit similar performance across all evaluation metrics. This suggests that including random effects may not be necessary for improving the model’s predictive performance. A possible explanation is that, benefiting from millions of trip records, the fixed coefficient model already predicts choices so well that further improvement is extremely difficult. For the estimated coefficients, only those set as random are reported. Among the five random coefficients, only that of WT shows significant heterogeneity among individuals, which suggests that heterogeneity in preferences for fare and in-vehicle travel time in our trip data may not be significant. Finally, the VOT estimates from the two models are very close (differing by less than $3/h), indicating that the VOT estimated from the trip data is relatively stable across model specifications. In summary, the fixed and random coefficient models perform almost the same in terms of model fit and VOT estimation, and their coefficients have the same signs and significance; the random coefficient means are somewhat larger in magnitude, but their ratios, and hence the VOT, remain similar.
Table A.11: Comparison between results of fixed and random coefficient models
                      Midnight          Morning           Afternoon         Evening
                      Fixed    Random   Fixed    Random   Fixed    Random   Fixed    Random
Goodness-of-fit
McFadden’s pseudo R²  0.676    0.684    0.820    0.820    0.825    0.827    0.769    0.771
Precision             0.976    0.977    0.989    0.988    0.989    0.989    0.985    0.985
Recall                0.895    0.892    0.942    0.941    0.944    0.944    0.927    0.927
F1 score              0.933    0.932    0.965    0.964    0.966    0.966    0.955    0.955
Coefficient
Fare-solo (mean)      -2.670*  -3.368*  -3.434*  -3.625*  -3.448*  -4.209*  -3.259*  -3.397*
Fare-shared (mean)    -2.984*  -3.654*  -3.730*  -3.940*  -3.831*  -4.664*  -3.640*  -3.808*
WT (mean)             -1.513*  -2.022*  -1.920*  -2.031*  -1.779*  -2.209*  -1.713*  -1.779*
IVTT-solo (mean)      -1.475*  -1.981*  -1.892*  -2.020*  -1.914*  -2.256*  -1.516*  -1.622*
IVTT-shared (mean)    -1.589*  -2.120*  -2.023*  -2.151*  -2.020*  -2.410*  -1.690*  -1.790*
Fare-solo (std)       N/A      0.000    N/A      0.000    N/A      0.002    N/A      0.000
Fare-shared (std)     N/A      0.000    N/A      0.000    N/A      0.000    N/A      0.000
WT (std)              N/A      0.453*   N/A      0.175*   N/A      0.353*   N/A      0.158*
IVTT-solo (std)       N/A      0.000    N/A      0.000    N/A      0.001    N/A      0.000
IVTT-shared (std)     N/A      0.000    N/A      0.000    N/A      0.000    N/A      0.000
VOT ($/h)
WT-solo               34.0     36.0     33.6     33.6     31.0     31.5     31.5     31.4
WT-shared             30.4     33.2     30.9     30.9     27.9     28.4     28.2     28.0
IVTT-solo             33.2     35.3     33.1     33.4     33.3     32.2     27.9     28.7
IVTT-shared           31.9     34.8     32.5     32.8     31.6     31.0     27.9     28.2
* denotes statistical significance at the 99% level; N/A: not applicable.
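To illustrate what the random coefficient specification adds, the sketch below compares the logit probability of choosing the shared option evaluated at the mean WT coefficient with the simulated random coefficient probability, i.e. the logit probability averaged over draws of a normally distributed WT coefficient (Train, 2003). WT is the only coefficient with significant heterogeneity in Table A.11. The midnight coefficients come from Table 7, but the utility structure (a single WT coefficient shared by both options, alternative-specific fare and IVTT coefficients, contextual variables and the constant omitted), the attribute values (assumed to be on the same scaled units used in estimation), and all function names are illustrative assumptions.

```python
import math
import random

# Midnight coefficients from Table 7; only WT has a random (normal) coefficient.
MIDNIGHT = {"fare_solo": -3.368, "fare_shared": -3.654, "wt_mean": -2.022, "wt_std": 0.453,
            "ivtt_solo": -1.981, "ivtt_shared": -2.120}

# Hypothetical attribute values for one trip (assumed to use the scaled units of estimation).
ATTRS = {"fare_solo": 2.0, "wt_solo": 1.0, "ivtt_solo": 2.0,
         "fare_shared": 1.5, "wt_shared": 1.2, "ivtt_shared": 2.5}


def logit_p_shared(v_shared: float, v_solo: float) -> float:
    """Binary logit probability of choosing the shared option."""
    return 1.0 / (1.0 + math.exp(v_solo - v_shared))


def utilities(beta_wt: float, attrs: dict, coefs: dict) -> tuple:
    """Systematic utilities (shared, solo) from fare and time only; other terms are omitted."""
    v_solo = (coefs["fare_solo"] * attrs["fare_solo"] + beta_wt * attrs["wt_solo"]
              + coefs["ivtt_solo"] * attrs["ivtt_solo"])
    v_shared = (coefs["fare_shared"] * attrs["fare_shared"] + beta_wt * attrs["wt_shared"]
                + coefs["ivtt_shared"] * attrs["ivtt_shared"])
    return v_shared, v_solo


def p_shared_at_mean(attrs: dict, coefs: dict) -> float:
    """Logit probability evaluated at the mean WT coefficient."""
    return logit_p_shared(*utilities(coefs["wt_mean"], attrs, coefs))


def p_shared_mixed(attrs: dict, coefs: dict, n_draws: int = 10_000, seed: int = 0) -> float:
    """Simulated probability: average the logit probability over draws of beta_WT ~ N(mean, std)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta_wt = rng.gauss(coefs["wt_mean"], coefs["wt_std"])
        total += logit_p_shared(*utilities(beta_wt, attrs, coefs))
    return total / n_draws


if __name__ == "__main__":
    print(f"at mean WT coefficient: {p_shared_at_mean(ATTRS, MIDNIGHT):.4f}")
    print(f"simulated (random WT):  {p_shared_mixed(ATTRS, MIDNIGHT):.4f}")
```

In this toy case the two probabilities differ only slightly, because the waiting-time gap between the options is small and the estimated standard deviation of the WT coefficient is modest relative to its mean. This is in line with, though not proof of, the near-identical fit of the two models in Table A.11.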
29
Article
Full-text available
Ride-hailing enjoys global popularity as a door-to-door mobility service, but its pick-up and drop-off inefficiencies and resulting environmental costs are often overlooked. This research examines the potential efficiency and environmental benefits of a flexible pick-up and drop-off (PUDO) strategy, which seeks to improve the routing efficiency by shifting the PUDO locations within short walking distance. A two-stage heuristic method is developed to identify suitable PUDO locations while considering both the detour and congestion factors. Using DiDi Chuxing trip data from central Chengdu, China, we find that 8%–23% of trips can be improved, reducing driving distance and time by up to 15%, and saving 11.6–51.1 GJ of energy consumption and 0.8–3.7 tons of CO emissions in a single working day. Further built environment analysis reveals that areas with one-way and narrow streets, more diverse land use, higher income population, and better public transportation accessibility would benefit more from the proposed strategy. These insights highlight the potential of integrating walking into urban mobility solutions for improved efficiency and environmental sustainability.
Article
Ridesplitting, a form of shared ridesourcing service, has the potential to significantly reduce emissions. However, its current adoption rate among users remains relatively low. Policies such as carbon credit schemes, which offer rewards for emission reduction, hold great promise in promoting ridesplitting. This study aimed to quantitatively analyze the choice behaviors for ridesplitting under a carbon credit scheme. First, both the socio-demographic and psychological factors that may influence the ridesplitting behavioral intention were identified based on the theory of planned behavior, technology acceptance model, and perceived risk theory. Then, a hybrid choice model of ridesplitting was established to model choice behaviors for ridesplitting under a carbon credit scheme by integrating both structural equation modeling and discrete choice modeling. Meanwhile, a stated preference survey was conducted to collect the socio-demographic and psychological information and ridesplitting behavioral intentions of transportation network company (TNC) users in 12 hypothetical scenarios with different travel distances and carbon credit prices. Finally, the model was evaluated based on the survey data. The results show that attitudes, subjective norms, perceived behavioral control, low-carbon values, and carbon credit prices have significant positive effects on the choice behavior for ridesplitting. Specifically, increasing the carbon credit price could raise the probability of travelers choosing ridesplitting. In addition, travelers with higher low-carbon values are usually more willing to choose ridesplitting and are less sensitive to carbon credit prices. The findings of this study indicate that a carbon credit scheme is an effective means to incentivize TNC users to choose ridesplitting.
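A hybrid choice model couples a structural equation part (latent attitudes) with a discrete choice part. The sketch below shows only a hypothetical discrete-choice half: a binary logit in which the latent low-carbon value score is treated as already estimated, the carbon credit price raises the utility of ridesplitting, and a negative price × low-carbon-value interaction makes high-value travellers less price sensitive. Every coefficient is invented for illustration and none comes from the study.

```python
import math


def p_ridesplit(price: float, lcv: float,
                asc: float = -1.0, b_price: float = 0.8,
                b_lcv: float = 0.6, b_inter: float = -0.15) -> float:
    """Binary logit probability of choosing ridesplitting over a solo ride.

    price: carbon credit price offered for the shared trip (hypothetical units).
    lcv:   latent low-carbon value score, assumed already estimated by the SEM part.
    All coefficients are illustrative, not estimates from the study.
    """
    # The negative interaction shrinks the utility slope in price as lcv rises.
    utility = asc + b_price * price + b_lcv * lcv + b_inter * price * lcv
    return 1.0 / (1.0 + math.exp(-utility))


for lcv in (0.0, 2.0):
    probs = [round(p_ridesplit(price, lcv), 3) for price in (0, 1, 2, 3)]
    print(f"lcv={lcv}: {probs}")
```

With these values the price slope in utility is 0.8 for `lcv=0` but only 0.5 for `lcv=2`, while the high-value traveller still starts from a higher probability.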
Article
More than half a million people in the U.S. experience homelessness every day. Lacking other options, many turn to transit vehicles, stops, and stations for shelter. Many also ride public transit to reach various destinations. With affordable housing scarce and the numbers of unhoused individuals often surpassing the capacities of existing safety nets and support systems, transit operators face homelessness as a pressing issue on their systems and must implement policy measures from realms beyond transportation to address it. Because of the COVID-19 pandemic's health and safety implications for transit, and the anticipated further rise in homelessness from the resulting economic downturn, studying and responding to the needs of these vulnerable travelers is critical. We conduct a comprehensive literature review to identify articles discussing homelessness in transit systems. While only a handful of articles date from the 1990s, a literature has emerged over the last 20 years that examines different aspects of homelessness in transit systems. We identify and review 63 articles on homelessness in transit systems and other public settings to better understand the extent of homelessness in the U.S. and how transit agencies perceive its impacts. We also summarise literature findings on the travel patterns of unsheltered individuals, which show that public transit represents an important and common mode of travel for them. Lastly, we focus on responses to homelessness on the part of transit operators. We find two types of responses: 1) punitive, in which criminalisation, policing and enforcement of laws and codes of conduct prevail, and 2) outreach-related, which aim to provide help and support to unsheltered individuals. We conclude by summarising our findings as well as the existing gaps in the literature.
Article
Ridesharing and the tech companies that enable it have become household names, but because research has focused on users rather than non-users, much less is known about the latter. Understanding the characteristics, behaviours, and motivations of non-users is also important, however, if the planning goal is to shift urban populations from private car ownership and use to ridesharing. This study examines both users and non-users in the context of Adelaide, an Australian metropolitan area of 1.3 million inhabitants. Segmenting potential ridesharing customers into three groups (users, interested non-users, and non-interested non-users) made it possible to investigate the individual and built-environment determinants of each group's behaviour in detail. Applying advanced statistical analyses, we find that neighbourhood density and quality, higher levels of education and income, casual work status, younger age, and access to smartphones are the key factors associated with higher ridesharing use and/or higher interest in ridesharing. Factors such as concern over safety and security, advanced age, digital illiteracy, and suburban living lead some people to shun ridesharing. Socio-demographic factors such as car ownership, ethnic background, gender, and household size are not associated with ridesharing behaviours. We conclude that the choice of ridesharing in Adelaide is driven by socioeconomic class.
Article
Rising consumer demand for online food delivery has increased the consumption of disposable cutlery, leading to plastic pollution worldwide. In this work, we investigate the impact of green nudges on single-use cutlery consumption in China. In collaboration with Alibaba's food-delivery platform, Eleme (which is similar to Uber Eats and DoorDash), we analyzed detailed customer-level data and found that the green nudges (changing the default to "no cutlery" and rewarding consumers with "green points") increased the share of no-cutlery orders by 648%. The environmental benefits are sizable: if green nudges were applied across all of China, more than 21.75 billion sets of single-use cutlery could be saved annually, equivalent to preventing the generation of 3.26 million metric tons of plastic waste and saving 5.44 million trees.
Article
Scholarship on gendered mobilities has shown that women experience transit differently than men do, particularly regarding personal safety. Few studies, however, have considered the everyday interactions of women of color with transit systems. This research employs a photovoice methodology, including in-depth interviews and text messaging, with 22 low-income women of color in Oregon and Utah who ride transit at least a few times a month. Consistent with other gender mobility research, participants described sexual harassment on buses, streetcars, and light rail, as well as while walking to or waiting for public transportation, whether in crowded downtown areas or in desolate spaces outside downtown. At the same time, this article makes a unique contribution to this body of literature by showing that women feel targeted not only because of their gender but also because of their racial or ethnic identity. The article discusses the actions women take every day to increase their sense of safety.
Article
The popularity of smartphones and the advent of GPS positioning and wireless communication technologies in the past decade have facilitated large-scale implementations of dynamic ridepooling services, such as Uber Pool, Lyft Line, and Didi Pinche. Because trips in such services usually start before a pooling partner appears, knowing the probability of being matched with another order (i.e., the matching probability), the expected detour distance, and the expected shared distance before each trip starts is essential for passengers to evaluate their willingness to pool and for ridepooling platforms to offer attractive discounts. In this paper, assuming that every ridepooling passenger shares vehicle space with at most one other passenger during the entire trip, and that ridepooling orders in each origin-destination (OD) pair appear following a Poisson process with a given rate, we propose a mathematical modeling approach to predict the matching probability, the expected ride distance, and the expected shared distance of each order under a first-come-first-serve strategy in dynamic ridepooling service. The method classifies unmatched passengers at different locations along the exclusive-riding path of each OD pair into different seeker and taker states, formulates the complex interdependency of the matching probabilities, matching rates and arrival rates of (unmatched) passengers in different states as a system of nonlinear equations, and generates the matching probabilities and expected ride/shared distances of all OD pairs simultaneously. Under the same first-come-first-serve strategy, we simulated the occurrence, movements and state transitions of ridepooling orders on a 30 × 30 grid network and on the real network of Haikou City, China. Compared with the simulation results, the proposed method generates fairly satisfactory predictions under diverse matching conditions and demand intensities.
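The full method couples many seeker and taker states in a nonlinear system, but its basic building block is a Poisson arrival assumption. The simplified sketch below isolates that block: if compatible orders arrive as a Poisson process with rate λ and a passenger stays matchable for a window T, the probability of at least one match is 1 − e^(−λT). The rate, the fixed window, and the idea of a single matchable window are hypothetical simplifications, and a Monte Carlo draw of the first (exponentially distributed) arrival time checks the closed form.

```python
import math
import random


def match_probability(rate_per_min: float, window_min: float) -> float:
    """P(at least one compatible order arrives within the window) under Poisson arrivals."""
    return 1.0 - math.exp(-rate_per_min * window_min)


def simulated_match_probability(rate_per_min: float, window_min: float,
                                trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check: the first Poisson arrival time is exponential with the same rate."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(rate_per_min) <= window_min for _ in range(trials))
    return hits / trials


rate, window = 0.2, 5.0  # hypothetical: 0.2 compatible orders/min, 5-minute matching window
print(round(match_probability(rate, window), 3))  # 0.632
print(round(simulated_match_probability(rate, window), 3))
```

This ignores the interdependency the paper models (a matched passenger leaves the pool and changes other OD pairs' rates), which is exactly why the authors solve a joint system rather than using this formula pair by pair.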
Article
The capability of ridesplitting services to address current urban transportation problems has attracted considerable research interest in recent years. Given that ridesplitting needs an adequate user base to realize “sharing”, its success depends heavily on a comprehensive understanding of people’s willingness to use it. However, previous studies have mainly examined ridesplitting willingness through individual-level questionnaire surveys, which may help transportation network companies find potential users. Few studies have examined ridesplitting willingness from the perspective of the built environment and space-time integration, because macro urban elements are difficult for individuals to perceive and thus hard to capture in questionnaires. This study therefore estimates the ridesplitting willingness rate of different areas of a city at different times of day by modelling the shared order rate and shareability from the real-world DiDi Chuxing dataset, using Chengdu, China, as a case study. A spatial lag model (SLM) is then used to examine the relationship between the willingness rate, the built environment, and transportation-related variables. Results reveal that the ridesplitting willingness rate has a significant spatiotemporal pattern between the urban centre and the urban periphery, and that its daily profile can be divided into a morning peak, noon plateau, afternoon valley, night peak, and midnight valley. The SLM results indicate that the willingness rate has a significant spatial dependency on nearby areas at the origin during daytime periods and at the destination at night. The distance to the urban centre, the distance to the railway station, travel demand, accessibility to public transit, land use mixture, and trip purpose are found to be the variables most relevant to the willingness rate, while points of interest (POIs) are almost irrelevant. The findings of this study can help promote this sustainable service and inform policies to improve service quality.
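A spatial lag model writes the outcome as y = ρWy + Xβ + ε, so each zone's willingness rate depends on the weighted average of its neighbours' rates. The sketch below shows only the deterministic part, solving y = ρWy + Xβ for a hypothetical strip of four zones with a row-standardised contiguity matrix. The zones, the Xβ values, and ρ = 0.4 are invented, and fixed-point iteration is used in place of a matrix solver so the example needs no third-party libraries.

```python
def row_standardise(adj: list[list[float]]) -> list[list[float]]:
    """Scale each row of a contiguity matrix so its weights sum to 1."""
    return [[a / sum(row) for a in row] for row in adj]


def spatial_lag(W: list[list[float]], y: list[float]) -> list[float]:
    """Wy: the weighted average of y over each zone's neighbours."""
    return [sum(w * yj for w, yj in zip(row, y)) for row in W]


def slm_fitted(W, xb, rho, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + X beta by fixed-point iteration.

    With a row-standardised W and |rho| < 1 the map is a contraction,
    so the iteration converges to (I - rho W)^-1 X beta.
    """
    y = list(xb)
    for _ in range(max_iter):
        new = [rho * lag + b for lag, b in zip(spatial_lag(W, y), xb)]
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("did not converge")


# Hypothetical: four zones in a row (1-2-3-4), each adjacent to its neighbours.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
W = row_standardise(adj)
xb = [0.30, 0.25, 0.15, 0.10]  # hypothetical non-spatial part X beta of the willingness rate
y = slm_fitted(W, xb, rho=0.4)
print([round(v, 3) for v in y])
```

A positive ρ pulls each zone towards its neighbours and amplifies the overall level: with a constant Xβ of 0.3 in every zone, every fitted value becomes 0.3 / (1 − 0.4) = 0.5.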
Article
As congestion levels increase in cities, it is important to analyze people’s choices among the different services provided by transportation network companies (TNCs). Using machine learning techniques in conjunction with large TNC data, this paper focuses on uncovering complex relationships underlying ridesplitting market share. A real-world dataset provided by TNCs in Chicago is used to analyze ridesourcing trips from November 2018 to December 2019 and to understand trends in the city. Aggregated origin–destination trip-level characteristics, such as mean cost, mean time, and travel time reliability, are extracted and combined with origin–destination community-level characteristics. Three tree-based algorithms are then used to model the market share of ridesplitting trips. The most significant factors are extracted, along with their marginal effects on ridesplitting behavior, using partial dependence plots to interpret the machine learning model results. The results suggest that, overall, community-level factors are as important as, or more important than, trip-level characteristics. Additionally, the percentage of White residents strongly affects ridesplitting market share, as do the percentage of bachelor’s degree holders and the percentage of two-person households. Travel time reliability and cost variability are also found to be more important than travel time and cost savings. Finally, the potential impact of taxes, crime, cultural differences, and comfort on market share is discussed, and suggestions are presented for future research and data collection.
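A partial dependence plot reports, for each value of one feature, the model's average prediction when that feature is forced to the value in every observation while all other features keep their observed values. The sketch below computes that average for any prediction function. `toy_model` is a hypothetical two-stump stand-in for a fitted tree ensemble, and the feature names and records are invented for illustration.

```python
from statistics import mean


def partial_dependence(model, rows, feature, grid):
    """Average prediction with `feature` forced to each grid value across all rows."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append((value, mean(preds)))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted tree ensemble (two decision stumps)."""
    share = 0.10
    share += 0.05 if row["tt_reliability"] > 0.8 else 0.0
    share += 0.03 if row["pct_bachelor"] > 0.4 else 0.0
    return share


rows = [  # hypothetical OD-pair records
    {"tt_reliability": 0.6, "pct_bachelor": 0.3},
    {"tt_reliability": 0.9, "pct_bachelor": 0.5},
    {"tt_reliability": 0.7, "pct_bachelor": 0.6},
]
for value, avg in partial_dependence(toy_model, rows, "tt_reliability", [0.5, 0.9]):
    print(value, round(avg, 3))
```

Because the toy model is additive, the curve rises by exactly the stump's 0.05 when reliability crosses 0.8; with a real ensemble the curve also averages over interactions with the other features.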
Article
Freeway bottleneck identification is an essential step in deploying mitigation strategies to reduce congestion at freeway bottlenecks. Most previous studies on bottleneck identification focus on recurrent bottlenecks, and little work has been done to locate non-recurrent bottlenecks. In this study, we therefore propose a new travel time reliability (TTR) measurement and develop a freeway bottleneck identification method based on it, which can identify with high probability not only recurrent bottlenecks but also the locations of non-recurrent bottlenecks. The TTR measurement is based on the statistical distance between travel time distributions. Three statistical distance measurements are applied in the TTR measurement: Jensen–Shannon divergence, Wasserstein distance, and Hellinger distance. The bottleneck identification method is evaluated in a case study of the I-15 freeway corridor in Salt Lake City, Utah. The three statistical distance measurements rank locations consistently by the impacts of recurrent and non-recurrent congestion, especially in extreme cases with very high or very low variation between travel time distributions. The recurrent bottlenecks identified in this study exhibit clustering, which is consistent with how recurrent congestion forms and dissipates. The locations with a high probability of non-recurrent bottlenecks are scattered both spatially and temporally, in agreement with the random nature of non-recurrent congestion.
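The abstract does not give the exact TTR formula, so the sketch below computes only the three statistical distances it names, between two travel-time histograms defined on the same equally spaced bins (for example a reference period and a target period at one location). The histograms and the 2-minute bin width are hypothetical. Jensen–Shannon uses natural logarithms, so it is bounded by ln 2; Hellinger is computed from the Bhattacharyya coefficient; the 1-D Wasserstein distance is the area between the two cumulative distributions.

```python
import math


def _normalise(hist):
    total = sum(hist)
    return [x / total for x in hist]


def jensen_shannon(p, q):
    """Jensen–Shannon divergence (natural log), bounded above by ln 2."""
    p, q = _normalise(p), _normalise(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; m_i > 0 wherever a_i > 0.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def hellinger(p, q):
    """Hellinger distance in [0, 1], via the Bhattacharyya coefficient."""
    p, q = _normalise(p), _normalise(q)
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding error


def wasserstein_1d(p, q, bin_width):
    """Earth mover's distance between histograms on the same equally spaced bins."""
    p, q = _normalise(p), _normalise(q)
    cp = cq = dist = 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        dist += abs(cp - cq) * bin_width
    return dist


# Hypothetical travel-time histograms over 2-minute bins for one freeway segment.
baseline = [60, 30, 10, 0, 0]
incident_day = [10, 20, 30, 30, 10]
for name, fn in [("JSD", jensen_shannon), ("Hellinger", hellinger)]:
    print(name, round(fn(baseline, incident_day), 3))
print("Wasserstein (min)", round(wasserstein_1d(baseline, incident_day, 2.0), 3))
```

The Wasserstein distance is expressed in minutes and grows with how far probability mass shifts, whereas the other two depend only on how much the bin probabilities overlap; that difference is one reason to compare all three.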
Article
Ridesplitting, which enables riders with similar routes to share a ridesourcing trip, is a promising transportation technology for reducing traffic congestion and air pollution. This study explores how ridesplitting reduces emissions from ridesourcing, based on GPS trajectory data from Didi Chuxing in Chengdu, China. First, the study quantifies the emission factors of both regular ridesourcing and ridesplitting trips to evaluate the emission reductions per ride-km from ridesplitting. The results show that the average emission reduction rates of CO2, CO, NOx, and HC are 28.7%, 32.5%, 27.7%, and 31.2%, respectively. A spatiotemporal analysis of the emission reductions then indicates that ridesplitting generally reduces more emissions around expressways and during peak hours. Finally, a spatial error model is adopted to analyze the effects of travel-related and built environment variables on emission reductions from ridesplitting. The trajectory overlapping rate of shared rides turns out to be the most important determinant for expanding the environmental benefits of ridesplitting.
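The reduction rate in the abstract compares emission factors per ride-km. The sketch below assumes, as an illustration only, that a trip's emission factor is its vehicle emissions divided by the total distance ridden by its passengers, and that the reduction rate is 1 − EF_split / EF_regular. The constant 150 g/km rate and the trip distances are hypothetical; the study derives emissions from GPS trajectories, where per-km emissions vary with speed.

```python
def emission_factor(vehicle_grams: float, rider_km: list[float]) -> float:
    """Grams emitted per ride-km: vehicle emissions over the total distance ridden by passengers."""
    return vehicle_grams / sum(rider_km)


def reduction_rate(ef_regular: float, ef_split: float) -> float:
    """Share of per-ride-km emissions avoided by ridesplitting."""
    return 1.0 - ef_split / ef_regular


GRAMS_PER_VEHICLE_KM = 150.0  # hypothetical constant CO2 rate; real factors vary with speed

# Regular trip: one rider, 8 km, and the vehicle drives the same 8 km.
ef_regular = emission_factor(GRAMS_PER_VEHICLE_KM * 8.0, [8.0])
# Shared trip: the vehicle drives 10 km (with detour) while riders travel 8 km and 7 km.
ef_split = emission_factor(GRAMS_PER_VEHICLE_KM * 10.0, [8.0, 7.0])
print(f"{reduction_rate(ef_regular, ef_split):.1%}")  # 33.3%
```

Under this definition the saving grows as passengers' ridden distance grows relative to the distance the vehicle drives, which is one way to read the finding that the trajectory overlapping rate matters most.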