All content in this area was uploaded by Jason Cao on May 08, 2016
Exploring the influences of density on travel behavior using
propensity score matching
Xinyu (Jason) Cao* and Yingling Fan
Humphrey School of Public Affairs, University of Minnesota
301 19th Ave S., Minneapolis, MN, 55455
*Corresponding author
Email: cao@umn.edu
Phone: 612-625-5671
Fax: 612-625-3513
Abstract
The causality issue has become one of the key questions in the debate over the relationship
between the built environment and travel behavior. Because residential self-selection exists,
it is important to know if the observed influence of the built environment on travel behavior
diminishes substantially once we control for self-selection. Using 5,537 adult respondents to the
2006 Greater Triangle Travel Survey in North Carolina, this study applied the propensity score
matching approach to identify the causal effect of density on travel behavior and the relative
contribution of self-selection to travel behavior. The results showed that, after removing self-
selection bias, residents living in high-density neighborhoods on average travel 3.31 fewer miles
per person per day than those who lived in low-density neighborhoods. Self-selection effects
account for 28%, 66%, and 49% of the observed influences of density on personal miles
travelled, driving duration, and transit duration, respectively. We also found different modeling
approaches may produce different point estimates, and interval estimates of treatment effects
tend to have a large variation. This points to a caveat of using point estimates to evaluate the
impacts of the built environment on travel behavior.
Key words: land use, smart growth, transportation, treatment effect, statistical control
1. INTRODUCTION
Recently, policy makers at the federal, state, and local levels have been considering land use
policies as a way to reduce vehicle miles traveled (VMT) and thus greenhouse gases (GHGs).
The U.S. is the largest emitter of GHGs per capita in the world and its transportation sector is a
major contributor. In 2007, transportation-related CO2 emissions accounted for 28% of GHGs
from all economic sectors (EPA, 2009). This statistic is not surprising given the notorious auto
dependence in the U.S. In the fields of transportation and planning, it is widely recognized that
the low-density and segregated-use type of development has greatly contributed to auto
dependence and its negative consequences. Therefore, changing the built environment has been
considered an important policy instrument to reduce driving. In 2008, California passed Senate
Bill 375 to reduce GHGs through regional sustainable community strategies; the 2009
U.S. HUD-DOT-EPA Interagency Partnership for Sustainable Communities recommended
directing federal funding toward existing communities (through strategies such as transit-oriented,
mixed-use development and land recycling) and providing more transportation choices.
Recent empirical studies appear to support the implementation of land use policies to reduce
VMT and GHGs. The report Growing Cooler (Ewing et al., 2008) summarized the evidence on
urban development and climate change and concluded that “it is realistic to assume a 30 percent
cut in VMT [for people in areas of] compact development” (p. 9). Driving and the Built
Environment, a major report of the Transportation Research Board (TRB, 2009), projected the
impacts of compact development on motorized travel and CO2 emissions, and concluded
that “steering 75 percent of new and replacement housing units into more compact development
and assuming that residents of compact communities will drive 25 percent less would reduce
VMT and associated fuel use and CO2 emissions of new and existing households by about 7 to 8
percent relative to base case conditions by 2030, with the gap widening to between 8 and 11
percent less by 2050” (p. 4). These studies highlight the promise of high-density and compact
development in addressing the concerns of auto dependence and GHGs.
However, in the studies mentioned above, the assumed relationships between the built
environment and travel behavior may be, at least partly, a result of residential self-selection. The
overwhelming majority of previous studies, on which the assumptions were based, are
observational studies (Cao et al., 2009). In the studies, survey respondents self-selected, rather
than being randomly assigned, to live in their residential neighborhoods. Therefore, the observed
association between the built environment and travel behavior may be spurious due to factors
antecedent to both residential and travel choices. For example, an individual who dislikes
driving may choose to live in an urban area with ample transit services and then use transit
more. In this case, it may be the preference rather than the urban area itself that causes the use of
transit. Residential self-selection generally results from both demographics and attitudinal
factors (Mokhtarian and Cao, 2008); some of the factors are observed whereas others are not.
The goal of the self-selection research is to establish whether there is a causal relationship from
the built environment to travel behavior, and ultimately to determine the magnitude of this
relationship. Such evidence provides a basis for the adoption of policies that aim to change
travel behavior by changing the built environment. The existence of self-selection does not mean
that the built environment is irrelevant. In fact, we encourage self-selection into smart growth
and similar communities as a way to reduce driving if there is an undersupply of these alternative developments.
However, to the extent self-selection exists but is not accounted for, we are likely to overestimate
the influence of the built environment when we use land use policies to reduce travel and
emissions (Mokhtarian and Cao, 2008; Pinjari et al., 2007). Therefore, the relative importance of
built environment effect and self-selection effect matters when we estimate the impacts of land
use policies on travel.
Using the 2006 Greater Triangle Travel Survey in central North Carolina, this study applies
propensity score matching to estimate the causal influence of density on travel behavior. In
particular, it answers the following questions: (1) How large is the causal influence of density on
travel behavior? (2) To what extent is the observed influence of density on travel behavior
attributable to density itself? (3) Do the results derived from the matching differ from those
obtained from the statistical control method? Here, we chose density as the built environment
variable of interest for a few reasons. Density is one of the few dimensions of the built environment
that have important effects on travel behavior (Cervero and Kockelman, 1997). Higher densities
are likely to create a critical mass of people on the street, making alternative modes competitive
with driving. Further, many projection studies used density as a key instrument. In the
Transportation Research Board report (TRB, 2009), compact development meant doubling the
density of new and redevelopment housing. Ewing et al. (2008) defined compact
development as a development whose average “blended” densities of single-family and multi-
family houses are higher than existing densities. Although the influence of density on travel
behavior has been well acknowledged (Steiner, 1994), this study disentangles the self-selection
effect from the observed effect of density on travel behavior. The organization of this paper is as
follows. Section 2 reviews research progress on the issue of residential self-selection. Section 3
describes the propensity score approach. The next section presents the data and variables.
Section 5 discusses modeling results. The last section discusses the limitations of the approach
and summarizes major findings.
2. RESIDENTIAL SELF-SELECTION
Since the 1990s, several studies have reviewed empirical relationships between the built
environment and travel behavior from different perspectives (Crane, 2000; Ewing and Cervero,
2001; Frank and Engelke, 2001; Handy, 1996). These studies concluded that many attributes of
traditional neighborhoods (such as high density, mixed land use, and high connectivity) have a
positive association with walking and/or a negative relationship with driving. However,
association does not necessarily mean causality. It is possible that residential self-selection
confounds the relationships.
Recently, many studies have investigated the relationships between the built environment and
travel behavior, controlling for residential self-selection. Among the 38 studies reviewed in Cao
et al. (2009), many found evidence of residential self-selection, and virtually
every study found a statistically significant influence of the built environment on travel behavior,
controlling for self-selection. It is arguable that the magnitude of an effect is at least as
important as statistical significance of the effect, especially as statistical significance is affected
by sample size (Ziliak and McCloskey, 2004). Therefore, to ascertain whether shaping travel
behavior by changing the built environment is practically significant, it is necessary to evaluate
the size of the causal effect. Further, since residential self-selection exists, the observed
influence of the built environment on travel behavior (without a correction for self-selection)
consists of the influence of the built environment itself and the influence of self-selection. This
finding intrigues planners and makes them interested in knowing if the observed influence of the
built environment on travel behavior diminishes substantially once we control for self-selection.
However, few studies have shed light on the share of the observed influence on travel behavior
that the causal influence of the built environment accounts for. Using travel diary data from
the Regional Travel – Household Interview Survey, Salon (2006) estimated a three-tiered
nested logit model of residential choice (high density vs. low density), auto ownership, and
walking level. She concluded that the effect of the built environment itself accounted for 1/2 to
2/3 of the effect of a change in population density on walking level in most areas of New York
City. Using the 1998-1999 Austin Travel Survey, Zhou and Kockelman (2008)
employed a sample selection model to investigate the causal influence of residential
location on VMT. They found that the causal effect of the built environment (rural and suburban
neighborhoods vs. CBD and urban neighborhoods) accounted for 58% of the total influence of
residential location on VMT. Cao (2009) also applied a sample selection model to
2003 data from respondents living in four traditional and four suburban neighborhoods in Northern
California. He concluded that about 24% of the total influence of neighborhood type on VMD
resulted from residential self-selection. In addition, several studies indicated which of the two
effects on travel behavior, the built environment itself or self-selection, is stronger, but did not report
exact shares (Chatman, 2009; Frank et al., 2007). The results of these studies were mixed: some
suggested the built environment dominates its observed influence on travel; some indicated self-
selection plays a more important role; and others found their relative contributions depend on
where respondents currently live (see Cao et al., 2009, for details). Therefore, the relative
contribution of self-selection to the relationships between the built environment and travel
behavior merits further investigation.
With respect to methodologies used in previous studies, nine approaches have been applied in the
literature to address the issue of self-selection in the relationships between the built environment
and travel behavior (Cao et al., 2009; Mokhtarian and Cao, 2008). Among them, the propensity
score method, widely used in the evaluation of social programs, has been recently introduced in
the field of land use and transportation (Boer et al., 2007).
3. PROPENSITY SCORE MATCHING
Empirical studies on the relationships between the built environment and travel behavior rely on
observational data. The assignment of treatment is often nonrandom in observational studies.
Accordingly, observations in a treatment group may differ systematically from those in a control
group. In this context, people living in high-density areas tend to be less affluent, have fewer
cars, live in a smaller household, and prefer higher access to workplace and services than their
counterparts in low-density areas, a result of residential self-selection. Therefore, the observed
difference in travel behavior outcomes between the groups is confounded by the self-selection.
That is, it may be a biased estimate for the treatment effect of density on travel behavior. To
reduce the bias, we can match residents into strata based on their characteristics, and then
compare travel behavior between residents living in high-density and low-density areas that were
grouped into the same stratum (Rosenbaum and Rubin, 1984).
The matching roughly resembles an experiment with random assignment of treatment.
For illustration, assume that residential choice is confounded by a single
variable – income. For a resident who lives in high-density areas (treatment) and has an income
of $50,000, the match will be a resident who lives in low-density areas (control) and has an
income of $50,000 (or within a pre-specified range, say $49,000-$51,000). That is, we aim to
find an almost “identical” observation in the control group for an observation in the treatment
group. This matching is approximately equivalent to the process in which one of the two “same”
observations is assigned into the treatment group and the other is assigned into the control group.
If we repeat this process for all observations in the original treatment group, observations in the
matched treatment group should not, on average, differ from those in the matched control group.
We can then compare travel behavior between matched treatments and controls.
When a treatment group differs in many characteristics from a control group, the matching
should be based on a scalar that can integrate all of the characteristics (Rosenbaum and Rubin,
1984). The propensity score is a scalar function that can be used to balance multiple
characteristics. According to Rosenbaum and Rubin's definition, the propensity score in this context is the
conditional probability that an individual lives in one type of neighborhood (high-density vs.
low-density) given her observed characteristics. The propensity score can be
estimated using discrete choice models. Using large and small sample theory, Rosenbaum and
Rubin (1983) have proved that “adjustment for the scalar propensity
score is sufficient to remove bias due to all observed covariates [i.e., characteristics/variables]”
(p. 41).
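As a rough illustration of how a propensity score might be estimated with a binary probit model, the sketch below fits a one-covariate probit by gradient ascent in plain Python. It is not the specification reported in this paper or the Limdep estimator: the single covariate (a standardized income measure), the toy data, and the function names `fit_probit` and `propensity_scores` are all hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fit_probit(X, y, lr=0.1, n_iter=2000):
    """Fit a probit model by gradient ascent on the mean log-likelihood.

    X: list of covariate rows (an intercept is added internally).
    y: list of 0/1 treatment indicators (1 = lives in a high-density area).
    Returns the coefficient list [intercept, b1, ...].
    """
    n, k = len(X), len(X[0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for row, yi in zip(X, y):
            xi = [1.0] + list(row)
            z = sum(b * x for b, x in zip(beta, xi))
            # Clip the probability to avoid division by zero.
            p = min(max(norm_cdf(z), 1e-10), 1.0 - 1e-10)
            w = norm_pdf(z) * (yi - p) / (p * (1.0 - p))
            for j in range(k):
                grad[j] += w * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of treatment for each row of X."""
    return [
        norm_cdf(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
        for row in X
    ]

# Hypothetical toy data: standardized income and a high-density indicator.
X = [[-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]]
y = [1, 1, 0, 1, 0, 0, 0]

beta = fit_probit(X, y)
scores = propensity_scores(X, beta)
```

In this toy example lower-income individuals are more likely to be treated, so the fitted income coefficient is negative and the predicted scores fall as income rises. The scores, not the coefficients, are what the matching step uses.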
In this study we used Limdep 9.0. We chose the caliper matching option with a
caliper width of 0.01, since a caliper width of 0.01 or 0.02 is commonly adopted in
empirical studies (Oakes and Johnson, 2006). The matching process is as follows: if the
propensity score of a treatment observation is $p$, the matched control observations are those
whose propensity scores are within the range of $p \pm 0.01$. The treatment observation and
matched control observations comprise a stratum. A treatment observation can have one or more
matched control observations. On the other hand, if there are no eligible control observations for
a treatment observation, it is removed from further analysis. We repeat this process for all
treatment observations and create a number of strata. Note that control observations may be used
more than once or may not be used at all.
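The matching procedure described above can be sketched in a few lines of Python. This is only an illustration of caliper matching with replacement, not the Limdep implementation; the function name `caliper_match` and the propensity scores are hypothetical.

```python
def caliper_match(treated_scores, control_scores, caliper=0.01):
    """Caliper matching with replacement.

    Returns a dict mapping each matched treated index to the list of control
    indices whose propensity scores lie within `caliper` of the treated score.
    Treated observations with no eligible control are dropped.
    """
    strata = {}
    for i, p in enumerate(treated_scores):
        matches = [
            j for j, q in enumerate(control_scores) if abs(q - p) <= caliper
        ]
        if matches:
            strata[i] = matches
    return strata

# Hypothetical propensity scores.
treated = [0.620, 0.623, 0.350, 0.900]
controls = [0.615, 0.628, 0.345, 0.500]

strata = caliper_match(treated, controls, caliper=0.01)
print(strata)  # {0: [0, 1], 1: [0, 1], 2: [2]}
```

Here the first two controls are each used twice, the control with score 0.500 is never used, and the treated observation with score 0.900 has no match and is dropped.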
The goal of propensity score matching is to estimate the causal influence of density on travel
behavior (or average treatment effect (ATE) of density). In Limdep 9.0, the treatment effect for
each stratum is calculated as the difference between $O_T$ and $O_C$, where $O_T$ is the travel
behavior outcome of a treatment observation (an individual in high-density areas) and $O_C$ is the
mean outcome of controls (matches in low-density areas). The ATE is the average of treatment
effects over those strata (Greene, 2007).
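Written out, the verbal definition above corresponds to the following expression. The stratum index $s$, the number of strata $S$, the set of matched controls $M_s$, and its size $n_s$ are notation introduced here for illustration and do not appear in the original description.

```latex
\[
\mathrm{ATE} \;=\; \frac{1}{S} \sum_{s=1}^{S} \left( O_{T,s} - O_{C,s} \right),
\qquad
O_{C,s} \;=\; \frac{1}{n_s} \sum_{j \in M_s} O_j
\]
```

Under this sign convention, a negative value for PMT means that matched residents of high-density areas travel fewer miles than their low-density counterparts.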
The causal influence of the built environment on travel can also be derived from other methods.
Statistical control is a commonly-used approach in addressing residential self-selection (Cao et
al., 2009). It explicitly accounts for the influences of confounding factors in analyzing travel
behavior, by measuring them and including them in the travel behavior equation. Although
propensity score matching and statistical control can use the same set of variables, they are
different. Conceptually, propensity score matching controls for the observed characteristics that
affect whether an individual is assigned to a treatment group or a control group. The attention is
directed to the imbalance in the values of covariates between treatment and control groups.
Statistical control identifies the determinants of travel behavior through incorporating them
directly into the behavior equation, so that we can account for all differences between treatment
and control groups that affect the behavior. The attention is directed to the behavioral outcome
(Winship and Morgan, 1999).
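To make the idea of statistical control concrete, the sketch below regresses a travel outcome on a density dummy and one confounder using ordinary least squares. The data are made up and constructed without noise as miles = 10 − 3·D + 0.2·income (income in thousands of dollars), so the true effect of the dummy is known; the helper names `solve` and `ols` are hypothetical and this is not the regression model estimated in this paper.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical records: (high-density dummy, income in $1000s, daily miles).
data = [
    (1, 30, 13.0), (1, 40, 15.0), (1, 50, 17.0),
    (0, 50, 20.0), (0, 60, 22.0), (0, 70, 24.0),
]

treated = [m for d, _, m in data if d == 1]
control = [m for d, _, m in data if d == 0]
naive_diff = sum(treated) / len(treated) - sum(control) / len(control)

X = [[1.0, float(d), float(inc)] for d, inc, _ in data]
y = [m for _, _, m in data]
coef = ols(X, y)  # [intercept, density dummy, income]

print(round(naive_diff, 2))  # -7.0
print(round(coef[1], 2))     # -3.0
```

In this constructed example the raw difference in means (−7 miles) overstates the effect of density because high-density residents also have lower incomes; once income enters the equation, the coefficient on the dummy recovers the built-in effect of −3 miles.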
Empirically, first, a model estimating propensity scores is a prediction model, so it is not
necessary to evaluate multicollinearity and statistical significance of explanatory variables (Oakes
and Johnson, 2006). However, multicollinearity and statistical significance are important for an
explanatory model in the statistical control approach. Thus, if multicollinearity among
independent variables is a potential problem in the data, it seems that propensity score matching
is superior to statistical control. Second, the propensity score approach generally requires a
discrete classification (binary, nominal, or ordinal) of treatments and controls. Therefore, we
need to transform continuous measurements of the built environment into discrete scales. This
transformation may lead to a loss in efficiency. In contrast, statistical control can accommodate
both continuous and discrete variables. Third, propensity score matching will discard some
observations if no match is found for some treatment observations or if control observations are
not close (in terms of propensity score) to any treatment observations. This may not be desirable
because some of the collected data go unused.
Although both methods can produce unbiased estimates if implemented appropriately, it is
interesting to illustrate how empirical results derived from these approaches differ from each
other. This may carry important messages since we often use point estimates of treatment effects
to evaluate the relative contributions of the built environment itself and self-selection to travel
behavior (Cao, 2009; Salon, 2006; Zhou and Kockelman, 2008). It is worth noting that we
intended to compare the results derived from propensity score matching, statistical control, and
a sample selection model. However, we tried several specifications of the sample
selection model and failed to obtain reasonable outcomes. Since some scholars have questioned the
reliability of sample selection model specifications (Brownstone and Golob, 2009; Zhou and
Kockelman, 2008), the model should be used with caution.
4. DATA AND VARIABLES
This study uses an existing travel diary dataset from the Triangle area of North Carolina – the
2006 Greater Triangle Travel Survey. This survey was sponsored by the Capital Area
Metropolitan Planning Organization, the Durham-Chapel Hill-Carrboro Metropolitan Planning
Organization, the Triangle Transit Authority, and the North Carolina Department of
Transportation. The survey entailed the collection of activity and travel information for all
household members during a specific 24-hour period. The survey relied on the willingness of
regional households to 1) provide demographic information about the household, its members
and its vehicles, and 2) have all household members record all travel-related details, including
address information for all locations visited, trip purpose, mode choice, and travel times. Because
response rates varied across groups, incentives were offered to selected households (such as those without
vehicles, those living in the outlying counties who were of African American descent, and
university students). The survey was accompanied by an extensive public information campaign
that was designed to emphasize the importance of and benefits from participation. The response
rate of this survey was 25%. This study uses 3,480 households in Durham, Orange, and Wake
Counties, North Carolina (Figure 1).
[Insert Figure 1 here]
The dependent variables are personal miles travelled (PMT) and travel duration by mode. Based
on the addresses of trip origins and destinations, trip lengths were calculated using the network
distance function of ArcGIS 9.0. Here, it is assumed that individuals will choose the shortest
path between origins and destinations. PMT was computed by summing lengths of all trips that
an individual made by all modes. Mode-specific travel duration variables (driving duration and
transit duration), measured in minutes, were derived directly from the travel survey.
Based on the population density of census block groups, residential locations of respondents
were classified into two categories: high-density areas and low-density areas. Specifically,
census block groups with 5.225 or more persons per acre (the 70th percentile of the density of 448
census block groups) were classified as high-density areas. The explanatory variables for
residential choice include two groups: residential preferences and socio-demographics. In the
survey, respondents were asked to indicate the factors that they considered when they moved to
their current home. The five pre-specified factors include job location/length of commute;
access to transit; quality of school district/access to desirable school; crime level/neighborhood
safety; and neighborhood appearance/other amenities. Although these factors cover only a few
dimensions of residential choices and were measured in a Yes/No way, their presence in a
regional travel survey allows us to explore the fundamental differences in residential preferences
among individuals and their influence on residential and travel choices. It is worth noting that
residential preferences were reported not by each member in the household but as joint
household decisions. In this study, it is assumed that household members younger than 18 years
old did not participate in residential location choice and hence were removed from the analysis.
As a result, the analysis of this study is based on 5,537 adults in the 3,480 households. The
survey also includes a list of socio-demographic variables such as household income, household
size, auto ownership, employment status, gender and so on.
As illustrated in Table 1, there are significant differences between residents living in high-density
and low-density areas. Residents living in high-density areas tended to travel fewer
miles, spend less time driving, and spend more time on transit than those in low-density areas. These different travel
patterns are consistent with the differences in socio-demographics and residential preferences.
Those in high-density areas tended to have a lower income, have fewer vehicles, live in a smaller
household, and be younger and less likely to be White than their counterparts in low-density areas. Further,
when making residential choices, the former are more likely to value length of commute and
access to transit than the latter, whereas the latter are more likely to consider neighborhood
amenities. These statistics suggest that residential self-selection may be at work. We employ
propensity score matching to isolate the self-selection effect.
[Insert Table 1 here]
5. RESULTS
5.1 The impacts of density on travel behavior
A binary probit model was used to estimate the propensity score (the predicted probability that a
resident lives in high-density areas). Table 2 presents the final model. The pseudo R-square of the
model is 0.298, meaning that 29.8% of the uncertainty in the data was explained by the
information in the model (Hauser, 1978). As expected, household income, household size, auto
ownership, age, and being White are negatively associated with the choice of high-density areas.
Individuals who considered access to transit in residential choice were more likely to live in
high-density areas. Those who valued neighborhood amenity and safety tended to choose
low-density areas, although neighborhood amenity is not significant at the 0.05 level. Length of
commute is insignificant in the model, but its interaction with White is significantly and
positively associated with the choice of high-density areas. The interaction of neighborhood
safety and White is insignificant in the model.
[Insert Table 2 here]
The inclusion of both interaction terms in the model is to ensure the mean characteristics for
treatments and controls are statistically equal for individuals in specific ranges of the propensity
score. After matching on propensity scores, it is important to evaluate whether confounding
variables (residential preferences and socio-demographics) are balanced between treatments and
controls. If they are unbalanced, differences between treatments and controls, and hence
selection bias, still exist, and a new model specification is needed. Previous research
suggested that incorporating the unbalanced variable, its high-order form (such as polynomial
terms), and its interaction with other variables may achieve the balance of all variables (Oakes and
Johnson, 2006; Rosenbaum and Rubin, 1984). After several attempts, we found that the inclusion
of the interaction terms satisfied the balance hypothesis (as shown in Table 2).
The matching task is complete once the balance hypothesis is satisfied. Limdep 9.0
produced the ATE according to the procedure described in Section 3. As shown in Table 3, the
ATEs of density on PMT and transit duration are significantly different from 0; the ATE on
driving duration differs from 0 at the 0.1 level. The ATE of density on PMT is 3.31 miles,
meaning that after removing self-selection bias, residents living in high-density areas on average
travel 3.31 fewer miles per person per day than those who live in low-density areas. This 3.31-
mile effect is equivalent to 18.6% of the average PMT (17.82 miles) for the whole sample. The
observed difference in PMT between the two groups is 4.63 miles, so the
effect of residential self-selection on PMT is 1.32 miles (= 4.63 – 3.31). Therefore, density and
self-selection contribute 72% and 28% of the observed impact of density on PMT,
respectively.
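The decomposition above is simple arithmetic and can be reproduced from the figures quoted in the text. The variable names below are illustrative. Note that with these rounded inputs the shares come out at roughly 71.5% and 28.5%, slightly different from the reported 72% and 28%; the gap presumably arises because the reported shares were computed from unrounded estimates, which is an assumption on our part.

```python
# Figures quoted in the text (miles per person per day).
observed_effect = 4.63   # observed high- vs. low-density difference in PMT
ate = 3.31               # average treatment effect after matching
mean_pmt = 17.82         # average PMT for the whole sample

# Portion of the observed difference attributed to self-selection.
self_selection_effect = observed_effect - ate

# Shares of the observed effect.
density_share = ate / observed_effect
self_selection_share = self_selection_effect / observed_effect

# ATE relative to the sample mean.
ate_relative_to_mean = ate / mean_pmt

print(f"self-selection effect: {self_selection_effect:.2f} miles")
print(f"density share: {density_share:.1%}")
print(f"self-selection share: {self_selection_share:.1%}")
print(f"ATE / mean PMT: {ate_relative_to_mean:.1%}")
```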
[Insert Table 3 here]
Regarding travel time, the ATE of density on driving duration accounts for 5.4% of the average
driving duration for the whole sample. The point estimate for the ratio between the ATE and
observed effect is 34%, meaning that density contributes 34% of the observed effect of
density on driving duration. This indicates that the long driving time observed in low-density areas
is more a result of residential self-selection than of density. Because self-selection
accounts for only 28% of the observed influences of density on PMT, it seems that when making
residential location choices, people are more likely to assess their choices based on driving time
rather than distance travelled, presumably trying to manage their travel within their travel time
budgets. The ATE of density on transit duration is equivalent to the mean transit duration for the
whole sample. This highlights the important influence of density on transit usage. The ratio
between the ATE and observed effect is about 50%, suggesting that density and self-selection are
equally important in explaining the variation in transit duration.
The discussions above were based on the point estimates of ATE. The point estimate from a
sample is just one estimate for the population parameter. It is more or less different from the
population parameter due to chance error. If the chance error is large, the point estimate is not
precise. Accordingly, it may have a critical impact on the share of the observed influence of
density on travel behavior that is attributable to density. The 95% confidence interval captures
the size of chance error with a confidence level of about 95%. In other words, if we draw
numerous samples from the population and calculate the 95% confidence interval for each
sample, the population parameter falls inside the intervals about 95% of the time (Freedman et
al., 2007). The 95% confidence intervals for the ATE are quite wide in this sample, and so are the
intervals for the ratios between the ATE and observed effect. For instance, with 95%
confidence, the causal influence of density on PMT accounts for 46% to 97% of the observed
influence of density. Although this interval covers 50%, it seems that density plays a more
important role in influencing PMT than does self-selection. By contrast, the ratio for driving
duration ranges from 7% to 75% and the ratio for transit duration ranges from 5% to 96%. That
is, the causal influence of density on driving duration can account for as much as 75% and as
little as 7% of the observed influence of density. Given this wide range, we are still uncertain
which of the two effects, density or self-selection, is more important. Therefore, when we evaluate the
relative contributions of the built environment itself and self-selection to travel behavior, we
should discuss both point estimates and their confidence intervals.
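One way to see how an interval for the ATE translates into an interval for the ratio is sketched below for PMT. It starts from the 95% confidence interval reported in the conclusions (11.8% to 25.3% of mean PMT) and assumes, as a simplification that the text does not state, that the ratio interval is obtained by dividing the ATE bounds by the observed effect, treated as a fixed number. The variable names are illustrative.

```python
mean_pmt = 17.82          # average PMT for the whole sample (miles)
observed_effect = 4.63    # observed difference in PMT (miles)

# 95% confidence interval for the ATE, expressed as a share of mean PMT.
ci_share_of_mean = (0.118, 0.253)

# Convert the interval to miles.
ci_miles = tuple(share * mean_pmt for share in ci_share_of_mean)

# Assumption: treat the observed effect as fixed and divide the bounds by it.
ci_ratio = tuple(bound / observed_effect for bound in ci_miles)

print(f"ATE interval: {ci_miles[0]:.2f} to {ci_miles[1]:.2f} miles")
print(f"ratio interval: {ci_ratio[0]:.1%} to {ci_ratio[1]:.1%}")
```

Under this assumption the ratio interval comes out at roughly 45% to 97%, close to the 46% to 97% reported above; the small difference at the lower bound is consistent with rounding in the published figures.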
Because the choice of caliper width may influence the point estimate of the ATE, we conducted a
sensitivity analysis for different caliper widths. As shown in Table 4, the point estimates of the
ATE are fairly consistent for PMT and driving duration when the width increases from 0.003 to
0.03. However, the estimates for transit duration have a relatively large variation. Presumably,
few people in the sample took transit and hence excessive zeros in the data lead to unstable
results.
[Insert Table 4 here]
5.2 Comparison with the statistical control approach
Linear regression models (not shown) were developed for the travel behavior variables. For
illustration purposes, the independent variables in the regression models include the density
dummy variable and all those in the probit model of the propensity score (Table 2). As shown in
Table 3, the coefficient estimates for the density dummy in the equations are within 95%
confidence intervals for the ATEs, respectively; the point estimates for ATEs also fall in 95%
confidence intervals for the coefficients, respectively. However, the relative differences between
point estimates can be as large as 28%. Therefore, when we use point estimates to evaluate the
relative contributions of the built environment itself and self-selection to travel behavior, the
choice of approaches may make a difference.
6. CONCLUSIONS
This study applies propensity score matching to determine the causal effect of density on travel
behavior and the relative contribution of density and self-selection to travel behavior. Although
the propensity score method can be used to estimate treatment effects, it is not a panacea for
addressing selection bias. Because the method assumes that all variables affecting treatment assignment
are captured by observed characteristics, hidden bias can be a potential concern
(Rosenbaum and Rubin, 1983). If unmeasured characteristics (for example, attitudes were not
measured in most regional travel surveys) are a source of self-selection, this approach cannot
compensate for that. In this study, we included several attitudinal factors and hence at least
partly addressed the hidden bias problem. However, the hidden bias can be a concern due to the
size of our pseudo R-square.
Nevertheless, this study provides useful evidence for understanding the causal influence of
density on travel behavior. First, the results suggest that if residential self-selection is not
controlled for, we are likely to overestimate causal influences of the built environment on travel
behavior. Based on point estimates, we found that the effects of density on PMT, driving
duration, and transit duration accounted for 72%, 34%, and 51% of the observed influences of
density, respectively. However, the 95% confidence intervals are quite wide, which leaves the
following question unresolved: which effect is larger, density or self-selection?
The model results also showed that the causal effect of density on PMT is considerable: a
random individual is likely to reduce her PMT by 18.6% if moving from a low-density area to a
high-density one (the 95% confidence interval ranges from 11.8% to 25.3%). This sizable
influence provides supportive evidence for the ability of changes in the built environment to
stimulate meaningful changes in travel.
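The shares and percentages quoted above follow directly from Table 3, as the sketch below shows. Dividing the confidence bounds of the ATE by the observed effect treats the observed effect as fixed, which is a simplifying assumption made here only to illustrate how wide the implied range is; the function names are hypothetical.

```python
# Values copied from Table 3 (propensity score matching rows).
TABLE3 = {
    "PMT": {"mean": 17.82, "observed": -4.63, "ate": -3.31, "ci": (-4.51, -2.11)},
    "Driving duration": {"mean": 62.48, "observed": -9.83, "ate": -3.37, "ci": (-7.38, 0.65)},
    "Transit duration": {"mean": 1.60, "observed": 3.41, "ate": 1.73, "ci": (0.17, 3.29)},
}

def density_share(ate, observed):
    """Share of the observed effect attributed to density itself."""
    return ate / observed

def share_interval(ci, observed):
    """Density share at each ATE confidence bound, holding the observed
    effect fixed (an illustrative simplification)."""
    low, high = sorted(bound / observed for bound in ci)
    return low, high

for name, row in TABLE3.items():
    share = density_share(row["ate"], row["observed"])
    low, high = share_interval(row["ci"], row["observed"])
    print(f"{name}: density {100 * share:.1f}%, "
          f"self-selection {100 * (1 - share):.1f}%, "
          f"density share at CI bounds {100 * low:.1f}% to {100 * high:.1f}%")

# Effect on PMT relative to the sample mean, with the same treatment of the CI.
pmt = TABLE3["PMT"]
pmt_pct = abs(pmt["ate"]) / pmt["mean"]
pmt_pct_bounds = sorted(abs(bound) / pmt["mean"] for bound in pmt["ci"])
print(f"PMT reduction: {100 * pmt_pct:.1f}% "
      f"({100 * pmt_pct_bounds[0]:.1f}% to {100 * pmt_pct_bounds[1]:.1f}%)")
```

For each travel behavior variable the implied density share at the confidence bounds spans values both below and above 50%, which is why the point estimates alone cannot settle which effect is larger.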
Previous studies often use point estimates to evaluate the relative contributions of the built
environment itself and self-selection to travel behavior. An evaluation using the 95% confidence
interval shows large variation in the results, which undermines the practical relevance of point
estimates reported in previous studies. Further research is needed to understand the range of the
variation. In addition, different modeling approaches may produce different point estimates for
the impact of the built environment on travel behavior, although the estimates can be consistent
with one another. A confidence interval is likely to provide a more reliable estimate, but it
appears to be imprecise.
ACKNOWLEDGEMENTS
We thank Zhiyi Xu and Fangfang Sun for their help on modeling. Michael Oakes helped clarify some
concepts of propensity score matching.
REFERENCES
Boer R, Zheng Y, Overton A, Ridgeway G K, Cohen D A, 2007, "Neighborhood Design and Walking
Trips in Ten U.S. Metropolitan Areas" American Journal of Preventive Medicine 32 298-304
Brownstone D, Golob T F, 2009, "The impact of residential density on vehicle usage and energy
consumption" Journal of Urban Economics 65 91-98
Cao X, 2009, "Disentangling the influence of neighborhood type and self-selection on driving behavior:
an application of sample selection model" Transportation 36 207-222
Cao X, Mokhtarian P L, Handy S L, 2009, "Examining the Impacts of Residential Self-Selection on
Travel Behaviour: A Focus on Empirical Findings" Transport Reviews 29 359-395
Cervero R, Kockelman K, 1997, "Travel demand and the 3Ds: Density, diversity, and design"
Transportation Research Part D: Transport and Environment 2 199-219
Chatman D G, 2009, "Residential choice, the built environment, and nonwork travel: evidence using new
data and methods" Environment and Planning A 41 1072-1089
Crane R, 2000, "The Influence of Urban Form on Travel: An Interpretive Review" Journal of Planning
Literature 15 3-23
EPA, 2009, "Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990-2007", (Environmental
Protection Agency, Washington, DC)
Ewing R, Bartholomew K, Winkelman S, Walters J, Chen D, 2008, "Growing Cooler: The Evidence on
Urban Development and Climate Change", (Urban Land Institute, Washington, DC)
Ewing R, Cervero R, 2001, "Travel and the Built Environment: A Synthesis" Transportation Research
Record: Journal of the Transportation Research Board 1780 87-114
Frank L D, Engelke P O, 2001, "The Built Environment and Human Activity Patterns: Exploring the
Impacts of Urban Form on Public Health" Journal of Planning Literature 16 202-218
Frank L D, Saelens B E, Powell K E, Chapman J E, 2007, "Stepping towards causation: Do built
environments or neighborhood and travel preferences explain physical activity, driving, and obesity?"
Social Science & Medicine 65 1898-1914
Freedman D, Pisani R, Purves R, 2007 Statistics (Norton, New York)
Greene W H, 2007 LIMDEP 9.0 Econometric Modeling Guide (Econometric Software, Plainview, NY)
Handy S, 1996, "Methodologies for exploring the link between urban form and travel behavior"
Transportation Research Part D: Transport and Environment 1 151-165
Hauser J R, 1978, "Testing the accuracy, usefulness, and significance of probabilistic choice models: An
information-theoretic approach" Operations Research 26 406-421
Mokhtarian P L, Cao X, 2008, "Examining the impacts of residential self-selection on travel behavior: A
focus on methodologies" Transportation Research Part B: Methodological 42 204-228
Oakes M J, Johnson P J, 2006, "Propensity score matching for social epidemiology", in Methods in
epidemiology Eds M J Oakes, J S Kaufman (John Wiley & Sons, Inc., New York) pp 370-392
Pinjari A, Pendyala R, Bhat C, Waddell P, 2007, "Modeling residential sorting effects to understand the
impact of the built environment on commute mode choice" Transportation 34 557-573
Rosenbaum P R, Rubin D B, 1983, "The Central Role of the Propensity Score in Observational Studies
for Causal Effects" Biometrika 70 41-55
Rosenbaum P R, Rubin D B, 1984, "Reducing Bias in Observational Studies Using Subclassification on
the Propensity Score" Journal of the American Statistical Association 79 516-524
Salon D, 2006 Cars and the city: An investigation of transportation and residential location choices in
New York city, Agricultural and Resource Economics, University of California, Davis
Steiner R L, 1994, "Residential Density and Travel Patterns: Review of the Literature"
Transportation Research Record: Journal of the Transportation Research Board 1466 37-43
TRB, 2009, "Driving and the Built Environment: The Effects of Compact Development on Motorized
Travel, Energy Use, and CO2 Emissions", (Transportation Research Board, Washington, DC)
Winship C, Morgan S L, 1999, "The estimation of causal effects from observational data" Annual Review
of Sociology 25 659-706
Zhou B, Kockelman K, 2008, "Self-Selection in Home Choice: Use of Treatment Effects in Evaluating
Relationship Between Built Environment and Travel Behavior" Transportation Research Record: Journal
of the Transportation Research Board 2077 54-61
Ziliak S T, McCloskey D N, 2004, "Size matters: the standard error of regressions in the American
Economic Review" Journal of Socio-Economics 33 527-546
Table 1. Differences between Residents in High-Density and Low-Density Areas
                               High Density   Low Density   Difference   P-value
                               (N=1,184)      (N=4,353)
Travel behavior
  PMT                          14.188         18.814        -4.626       0.000
  Driving duration             54.754         64.585        -9.831       0.000
  Transit duration             4.281          0.869         3.412        0.000
Demographics
  Household income             4.660          5.457         -0.797       0.000
  Household size               2.481          2.664         -0.182       0.000
  Auto ownership               1.793          2.184         -0.391       0.000
  Female                       0.552          0.543         0.009        0.569
  Age                          47.993         49.304        -1.311       0.013
  White                        0.717          0.856         -0.139       0.000
  Employed                     0.742          0.750         -0.007       0.613
  # of jobs                    0.102          0.116         -0.013       0.184
Residential preferences
  Length of commute            0.569          0.532         0.038        0.021
  Access to transit            0.185          0.101         0.084        0.000
  Access to desirable school   0.327          0.328         -0.001       0.926
  Neighborhood safety          0.334          0.357         -0.023       0.132
  Neighborhood amenity         0.538          0.594         -0.056       0.001
Table 2. Binary Probit Model for Propensity Score
                                  Coefficients   P-values
Constant                          0.576          0.000
Socio-demographics
  Household income                -0.095         0.000
  Household size                  -0.032         0.084
  Number of cars                  -0.135         0.000
  Age                             0.004          0.003
  White                           -0.503         0.000
Residential preferences
  Length of commute               -0.030         0.736
  Access to transit               0.281          0.000
  Neighborhood safety             -0.189         0.460
  Neighborhood amenity            -0.066         0.129
  White x Neighborhood safety     0.136          0.191
  White x Length of commute       0.265          0.007
N                                 5537
Log-likelihood at 0               -3838.0
Log-likelihood at convergence     -2693.8
McFadden R-square                 0.298
Table 3. Observed and Treatment Effects of Density on Travel Behavior
                              Person Miles     Driving          Transit
                              Travelled        Duration         Duration
Mean                          17.82            62.48            1.60
Observed Effect               -4.63            -9.83            3.41
Propensity Score Matching
  ATE                         -3.31            -3.37            1.73
  P-value                     0.000            0.100            0.030
  95% CI                      [-4.51, -2.11]   [-7.38, 0.65]    [0.17, 3.29]
Statistical Control
  Coefficient                 -3.33            -4.69            1.52
  P-value                     0.000            0.012            0.000
  95% CI                      [-4.50, -2.16]   [-8.36, -1.02]   [0.71, 2.33]
Relative Difference           0.6%             28.1%            12.1%
ATE/Observed Effect           71.5%            34.3%            50.7%
ATE/Mean                      18.6%            5.4%             108.1%
Note: ATE = Average Treatment Effect; CI = Confidence Interval; Relative Difference = ABS(ATE -
Coefficient)/MAX(ABS(ATE), ABS(Coefficient)); ATE/Mean is reported as an absolute value.
Table 4. Sensitivity Analysis of Caliper Width
Caliper Width   Person Miles Travelled   Driving Duration   Transit Duration
0.003           -3.42                    -3.43              1.24
0.005           -3.33                    -3.31              1.46
0.01            -3.31                    -3.37              1.73
0.02            -3.13                    -3.40              1.85
0.03            -3.20                    -3.67              2.03
Figure 1. Study area (Orange, Durham, and Wake Counties, NC)