Heterogeneity Distributions of Willingness-to-Pay in Choice Models

Garrett Sonnier & Andrew Ainslie & Thomas Otter
Received: 3 January 2006 / Accepted: 8 March 2007 /
Published online: 9 August 2007
© Springer Science + Business Media, LLC 2007
Abstract We investigate direct and indirect specification of the distribution of
consumer willingness-to-pay (WTP) for changes in product attributes in a choice
setting. Typically, choice models identify WTP for an attribute as a ratio of the
estimated attribute and price coefficients. Previous research in marketing and
economics has discussed the problems with allowing for random coefficients on both
attribute and price, especially when the distribution of the price coefficient has mass
near zero. These problems can be avoided by combining a parameterization of the
likelihood function that directly identifies WTP with a normal prior for WTP. We
show that the typical likelihood parameterization in combination with what are
regarded as standard heterogeneity distributions for attribute and price coefficients
results in poorly behaved posterior WTP distributions, especially in small sample
settings. The implied prior for WTP readily allows for substantial mass in the tails of
the distribution and extreme individual-level estimates of WTP. We also demonstrate
the sensitivity of profit maximizing prices to parameterization and priors for WTP.
Keywords Bayesian analysis · Choice modeling · Willingness-to-pay
JEL classification C11 · M31
Quant Market Econ (2007) 5:313–331
DOI 10.1007/s11129-007-9024-6
G. Sonnier
University of Texas at Austin, 1 University Station, Austin, TX 78712, USA
e-mail: garrett.sonnier@mccombs.utexas.edu
A. Ainslie (*)
UCLA, 110 Westwood Plaza, Los Angeles, CA 90095, USA
e-mail: andrew.ainslie@anderson.ucla.edu
T. Otter
Ohio State University, 2100 Neil Avenue, Columbus, OH 43210, USA
e-mail: otter_2@cob.osu.edu
1 Introduction
Markov chain Monte Carlo (MCMC) techniques to simulate from a distribution have
facilitated the exploration of any aspect of the distribution under study, including the
distribution of any transformation of random variables. Once a sample of random
variables from the distribution is available, change-of-variable calculus is no longer
needed to derive the distribution of the transformation. The researcher is free to
empirically explore the posterior distribution of the transformation by simply
computing the transformation on each iteration of the sampler (e.g., Edwards and
Allenby 2003). This technique is referred to as "post-processing" MCMC draws.
Posterior summaries of any transformation of parameters are easily obtained from
the MCMC output. This can be particularly advantageous when the estimation
problem is difficult or intractable in the space of interest, but tractable in another
space. Post processing readily allows the researcher to move between the two spaces.
While the advantages of post-processing techniques are well-documented, less
attention has been paid to the fact that the priors used for the parameters in one space
necessarily form an implied prior on the transformed parameters.
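
As a minimal sketch of post-processing, assume that draws of an attribute coefficient and a price coefficient are already available from a sampler. The draws below are simulated stand-ins and the function name `postprocess_ratio` is hypothetical; neither comes from the models estimated in this paper.

```python
import random
import statistics

def postprocess_ratio(phi_draws, gamma_draws):
    """Apply the transformation phi / gamma to each retained draw."""
    return [phi / gamma for phi, gamma in zip(phi_draws, gamma_draws)]

# Hypothetical stand-ins for MCMC output (assumed values, illustration only).
rng = random.Random(0)
phi_draws = [rng.gauss(1.0, 0.2) for _ in range(5000)]
gamma_draws = [rng.lognormvariate(0.0, 0.2) for _ in range(5000)]

# The posterior of the ratio is summarized directly from the transformed draws;
# no change-of-variable calculation is required.
wtp_draws = postprocess_ratio(phi_draws, gamma_draws)
summary = {
    "mean": statistics.fmean(wtp_draws),
    "median": statistics.median(wtp_draws),
    "sd": statistics.stdev(wtp_draws),
}
print(summary)
```

The same pattern applies to any transformation of the parameters, but the prior that the ratio inherits from the priors on its numerator and denominator is not visible in this step.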
It is well known that in a Bayesian model, a change in the likelihood parameterization
must be reflected in the prior to leave the posterior predictive density
unchanged. Hierarchical models introduce a prior distribution on the parameters
across the observational units. Changes in the parameterization of the full conditional
likelihood will alter the predictive density of the hierarchical model unless the prior
distribution is adapted accordingly. In applied work, the choice of parameterization is
often viewed in isolation of the prior distribution, which is typically chosen for
analytic convenience (e.g., conjugacy). However, a convenient and diffuse prior in
one space does not necessarily result in an equivalent implied prior in the transformed
space (Rossi et al. 2005).
An interesting example of transforming model parameters with relevance to
marketing and economics occurs when estimating the willingness-to-pay (WTP) for
changes in product attributes using choice data. In this paper, we contrast two
approaches to estimating the distribution of WTP with choice models. In the first
approach WTP is defined as the ratio of attribute and price parameters and the
implied prior distribution for WTP is a function of the priors for these parameters.
The posterior of the WTP distribution is explored empirically via post-processing.
This is consistent with the work of Meijer and Rouwendal (2006), which investigates
the properties of WTP defined as a ratio for different distributions of the numerator
and the denominator. The second approach re-parameterizes the full conditional
likelihood to directly identify WTP. This allows the researcher to directly implement
a prior for WTP. We demonstrate the sensitivity of inferences about WTP to different
parameterizations in combination with what are regarded as standard assumptions
about the hierarchical prior (i.e., the heterogeneity distribution). The sensitivity is
particularly pronounced in small sample settings. We show how a normal prior
directly specified for WTP results in better inferences. Moreover, not only is the
posterior of WTP sensitive to the different parameterization and prior assumptions,
but so are all marketing actions derived from the distribution of WTP, such as the
setting of profit maximizing prices. The results illustrate the practical importance of
paying attention to implied priors.
Implied priors can be problematic even if there is no interest in the transformed
parameters themselves. In choice-based conjoint (CBC), for example, the analyst
may not be interested in the model coefficients or WTP, per se, but rather in using
the model to analyze demand and pricing policies. The reservation and equalization
prices, which completely characterize incidence and switching behavior in response
to price changes, are a function of attributes and attribute WTPs (Jedidi and Zhang
2002).¹ If the distributions of reservation and equalization prices are impacted by the
implied priors on WTP, demand estimates and price policies will be dramatically
affected, as we will demonstrate. While certain point estimates of the distribution
may be less influenced by the implied prior (e.g., the median versus the mean), any
investigation of the nature of consumer demand will require the researcher to
consider more than just a particular statistic.
The organization of the remainder of the paper is as follows: Section 2 presents
two parameterizations of choice models that result in equivalent full conditional
likelihoods. It then discusses how the two parameterizations result in different prior
predictive and posterior densities depending on the choice of the prior, particularly
once one introduces heterogeneity. Section 3 illustrates the size of the effect using
simulated data. Section 4 presents the results from two CBC studies. Section 5
summarizes and offers a brief discussion on the role of prior information in conjoint
analysis.
2 Utility and surplus maximization
2.1 Equivalence of likelihood functions
Consider first consumers' discrete choice problem as that of maximizing an indirect
utility function. We have I consumers choosing among J alternatives on each of T
occasions. Let $V^*_{ijt}$ denote consumer i's indirect utility for alternative j on choice
occasion t. It is assumed that indirect utility can be expressed as a linear function of
the alternative's non-price attributes, $x_{ijt}$, income $y_i$, and price $p_{ijt}$:

$$V^*_{ijt} = x'_{ijt}\phi^* + \gamma^*\left(y_i - p_{ijt}\right) + \varepsilon^*_{ijt} \quad \text{with} \quad V^*_{i0t} = \varepsilon^*_{i0t}. \tag{1}$$
We assume the error terms are independent and identically distributed according
to a type I extreme value distribution. For exposition, we initially leave the scale
parameter as unknown, $\varepsilon^*_{ijt} \sim EV(0, \mu)$. It is well-known that multiplying the
indirect utility function for each choice by a constant does not change the utility
maximizing alternative. Thus, $V^*_{ijt}$ must be normalized, which is typically
accomplished by standardizing the error distribution to be EV(0, 1) such that

$$V_{ijt} = x'_{ijt}\phi + \gamma\left(y_i - p_{ijt}\right) + \varepsilon_{ijt} = x'_{ijt}\frac{\phi^*}{\mu} + \frac{\gamma^*}{\mu}\left(y_i - p_{ijt}\right) + \frac{\varepsilon^*_{ijt}}{\mu}. \tag{2}$$
¹ The reservation price is the price that induces indifference between purchase and non-purchase in the
category. The equalization price (Swait et al. 1993) is the price that induces indifference between two
choice alternatives within the category.
The familiar MNL choice probabilities take the form
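$$\Pr{}^{u}_{ijt} = \frac{\exp\left[\left(x'_{ijt}\phi^* - \gamma^* p_{ijt}\right)/\mu\right]}{1 + \sum_{m=1}^{J} \exp\left[\left(x'_{imt}\phi^* - \gamma^* p_{imt}\right)/\mu\right]} = \frac{\exp\left[x'_{ijt}\phi - \gamma p_{ijt}\right]}{1 + \sum_{m=1}^{J} \exp\left[x'_{imt}\phi - \gamma p_{imt}\right]} \tag{3}$$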
where the superscript u denotes the probability obtained using the utility model.
WTP for an improvement in $x_{ijkt}$, the kth attribute of alternative j, is the price change
that would leave the individual indifferent between the alternative with the new level
and the alternative with the original level. For continuous x, we have
$\partial V_{ijt} = \phi_k\,\partial x_{ijkt} - \gamma\,\partial p_{ijt} = 0$, and the change in price that keeps utility constant given a
change in attribute k is $\phi_k/\gamma = \phi^*_k/\gamma^*$ (Train 2003). Note that the scale parameter μ drops
out of the WTP.
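We can re-parameterize the indirect utility function in (1) by dividing through by $\gamma^*$:

$$\begin{aligned} \frac{V^*_{ijt}}{\gamma^*} &= x'_{ijt}\frac{\phi^*}{\gamma^*} + \left(y_i - p_{ijt}\right) + \frac{\varepsilon^*_{ijt}}{\gamma^*} \\ C_{ijt} &= x'_{ijt}\beta + \left(y_i - p_{ijt}\right) + \eta_{ijt} \end{aligned} \tag{4}$$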
In this reparameterization, $C_{ijt}$ is consumer i's surplus from good j on purchase
occasion t (Jedidi et al. 2003). Surplus is determined in part by the attributes of the
products in the set, $x_{ijt}$, and the WTP for the attributes, β. Consumers arrive at their
choices by maximizing the surplus (i.e., the difference between the monetary value
of the attribute bundle and the price to acquire the bundle) among the J alternatives
in a set on occasion t. The MNL choice probability associated with the surplus
model is
$$\Pr{}^{s}_{ijt} = \frac{\exp\left[\left(x'_{ijt}\beta - p_{ijt}\right)/\mu\right]}{1 + \sum_{m=1}^{J} \exp\left[\left(x'_{imt}\beta - p_{imt}\right)/\mu\right]} \tag{5}$$
where the superscript s denotes the surplus model. The probability expressions in
Eqs. (3) and (5) are equivalent over the range of parameters for which the
transformations $\beta = \phi/\gamma$ and $\mu = 1/\gamma$ are well defined. In the case of maximum
likelihood (ML) estimation, the Invariance Property of the ML estimator ensures that
precisely the same point estimates of WTP will be achieved regardless of whether
the likelihood is based on (3) or (5) (Cameron and James 1987).
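
The equivalence can be checked numerically. The sketch below uses hypothetical attribute rows, prices, and coefficient values (none taken from the paper) to evaluate the utility-model probabilities of Eq. (3) and the surplus-model probabilities of Eq. (5) under β = φ/γ and μ = 1/γ, and reports the largest discrepancy between them.

```python
import math

def mnl_probs(values):
    """MNL probabilities when the outside good has deterministic value 0."""
    expv = [math.exp(v) for v in values]
    denom = 1.0 + sum(expv)
    return [e / denom for e in expv]

def utility_probs(x, p, phi, gamma):
    # Eq. (3): exponent is x'phi - gamma * p
    vals = [sum(a * b for a, b in zip(xj, phi)) - gamma * pj
            for xj, pj in zip(x, p)]
    return mnl_probs(vals)

def surplus_probs(x, p, beta, mu):
    # Eq. (5): exponent is (x'beta - p) / mu
    vals = [(sum(a * b for a, b in zip(xj, beta)) - pj) / mu
            for xj, pj in zip(x, p)]
    return mnl_probs(vals)

# Hypothetical design and coefficients (assumed for illustration).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p = [1.5, 2.0, 2.5]
phi, gamma = [0.8, 1.2], 0.6

beta = [f / gamma for f in phi]  # WTP: beta = phi / gamma
mu = 1.0 / gamma                 # scale: mu = 1 / gamma

pu = utility_probs(x, p, phi, gamma)
ps = surplus_probs(x, p, beta, mu)
max_gap = max(abs(a - b) for a, b in zip(pu, ps))
print(max_gap)
```

The two likelihoods agree up to floating-point error for any γ > 0; the mapping breaks down only as γ approaches zero, which is where the two parameterizations begin to behave differently once priors are attached.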
2.2 Bayesian analysis, priors, and posterior distributions for WTP
The ML estimator of the WTP ratio, defined as the ratio of the ML estimates of $\phi_k$
and γ, does not possess finite moments and has infinite risk relative to quadratic and
many other loss functions (Zellner 1978). In a Bayesian framework the problems
associated with the ML estimator are alleviated by the introduction of informative
prior distributions. The model thus consists of the full conditional likelihood for the
data and the prior distribution for the model parameters. A hierarchical prior
distribution defined on the positive real line for γsolves the problem of positive
WTP for a decrease in utility and ensures that the prior and posterior moments of the
ratio are finite. The prior and posterior for the ratio are implied by those for the
numerator and denominator.
In the context of random coefficient models, Meijer and Rouwendal (2006)
discuss the properties of the WTP ratio for a number of different distributions for $\phi_k$
and γ. Only in special cases (e.g., a log-normal distribution for both coefficients)
does the ratio of coefficients follow the same distribution as the coefficients. This
implies that, generally, the distributional form of the prior used with the likelihood in
(5) will differ from that implied by the prior for $\phi_k$ and γ. Thus, unlike ML estimates
of WTP from the homogenous model, the posterior distribution of WTP formed by
mixing the likelihoods in (3) and (5) with priors for the respective coefficients will
generally result in distinct posterior WTP distributions and distinct characterizations
of demand as a function of price. What discrepancy can we expect from these two
approaches? To the extent that the data overwhelm the prior, the posterior WTP
distributions from the two approaches will converge despite the differences in the
prior. This will generally happen for models that impose homogeneity on the
coefficients. The more interesting case occurs with hierarchical models.
In hierarchical models, we typically encounter many units (e.g., consumers) and
relatively few observations per unit. Thus, the full conditional likelihood of any one
consumer is informed by a limited amount of data and the prior distribution will
generally have much more influence on the posterior compared with a homogenous
model. A hierarchical model may build on either parameterization, using either of
the likelihood functions in Eqs. (3) or (5) as the full conditional likelihood.² The
likelihood in Eq. (3) in combination with a (hierarchical) prior for $\gamma_i$ that has positive
density arbitrarily close to zero readily accommodates respondents that do not appear
to be sensitive to price. Such respondents can in turn have a tremendous influence on
the posterior WTP distribution implied by the model and thus on any characterization
of demand as a function of price. Hierarchical models with likelihood functions
built on Eq. (5) measure WTP directly by $\beta_i$. An advantage of this formulation is
that a hierarchical prior for WTP can be specified directly.³ For example, a normal
prior for $\beta_i$ will place less mass on absolutely large WTP values.
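
To illustrate the difference in tail behavior, the sketch below simulates from two priors for WTP: one implied by a normal attribute coefficient divided by a log-normal price coefficient, and one specified directly as normal. All parameter values are hypothetical and chosen only for illustration. Tail mass is measured as the share of draws more than three sample standard deviations from the sample mean, so the comparison does not depend on the scale of either prior.

```python
import random
import statistics

def tail_share(draws, k=3.0):
    """Share of draws more than k sample standard deviations from the sample mean."""
    m = statistics.fmean(draws)
    s = statistics.stdev(draws)
    return sum(abs(d - m) > k * s for d in draws) / len(draws)

rng = random.Random(1)
n = 20000

# Implied prior: normal attribute coefficient over log-normal price coefficient
# (assumed hyperparameters, not the paper's).
ratio_draws = [rng.gauss(1.0, 1.0) / rng.lognormvariate(0.0, 1.0)
               for _ in range(n)]

# Direct prior: WTP itself is normal.
normal_draws = [rng.gauss(1.0, 1.0) for _ in range(n)]

ratio_tail = tail_share(ratio_draws)
normal_tail = tail_share(normal_draws)
print(ratio_tail, normal_tail)
```

Under these assumed settings the ratio prior places noticeably more mass in the extremes than the normal prior, because price coefficients close to zero inflate the ratio.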
The problems we outline with the WTP ratio are neither unique to choice models
nor WTP. They apply to any quantity that can be defined as a ratio of model
parameters. However, estimation of WTP (and the related concept of reservation
price) is a particularly relevant problem in marketing and economics. Recently, the
marketing literature has sharpened its focus on the study of WTP and reservation
prices because of the direct implications for pricing strategy (Jedidi and Zhang 2002;
Jedidi et al. 2003; Shaffer and Zhang 1995, 2000). The economics literature has
recognized the potential problems with random coefficient ratio estimates of WTP
(Meijer and Rouwendal 2006; Revelt and Train 1998). Marketing practitioners have
² We adopt a fully Bayesian approach to inference in this paper. However, the points made apply in the
context of hierarchical models independent of the estimation technique.
³ Another motivation for directly parameterizing the model in WTP terms is that the researcher often has
variables such as demographics that may be useful in the characterization of consumer heterogeneity. In
such instances it seems more reasonable to build hierarchical regression structures for WTP instead of
parameters with less clear interpretation. We do not explore this issue here.
also recognized the problems, advocating use of the median as a summary of the
posterior WTP distribution (Orme 2001). While the median will likely be a more
robust statistic, Bayesian decision theoretic analyses of demand as a function of price
aimed at identifying optimal actions rely on the entire posterior distribution of WTP.
To the extent that the posterior distribution of WTP is sensitive to the prior
assumptions, so is the optimal action.
2.3 Optimal pricing
Ignoring the implied prior on WTP can adversely impact demand and price analyses.
Firms often use the model coefficients estimated from CBC data to build market
share simulators, which are useful for assessing response to price changes and
optimal pricing. Given a set of non-price attributes, market share (and demand) can
be completely characterized by consumer surplus. Consumers choose the inside
alternative that yields the maximum surplus and forgo a category purchase if the
surplus from the best alternative is less than the surplus generated by the outside
alternative. The price that determines the incidence and choice decisions is the
reservation price, $\tilde{p}_{ijt}$, which induces indifference between buying alternative j and
forgoing a category purchase. For the surplus model, $\tilde{p}_{ijt} = x'_{ijt}\beta_i + \left(\eta_{ijt} - \eta_{i0t}\right)$.
Importantly, any proper indirect utility function implies a function for $\tilde{p}_{ijt}$. In our
case, the reservation price from the utility model is
$\tilde{p}_{ijt} = \left[x'_{ijt}\phi_i + \left(\varepsilon_{ijt} - \varepsilon_{i0t}\right)\right]/\gamma_i$. From this
equation, we can see that the change in the reservation price given a change in an
attribute is given by the WTP for that attribute.
If the no-buy option is not included in the CBC experiments, the reservation price
is not identified. In this case, what we can identify is the equalization price $p_{ijt}$, which
is the price for good j that equalizes the surplus generated by goods j and j′. For the
surplus model, $p_{ijt} = \left(x_{ijt} - x_{ij't}\right)'\beta_i + p_{ij't} + \left(\eta_{ijt} - \eta_{ij't}\right)$. Again, any proper indirect
utility function implies a function for $p_{ijt}$. In our case,
$p_{ijt} = \left[\left(x_{ijt} - x_{ij't}\right)'\phi_i + \gamma_i p_{ij't} + \left(\varepsilon_{ijt} - \varepsilon_{ij't}\right)\right]/\gamma_i$.
Consider now using the utility or surplus models to find the profit maximizing
price for firm j, taking the competing firms' prices as given. To the extent that the
utility model a priori puts greater mass on extreme WTP values, the posterior
distribution of reservation and equalization prices will also be thick tailed. This is
especially so in sparse data environments, and implies that the firm could continue to
raise prices and still find consumers willing to purchase. Thus, inference about the
profit maximizing prices based on the posterior distributions of the parameters will
depend on the model.
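
The dependence of the optimal price on tail mass can be sketched in a deliberately simplified setting: a single product competing only with the outside good, a common scale, and a grid search over price. The WTP draws, scale, marginal cost, and price grid below are hypothetical, and the function names are illustrative rather than part of any model estimated here.

```python
import math
import random

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def expected_share(price, value_draws, mu):
    """Average purchase probability over draws of the bundle's monetary value."""
    return sum(logistic((v - price) / mu) for v in value_draws) / len(value_draws)

def profit_maximizing_price(value_draws, mu, cost, grid):
    """Grid search for the price that maximizes (price - cost) * expected share."""
    return max(grid, key=lambda p: (p - cost) * expected_share(p, value_draws, mu))

rng = random.Random(2)
n = 2000
# Thin-tailed values: normal. Thick-tailed values: normal over log-normal.
thin_tailed = [rng.gauss(2.0, 1.0) for _ in range(n)]
thick_tailed = [rng.gauss(2.0, 1.0) / rng.lognormvariate(0.0, 1.0)
                for _ in range(n)]

grid = [1.0 + 0.1 * k for k in range(1, 200)]
p_thin = profit_maximizing_price(thin_tailed, mu=0.5, cost=1.0, grid=grid)
p_thick = profit_maximizing_price(thick_tailed, mu=0.5, cost=1.0, grid=grid)
print(p_thin, p_thick)
```

With these assumed inputs, the thick-tailed value distribution supports a higher profit-maximizing price, since a small group of consumers with very large implied valuations keeps demand from vanishing as the price rises.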
3 A simulation study
We more closely investigate the properties of the two approaches to WTP estimation
in the following simulation study. We generate four data sets, two each from the
utility and surplus models, which we will refer to as D1, D2, D3, and D4. For the
utility model data sets, D1 and D2, we assume the following population distribution,
$\Phi_i \sim N\left(\bar{\Phi}, \Sigma_\Phi\right)$, where $\Phi_i = \left(\phi'_i, \log(\gamma_i)\right)'$. For both D1 and D2, the covariance
matrix $\Sigma_\Phi$ is assumed to be diagonal and we choose parameters such that the
distribution of $\gamma_i$ is centered near 1. For D1, we allow for some mass of the
distribution of $\gamma_i$ to be near zero by choosing a large value for the variance of
$\log(\gamma_i)$. For D2, we choose the variance of $\log(\gamma_i)$ such that $\gamma_i$ is tightly distributed
around one, with little to no mass near zero. In the case of the former, some
individual-level WTPs will be extremely large for values of $\gamma_i \approx 0$ while in the latter,
the distribution of WTP should be closer to normal. For the datasets generated by the
surplus model, D3 and D4, we assume $\theta_i \sim N\left(\bar{\theta}, \Sigma_\theta\right)$ where $\theta_i = \left(\beta'_i, \log(\mu_i)\right)'$.
In this case, the distribution of WTP is specified directly. For D3, we choose
parameters such that $\mu_i$ is, on average, larger and the deterministic component of
surplus has relatively lower explanatory power. For D4, we choose parameters such
that $\mu_i$ is, on average, smaller, translating into more extreme choice probabilities.
For all models, we assume 300 individuals choosing amongst three alternatives
and an outside good on each of 15 choice occasions. The covariates include three
alternative specific constants, a discrete attribute with four levels, and a price. Each
alternative is created by randomly choosing a level of the discrete attribute and a
price from the range [1.5, 2.5] (in increments of 0.1). Tables 1 and 2 contain the
parameters of the distributions used to generate the four data sets. We retain the last
choice of each simulated respondent to create a holdout sample. Using MCMC
methods, we estimate the utility and surplus models on each of the four data sets, for
a total of eight sets of results. The details of the sampler have been reported
elsewhere (e.g., Allenby and Lenk 1994; Arora et al. 1998; Train 2003). We use a
normal-inverted Wishart hyper-prior structure for the population distribution
parameters $\bar{\theta}$ and $\Sigma_\theta$. The prior on $\bar{\theta}$ is set to $N\left(0_{(K)}, 10^6 I_{(K \times K)}\right)$. The prior on
$\Sigma_\theta$ is set to $IW\left(K+1, I_{(K \times K)}\right)$. These are proper but diffuse priors. We use identical
priors for $\bar{\Phi}$ and $\Sigma_\Phi$ in the utility model.
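
The data generating process for the utility model data sets can be sketched as follows. This is an illustrative reconstruction from the description above, not the authors' code: the function name, the record layout, and the parameter values passed in the example call are assumptions.

```python
import math
import random

def simulate_utility_data(n_ind, n_occ, phi_mean, phi_var, lg_mean, lg_var, seed=0):
    """Simulate choices from a heterogeneous utility model with an outside good.

    Each of three alternatives has an alternative-specific constant, a
    four-level discrete attribute (first level as baseline), and a price.
    Returns (records, gammas). A record is (individual, occasion, alts, choice),
    where alts lists (level, price) per alternative and choice 0 is the
    outside good.
    """
    rng = random.Random(seed)
    price_grid = [round(1.5 + 0.1 * k, 1) for k in range(11)]  # 1.5, ..., 2.5
    n_alt = 3
    records, gammas = [], []
    for i in range(n_ind):
        # Individual-level draws with a diagonal covariance matrix:
        # phi_i is normal, gamma_i is log-normal.
        phi = [rng.gauss(m, math.sqrt(v)) for m, v in zip(phi_mean, phi_var)]
        gamma = math.exp(rng.gauss(lg_mean, math.sqrt(lg_var)))
        gammas.append(gamma)
        for t in range(n_occ):
            alts, weights = [], [1.0]  # outside good contributes exp(0)
            for j in range(n_alt):
                level = rng.randrange(4)
                price = rng.choice(price_grid)
                v = phi[j]                       # alternative-specific constant
                if level > 0:
                    v += phi[n_alt + level - 1]  # dummy for levels 2 to 4
                v -= gamma * price
                alts.append((level, price))
                weights.append(math.exp(v))
            # Sample the choice from the MNL probabilities.
            choice = rng.choices(range(n_alt + 1), weights=weights)[0]
            records.append((i, t, alts, choice))
    return records, gammas

# Hypothetical parameter values, for illustration only.
records, gammas = simulate_utility_data(
    n_ind=300, n_occ=15,
    phi_mean=[0.5, 1.0, 1.5, 0.5, 0.75, 1.0], phi_var=[1.0] * 6,
    lg_mean=0.0, lg_var=0.2,
)
print(len(records), min(gammas))
```

Raising `lg_var` pushes more individuals toward price coefficients near zero, which is the feature that distinguishes the D1 setting from D2.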
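Table 1 Data generating parameters, utility model data sets

| Parameter | D1 Mean | D1 Variance | D2 Mean | D2 Variance |
|---|---|---|---|---|
| $\phi_{i1}$ | 0.5 | 1 | 0.5 | 1 |
| $\phi_{i2}$ | 1 | 1 | 1 | 1 |
| $\phi_{i3}$ | 1.5 | 1 | 1.5 | 1 |
| $\phi_{i4}$ | 0.5 | 1 | 0.5 | 1 |
| $\phi_{i5}$ | 0.75 | 1 | 0.75 | 1 |
| $\phi_{i6}$ | 1 | 1 | 1 | 1 |
| $\log(\gamma_i)$ | 1 | 2 | 0 | 0.2 |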
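Table 2 Data generating parameters, surplus model data sets

| Parameter | D3 Mean | D3 Variance | D4 Mean | D4 Variance |
|---|---|---|---|---|
| $\beta_{i1}$ | 0.5 | 1 | 0.5 | 1 |
| $\beta_{i2}$ | 1 | 1 | 1 | 1 |
| $\beta_{i3}$ | 1.5 | 1 | 1.5 | 1 |
| $\beta_{i4}$ | 0.5 | 1 | 0.5 | 1 |
| $\beta_{i5}$ | 0.75 | 1 | 0.75 | 1 |
| $\beta_{i6}$ | 1 | 1 | 1 | 1 |
| $\log(\mu_i)$ | 1 | 0.1 | 1 | 0.5 |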
For the utility models, we compute the individual-level WTPs as $\phi_i/\gamma_i$ on each
iteration of the sampler. For the surplus models, draws of the individual-level WTPs
are directly available. We compute the mean absolute error (MAE) and the root
mean-squared error (RMSE) between the true and estimated WTPs on each iteration
of the sampler and report the means over iterations. Using the harmonic mean
estimator (Newton and Raftery 1994), we compute the log marginal density (LMD)
statistic for each model. We also report the deviance information criteria (DIC)
(Spiegelhalter et al. 2002) and the log predictive density (LPD) of the holdout data.
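
The error statistics and the harmonic mean estimator described above can be sketched as follows. The function names and the small inputs in the example are hypothetical. The harmonic mean estimator is written as minus the log of the average of exp(-log-likelihood) over iterations, evaluated with a log-sum-exp step for numerical stability.

```python
import math

def mae_rmse(true_wtp, draws):
    """Average MAE and RMSE across sampler iterations.

    draws[r][i] is the WTP draw for individual i at iteration r.
    """
    maes, rmses = [], []
    for draw in draws:
        errs = [d - t for d, t in zip(draw, true_wtp)]
        maes.append(sum(abs(e) for e in errs) / len(errs))
        rmses.append(math.sqrt(sum(e * e for e in errs) / len(errs)))
    return sum(maes) / len(maes), sum(rmses) / len(rmses)

def lmd_harmonic_mean(loglikes):
    """Harmonic mean estimate of the log marginal density.

    loglikes[r] is the log-likelihood of the data at iteration r.
    """
    neg = [-ll for ll in loglikes]
    m = max(neg)
    log_sum = m + math.log(sum(math.exp(v - m) for v in neg))
    return -(log_sum - math.log(len(loglikes)))

# Hypothetical inputs for illustration.
mae, rmse = mae_rmse([1.0, 2.0], [[1.0, 2.0], [2.0, 4.0]])
lmd = lmd_harmonic_mean([-10.0, -12.0])
print(mae, rmse, lmd)
```

Because the estimator is dominated by the iterations with the lowest likelihood, it can be unstable, which is worth keeping in mind when LMD comparisons disagree with parameter recovery.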
Tables 3, 4 and 5 present the results of our simulation study. D1 and D2 are
generated according to the heterogeneous utility model. D1 contains individuals with
price coefficients near zero and thus extremely large WTPs. Relative to the other
conditions, the error statistics are quite high in this setting. As evidenced by its smaller
RMSE and MAE, the surplus model recovers the true WTPs more accurately,
even though the utility model is consistent with the data generating process.
In terms of fit statistics, the LMD, DIC and LPD all favor the utility model. In D2,
the distribution of the price coefficient has most of its mass away from zero. Again,
the surplus model has lower RMSE and MAE. The LMD and LPD favor the utility
model, while the DIC favors the surplus model. Thus, even when the true WTPs are
a ratio of random coefficients, the surplus model more accurately recovers the true
WTPs compared with the utility model under a range of population distribution
parameters.
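Table 3 Root mean squared error

| Data set | D1 | D1 | D2 | D2 | D3 | D3 | D4 | D4 |
|---|---|---|---|---|---|---|---|---|
| Data generation | Utility | Utility | Utility | Utility | Surplus | Surplus | Surplus | Surplus |
| Model | Utility | Surplus | Utility | Surplus | Utility | Surplus | Utility | Surplus |
| WTP 1 | 48.54 | 45.68 | 2.43 | 1.55 | 26.06 | 1.60 | 1.52 | 1.19 |
| WTP 2 | 67.61 | 64.82 | 1.53 | 1.30 | 16.98 | 1.36 | 1.04 | 0.85 |
| WTP 3 | 91.83 | 91.25 | 1.42 | 1.19 | 16.31 | 1.36 | 1.06 | 0.72 |
| WTP 4 | 26.78 | 24.82 | 1.74 | 1.28 | 16.90 | 1.60 | 1.33 | 0.95 |
| WTP 5 | 41.58 | 38.96 | 1.68 | 1.33 | 16.97 | 1.46 | 1.37 | 0.95 |
| WTP 6 | 33.42 | 29.87 | 1.65 | 1.27 | 19.16 | 1.73 | 1.13 | 0.83 |
| Average RMSE | 51.63 | 49.23 | 1.74 | 1.32 | 18.73 | 1.52 | 1.24 | 0.92 |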
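Table 4 Mean absolute error

| Data set | D1 | D1 | D2 | D2 | D3 | D3 | D4 | D4 |
|---|---|---|---|---|---|---|---|---|
| Data generation | Utility | Utility | Utility | Utility | Surplus | Surplus | Surplus | Surplus |
| Model | Utility | Surplus | Utility | Surplus | Utility | Surplus | Utility | Surplus |
| WTP 1 | 11.88 | 10.25 | 1.52 | 1.21 | 14.64 | 1.29 | 1.10 | 0.94 |
| WTP 2 | 14.85 | 12.32 | 1.05 | 1.01 | 9.15 | 1.08 | 0.75 | 0.66 |
| WTP 3 | 18.01 | 15.92 | 0.97 | 0.89 | 8.44 | 1.08 | 0.73 | 0.56 |
| WTP 4 | 8.58 | 7.10 | 1.17 | 1.00 | 9.05 | 1.27 | 0.94 | 0.74 |
| WTP 5 | 10.55 | 8.64 | 1.16 | 1.03 | 9.14 | 1.16 | 0.97 | 0.75 |
| WTP 6 | 10.91 | 8.95 | 1.13 | 0.95 | 10.49 | 1.38 | 0.82 | 0.66 |
| Average MAE | 12.46 | 10.53 | 1.17 | 1.01 | 10.15 | 1.21 | 0.89 | 0.72 |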
D3 and D4 are generated with the heterogeneous surplus model. In D3, the true scale
parameter $\mu_i$ is, on average, larger. In this setting, the utility model estimates of WTP
are particularly error-prone. Once more, the surplus model is better at recovering the
true WTP parameters. Interestingly, the LMD statistic favors the utility model, despite
the lack of recovery of the true WTPs.⁴ The DIC and LPD favor the surplus model. In
D4, the scale parameter is, on average, smaller. Relative to D3, the utility model does
a better job of recovering the WTPs here, but again, the surplus model has more
accurate WTP recovery. All three of the fit statistics favor the surplus model.
In summary, the surplus models always recover the true WTPs with more
accuracy, regardless of the data generating mechanism. Even when the true WTPs
are distributed as a ratio of random coefficients, the ratio estimator does not recover
the true WTPs as accurately as simply directly specifying a prior on WTP. We
attribute this to the fact that the surplus model employs a more sensible prior
distribution for WTP.
4 Two CBC studies
Using CBC data sets provided to us by firms in the camera and automotive
categories, we replicate the findings from our simulation study in the sense that the
posterior of WTP from the utility model is rather different from the posterior
obtained from the surplus model. Moreover, inferences obtained with the utility
model lack face validity. Table 6 presents the attributes and levels involved in the
design of each study.
4.1 Data and models
The first data set is CBC data on midsize sedans. The data were provided by a major
automotive manufacturer. Respondents qualified for participation in the study on the
basis of the vehicle they currently own, their intention to purchase a midsize sedan,
and other socio-economic information. A total of 333 respondents participated in the
study. Each respondent completed 15 choice tasks, with each task consisting of three
sedans. The no-buy option was not included in this study. The second data set is
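Table 5 Model fit statistics

| Data set | D1 | D1 | D2 | D2 | D3 | D3 | D4 | D4 |
|---|---|---|---|---|---|---|---|---|
| Data generation | Utility | Utility | Utility | Utility | Surplus | Surplus | Surplus | Surplus |
| Model | Utility | Surplus | Utility | Surplus | Utility | Surplus | Utility | Surplus |
| LMD | -3532.20 | -3553.00 | -3942.80 | -3946.30 | -5309.60 | -5428.90 | -2493.40 | -2340.50 |
| DIC | 3859.20 | 3871.10 | 4324.80 | 4318.30 | 5570.70 | 5497.90 | 2913.60 | 2751.50 |
| LPD | -287.47 | -296.64 | -322.79 | -331.15 | -410.41 | -403.82 | -236.34 | -231.89 |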
⁴ This is similar to previous simulation studies in the literature that find the harmonic mean estimator of
the LMD sometimes favors models with relatively poor parameter recovery (Andrews et al. 2002; Liechty
et al. 2005).
CBC data on cameras. The study was conducted by the Eastman Kodak Company to
assess the market for a new camera format, the Advanced Photo System (APS). A
detailed description of the data is given by Gilbride and Allenby (2004). A total of
302 respondents participated in the study. Each respondent completed 14 choice
tasks, with each task consisting of three 35 mm cameras, three APS cameras, and a
no-buy option. Some attributes were available only on the APS camera, and price
was nested within camera type.
Table 6 Attributes and levels

| Camera data: Attribute | Camera data: Levels | Sedan data: Attribute | Sedan data: Levels |
|---|---|---|---|
| Body Style | Low | Make/Model | Ford Taurus |
| | Medium | | Toyota Camry |
| | High | | Nissan Maxima |
| | | | Honda Accord |
| | | | VW Passat |
| Mid-roll change ᵃ | None | Engine | 4 cylinder; 1.8 L; 150 HP |
| | Manual | | 4 cylinder; 2.4 L; 160 HP |
| | Automatic | | 6 cylinder; 3.0 L; 155 HP |
| | | | 6 cylinder; 3.0 L; 222 HP |
| Annotation ᵃ | None | Audio and navigation | Standard Audio |
| | Pre-set List | | Premium Audio |
| | Customized List | | Premium Audio with Navigation |
| | Custom Input Method 1 | | |
| | Custom Input Method 2 | | |
| | Custom Input Method 3 | | |
| Camera operation feedback ᵃ | No | Antilock Brakes (ABS) | No |
| | Yes | | Yes |
| Zoom | None | Side Door/Window Curtain Airbags (CAB) | No |
| | 2X | | Yes |
| | 4X | | |
| Viewfinder | Regular | Vehicle Skid Control (VSC) | No |
| | Large | | Yes |
| Camera settings feedback | None | | |
| | LCD | | |
| | Viewfinder | | |
| | LCD and Viewfinder | | |
| Price (nested within camera type) | from $41 to $499 | Price | $17,400 |
| | | | $18,900 |
| | | | $20,400 |
| | | | $21,900 |
| | | | $23,400 |
| | | | $24,900 |
| | | | $26,400 |

ᵃ Feature only available on APS
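For both data sets, we model consumer i's surplus for alternative j at choice
occasion t as a linear function of non-price attributes, attribute WTPs, and price:

$$C_{ijt} = x'_{ijt}\beta_i - p_{ijt} + \varepsilon_{ijt}, \qquad \varepsilon_{ijt} \sim EV(0, \mu_i), \qquad \theta_i \sim N\left(\bar{\theta}, \Sigma_\theta\right), \qquad \theta_i = \left[\beta'_i, \log(\mu_i)\right]'. \tag{6}$$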
For the camera data, we set the deterministic component of the surplus for the no-
buy option to zero. For identification, the lowest level of each attribute is dropped
(with the exception of the body type attribute since the baseline is the no-buy
option). We use the negative of price (in $100s) in the likelihood. The coding
scheme is the same as that employed by Gilbride and Allenby (2004), and results in
a total of K=18 parameters. For the sedan data, the make/model "VW Passat" is
dropped, as is the lowest level of each of the remaining non-price attributes. This
results in a total of K=13 parameters. We use the negative of price (in $1,000s) in
the likelihood.
For both data sets, we compare estimates of the distribution of WTP from the
surplus model with that of the linear utility model, where $\theta_i$ is replaced with
$\Phi_i = \left(\phi'_i, \log(\gamma_i)\right)'$. Here, the choice probabilities are based on (3). The same
normal-inverted Wishart hyper-prior structure used for $\bar{\theta}$ and $\Sigma_\theta$ is used for hyper-priors
on the population parameters $\bar{\Phi}$ and $\Sigma_\Phi$. The linear utility model requires we
calculate the WTP from the model parameters using the ratio transformation. On each
iteration of the sampler, we compute the ratio $\phi_i/\gamma_i$ using the draws of the individual level
parameters. We then compute the mean, median, and standard deviation over
individuals, and report the mean of these quantities over iterations of the sampler.
For both data sets, the samplers are run for 20,000 iterations. We keep the last 5,000
iterations for posterior inference. Parameter estimates are calculated with T − 1 choice
tasks. We keep the last task for each individual to assess holdout performance via
LPD. To assess in-sample performance, we compute the LMD and DIC statistic for
each model.
4.2 Results
In Tables 7 and 8 we report the mean and standard deviation of the distribution of
WTP for the utility and surplus models. Posterior standard deviations of the reported
statistics are in parentheses. Table 7 contains the results from the sedan data while
Table 8 contains the results from the camera data. The mean and standard deviation
of the population distribution of WTP are dramatically affected by the priors. For the
sedan data, the means of the utility model estimates are two to three times the
magnitude of the surplus model. The WTP distributions are also far more dispersed,
with standard deviations that are five to six times larger. For the camera data, the
means are also much larger for the utility model. However, most of the standard
deviation estimates for the utility model have large posterior standard deviations. For
both data sets, the median of the population distribution is much less sensitive to the
prior than the mean or standard deviation.
The in-sample fit statistics are somewhat mixed. In both data sets, the LMD
strongly favors the utility model. This result echoes that of the third synthetic data
set, D3, which was generated by the surplus model. In this case, the LMD strongly
favored the utility model despite its inconsistency with the data generating process
and its extremely poor parameter recovery. The DIC favors the utility model in the
sedan data and the surplus model in the camera data. In contrast to the in-sample fit
measures, the LPD favors the surplus model in both the sedan data and the camera
data, indicating the surplus model has superior out-of-sample performance. We will
now examine more closely the distribution of WTP and optimal prices implied by
the two models. From this vantage point, the differences across the two models are
less ambiguous.
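For readers unfamiliar with the LMD, the basic harmonic-mean estimator associated with Newton and Raftery (1994) can be sketched as below. This is a hedged illustration: the text does not say whether a stabilized variant was used, and the function name and the log-likelihood values are hypothetical.

```python
import math


def log_marginal_density_harmonic(loglik_draws):
    """Harmonic-mean estimate of log p(y).

    loglik_draws: log-likelihood of the data evaluated at each retained
    posterior draw. The estimate is -log(mean(1 / L_g)), computed with a
    log-sum-exp shift so that large negative log-likelihoods do not
    overflow.
    """
    neg = [-ll for ll in loglik_draws]
    shift = max(neg)
    log_mean_inverse_lik = (
        shift
        + math.log(sum(math.exp(v - shift) for v in neg))
        - math.log(len(neg))
    )
    return -log_mean_inverse_lik


# Hypothetical log-likelihood values at four posterior draws
loglik_draws = [-2225.0, -2219.5, -2221.2, -2230.8]
print(round(log_marginal_density_harmonic(loglik_draws), 2))
```

Because the average is taken over inverse likelihoods, the estimate is dominated by the draws with the lowest likelihood, which is one reason the estimator can behave erratically as a model-choice criterion.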
It is evident that the utility and surplus models result in dramatically different
estimates of the distribution of WTP. The utility model estimates seem to be
implausible and not reflective of consumers' monetary valuation of product
attributes. Figure 1 presents boxplots of the individual-level make/model WTP
estimates for both the utility and surplus models. These are measured relative to the
VW Passat and can be interpreted as equalization prices: the relative price difference
that equalizes the utility of comparably equipped competitive sedans and the Passat.
The median of the utility model's individual-level WTP estimates for the three
Table 7 WTP estimates for sedan data (standard errors in parentheses)
WTP in $1,000s. Utility model WTP is \(\phi_i/\gamma_i\); surplus model WTP is \(\beta_i\).

| Attribute | Level | Utility: Mean | Utility: Median | Utility: Std dev | Surplus: Mean | Surplus: Median | Surplus: Std dev |
|---|---|---|---|---|---|---|---|
| Make/Model | Ford Taurus | 5.02 (2.76) | 2.20 (0.52) | 46.54 (12.80) | 2.23 (0.32) | 2.27 (0.43) | 10.06 (0.29) |
| | Toyota Camry | 16.23 (3.43) | 6.76 (0.58) | 47.87 (15.86) | 7.00 (0.23) | 6.74 (0.36) | 9.73 (0.37) |
| | Nissan Maxima | 12.42 (3.46) | 4.82 (0.52) | 46.88 (15.92) | 5.13 (0.27) | 5.12 (0.42) | 9.35 (0.40) |
| | Honda Accord | 9.25 (3.06) | 3.70 (0.48) | 42.10 (13.68) | 3.96 (0.26) | 3.81 (0.31) | 8.48 (0.25) |
| Engine | 4-cyl; 2.4L; 160HP | 4.88 (1.12) | 1.98 (0.39) | 14.27 (5.50) | 1.82 (0.37) | 1.82 (0.37) | 2.00 (0.25) |
| | 6-cyl; 3.0L; 155HP | 7.89 (1.33) | 3.78 (0.49) | 16.61 (5.65) | 3.67 (0.34) | 3.60 (0.33) | 3.48 (0.36) |
| | 6-cyl; 3.0L; 222HP | 9.89 (1.39) | 5.21 (0.51) | 20.25 (6.85) | 4.87 (0.23) | 4.78 (0.24) | 3.95 (0.38) |
| Audio | Premium Audio | 2.52 (1.02) | 1.10 (0.26) | 9.29 (3.61) | 1.34 (0.21) | 1.31 (0.20) | 1.56 (0.21) |
| | Premium Audio w/Navi | 3.54 (1.12) | 1.57 (0.26) | 10.54 (4.20) | 1.61 (0.24) | 1.62 (0.25) | 2.02 (0.21) |
| Safety features | Antilock Brakes | 4.78 (1.05) | 2.11 (0.26) | 12.23 (4.20) | 2.08 (0.20) | 2.07 (0.20) | 2.21 (0.18) |
| | Side Curtain Airbags | 2.89 (0.88) | 1.22 (0.26) | 9.36 (3.46) | 1.18 (0.19) | 1.17 (0.21) | 1.52 (0.27) |
| | Vehicle Skid Control | 2.83 (0.82) | 1.28 (0.23) | 9.64 (3.59) | 1.29 (0.26) | 1.28 (0.26) | 1.42 (0.24) |

| Fit statistics | Utility model | Surplus model |
|---|---|---|
| LMD | -2220.4 | -2665.8 |
| DIC | 2,966.4 | 3,044.3 |
| LPD | -340.7 | -333.5 |
Japanese make/models are near or in excess of the range of prices shown to
respondents (footnote 5). This is not caused by just a handful of respondents with
estimates of \(\gamma_i\) near zero. The 75th percentiles for the individual-level equalization
price between Toyota Camry vs. VW Passat and Nissan Maxima vs. VW Passat are
approximately $30,428 and $24,684, respectively. The retail price of the Passat is
about $23,000. This implies that a quarter of the respondents would require Passat to
Table 8 WTP estimates for camera data (standard errors in parentheses)
WTP in $100s. Utility model WTP is \(\phi_i/\gamma_i\); surplus model WTP is \(\beta_i\).

| Attribute | Level | Utility: Mean | Utility: Median | Utility: Std dev | Surplus: Mean | Surplus: Median | Surplus: Std dev |
|---|---|---|---|---|---|---|---|
| Body style | Low | 25.78 (6.74) | 4.67 (0.82) | 84.40 (54.41) | 5.47 (0.38) | 5.51 (0.43) | 4.90 (0.36) |
| | Medium | 10.15 (4.52) | 0.49 (0.28) | 56.60 (44.88) | 1.07 (0.22) | 1.14 (0.25) | 3.59 (0.38) |
| | High | 7.82 (3.83) | 0.33 (0.27) | 48.09 (36.43) | 0.85 (0.27) | 0.80 (0.31) | 3.78 (0.64) |
| Mid-roll change | Manual | 1.96 (1.55) | 0.44 (0.24) | 14.02 (6.12) | 0.36 (0.50) | 0.36 (0.48) | 2.84 (0.26) |
| | Automatic | 3.93 (1.21) | 0.52 (0.12) | 16.99 (10.99) | 0.80 (0.12) | 0.71 (0.14) | 2.40 (0.14) |
| Annotation | Pre-set list | 1.51 (0.60) | 0.38 (0.13) | 7.32 (3.39) | 0.34 (0.08) | 0.33 (0.09) | 1.01 (0.12) |
| | Customized list | 3.25 (0.76) | 0.98 (0.16) | 9.47 (4.80) | 1.21 (0.11) | 1.19 (0.12) | 1.09 (0.14) |
| | Custom Input Method 1 | 4.10 (1.62) | 0.28 (0.17) | 22.63 (16.02) | 1.44 (0.24) | 1.48 (0.28) | 3.37 (0.24) |
| | Custom Input Method 2 | 4.64 (1.12) | 1.06 (0.16) | 14.71 (8.49) | 1.23 (0.11) | 1.22 (0.13) | 1.73 (0.18) |
| | Custom Input Method 3 | 2.09 (1.35) | 0.78 (0.15) | 19.75 (11.85) | 0.99 (0.31) | 1.03 (0.32) | 2.58 (0.28) |
| Operation feedback | Feedback | 2.65 (0.86) | 0.58 (0.12) | 10.50 (4.85) | 0.85 (0.25) | 0.82 (0.25) | 1.72 (0.10) |
| Zoom | ×2 Zoom | 5.47 (1.52) | 1.95 (0.25) | 14.96 (8.04) | 2.70 (0.18) | 2.70 (0.19) | 2.09 (0.15) |
| | ×4 Zoom | 7.66 (2.12) | 2.50 (0.41) | 20.70 (10.42) | 3.11 (0.24) | 3.13 (0.26) | 3.13 (0.33) |
| Viewfinder | Large viewfinder | 0.40 (0.75) | 0.18 (0.10) | 8.91 (4.23) | 0.47 (0.17) | 0.46 (0.16) | 1.76 (0.22) |
| Settings feedback | LCD | 4.30 (1.19) | 1.12 (0.20) | 13.27 (8.30) | 0.23 (0.16) | 0.23 (0.16) | 1.30 (0.16) |
| | Viewfinder | 4.49 (1.07) | 1.20 (0.19) | 13.42 (7.99) | 0.16 (0.16) | 0.15 (0.16) | 1.59 (0.19) |
| | LCD and Viewfinder | 5.33 (1.24) | 1.51 (0.24) | 14.92 (9.23) | 0.21 (0.17) | 0.21 (0.18) | 1.53 (0.16) |

| Fit statistics | Utility model | Surplus model |
|---|---|---|
| LMD | -3879.7 | -4100.4 |
| DIC | 4,633.6 | 4,620.4 |
| LPD | -438.5 | -434.7 |
Footnote 5. Note that the median of the posterior mean of the individual-level estimates will not
be the same as the posterior mean of the median of the population distribution.
Fig. 1 Boxplots of individual-level make/model WTP estimates. Two panels, each covering
Taurus, Camry, Maxima, and Accord, with WTP in $1,000s: "Make/Model WTP Estimates:
Utility Model" (axis from -$175 to $175) and "Make/Model WTP Estimates: Surplus Model"
(axis from -$30 to $30).
Table 9 Attributes and levels for optimal pricing, sedan data
Attribute Levels Alternatives (1 2 3 4 5)
Make/Model Ford Taurus X
Toyota Camry X
Nissan Maxima X
Honda Accord X
VW Passat X
Engine 4 cylinder; 1.8 L; 150 HP X
4 cylinder; 2.4 L; 160 HP X X
6 cylinder; 3.0 L; 155 HP X
6 cylinder; 3.0 L; 222 HP X
Audio and navigation Standard Audio X X X X X
Premium Audio
Premium Audio with Navigation
Antilock brakes No X
Yes X X X X
Side door/Window curtain airbags No
Yes X X X X X
Vehicle skid control No X
Yes X X X X
Price ($1,000) $20.1 $20.7 $24 $20.5 $22.5
have a zero price as well as a cash subsidy to induce indifference with a similarly
equipped Camry or Maxima, which does not seem credible.
In the camera study, the individual-level estimates of WTP implied by the utility
model also seem lacking in face validity. For example, the surplus model estimate
of the median of the individual-level WTP estimates for a zoom lens is $295,
with demand essentially zero at prices exceeding $550. According to the utility
model, the median of the individual-level WTPs is $322. At a price of $550, 32% of
respondents are still in the market. A quarter of respondents have WTP estimates in
excess of $750. Demand does not reach zero until prices exceed $3,000. These
estimates of WTP seem unreasonably high. Furthermore, any analysis of demand
should take into account the uncertainty in the individual-level estimates. We now
turn our attention to such an analysis.
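The demand statements above can be read as shares of respondents whose WTP exceeds a given price. The sketch below assumes that reading; the function names and the WTP values are hypothetical and are not the study's estimates.

```python
def share_in_market(wtp_estimates, price):
    """Fraction of respondents whose WTP estimate exceeds the posted price."""
    return sum(w > price for w in wtp_estimates) / len(wtp_estimates)


def demand_curve(wtp_estimates, prices):
    """Share still in the market at each candidate price."""
    return [(p, share_in_market(wtp_estimates, p)) for p in prices]


# Hypothetical individual-level WTP estimates in dollars
wtp = [120, 250, 310, 330, 480, 620, 900, 3100]
for price, share in demand_curve(wtp, [100, 550, 750, 3000, 3200]):
    print(f"price ${price:>5}: {share:.0%} still in the market")
```

A single very large estimate, such as the last value in the toy list, keeps implied demand above zero until the price passes it, which mirrors the long right tail described for the utility model.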
Table 10 Attributes and levels for optimal pricing, camera data
Attributes and levels for alternatives
Attribute Levels 1 2 3
Body style Low X
Medium X
High X
Mid-roll change None
Manual X
Automatic X X
Annotation None
Pre-Set List X
Customized List
Custom Input Method 1 X
Custom Input Method 2 X
Custom Input Method 3
Camera operation feedback No X
Yes X X
Zoom None
×2 Zoom X
×4 Zoom X X
Viewfinder Regular X X
Large X
Camera settings feedback None
LCD
Viewfinder
LCD & Viewfinder X X X
Price (nested within camera type) from $41 to $499 $100 $225 $400
Table 11 Market shares, sedan scenario

| | Taurus | Camry | Maxima | Accord | Passat |
|---|---|---|---|---|---|
| Price ($1,000) | $20.1 | $20.7 | $24 | $20.5 | $22.5 |
| Utility (%) | 16 | 36 | 19 | 21 | 8 |
| Surplus (%) | 16 | 36 | 18 | 22 | 9 |
4.3 An optimal pricing exercise
For the utility model, profits from alternative \(j\) in scenario \(z\) can be written as
\[
\pi^{uz}_j(\Phi_i; x^z_j, p_j) = P^{uz}_j(\Phi_i; x^z_j, p_j)\,(p_j - c_j) \qquad (7)
\]
We seek the price \(p^{uz*}_j\) that maximizes the firm's expected profit,
\(E_\Phi\left[\pi^{uz}_j(\Phi_i; x^z_j, p_j)\right]\). The expected profit in scenario \(z\) is easily calculated with the
output of the Gibbs sampler. For a given price, we simply average the profits
calculated over the draws of \(\Phi_i\). Using routine optimization procedures, it is
straightforward to find the optimal price. For the surplus model, profits from
alternative \(j\) in scenario \(z\) can be written as
\[
\pi^{sz}_j(\theta_i; x^z_j, p_j) = P^{sz}_j(\theta_i; x^z_j, p_j)\,(p_j - c_j). \qquad (8)
\]
As with the utility model, we seek the price \(p^{sz*}_j\) that maximizes the firm's expected
profit, \(E_\theta\left[\pi^{sz}_j(\theta_i; x^z_j, p_j)\right]\).
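The expected-profit calculation just described can be sketched as follows. Several elements are assumptions made for illustration: a multinomial logit share, utility of the form attribute utility minus price coefficient times price, a grid search standing in for the "routine optimization procedures", and toy draws; all function and variable names are hypothetical.

```python
import math


def logit_share(utilities, j):
    """Multinomial logit choice probability of alternative j."""
    top = max(utilities)
    expu = [math.exp(u - top) for u in utilities]  # shifted for stability
    return expu[j] / sum(expu)


def expected_profit(price, j, draws, prices, cost):
    """Average of share * (price - cost) over posterior draws.

    draws: list of (attr_utils, gamma) pairs, where attr_utils holds the
    attribute utility of every alternative and gamma is the price
    coefficient for that draw (assumed parameterization).
    prices: current prices of all alternatives; entry j is replaced.
    """
    total = 0.0
    for attr_utils, gamma in draws:
        scenario_prices = list(prices)
        scenario_prices[j] = price
        v = [a - gamma * p for a, p in zip(attr_utils, scenario_prices)]
        total += logit_share(v, j) * (price - cost)
    return total / len(draws)


def optimal_price(j, draws, prices, cost, grid):
    """Grid price with the highest expected profit for alternative j."""
    return max(grid, key=lambda p: expected_profit(p, j, draws, prices, cost))


# Toy scenario: two products plus a no-buy option priced at zero
draws = [([1.0, 1.2, 0.0], 0.8), ([0.5, 1.0, 0.0], 1.2)]
prices = [2.0, 2.5, 0.0]
cost = 1.0
grid = [1.0 + 0.05 * k for k in range(1, 101)]
best = optimal_price(0, draws, prices, cost, grid)
print(best, expected_profit(best, 0, draws, prices, cost))
```

Averaging profit over the draws before maximizing carries parameter uncertainty into the pricing decision, rather than optimizing at a single point estimate.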
Our goal is to compare \(p^{uz*}_j\) and \(p^{sz*}_j\). Tables 9 and 10 present the attributes and
levels used to construct the competitive scenarios for our pricing exercise. In the
sedan data, we consider a competitive set consisting of five sedans. In the camera
data, we consider a competitive set consisting of three cameras. Tables 11 and 12
present the prices and market shares for each alternative for the sedan and camera
scenarios. On each iteration of the sampler, we compute \(P_j\) and report the mean over
iterations. For the sedan data, the two models predict practically the same shares. For
the camera data, there is some disagreement, with the utility model predicting higher
shares for Cameras 1 and 2 and lower shares for Camera 3 and the No-Buy alternative.
For the sedan data, we will find the optimal price for Ford Taurus, assuming the
competitive vehicle prices remain at their current levels. For the camera data, we will
Table 12 Market shares, camera scenario

| | Camera 1 | Camera 2 | Camera 3 | No Buy |
|---|---|---|---|---|
| Price | $100 | $225 | $400 | $0 |
| Utility (%) | 17 | 35 | 32 | 16 |
| Surplus (%) | 14 | 28 | 35 | 23 |
Table 13 Taurus optimal price, sedan scenario

| Model | | Taurus* | Camry | Maxima | Accord | Passat |
|---|---|---|---|---|---|---|
| Utility | Price ($1,000) | $33.2 | $20.7 | $24 | $20.5 | $22.5 |
| Utility | Share (%) | 3 | 41 | 22 | 25 | 9 |
| Surplus | Price ($1,000) | $25.8 | $20.7 | $24 | $20.5 | $22.5 |
| Surplus | Share (%) | 6 | 40 | 20 | 24 | 10 |

* denotes optimized product
find the optimal price for Camera 3, assuming the competitive camera prices remain at
their current levels. To conduct the exercise, we need to make some assumptions on
costs. For simplicity, we assume the sedans are all built at a variable cost of $18,000.
For the cameras, we assume variable costs of $50, $60, and $70 for Cameras 1, 2, and
3, respectively. Similar results were obtained using other sedans and cameras in the
competitive scenarios, as well as other cost assumptions.
Tables 13 and 14 present the findings from the optimal pricing exercise. We
present the optimal price for Ford Taurus and Camera 3 along with the new market
shares. For the sedan data, using the utility model coefficients in the optimization
results in an optimal price for Taurus of $33,200. At this price, the largest relative
price difference is $12,500, observed between Taurus and Camry. The largest
relative price difference shown in the experiments is $9,000. The prior implied by
the utility model supports excessive equalization prices, leading to optimized prices
beyond the empirical range of prices in the data. In contrast, optimization based on
the surplus model leads to an optimal price for Taurus of $25,800. The largest
relative price difference is well within the range of experimental prices. We obtain
similar results from the camera data. Using the utility model, the optimal price for
Camera 3 is over $1,500. The maximum price shown to respondents in the study
was $499. For the camera data, using the surplus model results in an optimal price of
about $520. While this is slightly in excess of the maximum price, it is much more
reasonable.
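The sedan comparison can be checked by hand from the optimized prices, in $1,000s. The sketch below uses the Taurus and Camry pair that the text singles out; applying the same pair to the surplus model is an editorial choice made for comparability.

```latex
\begin{align*}
\text{Utility model:} \quad & 33.2 - 20.7 = 12.5 \;>\; 9.0
  && \text{(exceeds the largest difference shown in the experiment)} \\
\text{Surplus model:} \quad & 25.8 - 20.7 = 5.1 \;<\; 9.0
  && \text{(within the experimental range)}
\end{align*}
```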
5 Summary and conclusions
Researchers in marketing and economics have recognized the problems associated
with using random coefficient choice models derived from linear indirect utility
functions to estimate WTP for product attributes. In this setting, WTP is estimated
via the ratio of attribute and price coefficients. We illustrate that the prior implied for
WTP by seemingly reasonable priors for the attribute and price coefficients results in
posterior WTP distributions with extremely fat tails. This also affects the model's
characterization of demand, which has implications for pricing analyses. A number of
ad-hoc solutions have been proposed, including constraining the price coefficient to
be homogeneous, or using the median as a measure of central tendency of the WTP
Table 14 Camera 3 optimal price, camera scenario

| Model | | Camera 1 | Camera 2 | Camera 3* | No Buy |
|---|---|---|---|---|---|
| Utility | Price | $100 | $225 | $1,525 | $0 |
| Utility | Share (%) | 19 | 35 | 10 | 21 |
| Surplus | Price | $100 | $200 | $522 | $0 |
| Surplus | Share (%) | 15 | 33 | 27 | 25 |

* denotes optimized product
distribution. In this paper, we present a straightforward solution to the problems
caused by the implied prior for WTP. Parameterizing the choice model in the space
of consumer surplus allows for direct specification of a prior distribution for WTP.
Such a direct specification is especially advantageous in the context of hierarchical
models where the aforementioned solutions conflict with the purpose and value of
quantifying consumer heterogeneity.
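The idea of parameterizing in the space of consumer surplus can be illustrated with a generic linear specification. This is a sketch under assumptions: the sign convention on price and the symbols \(x_j\), \(\bar\beta\), and \(\Sigma_\beta\) are illustrative notation, and the model specification given earlier in the article takes precedence.

```latex
\begin{align*}
V_{ij} &= x_j'\phi_i - \gamma_i p_j
  && \text{(utility space: WTP is the ratio } \phi_i/\gamma_i\text{)} \\
       &= \gamma_i\,\bigl(x_j'\beta_i - p_j\bigr), \qquad \beta_i = \phi_i/\gamma_i
  && \text{(surplus space: WTP is } \beta_i \text{ itself)} \\
\beta_i &\sim N(\bar\beta, \Sigma_\beta)
  && \text{(a prior stated directly on WTP)}
\end{align*}
```

In the first line the distribution of WTP is only implied by the priors on \(\phi_i\) and \(\gamma_i\); in the second it is a model parameter, so its tails are governed by the prior the researcher chooses.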
Using both simulated data and CBC data sets from the automotive and camera
categories, we document the influence of the implied prior for WTP. Commonly
employed diffuse priors for the attribute and price coefficients put too much prior
mass on extreme WTP values to render reasonable posterior WTP distributions in
small sample settings. Some posterior summaries are less sensitive to the assumed
prior (e.g. median versus mean). However, marketing actions, such as setting profit
maximizing prices, depend on the entire posterior distribution of WTP and thus will
be sensitive to the implied prior. In the surplus parameterization a hierarchical prior
for WTP can be directly specified. We found a hierarchical normal prior to be useful
in controlling the tails of the WTP distribution. The relatively thinner tails of the
normal result in more reasonable estimates of the WTP distribution and, in turn,
profit-maximizing prices.
The surplus model results in more reasonable estimates of the distribution of
WTP and profit-maximizing prices as well as superior out-of-sample performance.
However, the in-sample fit statistics across the two parameterizations are ambiguous,
even with simulated data. We leave this issue, specifically the performance of the
Newton-Raftery estimator of the LMD and the DIC statistic as criteria for model
choice, to future research. We acknowledge the existence of data generating
mechanisms that leave respondent WTP for a particular attribute level inestimable.
Among these are non-compensatory processing, price based quality inferences or the
simple ignorance of the price attribute in the conjoint exercise. The utility model
with standard priors will readily accommodate respondents who are, for whatever
reason, insensitive to price in the conjoint task. The modeling question then becomes
one of how and whether to implement prior knowledge about the range of likely
WTP values. We have demonstrated that the surplus model is very effective in terms
of how to implement such prior knowledge because it allows the researcher to put a
prior directly on WTP.
Whether to implement prior knowledge about WTP in conjoint studies, especially
when the data are better fit with arbitrarily large WTP values, touches upon the core
of the inferential problems associated with conjoint experiments in marketing.
Conjoint data are collected with the implicit goal of characterizing market demand.
To the extent that the conjoint likelihood differs from the likelihood that generates
choices in the market place, this generalization calls for the diligent use of prior
knowledge held by the researcher about market behavior. That is, the prior should
preserve certain well-known aspects of the target environment in the posterior and
still be informed by the conjoint likelihood in other respects. We acknowledge that
our argument here is limited to forming prior-predictive distributions given the
conjoint data and other prior knowledge. In the long run, only a better understanding
of the actual data generating mechanism underlying the conjoint data will enable
researchers to develop the necessary procedural modifications to move it closer to
the likelihood that generates choices in the market.
Acknowledgement The authors would like to thank Peter Rossi, JP Dubé, Jordan Louviere, Kenneth
Train, and Greg Allenby for helpful insights. We also thank seminar participants at The Ohio State
University, Duke University, University of Michigan and the University of Chicago for providing useful
comments.
References
Allenby, G., & Lenk, P. (1994). Modeling household purchase behavior with logistic normal regression.
Journal of the American Statistical Association, 89, 1218–1231.
Andrews, R., Ainslie, A., & Currim, I. (2002). An empirical comparison of logit choice models with
discrete vs. continuous representations of heterogeneity. Journal of Marketing Research, 39, 479–487.
Arora, N., Allenby, G. M., & Ginter, J. L. (1998). A disaggregate model of primary and secondary
demand. Marketing Science, 17(1), 29–44.
Cameron, T., & James, M. (1987). Estimating willingness-to-pay from survey data: An alternative
pre-test-market evaluation procedure. Journal of Marketing Research, 24, 389–395.
Edwards, Y., & Allenby, G. (2003). Multivariate analysis of multiple response data. Journal of Marketing
Research, 40, 321–334.
Gilbride, T., & Allenby, G. (2004). A choice model with conjunctive, disjunctive, and compensatory
screening rules. Marketing Science, 23, 391–406.
Jedidi, K., Jagpal, S., & Manchanda, P. (2003). Measuring heterogeneous reservation prices for product
bundles. Marketing Science, 22, 107–130.
Jedidi, K., & Zhang, J. (2002). Augmenting conjoint analysis to estimate consumer reservation prices.
Management Science, 48, 1350–1368.
Liechty, J., Fong, D., & DeSarbo, W. (2005). Dynamic models incorporating individual heterogeneity:
Utility evolution in conjoint analysis. Marketing Science, 24, 285–293.
Meijer, E., & Rouwendal, J. (2006). Measuring welfare effects in models with random coefficients. Journal
of Applied Econometrics, 21, 227–244.
Newton, M. A., & Raftery, A. E. (1994). Approximate Bayesian inference by the weighted likelihood
bootstrap. Journal of the Royal Statistical Society. Series B, 56, 43–48.
Orme, B. (2001). Assessing the monetary value of attribute levels with conjoint: Warnings and suggestions.
Sawtooth Solutions Customer Newsletter (Spring), Sequim, WA: Sawtooth Software, Inc.
Revelt, D., & Train, K. (1998). Mixed logit with repeated choices: Households' choices of appliance
efficiency level. Review of Economics and Statistics, 4, 647–657.
Rossi, P., Allenby, G., & McCulloch, R. (2005). Bayesian Statistics and Marketing. England: Wiley.
Shaffer, G., & Zhang, J. (1995). Competitive coupon targeting. Marketing Science, 14, 395–416.
Shaffer, G., & Zhang, J. (2000). Pay to switch or pay not to switch: Third degree price discrimination in
markets with switching costs. Journal of Economics & Management Strategy, 9, 397–424.
Spiegelhalter, D., Best, N., Carlin, B., & van der Linde, A. (2002). Bayesian measures of model
complexity and fit. Journal of the Royal Statistical Society B, 64, 583–639.
Swait, J., Erdem, T., Louviere, J., & Dubelaar, C. (1993). The equalization price: A measure of
consumer-perceived brand equity. International Journal of Research in Marketing, 10, 23–45, (March).
Train, K. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.
Zellner, A. (1978). Estimation of functions of population means and regression coefficients including structural
coefficients: A minimum expected loss (MELO) approach. Journal of Econometrics, 8, 127–158.
Heterogeneity distributions of willingness-to-pay 331
... However, revealed preference data may not always be available. The protocol introduced here seeks to provide an alternative for stated preference data to the test conducted by Sonnier, Ainslie, and Otter (2007) and Train and Weeks (2005) on a similar topic, which consists of measuring the performance out-ofsample prediction on held-out samples. ...
... In their important article on the WTP space approach, Train and Weeks (2005) stated that the WTP distributions they derived from models in preference space may often translate into "untenable" implications because of their "unreasonably" large variance compared with WTP distributions derived from models with utility in WTP space. However, these and other authors (see Sonnier, Ainslie, and Otter 2007;Hensher and Greene 2011) found preference space specifications to fit their data better than the competing WTP space alternatives. 1 Train and Weeks (2005) conclude that alternative distributional specifications are required to either provide more reasonable WTP distributions in preference space or a better fit to the data in WTP space. 2 Sonnier, Ainslie, and Otter (2007) report that the model they estimated in WTP space (referred to as the "surplus model") results in more "reasonable" estimates of the WTP distributions. ...
... As can be observed, the sensitivity and specificity are almost equivalent for both models. A similar test was conducted by Sonnier, Ainslie, and Otter (2007) and Train and Weeks (2005). Sonnier, Ainslie, and Otter (2007) found that their model in preference space had a worse out-of-sample fit than their model in WTP space, while Train and Weeks found the opposite. ...
... This paper contributes to a series of methods developed to derive reasonable distributions of marginal willingness to pay (mWTP) from such models, commonly known as mixed logit models. Whether such distributions are reasonable generally depends on how heavy tailed they are (Mariel, Demel, & Longo, 2021;Scarpa, Thiene, & Marangon, 2008;Sonnier et al., 2007;Train & Weeks, 2005). Heavy-tailed distributions often arise when the numerator of the ratio of distributions used to simulate mWTP can take values very proximate to zero, leading to exploding implicit prices' (Giergiczny et al., 2012). ...
... This protocol is similar to the protocol proposed by Train and Weeks (2005), where the sampled respondents are divided into two equal-sized subsamples and the log likelihood of the estimated models are evaluated on the other subsample. However, our protocol is very different from the approach of Sonnier et al. (2007) who included all but one choice situation for each respondent in their estimation sample and computed the log likelihood of the competing models on the single choice situation remaining for each respondent. In all cases, the objective is the same: to cross-validate the models and to compare whether different specifications are better or worse at fitting "hold-out" choice situations. ...
Article
Full-text available
This paper introduces a new shifted negative log‐normal distribution for the price parameter in mixed multinomial logit models. The new distribution, labeled as the μ‐shifted negative log‐normal distribution, has desirable properties for welfare analysis and in particular a point mass that is further away from zero than the negative log‐normal distribution. This contributes to mitigating the “exploding” implicit prices issue commonly found when the price parameter is specified as negative log‐normal and the model is in preference space. The new distribution is tested on five stated preference datasets. Comparisons are made with standard alternative approaches such as the willingness‐to‐pay (WTP) space approach. It is found that the μ‐shifted distribution yields substantially lower mean marginal WTP estimates compared to the negative log‐normal specification and similar to the values derived from models estimated in WTP‐space with flexible distributions, while at the same time fitting the data as well as the negative log‐normal specification.
... They may also choose the "None" option, if no product is to their liking. Based on these choices, researchers determine utilities of different attributes of these products to ultimately estimate the willingness to pay for these respective attributes (Andrews et al. 2002;Evgeniou et al. 2007;Sonnier et al. 2007;Otter 2019). This offers the advantage of being able to focus on the willingness to pay for each attribute considered, while setting an indirect scenario that is closer to a real purchase process. ...
... The monotony might therefore lead to a simplification of the choice process, leading to rash decisions, where some attributes of the products might not be considered appropriately (Gilbride and Allenby 2004;Yee et al. 2007;Ryan et al. 2009;Scholz et al. 2010). In multiple studies respondents also suffer from extreme response behaviour, where they either do not select the "None" option enough (Sonnier et al. 2007;Natter et al. 2008;Parker and Schrift 2011) or choose it very extensively (Gilbride et al. 2008;Steiner and Meißner 2018). One problem with choosing the "None" option is, that it does not offer any information of whether the price of the presented products was just too high or whether the products were completely unacceptable for the respondents (Kamakura et al. 2001;Gunasti and Ross 2009;Gensler et al. 2012). ...
Thesis
Fridays for future, students for future, scientists for future… Environmental activism increased drastically in the last years resulting in a growing number of activists. While some of these activists live with a sustainable ecological footprint, others do not and pollute the environment in an unsustainable manner e.g. by flying frequently. One strand of economic literature interprets this (at first glance contradictory) behavior as an attitude-behavior-gap: Having a high preference should result in a high willingness to pay and therefore in an adaption of one’s own behavior, which is not the case for these activists. Not changing one’s behavior can easily be explained by the free rider problem caused by the marginality of one’s impact though. However, this in turn raises the question, why some people live sustainable, abstain from environment polluting goods hence have a willingness to pay for the environment. We argue that both kinds of behavior can be explained by separating the willingness to pay for public goods. Since collective action is hard to sustain reciprocally and without the intervention of a (public) entity, especially for large public goods, two willingness to pay for a public good have to be considered instead – one for the private and one for the public provision of the public good. Assuming that both types of environmental activists understand, that their own contribution is marginally small, this dissertation argues – first in a theoretical model and then in an empirical application – that the willingness to pay for public goods in the private case is actually only dependent on the preference for other (mainly social) incentives – e.g. to silence one’s conscience or for reputational reasons. The unsustainable type of environmental activist just has a lower willingness to pay for social incentives compared to the sustainable typ. 
Only if the state interferes, the preference for the public good will be considered in the decision-making process of individuals. Consequently, it proposes a different form of measuring the willingness to pay for public goods – the so-called Quasi-Monarch. As a Quasi-Monarch, one individual can hypothetically dictate the contribution of all individuals including herself. In this scenario, no one would have an incentive to not state their “real” willingness to pay for the respective good.
... where p j is price, l is a scale parameter, x j is all nonprice attributes, and v is a vector of WTP coefficients for nonprice attributes. For MXL models, directly estimating WTP provides greater control over how WTP is assumed to be distributed across the population, and has been found to yield more reasonable distributions of WTP compared with WTP computed from preference space model coefficients (53)(54)(55). Equation 3 shows the full model used in the study, with explanations of the variable names in Table 3: ...
Article
Automated vehicles (AVs) have the potential to dramatically disrupt current transportation patterns and practices. One particular area of concern is AVs' impacts on public transit systems. If vehicle automation enables significant price decreases or performance improvements for ride-hailing services, some fear that it could undercut public transit, which could have significant implications for the environment and transportation equity. The extent to which individuals adopt automated transportation modes will drive many system-level outcomes, and research on public preferences for AVs is immature and inconclusive. In this study, we used responses from an online choice-based conjoint survey fielded in the Washington, D.C. metropolitan region (N = 1,694) in October 2021 to estimate discrete choice models of public preferences for different automated (ride-hailing, shared ride-hailing, bus) and nonautomated (ride-hailing, shared ride-hailing, bus, rail) modes. We used the estimated models to simulate future marketplace competition across a range of trip scenarios. Respondents on average were only willing to pay a premium for automated modes when a vehicle attendant was also present, limiting the potential cost-savings that AV operators might achieve by removing the driver. Scenario analysis additionally revealed that for trips where good transit options were available, transit remained competitive with automated ride-hailing modes. These results suggest that fears of a mass transition away from transit to AVs may be limited by people's willingness to use AVs, at least in the short term. Future AV operators should also recognize the presence of an AV attendant as a critical feature for early AV adoption.
... Examples include a study on the apartment rental market (Elrod et al. 1992), water resources (D. Hensher et al. 2005), camera and automotive (Sonnier et al. 2007), tourism (Masiero et al. 2015), airline industry (Newman et al. 2014;Ratliff et al. 2008;Talluri and Van Ryzin 2004;Wardell et al. 2008), nursing homes (Milte et al. 2018), healthcare (Regier et al. 2009), and transportation (Li et al. 2010). ...
Preprint
In B2B markets, value-based pricing and selling has become an important alternative to discounting. This study outlines a modeling method that uses customer data (product offers made to each current or potential customer, features, discounts, and customer purchase decisions) to estimate a mixed logit choice model. The model is estimated via hierarchical Bayes and machine learning, delivering customer-level parameter estimates. Customer-level estimates are input into a nonlinear programming next-offer maximization problem to select optimal features and discount level for customer segments, where segments are based on loyalty and discount elasticity. The mixed logit model is integrated with economic theory (the random utility model), and it predicts both customer perceived value for and response to alternative future sales offers. The methodology can be implemented to support value-based pricing and selling efforts. Contributions to the literature include: (a) the use of customer-level parameter estimates from a mixed logit model, delivered via a hierarchical Bayes estimation procedure, to support value-based pricing decisions; (b) validation that mixed logit customer-level modeling can deliver strong predictive accuracy, not as high as random forest but comparing favorably; and (c) a nonlinear programming problem that uses customer-level mixed logit estimates to select optimal features and discounts.
... In lieu of making distributional assumptions, another flexible option to capture preference heterogeneity is to estimate choice models at the individual level. Individual-level choice models have been estimated in the fields of transportation and marketing (Beggs et al. 1981;Hess et al. 2007;Sonnier et al. 2007), but applications in environmental economics are limited due to the lack of abundant choice information per individual. For example, stated preference surveys typically ask one to six choice questions of each respondent while recreation demand applications typically only have sufficient observations to estimate a small number of individual level models. ...
Article
Birding is one of the most popular recreational activities, but bird populations have been declining worldwide. Understanding how much people benefit from local bird population levels, species richness and their preferences can help inform bird conservation management. This paper uses eBird data and random utility models to assess birders’ preferences and welfare for trips to local areas. The sample eBird citizen science data includes 35,656 trips by 290 individual birders to 1227 unique birding hotspots in Alberta, Canada. The economic value of seeing one additional bird species during a trip is estimated to be $0.68 on average. We estimated a nonlinear relationship between the utility and number of bird species suggesting satiation in recreation preferences, and the highest MWTP is estimated to be in the summer and fall seasons. Bird species at risk, based on Alberta’s strategy for the management of species at risk, are valued almost ten times as highly as other types of bird species. We also estimate individualized choice models and find that preference for species richness is heterogeneous across birders. Results of a combinatorial test find that the individualized choice models produce average welfare estimates that are 67% higher than the single model but the difference is not statistically significant. The members of eBird represent a convenience sample that may not constitute the general population. Thus, along with proper weighting, the benefit estimates produced in this research can help inform future bird conservation management decisions including alternative funding mechanisms.
... The error scale is confounded with the price coefficient, p, in a discrete choice model, so only one of the two is estimable (McFadden (2014)). We therefore set the price coefficient to be the inverse of the scale parameter to statistically identify the model, expressing the part-worths in monetary units (Sonnier et al., 2007). ...
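The identification argument in this excerpt can be illustrated with a minimal Python sketch, assuming a plain multinomial logit with linear utility. The function names, attribute matrix, prices, and coefficient values are hypothetical and are not taken from the cited studies. The sketch checks that writing utility as `lam * (wtp'x - price)`, with `lam` playing the role of the inverse error scale, gives the same choice probabilities as the preference-space form `beta'x - alpha * price` when `wtp = beta / alpha` and `lam = alpha`.

```python
import math

def softmax(values):
    """Logit choice probabilities from a list of deterministic utilities."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def probs_preference_space(X, prices, beta, alpha):
    """Utility per alternative: beta'x - alpha * price."""
    utils = [sum(b * x for b, x in zip(beta, row)) - alpha * p
             for row, p in zip(X, prices)]
    return softmax(utils)

def probs_wtp_space(X, prices, wtp, lam):
    """Utility per alternative: lam * (wtp'x - price), part-worths in money units."""
    utils = [lam * (sum(w * x for w, x in zip(wtp, row)) - p)
             for row, p in zip(X, prices)]
    return softmax(utils)

# Hypothetical choice set: three alternatives, two binary attributes.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prices = [2.0, 3.0, 4.5]

beta = [1.2, 0.8]   # assumed attribute coefficients (preference space)
alpha = 0.4         # assumed price coefficient (preference space)

wtp = [b / alpha for b in beta]  # implied WTP for each attribute
p_pref = probs_preference_space(X, prices, beta, alpha)
p_wtp = probs_wtp_space(X, prices, wtp, alpha)

print("WTP:", wtp)
print("preference space:", p_pref)
print("WTP space:       ", p_wtp)
```

The two parameterizations describe the same likelihood; they differ in which quantities receive a heterogeneity distribution or prior once coefficients are allowed to vary across individuals.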
Article
The application of conjoint analysis to new product development is challenged in studies of complex products that simultaneously examine the major drivers of a purchase decision and the composition of product components. Demands on data increase as more product features are included in an analysis, and at some point it becomes necessary to study the components separately. This paper presents evidence of a non-linear pricing effect that complicates the analysis of large conjoint studies when multiple conjoint exercises are integrated, or bridged into a single analysis. Our model is illustrated with data from the automotive industry showing that option packages are under-valued without accounting for the non-linear effects of price.
... The parameter γ [0 to 1] is a scalar weighting parameter that defines if scaling is only for the fixed coefficient β or also for individual specific variation η n (Keane & Wasi, 2013). We estimate the model in WTP space directly (Scarpa et al., 2008; Sonnier et al., 2007; Train & Weeks, 2005) (see Supporting Information: Online Appendix C for more details). For individual n at choice scenario t, the probability that individual n chooses alternative j is obtained by: ...
Article
Prairie strips on agricultural lands are supported by the Conservation Reserve Program and provide environmental benefits such as reduced soil loss and improved wildlife habitat. The current study measures the value that the public places on those benefits and whether that value changes under different policy designs. The policy design varies by who runs the program (state agency vs. nongovernment organization) and who has enrollment priority (historically managed land vs. degraded land). Results from a choice experiment indicate significant overall public support for the expansion and that willingness to pay is highest with priority for land historically managed in a conservation-oriented way.
... Because the estimate is the ratio of two randomly generated values, having an extremely large or small value in the denominator (time) produces this issue. To avoid this issue, prior studies suggest estimating willingness-to-pay directly by reparametrizing the model (Train and Weeks, 2005; Sonnier et al., 2007). Because we can specify the distribution of the estimates in the function, we tend to have a more convenient and reasonable distribution (Train and Weeks, 2005). ...
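The ratio problem described in this excerpt can be sketched with a small simulation. The distributions and parameter values below are illustrative assumptions, not estimates from any cited study: the numerator and denominator coefficients are drawn from independent normals, with the denominator placing some mass near zero, and the resulting ratio is compared with WTP drawn directly from a normal distribution.

```python
import random

def tail_share(draws, cutoff):
    """Fraction of draws whose absolute value exceeds the cutoff."""
    return sum(1 for d in draws if abs(d) > cutoff) / len(draws)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_draws = 100_000

# Indirect specification: WTP is the ratio of two random coefficients.
ratio_wtp = []
for _ in range(n_draws):
    attribute_coef = rng.gauss(1.0, 0.5)    # assumed numerator distribution
    denominator_coef = rng.gauss(1.0, 0.5)  # assumed denominator, mass near zero
    if denominator_coef != 0.0:             # guard against exact division by zero
        ratio_wtp.append(attribute_coef / denominator_coef)

# Direct specification: WTP itself is given a normal distribution.
direct_wtp = [rng.gauss(1.0, 1.0) for _ in range(n_draws)]

share_ratio = tail_share(ratio_wtp, 10.0)
share_direct = tail_share(direct_wtp, 10.0)
largest_ratio = max(abs(d) for d in ratio_wtp)
largest_direct = max(abs(d) for d in direct_wtp)

print("share with |WTP| > 10, ratio form: ", share_ratio)
print("share with |WTP| > 10, direct form:", share_direct)
print("largest |WTP|, ratio vs direct:", largest_ratio, largest_direct)
```

Draws with a denominator close to zero dominate the tails of the ratio, which is why the direct specification tends to give better-behaved WTP distributions.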
Article
Bike-share services will produce more limited benefits if users cannot find bikes when and where they need them. Bike-share operators must thus have a process for “rebalancing” the bikes within the system to ensure that they are available where demanded. A potentially cost-effective strategy for rebalancing bikes is to offer incentives of some sort to users to walk farther to get a bike (origin-based incentive) or bring a bike to an undersupplied area (destination-based incentive). This paper aims to examine bike-share users’ willingness-to-walk to pick up a bike or drop off a bike at some distance from their origins or destinations if rewarded and to identify characteristics influencing willingness-to-walk. We use data from a survey of dock-less e-bike-share users conducted in the Sacramento region. The analysis shows that half of the respondents use bike-share if the available bike is located 8.9 min away. Our estimates of willingness-to-walk farther than the mean distance for incentives at origins and destinations were 3.8 min and 4.2 min per dollar, respectively. Our results give operators and policy makers insights into the potential effectiveness of incentives as a strategy for spatially rebalancing bike-share fleets.
... This provides an alternative to the standard practice of dividing non-price attribute parameters by the price parameter to obtain the willingness-to-pay estimates. The advantage is that as willingness-to-pay estimates are directly defined for the parameter ratio, estimates are more tractable, plausible, and relevant for policymakers (dit Sourd, 2020; Mabit et al., 2006; Sonnier et al., 2007; Train and Weeks, 2005). ...
Preprint
The decisions of whether and how to evacuate during a climate disaster are influenced by a wide range of factors, including sociodemographics, emergency messaging, and social influence. Further complexity is introduced when multiple hazards occur simultaneously, such as a flood evacuation taking place amid a viral pandemic that requires physical distancing. Such multi-hazard events can necessitate a nuanced navigation of competing decision-making strategies wherein a desire to follow peers is weighed against contagion risks. To better understand these nuances, we distributed an online survey during a pandemic surge in July 2020 to 600 individuals in three midwestern and three southern states in the United States with high risk of flooding. In this paper, we estimate a random parameter logit model in both preference space and willingness-to-pay space. Our results show that the directionality and magnitude of the influence of peers' choices of whether and how to evacuate vary widely across respondents. Overall, the decision of whether to evacuate is positively impacted by peer behavior, while the decision of how to evacuate is negatively impacted by peers. Furthermore, an increase in flood threat level lessens the magnitude of these impacts. These findings have important implications for the design of tailored emergency messaging strategies. Specifically, emphasizing or deemphasizing the severity of each threat in a multi-hazard scenario may assist in: (1) encouraging a reprioritization of competing risk perceptions and (2) magnifying or neutralizing the impacts of social influence, thereby (3) nudging evacuation decision-making toward a desired outcome.
Article
The successful development of marketing strategies requires the accurate measurement of household preferences and their reaction to variables such as price and advertising. Manufacturers, for example, often offer products at a reduced price for a limited period. One reason for this practice is that it induces households to try the promoted product with the hope of retaining them as permanent customers. The successful implementation of this strategy requires knowledge of the extent of price sensitivity in the population, effective methods of advertising, and the existence of a carry-over effect in the household's evaluation of the product. Logistic regression models are often used to relate household demographics, prices, and advertising variables to household purchase decisions. In this article we extend the standard model to include cross-sectional and serial correlation in household preferences and provide algorithms for estimating the model with random effects. The model is applied to scanner panel data for ketchup purchases, and substantive insights into household preference, brand switching, and autocorrelated purchase behavior are obtained.
Article
Multiple response questions, also known as a pick any/J format, are frequently encountered in the analysis of survey data. The relationship among the responses is difficult to explore when the number of response options, J, is large. The authors propose a multivariate binomial probit model for analyzing multiple response data and use standard multivariate analysis techniques to conduct exploratory analysis on the latent multivariate normal distribution. A challenge of estimating the probit model is addressing identifying restrictions that lead to the covariance matrix specified with unit-diagonal elements (i.e., a correlation matrix). The authors propose a general approach to handling identifying restrictions and develop specific algorithms for the multivariate binomial probit model. The estimation algorithm is efficient and can easily accommodate many response options that are frequently encountered in the analysis of marketing data. The authors illustrate multivariate analysis of multiple response data in three applications.
Article
We introduce the weighted likelihood bootstrap (WLB) as a way to simulate approximately from a posterior distribution. This method is often easy to implement, requiring only an algorithm for calculating the maximum likelihood estimator, such as iteratively reweighted least squares. In the generic weighting scheme, the WLB is first order correct under quite general conditions. Inaccuracies can be removed by using the WLB as a source of samples in the sampling-importance resampling (SIR) algorithm, which also allows incorporation of particular prior information. The SIR-adjusted WLB can be a competitive alternative to other integration methods in certain models. Asymptotic expansions elucidate the second-order properties of the WLB, which is a generalization of Rubin’s Bayesian bootstrap [D. B. Rubin, Ann. Stat. 9, 130-134 (1981)]. The calculation of approximate Bayes factors for model comparison is also considered. We note that, given a sample simulated from the posterior distribution, the required marginal likelihood may be simulation consistently estimated by the harmonic mean of the associated likelihood values; a modification of this estimator that avoids instability is also noted. These methods provide simple ways of calculating approximate Bayes factors and posterior model probabilities for a very wide class of models.
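The weighting idea in this abstract can be sketched for the simplest possible case. The sketch assumes a normal model with unknown mean and known variance, for which the weighted maximum likelihood estimate is just the weighted sample mean, and it assumes uniform Dirichlet weights as in Rubin's Bayesian bootstrap (obtained here by normalizing standard exponential draws). The data values and function names are hypothetical, and the SIR adjustment mentioned in the abstract is not implemented.

```python
import random

def weighted_mean(data, weights):
    """Weighted MLE of a normal mean with known variance."""
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, data)) / total

def wlb_draws(data, n_draws, rng):
    """Approximate posterior draws of the mean via reweighted MLEs."""
    draws = []
    for _ in range(n_draws):
        # Normalized Exp(1) draws are Dirichlet(1, ..., 1) weights;
        # weighted_mean performs the normalization.
        weights = [rng.expovariate(1.0) for _ in data]
        draws.append(weighted_mean(data, weights))
    return draws

# Hypothetical sample of 20 observations.
data = [4.8, 5.6, 3.9, 6.1, 5.0, 4.4, 5.9, 4.7, 5.2, 6.4,
        3.6, 5.5, 4.9, 5.1, 4.2, 5.8, 6.0, 4.5, 5.3, 4.6]

rng = random.Random(1)  # fixed seed for reproducibility
draws = wlb_draws(data, 2000, rng)

sample_mean = sum(data) / len(data)
wlb_mean = sum(draws) / len(draws)
print("sample mean:", sample_mean)
print("mean of WLB draws:", wlb_mean)
```

Each draw only requires re-solving a weighted maximum likelihood problem, which is the practical appeal the abstract points to; for models without a closed-form estimator, the weighted mean would be replaced by a call to a weighted fitting routine.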
Article
It has been shown in the behavioral decision making, marketing research, and psychometric literature that the structure underlying preferences can change during the administration of repeated measurements (e.g., conjoint analysis) and data collection because of effects from learning, fatigue, boredom, and so on. In this research note, we propose a new class of hierarchical dynamic Bayesian models for capturing such dynamic effects in conjoint applications, which extend the standard hierarchical Bayesian random effects and existing dynamic Bayesian models by allowing for individual-level heterogeneity around an aggregate dynamic trend. Using simulated conjoint data, we explore the performance of these new dynamic models, incorporating individual-level heterogeneity across a number of possible types of dynamic effects, and demonstrate the derived benefits versus static models. In addition, we introduce the idea of an unbiased dynamic estimate, and demonstrate that using a counterbalanced design is important from an estimation perspective when parameter dynamics are present.
Article
Conjoint analysis is often used to assess how buyers trade off product features with price. Researchers can test the price sensitivity of potential product configurations using simulation models based on conjoint results. Most often, a simulation is done within a specific context of competitors. But when a product is truly new to the market and has no direct competitors, price sensitivity for that new product can be estimated compared to other options such as buying nothing. The common forms of conjoint analysis measure contrasts between levels within attributes. The worths of levels are estimated on an arbitrary interval scale, so the absolute magnitudes of utilities have no meaning. Also, each attribute's utilities are determined only to within an arbitrary additive constant, so a utility level from one attribute cannot be directly compared to another from a different attribute. To a trained conjoint analyst, an array of utilities conveys clear meaning. But that meaning is often difficult for others to grasp. It is not surprising, then, that researchers look for ways to make conjoint utilities easier to interpret.
Article
Closed-ended contingent valuation surveys are used to assess demands in hypothetical markets and recently have been applied widely to the valuation of (non-market) environmental resources. This interviewing strategy holds considerable promise for more general market research applications. The authors describe a new maximum likelihood estimation technique for use with these special data. Unlike previously used methods, the estimated models are as easy to interpret as ordinary least squares regression results and the results can be approximated accurately by packaged probit estimation routines.
Article
Currently, there is an important debate about the relative merits of models with discrete and continuous representations of consumer heterogeneity. In a recent JMR study, Andrews, Ansari, and Currim (2002; hereafter AAC) compared metric conjoint analysis models with discrete and continuous representations of heterogeneity and found no differences between the two models with respect to parameter recovery and prediction of ratings for holdout profiles. Models with continuous representations of heterogeneity fit the data better than models with discrete representations of heterogeneity. The goal of the current study is to compare the relative performance of logit choice models with discrete versus continuous representations of heterogeneity in terms of the accuracy of household-level parameters, fit, and forecasting accuracy. To accomplish this goal, the authors conduct an extensive simulation experiment with logit models in a scanner data context, using an experimental design based on AAC and other recent simulation studies. One of the main findings is that models with continuous and discrete representations of heterogeneity recover household-level parameter estimates and predict holdout choices about equally well except when the number of purchases per household is small, in which case the models with continuous representations perform very poorly. As in the AAC study, models with continuous representations of heterogeneity fit the data better.
Article
Consumer reservation price is a key concept in marketing and economics. Theoretically, this concept has been instrumental in studying consumer purchase decisions, competitive pricing strategies, and welfare economics. Managerially, knowledge of consumer reservation prices is critical for implementing many pricing tactics such as bundling, target promotions, nonlinear pricing, and one-to-one pricing, and for assessing the impact of marketing strategy on demand. Despite the practical and theoretical importance of this concept, its measurement at the individual level in a practical setting proves elusive. We propose a conjoint-based approach to estimate consumer-level reservation prices. This approach integrates the preference estimation of traditional conjoint with the economic theory of consumer choice. This integration augments the capability of traditional conjoint such that consumers' reservation prices for a product can be derived directly from the individual-level estimates of conjoint coefficients. With this augmentation, we can model a consumer's decision of not only which product to buy, but also whether to buy at all in a category. Thus, we can simulate simultaneously three effects that a change in price or the introduction of a new product may generate in a market: the customer switching effect, the cannibalization effect, and the market expansion effect. We show in a pilot application how this approach can aid product and pricing decisions. We also demonstrate the predictive validity of our approach using data from a commercial study of automobile batteries.