Heterogeneity Distributions of Willingness-to-Pay in Choice Models

February 2007
Quantitative Marketing and Economics 5(3):313-331

DOI:10.2139/ssrn.928412

Source
RePEc

Authors:

Thomas Otter

Goethe-Universität Frankfurt am Main

We investigate direct and indirect specification of the distribution of consumer willingness-to-pay (WTP) for changes in product attributes in a choice setting. Typically, choice models identify WTP for an attribute as a ratio of the estimated attribute and price coefficients. Previous research in marketing and economics has discussed the problems with allowing for random coefficients on both attribute and price, especially when the distribution of the price coefficient has mass near zero. These problems can be avoided by combining a parameterization of the likelihood function that directly identifies WTP with a normal prior for WTP. We show that the typical likelihood parameterization in combination with what are regarded as standard heterogeneity distributions for attribute and price coefficients results in poorly behaved posterior WTP distributions, especially in small sample settings. The implied prior for WTP readily allows for substantial mass in the tails of the distribution and extreme individual-level estimates of WTP. We also demonstrate the sensitivity of profit maximizing prices to parameterization and priors for WTP.

Garrett Sonnier & Andrew Ainslie & Thomas Otter

Received: 3 January 2006 / Accepted: 8 March 2007 / Published online: 9 August 2007

© Springer Science + Business Media, LLC 2007

Keywords Bayesian analysis · Choice modeling · Willingness-to-pay

JEL classification C11 · M31

Quant Market Econ (2007) 5:313–331

DOI 10.1007/s11129-007-9024-6

G. Sonnier

University of Texas at Austin, 1 University Station, Austin, TX 78712, USA

e-mail: garrett.sonnier@mccombs.utexas.edu

A. Ainslie (*)

UCLA, 110 Westwood Plaza, Los Angeles, CA 90095, USA

e-mail: andrew.ainslie@anderson.ucla.edu

T. Otter

Ohio State University, 2100 Neil Avenue, Columbus, OH 43210, USA

e-mail: otter_2@cob.osu.edu

1 Introduction

Markov chain Monte Carlo (MCMC) techniques to simulate from a distribution have

facilitated the exploration of any aspect of the distribution under study, including the

distribution of any transformation of random variables. Once a sample of random

variables from the distribution is available, change-of-variable calculus is no longer

needed to derive the distribution of the transformation. The researcher is free to

empirically explore the posterior distribution of the transformation by simply

computing the transformation on each iteration of the sampler (e.g., Edwards and

Allenby 2003). This technique is referred to as “post-processing” MCMC draws.

Posterior summaries of any transformation of parameters are easily obtained from

the MCMC output. This can be particularly advantageous when the estimation

problem is difficult or intractable in the space of interest, but tractable in another

space. Post processing readily allows the researcher to move between the two spaces.

While the advantages of post-processing techniques are well-documented, less

attention has been paid to the fact that the priors used for the parameters in one space

necessarily form an implied prior on the transformed parameters.

It is well known that in a Bayesian model, a change in the likelihood parameterization

must be reflected in the prior to leave the posterior predictive density

unchanged. Hierarchical models introduce a prior distribution on the parameters

across the observational units. Changes in the parameterization of the full conditional

likelihood will alter the predictive density of the hierarchical model unless the prior

distribution is adapted accordingly. In applied work, the choice of parameterization is

often viewed in isolation from the prior distribution, which is typically chosen for

analytic convenience (e.g., conjugacy). However, a convenient and diffuse prior in

one space does not necessarily result in an equivalent implied prior in the transformed

space (Rossi et al. 2005).

An interesting example of transforming model parameters with relevance to

marketing and economics occurs when estimating the willingness-to-pay (WTP) for

changes in product attributes using choice data. In this paper, we contrast two

approaches to estimating the distribution of WTP with choice models. In the first

approach WTP is defined as the ratio of attribute and price parameters and the

implied prior distribution for WTP is a function of the priors for these parameters.

The posterior of the WTP distribution is explored empirically via post-processing.

This is consistent with the work of Meijer and Rouwendal (2006), which investigates

the properties of WTP defined as a ratio for different distributions of the numerator

and the denominator. The second approach re-parameterizes the full conditional

likelihood to directly identify WTP. This allows the researcher to directly implement

a prior for WTP. We demonstrate the sensitivity of inferences about WTP to different

parameterizations in combination with what are regarded as standard assumptions

about the hierarchical prior (i.e., the heterogeneity distribution). The sensitivity is

particularly pronounced in small sample settings. We show how a normal prior

directly specified for WTP results in better inferences. Moreover, not only is the

posterior of WTP sensitive to the different parameterization and prior assumptions,

but so are all marketing actions derived from the distribution of WTP, such as the

setting of profit maximizing prices. The results illustrate the practical importance of

paying attention to implied priors.

Implied priors can be problematic even if there is no interest in the transformed

parameters themselves. In choice-based conjoint (CBC), for example, the analyst

may not be interested in the model coefficients or WTP, per se, but rather in using

the model to analyze demand and pricing policies. The reservation and equalization

prices, which completely characterize incidence and switching behavior in response

to price changes, are a function of attributes and attribute WTPs (Jedidi and Zhang

2002).

If the distributions of reservation and equalization prices are impacted by the

implied priors on WTP, demand estimates and price policies will be dramatically

affected, as we will demonstrate. While certain point estimates of the distribution

may be less influenced by the implied prior (e.g., the median versus the mean), any

investigation of the nature of consumer demand will require the researcher to

consider more than just a particular statistic.

The organization of the remainder of the paper is as follows: Section 2 presents two parameterizations of choice models that result in equivalent full conditional likelihoods. It then discusses how the two parameterizations result in different prior predictive and posterior densities depending on the choice of the prior, particularly once one introduces heterogeneity. Section 3 illustrates the size of the effect using simulated data. Section 4 presents the results from two CBC studies. Section 5 summarizes and offers a brief discussion on the role of prior information in conjoint analysis.

2 Utility and surplus maximization

2.1 Equivalence of likelihood functions

Consider first consumers’ discrete choice problem as that of maximizing an indirect utility function. We have consumers, indexed by i, choosing among J alternatives, indexed by j, on each of T occasions, indexed by t. Let $V^*_{ijt}$ denote consumer i’s indirect utility for alternative j on choice occasion t. It is assumed that indirect utility can be expressed as a linear function of the alternative’s non-price attributes, $x_{ijt}$, income $y_i$, and price $p_{ijt}$:

$$V^*_{ijt} = x'_{ijt}\phi^* + \gamma^*\,(y_i - p_{ijt}) + \varepsilon^*_{ijt} \quad \text{with} \quad V^*_{i0t} = \varepsilon^*_{i0t}. \tag{1}$$

We assume the error terms are independent and identically distributed according to a type I extreme value distribution. For exposition, we initially leave the scale parameter as unknown, $\varepsilon^*_{ijt} \sim EV(0, \mu)$. It is well known that multiplying the indirect utility function for each choice by a constant does not change the utility maximizing alternative. Thus, $V^*_{ijt}$ must be normalized, which is typically accomplished by standardizing the error distribution to be EV(0, 1) such that

$$V_{ijt} = x'_{ijt}\phi + \gamma\,(y_i - p_{ijt}) + \varepsilon_{ijt} = x'_{ijt}\frac{\phi^*}{\mu} + \frac{\gamma^*}{\mu}\,(y_i - p_{ijt}) + \frac{\varepsilon^*_{ijt}}{\mu}. \tag{2}$$

The reservation price is the price that induces indifference between purchase and non-purchase in the category. The equalization price (Swait et al. 1993) is the price that induces indifference between two choice alternatives within the category.

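The familiar MNL choice probabilities take the form

$$\Pr{}^{u}_{ijt} = \frac{\exp\left[(x'_{ijt}\phi^* - \gamma^* p_{ijt})/\mu\right]}{1 + \sum_{m=1}^{J}\exp\left[(x'_{imt}\phi^* - \gamma^* p_{imt})/\mu\right]} = \frac{\exp\left[x'_{ijt}\phi - \gamma p_{ijt}\right]}{1 + \sum_{m=1}^{J}\exp\left[x'_{imt}\phi - \gamma p_{imt}\right]} \tag{3}$$

where the superscript u denotes the probability obtained using the utility model.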

WTP for an improvement in $x_{ijkt}$, the kth attribute of alternative j, is the price change that would leave the individual indifferent between the alternative with the new level and the alternative with the original level. For continuous x, we have $\partial V_{ijt} = \phi_k\,\partial x_{ijkt} - \gamma\,\partial p_{ijt} = 0$ and the change in price that keeps utility constant given a change in attribute k is $\phi_k/\gamma = \phi^*_k/\gamma^*$ (Train 2003). Note that the scale parameter μ drops out of the WTP.

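We can re-parameterize the indirect utility function in (1) by dividing through by $\gamma^*$:

$$\begin{aligned} \frac{V^*_{ijt}}{\gamma^*} &= x'_{ijt}\frac{\phi^*}{\gamma^*} + (y_i - p_{ijt}) + \frac{\varepsilon^*_{ijt}}{\gamma^*} \\ C_{ijt} &= x'_{ijt}\beta + (y_i - p_{ijt}) + \eta_{ijt} \end{aligned} \tag{4}$$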

In this reparameterization, $C_{ijt}$ is consumer i’s surplus from good j on purchase occasion t (Jedidi et al. 2003). Surplus is determined in part by the attributes of the products in the set, $x_{ijt}$, and the WTP for the attributes, β. Consumers arrive at their choices by maximizing the surplus (i.e., the difference between the monetary value of the attribute bundle and the price to acquire the bundle) among the J alternatives in a set on occasion t. The MNL choice probability associated with the surplus model is

$$\Pr{}^{s}_{ijt} = \frac{\exp\left[(x'_{ijt}\beta - p_{ijt})/\mu\right]}{1 + \sum_{m=1}^{J}\exp\left[(x'_{imt}\beta - p_{imt})/\mu\right]} \tag{5}$$

where the superscript s denotes the surplus model. The probability expressions in Eqs. (3) and (5) are equivalent over the range of parameters for which the transformations $\beta = \phi/\gamma$ and $\mu = 1/\gamma$ are well defined. In the case of maximum likelihood (ML) estimation, the Invariance Property of the ML estimator ensures that precisely the same point estimates of WTP will be achieved regardless of whether the likelihood is based on (3) or (5) (Cameron and James 1987).
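The equivalence of Eqs. (3) and (5) under $\beta = \phi/\gamma$ and $\mu = 1/\gamma$ can be checked numerically. The following is a minimal sketch, not the authors' code: the attribute matrix, prices, and coefficient values are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def mnl_probs(v):
    """MNL probabilities for inside goods; the outside good has deterministic part 0."""
    v = np.asarray(v, dtype=float)
    m = max(v.max(), 0.0)              # shift to stabilise the exponentials
    e = np.exp(v - m)
    return e / (np.exp(-m) + e.sum())

def utility_probs(x, p, phi, gamma):
    # Eq. (3), normalised form: deterministic utility x'phi - gamma * p
    return mnl_probs(x @ phi - gamma * p)

def surplus_probs(x, p, beta, mu):
    # Eq. (5): deterministic surplus (x'beta - p), scaled by mu
    return mnl_probs((x @ beta - p) / mu)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 2))            # 3 alternatives, 2 attributes (illustrative)
p = np.array([1.5, 2.0, 2.5])          # illustrative prices
phi, gamma = np.array([0.8, -0.4]), 0.5

# Map utility parameters into surplus parameters.
beta, mu = phi / gamma, 1.0 / gamma

pu = utility_probs(x, p, phi, gamma)
ps = surplus_probs(x, p, beta, mu)
print(np.allclose(pu, ps))
```

The two likelihoods agree for any γ > 0, so any difference in posterior WTP between the models has to come from the priors rather than from the likelihood.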

2.2 Bayesian analysis, priors, and posterior distributions for WTP

The ML estimator of the WTP ratio, defined as the ratio of the ML estimates of ϕ and γ, does not possess finite moments and has infinite risk relative to quadratic and many other loss functions (Zellner 1978). In a Bayesian framework the problems associated with the ML estimator are alleviated by the introduction of informative prior distributions. The model thus consists of the full conditional likelihood for the data and the prior distribution for the model parameters. A hierarchical prior distribution defined on the positive real line for γ solves the problem of positive WTP for a decrease in utility and ensures that the prior and posterior moments of the ratio are finite. The prior and posterior for the ratio are implied by those for the numerator and denominator.

In the context of random coefficient models, Meijer and Rouwendal (2006) discuss the properties of the WTP ratio for a number of different distributions for ϕ and γ. Only in special cases (e.g., a log-normal distribution for both coefficients) does the ratio of coefficients follow the same distribution as the coefficients. This implies that, generally, the distributional form of the prior used with the likelihood in (5) will differ from that implied by the prior for ϕ and γ. Thus, unlike ML estimates of WTP from the homogeneous model, combining the likelihoods in (3) and (5) with priors for the respective coefficients will generally result in distinct posterior WTP distributions and distinct characterizations of demand as a function of price. What discrepancy can we expect from these two approaches? To the extent that the data overwhelm the prior, the posterior WTP distributions from the two approaches will converge despite the differences in the prior. This will generally happen for models that impose homogeneity on the coefficients. The more interesting case occurs with hierarchical models.

In hierarchical models, we typically encounter many units (e.g., consumers) and relatively few observations per unit. Thus, the full conditional likelihood of any one consumer is informed by a limited amount of data and the prior distribution will generally have much more influence on the posterior compared with a homogeneous model. A hierarchical model may build on either parameterization, using either of the likelihood functions in Eqs. (3) or (5) as the full conditional likelihood. The likelihood in Eq. (3) in combination with a (hierarchical) prior for $\gamma_i$ that has positive density arbitrarily close to zero readily accommodates respondents that do not appear to be sensitive to price. Such respondents can in turn have a tremendous influence on the posterior WTP distribution implied by the model and thus on any characterization of demand as a function of price. Hierarchical models with likelihood functions built on Eq. (5) measure WTP directly by $\beta_i$. An advantage of this formulation is that a hierarchical prior for WTP can be specified directly. For example, a normal prior for $\beta_i$ will place less mass on absolutely large WTP values.
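The contrast between an implied prior and a directly specified prior can be made concrete by simulation. The sketch below is illustrative only: it draws a normal attribute coefficient and a log-normal price coefficient (the location and variance values loosely follow the D1 row of Table 1 and are otherwise assumptions), forms the implied WTP ratio, and compares its tails with a normal prior placed directly on WTP. The helper `tail_share` and the cutoff of 10 are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Utility parameterization: normal phi, log-normal gamma (illustrative values).
phi = rng.normal(loc=1.0, scale=1.0, size=n)
log_gamma = rng.normal(loc=-1.0, scale=np.sqrt(2.0), size=n)
wtp_ratio = phi / np.exp(log_gamma)          # draws from the implied prior on WTP

# Surplus parameterization: normal prior specified directly on WTP.
wtp_direct = rng.normal(loc=1.0, scale=1.0, size=n)

def tail_share(w, cutoff=10.0):
    """Share of draws whose absolute value exceeds the cutoff."""
    return np.mean(np.abs(w) > cutoff)

for name, w in [("ratio", wtp_ratio), ("direct", wtp_direct)]:
    q = np.percentile(w, [1, 50, 99])
    print(name, "1/50/99 percentiles:", q.round(2), "tail share:", tail_share(w))
```

Under these assumed settings the ratio draws put a visible share of mass on absolutely large values, while the direct normal prior puts essentially none there, which is the mechanism the text describes.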

The problems we outline with the WTP ratio are unique neither to choice models nor to WTP. They apply to any quantity that can be defined as a ratio of model parameters. However, estimation of WTP (and the related concept of reservation price) is a particularly relevant problem in marketing and economics. Recently, the marketing literature has sharpened its focus on the study of WTP and reservation prices because of the direct implications for pricing strategy (Jedidi and Zhang 2002; Jedidi et al. 2003; Shaffer and Zhang 1995, 2000). The economics literature has recognized the potential problems with random coefficient ratio estimates of WTP (Meijer and Rouwendal 2006; Revelt and Train 1998). Marketing practitioners have also recognized the problems, advocating use of the median as a summary of the posterior WTP distribution (Orme 2001). While the median will likely be a more robust statistic, Bayesian decision theoretic analyses of demand as a function of price aimed at identifying optimal actions rely on the entire posterior distribution of WTP. To the extent that the posterior distribution of WTP is sensitive to the prior assumptions, so is the optimal action.

We adopt a fully Bayesian approach to inference in this paper. However, the points made apply in the context of hierarchical models independent of the estimation technique.

Another motivation for directly parameterizing the model in WTP terms is that the researcher often has variables such as demographics that may be useful in the characterization of consumer heterogeneity. In such instances it seems more reasonable to build hierarchical regression structures for WTP instead of parameters with less clear interpretation. We do not explore this issue here.

2.3 Optimal pricing

Ignoring the implied prior on WTP can adversely impact demand and price analyses. Firms often use the model coefficients estimated from CBC data to build market share simulators, which are useful for assessing response to price changes and optimal pricing. Given a set of non-price attributes, market share (and demand) can be completely characterized by consumer surplus. Consumers choose the inside alternative that yields the maximum surplus and forgo a category purchase if the surplus from the best alternative is less than the surplus generated by the outside alternative. The price that determines the incidence and choice decisions is the reservation price, $\tilde{p}_{ijt}$, which induces indifference between buying alternative j and forgoing a category purchase. For the surplus model, $\tilde{p}_{ijt} = x'_{ijt}\beta_i + \eta_{ijt} - \eta_{i0t}$. Importantly, any proper indirect utility function implies a function for $\tilde{p}_{ijt}$. In our case, the reservation price from the utility model is $\tilde{p}_{ijt} = (x'_{ijt}\phi_i + \varepsilon_{ijt} - \varepsilon_{i0t})/\gamma_i$. From this equation, we can see that the change in the reservation price given a change in an attribute is given by the WTP for that attribute.

If the no-buy option is not included in the CBC experiments, the reservation price is not identified. In this case, what we can identify is the equalization price $\hat{p}_{ijt}$, which is the price for good j that equalizes the surplus generated by goods j and j′. For the surplus model, $\hat{p}_{ijt} = (x_{ijt} - x_{ij't})'\beta_i + p_{ij't} + \eta_{ijt} - \eta_{ij't}$. Again, any proper indirect utility function implies a function for $\hat{p}_{ijt}$. In our case, $\hat{p}_{ijt} = \left[(x_{ijt} - x_{ij't})'\phi_i + \gamma_i p_{ij't} + \varepsilon_{ijt} - \varepsilon_{ij't}\right]/\gamma_i$.

Consider now using the utility or surplus models to find the profit maximizing

price for firm j, taking the competing firms’ prices as given. To the extent that the

utility model a priori puts greater mass on extreme WTP values, the posterior

distribution of reservation and equalization prices will also be thick tailed. This is

especially so in sparse data environments, and implies that the firm could continue to

raise prices and still find consumers willing to purchase. Thus, inference about the

profit maximizing prices based on the posterior distributions of the parameters will

depend on the model.
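The dependence of the profit maximizing price on the heterogeneity distribution can be sketched with a one-product example. This is an illustration under stated assumptions, not the paper's analysis: a single inside good competes with an outside good, the marginal cost, the price grid, and both population distributions are hypothetical, and the populations are prior-style draws rather than posterior estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
cost = 0.5                                   # hypothetical marginal cost
grid = np.linspace(1.0, 100.0, 397)          # candidate prices (step 0.25)

def logistic(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def best_price(share_fn):
    """Grid search for the price maximizing (price - cost) * expected share."""
    profit = np.array([(p - cost) * share_fn(p) for p in grid])
    return grid[profit.argmax()]

# Utility model population: price coefficient has mass near zero.
phi = rng.normal(1.0, 1.0, n)
gamma = np.exp(rng.normal(-1.0, np.sqrt(2.0), n))
share_utility = lambda p: logistic(phi - gamma * p).mean()

# Surplus model population: normal WTP, log-normal scale.
beta = rng.normal(2.7, 1.0, n)
mu = np.exp(rng.normal(1.0, np.sqrt(0.1), n))
share_surplus = lambda p: logistic((beta - p) / mu).mean()

p_utility = best_price(share_utility)
p_surplus = best_price(share_surplus)
print("utility-model price:", p_utility, "surplus-model price:", p_surplus)
```

In this setup the consumers with γ close to zero keep buying at almost any price, so expected profit under the utility population peaks at a much higher price than under the surplus population.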

3 A simulation study

We more closely investigate the properties of the two approaches to WTP estimation in the following simulation study. We generate four data sets, two each from the utility and surplus models, which we will refer to as D1, D2, D3, and D4. For the utility model data sets, D1 and D2, we assume the following population distribution, $\Phi_i \sim N(\bar{\Phi}, \Sigma_\Phi)$, where $\Phi_i = (\phi_i', \log(\gamma_i))'$. For both D1 and D2, the covariance matrix $\Sigma_\Phi$ is assumed to be diagonal and we choose parameters such that the distribution of $\gamma_i$ is centered near 1. For D1, we allow for some mass of the distribution of $\gamma_i$ to be near zero by choosing a large value for the variance of $\log(\gamma_i)$. For D2, we choose the variance of $\log(\gamma_i)$ such that $\gamma_i$ is tightly distributed around one, with little to no mass near zero. In the former case, some individual-level WTPs will be extremely large as $\gamma_i \to 0$, while in the latter, the distribution of WTP should be closer to normal. For the data sets generated by the surplus model, D3 and D4, we assume $\theta_i \sim N(\bar{\theta}, \Sigma_\theta)$, where $\theta_i = (\beta_i', \log(\mu_i))'$. In this case, the distribution of WTP is specified directly. For D3, we choose parameters such that $\mu_i$ is, on average, larger and the deterministic component of surplus has relatively lower explanatory power. For D4, we choose parameters such that $\mu_i$ is, on average, smaller, translating into more extreme choice probabilities.

For all models, we assume 300 individuals choosing amongst three alternatives and an outside good on each of 15 choice occasions. The covariates include three alternative specific constants, a discrete attribute with four levels, and a price. Each alternative is created by randomly choosing a level of the discrete attribute and a price from the range [1.5, 2.5] (in increments of 0.1). Tables 1 and 2 contain the parameters of the distributions used to generate the four data sets. We retain the last choice of each simulated respondent to create a holdout sample. Using MCMC methods, we estimate the utility and surplus models on each of the four data sets, for a total of eight sets of results. The details of the sampler have been reported elsewhere (e.g., Allenby and Lenk 1994; Arora et al. 1998; Train 2003). We use a normal-inverted Wishart hyper-prior structure for the population distribution parameters $\bar{\theta}$ and $\Sigma_\theta$. The prior on $\bar{\theta}$ is set to $N(0_{(K)}, 10^6 I_{(K \times K)})$. The prior on $\Sigma_\theta$ is set to $IW(K+1, I_{(K \times K)})$. These are proper but diffuse priors. We use identical priors for $\bar{\Phi}$ and $\Sigma_\Phi$ in the utility model.
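The data generating design described above can be sketched as follows for a D1-style data set. This is a hedged illustration rather than the authors' simulation code: the means and variances come from Table 1, but the assignment of the first three coefficients to the alternative specific constants, the dummy coding with the first attribute level as baseline, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, J = 300, 15, 3                     # respondents, occasions, inside alternatives

# Population means from Table 1 (D1); all variances of phi are 1.
phi_mean = np.array([-0.5, 1.0, 1.5, 0.5, 0.75, 1.0])
phi = rng.normal(phi_mean, 1.0, size=(N, 6))
gamma = np.exp(rng.normal(-1.0, np.sqrt(2.0), size=N))   # log(gamma) ~ N(-1, 2)

# Design: 3 alternative specific constants + 3 dummies for a 4-level attribute.
asc = np.broadcast_to(np.eye(J), (N, T, J, J))
level = rng.integers(0, 4, size=(N, T, J))               # level 0 is the baseline
dummies = (level[..., None] == np.arange(1, 4)).astype(float)
x = np.concatenate([asc, dummies], axis=-1)              # shape (N, T, J, 6)
price = rng.integers(15, 26, size=(N, T, J)) / 10.0      # 1.5, 1.6, ..., 2.5

# Utilities with type I extreme value errors; index 0 is the outside good.
v = np.einsum("ntjk,nk->ntj", x, phi) - gamma[:, None, None] * price
v = np.concatenate([np.zeros((N, T, 1)), v], axis=-1)
choice = (v + rng.gumbel(size=v.shape)).argmax(axis=-1)

# Hold out the last choice of each respondent.
train, holdout = choice[:, :-1], choice[:, -1]
true_wtp = phi / gamma[:, None]                          # individual-level true WTPs
print(train.shape, holdout.shape, np.abs(true_wtp).max())
```

A surplus-model data set (D3 or D4) would follow the same pattern with β drawn directly and the deterministic surplus divided by an individual scale μ, using the Table 2 values.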

Table 1 Data generating parameters, utility model data sets

Data set     D1                  D2
             Mean    Variance    Mean    Variance
ϕ1           −0.5    1           −0.5    1
ϕ2           1       1           1       1
ϕ3           1.5     1           1.5     1
ϕ4           0.5     1           0.5     1
ϕ5           0.75    1           0.75    1
ϕ6           1       1           1       1
log(γ)       −1      2           0       0.2

Table 2 Data generating parameters, surplus model data sets

Data set     D3                  D4
             Mean    Variance    Mean    Variance
β1           −0.5    1           −0.5    1
β2           1       1           1       1
β3           1.5     1           1.5     1
β4           0.5     1           0.5     1
β5           0.75    1           0.75    1
β6           1       1           1       1
log(μ)       1       0.1         −1      0.5

For the utility models, we compute the individual-level WTPs as $\phi_i/\gamma_i$ on each iteration of the sampler. For the surplus models, draws of the individual-level WTPs are directly available. We compute the mean absolute error (MAE) and the root mean-squared error (RMSE) between the true and estimated WTPs on each iteration of the sampler and report the means over iterations. Using the harmonic mean estimator (Newton and Raftery 1994), we compute the log marginal density (LMD) statistic for each model. We also report the deviance information criterion (DIC) (Spiegelhalter et al. 2002) and the log predictive density (LPD) of the holdout data.
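The error statistics and the harmonic mean estimator described above can be sketched as post-processing steps on stored MCMC draws. The array layout (draws by respondents by attributes) and both function names are assumptions made for this illustration; the harmonic mean estimator is computed on the log scale for numerical stability.

```python
import numpy as np

def wtp_error_stats(phi_draws, gamma_draws, true_wtp):
    """RMSE and MAE of WTP, computed per draw and averaged over draws.

    phi_draws: (R, N, K) attribute coefficient draws
    gamma_draws: (R, N) price coefficient draws
    true_wtp: (N, K) true individual-level WTPs
    """
    wtp = phi_draws / gamma_draws[..., None]       # ratio on every draw
    err = wtp - true_wtp                           # broadcasts over draws
    rmse = np.sqrt((err ** 2).mean(axis=1))        # over respondents, shape (R, K)
    mae = np.abs(err).mean(axis=1)
    return rmse.mean(axis=0), mae.mean(axis=0)     # mean over iterations

def log_marginal_density_hm(loglik_draws):
    """Harmonic mean estimate of the log marginal density from log-likelihood draws."""
    neg = -np.asarray(loglik_draws, dtype=float)
    m = neg.max()                                  # log-sum-exp shift
    return -(m + np.log(np.mean(np.exp(neg - m))))

# Tiny illustration with made-up draws: 2 draws, 1 respondent, 2 attributes.
phi_draws = np.array([[[2.0, 2.0]], [[2.0, 2.0]]])
gamma_draws = np.full((2, 1), 0.5)
true_wtp = np.array([[2.0, 4.0]])
rmse, mae = wtp_error_stats(phi_draws, gamma_draws, true_wtp)
print(rmse, mae, log_marginal_density_hm([-10.0, -10.0]))
```

For the surplus model the same error function applies with the β draws used directly in place of the ratio.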

Tables 3, 4 and 5 present the results of our simulation study. D1 and D2 are generated according to the heterogeneous utility model. D1 contains individuals with price coefficients near zero and thus extremely large WTPs. Relative to the other conditions, the error statistics are quite high in this setting. As evidenced by its smaller RMSE and MAE, the surplus model recovers the true WTPs more accurately, even though the utility model is consistent with the data generating process. In terms of fit statistics, the LMD, DIC and LPD all favor the utility model. In D2, the distribution of the price coefficient has most of its mass away from zero. Again, the surplus model has lower RMSE and MAE. The LMD and LPD favor the utility model, while the DIC favors the surplus model. Thus, even when the true WTPs are a ratio of random coefficients, the surplus model more accurately recovers the true WTPs compared with the utility model under a range of population distribution parameters.

Table 3 Root mean squared error

Data set           D1                  D2                  D3                  D4
Data generation    Utility             Utility             Surplus             Surplus
Model              Utility   Surplus   Utility   Surplus   Utility   Surplus   Utility   Surplus
WTP 1              48.54     45.68     2.43      1.55      26.06     1.60      1.52      1.19
WTP 2              67.61     64.82     1.53      1.30      16.98     1.36      1.04      0.85
WTP 3              91.83     91.25     1.42      1.19      16.31     1.36      1.06      0.72
WTP 4              26.78     24.82     1.74      1.28      16.90     1.60      1.33      0.95
WTP 5              41.58     38.96     1.68      1.33      16.97     1.46      1.37      0.95
WTP 6              33.42     29.87     1.65      1.27      19.16     1.73      1.13      0.83
Average RMSE       51.63     49.23     1.74      1.32      18.73     1.52      1.24      0.92

Table 4 Mean absolute error

Data set           D1                  D2                  D3                  D4
Data generation    Utility             Utility             Surplus             Surplus
Model              Utility   Surplus   Utility   Surplus   Utility   Surplus   Utility   Surplus
WTP 1              11.88     10.25     1.52      1.21      14.64     1.29      1.10      0.94
WTP 2              14.85     12.32     1.05      1.01      9.15      1.08      0.75      0.66
WTP 3              18.01     15.92     0.97      0.89      8.44      1.08      0.73      0.56
WTP 4              8.58      7.10      1.17      1.00      9.05      1.27      0.94      0.74
WTP 5              10.55     8.64      1.16      1.03      9.14      1.16      0.97      0.75
WTP 6              10.91     8.95      1.13      0.95      10.49     1.38      0.82      0.66
Average MAE        12.46     10.53     1.17      1.01      10.15     1.21      0.89      0.72

D3 and D4 are generated with the heterogeneous surplus model. In D3, the true scale parameter $\mu_i$ is, on average, larger. In this setting, the utility model estimates of WTP are particularly error-prone. Once more, the surplus model is better at recovering the true WTP parameters. Interestingly, the LMD statistic favors the utility model, despite its poor recovery of the true WTPs. The DIC and LPD favor the surplus model. In D4, the scale parameter is, on average, smaller. Relative to D3, the utility model does a better job of recovering the WTPs here, but again, the surplus model has more accurate WTP recovery. All three of the fit statistics favor the surplus model.

In summary, the surplus models always recover the true WTPs with more

accuracy, regardless of the data generating mechanism. Even when the true WTPs

are distributed as a ratio of random coefficients, the ratio estimator does not recover

the true WTPs as accurately as simply directly specifying a prior on WTP. We

attribute this to the fact that the surplus model employs a more sensible prior

distribution for WTP.

4 Two CBC studies

Using CBC data sets provided to us by firms in the camera and automotive

categories, we replicate the findings from our simulation study in the sense that the

posterior of WTP from the utility model is rather different from the posterior

obtained from the surplus model. Moreover, inferences obtained with the utility

model lack face validity. Table 6 presents the attributes and levels involved in the

design of each study.

4.1 Data and models

The first data set is CBC data on midsize sedans. The data were provided by a major automotive manufacturer. Respondents qualified for participation in the study on the basis of the vehicle they currently own, their intention to purchase a midsize sedan, and other socio-economic information. A total of 333 respondents participated in the study. Each respondent completed 15 choice tasks, with each task consisting of three sedans. The no-buy option was not included in this study. The second data set is CBC data on cameras. The study was conducted by the Eastman Kodak Company to assess the market for a new camera format, the Advanced Photo System (APS). A detailed description of the data is given by Gilbride and Allenby (2004). A total of 302 respondents participated in the study. Each respondent completed 14 choice tasks, with each task consisting of three 35 mm cameras, three APS cameras, and a no-buy option. Some attributes were available only on the APS camera, and price was nested within camera type.

Table 5 Model fit statistics

Data set           D1                    D2                    D3                    D4
Data generation    Utility               Utility               Surplus               Surplus
Model              Utility    Surplus    Utility    Surplus    Utility    Surplus    Utility    Surplus
LMD                −3532.20   −3553.00   −3942.80   −3946.30   −5309.60   −5428.90   −2493.40   −2340.50
DIC                3859.20    3871.10    4324.80    4318.30    5570.70    5497.90    2913.60    2751.50
LPD                −287.47    −296.64    −322.79    −331.15    −410.41    −403.82    −236.34    −231.89

This is similar to previous simulation studies in the literature that find the harmonic mean estimator of the LMD sometimes favors models with relatively poor parameter recovery (Andrews et al. 2002; Liechty et al. 2005).

Table 6 Attributes and levels

Camera data

Attribute                              Levels
Body Style                             Low; Medium; High
Mid-roll change                        None; Manual; Automatic
Annotation                             None; Pre-set List; Customized List; Custom Input Method 1; Custom Input Method 2; Custom Input Method 3
Camera operation feedback              No; Yes
Zoom                                   None; 2X
Viewfinder                             Regular; Large
Camera settings feedback               None; LCD; Viewfinder; LCD and Viewfinder
Price (nested within camera type)      from $41 to $499

Sedan data

Attribute                                  Levels
Make/Model                                 Ford Taurus; Toyota Camry; Nissan Maxima; Honda Accord; VW Passat
Engine                                     4 cylinder, 1.8 L, 150 HP; 4 cylinder, 2.4 L, 160 HP; 6 cylinder, 3.0 L, 155 HP; 6 cylinder, 3.0 L, 222 HP
Audio and navigation                       Standard Audio; Premium Audio; Premium Audio with Navigation
Antilock Brakes (ABS)                      No; Yes
Side Door/Window Curtain Airbags (CAB)     No; Yes
Vehicle Skid Control (VSC)                 No; Yes
Price                                      $17,400; $18,900; $20,400; $21,900; $23,400; $24,900; $26,400

feature only available on APS

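For both data sets, we model consumer i’s surplus for alternative j at choice occasion t as a linear function of non-price attributes, attribute WTPs, and price:

$$C_{ijt} = x'_{ijt}\beta_i - p_{ijt} + \varepsilon_{ijt}, \qquad \varepsilon_{ijt} \sim EV(0, \mu_i), \qquad \theta_i \sim N(\bar{\theta}, \Sigma_\theta), \qquad \theta_i = (\beta_i', \log(\mu_i))'. \tag{6}$$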

For the camera data, we set the deterministic component of the surplus for the no-buy option to zero. For identification, the lowest level of each attribute is dropped (with the exception of the body type attribute since the baseline is the “no-buy” option). We use the negative of price (in $100s) in the likelihood. The coding scheme is the same as that employed by Gilbride and Allenby (2004), and results in a total of K=18 parameters. For the sedan data, the make/model “VW Passat” is dropped, as is the lowest level of each of the remaining non-price attributes. This results in a total of K=13 parameters. We use the negative of price (in $1,000s) in the likelihood.

For both data sets, we compare estimates of the distribution of WTP from the surplus model with that of the linear utility model, where $\theta_i$ is replaced with $\Phi_i = (\phi_i', \log(\gamma_i))'$. Here, the choice probabilities are based on (3). The same normal-inverted Wishart hyper-prior structure used for $\bar{\theta}$ and $\Sigma_\theta$ is used for hyper-priors on the population parameters $\bar{\Phi}$ and $\Sigma_\Phi$. The linear utility model requires that we calculate the WTP from the model parameters using the ratio transformation. On each iteration of the sampler, we compute the ratio $\phi_i/\gamma_i$ using the draws of the individual-level parameters. We then compute the mean, median, and standard deviation over individuals, and report the mean of these quantities over iterations of the sampler. For both data sets, the samplers are run for 20,000 iterations. We keep the last 5,000 iterations for posterior inference. Parameter estimates are calculated with T−1 choice tasks. We keep the last task for each individual to assess holdout performance via LPD. To assess in-sample performance, we compute the LMD and DIC statistics for each model.
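The summary procedure in the preceding paragraph (compute the ratio on each draw, summarize over individuals, then average over the retained iterations) might be sketched as below. The function name, the array layout, and the use of the sample standard deviation are assumptions for illustration; the posterior standard deviation of each statistic is returned alongside its posterior mean.

```python
import numpy as np

def wtp_population_summaries(phi_draws, gamma_draws, keep=5000):
    """Posterior mean and SD of the cross-sectional mean, median, and SD of WTP.

    phi_draws: (R, N, K) draws of attribute coefficients
    gamma_draws: (R, N) draws of price coefficients
    keep: number of final iterations retained for inference
    """
    wtp = phi_draws[-keep:] / gamma_draws[-keep:, :, None]   # (R_kept, N, K)
    per_draw = {
        "mean": wtp.mean(axis=1),                # over individuals, per draw
        "median": np.median(wtp, axis=1),
        "sd": wtp.std(axis=1, ddof=1),
    }
    # Average each statistic over draws; its spread over draws is the posterior SD.
    return {name: (s.mean(axis=0), s.std(axis=0, ddof=1))
            for name, s in per_draw.items()}

# Made-up draws: 10 iterations, 4 respondents, 2 attributes, constant WTP of 2.
phi_draws = np.ones((10, 4, 2))
gamma_draws = np.full((10, 4), 0.5)
out = wtp_population_summaries(phi_draws, gamma_draws, keep=5)
print(out["mean"][0], out["median"][0], out["sd"][0])
```

Because the median is computed within each draw, a few respondents with γ near zero move it far less than they move the cross-sectional mean or standard deviation.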

4.2 Results

In Tables 7 and 8 we report the mean and standard deviation of the distribution of WTP for the utility and surplus models. Posterior standard deviations of the reported statistics are in parentheses. Table 7 contains the results from the sedan data while Table 8 contains the results from the camera data. The mean and standard deviation of the population distribution of WTP are dramatically affected by the priors. For the sedan data, the means of the utility model estimates are two to three times the magnitude of the surplus model. The WTP distributions are also far more dispersed, with standard deviations that are five to six times larger. For the camera data, the means are also much larger for the utility model. However, most of the standard deviation estimates for the utility model have large posterior standard deviations. For both data sets, the median of the population distribution is much less sensitive to the prior than the mean or standard deviation.

The in-sample fit statistics are somewhat mixed. In both data sets, the LMD

strongly favors the utility model. This result echoes that of the third synthetic data


set, D3, which was generated by the surplus model. In this case, the LMD strongly

favored the utility model despite its inconsistency with the data generating process

and its extremely poor parameter recovery. The DIC favors the utility model in the

sedan data and the surplus model in the camera data. In contrast to the in-sample fit

measures, the LPD favors the surplus model in both the sedan data and the camera

data, indicating the surplus model has superior out-of-sample performance. We will

now examine more closely the distribution of WTP and optimal prices implied by

the two models. From this vantage point, the differences across the two models are

less ambiguous.
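For readers who want to reproduce fit statistics of this kind, the sketch below shows one common way to compute them from sampler output. It assumes the LMD is estimated with the Newton-Raftery harmonic mean estimator, the DIC follows Spiegelhalter et al. (2002), and the LPD is the sum over individuals of the log of the draw-averaged probability of the holdout choice. The function names and input layouts are hypothetical, and the LPD definition in particular is an assumption rather than a statement of the authors' exact formula.

```python
import numpy as np

def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))

def lmd_harmonic_mean(loglik_draws):
    """Harmonic mean (Newton-Raftery) estimate of the log marginal density.

    loglik_draws: in-sample log-likelihood evaluated at each retained draw.
    """
    return -log_mean_exp(-np.asarray(loglik_draws, dtype=float))

def dic(loglik_draws, loglik_at_posterior_mean):
    """DIC = mean deviance + p_D, where p_D = mean deviance - deviance at the mean."""
    mean_deviance = -2.0 * np.mean(loglik_draws)
    p_d = mean_deviance - (-2.0 * loglik_at_posterior_mean)
    return mean_deviance + p_d

def lpd(holdout_prob_draws):
    """Log predictive density of the holdout choices (assumed definition).

    holdout_prob_draws: (n_draws, n_individuals) predicted probabilities of
    the alternative each individual actually chose in the holdout task.
    """
    return float(np.sum(np.log(np.mean(holdout_prob_draws, axis=0))))
```

Larger LMD and LPD values and smaller DIC values indicate better fit. The harmonic mean estimator is driven by the smallest likelihood values among the draws, which is one reason to read it alongside the holdout measure.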

It is evident that the utility and surplus models result in dramatically different

estimates of the distribution of WTP. The utility model estimates seem to be

implausible and not reflective of consumers’ monetary valuation of product

attributes. Figure 1 presents boxplots of the individual-level make/model WTP

estimates for both the utility and surplus models. These are measured relative to the

VW Passat and can be interpreted as equalization prices: the relative price difference

that equalizes the utility of comparably equipped competitive sedans and the Passat.

The median of the utility model’s individual-level WTP estimates for the three

Table 7 WTP estimates for sedan data (standard errors in parentheses)

WTP in $1,000’s. Utility model: \(\phi_i/\gamma_i\); surplus model: \(\beta\).

| Attribute | Level | Utility: Mean | Utility: Median | Utility: Std dev | Surplus: Mean | Surplus: Median | Surplus: Std dev |
|---|---|---|---|---|---|---|---|
| Make/Model | Ford Taurus | −5.02 (2.76) | −2.20 (0.52) | 46.54 (12.80) | −2.23 (0.32) | −2.27 (0.43) | 10.06 (0.29) |
| | Toyota Camry | 16.23 (3.43) | 6.76 (0.58) | 47.87 (15.86) | 7.00 (0.23) | 6.74 (0.36) | 9.73 (0.37) |
| | Nissan Maxima | 12.42 (3.46) | 4.82 (0.52) | 46.88 (15.92) | 5.13 (0.27) | 5.12 (0.42) | 9.35 (0.40) |
| | Honda Accord | 9.25 (3.06) | 3.70 (0.48) | 42.10 (13.68) | 3.96 (0.26) | 3.81 (0.31) | 8.48 (0.25) |
| Engine | 4-cyl; 2.4L; 160HP | 4.88 (1.12) | 1.98 (0.39) | 14.27 (5.50) | 1.82 (0.37) | 1.82 (0.37) | 2.00 (0.25) |
| | 6-cyl; 3.0L; 155HP | 7.89 (1.33) | 3.78 (0.49) | 16.61 (5.65) | 3.67 (0.34) | 3.60 (0.33) | 3.48 (0.36) |
| | 6-cyl; 3.0L; 222HP | 9.89 (1.39) | 5.21 (0.51) | 20.25 (6.85) | 4.87 (0.23) | 4.78 (0.24) | 3.95 (0.38) |
| Audio | Premium Audio | 2.52 (1.02) | 1.10 (0.26) | 9.29 (3.61) | 1.34 (0.21) | 1.31 (0.20) | 1.56 (0.21) |
| | Premium Audio w/Navi | 3.54 (1.12) | 1.57 (0.26) | 10.54 (4.20) | 1.61 (0.24) | 1.62 (0.25) | 2.02 (0.21) |
| Safety features | Antilock Brakes | 4.78 (1.05) | 2.11 (0.26) | 12.23 (4.20) | 2.08 (0.20) | 2.07 (0.20) | 2.21 (0.18) |
| | Side Curtain Airbags | 2.89 (0.88) | 1.22 (0.26) | 9.36 (3.46) | 1.18 (0.19) | 1.17 (0.21) | 1.52 (0.27) |
| | Vehicle Skid Control | 2.83 (0.82) | 1.28 (0.23) | 9.64 (3.59) | 1.29 (0.26) | 1.28 (0.26) | 1.42 (0.24) |

| Fit statistics | Utility model | Surplus model |
|---|---|---|
| LMD | −2220.4 | −2665.8 |
| DIC | 2,966.4 | 3,044.3 |
| LPD | −340.7 | −333.5 |


Japanese make/models are near or in excess of the range of prices shown to

respondents.

This is not caused by just a handful of respondents with estimates of \(\gamma_i\) near zero. The 75th percentiles for the individual-level equalization

price between Toyota Camry vs. VW Passat and Nissan Maxima vs. VW Passat are

approximately $30,428 and $24,684, respectively. The retail price of the Passat is

about $23,000. This implies that a quarter of the respondents would require Passat to

Table 8 WTP estimates for camera data (standard errors in parentheses)

WTP in $100’s. Utility model: \(\phi_i/\gamma_i\); surplus model: \(\beta\).

| Attribute | Level | Utility: Mean | Utility: Median | Utility: Std dev | Surplus: Mean | Surplus: Median | Surplus: Std dev |
|---|---|---|---|---|---|---|---|
| Body style | Low | −25.78 (6.74) | −4.67 (0.82) | 84.40 (54.41) | −5.47 (0.38) | −5.51 (0.43) | 4.90 (0.36) |
| | Medium | −10.15 (4.52) | 0.49 (0.28) | 56.60 (44.88) | 1.07 (0.22) | 1.14 (0.25) | 3.59 (0.38) |
| | High | −7.82 (3.83) | 0.33 (0.27) | 48.09 (36.43) | 0.85 (0.27) | 0.80 (0.31) | 3.78 (0.64) |
| Mid-roll change | Manual | 1.96 (1.55) | 0.44 (0.24) | 14.02 (6.12) | −0.36 (0.50) | −0.36 (0.48) | 2.84 (0.26) |
| | Automatic | 3.93 (1.21) | 0.52 (0.12) | 16.99 (10.99) | 0.80 (0.12) | 0.71 (0.14) | 2.40 (0.14) |
| Annotation | Pre-set list | 1.51 (0.60) | 0.38 (0.13) | 7.32 (3.39) | 0.34 (0.08) | 0.33 (0.09) | 1.01 (0.12) |
| | Customized list | 3.25 (0.76) | 0.98 (0.16) | 9.47 (4.80) | 1.21 (0.11) | 1.19 (0.12) | 1.09 (0.14) |
| | Custom Input Method 1 | 4.10 (1.62) | −0.28 (0.17) | 22.63 (16.02) | −1.44 (0.24) | −1.48 (0.28) | 3.37 (0.24) |
| | Custom Input Method 2 | 4.64 (1.12) | 1.06 (0.16) | 14.71 (8.49) | 1.23 (0.11) | 1.22 (0.13) | 1.73 (0.18) |
| | Custom Input Method 3 | 2.09 (1.35) | −0.78 (0.15) | 19.75 (11.85) | −0.99 (0.31) | −1.03 (0.32) | 2.58 (0.28) |
| Operation feedback | Feedback | 2.65 (0.86) | 0.58 (0.12) | 10.50 (4.85) | 0.85 (0.25) | 0.82 (0.25) | 1.72 (0.10) |
| Zoom | ×2 Zoom | 5.47 (1.52) | 1.95 (0.25) | 14.96 (8.04) | 2.70 (0.18) | 2.70 (0.19) | 2.09 (0.15) |
| | ×4 Zoom | 7.66 (2.12) | 2.50 (0.41) | 20.70 (10.42) | 3.11 (0.24) | 3.13 (0.26) | 3.13 (0.33) |
| Viewfinder | Large viewfinder | 0.40 (0.75) | −0.18 (0.10) | 8.91 (4.23) | −0.47 (0.17) | −0.46 (0.16) | 1.76 (0.22) |
| Settings feedback | LCD | 4.30 (1.19) | 1.12 (0.20) | 13.27 (8.30) | −0.23 (0.16) | −0.23 (0.16) | 1.30 (0.16) |
| | Viewfinder | 4.49 (1.07) | 1.20 (0.19) | 13.42 (7.99) | −0.16 (0.16) | −0.15 (0.16) | 1.59 (0.19) |
| | LCD and Viewfinder | 5.33 (1.24) | 1.51 (0.24) | 14.92 (9.23) | 0.21 (0.17) | 0.21 (0.18) | 1.53 (0.16) |

| Fit statistics | Utility model | Surplus model |
|---|---|---|
| LMD | −3879.7 | −4100.4 |
| DIC | 4,633.6 | 4,620.4 |
| LPD | −438.5 | −434.7 |

Note that the median of the posterior mean of the individual-level estimates will not be the same as the

posterior mean of the median of the population distribution.


[Figure 1: two boxplot panels of individual-level make/model WTP estimates for Taurus, Camry, Maxima, and Accord, in $1,000's. The panel "Make/Model WTP Estimates: Utility Model" has a vertical axis running from -$175 to $175; the panel "Make/Model WTP Estimates: Surplus Model" has a vertical axis running from -$30 to $30.]

Fig. 1 Boxplots of individual-level make/model WTP estimates

Table 9 Attributes and levels for optimal pricing, sedan data

Attribute Levels Attributes and levels for alternatives

1 2 3 4 5

Make/Model Ford Taurus X

Toyota Camry X

Nissan Maxima X

Honda Accord X

VW Passat X

Engine 4 cylinder; 1.8 L; 150 HP X

4 cylinder; 2.4 L; 160 HP X X

6 cylinder; 3.0 L; 155 HP X

6 cylinder; 3.0 L; 222 HP X

Audio and navigation Standard Audio X X X X X

Premium Audio

Premium Audio with Navigation

Antilock brakes No X

Yes X X X X

Side door/Window curtain airbags No

Yes X X X X X

Vehicle skid control No X

Yes X X X X

Price ($1,000) $20.1 $20.7 $24 $20.5 $22.5


have a zero price as well as a cash subsidy to induce indifference with a similarly

equipped Camry or Maxima, which does not seem credible.

In the camera study, the individual-level estimates of WTP implied by the utility

model also seem lacking in face validity. For example, the surplus model estimate

of the median of the individual-level WTP estimates for a 2× zoom lens is $295,

with demand essentially zero at prices exceeding $550. According to the utility

model, the median of the individual-level WTPs is $322. At a price of $550, 32% of

respondents are still in the market. A quarter of respondents have WTP estimates in

excess of $750. Demand does not reach zero until prices exceed $3,000. These

estimates of WTP seem unreasonably high. Furthermore, any analysis of demand

should take into account the uncertainty in the individual-level estimates. We now

turn our attention to such an analysis.
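The demand statements above (the share of respondents still in the market at a given price) amount to an empirical survival curve of the individual-level WTP estimates. A minimal sketch follows, assuming a vector of WTP point estimates in dollars. The function name `share_in_market` and the numbers in `wtp_hat` are hypothetical and are not the study data.

```python
import numpy as np

def share_in_market(wtp, prices):
    """Fraction of respondents whose WTP is at least each price."""
    wtp = np.asarray(wtp, dtype=float)
    prices = np.atleast_1d(np.asarray(prices, dtype=float))
    # Compare every price (rows) against every respondent (columns).
    return (wtp[None, :] >= prices[:, None]).mean(axis=1)

# Hypothetical individual-level WTP point estimates in dollars.
wtp_hat = np.array([120.0, 250.0, 300.0, 340.0, 600.0, 900.0, 2500.0, 3100.0])
demand = share_in_market(wtp_hat, [295, 550, 750, 3000])
```

Because this works from point estimates, it ignores the posterior uncertainty in each respondent's WTP. Applying the same function to every retained draw and averaging the resulting curves is one way to carry that uncertainty through.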

Table 10 Attributes and levels for optimal pricing, camera data

Attributes and levels for alternatives

Attribute Levels 1 2 3

Body style Low X

Medium X

High X

Mid-roll change None

Manual X

Automatic X X

Annotation None

Pre-Set List X

Customized List

Custom Input Method 1 X

Custom Input Method 2 X

Custom Input Method 3

Camera operation feedback No X

Yes X X

Zoom None

2× X

4× X X

Viewfinder Regular X X

Large X

Camera settings feedback None

LCD

Viewfinder

LCD & Viewfinder X X X

Price (nested within camera type) from $41 to $499 $100 $225 $400

Table 11 Market shares, sedan scenario

| | Taurus | Camry | Maxima | Accord | Passat |
|---|---|---|---|---|---|
| Price ($1,000) | $20.1 | $20.7 | $24 | $20.5 | $22.5 |
| Utility (%) | 16 | 36 | 19 | 21 | 8 |
| Surplus (%) | 16 | 36 | 18 | 22 | 9 |


4.3 An optimal pricing exercise

For the utility model, profits from alternative \(j\) in scenario \(z\) can be written as

\[
\pi_j^{uz}\left(\Phi_i, x_j^z, p_j\right) = P_j^{uz}\left(\Phi_i, x_j^z, p_j\right)\left(p_j - c_j\right). \tag{7}
\]

We seek the price \(p_j^{uz*}\) that maximizes the firm’s expected profit, \(E_\Phi\left[\pi_j^{uz}\left(\Phi_i, x_j^z, p_j\right)\right]\). The expected profit in scenario \(z\) is easily calculated with the output of the Gibbs sampler. For a given price, we simply average the profits calculated over the draws of \(\Phi_i\). Using routine optimization procedures, it is straightforward to find the optimal price. For the surplus model, profits from alternative \(j\) in scenario \(z\) can be written as

\[
\pi_j^{sz}\left(\theta_i, x_j^z, p_j\right) = P_j^{sz}\left(\theta_i, x_j^z, p_j\right)\left(p_j - c_j\right). \tag{8}
\]

As with the utility model, we seek the price \(p_j^{sz*}\) that maximizes the firm’s expected profit, \(E_\theta\left[\pi_j^{sz}\left(\theta_i, x_j^z, p_j\right)\right]\).

Our goal is to compare \(p_j^{uz*}\) and \(p_j^{sz*}\). Tables 9 and 10 present the attributes and

levels used to construct the competitive scenarios for our pricing exercise. In the

sedan data, we consider a competitive set consisting of five sedans. In the camera

data, we consider a competitive set consisting of three cameras. Tables 11 and 12

present the prices and market shares for each alternative for the sedan and camera

scenarios. On each iteration of the sampler, we compute P

and report the mean over

iterations. For the sedan data, the two models predict practically the same shares. For

the camera data, there is some disagreement, with the utility model predicting higher

shares for Cameras 1 and 2 and lower shares for Camera 3 and the No-Buy alternative.
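The expected-profit calculation in Eq. (7) can be sketched as below. This is an illustration under assumptions, not the authors' implementation: it assumes multinomial logit probabilities with an outside (no-buy) option normalized to zero, utility of the form attribute utility minus price coefficient times price, and a simple grid search in place of a numerical optimizer. All function names, array names, and the synthetic draws are hypothetical.

```python
import numpy as np

def choice_probs(attr_util, gamma, prices):
    """Multinomial logit probabilities with an outside option fixed at zero.

    attr_util: (n_draws, n_alts) non-price part of utility for each draw
    gamma:     (n_draws,) positive price coefficients
    prices:    (n_alts,) prices
    """
    v = attr_util - gamma[:, None] * prices[None, :]
    v = np.column_stack([v, np.zeros(len(v))])      # append the no-buy option
    v -= v.max(axis=1, keepdims=True)               # guard against overflow
    expv = np.exp(v)
    return (expv / expv.sum(axis=1, keepdims=True))[:, :-1]

def expected_profit(price_j, j, attr_util, gamma, prices, cost_j):
    """Average of P_j * (p_j - c_j) over the draws."""
    p = np.array(prices, dtype=float)
    p[j] = price_j
    share_j = np.mean(choice_probs(attr_util, gamma, p)[:, j])
    return float(share_j * (price_j - cost_j))

def optimal_price(j, attr_util, gamma, prices, cost_j, grid):
    """Grid search for the price of alternative j that maximizes expected profit."""
    profits = np.array(
        [expected_profit(g, j, attr_util, gamma, prices, cost_j) for g in grid]
    )
    best = int(np.argmax(profits))
    return float(grid[best]), float(profits[best])

# Synthetic draws standing in for sampler output (assumed values).
rng = np.random.default_rng(1)
n_draws = 2000
attr_util = rng.normal(3.0, 1.0, size=(n_draws, 3))
gamma = np.exp(rng.normal(-0.5, 0.5, size=n_draws))
prices = [2.0, 2.5, 4.0]
grid = np.linspace(0.7, 20.0, 387)
p_star, profit_star = optimal_price(2, attr_util, gamma, prices, cost_j=0.7, grid=grid)
```

Competitor prices are held fixed while the price of alternative `j` varies, matching the setup of the exercise. The same routine applies to the surplus model once `choice_probs` is replaced with that model's probabilities.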

For the sedan data, we will find the optimal price for Ford Taurus, assuming the

competitive vehicle prices remain at their current levels. For the camera data, we will

Table 12 Market shares, camera scenario

| | Camera 1 | Camera 2 | Camera 3 | No Buy |
|---|---|---|---|---|
| Price | $100 | $225 | $400 | $0 |
| Utility (%) | 17 | 35 | 32 | 16 |
| Surplus (%) | 14 | 28 | 35 | 23 |

Table 13 Taurus optimal price, sedan scenario

| Model | | Taurus* | Camry | Maxima | Accord | Passat |
|---|---|---|---|---|---|---|
| Utility model | Price ($1,000) | $33.2 | $20.7 | $24 | $20.5 | $22.5 |
| | Share (%) | 3 | 41 | 22 | 25 | 9 |
| Surplus model | Price ($1,000) | $25.8 | $20.7 | $24 | $20.5 | $22.5 |
| | Share (%) | 6 | 40 | 20 | 24 | 10 |

* denotes optimized product


find the optimal price for Camera 3, assuming the competitive camera prices remain at

their current levels. To conduct the exercise, we need to make some assumptions on

costs. For simplicity, we assume the sedans are all built at a variable cost of $18,000.

For the cameras, we assume variable costs of $50, $60, and $70 for Cameras 1, 2, and

3, respectively. Similar results were obtained using other sedans and cameras in the

competitive scenarios, as well as other cost assumptions.

Tables 13 and 14 present the findings from the optimal pricing exercise. We

present the optimal price for Ford Taurus and Camera 3 along with the new market

shares. For the sedan data, using the utility model coefficients in the optimization

results in an optimal price for Taurus of $33,200. At this price, the largest relative

price difference is $12,500, observed between Taurus and Camry. The largest

relative price difference shown in the experiments is $9,000. The prior implied by

the utility model supports excessive equalization prices, leading to optimized prices

beyond the empirical range of prices in the data. In contrast, optimization based on

the surplus model leads to an optimal price for Taurus of $25,800. The largest

relative price difference is well within the range of experimental prices. We obtain

similar results from the camera data. Using the utility model, the optimal price for

Camera 3 is over $1,500. The maximum price shown to respondents in the study

was $499. For the camera data, using the surplus model results in an optimal price of

about $520. While this is slightly in excess of the maximum price, it is much more

reasonable.

5 Summary and conclusions

Researchers in marketing and economics have recognized the problems associated

with using random coefficient choice models derived from linear indirect utility

functions to estimate WTP for product attributes. In this setting, WTP is estimated

via the ratio of attribute and price coefficients. We illustrate that the prior implied for

WTP by seemingly reasonable priors for the attribute and price coefficients results in

posterior WTP distributions with extremely fat tails. This also affects the model’s

characterization of demand, which has implications for pricing analyses. A number of

ad-hoc solutions have been proposed, including constraining the price coefficient to

be homogeneous, or using the median as a measure of central tendency of the WTP

Table 14 Camera 3 optimal price, camera scenario

| Model | | Camera 1 | Camera 2 | Camera 3* | No Buy |
|---|---|---|---|---|---|
| Utility model | Price | $100 | $225 | $1,525 | $0 |
| | Share (%) | 19 | 35 | 10 | 21 |
| Surplus model | Price | $100 | $200 | $522 | $0 |
| | Share (%) | 15 | 33 | 27 | 25 |

* denotes optimized product


distribution. In this paper, we present a straightforward solution to the problems

caused by the implied prior for WTP. Parameterizing the choice model in the space

of consumer surplus allows for direct specification of a prior distribution for WTP.

Such a direct specification is especially advantageous in the context of hierarchical

models where the aforementioned solutions conflict with the purpose and value of

quantifying consumer heterogeneity.

Using both simulated data and CBC data sets from the automotive and camera

categories, we document the influence of the implied prior for WTP. Commonly

employed diffuse priors for the attribute and price coefficients put too much prior

mass on extreme WTP values to render reasonable posterior WTP distributions in

small sample settings. Some posterior summaries are less sensitive to the assumed

prior (e.g., the median versus the mean). However, marketing actions, such as setting profit-maximizing prices, depend on the entire posterior distribution of WTP and thus will

be sensitive to the implied prior. In the surplus parameterization a hierarchical prior

for WTP can be directly specified. We found a hierarchical normal prior to be useful

in controlling the tails of the WTP distribution. The relatively thinner tails of the

normal result in more reasonable estimates of the WTP distribution and, in turn,

profit-maximizing prices.
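The contrast between the implied and the direct prior can be illustrated with a small simulation. The sketch below is not taken from the paper: the prior settings (a N(1, 1) attribute coefficient, a log-normal price coefficient with log-scale mean 0 and standard deviation 1, and a N(1, 1) prior placed directly on WTP) are assumed values chosen only to show the mechanism.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Assumed illustrative priors, not the paper's hyper-priors.
phi = rng.normal(1.0, 1.0, size=n)             # attribute coefficient
gamma = np.exp(rng.normal(0.0, 1.0, size=n))   # positive price coefficient
wtp_ratio = phi / gamma                        # implied prior on WTP (utility model)

wtp_direct = rng.normal(1.0, 1.0, size=n)      # normal prior placed directly on WTP

def tail_summary(x):
    """Median, mean, sd, and the 1st and 99th percentiles of a sample."""
    q01, q50, q99 = np.percentile(x, [1, 50, 99])
    return {"median": q50, "mean": x.mean(), "std": x.std(), "p01": q01, "p99": q99}

ratio_stats = tail_summary(wtp_ratio)
direct_stats = tail_summary(wtp_direct)
```

Comparing `ratio_stats` with `direct_stats` shows the ratio spreading far more mass into both tails than the normal does. Draws of the price coefficient near zero inflate the ratio, which is the mechanism behind the extreme WTP values discussed above.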

The surplus model results in more reasonable estimates of the distribution of

WTP and profit-maximizing prices as well as superior out-of-sample performance.

However, the in-sample fit statistics across the two parameterizations are ambiguous,

even with simulated data. We leave this issue, specifically the performance of the

Newton-Raftery estimator of the LMD and the DIC statistic as criteria for model

choice, to future research. We acknowledge the existence of data generating

mechanisms that leave respondent WTP for a particular attribute level inestimable.

Among these are non-compensatory processing, price-based quality inferences, or

simply ignoring the price attribute in the conjoint exercise. The utility model

with standard priors will readily accommodate respondents who are, for whatever

reason, insensitive to price in the conjoint task. The modeling question then becomes

one of how and whether to implement prior knowledge about the range of likely

WTP values. We have demonstrated that the surplus model is very effective in terms

of how to implement such prior knowledge because it allows the researcher to put a

prior directly on WTP.

Whether to implement prior knowledge about WTP in conjoint studies, especially

when the data are better fit with arbitrarily large WTP values, touches upon the core

of the inferential problems associated with conjoint experiments in marketing.

Conjoint data are collected with the implicit goal of characterizing market demand.

To the extent that the conjoint likelihood differs from the likelihood that generates

choices in the market place, this generalization calls for the diligent use of prior

knowledge held by the researcher about market behavior. That is, the prior should

preserve certain well-known aspects of the target environment in the posterior and

still be informed by the conjoint likelihood in other respects. We acknowledge that

our argument here is limited to forming prior-predictive distributions given the

conjoint data and other prior knowledge. In the long run, only a better understanding

of the actual data generating mechanism underlying the conjoint data will enable

researchers to develop the necessary procedural modifications to move it closer to

the likelihood that generates choices in the market.


Acknowledgement The authors would like to thank Peter Rossi, JP Dubé, Jordan Louviere, Kenneth

Train, and Greg Allenby for helpful insights. We also thank seminar participants at The Ohio State

University, Duke University, University of Michigan and the University of Chicago for providing useful

comments.

References

Allenby, G., & Lenk, P. (1994). Modeling household purchase behavior with logistic normal regression.

Journal of the American Statistical Association, 89, 1218–1231.

Andrews, R., Ainslie, A., & Currim, I. (2002). An empirical comparison of logit choice models with

discrete vs. continuous representations of heterogeneity. Journal of Marketing Research, 39, 479–487.

Arora, N., Allenby, G. M., & Ginter, J. L. (1998). A disaggregate model of primary and secondary

demand. Marketing Science, 17(1), 29–44.

Cameron, T., & James, M. (1987). Estimating willingness-to-pay from survey data: An alternative pre-test-

market evaluation procedure. Journal of Marketing Research, 24, 389–395.

Edwards, Y., & Allenby, G. (2003). Multivariate analysis of multiple response data. Journal of Marketing

Research, 40, 321–334.

Gilbride, T., & Allenby, G. (2004). A choice model with conjunctive, disjunctive, and compensatory

screening rules. Marketing Science, 23, 391–406.

Jedidi, K., Jagpal, S., & Manchanda, P. (2003). Measuring heterogeneous reservation prices for product

bundles. Marketing Science, 22, 107–130.

Jedidi, K., & Zhang, J. (2002). Augmenting conjoint analysis to estimate consumer reservation prices.

Management Science, 48, 1350–1368.

Liechty, J., Fong, D., & DeSarbo, W. (2005). Dynamic models incorporating individual heterogeneity: Utility evolution in conjoint analysis. Marketing Science, 24, 285–293.

Meijer, E., & Rouwendal, J. (2006). Measuring welfare effects in models with random coefficients. Journal of Applied Econometrics, 21, 227–244.

Newton, M. A., & Raftery, A. E. (1994). Approximate Bayesian inference by the weighted likelihood bootstrap. Journal of the Royal Statistical Society. Series B, 56, 43–48.

Orme, B. (2001). Assessing the monetary value of attribute levels with conjoint: Warnings and suggestions. Sawtooth Solutions Customer Newsletter (Spring), Sequim, WA: Sawtooth Software, Inc.

Revelt, D., & Train, K. (1998). Mixed logit with repeated choices: Households’ choices of appliance efficiency level. Review of Economics and Statistics, 80(4), 647–657.

Rossi, P., Allenby, G., & McCulloch, R. (2005). Bayesian Statistics and Marketing. England: Wiley.

Shaffer, G., & Zhang, J. (1995). Competitive coupon targeting. Marketing Science, 14, 395–416.

Shaffer, G., & Zhang, J. (2000). Pay to switch or pay not to switch: Third degree price discrimination in markets with switching costs. Journal of Economics & Management Strategy, 9, 397–424.

Spiegelhalter, D., Best, N., Carlin, B., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society. Series B, 64, 583–639.

Swait, J., Erdem, T., Louviere, J., & Dubelaar, C. (1993). The equalization price: A measure of consumer-perceived brand equity. International Journal of Research in Marketing, 10, 23–45 (March).

Train, K. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.

Zellner, A. (1978). Estimation of functions of population means and regression coefficients including structural coefficients: A minimum expected loss (MELO) approach. Journal of Econometrics, 8, 127–158.

Heterogeneity distributions of willingness-to-pay 331

A Contingent Valuation Test for Measuring the Construct Validity of Willingness-to-Pay Estimates Derived from Choice Experiments

Article

Dec 2021

A new empirical approach for mitigating exploding implicit prices in mixed multinomial logit models

Article

Full-text available

Jan 2023

Romain Crastes

This paper introduces a new shifted negative log‐normal distribution for the price parameter in mixed multinomial logit models. The new distribution, labeled as the μ‐shifted negative log‐normal distribution, has desirable properties for welfare analysis and in particular a point mass that is further away from zero than the negative log‐normal distribution. This contributes to mitigating the “exploding” implicit prices issue commonly found when the price parameter is specified as negative log‐normal and the model is in preference space. The new distribution is tested on five stated preference datasets. Comparisons are made with standard alternative approaches such as the willingness‐to‐pay (WTP) space approach. It is found that the μ‐shifted distribution yields substantially lower mean marginal WTP estimates compared to the negative log‐normal specification and similar to the values derived from models estimated in WTP‐space with flexible distributions, while at the same time fitting the data as well as the negative log‐normal specification.

The Willingness to Pay for (Environmental) Collective Goods

Thesis

Jan 2022

Niklas Gogoll

Fridays for future, students for future, scientists for future… Environmental activism increased drastically in the last years resulting in a growing number of activists. While some of these activists live with a sustainable ecological footprint, others do not and pollute the environment in an unsustainable manner e.g. by flying frequently. One strand of economic literature interprets this (at first glance contradictory) behavior as an attitude-behavior-gap: Having a high preference should result in a high willingness to pay and therefore in an adaption of one’s own behavior, which is not the case for these activists. Not changing one’s behavior can easily be explained by the free rider problem caused by the marginality of one’s impact though. However, this in turn raises the question, why some people live sustainable, abstain from environment polluting goods hence have a willingness to pay for the environment. We argue that both kinds of behavior can be explained by separating the willingness to pay for public goods. Since collective action is hard to sustain reciprocally and without the intervention of a (public) entity, especially for large public goods, two willingness to pay for a public good have to be considered instead – one for the private and one for the public provision of the public good. Assuming that both types of environmental activists understand, that their own contribution is marginally small, this dissertation argues – first in a theoretical model and then in an empirical application – that the willingness to pay for public goods in the private case is actually only dependent on the preference for other (mainly social) incentives – e.g. to silence one’s conscience or for reputational reasons. The unsustainable type of environmental activist just has a lower willingness to pay for social incentives compared to the sustainable typ. 
Only if the state interferes, the preference for the public good will be considered in the decision-making process of individuals. Consequently, it proposes a different form of measuring the willingness to pay for public goods – the so-called Quasi-Monarch. As a Quasi-Monarch, one individual can hypothetically dictate the contribution of all individuals including herself. In this scenario, no one would have an incentive to not state their “real” willingness to pay for the respective good.

Undercutting Transit? Exploring Potential Competition Between Automated Vehicles and Public Transportation in the United States

Article

Nov 2023
TRANSPORT RES REC

Automated vehicles (AVs) have the potential to dramatically disrupt current transportation patterns and practices. One particular area of concern is AVs' impacts on public transit systems. If vehicle automation enables significant price decreases or performance improvements for ride-hailing services, some fear that it could undercut public transit, which could have significant implications for the environment and transportation equity. The extent to which individuals adopt automated transportation modes will drive many system-level outcomes, and research on public preferences for AVs is immature and inconclusive. In this study, we used responses from an online choice-based conjoint survey fielded in the Washington, D.C. metropolitan region (N = 1,694) in October 2021 to estimate discrete choice models of public preferences for different automated (ride-hailing, shared ride-hailing, bus) and nonautomated (ride-hailing, shared ride-hailing, bus, rail) modes. We used the estimated models to simulate future marketplace competition across a range of trip scenarios. Respondents on average were only willing to pay a premium for automated modes when a vehicle attendant was also present, limiting the potential cost-savings that AV operators might achieve by removing the driver. Scenario analysis additionally revealed that for trips where good transit options were available, transit remained competitive with automated ride-hailing modes. These results suggest that fears of a mass transition away from transit to AVs may be limited by people's willingness to use AVs, at least in the short term. Future AV operators should also recognize the presence of an AV attendant as a critical feature for early AV adoption.

Optimizing B2B Product Offers with Machine Learning, Mixed Logit, and Nonlinear Programming

Preprint

Aug 2023

In B2B markets, value-based pricing and selling has become an important alternative to discounting. This study outlines a modeling method that uses customer data (product offers made to each current or potential customer, features, discounts, and customer purchase decisions) to estimate a mixed logit choice model. The model is estimated via hierarchical Bayes and machine learning, delivering customer-level parameter estimates. Customer-level estimates are input into a nonlinear programming next-offer maximization problem to select optimal features and discount level for customer segments, where segments are based on loyalty and discount elasticity. The mixed logit model is integrated with economic theory (the random utility model), and it predicts both customer perceived value for and response to alternative future sales offers. The methodology can be implemented to support value-based pricing and selling efforts. Contributions to the literature include: (a) the use of customer-level parameter estimates from a mixed logit model, delivered via a hierarchical Bayes estimation procedure, to support value-based pricing decisions; (b) validation that mixed logit customer-level modeling can deliver strong predictive accuracy, not as high as random forest but comparing favorably; and (c) a nonlinear programming problem that uses customer-level mixed logit estimates to select optimal features and discounts.

Biodiversity Benefits of Birdwatching Using Citizen Science Data and Individualized Recreational Demand Models

Article

Full-text available

Jul 2023
ENVIRON RESOUR ECON

Birding is one of the most popular recreational activities, but bird populations have been declining worldwide. Understanding how much people benefit from local bird populations levels, species richness and their preferences can help inform bird conservation management. This paper uses eBird data and random utility models to assess the birders’ preferences and welfare for trips to local areas. The sample eBird citizen science data includes 35,656 trips by 290 individual birders to 1227 unique birding hotspots in Alberta, Canada. The economic value of seeing one additional bird species during a trip is estimated to be $0.68 on average. We estimated a nonlinear relationship between the utility and number of bird species suggesting satiation in recreation preferences, and the highest MWTP is estimated to be in the summer and fall seasons. Bird species at risk, based on Alberta’s strategy for the management of species at risk, are valued almost ten times higher as seeing other types of bird species. We also estimate individualized choice models and find that preference for species richness is heterogeneous across birders. Results of a combinatorial test find that the individualized choice models produce average welfare estimates that are 67% higher than the single model but the difference is not statistically significant. The members of eBird represent a convenience sample that may not constitute the general population. Thus along with proper weighting, these benefit estimates produced in this research can help inform future bird conservation management decisions including alternative funding mechanisms.

Non-linear pricing effects in conjoint analysis

Article

Full-text available

Oct 2022
QME-QUANT MARK ECON

The application of conjoint analysis to new product development is challenged in studies of complex products that simultaneously examine the major drivers of a purchase decision and the composition of product components. Demands on data increase as more product features are included in an analysis, and at some point it becomes necessary to study the components separately. This paper presents evidence of a non-linear pricing effect that complicates the analysis of large conjoint studies when multiple conjoint exercises are integrated, or bridged into a single analysis. Our model is illustrated with data from the automotive industry showing that option packages are under-valued without accounting for the non-linear effects of price.

The impact of policy design on willingness to pay for ecosystem services from prairie strips

Article

Full-text available

Sep 2022

Prairie strips on agricultural lands are supported by the Conservation Reserve Program and provide environmental benefits such as reduced soil loss and improved wildlife habitat. The current study measures the value that the public places on those benefits and if that value changes under different policy designs. The policy design varies by who runs the program (state agency vs. nongovernment organization) and who has enrollment priority (historically managed land vs. degraded land). Results from a choice experiment indicate significant overall public support for the expansion and that willingness to pay is highest with priority for land historically managed in conservation‐oriented way.

Can an Incentive-Based approach to rebalancing a Dock-less Bike-share system Work? Evidence from Sacramento, California

Article

Sep 2022
TRANSPORT RES A-POL

Bike-share services will produce more limited benefits if users cannot find bikes when and where they need them. Bike-share operators must thus have a process for “rebalancing” the bikes within the system to ensure that they are available where demanded. A potentially cost-effective strategy for rebalancing bikes is to offer incentives of some sort to users to walk farther to get a bike (origin-based incentive) or bring a bike to the undersupplied area (destination-based incentive). This paper aims to examine bike-share users’ willingness-to-walk to pick up a bike or drop off a bike at some distance from their origins or destinations if rewarded and to identify characteristics influencing willingness-to-walk. We use data from a survey of dock-less e-bike-share users conducted in the Sacramento region. The analysis shows that half of the respondents use bike-share if the available bike is located 8.9 min away. Our estimates of willingness-to-walk farther than the mean distance for incentives at origins and destinations were 3.8 min and 4.2 min per dollar, respectively. Our results give operators and policy makers insights into the potential effectiveness of incentives as a strategy for spatially rebalancing bike-share fleets.

Protection or Peril of Following the Crowd in a Pandemic-Concurrent Flood Evacuation

Preprint

Feb 2022

The decisions of whether and how to evacuate during a climate disaster are influenced by a wide range of factors, including sociodemographics, emergency messaging, and social influence. Further complexity is introduced when multiple hazards occur simultaneously, such as a flood evacuation taking place amid a viral pandemic that requires physical distancing. Such multi-hazard events can necessitate a nuanced navigation of competing decision-making strategies wherein a desire to follow peers is weighed against contagion risks. To better understand these nuances, we distributed an online survey during a pandemic surge in July 2020 to 600 individuals in three midwestern and three southern states in the United States with high risk of flooding. In this paper, we estimate a random parameter logit model in both preference space and willingness-to-pay space. Our results show that the directionality and magnitude of the influence of peers' choices of whether and how to evacuate vary widely across respondents. Overall, the decision of whether to evacuate is positively impacted by peer behavior, while the decision of how to evacuate is negatively impacted by peers. Furthermore, an increase in flood threat level lessens the magnitude of these impacts. These findings have important implications for the design of tailored emergency messaging strategies. Specifically, emphasizing or deemphasizing the severity of each threat in a multi-hazard scenario may assist in: (1) encouraging a reprioritization of competing risk perceptions and (2) magnifying or neutralizing the impacts of social influence, thereby (3) nudging evacuation decision-making toward a desired outcome.

Modeling Household Purchase Behavior with Logistic Normal Regression

Article

Dec 1994

The successful development of marketing strategies requires the accurate measurement of household preferences and their reaction to variables such as price and advertising. Manufacturers, for example, often offer products at a reduced price for a limited period. One reason for this practice is that it induces households to try the promoted product with the hope of retaining them as permanent customers. The successful implementation of this strategy requires knowledge of the extent of price sensitivity in the population, effective methods of advertising, and the existence of a carry-over effect in the household's evaluation of the product. Logistic regression models are often used to relate household demographics, prices, and advertising variables to household purchase decisions. In this article we extend the standard model to include cross-sectional and serial correlation in household preferences and provide algorithms for estimating the model with random effects. The model is applied to scanner panel data for ketchup purchases, and substantive insights into household preference, brand switching, and autocorrelated purchase behavior are obtained.

Multivariate Analysis of Multiple Response Data

Article

Aug 2003

Multiple response questions, also known as a pick any/J format, are frequently encountered in the analysis of survey data. The relationship among the responses is difficult to explore when the number of response options, J, is large. The authors propose a multivariate binomial probit model for analyzing multiple response data and use standard multivariate analysis techniques to conduct exploratory analysis on the latent multivariate normal distribution. A challenge of estimating the probit model is addressing identifying restrictions that lead to the covariance matrix specified with unit-diagonal elements (i.e., a correlation matrix). The authors propose a general approach to handling identifying restrictions and develop specific algorithms for the multivariate binomial probit model. The estimation algorithm is efficient and can easily accommodate many response options that are frequently encountered in the analysis of marketing data. The authors illustrate multivariate analysis of multiple response data in three applications.

Approximate Bayesian Inference by the Weighted Likelihood Bootstrap

Article

Jan 1994

We introduce the weighted likelihood bootstrap (WLB) as a way to simulate approximately from a posterior distribution. This method is often easy to implement, requiring only an algorithm for calculating the maximum likelihood estimator, such as iteratively reweighted least squares. In the generic weighting scheme, the WLB is first order correct under quite general conditions. Inaccuracies can be removed by using the WLB as a source of samples in the sampling-importance resampling (SIR) algorithm, which also allows incorporation of particular prior information. The SIR-adjusted WLB can be a competitive alternative to other integration methods in certain models. Asymptotic expansions elucidate the second-order properties of the WLB, which is a generalization of Rubin’s Bayesian bootstrap [D. B. Rubin, Ann. Stat. 9, 130-134 (1981)]. The calculation of approximate Bayes factors for model comparison is also considered. We note that, given a sample simulated from the posterior distribution, the required marginal likelihood may be simulation consistently estimated by the harmonic mean of the associated likelihood values; a modification of this estimator that avoids instability is also noted. These methods provide simple ways of calculating approximate Bayes factors and posterior model probabilities for a very wide class of models.

Dynamic Models Incorporating Individual Heterogeneity: Utility Evolution in Conjoint Analysis

Article

May 2005

It has been shown in the behavioral decision making, marketing research, and psychometric literature that the structure underlying preferences can change during the administration of repeated measurements (e.g., conjoint analysis) and data collection because of effects from learning, fatigue, boredom, and so on. In this research note, we propose a new class of hierarchical dynamic Bayesian models for capturing such dynamic effects in conjoint applications, which extend the standard hierarchical Bayesian random effects and existing dynamic Bayesian models by allowing for individual-level heterogeneity around an aggregate dynamic trend. Using simulated conjoint data, we explore the performance of these new dynamic models, incorporating individual-level heterogeneity across a number of possible types of dynamic effects, and demonstrate the derived benefits versus static models. In addition, we introduce the idea of an unbiased dynamic estimate, and demonstrate that using a counterbalanced design is important from an estimation perspective when parameter dynamics are present.

Bayesian measures of model complexity and fit (with discussion)

Article

Jan 2002

Modeling Household Purchase Behavior with Logistic Normal Regression

Article

Jan 1994
J AM STAT ASSOC

Assessing the Monetary Value of Attribute Levels with Conjoint Analysis: Warnings and Suggestions

Article

Conjoint analysis is often used to assess how buyers trade off product features with price. Researchers can test the price sensitivity of potential product configurations using simulation models based on conjoint results. Most often, a simulation is done within a specific context of competitors. But when a product is truly new to the market and has no direct competitors, price sensitivity for that new product can be estimated compared to other options such as buying nothing. The common forms of conjoint analysis measure contrasts between levels within attributes. The worths of levels are estimated on an arbitrary interval scale, so the absolute magnitudes of utilities have no meaning. Also, each attribute's utilities are determined only to within an arbitrary additive constant, so a utility level from one attribute cannot be directly compared to another from a different attribute. To a trained conjoint analyst, an array of utilities conveys clear meaning. But that meaning is often difficult for others to grasp. It is not surprising, then, that researchers look for ways to make conjoint utilities easier to interpret.

Estimating Willingness to Pay from Survey Data: An Alternative Pre-Test-Market Evaluation Procedure

Article

Nov 1987

Closed-ended contingent valuation surveys are used to assess demands in hypothetical markets and recently have been applied widely to the valuation of (non-market) environmental resources. This interviewing strategy holds considerable promise for more general market research applications. The authors describe a new maximum likelihood estimation technique for use with these special data. Unlike previously used methods, the estimated models are as easy to interpret as ordinary least squares regression results and the results can be approximated accurately by packaged probit estimation routines.

An Empirical Comparison of Logit Choice Models with Discrete Versus Continuous Representations of Heterogeneity

Article

Dec 2002

Currently, there is an important debate about the relative merits of models with discrete and continuous representations of consumer heterogeneity. In a recent JMR study, Andrews, Ansari, and Currim (2002; hereafter AAC) compared metric conjoint analysis models with discrete and continuous representations of heterogeneity and found no differences between the two models with respect to parameter recovery and prediction of ratings for holdout profiles. Models with continuous representations of heterogeneity fit the data better than models with discrete representations of heterogeneity. The goal of the current study is to compare the relative performance of logit choice models with discrete versus continuous representations of heterogeneity in terms of the accuracy of household-level parameters, fit, and forecasting accuracy. To accomplish this goal, the authors conduct an extensive simulation experiment with logit models in a scanner data context, using an experimental design based on AAC and other recent simulation studies. One of the main findings is that models with continuous and discrete representations of heterogeneity recover household-level parameter estimates and predict holdout choices about equally well except when the number of purchases per household is small, in which case the models with continuous representations perform very poorly. As in the AAC study, models with continuous representations of heterogeneity fit the data better.

Augmenting Conjoint Analysis to Estimate Consumer Reservation Price

Article

Oct 2002

Consumer reservation price is a key concept in marketing and economics. Theoretically, this concept has been instrumental in studying consumer purchase decisions, competitive pricing strategies, and welfare economics. Managerially, knowledge of consumer reservation prices is critical for implementing many pricing tactics such as bundling, target promotions, nonlinear pricing, and one-to-one pricing, and for assessing the impact of marketing strategy on demand. Despite the practical and theoretical importance of this concept, its measurement at the individual level in a practical setting proves elusive. We propose a conjoint-based approach to estimate consumer-level reservation prices. This approach integrates the preference estimation of traditional conjoint with the economic theory of consumer choice. This integration augments the capability of traditional conjoint such that consumers' reservation prices for a product can be derived directly from the individual-level estimates of conjoint coefficients. With this augmentation, we can model a consumer's decision of not only which product to buy, but also whether to buy at all in a category. Thus, we can simulate simultaneously three effects that a change in price or the introduction of a new product may generate in a market: the customer switching effect, the cannibalization effect, and the market expansion effect. We show in a pilot application how this approach can aid product and pricing decisions. We also demonstrate the predictive validity of our approach using data from a commercial study of automobile batteries.
