
An expectation-maximization (EM) algorithm to estimate the integrated choice and latent variable (ICLV) model

Abstract

As computing capability has grown dramatically, transport choice models have increasingly incorporated latent variables. However, integrated choice and latent variable (ICLV) models are hampered by a serious problem caused by the maximum simulated likelihood (MSL) method: the method cannot reliably recover the true coefficients, a failure often referred to as a lack of empirical identification. The problem is exacerbated when an ICLV model is calibrated on cross-sectional data. An expectation-maximization (EM) algorithm has been successfully employed to calibrate random coefficient choice models, but it has never been applied to the calibration of an ICLV model. In this study, an EM algorithm was adapted to calibrate an ICLV model, and it successfully recovered the true coefficients. The main contribution of adopting an EM algorithm is that it simplifies the calibration procedure by decomposing it into three well-known econometric problems: a weighted linear regression, a weighted discrete choice problem, and a weighted ordinal choice problem. Simulation experiments also confirmed that the EM algorithm is a stable method for averting the lack of empirical identification.
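To make the decomposition concrete, the sketch below shows the three weighted M-step sub-problems for a stylized ICLV model, each solved over the stacked (respondent, draw) observations using weights produced by the E-step. The binary logit kernel, ordered logit indicators, and scipy-based solvers are illustrative assumptions, not the paper's exact specification.

```python
# M-step of an EM algorithm for a stylized ICLV model: three weighted
# sub-problems solved over stacked (respondent, draw) observations, with the
# E-step supplying the weights w.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import logistic

def weighted_linear_regression(X, y, w):
    """Closed-form weighted least squares (structural and measurement equations)."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

def weighted_binary_logit(X, y, w, start):
    """Weighted discrete choice problem (binary logit kernel for simplicity)."""
    def negll(b):
        p = logistic.cdf(X @ b)
        return -np.sum(w * (y * np.log(p) + (1 - y) * np.log(1 - p)))
    return minimize(negll, start, method="BFGS").x

def weighted_ordinal_logit(x, y, w, start):
    """Weighted ordinal choice problem: ordered logit with categories y in
    {0, ..., K-1}; start = [slope, cut_1, ..., cut_{K-1}]."""
    def negll(params):
        slope, cuts = params[0], np.sort(params[1:])   # keep thresholds increasing
        upper = np.append(cuts, np.inf)[y] - slope * x
        lower = np.append(-np.inf, cuts)[y] - slope * x
        return -np.sum(w * np.log(logistic.cdf(upper) - logistic.cdf(lower)))
    return minimize(negll, start, method="Nelder-Mead").x
```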
... Among frequentist methods to estimate logit models, researchers have explored iterative optimization methods. Within this class of methods, the expectation-maximization (EM) algorithm has been reported (Bhat, 1997; Cherchi and Guevara, 2012; Sohn, 2017) to outperform MSLE in numerical stability (i.e., less sensitivity to initial values), empirical identification (i.e., avoiding a non-invertible Hessian matrix), and estimation simplicity. ...
... Whereas MSLE directly maximizes the log-likelihood function using quasi-Newton methods, the simplicity of EM stems from iteratively maximizing a simpler surrogate function and updating the parameters while maintaining monotonic improvement in the log-likelihood (Dempster et al., 1977; McLachlan and Krishnan, 2007). Furthermore, the iterative parameter updates of the EM algorithm are either closed-form or straightforward econometric problems that can be solved using standard statistical packages (Train, 2008; Sohn, 2017). EM also provides a convenient parameterization of the complete-data likelihood function without worrying about over-identification (Ruud, 1991). ...
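The monotonicity property noted above can be seen in a generic EM skeleton: each iteration maximizes a surrogate (the expected complete-data log-likelihood), and the observed-data log-likelihood never decreases. The two-component Gaussian mixture below is only a convenient illustration of this mechanism, not a model from the cited papers.

```python
# Generic EM skeleton, illustrated on a two-component Gaussian mixture
# (unit variances and equal mixing weights of 0.5, for brevity).
import numpy as np

def norm_pdf(x, m):
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2 * np.pi)

def log_likelihood(x, mu):
    return np.sum(np.log(0.5 * (norm_pdf(x, mu[0]) + norm_pdf(x, mu[1]))))

def em(x, mu, n_iter=50):
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 0 for each observation
        r0 = norm_pdf(x, mu[0]) / (norm_pdf(x, mu[0]) + norm_pdf(x, mu[1]))
        # M-step: maximize the surrogate -> closed-form weighted means
        mu = np.array([np.sum(r0 * x) / np.sum(r0),
                       np.sum((1 - r0) * x) / np.sum(1 - r0)])
        ll_new = log_likelihood(x, mu)
        assert ll_new >= ll_old - 1e-8   # monotonic improvement in log-likelihood
        ll_old = ll_new
    return mu

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
print(em(x, mu=np.array([-0.5, 0.5])))   # recovers means near -2 and 2
```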
Article
Motivated by the promising performance of alternative estimation methods for mixed logit models, in this paper we derive, implement, and test minorization-maximization (MM) algorithms to estimate the semiparametric logit-mixed logit (LML) and mixture-of-normals multinomial logit (MON-MNL) models. In particular, we show that the reported computational efficiency of the MM algorithm is actually lost for large choice sets. Because the logit link that represents the parameter space in LML is intrinsically treated as a large choice set, the MM algorithm for LML actually becomes infeasible to use in practice. We thus propose a faster MM algorithm that revisits a simple step-size correction. In a Monte Carlo study, we compare the maximum simulated likelihood estimator (MSLE) with the algorithms that we derive to estimate LML and MON-MNL models. Whereas in LML estimation alternative algorithms are computationally uncompetitive with MSLE, the faster-MM algorithm appears competitive in MON-MNL estimation. Both algorithms – faster-MM and MSLE – could recover parameters as well as standard errors at a similar precision in both models. We further show that parallel computation could reduce the estimation time of faster-MM by 45% to 80%. Even though faster-MM could not surpass MSLE with analytical gradient (because MSLE also leveraged similar computational gains), parallel faster-MM is a competitive replacement for MSLE for MON-MNL that obviates computation of complex analytical gradients, which makes it very attractive to integrate into flexible estimation software. We also compare different algorithms in an empirical application to estimate consumers' willingness to adopt electric motorcycles in Solo, Indonesia. The results of the empirical application are consistent with those of the Monte Carlo study.
... Moreover, our paper contributes to two other strands of literature: First, we introduce a Dirichlet process mixture model with a multinomial logit kernel into the domain of behavioural travel demand analysis and demonstrate the value of the proposed model framework in a case study on motorists' route choice preferences. Second, we contribute to a growing body of literature concerned with the development and application of EM algorithms for the estimation of complex discrete choice models (Bhat, 1997; Sohn, 2016; Train, 2008, 2009; Vij and Krueger, 2017). While it is well known that the EM algorithm can facilitate inference in finite-dimensional discrete mixture M-MNL models (Bhat, 1997; Train, 2008, 2009; Vij and Krueger, 2017), we demonstrate that the computational benefits of the EM algorithm generalise to the infinite-dimensional setting. ...
... Moreover, Vij and Krueger (2017) introduce an EM algorithm for inference in a M-MNL model with a gridded mixing distribution. Sohn (2016) derives an EM algorithm for the integrated choice and latent variable model (Walker and Ben-Akiva, 2002). ...
Article
We present a mixed multinomial logit (MNL) model, which leverages the truncated stick-breaking process representation of the Dirichlet process as a flexible nonparametric mixing distribution. The proposed model is a Dirichlet process mixture model and accommodates discrete representations of heterogeneity, like a latent class MNL model. Yet, unlike a latent class MNL model, the proposed discrete choice model does not require the analyst to fix the number of mixture components prior to estimation, as the complexity of the discrete mixing distribution is inferred from the evidence. For posterior inference in the proposed Dirichlet process mixture model of discrete choice, we derive an expectation maximisation algorithm. In a simulation study, we demonstrate that the proposed model framework can flexibly capture differently-shaped taste parameter distributions. Furthermore, we empirically validate the model framework in a case study on motorists' route choice preferences and find that the proposed Dirichlet process mixture model of discrete choice outperforms a latent class MNL model and mixed MNL models with common parametric mixing distributions in terms of both in-sample fit and out-of-sample predictive ability. Compared to extant modelling approaches, the proposed discrete choice model substantially abbreviates specification searches, as it relies on less restrictive parametric assumptions and does not require the analyst to specify the complexity of the discrete mixing distribution prior to estimation.
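As a concrete view of the truncated stick-breaking construction that generates the mixing weights described above, the short sketch below draws weights from a truncated stick-breaking process; the concentration parameter and truncation level are arbitrary illustrative choices.

```python
# Truncated stick-breaking representation of a Dirichlet process:
# weight_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, alpha).
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0   # close off the last stick so the weights sum exactly to 1
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(42)
w = stick_breaking_weights(alpha=2.0, truncation=25, rng=rng)
print(w.sum())            # 1.0 (up to floating point)
print((w > 0.01).sum())   # number of components with non-negligible weight
```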
... Notably, its efficiency peaks when dealing with models having fewer choice alternatives. Sohn (2017) introduced an expectation-maximization (EM) algorithm to address the empirical identification concern. Despite its potential, a scalability issue arises, as the algorithm demands considerable computer memory, especially when addressing correlations among latent variables. ...
... In the second step, the estimated covariance matrix is used to update the estimated coefficient matrix. While the first step can be decomposed across latent variables, the second step cannot be when the latent variables are correlated [24]. As an alternative, we can estimate the parameters through direct maximization. ...
Article
The R packages Dire and EdSurvey allow analysts to make a conditioning model with new variables and then draw new plausible values. This is important because results for a variable not in the conditioning model are biased. For regression-type analyses, users can also use direct estimation to estimate parameters without generating new plausible values. Dire is distinct from other available software in R in that it requires fixed item parameters and simplifies calculation of high-dimensional integrals necessary to calculate composite or subscales. When used with EdSurvey, it is very easy to use published item parameters to estimate a new conditioning model. We show the theory behind the methods in Dire and a coding example where we perform an analysis that includes simple process data variables. Because the process data is not used in the conditioning model, the estimator is biased if a new conditioning model is not added with Dire.
... Sohn [24] discovered numerous useful characteristics of the EM method when used to estimate integrated choice and latent variable models. The EM algorithm significantly reduced the calculation time since it does not involve any time-consuming numerical computation of derivatives or the Hessian of the simulated likelihood function. ...
... To decompose the likelihood maximisation into simplified optimisation problems, we estimate DLCM using the EM algorithm. Readers can refer to Bansal et al. (2018), Bhat (1997), Sohn (2017), and Zarwi et al. (2017) for more on applications of the EM algorithm to estimating choice models. We extend the existing EM algorithm for heterogeneous HMMs to account for auto-correlated choices, preference heterogeneity in the choice model, and riders' learning behaviour. ...
Preprint
Crowding valuation of subway riders is an important input to various supply-side decisions of transit operators. The crowding cost perceived by a transit rider is generally estimated by capturing the trade-off that the rider makes between crowding and travel time while choosing a route. However, existing studies rely on static compensatory choice models and fail to account for inertia and the learning behaviour of riders. To address these challenges, we propose a new dynamic latent class model (DLCM) which (i) assigns riders to latent compensatory and inertia/habit classes based on different decision rules, (ii) enables transitions between these classes over time, and (iii) adopts instance-based learning theory to account for the learning behaviour of riders. We use the expectation-maximisation algorithm to estimate DLCM, and the most probable sequence of latent classes for each rider is retrieved using the Viterbi algorithm. The proposed DLCM can be applied in any choice context to capture the dynamics of decision rules used by a decision-maker. We demonstrate its practical advantages in estimating the crowding valuation of an Asian metro's riders. To calibrate the model, we recover the daily route preferences and in-vehicle crowding experiences of regular metro riders using a two-month-long smart card and vehicle location data. The results indicate that the average rider follows the compensatory rule on only 25.5% of route choice occasions. DLCM estimates also show an increase of 47% in metro riders' valuation of travel time under extremely crowded conditions relative to that under uncrowded conditions.
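The Viterbi step mentioned above can be sketched in a few lines: given class-transition probabilities and per-period class likelihoods, it recovers the most probable sequence of latent classes for a rider. The two-class toy numbers below are assumptions for illustration, not the paper's estimates.

```python
# Viterbi decoding of the most probable latent-class sequence, given
# (assumed) initial, transition, and per-period emission log-probabilities.
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """log_init: (K,), log_trans: (K, K), log_emit: (T, K) -> best class path (T,)."""
    T, K = log_emit.shape
    delta = np.zeros((T, K))
    backptr = np.zeros((T, K), dtype=int)
    delta[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (K, K): from -> to
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Toy example: two classes ("compensatory" = 0, "inertia" = 1), five periods.
log_init = np.log([0.5, 0.5])
log_trans = np.log([[0.8, 0.2], [0.3, 0.7]])
log_emit = np.log([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9], [0.7, 0.3]])
print(viterbi(log_init, log_trans, log_emit))   # most probable class sequence
```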
Article
This paper aims to understand how people's lifestyles are associated with their willingness to adopt a relatively new and innovative mobility solution, Mobility-as-a-Service (MaaS). Lifestyle is conceptualized as a combination of a mechanistic lifestyle manifested by an individual's activity-travel patterns and a psychographic lifestyle depicted by an individual's psychological traits. We propose a hierarchical latent variable and latent class model in which respondents are probabilistically allocated to one of the latent classes based upon mechanistic lifestyle, whereas psychographic lifestyle is incorporated in the model as values and personality traits exerting an impact on attitudes, which themselves are part of the utility function of MaaS subscription choice. The model is calibrated with data from a stated choice experiment and a lifestyle survey distributed among 1299 respondents in the Netherlands. The results confirm that psychographic lifestyles play a substantial role in people's decision to subscribe to MaaS. Having positive attitudes towards multimodal travel increases the propensity to adopt MaaS, and these attitudes are significantly moderated by values and personality traits. Moreover, mechanistic lifestyles, in particular a non-car-oriented modality style, allow the respondents to be segmented into two latent classes that reveal their preference heterogeneity.
Article
It is well known that estimating the parameters of an integrated choice and latent variable (ICLV) model is not a trivial undertaking. The log-likelihood of an ICLV model cannot be evaluated analytically and can only be evaluated by a simulation that requires large numbers of sample draws. While conducting simulation-based model estimations, researchers often encounter an estimation failure. A previous study suggests a novel estimation method that circumvents the problem by using an expectation-maximization (EM) algorithm. However, a drawback of this method continues to be the requirement of a huge amount of computer memory to deal with an augmented covariance matrix. In the present study, this problem was overcome by connecting each latent variable in the structural equation to all individual-specific variables. This restriction did not hamper the utility of the ICLV model in empirical experimentation. The main contribution of this study is to introduce a simple method devised to solve large-scale ICLV models.
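To illustrate the kind of restriction described above, the sketch below writes the structural equation of a stylized ICLV model with a full coefficient matrix linking every latent variable to every individual-specific covariate; all dimensions and names are assumed for illustration only.

```python
# Structural equation of a stylized ICLV model:
#   x_star = Z @ Gamma.T + eta,   eta ~ N(0, Psi)
# Each of the L latent variables is linked to all M individual-specific
# covariates, so Gamma is a full (L x M) coefficient matrix rather than a
# sparse one with hand-picked zero restrictions.
import numpy as np

rng = np.random.default_rng(7)
n, M, L = 1000, 4, 3                         # respondents, covariates, latent variables

Z = rng.normal(size=(n, M))                  # individual-specific variables
Gamma = rng.normal(scale=0.5, size=(L, M))   # full coefficient matrix (no exclusions)
Psi = np.eye(L)                              # latent disturbance covariance (identity here)

eta = rng.multivariate_normal(np.zeros(L), Psi, size=n)
x_star = Z @ Gamma.T + eta                   # (n x L) simulated latent variables
print(x_star.shape)                          # (1000, 3)
```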
Article
This paper presents a general methodology and framework for including latent variables—in particular, attitudes and perceptions—in choice models. This is something that has long been deemed necessary by behavioral researchers, but is often either ignored in statistical models, introduced in less than optimal ways (e.g., sequential estimation of a latent variable model then a choice model, which produces inconsistent estimates), or introduced for a narrowly defined model structure. The paper is focused on the use of psychometric data to explicitly model attitudes and perceptions and their influences on choices. The methodology requires the estimation of an integrated multi-equation model consisting of a discrete choice model and the latent variable model's structural and measurement equations. The integrated model is estimated simultaneously using a maximum likelihood estimator, in which the likelihood function includes complex multi-dimensional integrals. The methodology is applicable to any situation in which one is modeling choice behavior (with any type and combination of choice data) where (1) there are important latent variables that are hypothesized to influence the choice and (2) there exist indicators (e.g., responses to survey questions) for the latent variables. Three applications of the methodology provide examples and demonstrate the flexibility of the approach, the resulting gain in explanatory power, and the improved specification of discrete choice models.
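The multi-dimensional integral mentioned above is typically approximated by simulation. As a rough illustration (not the paper's estimator), the sketch below evaluates a simulated log-likelihood for a minimal ICLV model with one latent variable, one continuous indicator, and a binary logit choice kernel, averaging the joint indicator-and-choice likelihood over random draws of the latent variable.

```python
# Simulated log-likelihood for a minimal ICLV model (illustrative structure):
#   latent:     x_star = gamma * z + eta,      eta ~ N(0, 1)
#   indicator:  I      = alpha * x_star + eps, eps ~ N(0, sigma^2)
#   choice:     P(y=1 | x_star) = logistic(beta * w + lambda_ * x_star)
import numpy as np

def simulated_loglik(theta, y, w, z, I, draws):
    gamma, alpha, sigma, beta, lam = theta
    x_star = gamma * z[:, None] + draws                      # (n, R) latent draws
    resid = I[:, None] - alpha * x_star                      # indicator residuals
    f_ind = np.exp(-0.5 * (resid / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    p1 = 1.0 / (1.0 + np.exp(-(beta * w[:, None] + lam * x_star)))
    f_choice = np.where(y[:, None] == 1, p1, 1.0 - p1)
    sim_lik = np.mean(f_ind * f_choice, axis=1)              # average over draws
    return np.sum(np.log(sim_lik))

rng = np.random.default_rng(3)
n, R = 500, 200
z, w = rng.normal(size=n), rng.normal(size=n)
x_true = 0.8 * z + rng.normal(size=n)
I = 1.0 * x_true + rng.normal(scale=0.5, size=n)
y = (rng.random(n) < 1 / (1 + np.exp(-(0.5 * w + 1.0 * x_true)))).astype(int)
draws = rng.normal(size=(n, R))                              # standard normal draws for eta
print(simulated_loglik([0.8, 1.0, 0.5, 0.5, 1.0], y, w, z, I, draws))
```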
Article
This paper describes a recursive method for estimating random coefficient models. Starting with a trial value for the moments of the distribution of coefficients in the population, draws are taken and then weighted to represent draws from the conditional distribution for each sampled agent (i.e., conditional on the agent's observed dependent variable). The moments of the weighted draws are calculated and then used as the new trial values, repeating the process to convergence. The recursion is a simulated EM algorithm that provides a method of simulated scores estimator. The estimator is asymptotically equivalent to the maximum likelihood estimator under specified conditions. The recursive procedure is faster than maximum simulated likelihood (MSL) with numerical gradients, easier to code than MSL with analytic gradients, assures a positive definite covariance matrix for the coefficients at each iteration, and avoids the numerical difficulties that often occur with gradient-based optimization. The method is illustrated with a mixed logit model of households' choice among energy suppliers.
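The recursion described above can be sketched directly: draw coefficients from the current population distribution, weight each draw by the probability of the agent's observed choice under that draw, and use the weighted moments as the next trial values. The mixed logit setup below (normal coefficients, one observed choice per agent) is an assumed toy configuration, not the paper's application.

```python
# Recursive (simulated EM) update of the mean and covariance of random
# coefficients in a mixed logit model, following the weight-and-update idea.
import numpy as np

rng = np.random.default_rng(5)

def choice_prob(beta_draws, X, chosen):
    """P(chosen alternative | beta) for each draw: X is (J, K), beta_draws (R, K)."""
    v = beta_draws @ X.T                        # (R, J) systematic utilities
    p = np.exp(v - v.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return p[:, chosen]                         # (R,)

def recursive_em(data, K, R=500, n_iter=100):
    mu, cov = np.zeros(K), np.eye(K)            # trial values for the moments
    for _ in range(n_iter):
        all_draws, all_weights = [], []
        for X, chosen in data:                  # one (attribute matrix, choice) per agent
            draws = rng.multivariate_normal(mu, cov, size=R)
            w = choice_prob(draws, X, chosen)   # weight draws by the agent's likelihood
            w /= w.sum()
            all_draws.append(draws)
            all_weights.append(w)
        draws = np.vstack(all_draws)
        w = np.concatenate(all_weights) / len(data)
        mu = w @ draws                          # weighted mean across all agents
        centered = draws - mu
        cov = (centered * w[:, None]).T @ centered   # weighted covariance (PSD by construction)
    return mu, cov
```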
Chapter
The search for flexible models has led the simple multinomial logit model to evolve into the powerful but computationally very demanding mixed multinomial logit (MMNL) model. That search for flexibility also led to hybrid choice model (HCM) formulations that explicitly incorporate psychological factors affecting decision making in order to enhance the behavioral representation of the choice process. HCMs expand on standard choice models by including attitudes, opinions, and perceptions as psychometric latent variables. In this paper we describe the classical estimation technique for a simulated maximum likelihood (SML) solution of the HCM. To show its feasibility, we apply it to data on stated personal vehicle choices made by Canadian consumers when faced with technological innovations. We then go beyond classical methods and estimate the HCM using a hierarchical Bayesian approach that exploits HCM Gibbs sampling considering both a probit and an MMNL discrete choice kernel. We then carry out a Monte Carlo experiment to test how the HCM Gibbs sampler works in practice. To our knowledge, this is the first practical application of HCM Bayesian estimation. We show that although HCM joint estimation requires the evaluation of complex multi-dimensional integrals, SML can be successfully implemented. The HCM framework not only proves to be capable of introducing latent variables, but also makes it possible to tackle the problem of measurement errors in variables in a very natural way. We also show that working with Bayesian methods has the potential to break down the complexity of classical estimation.
Article
One of the benefits of advanced traveler information systems (ATISs) is their ability to divert travelers to alternative routes during traffic incidents to alleviate congestion. ATISs may effectively convince travelers to divert to alternative routes by providing information that is considered useful. Therefore, it is important to identify the factors that explain drivers' route diversion behaviors to properly assist in the design and implementation of ATISs. An application of latent variable models to determine the factors that affect drivers' stated intentions to divert from their usual routes when faced with traffic congestion is described. Two latent variables were identified: drivers' attitudes toward route diversion and their perceptions of the reliability of information provided by radio traffic reports (RTRs) or changeable message signs (CMSs). These two latent variables were determined to be significant explanatory variables of route diversion intentions. Some drivers' travel and socioeconomic characteristics and the type of information provided by RTRs and CMSs were also found to be important explanatory variables.
Article
Model Notation, Covariances, and Path Analysis. Causality and Causal Models. Structural Equation Models with Observed Variables. The Consequences of Measurement Error. Measurement Models: The Relation Between Latent and Observed Variables. Confirmatory Factor Analysis. The General Model, Part I: Latent Variable and Measurement Models Combined. The General Model, Part II: Extensions. Appendices. Distribution Theory. References. Index.
Article
In the current paper, we propose a new multinomial probit-based model formulation for integrated choice and latent variable (ICLV) models, which, as we show in the paper, has several important advantages relative to the traditional logit kernel-based ICLV formulation. Combining this MNP-based ICLV model formulation with Bhat’s maximum approximate composite marginal likelihood (MACML) inference approach resolves the specification and estimation challenges that are typically encountered with the traditional ICLV formulation estimated using simulation approaches. Our proposed approach can provide very substantial computational time advantages, because the dimensionality of integration in the log-likelihood function is independent of the number of latent variables. Further, our proposed approach easily accommodates ordinal indicators for the latent variables, as well as combinations of ordinal and continuous response indicators. The approach can be extended in a relatively straightforward fashion to also include nominal indicator variables. A simulation exercise in the virtual context of travel mode choice shows that the MACML inference approach is very effective at recovering parameters. The time for convergence is of the order of 30–80 min for sample sizes ranging from 500 observations to 2000 observations, in contrast to much longer times for convergence experienced in typical ICLV model estimations.
Article
In this paper we discuss the specification, covariance structure, estimation, identification, and point-estimate analysis of a logit model with endogenous latent attributes that avoids problems of inconsistency. We show first that the total error term induced by the stochastic latent attributes is heteroskedastic and nonindependent. In addition, we show that the exact identification conditions support the two-stage analysis found in much current work. Second, we set up a Monte Carlo experiment where we compare the finite-sample performance of the point estimates of two alternative methods of estimation, namely frequentist full information maximum simulated likelihood and Bayesian Metropolis-Hastings-within-Gibbs sampling. The Monte Carlo study represents a virtual case of travel mode choice. Even though the two estimation methods we analyze are based on different philosophies, both the frequentist and Bayesian methods provide estimators that are asymptotically equivalent. Our results show that both estimators are feasible and offer comparable results with a large enough sample size. However, the Bayesian point estimates outperform maximum likelihood in terms of accuracy, statistical significance, and efficiency when the sample size is low.
Article
When the dimension of the vector of estimated parameters increases, simulation-based methods become impractical, because the number of draws required for estimation grows exponentially with the number of parameters. The lack of empirical identification when the number of parameters increases is usually known as the “curse of dimensionality” in simulation methods. We investigate this problem in the case of the random coefficients Logit model. We compare the traditional Maximum Simulated Likelihood (MSL) method with two alternative estimation methods: the Expectation-Maximization (EM) and the Laplace Approximation (HH) methods, which do not require simulation. We use Monte Carlo experimentation to investigate systematically the performance of the methods under different circumstances, including different numbers of variables, sample sizes, and structures of the variance–covariance matrix. Results show that MSL indeed suffers from lack of empirical identification as the dimensionality grows, while EM deals much better with this estimation problem. On the other hand, the HH method, although not simulation-based, showed poor performance with large dimensions, principally because of the necessity of inverting large matrices. The results also show that when MSL is empirically identified this method seems superior to EM and HH in terms of ability to recover the true parameters and estimation time.