Maximum likelihood estimation of the polychoric correlation coefficient

Abstract

The polychoric correlation is discussed as a generalization of the tetrachoric correlation coefficient to more than two classes. Two estimation methods are discussed: Maximum likelihood estimation, and what may be called two-step maximum likelihood estimation. For the latter method, the thresholds are estimated in the first step. For both methods, asymptotic covariance matrices for estimates are derived, and the methods are illustrated and compared with artificial and real data.
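The two-step method described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The function names (bvn_cdf, thresholds, log_lik, polychoric_two_step) and the example table are hypothetical. The bivariate normal probability is computed from the identity that its derivative with respect to the correlation equals the bivariate density, integrated with Simpson's rule. The search is restricted to [-0.99, 0.99], and every marginal category is assumed to be non-empty. Step 1 fixes the thresholds from the cumulative marginal proportions; step 2 maximizes the multinomial log-likelihood over the correlation alone.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def bvn_cdf(a, b, rho, steps=200):
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho."""
    if a == -math.inf or b == -math.inf:
        return 0.0
    if a == math.inf:
        return 1.0 if b == math.inf else _STD.cdf(b)
    if b == math.inf:
        return _STD.cdf(a)

    def density(r):
        q = (a * a - 2.0 * r * a * b + b * b) / (2.0 * (1.0 - r * r))
        return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - r * r))

    # Simpson's rule for the integral of the density over r in [0, rho]
    # (steps must be even; a negative rho gives a negative signed integral).
    h = rho / steps
    total = density(0.0) + density(rho)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return _STD.cdf(a) * _STD.cdf(b) + total * h / 3.0

def thresholds(margin_counts):
    """Step 1: thresholds from cumulative marginal proportions (no empty categories)."""
    n = sum(margin_counts)
    cum, cuts = 0.0, []
    for c in margin_counts[:-1]:
        cum += c
        cuts.append(_STD.inv_cdf(cum / n))
    return [-math.inf] + cuts + [math.inf]

def log_lik(table, a, b, rho):
    """Multinomial log-likelihood of the table given thresholds a, b and rho."""
    ll = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            p = (bvn_cdf(a[i + 1], b[j + 1], rho) - bvn_cdf(a[i], b[j + 1], rho)
                 - bvn_cdf(a[i + 1], b[j], rho) + bvn_cdf(a[i], b[j], rho))
            ll += n_ij * math.log(max(p, 1e-12))  # guard against rounding to <= 0
    return ll

def polychoric_two_step(table, tol=1e-6):
    """Step 2: golden-section search for rho with the thresholds held fixed."""
    a = thresholds([sum(row) for row in table])
    b = thresholds([sum(col) for col in zip(*table)])
    lo, hi = -0.99, 0.99
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        x1 = hi - g * (hi - lo)
        x2 = lo + g * (hi - lo)
        if log_lik(table, a, b, x1) < log_lik(table, a, b, x2):
            lo = x1
        else:
            hi = x2
    return (lo + hi) / 2.0

# Hypothetical 3 x 3 contingency table of two ordinal variables.
example = [[40, 15, 5], [20, 30, 20], [5, 15, 50]]
print(round(polychoric_two_step(example), 3))
```

Full maximum likelihood, the other method named in the abstract, would instead maximize the same likelihood over the thresholds and the correlation jointly.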
... Building on the idea of ordinal variables as discretized variants of latent continuous variables, sample polychoric correlations do not compute the correlation among observed ordinal variables directly but the correlation among the assumed latent continuous variables underlying the ordinal variables (Flora & Curran, 2004; Jin & Yang-Wallentin, 2017; Muthén, 1978; Olsson, 1979a). Assuming that the latent variables are normally distributed, it can be shown that maximum-likelihood estimation of polychoric correlations is asymptotically unbiased, which is an advantage over biased Pearson correlations (Lubbe, 2019). ...
... Finally, another limitation concerns the assumption of normally distributed latent variables by the maximum-likelihood estimator of polychoric correlations (Olsson, 1979a). Whenever polychoric correlations were computed (i.e., Fig. 1, NEST_poly, NEST_hybrid, PA_poly), ordinal variables had been simulated by transforming normally distributed variables using predetermined thresholds (Yang & Xia, 2015). ...
Article
An essential step in exploratory factor analysis is to determine the optimal number of factors. The Next Eigenvalue Sufficiency Test (NEST; Achim, 2017) is a recent proposal to determine the number of factors based on significance tests of the statistical contributions of candidate factors indicated by eigenvalues of sample correlation matrices. Previous simulation studies have shown NEST to recover the optimal number of factors in simulated datasets with high accuracy. However, these studies have focused on continuous variables. The present work addresses the performance of NEST for ordinal data. It has been debated whether factor models – and thus also the optimal number of factors – for ordinal variables should be computed for Pearson correlation matrices, which are known to underestimate correlations for ordinal datasets, or for polychoric correlation matrices, which are known to be unstable. The central research question is to what extent the problems associated with Pearson correlations and polychoric correlations degrade the performance of NEST for ordinal datasets. Implementations of NEST tailored to ordinal datasets by utilizing polychoric correlations are proposed. In a simulation, the proposed implementations were compared to the original implementation of NEST, which computes Pearson correlations even for ordinal datasets. The simulation shows that substituting polychoric correlations for Pearson correlations improves the accuracy of NEST for binary variables and large sample sizes (N = 500). However, the simulation also shows that the original implementation using Pearson correlations was the most accurate implementation for Likert-type variables with four response categories when item difficulties were homogeneous.
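The underestimation by Pearson correlations noted in this abstract can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings, not the study's simulation design: the latent correlation (0.6), the cut-points, the sample size, the seed, and the function names are all hypothetical. Two standard normal latent variables are generated, each is cut into four ordered categories, and the Pearson correlation of the category codes is compared with that of the latent values.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def discretize(value, cuts):
    """1-based ordinal category of a latent value, given ascending cut-points."""
    return 1 + sum(value > c for c in cuts)

def simulate(rho, cuts_x, cuts_y, n, seed=0):
    """Return (Pearson r of latent values, Pearson r of the ordinal codes)."""
    rng = random.Random(seed)
    latent_x, latent_y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        latent_x.append(z1)
        # Mix z1 and z2 so that corr(latent_x, latent_y) equals rho.
        latent_y.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    ord_x = [discretize(v, cuts_x) for v in latent_x]
    ord_y = [discretize(v, cuts_y) for v in latent_y]
    return pearson(latent_x, latent_y), pearson(ord_x, ord_y)

# Assumed settings: four categories per variable, unequal cut-points.
r_latent, r_ordinal = simulate(0.6, [-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5], n=20000)
print(f"latent r = {r_latent:.3f}, ordinal r = {r_ordinal:.3f}")
```

With these settings the ordinal correlation should come out below the latent one; how large the gap is depends on the number of categories and on how differently the two variables are cut.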
... However, because the variables in this study are categorical, the normality assumption is violated, so the results provided by this method would not be reliable (Porras, 2016). One solution to this problem is to use polychoric principal components analysis (ACPP), based on the polychoric correlation coefficient suggested by Pearson (1901) as a measure of the bivariate normal correlation (Olsson, 1979). ...
Article
This research is intended to examine the relationship between the Employment Vulnerability Index (EVI) and labor informality in Ecuador in 2018, 2019, and 2021 (considering both pre- and post-pandemic periods). The EVI was constructed using a polychoric principal components analysis (PCPA), while a logistic regression model and a two-stage least squares linear probability model were employed to evaluate its association with workers' informal status. The findings indicate a positive correlation between employment vulnerability and the likelihood of a worker being engaged in the informal sector, with this correlation being particularly pronounced among individuals with higher EVI scores. Accordingly, policymakers are advised to concentrate their efforts on enhancing workers' conditions by increasing educational attainment and implementing programs designed to incentivize formal employment.
... Finally, any π_n arises from discretizing latent Gaussian variables at fixed cut-points given by A and the probit restrictions on R. Thus, the problem of estimating ρ_{jj′} from π_n reduces to estimating the polychoric correlation coefficient (Olsson, 1979). The resulting likelihood is a regular parametric family admitting a consistent estimator through maximum likelihood estimation (MLE). ...
Preprint
We present an approach for modeling and imputation of nonignorable missing data under Gaussian copulas. The analyst posits a set of quantiles of the marginal distributions of the study variables, for example, reflecting information from external data sources or elicited expert opinion. When these quantiles are accurately specified, we prove it is possible to consistently estimate the copula correlation and perform multiple imputation in the presence of nonignorable missing data. We develop algorithms for estimation and imputation that are computationally efficient, which we evaluate in simulation studies of multiple imputation inferences. We apply the model to analyze associations between lead exposure levels and end-of-grade test scores for 170,000 students in North Carolina. These measurements are not missing at random, as children deemed at-risk for high lead exposure are more likely to be measured. We construct plausible marginal quantiles for lead exposure using national statistics provided by the Centers for Disease Control and Prevention. Complete cases and missing at random analyses appear to underestimate the relationships between certain variables and end-of-grade test scores, while multiple imputation inferences under our model support stronger, adverse associations between lead exposure and educational outcomes.
... Given two ordinal variables X and Y, suppose that there is a random sample consisting of n pairs (x_i, y_i) for i = 1, …, n, with x_i ∈ {1, …, k} ... (1) Compute the polychoric correlation ρ_N [11] between X and Y. ...
Article
Vine pair-copula constructions exist for a mix of continuous and ordinal variables. In some steps, this can involve estimating a bivariate copula for a pair of mixed continuous-ordinal variables. To assess the adequacy of copula fits for such a pair, diagnostic and visualization methods based on normal score plots and conditional Q–Q plots are proposed. The former uses a latent continuous variable for the ordinal variable. The methods are applied to data generated from some existing probability models for a mixed continuous-ordinal variable pair, and for such models, Kullback-Leibler divergence is used to assess whether simple parametric copula families can provide adequate fits. The effectiveness of the proposed visualization and diagnostic methods is illustrated on a dataset.
... SEM assimilates several research processes in a "holistic fashion" (Chin 2000) and further enables researchers to test and assess the concepts and theories that have been previously proposed (Westland 2015b). CB-SEM (Jöreskog 1978) and partial least squares (PLS) path modeling as a variance-based structural equation modeling (VB-SEM) approach (Lohmöller 1989; Wold 1975) are two established approaches in the second generation of multivariate data analysis with different applications in research (Vinzi et al. 2010; Olsson 1979; Westland 2010). MLE is often considered superior to PLS, as MLE demonstrates useful statistical properties including sufficiency, efficiency, consistency, and parameterization invariance (Myung 2003), while PLS is considered to have limitations in terms of statistical performance including fit indices (Westland 2015a). ...
Article
Despite the increasing popularity of pay-per-click (PPC) advertising and search engine optimization within the financial industry, there is a notable lack of research on the effectiveness of PPC on bank customers' continuous search intention and banking services intention. This study aims to fill this gap by investigating the use of PPC as a tool on customers' search intention and continuous services intention in a retail banking context. Utilizing a quantitative design, we collected data and employed maximum likelihood estimation (MLE) for path analysis to analyze the empirical data. Our findings reveal several substantial results. Firstly, continuous search intentions significantly influence continuous banking services intentions. Secondly, attitude toward PPC advertising significantly affects both continuous search and banking services intentions. Thirdly, satisfaction with PPC advertising is crucial in shaping attitudes toward PPC, continuous search intention, and continuous banking services intention. Fourthly, perceived usefulness directly influences attitudes toward PPC, satisfaction with PPC, and continuous search intention. Lastly, while PPC advertising's perceived ease of use and perceived confirmation are linked to perceived usefulness, they do not directly affect attitude toward PPC. By adopting a dual intentions approach, this study contributes to banking literature by highlighting the importance of understanding the distinct roles of PPC attributes in shaping short- and long-term customer behavioral intentions.
... Some of the reported work (Filmer & Pritchett 2001) has confronted the criticism against using PCA, although it is a suitable technique for continuous data that satisfy a multivariate normal distribution (Anderson 2003; Hotelling 1933; Mandia 1980). The variables used to construct the WEI are binary and categorical, so the polychoric method of correlation is adopted (Olsson 1979). ...
Article
Child malnutrition is one of the major causes of child morbidity and mortality around the globe, especially in developing countries. The current study attempts to investigate the factors that contribute to malnutrition captured through WAZ (weight for age) among children in the Cholistan desert area of Punjab, Pakistan. Out of 900 households surveyed, 584 households were identified, having a sample of 1059 children aged between 0 and 59 months. The logit, multilevel logit, generalized linear mixed model, and generalized linear latent and mixed model approaches were employed to analyze the collected data. The findings reveal that the wealth index of households, mother’s age at birth of a child, birth order of the child, duration of breastfeeding, distance to the basic health unit, and use of protected/clean water significantly affect whether children are underweight. The policy recommendations are made in line with the study findings to suggest ways that can reduce the prevalence of underweight children in the area.
Thesis
Leisure is an opportunity for relaxation, enjoyment, and rejuvenation. Often, actively utilizing leisure is a prerequisite for internal satisfaction. Being aware of the importance of leisure creates opportunities to achieve more balance, happiness, and personal development in life. Therefore, it is important to fully understand the value of leisure and have positive attitudes towards it. In light of this information, the following objectives have been adopted for the research in order to identify the variables that positively or negatively affect leisure attitude (LA) and develop recommendations based on the obtained results: (a) determining the relationship between social media addiction (SMA) and LA; (b) determining the relationship between SMA and the fear of missing out (FoMO); (c) determining the relationship between SMA and the need to belong (NTB); (d) determining the relationship between FoMO and LA; (e) determining the relationship between NTB and FoMO; (f) determining the relationship between NTB and LA; (g) determining the mediating effect of FoMO on the relationship between SMA and LA; (h) determining the moderating effect of NTB on the relationship between SMA and FoMO; (i) determining the moderating effect of NTB on the relationship between FoMO and LA; (j) determining the moderating effect of NTB on the relationship between SMA and LA. The study population consisted of students enrolled in the Recreation Management program (270 students) at Necmettin Erbakan University's Faculty of Tourism. The ideal sample size, representative of the population, was determined as 159 students using the Sample Size Calculator program (with a 95% confidence level and 5% margin of error); however, data were collected from 220 students. The data were collected through a survey using the Bergen Social Media Addiction Scale (BSMAS), Leisure Attitude Scale (LAS), Fear of Missing Out Scale (FoMOS), and Need To Belong Scale (NTBS). 
Following data analysis, the construct validity and reliability of the measurement tools were retested. The mean distributions of the measurement tools were analyzed, and hypothesis testing was conducted using Process Model 4 and Process Model 59 in the SPSS program. The study has shown the following results: (a) there was no relationship between SMA and LA; (b) there was a positive relationship between SMA and FoMO; (c) there was a positive relationship between SMA and NTB; (d) there was no relationship between FoMO and LA (although there was a positive relationship between FoMO and behavioral LA); (e) there was a positive relationship between NTB and FoMO; (f) there was a positive relationship between NTB and LA; (g) the relationship between SMA and LA was not mediated by FoMO; (h) the relationship between SMA and FoMO was not moderated by NTB; (i) the relationship between FoMO and LA was positively moderated by NTB; (j) the relationship between SMA and LA was not moderated by NTB. The findings of the study are discussed in the relevant section, and recommendations are developed based on the results obtained.
Article
The paper is concerned with the consequences for maximum likelihood factor analysis which may follow if the observed variables are ordinal with only a few scale steps, which are assigned integer values. It is hypothesized that the observed variables are obtained through a classification of some true variables, which are multivariate normal and for which a factor model holds. Using simple formulas for the relations between true correlations and correlations based on the classified variables, we demonstrate numerically the relationships between true factor models and results obtained from classified data. This is done for several choices of thresholds, true factor loadings and numbers of variables, assuming a one-factor model. The results indicate that classification may lead to a substantial lack of fit of the model, i.e. an erroneous indication that more factors are needed. This is especially true if the variables are skewed in opposite directions and have high true loadings, but does not depend much on the number of scale steps. The classification also attenuates the loading estimates, and this effect increases with a decreasing number of scale steps and increasing variation in skewness among the variables.
Article
1. In August last I presented to the Society a memoir on the inheritance of coat-colour in thoroughbred horses, and of eye-colour in man. This memoir, which was read in November of last year, presented the novel feature of determining correlation between characters which were not capable à priori of being quantitatively measured. The theoretical part of that memoir was somewhat brief, but I showed by illustrations that the method could be extended to deal with problems like the effectiveness of vaccination and of the antitoxin treatment in diphtheria.
Article
In August, 1899, I presented a memoir to the Royal Society on the inheritance of coat-colour in the horse and of eye-colour in man, which was read November, 1899, and ultimately ordered to be published in the ‘Phil. Trans.’ Before that memoir was printed, Mr. Yule’s valuable memoir on Association was read, and, further, Mr. Leslie Bramley-Moore showed me that the theory of my memoir as given in § 6 of the present memoir led to somewhat divergent results according to the methods of proportioning adopted. We therefore undertook a new investigation of the theory of the whole subject, which is embodied in the present memoir. The data involved in the paper on coat-colour in horses and eye-colour in man have all been recalculated, and that paper is nearly ready for presentation. But it seemed best to separate the purely theoretical considerations from their application to special cases of inheritance, and accordingly the old memoir now reappears in two sections. The theory discussed in this paper was, further, the basis of a paper on the Law of Reversion with special reference to the Inheritance of Coat-colour in Basset Hounds recently communicated to the Society, and about to appear in the ‘Proceedings.’ While I am responsible for the general outlines of the present paper, the rough draft of it was taken up and carried on in leisure moments by Mr. Leslie Bramley-Moore, Mr. L. N. G. Filon, M.A., and Miss Alice Lee, D.Sc. Mr. Bramley-Moore discovered the u-functions; Mr. Filon proved most of their general properties and the convergency of the series; I alone am responsible for sections 4, 5, and 6. Mr. Leslie Bramley-Moore sent me, without proof, on the eve of his departure for the Cape, the general expansion for z on p. 26. I am responsible for the present proof and its applications. To Dr. Alice Lee we owe most of the illustrations and the table on p. 17.
Thus the work is essentially a joint memoir in which we have equal part, and the use of the first personal pronoun is due to the fact that the material had to be put together and thrown into form by one of our number.—K. P.
Article
Pearson (1900) introduced the tetrachoric series method for estimating the correlation between two non-measurable characters each with two levels. For characters with more than two levels, Ritchie-Scott (1918) suggested averaging all possible tetrachoric correlations. Using the theory of orthonormal functions, Lancaster & Hamdan (1964) suggested an alternative method essentially based on giving a weighting to each possible tetrachoric table. In this note, a special form of Lancaster & Hamdan's method is used to give an instructive derivation of the tetrachoric series.
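For orientation, the tetrachoric series mentioned in this abstract can be evaluated numerically. The sketch below uses the standard textbook form of the series (an expansion of the bivariate normal upper-quadrant probability in powers of the correlation with probabilists' Hermite polynomials); it is not the special derivation given in the note. The function name tetrachoric_series and the default number of terms are assumptions for illustration, and convergence slows as the correlation approaches ±1.

```python
import math
from statistics import NormalDist

def tetrachoric_series(h, k, rho, terms=60):
    """Approximate P(X > h, Y > k) for a standard bivariate normal.

    Uses P = (1 - Phi(h)) (1 - Phi(k))
           + phi(h) phi(k) * sum_{n>=1} rho**n / n! * He_{n-1}(h) * He_{n-1}(k),
    where He_m is the probabilists' Hermite polynomial of degree m.
    """
    std = NormalDist()
    total = (1.0 - std.cdf(h)) * (1.0 - std.cdf(k))  # n = 0 term
    scale = std.pdf(h) * std.pdf(k)
    he_h_prev, he_h = 0.0, 1.0  # He_{-1} taken as 0, He_0 = 1
    he_k_prev, he_k = 0.0, 1.0
    coeff = 1.0  # running value of rho**n / n!
    for n in range(1, terms + 1):
        coeff *= rho / n
        total += coeff * he_h * he_k * scale
        # Recurrence He_n(x) = x * He_{n-1}(x) - (n - 1) * He_{n-2}(x)
        he_h, he_h_prev = h * he_h - (n - 1) * he_h_prev, he_h
        he_k, he_k_prev = k * he_k - (n - 1) * he_k_prev, he_k
    return total

# At h = k = 0 the probability has the closed form 1/4 + asin(rho) / (2 pi),
# which gives a convenient check on the truncated series.
approx = tetrachoric_series(0.0, 0.0, 0.5)
exact = 0.25 + math.asin(0.5) / (2.0 * math.pi)
print(abs(approx - exact) < 1e-9)
```

Estimating a tetrachoric correlation then amounts to solving for the rho at which this probability matches the observed proportion in the corresponding cell of the 2 x 2 table.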