Comparative Fit Indexes in Structural Models

Abstract

Normed and nonnormed fit indexes are frequently used as adjuncts to chi-square statistics for evaluating the fit of a structural model. A drawback of existing indexes is that they estimate no known population parameters. A new coefficient is proposed to summarize the relative reduction in the noncentrality parameters of two nested models. Two estimators of the coefficient yield new normed (CFI) and nonnormed (FI) fit indexes. CFI avoids the underestimation of fit often noted in small samples for Bentler and Bonett's (1980) normed fit index (NFI). FI is a linear function of Bentler and Bonett's non-normed fit index (NNFI) that avoids the extreme underestimation and overestimation often found in NNFI. Asymptotically, CFI, FI, NFI, and a new index developed by Bollen are equivalent measures of comparative fit, whereas NNFI measures relative fit by comparing noncentrality per degree of freedom. All of the indexes are generalized to permit use of Wald and Lagrange multiplier statistics. An example illustrates the behavior of these indexes under conditions of correct specification and misspecification. The new fit indexes perform very well at all sample sizes.
QUANTITATIVE METHODS IN PSYCHOLOGY

P. M. Bentler
University of California, Los Angeles

Author note: This research was supported in part by United States Public Health Service Grants DA01070 and DA00017 and is based on a February 1988 technical report and a paper presented at the Psychometric Society meetings, June 1988, Los Angeles. Helpful discussions with J. de Leeuw, R. I. Jennrich, T. A. B. Snijders, and J. A. Woodward; the computer assistance of Shinn-Tzong Wu; and the production assistance of Julie Speckart are gratefully acknowledged.
Correspondence concerning this article should be addressed to P. M. Bentler, Department of Psychology, University of California, Los Angeles, California 90024-1563.

As is well known, the goodness-of-fit test statistic T used in evaluating the adequacy of a structural model is typically referred to the chi-square distribution to determine acceptance or rejection of a specific null hypothesis, $\Sigma = \Sigma(\theta)$. In the context of covariance structure analysis, $\Sigma$ is the population covariance matrix and $\theta$ is a vector of more basic parameters, for example, the factor loadings and intercorrelations and unique variances in a confirmatory factor analysis. The statistic T reflects the closeness of $\hat{\Sigma} = \Sigma(\hat{\theta})$, based on the estimator $\hat{\theta}$, to the sample matrix S, the sample covariance matrix in covariance structure analysis, in the chi-square metric. Acceptance or rejection of the null hypothesis via a test based on T may be inappropriate or incomplete in model evaluation for several reasons:

1. Some basic assumptions underlying T may be false and the distribution of the statistic may not be robust to violation of these assumptions.
2. No specific model $\Sigma(\theta)$ may be assumed to exist in the population, and T is intended to provide a summary regarding closeness of $\hat{\Sigma}$ to S, but not necessarily a test of $\Sigma = \Sigma(\theta)$.
3. In small samples, T may not be chi-square distributed; hence, the probability values used to evaluate the null hypothesis may not be correct.
4. In large samples, any a priori hypothesis $\Sigma = \Sigma(\theta)$, although only trivially false, may be rejected.

As a consequence, the statistic T may not be clearly interpretable, and transformations of T designed to map it into a more interpretable 0-1, or approximate 0-1, range have been developed. Those indexes are usually called goodness-of-fit indexes (e.g., Bentler, 1983, p. 507; Jöreskog & Sörbom, 1984, p. I.40). A related class of indexes, here called comparative goodness-of-fit indexes, assess T in relation to the fit of a more restrictive model. These comparative fit indexes, formalized by Bentler and Bonett (1980), are very widely used (Bentler & Bonett, 1987) and are the sole object of this article. Alternative approaches to evaluating model adequacy are reviewed elsewhere (e.g., Bollen & Liang, 1988; Bozdogan, 1987; LaDu & Tanaka, in press; Wheaton, 1987). Although covariance structure analysis is emphasized, the methods developed here hold for any type of structural model, including, for example, mean-covariance structures and log-linear models.

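For orientation, the two Bentler and Bonett (1980) comparative indexes named in the abstract are usually written as below. The subscripts are an assumption made for illustration: $T_b$ and $d_b$ denote the test statistic and degrees of freedom of the more restrictive baseline model, and $T_m$ and $d_m$ those of the model being evaluated.

```latex
\[
\mathrm{NFI} = \frac{T_b - T_m}{T_b},
\qquad
\mathrm{NNFI} = \frac{T_b/d_b - T_m/d_m}{T_b/d_b - 1}
\]
```

NFI compares the two statistics directly, whereas NNFI compares them per degree of freedom, which is why it can fall outside the 0-1 range.
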
Although more than 30 fit indexes have been reported and their empirical behavior studied (Marsh, Balla, & McDonald, 1988), and although new ones continue to be developed (Bollen, 1989), it is surprising to note that they have been developed as purely descriptive statistics. Apparently, no population parameter has been defined that is being estimated by any of the existing indexes. In this article, I define an explicit population comparative fit coefficient, provide two alternative estimators of the coefficient, and investigate the asymptotic relations between the new and previously defined comparative fit indexes. Furthermore, new indexes based on Wald and Lagrange multiplier statistics are developed.

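A minimal sketch of the two estimators, assuming the forms in which they are usually stated (the formulas themselves are not shown in this excerpt): the noncentrality of each model is estimated by T - df, FI is one minus the ratio of the estimated noncentralities of the target and baseline models, and CFI applies the same ratio after truncation so that the result stays in the 0-1 range. The function names and all numerical values are hypothetical.

```python
def noncentrality(t: float, df: int) -> float:
    """Estimated noncentrality, T - df (assumed estimator; may be negative)."""
    return t - df


def fi(t_model: float, df_model: int, t_base: float, df_base: int) -> float:
    """Nonnormed index: 1 - d_model / d_base, with no truncation.

    Undefined when the baseline noncentrality estimate is zero.
    """
    return 1.0 - noncentrality(t_model, df_model) / noncentrality(t_base, df_base)


def cfi(t_model: float, df_model: int, t_base: float, df_base: int) -> float:
    """Normed index: the same ratio with both estimates truncated at zero."""
    d_model = max(noncentrality(t_model, df_model), 0.0)
    d_base = max(noncentrality(t_base, df_base), d_model, 0.0)
    if d_base == 0.0:
        return 1.0  # neither model shows any estimated misfit
    return 1.0 - d_model / d_base


# Hypothetical statistics: a baseline model and two nested target models.
T_BASE, DF_BASE = 900.0, 36
print(fi(60.0, 24, T_BASE, DF_BASE), cfi(60.0, 24, T_BASE, DF_BASE))
print(fi(20.0, 24, T_BASE, DF_BASE), cfi(20.0, 24, T_BASE, DF_BASE))
```

In the second call the target statistic is smaller than its degrees of freedom, so FI slightly exceeds 1 while CFI is held at 1; this is the only case in which the two sketched functions differ when the baseline fits poorly.
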
Nested Models and Comparative Fit

In evaluating comparative model fit, it is helpful to focus on more than one pair of models. Consider a series of nested models,

Psychological Bulletin, 1990, Vol. 107, No. 2, 238-246
Copyright 1990 by the American Psychological Association, Inc. 0033-2909/90/$00.75