Article

Multilevel Item Response Models: An Approach to Errors in Variables Regression

Authors: Adams, Wilson, and Wu

Abstract

In this article we show how certain analytic problems that arise when one attempts to use latent variables as outcomes in regression analyses can be addressed by taking a multilevel perspective on item response modeling. Under a multilevel, or hierarchical, perspective we cast the item response model as a within-student model and the student population distribution as a between-student model. Taking this perspective leads naturally to an extension of the student population model to include a range of student-level variables, and it invites the possibility of further extending the models to additional levels so that multilevel models can be applied with latent outcome variables. In the two-level case, the model that we employ is formally equivalent to the plausible value procedures that are used as part of the National Assessment of Educational Progress (NAEP), but we present the method for a different class of measurement models, and we use a simultaneous estimation method rather than two-step estimation. In our application of the models to the appropriate treatment of measurement error in the dependent variable of a between-student regression, we also illustrate the adequacy of some approximate procedures that are used in NAEP.
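As a reading aid (not part of the original abstract), the two-level structure described above can be sketched with a Rasch model as the within-student measurement model and a normal latent regression as the between-student population model; the notation θ_i, δ_j, w_i, β, and σ² is illustrative, not necessarily the authors' own.

```latex
% Within-student (measurement) model: item response model for item j, student i
P(X_{ij} = 1 \mid \theta_i) = \frac{\exp(\theta_i - \delta_j)}{1 + \exp(\theta_i - \delta_j)}

% Between-student (population) model: latent regression on student covariates w_i
\theta_i = \mathbf{w}_i^{\top}\boldsymbol{\beta} + e_i, \qquad e_i \sim N(0, \sigma^2)
```

Estimating both levels simultaneously, marginalizing over θ_i rather than plugging in point estimates, is what allows measurement error in the latent outcome to be treated appropriately.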


... Single-level multidimensional IRT (MIRT) models were proposed decades ago with the primary goal of modeling the correlations among multiple latent traits and categorical response variables (Mulaik 1972; Reckase 1972; Sympson 1978; Whitely 1980a, 1980b; Way, Ansley, and Forsyth 1988; Ackerman 1989; Embretson and Reise 2000; Béguin and Glas 2001; Muraki and Carlson 1993; Kelderman and Rijkes 1994; Yao and Schwarz 2006; Reckase 2009). In order to analyze the impact of covariates on multiple latent traits, and not to focus on the estimates of the latent traits themselves, covariates were later introduced into the MIRT model to explain the relationship between predictors and multiple latent traits (Adams, Wilson, and Wu 1997; van der Linden 2007; De Jong and Steenkamp 2010; Höhler, Hartig, and Goldhammer 2010; Lu 2012; Muthén and Asparouhov 2013). ...
... In particular, measurement error can diminish the statistical power of impact studies and weaken the ability of researchers to identify relationships among different variables affecting student outcomes (Lu, Thomas, and Zumbo 2005). The issues that emerge when one tries to use latent variables as outcomes in regression analysis (which is also known as latent regression) can be addressed by taking a multilevel perspective on item response modeling (Adams, Wilson, and Wu 1997). Within the framework of multilevel or hierarchical models, the IRT model is placed at the lowest level as a within-subject model, and the student population distribution is typically treated as a between-student model. ...
... Within the framework of multilevel or hierarchical models, the IRT model is placed at the lowest level as a within-subject model, and the student population distribution is typically treated as a between-student model. The multilevel IRT model makes it possible to simultaneously estimate item and ability parameters and the structural multilevel model parameters (e.g., Adams, Wilson, and Wu 1997; Kamata 2001; Pastor 2003). Therefore, the measurement error associated with the estimated abilities is well taken into account in estimating the multilevel parameters (Adams, Wilson, and Wu 1997). ...
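For readers who want a concrete picture of this one-stage ("unified") estimation, the following is a minimal R sketch that fits a long-format Rasch measurement model and a between-student regression simultaneously as a generalized linear mixed model with lme4. This is an illustration only; Adams, Wilson, and Wu work with a marginal maximum likelihood formulation of a different measurement model class, and the variable names here (resp, item, person, ses) are made up.

```r
# One-stage estimation sketch: Rasch measurement model plus a person-level
# (between-student) regression, fitted simultaneously via lme4::glmer.
# Data are simulated and in long format (one row per person-item response).
library(lme4)

set.seed(1)
n_person <- 300; n_item <- 10
dat   <- expand.grid(person = factor(1:n_person), item = factor(1:n_item))
ses   <- rnorm(n_person)                      # person-level covariate
theta <- 0.5 * ses + rnorm(n_person)          # latent ability
delta <- seq(-1.5, 1.5, length.out = n_item)  # item difficulties
dat$ses  <- ses[dat$person]
dat$resp <- rbinom(nrow(dat), 1,
                   plogis(theta[dat$person] - delta[dat$item]))

# Item difficulties as fixed effects, latent regression of ability on SES,
# residual person ability as a random intercept.
fit <- glmer(resp ~ 0 + item + ses + (1 | person),
             data = dat, family = binomial("logit"))
summary(fit)  # the 'ses' coefficient is the structural (between-student) effect
```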
Article
Full-text available
In this paper, we propose a multilevel multidimensional item response model for studying the relations among multiple abilities and covariates in a hierarchical data structure. As an example, this study is well suited to examining the scenario in which a test measures multidimensional latent traits (e.g., reading ability, cognitive ability, and computing ability) and in which students are nested within classes or schools. The new model can recover the correlations among multidimensional abilities, along with the correlation between person- and school-level covariates and abilities. A fully Gibbs sampling algorithm within the Markov chain Monte Carlo (MCMC) framework is proposed for parameter estimation. A unique form of the deviance information criterion (DIC) is used as a model comparison index. Two simulation studies show that the estimation method is suitable in recovering all model parameters.
... The issues that emerge when one tries to use latent variables as outcomes in regression analysis can be addressed by taking a multilevel perspective on item response modelling (Adams, Wilson, & Wu, 1997). Hierarchical models have been proven useful for solving the technical problems that arise when traditional approaches and models are applied to nested data, such as students nested within classrooms or repeated measures nested within persons (Raudenbush & Bryk, 2002). ...
... Within the framework of multilevel or hierarchical models, the IRT model is placed at the lowest level as a within-subject measurement model and the student population distribution is typically treated as a between-subjects structural model. The multilevel IRT model makes it possible to simultaneously estimate the item and ability parameters and the structural multilevel model parameters (e.g., Adams et al., 1997; Kamata, 2001; Pastor, 2003). Therefore, measurement error in the estimated abilities is well handled in estimating the multilevel parameters (Adams et al., 1997). ...
... The multilevel IRT model makes it possible to simultaneously estimate the item and ability parameters and the structural multilevel model parameters (e.g., Adams et al., 1997; Kamata, 2001; Pastor, 2003). Therefore, measurement error in the estimated abilities is well handled in estimating the multilevel parameters (Adams et al., 1997). This approach is called the unified one-stage approach. ...
Article
Full-text available
Among current state-of-the-art estimation methods for multilevel IRT models, the two-stage divide-and-conquer strategy has practical advantages, such as clearer definition of factors, convenience for secondary data analysis, convenience for model calibration and fit evaluation, and avoidance of improper solutions. However, various studies have shown that, under the two-stage framework, ignoring measurement error in the dependent variable in stage II leads to incorrect statistical inferences. To this end, we proposed a novel method to correct both measurement bias and measurement error of latent trait estimates from stage I in the stage II estimation. In this paper, the HO-IRT model is considered as the measurement model, and a linear mixed effects model on overall (i.e., higher-order) abilities is considered as the structural model. The performance of the proposed correction method is illustrated and compared via a simulation study and a real data example using the National Educational Longitudinal Survey data (NELS 88). Results indicate that structural parameters can be recovered better after correcting measurement biases and errors.
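The correction problem addressed in this abstract can be summarized in a few lines of generic notation (a sketch, not the authors' exact higher-order model): stage I returns an ability estimate with a known, heteroscedastic error variance, and stage II must let that error variance enter the structural model rather than ignore it.

```latex
% Stage I: ability estimate with known error variance s_i^2
\hat{\theta}_i = \theta_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, s_i^2)

% Stage II: structural (e.g., linear mixed effects) model on the true ability
\theta_i = \mathbf{x}_i^{\top}\boldsymbol{\gamma} + u_i, \qquad u_i \sim N(0, \tau^2)

% Ignoring \varepsilon_i treats \hat{\theta}_i as \theta_i; a corrected
% stage II analysis instead works with the implied marginal distribution
\hat{\theta}_i \sim N\!\left(\mathbf{x}_i^{\top}\boldsymbol{\gamma},\; \tau^2 + s_i^2\right)
```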
... Single-level multidimensional IRT (MIRT) models were proposed decades ago, as they have the primary feature of modeling the correlations among multiple latent traits and categorical response variables (Mulaik, 1972; Reckase, 1972, 2009; Sympson, 1978; Whitely, 1980a, b; Way et al., 1988; Ackerman, 1989; Muraki and Carlson, 1993; Kelderman and Rijkes, 1994; Embretson and Reise, 2000; Béguin and Glas, 2001; Yao and Schwarz, 2006). The MIRT models later incorporated covariates to elucidate the connection between multiple latent traits and predictors (Adams et al., 1997; van der Linden, 2008; De Jong and Steenkamp, 2010; Klein Entink, 2009; Klein Entink et al., 2009; Höhler et al., 2010; Lu, 2012; Muthén and Asparouhov, 2013). ...
... Taking a multilevel perspective on item response modeling can avoid issues that arise when analysts use latent regression (using latent variables as outcomes in regression analysis) (Adams et al., 1997). The student population distribution is commonly handled as a between-student model with the IRT model being placed at the lowest level as a within-subject model within the structure of multilevel or hierarchical models. ...
... The student population distribution is commonly handled as a between-student model with the IRT model being placed at the lowest level as a within-subject model within the structure of multilevel or hierarchical models. Using a multilevel IRT model gives analysts the ability to estimate item and ability parameters along with structural multilevel model parameters at the same time (e.g., Adams et al., 1997; Kamata, 2001; Hox, 2002; Goldstein, 2003; Pastor, 2003). This results in measurement error associated with estimated abilities being accounted for when estimating the multilevel parameters (Adams et al., 1997). ...
Article
Full-text available
In many large-scale tests, it is very common that students are nested within classes or schools and that the test designers try to measure their multidimensional latent traits (e.g., logical reasoning ability and computational ability in a mathematics test). It is particularly important to explore the influence of covariates on multiple abilities for the development and improvement of educational quality monitoring mechanisms. In this study, motivated by a real dataset from a large-scale English achievement test, we address how to construct an appropriate multilevel structural model to fit the data from among the many possible multilevel models, what the effects of gender and socioeconomic-status differences on English multidimensional abilities are at the individual level, and how teachers' satisfaction and school climate affect students' English abilities at the school level. A full Gibbs sampling algorithm within the Markov chain Monte Carlo (MCMC) framework is used for model estimation. Moreover, a unique form of the deviance information criterion (DIC) is used as a model comparison index. To verify the accuracy of the estimation algorithm, two simulations are considered in this paper. Simulation studies show that the Gibbs sampling algorithm works well in estimating all model parameters across a broad spectrum of scenarios, which can be used to guide the real data analysis. A brief discussion and suggestions for further research are given in the concluding remarks.
... The solution is to introduce the plausible value (PV) machinery based on a model that involves a combination of IRT and a latent regression (Mislevy 1991; Adams et al. 1997; Andersen 2004; von Davier et al. 2007). It is important to recognize that this approach was adopted specifically to produce unbiased estimates of group-level statistics. ...
... If particular characteristics that become subsequently available are of interest, then supplementary latent regression models can be run to generate new PVs so as to ensure unbiased estimation. Software for conducting these latent regression model analyses is available upon request from organizations such as ETS (PC Windows version DGROUP, Rogers and Blew 2012) and ACER (ConQuest; Adams et al. 1997). ...
Article
Full-text available
Abstract
Background: Economists are making increasing use of measures of student achievement obtained through large-scale survey assessments such as NAEP, TIMSS, and PISA. The construction of these measures, employing plausible value (PV) methodology, is quite different from that of the more familiar test scores associated with assessments such as the SAT or ACT. These differences have important implications both for utilization and interpretation. Although much has been written about PVs, it appears that there are still misconceptions about whether and how to employ them in secondary analyses.
Methods: We address a range of technical issues, including those raised in a recent article that was written to inform economists using these databases. First, an extensive review of the relevant literature was conducted, with particular attention to key publications that describe the derivation and psychometric characteristics of such achievement measures. Second, a simulation study was carried out to compare the statistical properties of estimates based on the use of PVs with those based on other, commonly used methods.
Results: It is shown, through both theoretical analysis and simulation, that under fairly general conditions appropriate use of PV yields approximately unbiased estimates of model parameters in regression analyses of large scale survey data. The superiority of the PV methodology is particularly evident when measures of student achievement are employed as explanatory variables.
Conclusions: The PV methodology used to report student test performance in large scale surveys remains the state-of-the-art for secondary analyses of these databases.
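To make "appropriate use of PVs" concrete, here is a minimal R sketch of the standard secondary-analysis workflow: run the analysis once per plausible value and pool the results with Rubin's rules. The plausible values below are simulated stand-ins purely to make the code executable; in practice they would come from the assessment database or be drawn from a latent regression model (for example with TAM::tam.pv).

```r
# Pooling a regression analysis over M plausible values (Rubin's rules).
# 'pv' is an N x M matrix of plausible values for the latent outcome and
# 'x' an observed covariate; both are simulated stand-ins here.
set.seed(2)
N <- 1000; M <- 5
x     <- rnorm(N)
theta <- 0.4 * x + rnorm(N)
pv    <- replicate(M, theta + rnorm(N, sd = 0.5))

est <- se2 <- numeric(M)
for (m in 1:M) {
  fit    <- lm(pv[, m] ~ x)
  est[m] <- coef(fit)["x"]
  se2[m] <- vcov(fit)["x", "x"]
}

qbar <- mean(est)               # pooled point estimate
ubar <- mean(se2)               # average within-imputation variance
b    <- var(est)                # between-imputation variance
tot  <- ubar + (1 + 1 / M) * b  # total variance (Rubin's combining rule)
c(estimate = qbar, se = sqrt(tot))
```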
... At the same time, individual-level modeling in the context of complex adaptive systems and ecologies [7, 8, 48], in contrast, focuses on dynamically evolving interactions but fails to capitalize on relevant distinctions between statistical and scientific approaches to data, theory, and instruments [52]. There is a great need for this work to leverage recent and longstanding advances in the scientific quality of individual-level measurement in psychology and the social sciences [2–6, 10–21, 25, 26, 31–36, 38–46, 52–58] offering inferentially separable model parameters, minimally sufficient statistics in estimation, theory-informed instrument design, and experimental tests of the hypothesis of an additive unit. Agent-based models [7, 8] should be integrated with advanced multi-unidimensional, multifaceted, multilevel, and growth variations on stochastic measurement models and methods [44, 53–58]. ...
... There is a great need for this work to leverage recent and longstanding advances in the scientific quality of individual-level measurement in psychology and the social sciences [2–6, 10–21, 25, 26, 31–36, 38–46, 52–58] offering inferentially separable model parameters, minimally sufficient statistics in estimation, theory-informed instrument design, and experimental tests of the hypothesis of an additive unit. Agent-based models [7, 8] should be integrated with advanced multi-unidimensional, multifaceted, multilevel, and growth variations on stochastic measurement models and methods [44, 53–58]. These features of precision measurement could be implemented as decision supports in broad social ecologies composed of networks of various stakeholders pursuing separate but related interests in different facets of ostensibly the same boundary objects. ...
Article
Full-text available
Five moments in the formation and functioning of complex adaptive systems are: (1) emergent regularities and patterns in the flow of matter, energy, and/or information; (2) condensed schematic representations of these regularities enabling their identification; (3) reproductively interchangeable variants of these representations serving as templates for new instances of the pattern; (4) successful reproduction facilitated by the accuracy and reliability of the representations’ predictions of data flow regularities; and (5) informational feedback that adaptively modifies and reorganizes representations to incorporate new variations in the data flow, cycling back the first moment. These five moments are instantiated via stochastic models providing practical approaches to representing and managing complex adaptive psychological and social systems in education, health care, human resource management, etc. Local independence, unidimensionality, and statistical sufficiency criteria function as means of identifying, evaluating, and deploying conceptual and social forms of life acting as evolving agents in defined ecological niches. Bringing these agents into play systematically requires embodying them in technologies instrumental to making them readily recognizable and sharable across ecosystem niches. Modeling research and practice promoting sustainable and self-organizing ecosystems of this kind set the stage for redefining profit in terms of authentic wealth and value for life.
... The unidimensional model was tested in the R package Test Analysis Modules (TAM; Kiefer et al., 2017) using marginal maximum likelihood estimation; multidimensional models in the R package supplementary item response theory (sirt; Robitzsch & Robitzsch, 2020), also with marginal maximum likelihood estimation. For all models we report the number of 'steps' (six per item, given the seven-point response scale) with mean square weighted fit outside a 3/4 to 4/3 tolerance (Adams et al., 1997; Adams & Khoo, 1996). Following those authors, our a priori standard was to avoid more than 5% misfit, as this would indicate that too many items correspond poorly to others in the set. ...
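The fit screen described in this excerpt can be approximated with the TAM package; the sketch below flags infit (weighted mean square) values outside the 3/4 to 4/3 tolerance. The simulated responses are a crude stand-in (not generated from a true partial credit model), and the output column name 'Infit' is an assumption about the tam.fit interface that may need checking against the TAM documentation.

```r
# Sketch: proportion of item/step fit statistics outside the 3/4-4/3 band.
library(TAM)

set.seed(3)
n <- 500; k <- 12
theta <- rnorm(n)
resp  <- as.data.frame(
  sapply(seq(-1, 1, length.out = k),
         function(d) rbinom(n, size = 6, prob = plogis(theta - d))))  # scores 0-6

mod <- TAM::tam.mml(resp = resp)   # Rasch/partial-credit scaling via MML
fit <- TAM::tam.fit(mod)           # weighted (infit) and unweighted fit

infit   <- fit$itemfit$Infit       # assumed column name
outside <- infit < 3/4 | infit > 4/3
mean(outside)                      # compare against the 5% a priori standard
```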
Article
Full-text available
The International Mental Health Assessment (IMHA) was developed to provide efficient screening to facilitate prevention and early intervention among employees or community adults at three levels of analysis: a P-factor of general functioning and tendency toward disorder; broad spectra of internalizing and externalizing tendencies and for life difficulties; and nine subscales for common, familiar psychological and behavioral health categories. This study describes the development, refinement, and validation of the inventory using item response theory (IRT), specifically the partial credit model (PCM). Explicit, behavior-focused items drew on commonalities among domain-specific inventories, the DSM-V and empirical literature. A response scale based on concrete frequency of occurrence over the last month was developed to avoid the reference-group effects that plague cross-group survey research, facilitating cross-group comparison at both scale and item levels. In Study 1, a preliminary 69-item version was administered to 5,307 employees, family members, and counseling clients. PCM calibration was used to remove items with overlapping discrimination or unclear scale correspondence. In Study 2, the refined 59-item IMHA was administered to 4,048 employees. In Study 3, the subscales were compared to relevant established inventories to assess and confirm their convergent/divergent validity in a third sample (N = 500). The final 54-item IMHA, intended both for screening for psychological problems among community adults and to facilitate research including cross-cultural and cross-group comparisons, is made available freely for educational, non-profit or research purposes. The three-level measurement strategy draws on recent evidence for the continuous nature of psychopathology and on the well-established co-morbidity of traditional disorder categories, making use of them for communication purposes without unnecessarily reifying them in the model.
... Explanatory IRT models may also include person properties to explain latent trait differences among person groups (e.g., gender). These models are typically referred to as "person explanatory models" or latent regression models [14]. A person property can be either a categorical (e.g., gender, ethnicity, socio-economic status) or continuous (e.g., age, persons' levels in a different latent trait) variable. ...
Article
Full-text available
Explanatory item response modeling (EIRM) enables researchers and practitioners to incorporate item and person properties into item response theory (IRT) models. Unlike traditional IRT models, explanatory IRT models can explain common variability stemming from the shared variance among item clusters and person groups. In this tutorial, we present the R package eirm, which provides a simple and easy-to-use set of tools for preparing data, estimating explanatory IRT models based on the Rasch family, extracting model output, and visualizing model results. We describe how functions in the eirm package can be used for estimating traditional IRT models (e.g., Rasch model, Partial Credit Model, and Rating Scale Model), item-explanatory models (i.e., Linear Logistic Test Model), and person-explanatory models (i.e., latent regression models) for both dichotomous and polytomous responses. In addition to demonstrating the general functionality of the eirm package, we also provide real-data examples with annotated R codes based on the Rosenberg Self-Esteem Scale.
... In order to analyze nested item response data, a class of modeling frameworks, called multilevel item response theory (IRT) modeling, has been proposed by many researchers. Two-level (item-person) IRT models were introduced by Adams, Wilson, and Wu (1997), and they were extended to three levels (item-person-cluster) by Kamata (1998) within the hierarchical generalized linear modeling (HGLM) framework. Bayesian estimation for these models was developed by Fox (2001) and Maier (2001). ...
Article
Full-text available
Within-cluster variance homogeneity is one of the key assumptions of multilevel models; however, assuming a constant (i.e., equal) within-cluster variance may not be realistic. Moreover, existing within-cluster variance heterogeneity should be regarded as a source of additional information rather than a violation of a model assumption. This study extends the three-level Rasch item response model to estimate cluster-specific variances as random effects, adopting a Bayesian approach. Data analysis results provided empirical evidence for possible violations of within-cluster variance homogeneity, as well as for the utility of the proposed heterogeneous model. A small-scale simulation study was conducted to provide information about the estimation efficiency of the model parameters under varying degrees of within-cluster variance heterogeneity.
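For orientation, the homogeneous-variance three-level Rasch model that this paper generalizes can be written as a generalized linear mixed model. The lme4 sketch below shows only that baseline, with simulated data and made-up variable names; the cluster-specific variances proposed in the abstract require a Bayesian formulation and are not reproduced here.

```r
# Baseline three-level Rasch model (items within persons within clusters)
# with a single, homogeneous person variance across clusters; the cited
# paper relaxes exactly this assumption.
library(lme4)

set.seed(4)
n_cluster <- 30; per_cluster <- 20; n_item <- 15
n_person  <- n_cluster * per_cluster
dat <- expand.grid(item = factor(1:n_item), person = factor(1:n_person))
dat$cluster <- factor((as.integer(dat$person) - 1) %/% per_cluster + 1)

u_cluster <- rnorm(n_cluster, sd = 0.6)
u_person  <- rnorm(n_person,  sd = 1.0)
delta     <- seq(-2, 2, length.out = n_item)
dat$resp  <- rbinom(nrow(dat), 1,
                    plogis(u_cluster[dat$cluster] + u_person[dat$person]
                           - delta[dat$item]))

fit3 <- glmer(resp ~ 0 + item + (1 | cluster) + (1 | person),
              data = dat, family = binomial("logit"))
VarCorr(fit3)  # person- and cluster-level variance components
```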
... Second, the data in educational LSAs often have a multilevel structure that results from students being clustered in schools. This structure needs to be taken into account in the scaling of the proficiency data (Adams et al., 1997; Adams & Wu, 2007; Li et al., 2009), the imputation of missing background data (Enders et al., 2016; Lüdtke et al., 2017), and secondary analyses (Monseur & Adams, 2009). By contrast, if the multilevel structure is represented with fixed effects, such as in PISA, then the methods considered in this article can be applied directly by including an additional set of indicator variables to represent school membership in the imputation models for missing data and the PVs (see the example with the PISA 2015 data). ...
Article
Full-text available
Large-scale assessments (LSAs) use Mislevy’s “plausible value” (PV) approach to relate student proficiency to noncognitive variables administered in a background questionnaire. This method requires background variables to be completely observed, a requirement that is seldom fulfilled. In this article, we evaluate and compare the properties of methods used in current practice for dealing with missing data in background variables in educational LSAs, which rely on the missing indicator method (MIM), with other methods based on multiple imputation. In this context, we present a fully conditional specification (FCS) approach that allows for a joint treatment of PVs and missing data. Using theoretical arguments and two simulation studies, we illustrate under what conditions the MIM provides biased or unbiased estimates of population parameters and provide evidence that methods such as FCS can provide an effective alternative to the MIM. We discuss the strengths and weaknesses of the approaches and outline potential consequences for operational practice in educational LSAs. An illustration is provided using data from the PISA 2015 study.
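The missing indicator method (MIM) evaluated here is mechanically simple; the generic sketch below shows how such indicators are typically constructed for a background variable before it enters the conditioning model for the PVs. This is illustrative code, not the operational procedure of any particular LSA.

```r
# Missing indicator method (MIM) for one background covariate: replace
# missing values with a constant and add a response indicator, so that
# both columns can enter the latent regression / imputation model.
mim <- function(x, fill = 0) {
  miss <- is.na(x)
  data.frame(value   = ifelse(miss, fill, x),   # filled covariate
             missing = as.integer(miss))        # missingness indicator
}

ses <- c(0.3, NA, -1.2, NA, 0.8)   # toy background variable
mim(ses)
```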
... Further bibliometric evidence provided by Aryadoust and Tan (2019) shows that Rasch measurement theory continues to influence measurement research. There have been numerous extensions to Rasch models. Here is a partial list of the extensions: (a) the mixed Rasch model (Rost 1990), (b) the multilevel Rasch measurement model (Adams et al. 1997b), and (c) multidimensional random coefficients multinomial logit models (Adams et al. 1997a). Since research continues on extensions to Rasch measurement theory, this list should be considered incomplete. ...
Chapter
The purpose of this paper is to identify and describe the key concepts of Rasch measurement theory (Rasch G, Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research, Copenhagen. (Expanded edition, Chicago: University of Chicago Press, 1980), 1960/1980). There have been several taxonomies describing item response theory (Kim S-H et al., A taxonomy of item response models in Psychometrika. In: Wiberg M, Culpepper S, Janssen R, Gonzáles J, Molenaar D (eds) Quantitative psychology: 83rd annual meeting of the Psychometric Society. Springer, New York City, pp 13–23, 2019; Thissen D, Steinberg L, Psychometrika 51:567–577, 1986; Wright BD, Masters GN, Rating scale analysis: Rasch measurement. MESA Press, Chicago, 1982), and this paper extends these ideas with a specific focus on Rasch measurement theory. Rasch’s measurement work reflects a key milestone in a paradigmatic shift from classical test theory to item response theory (van der Linden WJ, Handbook of item response theory, volume 1: models. CRC Press, Boca Raton, 2016). We include a categorization of measurement models that are commonly viewed as Rasch models (dichotomous, rating scale, partial credit, and many-faceted), as well as extensions of these models (mixed, multilevel, multidimensional, and explanatory models). Georg Rasch proposed a set of principles related to objectivity and invariance that reflect foundational concepts underlying science. Rasch measurement theory is the application of these foundational concepts to measurement. Concept maps provide useful didactic tools for understanding progress in measurement theory in the human sciences, and also for appreciating Rasch’s contributions to current theory and practice in psychometrics.
... The measurement models used within each sample were incomplete in that they did not account for potential residual relationships between items after controlling for the latent variable (Cai, Yang, & Hansen, 2011; Gibbons et al., 2007; Marsh, 1989) and they did not account for the varying and often complicated sampling methods used in the component studies listed in Table 1 (Adams, Wilson, & Wu, 1997; Fragoso, de Andrade, & Soler, 2014; Pastor, 2003). In order to do so, the alignment method would need to be validated, both theoretically and empirically, in these contexts. ...
Article
Large-scale studies spanning diverse project sites, populations, languages, and measurements are increasingly important to relate psychological to biological variables. National and international consortia already are collecting and executing mega-analyses on aggregated data from individuals, with different measures on each person. In this research, we show that Asparouhov and Muthén’s alignment method can be adapted to align data from disparate item sets and response formats. We argue that with these adaptations, the alignment method is well suited for combining data across multiple sites even when they use different measurement instruments. The approach is illustrated using data from the Whole Genome Sequencing in Psychiatric Disorders consortium and a real-data-based simulation is used to verify accurate parameter recovery. Factor alignment appears to increase precision of measurement and validity of scores with respect to external criteria. The resulting parameter estimates may further inform development of more effective and efficient methods to assess the same constructs in prospectively designed studies.
... Although some studies (Adams, Wilson & Wu, 1997; Mislevy, 1987; Rijmen, Tuerlinckx, De Boeck & Kuppens, 2003) have shown examples of a univariate approach to using person covariates in the latent regression, only a few studies have focused on estimating the parameters jointly with CI. ...
... For the multidimensional competence structure models, multidimensional IRT (Hartig and Höhler 2009) is used. It assumes that a person's latent ability traits determine his or her responses to test items, since the probability of solving an item can be described as a probability function of the difference between the latent person trait and the item difficulty (Rost 2004; Bond and Fox 2007), as employed by Adams et al. (1997b) and Adams and Wu (2007). ...
Article
Visualization competence is essential for the learning benefits of visually represented and representable information in school learning processes. With subject-specific reference to German and mathematics instruction at the lower secondary level, a structural model of visualization competence is operationalized and validated. The model and the psychometric instrument focus on the content spectrum of the subjects German and mathematics in school contexts. In an empirical study with N = 1937 seventh-grade students from 83 school classes at 11 Gymnasien (academic-track schools) and 13 Realschulen (intermediate-track schools), I = 208 items were administered. Structural analyses using item response theory methods confirm, in line with the theoretical foundation, a multidimensional competence structure that contains, in addition to a receptive and a productive component, more detailed facets (recognizing, understanding, linking, and generating). The findings are placed in the context of a diagnostic instrument and targeted competence-promotion measures.
... The main challenge of having the IRT θ score as the dependent variable is that the measurement error in the estimated θ is heteroscedastic, with its variance depending on the true θ. With the growing computational power available nowadays, a recommended approach to address the measurement error challenge is to use an integrated multilevel IRT model (Adams et al., 1997; Fox & Glas, 2001; Kamata, 2001; Pastor & Beretvas, 2006; Wang, Kohli, & Henn, 2016) such that all model parameters are estimated simultaneously. This unified one-stage approach incorporates the standard errors of the latent trait estimates into the total variance of the model, avoiding the possible bias when using the estimated θ as the dependent variable in subsequent analysis. ...
Article
Full-text available
When latent variables are used as outcomes in regression analysis, a common approach that is used to solve the ignored measurement error issue is to take a multilevel perspective on item response modeling (IRT). Although recent computational advancement allows efficient and accurate estimation of multilevel IRT models, we argue that a two-stage divide-and-conquer strategy still has its unique advantages. Within the two-stage framework, three methods that take into account heteroscedastic measurement errors of the dependent variable in stage II analysis are introduced; they are the closed-form marginal MLE, the expectation maximization algorithm, and the moment estimation method. They are compared to the naïve two-stage estimation and the one-stage MCMC estimation. A simulation study is conducted to compare the five methods in terms of model parameter recovery and their standard error estimation. The pros and cons of each method are also discussed to provide guidelines for practitioners. Finally, a real data example is given to illustrate the applications of various methods using the National Educational Longitudinal Survey data (NELS 88).
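As an illustration of the general idea behind the error-aware stage II methods compared in this abstract, the following numerical sketch (not the authors' code) fits the stage II regression by marginal maximum likelihood, adding each ability estimate's known error variance to the residual variance, and contrasts it with the naive regression on the point estimates.

```r
# Stage II regression of ability estimates on a covariate with known
# heteroscedastic measurement error:
#   theta_hat_i ~ N(b0 + b1 * x_i, tau^2 + se_i^2).
# Data are simulated for illustration.
set.seed(5)
N     <- 2000
x     <- rnorm(N)
theta <- 1 + 0.5 * x + rnorm(N)           # true ability
se    <- runif(N, 0.3, 0.8)               # known stage I standard errors
theta_hat <- theta + rnorm(N, sd = se)    # stage I ability estimates

coef(lm(theta_hat ~ x))  # naive fit: conflates tau^2 with measurement error
                         # (shrunken EAP-type estimates would also bias slopes)

negll <- function(p) {                    # corrected marginal likelihood
  mu   <- p[1] + p[2] * x
  tau2 <- exp(p[3])                       # log-parameterized residual variance
  -sum(dnorm(theta_hat, mu, sqrt(tau2 + se^2), log = TRUE))
}
opt <- optim(c(0, 0, 0), negll)
c(b0 = opt$par[1], b1 = opt$par[2], tau2 = exp(opt$par[3]))
```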
... Such models correspond to confirmatory factor analysis based on Rasch scaled variables, whose estimated disattenuated variance and covariance parameters yield attenuation corrected correlations. An introduction to unidimensional Rasch models can be found in Lange (2017), and we refer the reader to Wu et al. (2007, Section 3: "Technical Matters") or Adams, Wilson, and Wu (1997) for a description of multivariate extensions. We note that the simulations described in this last paper indicated that Conquest's parameter estimates are generally superior to those obtained via competing approaches. ...
Article
Full-text available
We examined construct and discriminant validities of the Revised Transliminality Scale (RTS) using a large sample (n = 577) of undergraduates who completed the RTS along with established measures of mystical experiences, schizotypy, dissociation, absorption, various phenomenological aspects of consciousness, positive and negative affect, anxiety, mental imagery, social desirability, and depression. Multivariate Rasch modeling was used to establish variables’ measurement validity and to obtain direct (attenuation-corrected) correlations between the factors. As expected, we found positive moderate-to-strong correlations between the RTS and measures of absorption, mystical-type and altered experiences, visual imagery, internal dialogue, dissociation, and schizotypy, as well as positive (albeit weaker) correlations with disruptions to memory and attention, heightened arousal, and lower self-awareness and volitional control. Support for the RTS’ discriminant validity was found with lower order and mixed correlations between the RTS and hypothesized reactions to transliminal phenomena, such as social desirability, depression, positive and negative affect, and state and trait anxiety. Taken together, the cumulative findings support the use of the RTS in consciousness research and are consistent with the hypothesis that transliminality is an expression of neuroplasticity. The patterns of disattenuated correlations also suggested different directions for future research than did the (standard) attenuated correlations.
... [55] Similar models have been proposed under the names graded response model (Samejima 1969) and partial credit models (Andrich 1978) without the heteroskedastic component. [56] Adams, Wilson, and Wu 1997. ... individual, the structural model of the latent variable and the variance for each individual. ...
Article
When citizens hold multiple values relevant to their policy opinions, they might experience value conflict, value reconciliation or make a value trade-off. Yet, it is unclear which individuals are able to manage their multiple values in these ways. We posit a sophistication-interaction theory of value pluralism where the most politically sophisticated individuals are able to reconcile the existence of multiple values, thus increasing the stability of their policy opinions. We test this hypothesis using a series of heteroskedastic graded item response theory models of public opinion toward policies related to climate change. We find that people structure their policy preferences toward climate change policies in values toward the environment and the economy, but only the most sophisticated citizens are able to reconcile the potential conflict between these values.
... This resulted in defining and quantifying several constructs and indicators related to self-reflection. For example, continuous scale measures were constructed using multidimensional item response modeling (Adams, Wilson & Wu, 1997; Adams & Wu, 2007; Kiefer, Robitzsch, & Wu, 2016). Among the many benefits of multidimensional item response modeling is that it can provide the best estimates of the construct after taking into account the varying characteristics of items and the measurement errors. ...
Article
According to recent reports, K-12 full-time virtual school students have shown lower performance in math than their counterparts in regular brick-and-mortar schools. However, research is lacking in what kind of programmatic interventions virtual schools might be particularly well-suited to provide to improve math learning. Engaging students in self-reflection is a potentially promising pedagogical approach for supporting math learning. Nonetheless, it is unclear how models for math learning in regular classrooms translate in an online environment. The purpose of this study was to (a) analyze rich assessment data from virtual schools to explore the association between self-reflection and math performance, (b) compare the patterns found in student self-reflection across elementary, middle, and high school levels, and (c) examine whether providing opportunities for self-reflection had positive impact on learning in a virtual learning environment. In this study, the self-reflection assessments were developed and administered multiple times within several math courses during the 2014-2015 school year. These assessments included 4-7 questions that ask students to reflect on their understanding of the knowledge and skills they learned in the preceding lessons and units. Using these assessments, multiple constructs and indicators were measured, which include confidence about the topic knowledge/understanding, general feelings towards math, accuracy of self-judgment against actual test performance, and frequency of self-reflection. Through a series of three retrospective studies, data were collected from full-time virtual school students who took three math courses (one elementary, one middle, and one high school math course) in eight virtual schools in the United States during the 2013-2014 and 2014-2015 school years. The results showed that (a) participation in self-reflection varied by grade, unit test performance level, and course/topic difficulty; (b) more frequent participation in self-reflection and higher self-confidence level were associated with higher final course performance; and (c) self-reflection, as was implemented here, showed limited impact for more difficult topics, higher grade courses, and higher performing students. Implications for future research are provided.
... A joint model for measurement and structural analysis was developed by Muthén (1979) and has been further examined in Zwinderman (1991) and Adams, Wilson, and Wu (1997). In these publications, a multivariate regression equation is used to model the relationship between the latent trait and additional person covariates. ...
Thesis
Large-scale studies in social sciences often involve the measurement of latent constructs and seek to investigate their relationship with additional variables in subsequent analyses. Within this context the analyst has to face three problems: First, there is uncertainty through the particular indicators which measure the trait of interest. Second, large-scale studies typically exhibit hierarchical structures caused by sampling design or a composite population consisting of clustered observations. Third, uncertainty arises due to the presence of missing values in covariates related to the latent construct. This thesis provides a Bayesian estimation strategy that simultaneously addresses all three issues. I start out with the class of latent regression item response models, which combine the fields of measurement models and structural analysis, and develop a novel algorithm based on the device of data augmentation. Binary and ordered polytomous items can both be included in the analysis. Population heterogeneity is taken into account either through multigroup, finite mixture or random intercept specifications. Sampling from the posterior distribution of parameters is enriched by sampling from the full conditional distributions of missing values in person covariates. Approximations for the distributions of missing values are constructed from classification and regression trees, thus allowing for high flexibility in the incorporation of metric as well as categorical variables and nonlinear relationships. The validity of the proposed strategy is evaluated with respect to statistical accuracy by two simulation studies controlling the missing data generating mechanism. I show that the novel algorithm is capable of recovering all involved parameters in each of the two scenarios and clearly outperforms stochastic regression imputation and complete cases analysis. Two illustrations using data from the National Educational Panel Study on mathematical abilities and eating disorders of ninth grade students demonstrate the empirical usefulness of the method. Finally, I introduce an R package which implements the estimation routines presented in the thesis.
... where f(θ) is a population or structural model for the latent variable θ (Adams, Wilson, & Wu, 1997). This structural model is typically assumed to be a normal distribution. ...
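Because the quoted fragment begins after the equation it refers to, here is a generic reconstruction (standard notation, not a quotation) of the marginal response pattern probability in which the population model f(θ) appears as the mixing distribution:

```latex
P(\mathbf{x}_i \mid \boldsymbol{\delta}, \boldsymbol{\beta}, \sigma^2)
  = \int \Big[ \prod_{j} P(x_{ij} \mid \theta, \delta_j) \Big]\,
    f(\theta \mid \mathbf{w}_i^{\top}\boldsymbol{\beta}, \sigma^2)\, d\theta,
\qquad f(\theta \mid \mu, \sigma^2) = N(\mu, \sigma^2)
```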
Article
Full-text available
In recent years, network models have been proposed as an alternative representation of psychometric constructs such as depression. In such models, the covariance between observables (e.g., symptoms like depressed mood, feelings of worthlessness, and guilt) is explained in terms of a pattern of causal interactions between these observables, which contrasts with classical interpretations in which the observables are conceptualized as the effects of a reflective latent variable. However, few investigations have been directed at the question of how these different models relate to each other. To shed light on this issue, the current paper explores the relation between one of the most important network models, the Ising model from physics, and one of the most important latent variable models, the Item Response Theory (IRT) model from psychometrics. The Ising model describes the interaction between states of particles that are connected in a network, whereas the IRT model describes the probability distribution associated with item responses in a psychometric test as a function of a latent variable. Despite the divergent backgrounds of the models, we show a broad equivalence between them and also illustrate several opportunities that arise from this connection.
... Open-response items were coded on the basis of the TEDS-M coding manual. Inter-rater reliability of coding was good (Cohen's Kappa M = .80). GPK test data were IRT scaled using ConQuest (Wu, Adams, & Wilson, 1997). First, we analyzed test data in the one-dimensional Rasch model (one-parameter model) from each of the two occasions of measurement, separately examining the invariance of the two item parameter sets deriving from each scaling analysis. ...
Article
Full-text available
Empirical studies in higher education are needed that systematically connect program characteristics to program outcomes. We therefore examine the effects of opportunities to learn in teacher preparation on future teachers’ general pedagogical knowledge. A sample of 1347 student teachers from 37 teacher preparation programs in 18 universities and pedagogical colleges in Germany and Austria with two time points is used. Results using hierarchical linear modeling show that measures of learning opportunities related to pedagogical content and teaching practice influence the gain in knowledge. Whereas measures for pedagogical content related to areas of didactics (adaptivity in teaching, structuring lessons) show effects on the knowledge gain both on the individual and on the program level, teaching practice measures related to in-school opportunities to learn have effects only on the individual level of future teachers. Implications for the effectiveness of teacher preparation and research suggestions are discussed.
... We estimate the abilities of students in this complex context of observations of students working within groups using a generalized multilevel model (Adams, Wilson, & Wu, 1997; Kang & Wilson, 2003; Wilson, 1999). As a start, a simple Rasch model is the following: ...
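The excerpt is truncated before the model it introduces; the "simple Rasch model" referred to is presumably of the standard form below, and the second line sketches the kind of group random effect the article then adds (a reconstruction in generic notation, not quoted from the source):

```latex
% Rasch model for person p and item j
P(X_{pj} = 1 \mid \theta_p, \delta_j)
  = \frac{\exp(\theta_p - \delta_j)}{1 + \exp(\theta_p - \delta_j)}

% group extension: person ability decomposed into a group (pair) effect
% and an individual effect
\theta_p = \gamma_{g(p)} + \varepsilon_p, \qquad
\gamma_g \sim N(0, \sigma^2_{\text{group}}), \qquad
\varepsilon_p \sim N(0, \sigma^2_{\text{individual}})
```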
Article
This article summarizes assessment of cognitive skills through collaborative tasks, using field test results from the Assessment and Teaching of 21st Century Skills (ATC21S) project. This project, sponsored by Cisco, Intel, and Microsoft, aims to help educators around the world enable students with the skills to succeed in future career and college goals. In this article, ATC21S collaborative assessments focus on the project's “ICT Literacy—Learning in digital networks” learning progression. The article includes a description of the development of the learning progression, as well as examples and the logic behind the instrument construction. Assessments took place in random pairs of students in a demonstration digital environment. Modeling of results employed unidimensional and multidimensional item response models, with and without random effects for groups. The results indicated that, based on this data set, the models that take group into consideration in both the unidimensional and the multidimensional analyses fit better. However, the group-level variances were substantially higher than the individual-level variances. This indicates that a total individual estimate of group plus individual is likely a more informative estimate than individual alone but also that the performances of the pairs dominated the performances of the individuals. Implications are discussed in the results and conclusions.
... Finally, several questions remain regarding the detection properties of item misfit statistics for IRT models, especially for the newly proposed PV-Q1 and PV-Q1* statistics. Future research should investigate how these statistics perform when fitting polytomous IRT models, the consequences of including multiple items that contain misfit, the effects of temporarily removing misfitting items to improve the u* imputations (specifically for power conditions), the effects of varying the shape of the latent trait distribution used to generate the data, how improving the precision of û by including fixed-effect covariates (Adams, Wilson, & Wu, 1997; Chalmers, 2015) affects the power to detect misfit and Type I error control, and so on. These and other research areas should also be investigated for competing item-fit statistics to determine their general robustness and efficiency so that practitioners can make informed decisions regarding which statistics they should adopt in their item analysis work. ...
Article
Full-text available
When tests consist of a small number of items, the use of latent trait estimates for secondary analyses is problematic. One area in particular where latent trait estimates have been problematic is when testing for item misfit. This article explores the use of plausible-value imputations to lessen the severity of the inherent measurement unreliability in shorter tests, and proposes a parametric bootstrap procedure to generate empirical sampling characteristics for null-hypothesis tests of item fit. Simulation results suggest that the proposed item-fit statistics provide conservative to nominal error detection rates. Power to detect item misfit tended to be less than Stone's χ²* item-fit statistic but higher than the S-X² statistic proposed by Orlando and Thissen, especially in tests with 20 or more dichotomously scored items.
Article
Full-text available
Item response theory (IRT) has evolved as a standard psychometric approach in recent years, in particular for test construction based on dichotomous (i.e., true/false) items. Unfortunately, large samples are typically needed for item refinement in unidimensional models and even more so in the multidimensional case. However, Bayesian IRT approaches with hierarchical priors have recently been shown to be promising for estimating even complex models in small samples. Still, it may be challenging for applied researchers to set up such IRT models in general purpose or specialized statistical computer programs. Therefore, we developed a user-friendly tool – a SAS macro called HBMIRT – that allows users to estimate uni- and multidimensional IRT models with dichotomous items. We explain the capabilities and features of the macro and demonstrate the particular advantages of the implemented hierarchical priors in rather small samples over weakly informative priors and traditional maximum likelihood estimation with the help of a simulation study. The macro can also be used with the online version of SAS OnDemand for Academics that is freely accessible for academic researchers.
Article
Full-text available
This article considers an analytic strategy for measuring and modeling child and adolescent problem behaviors. The strategy embeds an item response model within a hierarchical model to define an interval scale for the outcomes, to assess dimensionality, and to study how individual and contextual factors relate to multiple dimensions of problem behaviors. To illustrate, the authors analyze data from the primary caregiver ratings of 2,177 children aged 9–15 in 79 urban neighborhoods on externalizing behavior problems using the Child Behavior Checklist 4–18 (T. M. Achenbach, 1991a). Two subscales, Aggression and Delinquency, are highly correlated, and yet unidimensionality must be rejected because these subscales have different associations with key theoretically related covariates.
Article
Full-text available
Two models that can be used for exploratory factor analysis of items with a dichotomous response format are discussed: threshold models and multidimensional item response models. The models arise from different traditions: The threshold model is rooted in the factor analytic tradition, the multidimensional item response model had its foundation in item response theory. Despite the different origins, it can be proved that both models are the same. Subsequently, the generalized multidimensional Rasch model is introduced. This model can be used for confirmatory factor analysis of items with a dichotomous response format. Stated otherwise, it is the confirmatory counterpart of the (exploratory) threshold and multidimensional item response models.
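The asserted equivalence between the threshold (factor-analytic) model and the item response model can be summarized for the unidimensional, normal-ogive case as follows; this is the standard result stated in generic notation, not the article's own derivation.

```latex
% Threshold model for dichotomous item j with loading \lambda_j and threshold \tau_j
x_j^{*} = \lambda_j \theta + \epsilon_j, \qquad
\epsilon_j \sim N(0,\, 1 - \lambda_j^2), \qquad
x_j = 1 \iff x_j^{*} > \tau_j

% Implied normal-ogive item response model and parameter mapping
P(x_j = 1 \mid \theta)
  = \Phi\!\left( \frac{\lambda_j \theta - \tau_j}{\sqrt{1 - \lambda_j^2}} \right)
  = \Phi\big( a_j (\theta - b_j) \big),
\qquad a_j = \frac{\lambda_j}{\sqrt{1 - \lambda_j^2}}, \quad b_j = \frac{\tau_j}{\lambda_j}
```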
Article
Joint models and statistical inference for longitudinal and survival data have been an active area of statistical research and have mostly coupled a longitudinal biomarker-based mixed-effects model with normal distribution and an event time-based survival model. In practice, however, the following issues may stand out: (i) Normality of model error in longitudinal models is a routine assumption, but it may be unrealistic and violate features of the data, such as subject variation. (ii) The data collected often feature multiple longitudinal outcomes of mixed types that are significantly correlated; ignoring their correlation may lead to biased estimation. Additionally, a parametric model specification may be inflexible to capture the complicated patterns of longitudinal data. (iii) Missing observations in the longitudinal data are often encountered; the missing measures are likely to be informative (nonignorable) and ignoring this phenomenon may result in inaccurate inference. Multilevel item response theory (MLIRT) models have been increasingly used to analyze the multiple longitudinal data of mixed types (i.e., continuous and categorical) in clinical studies. In this article, we develop an MLIRT-based semiparametric joint model with skew-t distribution that consists of an extended MLIRT model for the mixed types of multiple longitudinal data and a Cox proportional hazards model, linked through random effects. A Bayesian approach is employed for joint modeling. Simulation studies are conducted to assess the performance of the proposed models and method. A real example from a primary biliary cirrhosis clinical study is analyzed to estimate parameters in the joint model and also to evaluate the sensitivity of parameter estimates for various plausible nonignorable missing data mechanisms.
Article
Full-text available
Despite the need to foster pre-service teacher competence with respect to information and communication technology (ICT) integration in school during the current era of digitalization, scientific understanding of the correlation between the relevant characteristics of teacher education programs and student teachers’ learning outcomes remains limited. This paper thus examines the relationship between student teachers’ opportunities to learn (OTL) and technological pedagogical knowledge (TPK) with the aim of obtaining insights into their learning processes and the effectiveness of teachers’ preparation upon completing their bachelor studies. A sample of 338 student teachers in their 6th semester at the University of Cologne was used. Findings from path modeling reveal that measures of OTL relate to TPK. While no direct effect of technological pedagogical OTL on TPK was identified, an indirect effect between conventional pedagogical OTL and TPK, mediated by student teachers’ general pedagogical knowledge (GPK) was found. Among the personal factors that affect student teachers, their motivation for using ICT reveals a direct effect on TPK. Further factors, such as gender and teacher education program type have no effect on TPK. The findings will be discussed in relation to expectations of teacher education effectiveness.
Article
Full-text available
The measurement of latent traits and investigation of relations between these and a potentially large set of explaining variables is typical in psychology, economics, and the social sciences. Corresponding analysis often relies on surveyed data from large-scale studies involving hierarchical structures and missing values in the set of considered covariates. This paper proposes a Bayesian estimation approach based on the device of data augmentation that addresses the handling of missing values in multilevel latent regression models. Population heterogeneity is modeled via multiple groups enriched with random intercepts. Bayesian estimation is implemented in terms of a Markov chain Monte Carlo sampling approach. To handle missing values, the sampling scheme is augmented to incorporate sampling from the full conditional distributions of missing values. We suggest to model the full conditional distributions of missing values in terms of non-parametric classification and regression trees. This offers the possibility to consider information from latent quantities functioning as sufficient statistics. A simulation study reveals that this Bayesian approach provides valid inference and outperforms complete cases analysis and multiple imputation in terms of statistical efficiency and computation time involved. An empirical illustration using data on mathematical competencies demonstrates the usefulness of the suggested approach.
Article
Full-text available
Lesson planning is an essential part of teachers’ daily work. In this study, we focus on structuring as an aspect of lesson planning, which generally can be defined as a clear, recognizable organization of instruction into individual phases and segments in which the teacher gradually builds up the complexity of the knowledge to be acquired and ensures a smooth flow of instruction through appropriate sequencing. In a previous study (Krepf and König in press), we conceived structuring as an aspect of lesson planning. To test the validity and reliability of this study’s findings, a scaling-up study was conducted to determine whether structuring as an aspect of planning could be modelled reliably using a different and larger sample. In this study, 310 written lesson plans created by pre-service teachers during induction (172 at T1 [first lesson plan]; 138 at T2 [last lesson plan/state examination]) from North Rhine-Westphalia (NRW) and Berlin derived from the PlanvoLL‑D project (König et al. 2020a, 2020b) comprised the study’s data. The lesson plans were evaluated through content analysis using deductively formed categories. Afterward, the coding was quantified and analyzed using item response theory (IRT) scaling. The results indicated that two subscales could be separated in terms of content: a “contextualization” scale and a “phasing” scale. Furthermore, three explication levels could be distinguished. Measures of lesson structure planning increased during induction significantly with practical relevance. This study contributes to the research on modelling and measuring pre-service teachers’ planning competence.
Article
Full-text available
This study suggests a comprehensive conceptualization of teacher knowledge for teaching early literacy in primary schools. Following the discourse on the professional knowledge of teachers, we argue that teachers’ knowledge relevant to support reading and writing at the beginning of primary school education is multidimensional by nature: Teachers need content knowledge (CK), pedagogical content knowledge (PCK), and general pedagogical knowledge (GPK). Although research on teacher knowledge has made remarkable progress over the last decade, and in particular in domains such as mathematics, relevant empirical research using standardized assessment that would allow in-depth analyses of how teacher knowledge is acquired by pre-service teachers during teacher education and how teacher knowledge influences instructional quality and student learning in early literacy is very scarce. The following research questions are focused on: (1) Can teachers’ professional knowledge for teaching early literacy be conceptualized in terms of CK, PCK, and GPK allowing empirical measurement? (2) How do teachers acquire such knowledge during initial teacher education? (3) Is teachers’ professional knowledge a premise for instructional quality in teaching early literacy to students? We present the conceptualization of teacher knowledge for teaching early literacy in primary schools in Germany as the country of our study and specific measurement instruments recently developed by our research group. Assessment data of 386 pre-service teachers at different teacher education stages is used to analyze our research questions. Findings show (1) construct validity of the standardized tests related to the hypothesized structure, (2) curricular validity related to teacher education, and (3) predictive validity related to instructional quality. Implications for teacher education and the professional development of teachers are discussed.
Article
Collateral information has been used to address subpopulation heterogeneity and increase estimation accuracy in some large-scale cognitive assessments. However, the methodology that takes collateral information into account has not been developed and explored in published research with models designed specifically for noncognitive measurement. Because accurate noncognitive measurement is becoming increasingly important, we sought to examine the benefits of using collateral information in latent trait estimation with an item response theory model that has proven valuable for noncognitive testing, namely, the generalized graded unfolding model (GGUM). Our presentation introduces an extension of the GGUM that incorporates collateral information, henceforth called the Explanatory GGUM. We then present a simulation study that examined Explanatory GGUM latent trait estimation as a function of sample size, test length, number of background covariates, and correlation between the covariates and the latent trait. Results indicated that the Explanatory GGUM approach provides scoring accuracy and precision superior to traditional expected a posteriori (EAP) and full Bayesian (FB) methods. Implications and recommendations are discussed.
Article
Full-text available
Objective: The field of implementation science emphasizes efficient and effective fidelity measurement for research outcomes and feedback to support quality improvement. This paper reports on such a measure for motivational interviewing (MI), developed with rigorous methodology and with diverse samples. Method: Using item response theory (IRT) methods and Rasch modeling, we analyzed coded (a) recordings (n = 99) of intervention sessions in a clinical trial of African American adolescents with obesity; (b) standard patient interactions (n = 370) in an implementation science study with youth living with HIV; and (c) standard patient interactions (n = 172) in a diverse community sample. Results: These methods yielded a reliable and valid 12-item scale on several indicators using Rasch modeling, including single construct dimensionality, strong item-session maps, good rating scale functionality, and item fit after revisions. However, absolute agreement was modest. The 12 items yielded thresholds for 4 categories: beginner, novice, intermediate, and advanced. Conclusions: The 12-item Motivational Interviewing Coach Rating Scale is the first efficient and effective fidelity measure appropriate with diverse ethnic groups, with interventions that are MI only or interventions that integrate MI with other interventions, and with adolescents and families as well as adults.
Article
Full-text available
In educational measurement, various methods have been proposed to infer student proficiency from the ratings of multiple items (e.g., essays) by multiple raters. However, suitable models quickly become numerically demanding or even unfeasible as separate latent variables are needed to account for local dependencies between the ratings of the same response. Therefore, in the present paper we derive a flexible approach based on Thurstone’s law of categorical judgment. The advantage of this approach is that it can be fit using weighted least squares estimation, which is computationally less demanding than most previous approaches as the number of latent variables increases. In addition, the new approach can be applied using existing latent variable modeling software. We illustrate the model on a real dataset from the Trends in International Mathematics and Science Study (TIMSS) comprising ratings of 10 items by 4 raters for 150 subjects. In addition, we compare the new model to existing models including the facet model, the hierarchical rater model, and the hierarchical rater latent class model.
Technical Report
Full-text available
The Programme for International Student Assessment (PISA) emerged from the need for information about the educational outcomes of participating countries. The PISA studies conducted to meet this need aim to measure in detail the knowledge and skills of students in different countries. PISA thus focuses on providing information about the extent to which students possess this knowledge and these skills, and the resulting information makes it possible to compare educational outcomes across countries. Because of these features, PISA can be considered important not only in countries that participate in the assessment but also in those that do not. Given the importance of the topic, this report was prepared by a working group of Faculty of Education staff under the chairmanship of the Dean of the Hacettepe University Faculty of Education. The report examines PISA in six sections, each drawing on a range of studies. The highlights of the sections of the PISA and Türkiye (2000-2018) report are presented below.
Preprint
Full-text available
IRT models are often applied when observed items are used to measure a unidimensional latent variable. Originally used in educational research, IRT models are now widely used when focus is on physical functioning or psychological well-being. Modern applications often need more general models, typically models for multidimensional latent variables or longitudinal models for repeated measurements. This paper describes a collection of SAS macros that can be used for fitting data to, simulating from, and visualizing longitudinal IRT models. The macros encompass dichotomous as well as polytomous item response formats and are sufficiently flexible to accommodate changes in item parameters across time points and local dependence between responses at different time points.
Article
This study compares gender differences in test performance, report-card grades, and competence ratings on competence grids used as an alternative report format. Particular attention is paid to the role of cross-curricular competence ratings in the formation of grades and of subject-specific competence ratings. For this purpose, a sample of N = 469 fourth-grade primary school students was used, with data on test performance in German and mathematics as well as teacher judgments on report cards and competence grids. Gender differences largely consistent with expectations emerged in all three areas. The central finding was that cross-curricular competence ratings predict grades and subject-specific competence ratings beyond test performance and can explain gender differences. In addition, ratings referring specifically to social aspects explained a significant share of the variance in grades, but not in the subject-specific competence ratings.
Article
Multilevel bifactor item response theory (IRT) models are commonly used to account for features of the data that are related to the sampling and measurement processes used to gather those data. These models conventionally make assumptions about the portions of the data structure that represent these features. Unfortunately, when data violate these models' assumptions but these models are used anyway, incorrect conclusions about the cluster effects could be made and potentially relevant dimensions could go undetected. To address the limitations of these conventional models, a more flexible multilevel bifactor IRT model that does not make these assumptions is presented, and this model is based on the generalized partial credit model. Details of a simulation study demonstrating this model outperforming competing models and showing the consequences of using conventional multilevel bifactor IRT models to analyze data that violate these models' assumptions are reported. Additionally, the model's usefulness is illustrated through the analysis of the Program for International Student Assessment data related to interest in science.
Article
Based on the framework of testlet models, the current study suggests the Bayesian random block item response theory (BRB IRT) model to fit forced-choice formats where an item block is composed of three or more items. To account for local dependence among items within a block, the BRB IRT model incorporated a random block effect into the response function and used a Markov Chain Monte Carlo procedure for simultaneous estimation of item and trait parameters. The simulation results demonstrated that the BRB IRT model performed well for the estimation of item and trait parameters and for screening those with relatively low scores on target traits. As found in the literature, the composition of item blocks was crucial for model performance; negatively keyed items were required for item blocks. The empirical application showed the performance of the BRB IRT model was equivalent to that of the Thurstonian IRT model. The potential advantage of the BRB IRT model as a base for more complex measurement models was also demonstrated by incorporating gender as a covariate into the BRB IRT model to explain response probabilities. Recommendations for the adoption of forced-choice formats were provided along with the discussion about using negatively keyed items.
Article
The present study investigated the consequences of ignoring a nested data structure on the Rasch/one-parameter item response theory model. Although most large-scale educational assessment data do exhibit a nested data structure, current practice often ignores such data structure and applies the standard Rasch/IRT models to conduct measurement analyses. We hypothesized that this practice would produce negative consequences on the item parameter estimates. Using simulation, we investigated this hypothesis by comparing the results from an incorrectly specified two-level model which ignored the nested data structure to those from a correctly specified three-level hierarchical generalized linear model. Use of the incorrect two-level model did, in fact, result in negative consequences in estimating the standard errors, although the point estimates were unbiased and identical to the ones from the three-level analysis. A real data set from the IEA Civic Education Study in 1999 was used to illustrate the simulation results.
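As a reading aid, here is a minimal notational sketch (my symbols, not the authors') of the three-level formulation this abstract contrasts with the standard two-level analysis:

  Level 1 (items within students):   \mathrm{logit}\,\Pr(Y_{ijk}=1 \mid \theta_{jk}) = \theta_{jk} - b_i
  Level 2 (students within schools): \theta_{jk} = \gamma_k + u_{jk}, \quad u_{jk} \sim N(0, \sigma^2)
  Level 3 (schools):                 \gamma_k = \gamma_0 + v_k, \quad v_k \sim N(0, \tau^2)

Collapsing levels 2 and 3 into a single student term is the misspecification studied here; as the abstract reports, it leaves point estimates intact but distorts standard errors.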
Article
While online platforms often provide a single composite rating and the ratings of different attributes of a product, they largely ignore the attribute characteristics and customer criticality, which limits managerial action. We propose a multi-facet item response theory (MFIRT) approach to simultaneously examine the effects of product attributes, reviewer criticality, consumption situation, product type, and time in assessing latent customer satisfaction. Analyses of hotel ratings from TripAdvisor and beer ratings from BeerAdvocate suggest that product attributes differ with respect to their discriminating and threshold characteristics and that reviewer segments emphasize different attributes when rating various products over time. The MFIRT approach predicts product performance more accurately than alternative methods and provides novel insights to inform marketing strategies. The MFIRT framework can fundamentally advance how we analyze customer satisfaction and other consumer attitudes and improve marketing research and practice.
Article
Multilevel Rasch models are increasingly used to estimate the relationships between test scores and student and school factors. Response data were generated to follow one-, two-, and three-parameter logistic (1PL, 2PL, 3PL) models, but the Rasch model was used to estimate the latent regression parameters. When the response functions followed 2PL or 3PL models, the proportion of variance explained in test scores by the simulated student or school predictors was estimated accurately with a Rasch model. Proportion of variance within and between schools was also estimated accurately. The regression coefficients were misestimated unless they were rescaled out of logit units. However, item-level parameters, such as DIF effects, were biased when the Rasch model was violated, similar to single-level models.
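For reference, one common way (my notation; the study may define these quantities differently) to express the variance summaries evaluated in this simulation, with between-school variance \tau^2 and within-school variance \sigma^2 of the latent scores:

  \text{proportion of variance between schools} = \frac{\tau^2}{\tau^2 + \sigma^2}, \qquad
  R^2_{\text{level}} = \frac{\hat\sigma^2_{\text{null}} - \hat\sigma^2_{\text{predictors}}}{\hat\sigma^2_{\text{null}}}

so the variance explained by student or school predictors is read off by comparing residual variance at a given level with and without those predictors.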
Article
Background and purpose: Data were analyzed from a national convenience sample of 3,000 bedside nurses in Lebanon to evaluate the psychometric characteristics of the first Arabic version of the Actual Scope of Nursing Practice (A-ASCOP) questionnaire. Methods: The method used in this study was application of the partial credit model using the multidimensional random coefficients multinomial logit model. Results: A-ASCOP subscales (r = .81-.94) and levels of item complexity (r = .95-.98) were highly correlated. Conclusions: As a 26-item scale, the A-ASCOP has high internal consistency (Cronbach's α = 0.93). The A-ASCOP subscales have acceptable multidimensional reliability (expected a posteriori/plausible value [EAP/PV] reliability > .80) and are suitable for descriptive surveys. The three A-ASCOP levels of item complexity were not valid for this sample. We report norms for unidimensional Rasch and 0-100 transformed measures by category of participant.
Chapter
Research on differential item functioning (DIF) has focused traditionally on the detection of effects. However, recent studies have investigated potential sources of DIF, in an attempt to determine how or why it may occur. This study examines variability in item difficulty in math performance that is accounted for by gender, referred to as gender DIF, and the extent to which gender DIF is explained by both person predictors (opportunity to learn [OTL]) and item characteristics (item format). Cross-classified multilevel IRT models are used to examine the relationships among item difficulty, gender, OTL, and item format. Data come from the U.S. cohort of an international study of future math teachers, the Teacher Education and Development Study in Mathematics.
Article
Full-text available
Background: Longitudinal invariance is a prerequisite for a valid comparison of oral health-related quality of life (OHRQoL) scores over time. Item response theory (IRT) models can assess measurement invariance and allow better estimation of the associations between predictors and the latent construct. By extending IRT models, this study aimed to investigate the longitudinal invariance of the two 8-item short forms of the Child Perception Questionnaire (CPQ11-14), the regression short form (RSF:8) and the item-impact short form (ISF:8), and to identify factors associated with adolescents' OHRQoL and its change. Methods: All students from S1 and S2 (equivalent to US grades 6 and 7) who were born in April or May 1997 (at age 12) from 45 randomly selected secondary schools were invited to participate in this study and followed up after 3 years. Data on the CPQ11-14 RSF:8 and CPQ11-14 ISF:8, demographics, oral health behavior and status were collected. Explanatory graded response models were fitted to both short forms of the CPQ11-14 data for assessing longitudinal invariance and factors associated with OHRQoL. The Bayesian estimation method of Markov chain Monte Carlo (MCMC) with Gibbs sampling was adopted for parameter estimation, and credible intervals were used for inference. Results: Data from 649 children at age 12 at baseline and 415 children at age 15 at follow-up were analyzed. For the 12-year-old children, healthier oral health behavior, better gum status, families with both parents employed, and parents' education level were found to be associated with better OHRQoL. Four items among the 2 short forms lacked longitudinal invariance. With statistical adjustment for longitudinal invariance, OHRQoL was found to have improved in general over the 3 years, but no predictor was associated with OHRQoL at follow-up. For those whose family income had decreased, OHRQoL worsened over the 3 years. Conclusions: IRT explanatory analysis enables a more valid identification of the factors associated with OHRQoL and its changes over time. It provides important information to oral healthcare researchers and policymakers.
Article
Full-text available
Valid inferences on teaching drawn from students’ test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items’ instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of absolute and relative measures of instructional sensitivity. Absolute measures summarize a single item’s total capacity of capturing effects of instruction, which is independent of the test’s sensitivity. In contrast, relative measures summarize a single item’s capacity of capturing effects of instruction relative to test sensitivity. Then, we propose a longitudinal multilevel item response theory model that allows estimating both types of measures depending on the identification constraints.
Article
The study examines the connection between domain-specific learning opportunities in English as a foreign language (EFL) teacher preparation and preservice EFL teachers’ pedagogical content knowledge (PCK). Using a sample of 444 preservice EFL teachers for secondary schools, it contrasts groups at the end of the 2 phases required in German teacher preparation programs: a theoretical phase at university and a supervised professional internship at a school (practical phase). Specifically, it examines differences in learning opportunities (self-reports) and PCK (paper-and-pencil test results). Findings from regression analysis show that learning opportunity measures substantially predict PCK test scores. The article discusses the effectiveness of EFL teacher preparation programs for preservice teachers’ performance on PCK and concludes with possible interpretations and research suggestions.
Chapter
This chapter discusses four reasons that can be given for the use of item response theory (IRT). First, IRT models have been developed specifically to support the process of test development and construct validation. Second, IRT models facilitate the usage of the tests consisting of a number of rotated test forms within one assessment. Third, IRT supports the maintenance of scales that are comparable over setting and time. Fourth, IRT modelling in conjunction with multiple imputation methodology allows construction of performance indicators that are population focussed and deal appropriately with the random error that is associated with the latent measurement process and are amenable for statistical analysis. After a brief overview of the item response theory, the chapter discusses each of these four reasons for its use in large-scale assessments. It describes the way in which IRT is used in the field trial and survey stages of the Programme for International Student Assessment (PISA) project.
Article
Full-text available
The Fisher, or expected, information matrix for the parameters in a latent-variable model is bounded from above by the information that would be obtained if the values of the latent variables could also be observed. The difference between this upper bound and the information in the observed data is the "missing information." This paper explicates the structure of the expected information matrix and related information matrices, and characterizes the degree to which missing information can be recovered by exploiting collateral variables for respondents. The results are illustrated in the context of item response theory models, and practical implications are discussed.
Article
Full-text available
We review various models and techniques that have been proposed for item analysis according to the ideas of Rasch. A general model is proposed that unifies them, and maximum likelihood procedures are discussed for this general model. We show that unconditional maximum likelihood estimation in the functional Rasch model, as proposed by Wright and Haberman, is an important special case. Conditional maximum likelihood estimation, as proposed by Rasch and Andersen, is another important special case. Both procedures are related to marginal maximum likelihood estimation in the structural Rasch model, which has been studied by Sanathanan, Andersen, Tjur, Thissen, and others. Our theoretical results lead to suggestions for alternative computational algorithms.
Article
Full-text available
Recent advances in the statistical theory of hierarchical linear models should enable important breakthroughs in the measurement of psychological change and the study of correlates of change. A two-stage model of change is proposed here. At the first, or within-subject stage, an individual's status on some trait is modeled as a function of an individual growth trajectory plus random error. At the second, or between-subjects stage, the parameters of the individual growth trajectories vary as a function of differences between subjects in background characteristics, instructional experiences, and possible experimental treatments. This two-stage conceptualization, illustrated with data on Head Start children, allows investigators to model individual change, predict future development, assess the quality of measurement instruments for distinguishing among growth trajectories, and study systematic variation in growth trajectories as a function of background characteristics and experimental treatments.
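A minimal sketch of the two-stage growth formulation described above, in my own symbols (a_{ti} is the time of measurement t for subject i, X_i a background or treatment variable):

  Level 1 (within subject):  Y_{ti} = \pi_{0i} + \pi_{1i} a_{ti} + e_{ti}
  Level 2 (between subjects): \pi_{0i} = \beta_{00} + \beta_{01} X_i + r_{0i}, \qquad \pi_{1i} = \beta_{10} + \beta_{11} X_i + r_{1i}

The level-1 coefficients are the individual growth trajectory; the level-2 equations model systematic variation in those trajectories.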
Article
Full-text available
Item response theory (IRT) models are now in common use for the analysis of dichotomous item responses. This paper examines the sampling theory foundations for statistical inference in these models. The discussion includes: some history on the stochastic subject versus the random sampling interpretations of the probability in IRT models; the relationship between three versions of maximum likelihood estimation for IRT models; estimating θ versus estimating θ-predictors; IRT models and loglinear models; the identifiability of IRT models; and the role of robustness and Bayesian statistics from the sampling theory perspective.
Article
Full-text available
Standard procedures for estimating item parameters in item response theory (IRT) ignore collateral information that may be available about examinees, such as their standing on demographic and educational variables. This paper describes circumstances under which collateral information about examinees may be used to make inferences about item parameters more precise, and circumstances under which it must be used to obtain correct inferences.
Article
Full-text available
Two linearly constrained logistic models which are based on the well-known dichotomous Rasch model, the ‘linear logistic test model’ (LLTM) and the ‘linear logistic model with relaxed assumptions’ (LLRA), are discussed. Necessary and sufficient conditions for the existence of unique conditional maximum likelihood estimates of the structural model parameters are derived. Methods for testing composite hypotheses within the framework of these models and a number of typical applications to real data are mentioned.
Article
Full-text available
A rating response mechanism for ordered categories, which is related to the traditional threshold formulation but distinctively different from it, is formulated. In addition to the subject and item parameters two other sets of parameters, which can be interpreted in terms of thresholds on a latent continuum and discriminations at the thresholds, are obtained. These parameters are identified with the category coefficients and the scoring function of the Rasch model for polychotomous responses in which the latent trait is assumed uni-dimensional. In the case where the threshold discriminations are equal, the scoring of successive categories by the familiar assignment of successive integers is justified. In the case where distances between thresholds are also equal, a simple pattern of category coefficients is shown to follow.
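A notational sketch of the polytomous Rasch form this abstract refers to (my symbols): with category coefficients \kappa_x and scoring function \phi_x,

  \Pr(X_i = x \mid \theta) = \frac{\exp\{\kappa_x + \phi_x(\theta - \delta_i)\}}{\sum_{m=0}^{M}\exp\{\kappa_m + \phi_m(\theta - \delta_i)\}}, \qquad x = 0, 1, \ldots, M.

When the threshold discriminations are equal, \phi_x reduces to the integer score x, which is the justification for integer scoring of successive categories noted in the abstract.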
Chapter
This chapter presents the item response model and its use for educational testing. In a review of IRT methodology for educational assessment, Bock, Mislevy, and Woodson outline two approaches well suited to the population focus of educational assessment. The first approach is the use of the more familiar person-level models, though bypassing the computation of person-level results by estimating item and population parameters directly from counts of response patterns. The second is the use of group-level models in narrowly defined content areas. Both approaches have advantages and disadvantages. The duplex model is a logical extension of the second approach if information about individuals in a more broadly defined content area is also desired. It maintains the group-level model's advantage of maximum efficiency for group-level results, and imposes less of a computational burden than the first approach. This is achieved at the cost of more restrictive distributional assumptions, such as homoscedasticity within groups and over time. Rather than estimating characteristics of the finite populations that groups constitute, the model presented in the chapter explains performance as a manifestation of processes under the control of a latent structure, and estimates the parameters that characterize the structure.
Article
Interviewer variability in a binary response is an example of a problem requiring variance component estimation in a non‐normal family. The maximum likelihood estimation procedure is derived and used to examine some binary items on a large questionnaire. This raises some interesting questions about the use of unbalanced ANOVA methods with these data.
Article
Asymptotic corrections are used to compute the means and the variance-covariance matrix of multivariate posterior distributions that are formed from a normal prior distribution and a likelihood function that factors into separate functions for each variable in the posterior distribution. The approximations are illustrated using data from the National Assessment of Educational Progress (NAEP). These corrections produce much more accurate approximations than those produced by two different normal approximations. In a second potential application, the computational methods are applied to logistic regression models for severity adjustment of hospital-specific mortality rates.
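As context, a minimal sketch (my notation) of the normal approximation that such corrections are compared against: for a posterior density p(θ | x) with mode \hat\theta,

  p(\theta \mid x) \approx N\!\left(\hat\theta,\; \left[-\nabla^2 \log p(\theta \mid x)\big|_{\theta=\hat\theta}\right]^{-1}\right),

so the posterior mean and covariance are approximated by the mode and the inverse negative Hessian. The asymptotic corrections described in the abstract are reported to be much more accurate than this kind of normal approximation.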
Article
Conventional methods of multivariate normal analysis do not apply when the variables of interest are not observed directly but must be inferred from fallible or incomplete data. A method of estimating such effects by marginal maximum likelihood, implemented by means of an EM algorithm, is proposed. Asymptotic standard errors and likelihood ratio tests of fit are provided. The procedures are illustrated with data from the administration of the Armed Services Vocational Aptitude Battery to a probability sample of American youth.
Article
This article describes a marginal maximum likelihood (MML) estimation algorithm for Wilson's (1990) ordered partition model (OPM), a measurement model that does not require the set of available responses to assessment tasks to be fully ordered. The model and its estimation algorithm are illustrated through the analysis of an example data set. In the example, we use the ordered partition model to compare a set of alternative scoring schemes for open-ended science items.
Article
As interest in quantitative research synthesis grows, investigators increasingly seek to use information about study features (study contexts, designs, treatments, and subjects) to account for variation in study outcomes. To facilitate analysis of diverse study findings, a mixed linear model with fixed and random effects is presented and illustrated with data from teacher expectancy experiments. This strategy enables the analyst to (a) estimate the variance of the effect size parameters by means of maximum likelihood; (b) pose a series of linear models to explain the effect parameter variance; (c) use information about study characteristics to derive improved empirical Bayes estimates of individual study effect sizes; and (d) examine the sensitivity of all substantive inferences to likely errors in the estimation of variance components.
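A minimal sketch of a mixed linear model for effect sizes of the kind described above, in my own symbols (d_k is the estimated effect size of study k, x_k codes its features, v_k its known sampling variance):

  d_k = \delta_k + e_k, \quad e_k \sim N(0, v_k), \qquad \delta_k = \mathbf{x}_k'\boldsymbol\gamma + u_k, \quad u_k \sim N(0, \tau^2).

Here \tau^2 is the residual effect-parameter variance estimated by maximum likelihood, and the empirical Bayes estimate of \delta_k shrinks d_k toward the model-based prediction \mathbf{x}_k'\hat{\boldsymbol\gamma}.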
Article
A hierarchical logistic regression model is proposed for studying data with group structure and a binary response variable. The group structure is defined by the presence of micro observations embedded within contexts (macro observations), and the specification is at both of these levels. At the first (micro) level, the usual logistic regression model is defined for each context. The same regressors are used in each context, but the micro regression coefficients are free to vary over contexts. At the second level, the micro coefficients are treated as functions of macro regressors. An empirical Bayes estimation procedure is proposed for estimating the micro and macro coefficients. Explicit formulas are provided that are computationally feasible for large-scale data analyses; these include an algorithm for finding the maximum likelihood estimates of the covariance components representing within- and between-macro-equation error variability. The methodology is applied to World Fertility Survey data, with individuals viewed as micro observations and countries as macro observations.
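A notational sketch of the two-level specification described above (my symbols):

  Micro (within context j):  \mathrm{logit}\,\Pr(y_{ij}=1) = \mathbf{x}_{ij}'\boldsymbol\beta_j
  Macro (across contexts):   \boldsymbol\beta_j = \mathbf{W}_j\boldsymbol\gamma + \mathbf{u}_j, \quad \mathbf{u}_j \sim N(\mathbf{0}, \mathbf{T}),

so the same micro regressors appear in every context while the context-specific coefficients are modeled as functions of macro regressors plus random deviations with covariance T.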
Article
A multidimensional Rasch-type item response model, the multidimensional random coefficients multinomial logit model, is presented as an extension to the Adams & Wilson (1996) random coefficients multinomial logit model. The model is developed in a form that permits generalization to the multidimensional case of a wide class of Rasch models, including the simple logistic model, Masters' partial credit model, Wilson's ordered partition model, and Fischer's linear logistic model. Moreover, the model includes several existing multidimensional models as special cases, including Whitely's multicomponent latent trait model, Andersen's multidimensional Rasch model for repeated testing, and Embretson's multidimensional Rasch model for learning and change. Marginal maximum likelihood estimators for the model are derived and the estimation is examined using a simulation study. Implications and applications of the model are discussed and an example is given.
Article
The precision of item parameter estimates can be increased by taking advantage of dependencies between the latent proficiency variable and auxiliary examinee variables such as age, courses taken, and years of schooling. Gains roughly equivalent to two to six additional item responses can be expected in typical educational and psychological applications. Empirical Bayesian computational procedures are presented and illustrated with data from the National Assessment of Educational Progress survey.
Article
An item response model, called the ordered partition model, is designed for a measurement context in which the categories of response to an item cannot be completely ordered. For example, two different solution strategies may lead to an equivalent degree of success because both strategies may result in the same score, but an examiner may want to maintain the distinction between the strategies. Thus, the data would be neither nominal nor completely ordered, so they may not be suitable for other polytomous item response models such as the partial credit or the graded response models. The ordered partition model is described as an extension of the partial credit model, its relationship to other models is discussed, and two examples are presented.
Article
A method of estimating the parameters of the normal ogive model for dichotomously scored item-responses by maximum likelihood is demonstrated. Although the procedure requires numerical integration in order to evaluate the likelihood equations, a computer implemented Newton-Raphson solution is shown to be straightforward in other respects. Empirical tests of the procedure show that the resulting estimates are very similar to those based on a conventional analysis of item difficulties and first factor loadings obtained from the matrix of tetrachoric correlation coefficients. Problems of testing the fit of the model, and of obtaining invariant parameters are discussed.
Article
The Partial Credit model with a varying slope parameter has been developed, and it is called the Generalized Partial Credit model. The item step parameter of this model is decomposed into a location and a threshold parameter, following Andrich's Rating Scale formulation. The EM algorithm for estimating the model parameters was derived. The performance of this generalized model is compared with a Rasch family of polytomous item response models based on both simulated and real data. Simulated data were generated and then analyzed by the various polytomous item response models. The results obtained demonstrate that the rating formulation of the Generalized Partial Credit model is quite adaptable to the analysis of polytomous item responses. The real data used in this study consisted of NAEP Mathematics data which was made up of both dichotomous and polytomous item types. The Partial Credit model was applied to these data using both constant and varying slope parameters. The Generalized Partial Credit model, which provides for varying slope parameters, yielded better fit to data than the Partial Credit model without such a provision. Index terms: item response model, polytomous item response model, the Partial Credit model, the Rating Scale model, the Nominal Response model, NAEP.
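A notational sketch of the generalized partial credit form with the location-plus-threshold decomposition mentioned above (my symbols; the sum for k = 0 is taken to be zero):

  \Pr(X_i = k \mid \theta) = \frac{\exp\sum_{v=1}^{k} a_i(\theta - b_{iv})}{\sum_{c=0}^{m_i}\exp\sum_{v=1}^{c} a_i(\theta - b_{iv})}, \qquad b_{iv} = b_i - d_v,

where a_i is the item slope, b_i the item location, and d_v the threshold parameters; fixing a_i = 1 recovers the Partial Credit model.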
Article
The search for appropriate statistical methods for hierarchical, multilevel data has been a prominent theme in educational statistics over the past 15 years. As a result of this search, an important class of models, termed hierarchical linear models by this review, has emerged. In the paradigmatic application of such models, observations within each group (e.g., classroom or school) vary as a function of group-level or "microparameters." However, these microparameters vary randomly across the population of groups as a function of "macroparameters." Research interest has focused on estimation of both micro- and macroparameters. This paper reviews estimation theory and application of such models. Also, the logic of these methods is extended beyond the paradigmatic case to include research domains as diverse as panel studies, meta-analysis, and classical test theory. Microparameters to be estimated may be as diverse as means, proportions, variances, linear regression coefficients, and logit linear regression coefficients. Estimation theory is reviewed from Bayes and empirical Bayes viewpoints and the examples considered involve data sets with two levels of hierarchy.
Article
The "Nation's Report Card," the National Assessment of Educational Progress (NAEP), is the only nationally representative and continuing assessment of what America's students know and can do in various subject areas. This report summarizes some of the sophisticated statistical methodology used in the 1992 Trial State Assessment of Mathematics. Chapters include: (1) "Overview: The Design, Implementation, and Analysis of the Trial State Mathematics Assessment Program" (Eugene G. Johnson, Stephen L. Koffler, and John Mazzeo); (2) "Developing the Mathematics Objectives, Cognitive Items, Background Questions, and Assessment Instruments" (Stephen L. Koffler); (3) "Sample Design and Selection" (Leyla K. Mohadjer, Keith F. Rust, Valerija Smith, and Jacqueline Severynse); (4) "State and School Cooperation and Field Administration" (Nancy Caldwell); (5) "Processing Assessment Materials" (Dianne Smrdel, Linda Reynolds, and Brad Thayer); (6) "Creation of the Database and Evaluation of the Quality Control of Data Entry" (John J. Ferris and David S. Freund); (7) "Weighting Procedures and Variance Estimation" (Adam Chu and Keith F. Rust); (8) "Theoretical Background and Philosophy of NAEP Scaling Procedures" (Eugene G. Johnson, Robert J. Mislevy, and Neal Thomas); (9) "Data Analysis and Scaling for the 1992 Trial State Assessment in Mathematics" (John Mazzeo, Huahua Chang, Edward Kulick, Y. Fai Fong, and Angela Grima); and (10) "Conventions Used in Reporting the Results of the 1992 Trial State Assessment in Mathematics" (John Mazzeo). Contains extensive appendixes and 68 references. (JRH)
Article
Standard procedures for estimating item parameters in Item Response Theory models make no use of auxiliary information about test items, such as their format or content, or the skills they require for solution. This paper describes a framework for exploiting this information, thereby enhancing the precision and stability of item parameter estimates and providing diagnostic information about items' operating characteristics. In the proposed model, final item parameter estimates represent a compromise between Linear Logistic Test Model estimates, where items with identical features would have identical estimates, and unrestricted maximum likelihood estimates. The principles were illustrated in a context for which a relatively simple approximation is available: empirical Bayes (EB) estimation of Rasch item difficulty parameters. Computation proceeded in three steps: (1) unrestricted maximum likelihood estimates of item parameters; (2) point estimates of the regression parameters; and (3) final estimates of item parameters. A numerical example applied EB estimation procedures to the responses from 150 sixth graders on the Fractions subtest of the California Achievement Test. Three models, varying in their assumptions of item exchangeability, were fitted to the data. Analysis showed that auxiliary information about item features contributed as much information about item parameters as the likelihood function did. (Author/LPG)
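One common shrinkage form consistent with the "compromise" described above (my notation, not necessarily the paper's exact expressions):

  \hat{b}_i^{EB} = w_i\,\hat{b}_i^{ML} + (1 - w_i)\,\mathbf{q}_i'\hat{\boldsymbol\beta}, \qquad
  w_i = \frac{\hat\tau^2}{\hat\tau^2 + \mathrm{SE}^2(\hat{b}_i^{ML})},

where \mathbf{q}_i codes the item's features (the LLTM design vector), \hat{\boldsymbol\beta} is the regression of difficulty on those features, and \hat\tau^2 is the residual variance of difficulties around that regression; precisely estimated items are shrunk less toward the feature-based prediction.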
Article
The multiple-matrix item sampling designs that provide information about population characteristics most efficiently administer too few responses to students to estimate their proficiencies individually. Marginal estimation procedures, which estimate population characteristics directly from item responses, must be employed to realize the benefits of such a sampling design. Numerical approximations of the appropriate marginal estimation procedures for a broad variety of analyses can be obtained by constructing, from the results of a comprehensive extensive marginal solution, files of plausible values of student proficiencies. This article develops the concepts behind plausible values in a simplified setting, sketches their use in the National Assessment of Educational Progress (NAEP), and illustrates the approach with data from the Scholastic Aptitude Test (SAT).
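A minimal, illustrative sketch (in Python; the function name and example numbers are mine, not part of NAEP's machinery) of how an analysis run separately on each set of plausible values is typically combined using Rubin's multiple-imputation rules:

  import numpy as np

  def combine_plausible_values(estimates, variances):
      # estimates: the statistic of interest computed once per plausible-value set
      # variances: the corresponding sampling variances, one per set
      estimates = np.asarray(estimates, dtype=float)
      variances = np.asarray(variances, dtype=float)
      m = len(estimates)
      q_bar = estimates.mean()                 # combined point estimate
      u_bar = variances.mean()                 # average within-set variance
      b = estimates.var(ddof=1)                # between-set variance
      total_var = u_bar + (1.0 + 1.0 / m) * b  # total variance (Rubin's rule)
      return q_bar, total_var

  # Example: a regression coefficient estimated on five plausible-value draws.
  estimate, variance = combine_plausible_values(
      [0.42, 0.45, 0.40, 0.44, 0.43],
      [0.010, 0.011, 0.009, 0.010, 0.010])

The between-set component is what carries the uncertainty due to the latent nature of proficiency, which a single point-estimate analysis would ignore.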
Article
Maximum likelihood estimation of item parameters in the marginal distribution, integrating over the distribution of ability, becomes practical when computing procedures based on an EM algorithm are used. By characterizing the ability distribution empirically, arbitrary assumptions about its form are avoided. The EM procedure is shown to apply to general item-response models lacking simple sufficient statistics for ability. This includes models with more than one latent dimension.
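A minimal sketch of the marginal likelihood being maximized (my notation): with \boldsymbol\xi the item parameters, \mathbf{x}_j examinee j's response vector, and g(\theta) the ability distribution,

  L(\boldsymbol\xi) = \prod_{j=1}^{N} \int \Pr(\mathbf{x}_j \mid \theta, \boldsymbol\xi)\, g(\theta)\, d\theta,

with g(\theta) characterized empirically (e.g., as weights on a set of quadrature points) rather than assumed to have a particular parametric form.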
Article
Summary: A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
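For readers less familiar with the algorithm, a minimal sketch in my own notation, with \mathbf{x} the observed (incomplete) data and \mathbf{z} the missing data:

  \text{E step: } Q(\phi \mid \phi^{(t)}) = \mathrm{E}\big[\log f(\mathbf{x}, \mathbf{z} \mid \phi) \,\big|\, \mathbf{x}, \phi^{(t)}\big]
  \text{M step: } \phi^{(t+1)} = \arg\max_{\phi}\, Q(\phi \mid \phi^{(t)})

Each iteration does not decrease the observed-data likelihood, which underlies the monotone behaviour referred to in the summary.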
Article
Thesis (Ph. D. in Education)--University of California, Berkeley, May 1994. Includes bibliographical references (134-139).
Article
A procedure is proposed for the analysis of multilevel nonlinear models using a linearization. The case of log linear models for discrete response data is studied in detail.
Article
This paper discusses the application of a class of Rasch models to situations where test items are grouped into subsets and the common attributes of items within these subsets bring into question the usual assumption of conditional independence. The models are all expressed as particular cases of the random coefficients multinomial logit model developed by Adams and Wilson. This formulation allows a very flexible approach to the specification of alternative models, and makes model testing particularly straightforward. The use of the models is illustrated using item bundles constructed in the framework of the SOLO taxonomy of Biggs and Collis.
Article
Standard procedures for drawing inferences from complex samples do not apply when the variable of interest θ cannot be observed directly, but must be inferred from the values of secondary random variables that depend on θ stochastically. Examples are proficiency variables in item response models and class memberships in latent class models. Rubin's multiple imputation techniques yield approximations of sample statistics that would have been obtained, had θ been observable, and associated variance estimates that account for uncertainty due to both the sampling of respondents and the latent nature of θ. The approach is illustrated with data from the National Assessment for Educational Progress.
Article
Consider vectors of item responses obtained from a sample of subjects from a population in which ability θ is distributed with density g(θ; τ), where the τ are unknown parameters. Assuming the responses depend on θ through a fully specified item response model, this paper presents maximum likelihood equations for the estimation of the population parameters directly from the observed responses; i.e., without estimating an ability parameter for each subject. Also provided are asymptotic standard errors and tests of fit, computing approximations, and details of four special cases: a non-parametric approximation, a normal solution, a resolution of normal components, and a beta-binomial solution.
Article
The problem of characterizing the manifest probabilities of a latent trait model is considered. The item characteristic curve is transformed to the item passing-odds curve and a corresponding transformation is made on the distribution of ability. This results in a useful expression for the manifest probabilities of any latent trait model. The result is then applied to give a characterization of the Rasch model as a log-linear model for a 2^J contingency table. Partial results are also obtained for other models. The question of the identifiability of “guessing” parameters is also discussed.
Article
A unidimensional latent trait model for responses scored in two or more ordered categories is developed. This “Partial Credit” model is a member of the family of latent trait models which share the property of parameter separability and so permit “specifically objective” comparisons of persons and items. The model can be viewed as an extension of Andrich's Rating Scale model to situations in which ordered response alternatives are free to vary in number and structure from item to item. The difference between the parameters in this model and the “category boundaries” in Samejima's Graded Response model is demonstrated. An unconditional maximum likelihood procedure for estimating the model parameters is developed.
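A notational sketch of the Partial Credit model described above (my symbols; the sum for x = 0 is taken to be zero):

  \Pr(X_i = x \mid \theta) = \frac{\exp\sum_{j=1}^{x}(\theta - \delta_{ij})}{\sum_{h=0}^{m_i}\exp\sum_{j=1}^{h}(\theta - \delta_{ij})}, \qquad x = 0, 1, \ldots, m_i,

where \delta_{ij} is the jth step parameter of item i and the number of ordered categories m_i + 1 may vary from item to item.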
Article
This paper presents a general mixed model for the analysis of serial dichotomous responses provided by a panel of study participants. Each subject's serial responses are assumed to arise from a logistic model, but with regression coefficients that vary between subjects. The logistic regression parameters are assumed to be normally distributed in the population. Inference is based upon maximum likelihood estimation of fixed effects and variance components, and empirical Bayes estimation of random effects. Exact solutions are analytically and computationally infeasible, but an approximation based on the mode of the posterior distribution of the random parameters is proposed, and is implemented by means of the EM algorithm. This approximate method is compared with a simpler two-step method proposed by Korn and Whittemore (1979, Biometrics 35, 795-804), using data from a panel study of asthmatics originally described in that paper. One advantage of the estimation strategy described here is the ability to use all of the data, including that from subjects with insufficient data to permit fitting of a separate logistic regression model, as required by the Korn and Whittemore method. However, the new method is computationally intensive.
References

Adams, R. J. (1989). Estimating measurement error and its effects on statistical analysis. Unpublished doctoral dissertation, University of Chicago.
Adams, R. J., Doig, B. A., & Rosier, M. (1991). Science learning in Victorian schools (Research Monograph 41). Hawthorn, Australia: Australian Council for Educational Research.
Adams, R. J., & Wilson, M. R. (1996). A random coefficients multinomial logit: A generalized approach to fitting Rasch models. In G. Engelhard & M. Wilson (Eds.), Objective measurement III: Theory into practice (pp. 143-166). Norwood, NJ: Ablex.
Cressie, N., & Holland, P. W. (1983). Characterizing the manifest probabilities of latent trait models. Psychometrika, 48, 129-141.
de Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational Statistics, 11, 183-196.
Kelderman, H. (1989, March). Loglinear multidimensional IRT models for polytomously scored items. Paper presented at the Fifth International Objective Measurement Workshop, University of California, Berkeley.
Linacre, J. M. (1989). Many faceted Rasch measurement. Unpublished doctoral dissertation, University of Chicago.
Mislevy, R. J., & Bock, R. D. (1983). BILOG: Analysis and scoring of binary items with one-, two-, and three-parameter logistic models. Mooresville, IN: Scientific Software Inc.
Wu, M. L., & Adams, R. J. (1993, April). Simulating parameter recovery for the random coefficients multinomial logit. Paper presented at the Seventh International Objective Measurement Workshop, Atlanta, GA.
Wu, M. L., Adams, R. J., & Wilson, M. R. (1995, April). MATS: Multi-aspect test software. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco.