Article

Item Response Theory: Parameter Estimation Techniques

... In this sense, we assume the graded item response approach combined with the ability and item information functions proposed by [8,81]. The quality scales have been developed adopting the Item Response Theory (IRT) model [7]. ...
... In IRT, the evaluation information is defined in terms of item information functions I_i(θ), which measure how well responses in that category estimate the examinee's ability [8]. ...
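For the two-parameter logistic (2PL) model, the item information function mentioned above has the closed form I_j(θ) = a_j² P_j(θ)(1 − P_j(θ)). A minimal sketch (the item parameters below are illustrative, not taken from the cited study):

```python
import math

def icc_2pl(theta, a, b):
    """Item characteristic curve: probability of a correct response
    under the two-parameter logistic (2PL) model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P).
    Information peaks where theta equals the difficulty b."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

print(item_information(0.0, a=1.5, b=0.0))  # 0.5625, the maximum for this item
print(item_information(2.0, a=1.5, b=0.0))  # smaller, away from b
```

Summing I_j(θ) over the items of a test gives the test information function, whose inverse approximates the squared standard error of the ability estimate at θ.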
... On a graded scale from 1 (strongly disagree) to 5 (strongly agree), the assessment items M1 to M7 received values above 3.5 (average 3.8). Despite the small sample of specialists, all reported having good experience with key concepts of awareness (D1), collaboration (D2), and HCI (D3), corroborating the quality of the responses. ...
Article
Full-text available
Awareness has been a valuable concept in collaborative systems since its formation, an essential part of groupware. Awareness research followed the evolution of the whole field over the last decades. We can see the progress in a mutual understanding of awareness and developing concepts and technology of awareness support. An efficient awareness mechanism ensures a better understanding and, consequently, a better projection of future actions; in contrast, the lack of these mechanisms undermines comprehension and prevents participants from projecting their work accordingly. Few works present methods or processes that assist in providing awareness in groupware systems; most common strategies focus on the design/development stages or are ad-hoc evaluation models. Furthermore, there are no standardized tests for awareness assessment; thus, measures must be established to assess awareness and identify the criteria for achieving awareness indicators. This work establishes an assessment model for collaborative interfaces by analyzing the awareness mechanisms provided from the participant’s viewpoint. In this model, we consider the participant’s skill in understanding the awareness and the difficulty involved, providing advances toward designing, developing, and evaluating groupware systems. The proposed assessment model allows us to measure the awareness support provided considering the collaboration, workspace, and contextual awareness perspectives. Assuming a plural collaborative environment, where different participants with different skills, knowledge, and wisdom meet and interact, this model seeks to build a more faithful representation of these existing profiles across a broad spectrum of individual abilities.
... Inclusion of person covariates in the 2PCMPM would offer the advantages of a one-step procedure, e.g., regarding uncertainty quantification. IRT models are often used to investigate reliability as a function of the latent trait (Baker & Kim, 2004). The inclusion of person covariates in the 2PCMPM would allow doing so while controlling for construct-irrelevant person covariates, such as typing speed in the context of DT or verbal fluency (Forthmann et al., 2017). ...
... In Equation (1), a_j denotes the slope and d_j denotes the intercept in this slope-intercept parameterization (as opposed to the also commonly used discrimination-difficulty parameterization, λ_ij = exp(a_j(θ_i − b_j)), obtainable from Equation (1) with discrimination a_j and difficulty b_j = −d_j/a_j). The slope-intercept parameterization is often used in IRT method development and software implementations (Baker & Kim, 2004). We found this parameterization helpful for including item and person covariates. ...
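The mapping between the two parameterizations can be sketched directly; the function names and example values below are ours, chosen for illustration:

```python
def slope_intercept_to_irt(a, d):
    """Convert slope-intercept parameters (logit = a*theta + d) to the
    discrimination-difficulty form a*(theta - b), with b = -d / a."""
    return a, -d / a

def irt_to_slope_intercept(a, b):
    """Inverse conversion: intercept d = -a * b."""
    return a, -a * b

disc, diff = slope_intercept_to_irt(1.2, -0.6)
print(diff)  # 0.5: an intercept of -0.6 with slope 1.2 is a difficulty of 0.5
```

The round trip is exact, which is why software can estimate in slope-intercept form and report discrimination-difficulty parameters afterwards.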
... Even for binary data, which has received considerably more attention in IRT research than count data, the best known explanatory models (i.e., the Log-Linear Test Model, LLTM; Fischer, 1973; for item covariates and the Latent Regression Model, LRM; Zwinderman, 1991; for person covariates) are also based on a one-parameter (Rasch) model (but note that explanatory versions are available for other IRT models, such as for the 2PL model; see De Boeck & Wilson, 2004, for more details). However, empirical data are often better described by (at least) two-parameter models (Baker & Kim, 2004). Thus, understanding item differences between more than difficulty parameters (i.e., between discriminations and dispersions) is very relevant. ...
Article
In psychology and education, tests (e.g., reading tests) and self-reports (e.g., clinical questionnaires) generate counts, but corresponding Item Response Theory (IRT) methods are underdeveloped compared to binary data. Recent advances include the Two-Parameter Conway-Maxwell-Poisson model (2PCMPM), generalizing Rasch’s Poisson Counts Model, with item-specific difficulty, discrimination, and dispersion parameters. Explaining differences in model parameters informs item construction and selection but has received little attention. We introduce two 2PCMPM-based explanatory count IRT models: The Distributional Regression Test Model for item covariates, and the Count Latent Regression Model for (categorical) person covariates. Estimation methods are provided and satisfactory statistical properties are observed in simulations. Two examples illustrate how the models help understand tests and underlying constructs.
... Demographic changes and political economic conditions have intensified the need and demand for more efficient health care operations, including a call to reduce elective surgery wait-times. For example, Health Quality Ontario, an organization established to advise the province regarding the performance of its $55 billion annual health care expenditures, maintains an up-to-date public Internet dashboard listing surgical wait-times for six key categories of procedures, not only at the provincial level but also by region and individual hospital [1]. More recently, the province of Ontario has been trying to find ways, including the privatization of healthcare, to reduce high and chronic surgical wait-times, which have worsened considerably over the years and during the pandemic [2,3]. ...
... Item response theory (IRT; [1]) is the statistical analysis of test items in education, psychology, and other fields of social sciences. Typically, a number of test items are administered to test-takers. ...
... Some identification constraints on item parameters γ_i or distribution parameters α must be imposed to ensure model identification [9]. Once the IRT model (1) has been estimated, individual abilities θ can be estimated by maximizing the log-likelihood function l, which yields the most likely value of θ given a vector of item responses x. The log-likelihood function is given by (see [1]) ...
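Although the excerpt's formula is cut off, the usual binary-response log-likelihood and its maximization can be sketched for the 2PL case. The item parameters are illustrative, and a grid search stands in for the Newton-Raphson iteration a production implementation would use:

```python
import math

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response vector under the 2PL model;
    `items` is a list of (discrimination a, difficulty b) pairs."""
    ll = 0.0
    for x, (a, b) in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def mle_ability(responses, items):
    """Maximum-likelihood ability estimate via a simple grid search
    over [-4, 4]."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
print(mle_ability([1, 1, 0], items))
```

For all-correct or all-incorrect response patterns the likelihood has no interior maximum, which is one reason the Bayesian alternatives discussed in other excerpts here are often preferred.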
Article
Full-text available
In a series of papers, Dimitrov suggested the classical D-scoring rule for scoring items that give difficult items a higher weight while easier items receive a lower weight. The latent D-scoring model has been proposed to serve as a latent mirror of the classical D-scoring model. However, the item weights implied by this latent D-scoring model are typically only weakly related to the weights in the classical D-scoring model. To this end, this article proposes an alternative item response model, the modified Ramsay quotient model, that is better-suited as a latent mirror of the classical D-scoring model. The reasoning is based on analytical arguments and numerical illustrations.
... Parameter estimation has been a major concern in the application of IRT models. Estimation may be performed in four ways: joint maximum likelihood, conditional maximum likelihood, marginal maximum likelihood, and Bayesian estimation with a Markov chain Monte Carlo (MCMC) algorithm [13]. In addition, two newer algorithms have gained researchers' attention in recent years. ...
... where f(·) is the corresponding probability distribution function of the augmented variable L_ik, I(·) denotes the indicator function, and π(·) denotes the prior distribution. A natural prior for the latent trait θ_i is the standard normal distribution, N(0, 1), which has been commonly used for calculating the posterior mode or mean of the latent trait in IRT-based scoring [13]. Because the item discrimination parameters are usually restricted to be positive, a natural prior for α_k is a truncated normal distribution, e.g. ...
... From Figure 8, we can see that for the small and large total-score parts, all the methods fit the data well and perform similarly; for the medium total-score part (i.e., scores of 10–20), the three methods slightly underestimate the observed score frequency. ...
Article
In the context of both cognitive and affective tests, items are usually designed to involve more than two responses, for which polytomous models are applicable. The purpose of this paper is to propose a highly effective Pólya-Gamma Gibbs sampling algorithm based on auxiliary variables to estimate the multidimensional graded response model that has been widely used in psychological, educational, and health-related assessment. The strategy is based on the Pólya-Gamma family of distributions, which provides a closed-form posterior distribution for logistic-based models. With the introduction of the two latent variables, the full conditional distributions are tractable, and consequently the Gibbs sampling is easy to implement. Desirable features, including the empirical performance of the proposed methodology, are demonstrated by simulation studies. Finally, two empirical data sets were analysed to demonstrate the efficiency and utility of the proposed method.
... From an empirical perspective, we show that the proposed algorithm substantially improves the cost-accuracy trade-offs compared with the baselines on several real-world datasets from various domains collected at Meta. We also propose a third variant, POAKI, which reduces the number of latent variables by incorporating item response theory (IRT) (Baker and Kim 2004). ...
... where d_j, b_j, and p0_j correspond to item j's difficulty, separation, and base rate, and P_IRT is the probability that answer x_ij is correct (Baker and Kim 2004). We adopt the concept and the formula but use it in two unusual ways: ...
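The excerpt does not reproduce the cited formula itself. A standard three-parameter form consistent with these parameter names treats the base rate p0_j as a lower asymptote; this specific functional form is our assumption for illustration, not necessarily the one used in the paper:

```python
import math

def p_irt(theta, d, b, p0):
    """3PL-style response curve: difficulty d, separation (slope) b,
    and base rate p0 acting as the lower asymptote."""
    return p0 + (1.0 - p0) / (1.0 + math.exp(-b * (theta - d)))

print(p_irt(0.0, d=0.0, b=1.0, p0=0.25))  # 0.625: halfway between p0 and 1
```

As theta falls far below the difficulty, the probability approaches the base rate p0 rather than zero, which is what makes a base-rate parameter useful for modeling labelers who are right by chance.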
Article
Crowdsourcing platforms use various truth discovery algorithms to aggregate annotations from multiple labelers. In an online setting, however, the main challenge is to decide whether to ask for more annotations for each item to efficiently trade off cost (i.e., the number of annotations) for quality of the aggregated annotations. In this paper, we propose a novel approach for general complex annotation (such as bounding boxes and taxonomy paths), that works in an online crowdsourcing setting. We prove that the expected average similarity of a labeler is linear in their accuracy conditional on the reported label. This enables us to infer reported label accuracy in a broad range of scenarios. We conduct extensive evaluations on real-world crowdsourcing data from Meta and show the effectiveness of our proposed online algorithms in improving the cost-quality trade-off.
... The most adopted method for estimating IRT parameters is marginal maximum likelihood (MML) based on the work of Bock and Lieberman (1970) and Bock and Aitkin (1981). Other estimation methods have been reviewed by Baker (1992) and Ackerman (1991). Existing literature focuses on technical and theoretical comparison of IRT with classical models. ...
... The most adopted method for estimating IRT parameters is marginal maximum likelihood (MML), based on the work of Bock and Lieberman (1970) and Bock and Aitkin (1981). Other estimation methods have been reviewed by Baker (1992), and an MCMC approach has been described by Patz and Junker (1999). Thomas and Cyr (2002) used the three-parameter logistic model and discussed various IRT issues, including point and variance estimates of item parameters, the potential for bias due to ignoring survey weights, biases in the distribution of ability predictors, and the dependence of this bias on test length. ...
Article
Economic efficiency demands an accurate assessment of individual ability for selection purposes. This study investigates Classical Test Theory (CTT) and Item Response Theory (IRT) for estimating true ability and ranking individuals. Two Monte Carlo simulations and real data analyses were conducted. Results suggest a slight advantage for IRT, but ability estimates from both methods were highly correlated (r=0.95), indicating similar outcomes. The Logistic two-parameter IRT model emerged as the most reliable and rigorous approach.
... 35 Only a few items had partially missing difficulty threshold β, and respondents had no choice of extremely low values. 36 This might be related to the fact that respondents were all thoracic surgery nurses with relevant work experience, and low-value options were not consistent with the actual situations of these nurses. In future research, the expression of these options can be improved. ...
... The present questionnaire was developed by referring to the relevant literature and consulting thoracic surgery experts from hospitals and universities in many locations across China, and it was based on KAP theory. 36,37 The team members have rich research experience in the field, and the team was comprehensive and extensive, representing the development level of thoracic surgery in Mainland China. This rendered the questionnaire items highly relevant to both theory and practice. ...
Article
Full-text available
Objective This study aims to develop and validate a suitable scale for assessing the level of nurses' knowledge and practice of perioperative pulmonary rehabilitation. Methods We divided the study into two phases: scale development and validation. In Phase 1, the initial items were generated through a literature review. In Phase 2, a cross-sectional survey was conducted involving 603 thoracic nurses to evaluate the scale's validity, reliability, and the difficulty and differentiation of its items. Item and exploratory factor analyses were performed for item reduction. Thereafter, validity, reliability, and the difficulty and differentiation of items were assessed using Cronbach's α coefficient, retest reliability, content validity, and item response theory (IRT). Results The final questionnaire comprised 34 items, and exploratory factor analysis revealed 3 common dimensions with internal consistency coefficients of 0.950, 0.959, and 0.965. The overall internal consistency of the scale was 0.966, with a split-half reliability of 0.779 and a retest reliability Pearson's correlation coefficient of 0.936. The content validity of the scale was excellent (item-level content validity index = 0.875–1.000, scale-level content validity index = 0.978). The difficulty and differentiation of the items under item response theory were verified to a certain extent (average value = 2.391; threshold β values = −1.393–0.820). Conclusions The knowledge–attitudes–practices questionnaire for nurses can be used as a tool to evaluate knowledge, attitudes, and practices among nurses regarding perioperative pulmonary rehabilitation for patients with lung cancer.
... Latent trait models, also known as item response theory (IRT) models, have gained widespread application in educational testing and psychological measurement (Lord and Novick, 1968;van der Linden and Hambleton, 1997;Embretson and Reise, 2000;Baker and Kim, 2004). These models utilize the probability of a response to establish the interaction between an examinee's "ability" and the characteristics of the test items, such as difficulty and guessing. ...
... The normal ogive IRT model (Lord, 1980; van der Linden and Hambleton, 1997; Embretson and Reise, 2000; Baker and Kim, 2004), also known as the one-parameter normal ogive model, is a mathematical model used in psychometrics to relate the latent ability of an examinee to the probability of a correct response on a test item. This model, as a component of IRT, facilitates the design, analysis, and scoring of tests, questionnaires, and comparable instruments intended for the measurement of abilities, attitudes, or other variables. ...
Article
Full-text available
This paper primarily analyzes the one-parameter generalized logistic (1PGlogit) model, which is a generalized model containing other one-parameter item response theory (IRT) models. The essence of the 1PGlogit model is the introduction of a generalized link function that includes the probit, logit, and complementary log-log functions. By transforming different parameters, the 1PGlogit model can flexibly adjust the speed at which the item characteristic curve (ICC) approaches the upper and lower asymptote, breaking the previous constraints in one-parameter IRT models where the ICC curves were either all symmetric or all asymmetric. This allows for a more flexible way to fit data and achieve better fitting performance. We present three simulation studies, specifically designed to validate the accuracy of parameter estimation for a variety of one-parameter IRT models using the Stan program, illustrate the advantages of the 1PGlogit model over other one-parameter IRT models from a model fitting perspective, and demonstrate the effective fit of the 1PGlogit model with the three-parameter logistic (3PL) and four-parameter logistic (4PL) models. Finally, we demonstrate the good fitting performance of the 1PGlogit model through an analysis of real data.
... Typically, the MC items are scored dichotomously and the CR items are scored polytomously. Item response theory (IRT) models are often used for calibrating dichotomous or polytomous test data (Baker & Kim, 2004; Nering & Ostini, 2011). Tests with such mixed item formats are calibrated by modeling both item formats in a single analysis using combinations of different IRT models, such as a combination of the three-parameter logistic model (3PLM; Birnbaum, 1968) and the graded response model (GRM; Samejima, 1969). ...
... Bayesian and maximum likelihood estimation are two common methods for ability parameter estimation. Although the latter requires less computation time, it may not be as stable as Bayesian estimation (Baker & Kim, 2004). Bayesian solutions are often employed to estimate IRT ability parameters, such as the expected a posteriori (EAP; Bock & Mislevy, 1982) for either dichotomous or polytomous items alone. ...
Article
Full-text available
Large-scale tests often contain mixed-format items, such as when multiple-choice (MC) items and constructed-response (CR) items are both contained in the same test. Although previous research has analyzed both types of items simultaneously, this may not always provide the best estimate of ability. In this paper, a two-step sequential Bayesian (SB) analytic method under the concept of empirical Bayes is explored for mixed item response models. This method integrates ability estimates from different item formats. Unlike the empirical Bayes method, the SB method estimates examinees’ posterior ability parameters with individual-level sample-dependent prior distributions estimated from the MC items. Simulations were used to evaluate the accuracy of recovery of ability and item parameters over four factors: the type of the ability distribution, sample size, test length (number of items for each item type), and person/item parameter estimation method. The SB method was compared with a traditional concurrent Bayesian (CB) calibration method, EAPsum, that uses scaled scores for summed scores to estimate parameters from the MC and CR items simultaneously in one estimation step. From the simulation results, the SB method showed more accurate and reliable ability estimation than the CB method, especially when the sample size was small (150 and 500). Both methods presented similar recovery results for MC item parameters, but the CB method yielded a bit better recovery of the CR item parameters. The empirical example suggested that posterior ability estimated by the proposed SB method had higher reliability than the CB method.
... The option of using proficiency estimates (theta, θ) modeled with Item Response Theory (IRT; Andrade et al., 2021; Baker & Kim, 2004), for example, is harder to explain to a lay audience. Nevertheless, efforts to translate these concepts for that audience should be made, given the advantages such methods provide (e.g., item parameters are independent of the subjects, error estimates are available at each proficiency level, etc.). ...
... IRT, however, assumes the parameter-invariance property, considered its main distinction from Classical Test Theory (CTT). This principle states that, when a full set of items fits an IRT model well, the items' psychometric parameters do not depend on the examinees' ability, and that ability can be estimated independently of the difficulty of the test used (Baker & Kim, 2004). Research by Conde and Laros (2007) found a negative relationship between the IRT invariance property and the degree to which a measure lacks unidimensionality. ...
Article
Full-text available
Psychological assessment (PA) and educational assessment (EA) are among the most important contributions of cognitive and behavioral sciences to modern society. They provide important information about individuals and groups that are a part of the society. The aim of this article is to present guidelines for researchers regarding PA and large-scale learning assessments (LSLAs). The paths of PA in a world in health crisis due to the Covid-19 pandemic are discussed. In the context of LSLAs, we discuss theoretical, methodological and analytical aspects that must be considered by evaluators and researchers in the area. We conclude that PA and LSLAs are related to the extent that both fulfill the social function of identifying gaps that deserve attention, as well as functional aspects that must be maintained and encouraged. Another important characteristic is the requirement for constant technical improvement by both evaluators and researchers.
... Item calibration estimates the item parameters from response data by marginalizing the examinee ability θ from the likelihood in order to ensure the asymptotic consistency of the item parameter estimates. Specifically, marginal maximum likelihood (MML) estimation using an expectationmaximization (EM) algorithm has been widely used for item calibration (Baker and Kim, 2004). Given calibrated item parameters, the ability estimation phase calculates the examinee's ability θ. ...
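A toy version of the MML-EM item calibration described above can be sketched for the Rasch model. Equally spaced nodes with normal-density weights stand in for a proper Gauss-Hermite quadrature, the data are invented, and the inner Newton loop is deliberately minimal:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rasch_mml_em(data, n_iters=50):
    """Minimal MML-EM item calibration for the Rasch model.
    `data` is a list of 0/1 response vectors (one per examinee).
    Ability is integrated out over fixed quadrature nodes under a
    standard normal prior; the M-step runs a few Newton updates per
    item difficulty. Illustrative, not production quality."""
    n_items = len(data[0])
    nodes = [i / 2.0 for i in range(-8, 9)]           # theta grid on [-4, 4]
    prior = [math.exp(-t * t / 2.0) for t in nodes]   # normal-density weights
    s = sum(prior)
    prior = [w / s for w in prior]
    b = [0.0] * n_items                               # item difficulties

    for _ in range(n_iters):
        # E-step: expected examinee counts nk and correct counts rjk per node
        nk = [0.0] * len(nodes)
        rjk = [[0.0] * len(nodes) for _ in range(n_items)]
        for resp in data:
            post = []
            for k, t in enumerate(nodes):
                like = prior[k]
                for j, x in enumerate(resp):
                    p = sigmoid(t - b[j])
                    like *= p if x else (1.0 - p)
                post.append(like)
            tot = sum(post)
            for k in range(len(nodes)):
                w = post[k] / tot
                nk[k] += w
                for j, x in enumerate(resp):
                    if x:
                        rjk[j][k] += w
        # M-step: Newton updates of each difficulty
        for j in range(n_items):
            for _ in range(5):
                g = h = 0.0
                for k, t in enumerate(nodes):
                    p = sigmoid(t - b[j])
                    g += rjk[j][k] - nk[k] * p
                    h += nk[k] * p * (1.0 - p)
                b[j] -= g / h
    return b

data = [[1, 0], [1, 0], [1, 1], [0, 0], [1, 0]]
diffs = rasch_mml_em(data)
print(diffs[0] < diffs[1])  # True: the first item was answered correctly more often
```

Marginalizing over the quadrature nodes in the E-step is what removes the examinee abilities from the likelihood, which is the point of the MML approach mentioned in the excerpt.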
... We linearly transformed the difficulty values estimated on the real-value scale (-3.96, -1.82, -0.26, 0.88, 2.01, 3.60) to positive integer values (1, 29, 49, 64, 79, 100) to make it easier for the language models to understand the numerical inputs. Table 1 shows the ability estimates θ̂ for the five QA systems, where the abilities were estimated by EAP estimation using a Gaussian quadrature (Baker and Kim, 2004), given the calibrated item-difficulty parameters. The table shows that the abilities of the five QA systems differ greatly. ...
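The EAP scoring mentioned in this excerpt is a posterior mean over quadrature nodes. In the sketch below, equally spaced nodes with normal-density weights stand in for a proper Gauss-Hermite rule, and the 2PL item parameters are illustrative:

```python
import math

def eap_ability(responses, items, n_nodes=41):
    """Expected a posteriori (EAP) ability estimate: the posterior mean
    of theta under a standard normal prior, given calibrated 2PL item
    parameters (a, b) and a 0/1 response vector."""
    nodes = [-4.0 + 8.0 * k / (n_nodes - 1) for k in range(n_nodes)]
    num = den = 0.0
    for t in nodes:
        w = math.exp(-t * t / 2.0)  # unnormalized normal prior weight
        for x, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (t - b)))
            w *= p if x else (1.0 - p)
        num += t * w
        den += w
    return num / den

items = [(1.0, -0.5), (1.2, 0.0), (0.9, 0.5)]
print(eap_ability([1, 1, 1], items))  # positive yet finite, despite a perfect score
print(eap_ability([0, 0, 0], items))  # negative
```

Unlike maximum likelihood, the EAP estimate remains finite for perfect and zero scores because the prior pulls the posterior mean toward 0.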
Conference Paper
Full-text available
Question generation (QG) for reading comprehension, a technology for automatically generating questions related to given reading passages, has been used in various applications, including in education. Recently, QG methods based on deep neural networks have succeeded in generating fluent questions that are pertinent to given reading passages. One example of how QG can be applied in education is a reading tutor that automatically offers reading comprehension questions related to various reading materials. In such an application, QG methods should provide questions with difficulty levels appropriate for each learner's reading ability in order to improve learning efficiency. Several difficulty-controllable QG methods have been proposed for doing so. However, conventional methods focus only on generating questions and cannot generate answers to them. Furthermore, they ignore the relation between question difficulty and learner ability, making it hard to determine an appropriate difficulty for each learner. To resolve these problems, we propose a new method for generating question--answer pairs that considers their difficulty, estimated using item response theory. The proposed difficulty-controllable generation is realized by extending two pre-trained transformer models: BERT and GPT-2.
... The awareness mechanisms measurement allows us to assess the general awareness quality of the collaborative environment, its presented design elements, goals, and awareness dimensions, through the estimate of the examinee's ability. In this sense, we assume the graded item response approach combined with the ability and item information functions proposed by [Samejima 1969] and [Baker and Kim 2004]. ...
... In IRT, the evaluation information is defined in terms of item information functions I_i(θ), which measure how well responses in that category estimate the examinee's ability [Baker and Kim 2004]. In our model, we assume the graded item response approach, where each item has been divided into n ordered response categories. ...
Conference Paper
[Context] Awareness has been a valuable concept in Collaborative Systems since its formation, being an essential part of groupware. An efficient awareness mechanism ensures a better understanding and, consequently, a better projection of future actions; in contrast, the lack of these mechanisms undermines comprehension and prevents participants from projecting their work accordingly. [Problem] This is a multi-factorial problem, and finding a good starting point in the literature can be challenging for novice groupware designers; they must reinvent awareness from their own experience of what it is, how it works, and how it is used. [Goal] This work establishes an assessment model for collaborative interfaces by analyzing the awareness mechanisms provided from the participant’s viewpoint. Our awareness assessment model was developed using the statistical technique Item Response Theory (IRT) and considers the participant’s skill in understanding the awareness and the difficulty involved. [Results] The proposed assessment model allows us to measure the awareness support provided considering the collaboration, workspace, and contextual awareness perspectives. The results obtained were translated into an awareness support scale, and three quality levels were defined.
... Currently, there are two main methods for estimating the examinee's ability in CAT: maximum likelihood estimation (MLE) (Baker & Kim, 2004) and Bayesian estimation methods (Bock & Mislevy, 1982; Baker & Kim, 2004). The Bayesian estimation methods are further divided into maximum a posteriori (MAP) and expected a posteriori (EAP). ...
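The MAP variant mentioned above maximizes the posterior density rather than averaging over it. A minimal sketch with a standard normal prior and illustrative 2PL item parameters; a grid search stands in for the usual Newton iteration:

```python
import math

def map_ability(responses, items):
    """Maximum a posteriori (MAP) ability estimate: maximizes the
    log-likelihood plus the standard normal log-prior. Unlike MLE,
    it stays finite for all-correct or all-incorrect patterns."""
    def log_posterior(t):
        lp = -t * t / 2.0  # N(0, 1) log-prior, up to a constant
        for x, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (t - b)))
            lp += math.log(p) if x else math.log(1.0 - p)
        return lp
    grid = [i / 100.0 for i in range(-600, 601)]
    return max(grid, key=log_posterior)

items = [(1.0, -0.5), (1.2, 0.0), (0.9, 0.5)]
print(map_ability([1, 1, 1], items))  # finite despite a perfect score
```

MAP returns the posterior mode, while EAP returns the posterior mean; for short adaptive tests the two can differ noticeably because the posterior is skewed.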
Article
Computerized Adaptive Testing (CAT) is a new testing mode that utilizes the adaptive measurement concept of "tailored to fit." Compared with traditional paper-and-pencil testing, CAT has the advantages of improving measurement accuracy, reducing test length, and ensuring test security. Therefore, it is highly regarded by researchers and practitioners both domestically and internationally. However, the platform construction of CAT involves complex statistical measurement theory and tedious numerical calculations, which hinder the application and promotion of CAT in practice. This article mainly introduces the development platform of computerized adaptive testing - flexCAT. Users can quickly build their own CAT system using the convenient human-computer interactive interface provided by the flexCAT platform. This article will introduce the first web-based computerized adaptive testing development platform in China - flexCAT, from the perspectives of its advantages, basic theory, module functions, etc. The aim is to provide free adaptive testing platform development services for research and application personnel in the fields of education, psychology, and further promote the development of psychological and educational measurement theory and technology in China. The URL for the flexCAT platform is: http://www.psychometrics-studio.cn/app/cat_demo/index.html?Id=false&Block=false.
... The basic notions of IRT rely on the individual items of a test rather than on a certain aggregate of item responses (e.g., score indicator; Baker and Kim 2004). Therefore, in this study, we use an IRT model to estimate an occupant's energy-saving behavior score; the model considers both the difficulty of given ecological behaviors and the household's ability to perform them. ...
... Therefore, in this study, we use an IRT model to estimate an occupant's energy-saving behavior score; the model considers both the difficulty of given ecological behaviors and the household's ability to perform them. IRT considers a class of latent variable models that link dichotomous and polytomous response variables (i.e., manifest factors) to a single latent factor (Baker and Kim 2004). This method models the fundamental relation between the respondent's IRT measured construct, often denoted as θ, and their probability of managing an item. ...
Article
In addition to scrutinizing the decision process behind energy efficiency investment, this study investigates its association with energy-saving behavior. Its conceptual underpinnings are based on the intersection of behavioral change and "energy efficiency paradox" theories. Based upon a rich, disaggregated dataset representative of the French housing sector, it develops an energy-saving score based on the item response theory model, which considers household attributes and ability levels. Then this score is used as an independent factor of a multivariate probit model to examine the drivers of household investment decisions for various energy performance solutions. The results highlight that: (i) contextual and attitudinal attributes are two major drivers of energy efficiency investments, and (ii) depending on the energy solution considered, there is a significant inverse relationship between energy-savings behavior and energy efficiency investments. This reveals that environmental awareness is not necessarily a driving factor behind energy efficiency investments and emphasizes the so-called "rebound effect" issue. The results support the view that promoting energy-saving behaviors and energy efficiency investments necessitate differentiated public policies that consider both individual preferences and housing stock heterogeneity. The analysis offers valuable policy guidance and research agenda outlining future energy efficiency research priorities.
... We often take g(θ) to be a standard normal distribution. The EM algorithm [3] is usually used in such a case [2]. Then, the students' abilities are obtained by maximizing the corresponding likelihood function. ...
... Then, the students' abilities are obtained by maximizing the corresponding likelihood function. To circumvent ill-conditioned cases, in which all items are answered correctly or all incorrectly, the Bayes technique is applied [2]. However, we sometimes meet other choices for g(θ), such as the uniform distribution. ...
... One of the most established models of estimating the difficulty of questions and learning problems is Item Response Theory (IRT) [40]. This model is actively used in intelligent tutoring systems for modeling the difficulty of exercise items for students [41]. ...
Article
Full-text available
Modern advances in creating shared banks of learning problems and automatic question and problem generation have led to the creation of large question banks in which human teachers cannot view every question. These questions are classified according to the knowledge necessary to solve them and the question difficulties. Constructing tests and assignments on the fly at the teacher’s request eliminates the possibility of cheating by sharing solutions because each student receives a unique set of questions. However, the random generation of predictable and effective assignments from a set of problems is a non-trivial task. In this article, an algorithm for generating assignments based on teachers’ requests for their content is proposed. The algorithm is evaluated on a bank of expression-evaluation questions containing more than 5000 questions. The evaluation shows that the proposed algorithm can guarantee the minimum expected number of target concepts (rules) in an exercise for any settings. The available bank and exercise difficulty chiefly determine the difficulty of the found questions. It is almost independent of the number of target concepts per item in the exercise: teaching more rules is achieved by rotating them among the exercise items on lower difficulty settings. An ablation study shows that all the principal components of the algorithm contribute to its performance. The proposed algorithm can be used to reliably generate individual exercises from large, automatically generated question banks according to teachers’ requests, which is important in massive open online courses.
... This was done to guarantee the independence of the two results that we intend to compare. For the IRT analysis, we fed the data into the graded model (Samejima, 1969; Samejima, 2010) from the ltm package in R (Baker and Kim, 2004; Rizopoulos, 2007), from which we extracted the IC curve parameters and means. Finally, we correlated the means of the curves with the nodes' positions. ...
Article
Full-text available
Belief network analysis (BNA) refers to a class of methods designed to detect and outline structural organizations of complex attitude systems. BNA can be used to analyze attitude-structures of abstract concepts such as ideologies, worldviews, and norm systems that inform how people perceive and navigate the world. The present manuscript presents a formal specification of the Response-Item Network (or ResIN), a new methodological approach that advances BNA in at least two important ways. First, ResIN allows for the detection of attitude asymmetries between different groups, improving the applicability and validity of BNA in research contexts that focus on intergroup differences and/or relationships. Second, ResIN’s networks include a spatial component that is directly connected to item response theory (IRT). This allows for access to latent space information in which each attitude (i.e. each response option across items in a survey) is positioned in relation to the core dimension(s) of group structure, revealing non-linearities and allowing for a more contextual and holistic interpretation of the attitudes network. To validate the effectiveness of ResIN, we develop a mathematical model and apply ResIN to both simulated and real data. Furthermore, we compare these results to existing methods of BNA and IRT. When used to analyze partisan belief-networks in the US-American political context, ResIN was able to reliably distinguish Democrat and Republican attitudes, even in highly asymmetrical attitude systems. These results demonstrate the utility of ResIN as a powerful tool for the analysis of complex attitude systems and contribute to the advancement of BNA.
... Item response theory (IRT; [1][2][3][4][5]) modeling is a class of statistical models that analyze discrete multivariate data. In these models, a vector X = (X 1 , . . . ...
Article
Full-text available
Item response theory (IRT) models are frequently used to analyze multivariate categorical data from questionnaires or cognitive test data. In order to reduce the model complexity in item response models, regularized estimation is now widely applied, adding a nondifferentiable penalty function such as the LASSO or the SCAD penalty to the log-likelihood function. In most applications, regularized estimation repeatedly estimates the IRT model on a grid of regularization parameters λ. The final model is selected for the parameter that minimizes the Akaike or Bayesian information criterion (AIC or BIC). In recent work, it has been proposed to directly minimize a smooth approximation of the AIC or the BIC for regularized estimation. This approach circumvents the repeated estimation of the IRT model. As a result, the computation time is substantially reduced. The adequacy of the new approach is demonstrated by three simulation studies focusing on regularized estimation for IRT models with differential item functioning, multidimensional IRT models with cross-loadings, and the mixed Rasch/two-parameter logistic IRT model. It was found from the simulation studies that the computationally less demanding direct optimization based on the smooth variants of AIC and BIC had comparable or improved performance compared to the ordinarily employed repeated regularized estimation based on AIC or BIC.
... These tests attempt to measure one or several hypothetical constructs that are typically unobservable, known as latent traits. According to Baker and Kim (2004), examples of latent traits include intelligence and arithmetic ability. It is essential that tests measure latent traits consistently; this property is called reliability. ...
Article
Full-text available
This study examines the comparability of item statistics generated from the frameworks of classical test theory (CTT) and the 2-parameter model of item response theory (IRT). A 40-item Physics Achievement Test was developed and administered to 600 senior secondary two students, who were randomly selected from 12 senior secondary schools in Taraba State, Nigeria. Results showed that item statistics obtained from both frameworks were relatively similar. However, item statistics obtained from the IRT 2-parameter model appeared more balanced than those from CTT. In addition, in the item selection process, the IRT 2-parameter model retained more items than the CTT model. This result implies that test developers and public examining bodies should integrate the IRT model into their test development processes. Through the IRT model, test constructors would be able to generate more stable items than with the CTT model used at present, and, in the end, the test scores of examinees will be estimated more reliably.
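The CTT side of such a comparison is straightforward to reproduce. A sketch of the two classical item statistics — proportion-correct difficulty and corrected item-total correlation (discrimination) — computed on a made-up 0/1 response matrix:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ctt_item_stats(matrix):
    """CTT difficulty (proportion correct) and discrimination (corrected
    item-total correlation) for a 0/1 matrix, rows = examinees,
    columns = items. The 'corrected' total excludes the item itself."""
    n_items = len(matrix[0])
    stats = []
    for j in range(n_items):
        item = [row[j] for row in matrix]
        rest = [sum(row) - row[j] for row in matrix]
        stats.append((sum(item) / len(item), pearson(item, rest)))
    return stats

responses = [  # hypothetical data: 6 examinees x 3 items
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
stats = ctt_item_stats(responses)
```

Unlike the IRT discrimination parameter, the item-total correlation depends on the observed total scores, which is exactly the dependence the cited study contrasts against the 2PL model.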
... • Parametric estimation: This method uses mathematical models to estimate the effort required for a project. It is based on the project's size, complexity, and other factors [10]. Estimation is crucial in project management, as inaccuracies can lead to poor project performance, potentially resulting in project failure. ...
Article
Full-text available
Effort estimation is a crucial aspect of software development, as it helps project managers plan, control, and schedule the development of software systems. This research study compares various machine learning techniques for estimating effort in software development, focusing on the most widely used and recent methods. The paper begins by highlighting the significance of effort estimation and its associated difficulties. It then presents a comprehensive overview of the different categories of effort estimation techniques, including algorithmic, model-based, and expert-based methods. The study concludes by comparing methods for a given software development project. Random Forest Regression algorithm performs well on the given dataset tested along with various Regression algorithms, including Support Vector, Linear, and Decision Tree Regression. Additionally, the research identifies areas for future investigation in software effort estimation, including the requirement for more accurate and reliable methods and the need to address the inherent complexity and uncertainty in software development projects. This paper provides a comprehensive examination of the current state-of-the-art in software effort estimation, serving as a resource for researchers in the field of software engineering.
... IRT models use individual items as the unit of measurement to obtain latent trait/ability scores [4]. A wide variety of parametric and nonparametric IRT models have been developed to describe how individuals respond to items. ...
Article
Full-text available
Likert scales are the most common psychometric response scales in the social and behavioral sciences. Likert items are typically used to measure individuals' attitudes, perceptions, knowledge, and behavioral changes. To analyze the psychometric properties of individual Likert-type items and overall Likert scales, mostly methods based on classical test theory (CTT) are used, including corrected item-total correlations and reliability indices. CTT methods heavily rely on the total scale scores, making it challenging to directly examine the performance of items and response options across varying levels of the trait. In this study, Kernel Smoothing Item Response Theory (KS-IRT) is introduced as a graphical nonparametric IRT approach for the evaluation of Likert items. Unlike parametric IRT models, nonparametric IRT models do not involve strong assumptions regarding the form of item response functions (IRFs). KS-IRT provides graphics for detecting peculiar patterns in items across different levels of a latent trait. Differential item functioning (DIF) can also be examined by applying KS-IRT. Using empirical data, we illustrate the application of KS-IRT to the examination of Likert items on a psychological scale.
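The kernel-smoothing step behind KS-IRT can be sketched as a Nadaraya-Watson estimate: the item response function at each trait value is a Gaussian-weighted average of the observed responses. The rank-based trait proxy and all parameter values below are illustrative assumptions, not the cited paper's implementation.

```python
import math

def ks_irf(trait, responses, theta_grid, h=0.75):
    """Nadaraya-Watson kernel estimate of an item response function.
    'trait' holds standardized trait proxies (e.g., rank-based rest
    scores), 'responses' the matching 0/1 item responses, and h the
    kernel bandwidth; all values are illustrative."""
    curve = []
    for t in theta_grid:
        w = [math.exp(-0.5 * ((t - z) / h) ** 2) for z in trait]
        curve.append(sum(wi * u for wi, u in zip(w, responses)) / sum(w))
    return curve

z = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical trait proxies
u = [0, 0, 1, 1, 1]               # hypothetical responses to one item
curve = ks_irf(z, u, theta_grid=[-2.0, 0.0, 2.0])
```

Because no parametric form is imposed, a non-monotonic or flat estimated curve directly flags a problematic item — the kind of "peculiar pattern" the abstract refers to.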
... Lack of stability of the estimates of the 3PL model has also been discussed by Patz and Junker (1999), DeMars (2001), and Pelton (2002). As a consequence of the difficulty in estimating the lower asymptote of the 3PL, the 4PL model is often considered even more problematic to estimate (see, e.g., Embretson and Reise, 2000; Baker and Kim, 2004). Nonetheless, it is worth mentioning that there has recently been a renewed interest in 4-parameter models (see, e.g., Hessen, 2005; Loken and Rulison, 2010; Ogasawara, 2012; Culpepper, 2016). ...
Article
Full-text available
The present work aims at showing that the identification problems (here meant as both issues of empirical indistinguishability and unidentifiability) of some item response theory models are related to the notion of identifiability in knowledge space theory. Specifically, the identification problems of the 3- and 4-parameter models are related to the more general issues of forward- and backward-gradedness in all items of the power set, which is the knowledge structure associated with IRT models under the assumption of local independence. As a consequence, the identifiability problem of a 4-parameter model is split into two parts: a first one, which is the result of a trade-off between the left-side added parameters and the remainder of the Item Response Function, e.g., a 2-parameter model, and a second one, which is the already well-known identifiability issue of the 2-parameter model itself. Application of the results to the logistic case appears to provide both a confirmation and a generalization of the current findings in the literature for both fixed- and random-effects IRT logistic models.
... The CAT scores were obtained by running the adaptive test algorithm (Appendix) on the available data for the calibration sample. We used Fisher's information index to select the next item in the CAT [39] and an "expected a posteriori estimation" to estimate the CAT scores [40]. We assumed that in community-based PCH, 30 items are the maximum number feasible, and therefore limited the number of items to be used in this CAT to 30. ...
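The item-selection rule named in the snippet above is easy to sketch for the 2PL model, where an item's Fisher information at ability θ is I(θ) = a²·p·(1−p): at each step the CAT administers the unanswered item with maximum information at the current ability estimate. The item pool below is hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def next_item(theta_hat, pool, administered):
    """Index of the unadministered item with maximum Fisher information
    I(theta) = a^2 * p * (1 - p) at the current ability estimate."""
    best_idx, best_info = None, -1.0
    for idx, (a, b) in enumerate(pool):
        if idx in administered:
            continue
        p = p_2pl(theta_hat, a, b)
        info = a * a * p * (1.0 - p)
        if info > best_info:
            best_idx, best_info = idx, info
    return best_idx

pool = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]  # (a, b) pairs, made up
first = next_item(0.0, pool, set())   # the item matched to theta ~ 0
second = next_item(2.0, pool, {first})
```

With equal discriminations, this rule reduces to picking the item whose difficulty is closest to the current ability estimate, which is why a CAT can reach a stable score in far fewer items than a fixed form.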
Article
Full-text available
Questionnaires to detect emotional and behavioral (EB) problems in preventive child healthcare (PCH) should be short; this potentially affects their validity and reliability. Computerized adaptive testing (CAT) could overcome this weakness. The aim of this study was to (1) develop a CAT to measure EB problems among pre-school children and (2) assess the efficiency and validity of this CAT. We used a Dutch national dataset obtained from parents of pre-school children undergoing a well-child care assessment by PCH (n = 2192, response 70%). Data regarded 197 items on EB problems, based on four questionnaires, the Strengths and Difficulties Questionnaire (SDQ), the Child Behavior Checklist (CBCL), the Ages and Stages Questionnaire: Social Emotional (ASQ:SE), and the Brief Infant–Toddler Social and Emotional Assessment (BITSEA). Using 80% of the sample, we calculated item parameters necessary for a CAT and defined a cutoff for EB problems. With the remaining part of the sample, we used simulation techniques to determine the validity and efficiency of this CAT, using as criterion a total clinical score on the CBCL. Item criteria were met by 193 items. This CAT needed, on average, 16 items to identify children with EB problems. Sensitivity and specificity compared to a clinical score on the CBCL were 0.89 and 0.91, respectively, for total problems; 0.80 and 0.93 for emotional problems; and 0.94 and 0.91 for behavioral problems. Conclusion: A CAT is very promising for the identification of EB problems in pre-school children, as it seems to yield an efficient, yet high-quality identification. This conclusion should be confirmed by real-life administration of this CAT. What is Known: • Studies indicate the validity of using computerized adaptive test (CAT) applications to identify emotional and behavioral problems in school-aged children. • Evidence is as yet limited on whether CAT applications can also be used with pre-school children. 
What is New: • The results of this study show that a computerized adaptive test is very promising for the identification of emotional and behavior problems in pre-school children, as it appears to yield an efficient and high-quality identification.
... Furthermore, the second and first derivatives ∂²E_j(M_js, γ_js | Y, ψ^(t))/∂γ_js² and ∂E_j(M_js, γ_js | Y, ψ^(t))/∂γ_js, both evaluated at γ_js = γ_js^(t), in Eq. (18) can also be approximated based on "artificial data". One can refer to Baker and Kim (2004) ...
Article
Full-text available
In this paper, we propose a generalized expectation model selection (GEMS) algorithm for latent variable selection in multidimensional item response theory models which are commonly used for identifying the relationships between the latent traits and test items. Under some mild assumptions, we prove the numerical convergence of GEMS for model selection by minimizing the generalized information criteria of observed data in the presence of missing data. For latent variable selection in the multidimensional two-parameter logistic (M2PL) models, we present an efficient implementation of GEMS to minimize the Bayesian information criterion. To ensure parameter identifiability, the variances of all latent traits are assumed to be unity and each latent trait is required to have an item exclusively associated with it. The convergence of GEMS for the M2PL models is verified. Simulation studies show that GEMS is computationally more efficient than the expectation model selection (EMS) algorithm and the expectation maximization based L1-penalized method (EML1), and it yields better correct rate of latent variable selection and mean squared error of parameter estimates than the EMS and EML1. The GEMS algorithm is illustrated by analyzing a real dataset related to the Eysenck Personality Questionnaire.
... In this case, Molenaar (2015) recommends fixing 0i = 1 for all items. Although the strategy of fixing the lowest two thresholds has been used previously for the polytomous model (Falk 2020; Molenaar et al. 2012; Rodriguez 2017), another possible identification strategy for the polytomous model is to fix 0i and constrain the average ic to equal zero, a strategy that is in line with the identification constraints sometimes used for other polytomous item response models (Baker and Kim 2004). We will further explore and compare these identification strategies in our empirical illustrations. ...
Article
Full-text available
The residual heteroscedasticity (RH) model is a recently popularized asymmetric model that aims to model complex item response behavior. In this paper, we probe the conditions under which the existing form of the RH model does not guarantee monotonic boundary response functions (BRFs), a necessary condition for ensuring at least ordinal-level measurement. We derive the conditions under which RH BRFs are not monotonic and we use this result to propose a Bayesian computational strategy that enforces monotonicity. Through real and simulated data illustrations, we demonstrate that failures of monotonicity occur in real data and that our proposed computational solution effectively enforces monotonicity and yields accurate item parameter estimates. Finally, we demonstrate that any IRT model developed by specifying a residual variance function is likely to encounter similar issues with monotonicity. We recommend our reparameterization for both data generation in simulation studies and for fitting the RH model to real data.
... Most earlier studies of automated test assembly have evaluated the measurement accuracies of parallel test forms (such as [1], [2], [8], [10]–[12], [23]) using Item Response Theory (IRT) [24], [25]. It is noteworthy that IRT can measure the ability of examinees on the same scale, even when the examinees have taken different tests. ...
Article
Full-text available
Recently, with progress in computer science, automated test assemblies of parallel test forms, for which each form has equivalent measurement accuracy but with a different set of items, have emerged as a new standard tool. An important goal for automated test assembly is to assemble as many parallel test forms as possible. Although many automated test assembly methods exist, maximum clique using the integer programming method is known to be able to assemble the greatest number of assembled test forms with the highest measurement accuracy. Nevertheless, because of the high time complexity of integer programming, the method requires a month or more to assemble 300,000 tests. This study proposes a new automated test assembly using Zero-suppressed Binary Decision Diagrams (ZDD): a graphical representation for a set of item combinations. This representation is derived by reducing a binary decision tree. According to the proposed method, each node in the binary decision tree corresponds to an item of an item pool, which is a test item database. Each node has two edges, each signifying that the corresponding item is included in a test form or not. Furthermore, all equivalent nodes are shared, provided that they have equal measurement accuracy and equal test length. Numerical experiments demonstrate that the proposed method can assemble 1,500,000 test forms within 24 hr, although earlier methods have been capable of assembling only 300,000 test forms in a week or more.
... It is possible to notice unordered item-step difficulty parameters for the majority of items, mainly related to the high imbalance between the first category (no knowledge) and the remaining ones (see Figure 2). Indeed, considering the item response category characteristic curves (IRCCCs; [22]), the item-step difficulty parameters represent the point on the latent continuum where two consecutive IRCCCs cross each other [17]. Thus, from the observed result, we can derive that the characteristic curve of the first category dominates the remaining ones. ...
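The category curves behind that observation come from Samejima's graded model: each item step has a cumulative boundary curve, and a category's probability is the difference between adjacent boundaries. A minimal sketch with illustrative parameters:

```python
import math

def grm_category_probs(theta, a, b_steps):
    """Category response probabilities under Samejima's graded model.
    a is the item discrimination and b_steps the ordered item-step
    difficulties; all numeric values here are illustrative."""
    # Cumulative boundary curves: P*(k) = P(response in category k or above).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in b_steps] + [0.0]
    # Category probabilities are differences of adjacent boundaries.
    return [cum[k] - cum[k + 1] for k in range(len(b_steps) + 1)]

probs = grm_category_probs(0.0, a=1.5, b_steps=[-1.0, 0.0, 1.0])
```

Plotting these probabilities over a θ grid gives exactly the IRCCCs discussed in the snippet; when one category (such as "no knowledge") dominates, its curve sits above the others over a wide stretch of the latent continuum and the estimated step difficulties can come out unordered.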
Article
Full-text available
Introduction Modern FinTech tools (e.g., instant payments, blockchain, roboadvisor) represent the new frontier of digital finance. Consequently, the evaluation of the knowledge level of the population about these topics is a crucial concern. In this context, several exogenous factors may influence individual differences in financial literacy. In particular, the territorial characteristics can have an impact on FinTech. In this work, we investigate individual heterogeneity in subjective financial knowledge in Italy, specifically focusing on modern FinTech tools, and exploring the differences at the individual and regional levels. Methods A sample of 598 Italian individuals from 10 different Italian regions was involved. A multilevel IRT model is performed to evaluate the level of FinTech individual knowledge and the differences according to Italian regions to account for the hierarchical structure of the data. Results Results reported a weak regional effect, revealing that heterogeneity in financial knowledge can be mainly attributed to individual characteristics. At the individual level, age, economic condition, knowledge of traditional financial objects and numeracy showed a significant effect. In addition, a scientific field of study and work have an impact on respondents' knowledge level. Discussion What is shown and discussed in this contribution can inspire policymakers' actions to increase financial literacy in the population. In particular, the obtained results imply that policymakers should improve the population's awareness of less popular FinTech tools and foster individuals' literacy about numbers and traditional financial tools, which proved to have a great influence in explaining FinTech knowledge differences.
... Regarding the person parameters, we have to distinguish the CML- and the MML-based methods [50], with eRm and psychotools supporting the former and ltm, mirt, and TAM the latter. In the CML context, no person parameter estimates can be obtained for perfect and zero scores as they tend to add or subtract infinity, respectively. (The same applies to the item parameter estimation; i.e., items with zero or perfect scores will also require special treatment, but this is already handled in the originating packages.) ...
Article
Full-text available
A constituting feature of item response models is that item and person parameters share a latent scale and are therefore comparable. The Person–Item Map is a useful graphical tool to visualize the alignment of the two parameter sets. However, the “classical” variant has some shortcomings, which are overcome by the new RMX package (Rasch models—eXtended). The package provides the RMX::plotPIccc() function, which creates an extended version of the classical PI Map, termed “PIccc”. It juxtaposes the person parameter distribution to various item-related functions, like category and item characteristic curves and category, item, and test information curves. The function supports many item response models and processes the return objects of five major R packages for IRT analysis. It returns the used parameters in a unified form, thus allowing for their further processing.
... It has extensions with applications in agriculture, health care studies and in research in marketing (Mendes et al., 2020; Bezruczko, 2005; Bechtel, 1985). Estimating the parameters in the Rasch and other item response models has been a difficult issue and there is much work on developing expectation-maximization (EM) algorithms for parameter estimation (Dempster et al., 1977; Baker and Kim, 2004; Liu et al., 2018). The Bock-Aitkin algorithm is a variant of the EM algorithm and is one of the most popular algorithms for estimating parameters in the Rasch models (Bock and Aitkin, 1981). ...
Preprint
Full-text available
Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and out-performance relative to its competitors in a variety of optimization problems in the statistical sciences. In particular, we show the algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. Our applications include (i) finding maximum likelihood estimates of parameters in a single cell generalized trend model to study pseudotime in bioinformatics, (ii) estimating parameters in a commonly used Rasch model in education research, (iii) finding M-estimates for a Cox regression in a Markov renewal model and (iv) matrix completion to impute missing values in a two compartment model. In addition we discuss applications to (v) select variables optimally in an ecology problem and (vi) design a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.
... Unlike the other three scales, the interval of scores with acceptable precision covered almost the entire range of medium-low and medium-high scores. These two score categories must be estimated with low error levels because they generally cover more than half of the population, since in IRT models without equating the scores are estimated assuming a standard normal distribution (Baker & Kim, 2004). ...
Article
Full-text available
This article aims to analyze the psychometric properties of the German Test Anxiety Inventory adapted to Costa Rica (GTAI-CR), based on the graded response model. For this purpose, the instrument was administered to 184 people (101 men, 82 women, and 1 person not identifying with the previous categories). Each of the four GTAI subscales was evaluated independently. The subscales, both globally and at the level of individual items, showed acceptable fit to the model. The characteristic curves of each category of each item were plausible for representative population groups. In addition, for each subscale, the range in which the estimates of the latent scores showed acceptable precision was computed. Finally, recommendations are presented so that the GTAI-CR scales can improve the precision of the scores for which they provide low information.
... The higher the difficulty parameter, the higher the anxiety level a respondent needed before endorsing the item. According to the rules suggested by Baker and Kim [23], an item with a discrimination parameter of >0.65 was considered to have moderate or good discrimination power and would be retained. IRT provides two useful measures, difficulty and discrimination, both of which are technical properties of ICCs. ...
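Both properties come straight off the 2PL item characteristic curve, and the retention rule is a one-line filter. A sketch with made-up parameter estimates (the 0.65 cutoff is the guideline cited in the snippet):

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: probability of endorsing an item
    at trait level theta, given discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def retain_items(discriminations, threshold=0.65):
    """Indices of items whose estimated discrimination exceeds the
    cutoff. The parameter values below are hypothetical."""
    return [i for i, a in enumerate(discriminations) if a > threshold]

kept = retain_items([0.4, 0.9, 1.7, 0.65, 2.1])
p_mid = icc_2pl(0.0, a=1.7, b=0.0)  # probability at theta == difficulty
```

Note that the ICC always passes through 0.5 at θ = b, while a controls how steeply it rises there, which is why the two parameters separate cleanly in the screening decision.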
Article
Full-text available
Objective: This study aimed to evaluate the structural reliability and validity of generalized anxiety disorder 7-item (GAD-7) scale in early pregnant women. Methods: In this cross-sectional study, 30,823 patients in early pregnancy registered in the Obstetrics and Gynecology Hospital of Fudan University completed the GAD-7 scale and patient health questionnaire-9 item (PHQ-9). The discriminative ability, reliability, construct validity, and criterion validity were assessed to evaluate the psychometric properties and factor structures. Items with a discrimination parameter (α) of <0.65, factor loading of <0.30, or cross loading of >0.40 in two or more factors simultaneously were deleted from the scale. Results: All GAD-7 scale items exhibited a high discrimination power. The reliability of the GAD-7 scale was good (Cronbach's alpha coefficient = 0.891). Exploratory factor analysis extracted one factor with eigenvalues of greater than 1.0, which explained 61.930% of the common variance. Confirmatory factor analysis confirmed that the one-factor structure fitted the data well. The correlation coefficient with the PHQ-9 was 0.639. Conclusion: The Chinese version of the GAD-7 scale can be used as a screening tool for early pregnant women. It performs well in terms of discriminative ability, reliability, construct validity, and criterion validity. Pregnant women who screen positive may require more attention and investigation to confirm the presence of generalized anxiety disorder.
... Given that IRT is modeled by distinct sets of parameters, a primary concern in IRT research has been parameter estimation, which offers the basis for the theoretical advantages of IRT. One major issue concerns the statistical complexities that often arise when item and person parameters are estimated simultaneously (see [1,[20][21][22]). More recent attention has focused on fully Bayesian estimation, where Markov chain Monte Carlo (MCMC, [23,24]) simulation techniques are used. ...
Article
Full-text available
Item response theory (IRT) is a popular approach for addressing large-scale assessment problems in psychometrics and other areas of applied research. An emergent research direction that integrates it with machine learning techniques has made IRT applicable to a wide range of fields. The fully Bayesian approach for estimating IRT models is computationally expensive due to the large number of iterations, which require a large amount of memory to store massive amounts of data. This limits the use of the procedure in many applications using traditional CPU architecture. In an effort to overcome such restrictions, previous studies focused on utilizing high performance computing using either distributed memory-based Message Passing Interface (MPI) or massive threads compute unified device architecture (CUDA) to achieve certain speedups with a simple IRT model. This study focuses on this model and aims at demonstrating the scalability of parallel algorithms integrating CUDA into the MPI computing paradigm.
... We observe that the estimated discrimination (α) of each item is above 0.65. According to previous guidelines (Baker & Kim, 2004), this indicates that all ten items of the C-SPS-10 have good measurement efficiency. The item characteristic curves (ICC) are displayed in Fig. 2a. ...
Article
Full-text available
This study performed a cross-cultural validation of the Chinese version of the 10-item Social Provisions Scale (C-SPS-10) in Chinese populations. Study 1 examined the factor structure, internal reliability, discrimination, criterion validity, and network structure of the C-SPS-10 by utilizing a sample of disaster victims in the 2021 Henan floods. Study 2 substantiated the findings of Study 1 in a general population sample. Measurement invariances between populations and between sexes in terms of the C-SPS-10 were also tested using the network approach. Study 3 used three samples to examine the test-retest reliability of the C-SPS-10 over three different time periods. The general results showed that the C-SPS-10 has excellent factor structure, internal reliability, discrimination, and criterion validity. The C-SPS-10 was confirmed to have good psychometric properties. Although the full scale functions well, problems may exist at a domain level. Moreover, the full scale of the C-SPS-10 was verified as a useful tool to capture trait-like characteristics of individuals’ perceptions of social support for the general population.
... Item response theory (IRT) models are an essential tool in educational assessment and psychological measurement where study outcomes consist of dichotomous or discrete responses (Baker and Kim, 2004). The response data often come with certain kinds of missingness, especially in large-scale assessments. ...
Article
Full-text available
Missingness due to not-reached items and omitted items has received much attention in the recent psychometric literature. Such missingness, if not handled properly, would lead to biased parameter estimation and inaccurate inference about examinees, and would further erode the validity of the test. This paper reviews some commonly used IRT-based models allowing missingness, followed by three popular examinee scoring methods: maximum likelihood estimation, maximum a posteriori, and expected a posteriori. Simulation studies were conducted to compare these examinee scoring methods across these commonly used models in the presence of missingness. Results showed that all the methods could infer examinees' ability accurately when the missingness is ignorable. If the missingness is nonignorable, incorporating those missing responses would improve the precision in estimating abilities for examinees with missingness, especially when the test length is short. In terms of examinee scoring methods, the expected a posteriori method performed better for evaluating latent traits under models allowing missingness. An empirical study based on the PISA 2015 Science Test was further performed.
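The scoring methods compared here share the same ingredients: an item response model, a prior, and a likelihood over the observed responses. A minimal sketch of EAP scoring under a 2PL model, where missing responses are simply dropped from the likelihood, i.e., treated as ignorable (the grid, the standard normal prior, and the item parameterization are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def eap_score(resp, a, b, grid=np.linspace(-4, 4, 81)):
    """Expected a posteriori ability estimate under the 2PL with a standard
    normal prior. np.nan entries in resp mark missing responses, which are
    excluded from the likelihood (ignorable missingness)."""
    prior = np.exp(-grid ** 2 / 2)                      # unnormalized N(0, 1)
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (grid[:, None] - b[None, :])))
    mask = ~np.isnan(resp)                              # observed items only
    like = np.prod(np.where(mask, np.where(resp == 1, p, 1 - p), 1.0), axis=1)
    post = prior * like
    return np.sum(grid * post) / np.sum(post)           # posterior mean
```

With no observed responses the estimate falls back to the prior mean, which illustrates why EAP remains well defined even under heavy missingness, unlike unconstrained maximum likelihood.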
Article
With the growing attention on large-scale educational testing and assessment, the ability to process substantial volumes of response data becomes crucial. Current estimation methods within item response theory (IRT), despite their high precision, often pose considerable computational burdens with large-scale data, leading to reduced computational speed. This study introduces a novel "divide-and-conquer" parallel algorithm built on the Wasserstein posterior approximation concept, aiming to enhance computational speed while maintaining accurate parameter estimation. This algorithm enables drawing parameters from segmented data subsets in parallel, followed by an amalgamation of these parameters via Wasserstein posterior approximation. Theoretical support for the algorithm is established through asymptotic optimality under certain regularity assumptions. Practical validation is demonstrated using real-world data from the Programme for International Student Assessment. Ultimately, this research proposes a transformative approach to managing educational big data, offering a scalable, efficient, and precise alternative that promises to redefine traditional practices in educational assessments.
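The amalgamation step has a particularly simple form when each parameter's subset posterior is one-dimensional: the 2-Wasserstein barycenter of one-dimensional distributions is obtained by averaging their quantile functions. A sketch of that closed form (the quantile grid is an arbitrary choice here, and the paper's actual algorithm may differ in detail):

```python
import numpy as np

def wasserstein_barycenter_1d(subset_draws, n_quantiles=200):
    """Approximate the 2-Wasserstein barycenter of one-dimensional posteriors.

    subset_draws: list of 1-D arrays of posterior draws, one per data shard.
    For 1-D distributions the barycenter's quantile function is the average
    of the shards' quantile functions, so we average empirical quantiles.
    Returns an array of values representing the combined posterior.
    """
    qs = np.linspace(0.005, 0.995, n_quantiles)
    quantiles = np.array([np.quantile(d, qs) for d in subset_draws])
    return quantiles.mean(axis=0)
```

Each shard's MCMC run is fully independent, so the expensive sampling parallelizes perfectly; only the cheap quantile-averaging step touches all shards.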
Article
This article introduces conditional maximum-likelihood (CML) item parameter estimation in multistage designs based on probabilities \(p^{[b]}(x_{+}^{[b]})\) for choosing a particular module \({\textbf {m}}^{[b+1]}\) conditional on a raw score \(x_{+}^{[b]}\) in a previous module \({\textbf {m}}^{[b]}\). This type of multistage design is applied to ensure a minimum exposure rate for all items, for example, in international large-scale assessments (ILSAs). For the item parameter estimation, various likelihood-based methods are available. While the marginal maximum-likelihood method (MML) provides consistent estimates in multistage designs, the CML method in its original formulation leads to biased item parameter estimates. In this contribution, we propose a modification of the common CML method for probabilistic routing strategies, based on the approach for deterministic routing strategies (Zwitser & Maris, 2015, Psychometrika), that provides practically unbiased item parameter estimates for the Rasch model. A simulation study shows that this modified CML estimation method also provides practically unbiased item parameter estimates in probabilistic multistage designs.
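CML estimation for the Rasch model rests on the fact that the probability of a response pattern given its raw score does not involve the ability parameter; those conditional probabilities are built from elementary symmetric functions of the item parameters. A minimal sketch of that building block (the summation recursion shown is a standard textbook algorithm, not the modified estimator proposed here):

```python
import numpy as np

def elementary_symmetric(eps):
    """Elementary symmetric functions gamma_0..gamma_n of the item easiness
    parameters eps_i = exp(-b_i), via the standard summation recursion."""
    g = np.zeros(len(eps) + 1)
    g[0] = 1.0
    for e in eps:
        # gamma_r <- gamma_r + e * gamma_{r-1}, for all r at once
        g[1:] = g[1:] + e * g[:-1]
    return g

def conditional_pattern_prob(x, b):
    """Probability of a 0/1 response pattern x given its raw score under the
    Rasch model with difficulties b. The ability parameter cancels, which is
    the basis of CML estimation."""
    eps = np.exp(-np.asarray(b))
    r = int(np.sum(x))
    g = elementary_symmetric(eps)
    return np.prod(eps ** x) / g[r]
```

The CML estimating equations maximize the product of such conditional probabilities over examinees; the multistage complication addressed by the paper is that module routing makes the conditioning event depend on interim raw scores.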
Article
Full-text available
Background Health-related quality of life (Hr-QoL) scales provide crucial information on neurodegenerative disease progression, help improve patient care and constitute a meaningful endpoint for therapeutic research. However, Hr-QoL progression is usually poorly documented, as for multiple system atrophy (MSA), a rare and rapidly progressing alpha-synucleinopathy. This work aimed to describe Hr-QoL progression during the natural course of MSA, explore disparities between patients and identify informative items using a four-step statistical strategy. Methods We leveraged the data of the French MSA cohort comprising annual assessments with the MSA-QoL questionnaire for more than 500 patients over up to 11 years. A four-step strategy (1) determined the subdimensions of Hr-QoL, (2) modelled the subdimension trajectories over time, (3) mapped item impairments with disease stages and (4) identified most informative items. Results Four dimensions were identified. In addition to the original motor, non-motor and emotional domains, an oropharyngeal component was highlighted. While the motor and oropharyngeal domains deteriorated rapidly, the non-motor and emotional aspects were already impaired at cohort entry and deteriorated slowly over the disease course. Impairments were associated with sex, diagnosis subtype and delay since symptom onset. Except for the emotional domain, each dimension was driven by key identified items. Conclusion The multidimensional Hr-QoL deteriorates progressively over the course of MSA and brings essential knowledge for improving patient care. As exemplified with MSA, the thorough description of Hr-QoL over time using the four-step strategy can provide perspectives on neurodegenerative diseases’ management to ultimately deliver better support focused on the patient’s perspective.
Conference Paper
In the realm of educational assessment, accurate measurement of students’ knowledge and abilities is crucial for effective teaching and learning. Traditional assessment methods often fall short in providing precise and meaningful insights into students’ aptitudes. However, Item Response Theory (IRT), a psychometric framework, offers a powerful toolset to address these limitations. This article proposes an exploration of IRT’s models and their potential to enhance educational assessment practices.
Article
The study of coup-proofing holds significant importance in political science as it offers insights into critical topics such as military coups, authoritarian governance, and international conflicts. However, due to the multifaceted nature of coup-proofing and empirical inconsistencies with existing indicators, there is a need for a more profound understanding and a new measurement methodology. We propose a new measure of the extent of coup-proofing, utilizing a Bayesian item response theory model. We estimate the extent of coup-proofing using a sample of 76 countries between 1965 and 2005 and theoretically relevant observed indicators. The findings from the estimation demonstrate that the extent of coup-proofing varies across regime type, country, and time. Furthermore, we verify the construct validity of our measurement.
Article
Full-text available
Understanding and accurately measuring resilience among Chinese civil aviation pilots is imperative, especially concerning the psychological impact of distressing events on their well-being and aviation safety. Despite the necessity, a validated and tailored measurement tool specific to this demographic is absent. Addressing this gap, this study built on the widely used CD-RISC-25 to analyze and modify its applicability to Chinese civil aviation pilots. Utilizing CD-RISC-25 survey data from 231 Chinese pilots, correlational and differential analyses identified items 3 and 20 as incongruent with this population's resilience profile. Subsequently, factor analysis derived a distinct two-factor resilience psychological framework labeled "Decisiveness" and "Adaptability", which diverged from the structure found in American female pilots and the broader Chinese populace. Additionally, to further accurately identify the measurement characteristics of this two-factor measurement model, this study introduced Generalizability Theory and Item Response Theory, two modern measurement analysis theories, to comprehensively analyze the overall reliability of the measurement and issues with individual items. Results showed that the two-factor model exhibited high reliability, with a generalizability coefficient reaching 0.89503 and a dependability coefficient reaching 0.88496, indicating the two-factor measurement questionnaire can be effectively utilized for relative and absolute comparison of Chinese civil aviation pilot resilience. However, items in Factor 2 provided less information and leave more room for optimization than those in Factor 1, implying item option redesign may be beneficial. Consequently, this study culminates in the creation of a more accurate and reliable two-factor psychological resilience measurement tool tailored for Chinese civil aviation pilots, while exploring directions for optimization. By facilitating early identification of individuals with lower resilience and enabling the evaluation of intervention efficacy, this tool aims to positively impact pilot psychological health and aviation safety in the context of grief and trauma following distressing events.
Article
Knowledge is defined as a multi-faceted latent variable that is not directly measurable but through manifest variables, i.e., items. Latent variable models are, therefore, widely used in this context to analyze latent traits from items, usually expressed by ordinal variables. Finding homogeneous groups of units according to their knowledge levels is helpful to policymakers and to anyone else who must make decisions in this domain. As a result, latent variable models are combined within integrated approaches to find homogeneous groups. The present work proposes a coordinated strategy combining item response theory (IRT) models with archetypal analysis (AA). The proposed method is applied to a data set of 625 Italian respondents to a survey conducted within the European project "Fintech and Artificial Intelligence in Finance". Empirical evidence demonstrates that the proposed method is an effective and helpful tool to obtain homogeneous groups and their respective profiles according to the knowledge levels of the respondents based on their responses to the survey.
Article
Full-text available
In the human-to-human Collaborative Problem Solving (CPS) test, students' problem-solving process reflects the interdependency among partners. The high interdependency in CPS makes it very sensitive to group composition. For example, the group outcome might be driven by a highly competent group member, so it does not reflect all the individual performances, especially for a low-ability member. As a result, how to effectively assess individuals' performances has become a challenging issue in educational measurement. This research aims to construct a measurement model to estimate an individual's collaborative problem-solving ability and correct for the impact of partners' abilities. First, 175 dyads of eighth graders were divided into six cooperative groups with different levels of problem-solving (PS) ability combinations (i.e., high-high, high-medium, high-low, medium-medium, medium-low, and low-low). Then, they participated in the test of three CPS tasks, and the log data of the dyads were recorded. We applied Multidimensional Item Response Theory (MIRT) measurement models to estimate an individual's CPS ability and proposed a mean correction method to correct for the impact of group composition on individual ability. Results show that (1) the multidimensional IRT model fits the data better than the multidimensional IRT model with the testlet effect; (2) the mean correction method significantly reduced the impact of group composition on obtained individual ability. This study not only successfully increased the validity of individuals' CPS ability measurement but also provided useful guidelines in educational settings to enhance individuals' CPS ability and promote an individualized learning environment.
Article
Full-text available
Parents of children with Autism Spectrum Disorder (ASD) may experience increased stress in their social and professional activities due to the challenges of raising a child with ASD. The present study developed a scale to measure the Social and Professional Stress (SPS) experienced daily by these parents. The study sample consisted of 255 parents residing in Brazil aged between 21 and 61 years (mean = 38, SD = 6.0). Item Response Theory (IRT) was used to develop the SPS-Scale, which showed good psychometric properties. Our findings indicated a higher level of SPS among mothers who are primary caregivers and who have children with symptoms of ASD at medium or severe levels. The child's age and the interviewee's marital status also showed an association with the SPS experienced by the parents. Overall, the SPS-Scale proved to be a valid instrument to measure the SPS experienced daily by parents of children or adolescents diagnosed with ASD.
Article
Full-text available
This paper investigates the performance of item response theory based on distance criteria rather than likelihood criteria. For this purpose, the estimated item response matrix is introduced. This matrix is a reconstruction of the item response matrix using maximum likelihood estimates of the parameters in item response theory. Then the distance between the observed and estimated matrices can be determined using the Frobenius matrix norm. An approximated low-rank matrix can be generated from the observed item response matrix by singular value decomposition, and the distance between the observed and low-rank matrices can be obtained in the same way. By comparing these two distances, we can evaluate whether the performance of the estimated item response matrix is comparable to the performance of an approximated low-rank matrix. Applying this comparison to actual examination data, it is found that the rank of the approximated low-rank matrix that is equivalent to the estimated item response matrix is very low when the matrices are used as training data. However, using test data, the predictive ability of item response theory seems high enough, since the minimum distance between the approximated low-rank matrix and the observed item response matrix is approximately equal to or slightly less than the distance between the estimated item response matrix and the observed item response matrix. This finding was first obtained by utilizing the estimated item response matrix defined here.
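The comparison described here is straightforward to reproduce in outline: by the Eckart-Young theorem, the best rank-k approximation in Frobenius norm comes from a truncated SVD, and one can search for the smallest rank whose approximation matches the IRT reconstruction's distance to the observed matrix. A sketch under that reading (function names are illustrative, not from the paper):

```python
import numpy as np

def lowrank_approx(X, k):
    """Best rank-k approximation of X in Frobenius norm (Eckart-Young),
    obtained by truncating the singular value decomposition."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def frobenius_distance(A, B):
    """Frobenius-norm distance between two matrices of the same shape."""
    return np.linalg.norm(A - B, "fro")

def equivalent_rank(X, X_irt):
    """Smallest rank k whose best rank-k approximation of X is at least as
    close to X as the IRT-estimated response matrix X_irt is."""
    d_irt = frobenius_distance(X, X_irt)
    for k in range(1, min(X.shape) + 1):
        if frobenius_distance(X, lowrank_approx(X, k)) <= d_irt:
            return k
    return min(X.shape)
```

A small equivalent rank on training data mirrors the paper's observation: the IRT reconstruction compresses the observed matrix about as well as a very low-rank SVD approximation does.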
Preprint
Full-text available
Background Physical activity plays an integral role in promoting health and well-being. Despite its importance, comprehensive studies exploring the influences of socio-demographic factors on physical activity in the Chinese context are relatively scarce. This study aims to investigate the relationship between physical activity and socio-demographic factors such as gender, age, and socioeconomic status, using data from the 2018 China Family Panel Studies (CFPS). Methods Data was derived from the 2018 CFPS, resulting in a final sample size of 21,854 adults, with physical activity as the dependent variable. The International Socio-Economic Index of Occupational Status (ISEI) was used to gauge socioeconomic status. Other incorporated variables included gender, age, community type, marital status, physical health, and mental health. The study employed a logistic regression model considering the dichotomous nature of the dependent variable. Results Significant correlations were found between physical activity and gender, age, and socioeconomic status. Men were found to be more likely to engage in physical activity than women, and the likelihood of physical activity increased with age and socioeconomic status. Further, the influence of socioeconomic status on physical activity was found to vary significantly across different genders and age groups, with complex intersections noted among these factors. Conclusion The study underscores the need for public health interventions that are mindful of the complex interplay between gender, age, and socioeconomic status in influencing physical activity. Efforts to promote physical activity should focus on bridging the disparities arising from these socio-demographic factors, especially targeting women and individuals from lower socioeconomic classes. 
Future research should delve into the mechanisms through which these factors intersect and explore other potential influential elements to enhance our understanding of physical activity behavior.
Article
Computerized adaptive testing (CAT) is a new mode of testing that adopts the adaptive measurement principle of tailoring the test to each examinee. Compared with traditional paper-and-pencil tests, it improves measurement precision, shortens test length, and enhances test security, and is therefore highly regarded by researchers and practitioners at home and abroad. However, building a CAT platform involves complex statistical measurement theory and tedious numerical computation, which has hindered the adoption of CAT in practice. This article introduces flexCAT, a development platform for computerized adaptive testing; with its convenient interactive interface, users can quickly build their own CAT system. The article presents flexCAT, the first web-based CAT development platform in China, in terms of its advantages, underlying theory, and module functions, aiming to provide researchers and practitioners in education, psychology, and related fields with a free adaptive testing platform development service and to further advance the theory and technology of psychological and educational measurement in China. The flexCAT platform is available at: http://www.psychometrics-studio.cn/app/cat_demo/index.html?Id=false&Block=false
Book
Full-text available
D-scoring Method of Measurement (DSM) presents a unified framework of classical and latent measurement. Provided are detailed descriptions of DSM procedures and illustrative examples of how to apply the DSM in various scenarios of measurement. The DSM is designed to combine merits of the traditional CTT and IRT for the purpose of transparency, ease of interpretations, computational simplicity of test scoring and scaling, and practical efficiency. This book shows how practical applications of DSM procedures are facilitated by the inclusion of operationalized guidance for their execution that can be readily translated into computer source codes for popular software packages such as R.
raw score.................................................... 137 raw test score................................................ 6, 65 row marginals................................................ 138 scale of measurement............................................ 5 screening test......................................... 158, 162, 166 standard error of estimate....................................... 120 symmetric............................................... 113, 116 test calibration............................................ 133, 141 test characteristic curve......................................... 142 test constructor................................... 107, 133, 156, 157 test equating........................................... 55, 150, 157 test information............................................... 109 test information function................... 110, 115, 117, 148, 154, 164 three-parameter model.......................................... 28 two-parameter model........................................ 24, 30