Hypothetical person-response functions for three types of response behavior. A: Test anxiety. B: Item disclosure. C: Random response behavior.

Source publication

Global, Local, and Graphical Person-Fit Analysis Using Person-Response Functions

Article

Mar 2005

Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the...

Context 1

... assume that the resulting atypical item-score vector was detected by the U3 statistic. To facilitate the diagnosis of the cause of the misfit, we estimated the PRF (Figure 1A) for this respondent. Given the effect of test anxiety described, the PRF started at a low value for the lower levels of item difficulty, increased for the items of average difficulty when test anxiety has diminished, and decreased when item difficulty increased further. ...

Context 2

... return to this latter case in the Item disclosure section. For the PRF in Figure 1A, a local test statistic (Emons, 2003), to be explained below, may be used to determine whether the increase in the first 10 items is significant. When a significant local test result is found, the researcher may use the bell shape for further diagnostic decision-making, possibly taking additional background information into account. ...

Context 3

... that the U3 person-fit statistic identified the resulting item-score vector as a misfit. A smooth estimate of the PRF shows a decrease for the easiest 40 items because with increasing item difficulty the probability of a correct answer decreases and then shows an increase for the 10 most difficult items because here the respondent gave an unusually high number of correct answers given the item difficulty level; see Figure 1B for this U-shaped PRF. The local test of the PRF may be used to investigate whether the increase in the last 10 items is significant. ...

Context 4

... that the item-score vector that was produced by random response behavior on almost all items was identified by the U3 statistic. Figure 1C gives a near-horizontal PRF that resulted from an almost constant random response behavior probability of .25 for all J items. This PRF does not deviate noticeably from monotone nonincreasingness, and the local test cannot be applied here. ...

Context 5

... about use of other information. A near-horizontal PRF, as in Figure 1C, that is typical of randomly responding cannot be distinguished from a similar PRF that would result from test anxiety for a low-ability respondent or test anxiety for higher ability respondents that resulted from serious panic. Here, other auxiliary information about the respondent may be helpful when evaluating item-score vectors. ...

Context 6

... assume that a respondent takes different versions of the same test several times per year, for example to measure cognitive improvement after therapy. Given this knowledge, for a high-ability respondent who took the first version of this test, a PRF like that in Figure 1C would probably indicate random response behavior. In this situation, no additional action needs to be taken. ...

Context 7

... was concluded that each application requires some trial and error to find the best compromise. The PRFs in Figure 1 were estimated using this kernel-smoothing procedure. ...
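
The excerpt above does not show the kernel or bandwidth the authors used, so the following is only a minimal sketch of the general idea: a Nadaraya-Watson kernel regression of one respondent's 0/1 item scores on item rank (items ordered from easiest to hardest). The function name `kernel_smoothed_prf`, the Gaussian kernel, and the default `bandwidth` are assumptions made for illustration, not the article's procedure.

```python
import math


def kernel_smoothed_prf(scores, bandwidth=3.0):
    """Smooth a respondent's 0/1 item scores into an estimated PRF.

    `scores` must be ordered from the easiest to the most difficult item.
    Returns one estimated probability of a correct answer per item rank.
    The Gaussian kernel and the bandwidth value are illustrative assumptions.
    """
    n = len(scores)
    prf = []
    for j in range(n):
        # Weight every item by its distance (in rank units) from item j.
        weights = [math.exp(-0.5 * ((j - k) / bandwidth) ** 2) for k in range(n)]
        total = sum(weights)
        # Weighted mean of the item scores around rank j.
        prf.append(sum(w * x for w, x in zip(weights, scores)) / total)
    return prf


# A respondent who passes the easy items and fails the hard ones
# should yield a decreasing estimate.
example = kernel_smoothed_prf([1] * 10 + [0] * 10)
```

A larger bandwidth gives a smoother but flatter curve, and a smaller one follows the individual item scores more closely, which is consistent with the trial-and-error compromise the excerpt mentions.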

Context 8

... procedure demonstrates that the confidence enve- Figure 10) was obtained using Equation 11. Table 2 gives the results of the local person-fit tests. ...

Context 9

... Case 1, the PRF shows a local increase for the first four subsets, A_1 through A_4 (Figure 10A). We combined these subsets into one vector, Y, and counted the number of Guttman errors, G. ...
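
As a hedged illustration of the counting step only: for dichotomous items ordered from easiest to hardest, a Guttman error is conventionally a pair of items in which the easier item is failed and the harder item is passed. The sketch below counts such pairs in a score vector. The function name `count_guttman_errors` is hypothetical, and the significance test the article applies to G is not shown in the excerpt and is not reproduced here.

```python
def count_guttman_errors(scores):
    """Count Guttman errors in a 0/1 score vector ordered easiest to hardest.

    A Guttman error is a pair (easier item, harder item) in which the
    easier item was answered incorrectly and the harder one correctly.
    """
    errors = 0
    zeros_seen = 0  # number of easier items failed so far
    for x in scores:
        if x == 0:
            zeros_seen += 1
        else:
            # This correct answer forms an error with every easier failed item.
            errors += zeros_seen
    return errors


perfect = count_guttman_errors([1, 1, 0, 0])   # no easier item failed before a pass
reversed_ = count_guttman_errors([0, 0, 1, 1])  # every (fail, pass) pair is an error
```

Running the count in a single pass avoids comparing every pair of items explicitly, which matters little for short subsets but keeps the logic easy to check.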

Figure 4. Power of Q-Q test for normality, for three different sample...

Figure 8. Power of the MSP test for normality, for three different...

Figure 19. (Left) The power of the JB test (8) against the alternative...

New Graphical Methods and Test Statistics for Testing Composite Normality

Article

Jul 2015

Marc S. Paolella

Several graphical methods for testing univariate composite normality from an i.i.d. sample are presented. They are endowed with correct simultaneous error bounds and yield size-correct tests. As all are based on the empirical CDF, they are also consistent for all alternatives. For one test, called the modified stabilized probability test, or MSP, a...

Contrasting multistage and computer-based testing: score accuracy and aberrant responding

Article

Dec 2023

The goal of the present study was to compare and contrast the efficacy of a multistage testing (MST) design using three paths compared to a traditional computer-based testing (CBT) approach involving items across all ability levels. Participants were n = 627 individuals who were subjected to both a computer-based testing (CBT) instrument and a measure constructed using multistage testing to route individuals of low, middle, and high ability to content that was respective to their ability level. Comparisons between the medium of testing involved person ability accuracy estimates and evaluation of aberrant responding. The results indicated that MST assessments deviated markedly from CBT assessments, especially for low- and high-ability individuals. Test score accuracy was higher overall in MST compared to CBT, although error of measurement was enhanced for high-ability individuals during MST compared to CBT. Evaluating response patterns indicated significant amounts of Guttman-related errors during CBT compared to MST using person-fit aberrant response indicators. It was concluded that MST is associated with significant benefits compared to CBT.

Identifying person misfit using the person backward stepwise reliability curve (PBRC)

Article

Oct 2023

The goal of the present study was to propose a visualization of aberrant response patterns based on the idea put forth by the Cronbach-Mesbach curve. First, an index of person reliability is developed using the K-R 20 formula followed by a backward stepwise procedure in which one person at a time is deleted from the model. Observations for which reliability is no longer monotonically increasing suggest that they are candidates for aberrant responding. Using data from the quantitative domain of a national aptitude test the proposed visualization technique was demonstrated. The external validity of the procedure was tested by contrasting the person fit reliability estimates with those derived from other indices of aberrant responding such as the Ht. Results indicated that individuals not covarying with other individuals concerning their response patterns and concordance to the measurement of a unified latent trait were identified by both the present procedure and Ht and U3 at a rate of 100%. By plotting those individuals using Person Response Curves (PRCs) results confirmed the lack of monotonicity in the relationship between item difficulty and person skill. Consequently, results confirm the usefulness of the present methodology as an index for identifying responders who manifest themselves with aberrant responses and who are not conducive to the measurement of the latent trait.

Functional Data Analysis and Person Response Functions

Article

Aug 2023
Measurement

How to Identify Careless Responders in Surveys

Chapter

Sep 2022

A Multi-index Examination Cheating Detection Method Based on Neural Network

Conference Paper

Nov 2019

The Impact of Aberrant Response on Reliability and Validity

Article

Jul 2019
Measurement

Aberrant response has an important impact on item parameter estimation, individuals’ evaluation, and other statistical analysis. There are various types of aberrant response behaviors in educational and psychological tests, like sleeping, guessing, and plodding. Random response is the most common one. The purpose of this research was to clarify the impact of random response on reliability and validity. Both simulation study and empirical study were conducted. In simulation study, one pilot experiment and two formal experiments were fulfilled to discuss the impact on construct validity. There were five factors that were taken into account. They were the rate of aberrant response individuals, the number of dimensions, the number of items per dimension, the correlations between dimensions, and the types of random response (global random response and local random response). Five fit indices based on confirmatory factor analysis model were calculated to evaluate construct validity. And response data were generated by multidimensional item response model. In empirical study, Nomophobia Scale and Freshman Adaptation Scale were used to investigate the impact of random response on criterion validity and test–retest reliability, respectively.

Measurement in Marketing

Article

Jan 2019

Using process mining to analyze students' quiz-taking behavior patterns in a learning management system

Article

Dec 2017
COMPUT HUM BEHAV

The aim of this paper is to explore students' behavior and interaction patterns in different types of online quiz-based activities within learning management systems (LMS). Analyzing students' behavior in online learning activities and detecting specific patterns of interaction in LMS is a topic of great interest for the educational data mining (EDM) and learning analytics (LA) research communities. Previous studies have focused primarily on frequency analysis without addressing the temporal aspects of students' learning behavior. Therefore, we apply a process-oriented approach, investigating perspectives on using process mining methods in the context of online learning and assessment. To explore a broad range of possible student behavior patterns, we analyze students' interactions in several online quizzes from different courses and with different settings. Using process mining methods, we identify specific types of interaction sequences that shed new light on students' quiz-taking strategies in LMS. We believe that these findings bring important implications for researchers studying student behavior in online environments as well as practitioners using online quizzes for learning and assessment.

Item response theory requires logically unjustifiable assumptions

Article

Jul 2017
QUAL QUANT

Merton S. Krause

If items have different levels of difficulty (or sensitivity) relative to some psychological attribute, passing (or endorsing) any one cannot mean the same about a person as passing any other, so percent of items passed regardless of which these are cannot indicate a person’s level on any attribute. If persons have different levels on a psychological attribute, an item’s being passed by one person cannot mean the same about its difficulty level as being passed by any other person, so percent of persons passing it regardless of which persons these are cannot indicate the item’s difficulty level. Percent of items passed by a person and percent of persons passing an item are incommensurate quantities not expressible in terms of the same quality or dimension. Both such percents are dependent on what sample of items and of persons are used. A person’s attribute level is not demonstrably probabilistic, because truly independent replicate occasions of a person responding to an item are impossible. Passing an item depends on more than a person’s single attribute level, the item’s difficulty level, and random chance. On all these matters Item Response Theory relies on assumptions that are logically unjustifiable.

Practical Person-Fit Assessment with the Linear FA Model: New Developments and a Comparative Study

Article

Dec 2016

Linear factor analysis (FA) is, possibly, the most widely used model in psychometric applications based on graded-response or more continuous items. However, in these applications consistency at the individual level (person fit) is virtually never assessed. The aim of the present study is to propose a simple and workable approach to routinely assess person fit in FA-based studies. To do so, we first consider five potentially appropriate indices, of which one is a new proposal and the other is a modification of an existing index. Next, the effectiveness of these indices is assessed by using (a) a thorough simulation study that attempts to mimic realistic conditions, and (b) an illustrative example based on real data. Results suggest that the mean-squared lico index and the personal correlation work well in conjunction and can function effectively for detecting different types of inconsistency. Finally, future directions and lines of research are discussed.
