Figure 1 - uploaded by Rob Meijer
Hypothetical person-response functions for three types of response behavior. A: Test anxiety. B: Item disclosure. C: Random response behavior.

Source publication
Article
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the...

Contexts in source publication

Context 1
... assume that the resulting atypical item-score vector was detected by the U3 statistic. To facilitate the diagnosis of the cause of the misfit, we estimated the PRF (Figure 1A) for this respondent. Given the effect of test anxiety described, the PRF started at a low value for the lower levels of item difficulty, increased for the items of average difficulty when test anxiety has diminished, and decreased when item difficulty increased further. ...
Context 2
... return to this latter case in the Item disclosure section. For the PRF in Figure 1A, a local test statistic (Emons, 2003), to be explained below, may be used to determine whether the increase in the first 10 items is significant. When a significant local test result is found, the researcher may use the bell shape for further diagnostic decision-making, possibly taking additional background information into account. ...
Context 3
... that the U3 person-fit statistic identified the resulting item-score vector as a misfit. A smooth estimate of the PRF shows a decrease for the easiest 40 items because with increasing item difficulty the probability of a correct answer decreases and then shows an increase for the 10 most difficult items because here the respondent gave an unusually high number of correct answers given the item difficulty level; see Figure 1B for this U-shaped PRF. The local test of the PRF may be used to investigate whether the increase in the last 10 items is significant. ...
Context 4
... that the item-score vector that was produced by random response behavior on almost all items was identified by the U3 statistic. Figure 1C gives a near-horizontal PRF that resulted from an almost constant random response behavior probability of .25 for all J items. This PRF does not deviate noticeably from monotone nonincreasingness, and the local test cannot be applied here. ...
Context 5
... about use of other information. A near-horizontal PRF, as in Figure 1C, that is typical of randomly responding cannot be distinguished from a similar PRF that would result from test anxiety for a low-ability respondent or test anxiety for higher ability respondents that resulted from serious panic. Here, other auxiliary information about the respondent may be helpful when evaluating item-score vectors. ...
Context 6
... assume that a respondent takes different versions of the same test several times per year, for example to measure cognitive improvement after therapy. Given this knowledge, for a high-ability respondent who took the first version of this test, a PRF like that in Figure 1C would probably indicate random response behavior. In this situation, no additional action needs to be taken. ...
Context 7
... was concluded that each application requires some trial and error to find the best compromise. The PRFs in Figure 1 were estimated using this kernel-smoothing procedure. ...
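The excerpt above says the PRFs were obtained by kernel smoothing but does not give the kernel or the bandwidth. The following is therefore only a minimal sketch of one common way to smooth a person's 0/1 item scores into a PRF: a Nadaraya-Watson estimator with a Gaussian kernel over item rank. The function name `kernel_prf`, the Gaussian kernel, and the default `bandwidth` are assumptions for illustration, not the authors' procedure.

```python
import math


def kernel_prf(scores, bandwidth=3.0):
    """Smooth 0/1 item scores into an estimated person-response function.

    `scores` must be ordered from the easiest to the most difficult item.
    Returns one estimated probability of a correct answer per item position.
    Kernel choice and bandwidth are illustrative assumptions.
    """
    n = len(scores)
    prf = []
    for j in range(n):
        # Gaussian weights centred on item position j.
        weights = [
            math.exp(-0.5 * ((j - k) / bandwidth) ** 2) for k in range(n)
        ]
        # Weighted proportion of correct answers around position j.
        prf.append(sum(w * s for w, s in zip(weights, scores)) / sum(weights))
    return prf
```

A larger `bandwidth` gives a smoother but less local curve, which is the kind of compromise the excerpt describes as requiring some trial and error.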
Context 8
... procedure demonstrates that the confidence enve- Figure 10) was obtained using Equation 11. Table 2 gives the results of the local person-fit tests. ...
Context 9
... Case 1, the PRF shows a local increase for the first four subsets, A1 through A4 (Figure 10A). We combined these subsets into one vector, Y, and counted the number of Guttman errors, G. ...
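Counting Guttman errors, as mentioned in the excerpt above, can be sketched as follows. With items ordered from easiest to most difficult, a Guttman error is any pair of items in which the easier item is answered incorrectly and the more difficult item correctly. The function name `guttman_errors` is hypothetical, and the sketch covers only the count G, not the local test built on it.

```python
def guttman_errors(scores):
    """Count Guttman errors in a 0/1 item-score vector.

    `scores` must be ordered from the easiest to the most difficult item.
    Each correct answer (1) forms one error with every easier item that
    was answered incorrectly (0).
    """
    errors = 0
    zeros_seen = 0  # incorrect answers on easier items so far
    for s in scores:
        if s == 0:
            zeros_seen += 1
        else:
            errors += zeros_seen
    return errors
```

For example, the perfect Guttman pattern `[1, 1, 1, 0, 0]` contains no errors, whereas `[0, 0, 1, 1]` contains four.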

Similar publications

Article
Several graphical methods for testing univariate composite normality from an i.i.d. sample are presented. They are endowed with correct simultaneous error bounds and yield size-correct tests. As all are based on the empirical CDF, they are also consistent for all alternatives. For one test, called the modified stabilized probability test, or MSP, a...

Citations

... Another index proposed by the same author is U3 (van der Flier, 1977; see also Emons et al., 2005): This estimates the Guttman pattern with a specific set of weights w_g = ln[π_g/(1 − π_g)]. Large values are again indicative of aberrant responding in the form of carelessness, inattention, lack of motivation, guessing, or randomness. ...
Article
The goal of the present study was to compare and contrast the efficacy of a multistage testing (MST) design using three paths compared to a traditional computer-based testing (CBT) approach involving items across all ability levels. Participants were n = 627 individuals who were subjected to both a computer-based testing (CBT) instrument and a measure constructed using multistage testing to route individuals of low, middle, and high ability to content that was respective to their ability level. Comparisons between the medium of testing involved person ability accuracy estimates and evaluation of aberrant responding. The results indicated that MST assessments deviated markedly from CBT assessments, especially for low- and high-ability individuals. Test score accuracy was higher overall in MST compared to CBT, although error of measurement was enhanced for high-ability individuals during MST compared to CBT. Evaluating response patterns indicated significant amounts of Guttman-related errors during CBT compared to MST using person-fit aberrant response indicators. It was concluded that MST is associated with significant benefits compared to CBT.
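The U3 index mentioned in the citing excerpt above can be sketched in a few lines. The sketch assumes the usual textbook form of U3: items are ordered by proportion correct π_g, each item receives the logit weight ln[π_g/(1 − π_g)], and the observed weighted score is located between the weighted score of the Guttman pattern (U3 = 0) and that of the reversed Guttman pattern (U3 = 1) for the same number-correct score. The function name `u3` and the choice to raise an error for all-correct or all-incorrect vectors are assumptions for illustration.

```python
import math


def u3(scores, pis):
    """U3 person-fit index for a 0/1 item-score vector.

    scores: 0/1 item scores for one respondent.
    pis:    proportion correct per item, each strictly between 0 and 1.
    """
    if len(scores) != len(pis):
        raise ValueError("scores and pis must have the same length")
    # Order items from easiest (largest proportion correct) to hardest.
    order = sorted(range(len(pis)), key=lambda g: pis[g], reverse=True)
    w = [math.log(pis[g] / (1.0 - pis[g])) for g in order]
    x = [scores[g] for g in order]
    r = sum(x)  # number-correct score
    best = sum(w[:r])             # Guttman pattern: r easiest items correct
    worst = sum(w[len(w) - r:])   # reversed pattern: r hardest items correct
    observed = sum(wg * xg for wg, xg in zip(w, x))
    if best == worst:
        raise ValueError("U3 is undefined for this score vector")
    return (best - observed) / (best - worst)
```

Values near 0 indicate a pattern close to the Guttman pattern; values near 1 indicate a pattern close to its reverse.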
... Several studies confirmed the efficacy of U3 as an index of inattentive responding (e.g., Beck et al., 2019). The index reflects the ratio of the actual number of Guttman errors in a response pattern relative to the maximum number of errors using the log scale (Emons et al., 2005). It is being estimated as follows: ...
Article
The goal of the present study was to propose a visualization of aberrant response patterns based on the idea put forth by the Cronbach-Mesbach curve. First, an index of person reliability is developed using the K-R 20 formula followed by a backward stepwise procedure in which one person at a time is deleted from the model. Observations for which reliability is no longer monotonically increasing suggest that they are candidates for aberrant responding. Using data from the quantitative domain of a national aptitude test the proposed visualization technique was demonstrated. The external validity of the procedure was tested by contrasting the person fit reliability estimates with those derived from other indices of aberrant responding such as the Ht. Results indicated that individuals not covarying with other individuals concerning their response patterns and concordance to the measurement of a unified latent trait were identified by both the present procedure and Ht and U3 at a rate of 100%. By plotting those individuals using Person Response Curves (PRCs) results confirmed the lack of monotonicity in the relationship between item difficulty and person skill. Consequently, results confirm the usefulness of the present methodology as an index for identifying responders who manifest themselves with aberrant responses and who are not conducive to the measurement of the latent trait.
... Person fit research examines the pattern of individual item scores in comparison with what has been expected on the basis of the measurement model (Emons et al., 2005). Traditionally, one aspect of model-data fit is assessed using person fit statistics based on response residuals (Walker et al., 2016). ...
... For a particular person, the person response function (PRF) is a plot relating the probability of endorsing or correctly responding to items at various difficulty levels (Embretson & Reise, 2000). PRFs can be used to identify various categories of response behaviors. ...
... PRFs represent the relationship between observed person responses and an underlying construct represented by a set of items that are located on a latent variable. There has been consistent interest in this area including research conducted by Thurstone (1926), Mosier (1940, 1941), Lumsden (1977), Trabin and Weiss (1979), Carroll (1985), Sijtsma and Meijer (2001), Emons et al. (2005), and Walker et al. (2018). Carroll and Schohan (1953) first suggested estimating the probabilities of endorsing or correctly responding to the various items in order to make score interpretations clear. Furthermore, they suggested representing these probabilities graphically, and called such graphs individual operating characteristics. ...
... For careful responders, the correlation between synonym (antonym) pairs should be strongly positive (negative); respondents with low correlations (or maybe even correlations that have the wrong sign) are considered careless responders. We will not discuss person-fit statistics here because they would require a more in-depth discussion of item-response theory and because they are not common in practice; the interested reader is referred to Emons et al. (2005). ...
... Finally, by comparing the students' predicted results with the actual test results, we can judge whether the students cheat or not. George Karabatsos et al. proposed a variety of cheating detection methods based on Person-Fit index [28][29][30][31]. ...
... Global aberrant response means individuals respond aberrantly throughout the test, while local aberrant response means individuals respond aberrantly to a subset of items in a test. Some researchers begin to focus on how to distinguish global aberrant response and local aberrant response by using person-fit indices or other statistical methods (Emons, Sijtsma, & Meijer, 2005, 2004; Ferrando & Lorenzo-Seva, 2015; Liu, Lan, & Xin, 2016). In fact, local aberrant response may offer some diagnostic information. ...
Article
Aberrant response has an important impact on item parameter estimation, individuals’ evaluation, and other statistical analysis. There are various types of aberrant response behaviors in educational and psychological tests, like sleeping, guessing, and plodding. Random response is the most common one. The purpose of this research was to clarify the impact of random response on reliability and validity. Both simulation study and empirical study were conducted. In simulation study, one pilot experiment and two formal experiments were fulfilled to discuss the impact on construct validity. There were five factors that were taken into account. They were the rate of aberrant response individuals, the number of dimensions, the number of items per dimension, the correlations between dimensions, and the types of random response (global random response and local random response). Five fit indices based on confirmatory factor analysis model were calculated to evaluate construct validity. And response data were generated by multidimensional item response model. In empirical study, Nomophobia Scale and Freshman Adaptation Scale were used to investigate the impact of random response on criterion validity and test–retest reliability, respectively.
... In addition, satisficing is not measured directly but inferred from response behavior that is presumably due to respondents' reluctance or inability to expend the required cognitive resources. Included in this category are techniques for identifying unusual observations such as outlier detection based on the Mahalanobis D statistic (Meade and Craig, 2012) and person-fit statistics developed in the item response theory literature (Conijn, Emons, and van Assen, 2013; Emons, Sijtsma, and Meijer, 2005; Meijer, 2003; Reise and Widaman, 1999). Although these methods may be useful in identifying aberrant response behavior, it is doubtful that such behavior is necessarily due to satisficing. ...
... Another example is so-called "sleeping behavior," in which the student underperforms on the first questions in the test. Various forms of cheating may, of course, occur as well; students raise their score artificially through correct answers to test questions having been divulged or the student may have obtained the correct answer from another student completing the same test (Emons, Sijtsma, & Meijer, 2005; Meijer & Sijtsma, 2001). ...
... They can therefore only be used to detect aberrant behavior, but most person-fit statistics are not sufficient to determine the specific type of aberrant behavior. This is why, for instance, Emons et al. (2005) propose a three-phase analytical procedure involving, among other things, graphical analysis, and local analysis, in order to provide deeper insight into the possible causes of aberrant behavior. This analytical procedure, however, exceeds the scope of this study. ...
Article
The aim of this paper is to explore students' behavior and interaction patterns in different types of online quiz-based activities within learning management systems (LMS). Analyzing students' behavior in online learning activities and detecting specific patterns of interaction in LMS is a topic of great interest for the educational data mining (EDM) and learning analytics (LA) research communities. Previous studies have focused primarily on frequency analysis without addressing the temporal aspects of students' learning behavior. Therefore, we apply a process-oriented approach, investigating perspectives on using process mining methods in the context of online learning and assessment. To explore a broad range of possible student behavior patterns, we analyze students' interactions in several online quizzes from different courses and with different settings. Using process mining methods, we identify specific types of interaction sequences that shed new light on students' quiz-taking strategies in LMS. We believe that these findings bring important implications for researchers studying student behavior in online environments as well as practitioners using online quizzes for learning and assessment.
... Variation in results across independent item calibration studies of the same item set would depend in part upon how and how much non-random variation in item "inappropriateness", "item differential functioning", or respondent "response set" occurs across their different (neither random nor assuredly representative) samples of persons and measurement occasions (e.g., de Ayala 2009, pp. 409-413, 417-418, and 323-345; Emons et al. 2005; Meijer and Sijtsma 1995; Roussos and Stout 2004; Zumbo 2007). Therefore, simply to assume that any given calibration opportunity-sample of n persons on some given occasion or the average of any obtained set of such samples can properly be relied upon to produce an accurate person and occasion population estimate of a given item set's m ICCs is fanciful. ...
Article
If items have different levels of difficulty (or sensitivity) relative to some psychological attribute, passing (or endorsing) any one cannot mean the same about a person as passing any other, so percent of items passed regardless of which these are cannot indicate a person’s level on any attribute. If persons have different levels on a psychological attribute, an item’s being passed by one person cannot mean the same about its difficulty level as being passed by any other person, so percent of persons passing it regardless of which persons these are cannot indicate the item’s difficulty level. Percent of items passed by a person and percent of persons passing an item are incommensurate quantities not expressible in terms of the same quality or dimension. Both such percents are dependent on what sample of items and of persons are used. A person’s attribute level is not demonstrably probabilistic, because truly independent replicate occasions of a person responding to an item are impossible. Passing an item depends on more than a person’s single attribute level, the item’s difficulty level, and random chance. On all these matters Item Response Theory relies on assumptions that are logically unjustifiable.
... Practical person-fit indices are non-specific screening devices for tracing potentially inconsistent respondents. Ideally, however, once a pattern has been flagged as potentially inconsistent, further information should be obtained regarding (among other things) (a) the type of inconsistency, and (b) the impact that the inconsistency has on the trait estimates (Emons et al., 2005). FA-based analytical and graphical procedures for obtaining this information already exist and are implemented in stand-alone programs. ...
Article
Linear factor analysis (FA) is, possibly, the most widely used model in psychometric applications based on graded-response or more continuous items. However, in these applications consistency at the individual level (person fit) is virtually never assessed. The aim of the present study is to propose a simple and workable approach to routinely assess person fit in FA-based studies. To do so, we first consider five potentially appropriate indices, of which one is a new proposal and the other is a modification of an existing index. Next, the effectiveness of these indices is assessed by using (a) a thorough simulation study that attempts to mimic realistic conditions, and (b) an illustrative example based on real data. Results suggest that the mean-squared lico index and the personal correlation work well in conjunction and can function effectively for detecting different types of inconsistency. Finally future directions and lines of research are discussed.