
Coefficient alpha and related internal consistency reliability coefficients



The author studied the conditions under which coefficient alpha and 10 related internal consistency reliability coefficients underestimate the reliability of a measure. Simulated data showed that alpha, though reasonably robust when computed on n components in moderately heterogeneous data, can under certain conditions seriously underestimate the reliability of a measure. Consequently, alpha, when used in corrections for attenuation, can result in nontrivial overestimation of the corrected correlation. Most of the coefficients studied, including lambda 2, did not improve the estimate to any great extent when the data were heterogeneous. The exceptions were stratified alpha and maximal reliability, which performed well when the components were grouped into two subsets, each measuring a different factor, and maximized lambda 4, which provided the most consistently accurate estimate of the reliability in all simulations studied.
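The underestimation described above can be illustrated with a population calculation instead of a simulation. The sketch below assumes a hypothetical two-factor model with standardized items, each loading on only one factor. The helper names `alpha_from_cov` and `two_factor_population` are illustrative and are not taken from the paper. The sketch compares alpha computed from the model-implied covariance matrix with the model-implied reliability of the unweighted sum score.

```python
import numpy as np

def alpha_from_cov(cov):
    """Cronbach's alpha computed from a k x k item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    return (k / (k - 1)) * (1.0 - np.trace(cov) / cov.sum())

def two_factor_population(loadings_a, loadings_b, factor_corr):
    """Hypothetical model: the first items load only on factor A, the rest only on B.

    Returns the implied covariance matrix of standardized items and the true
    reliability of their unweighted sum (common variance / total variance).
    """
    la = np.asarray(loadings_a, dtype=float)
    lb = np.asarray(loadings_b, dtype=float)
    k = la.size + lb.size
    lam = np.zeros((k, 2))
    lam[: la.size, 0] = la
    lam[la.size :, 1] = lb
    phi = np.array([[1.0, factor_corr], [factor_corr, 1.0]])
    common = lam @ phi @ lam.T                  # covariance due to the factors
    unique = np.diag(1.0 - np.diag(common))     # standardized items have variance 1
    cov = common + unique
    return cov, common.sum() / cov.sum()

# Two uncorrelated factors, five items each, all loadings 0.7 (made-up values).
cov, true_rel = two_factor_population([0.7] * 5, [0.7] * 5, 0.0)
alpha = alpha_from_cov(cov)
print(f"alpha = {alpha:.3f}, true reliability = {true_rel:.3f}")
```

With these assumed loadings, alpha is about 0.74 while the model-implied reliability is about 0.83. This is the kind of gap the abstract reports for heterogeneous data.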


... In response to these limitations of alpha, Cronbach et al. (1965) proposed stratified alpha for estimating the reliability of composite scores obtained from instruments with subdimensions. Several researchers have reported that stratified alpha is a good estimator of true reliability in multidimensional structures (Kamata et al., 2003; Osburn, 2000; Revelle and Zinbarg, 2009; Tang and Cui, 2012). Another coefficient proposed for estimating reliability in congeneric measurement structures is omega (McDonald, 1999). ...
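A minimal sketch of stratified alpha, using the usual formula 1 − Σ_j σ²_j(1 − α_j) / σ²_X, where σ²_j and α_j are the variance and alpha of subscale j and σ²_X is the variance of the total score. The covariance matrix below is hypothetical: two uncorrelated five-item subscales with an inter-item covariance of 0.49. The function names are illustrative.

```python
import numpy as np

def alpha_from_cov(cov):
    """Cronbach's alpha from an item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    return (k / (k - 1)) * (1.0 - np.trace(cov) / cov.sum())

def stratified_alpha(cov, strata):
    """Stratified alpha: 1 - sum_j var(X_j) * (1 - alpha_j) / var(X).

    `strata` is a list of item-index lists, one per subscale.
    """
    cov = np.asarray(cov, dtype=float)
    total_var = cov.sum()                       # variance of the total score
    error_part = 0.0
    for idx in strata:
        sub = cov[np.ix_(idx, idx)]             # covariance block of one subscale
        error_part += sub.sum() * (1.0 - alpha_from_cov(sub))
    return 1.0 - error_part / total_var

# Hypothetical data: two uncorrelated 5-item subscales, unit item variances.
block = np.full((5, 5), 0.49) + np.diag(np.full(5, 0.51))
zeros = np.zeros((5, 5))
cov = np.block([[block, zeros], [zeros, block]])
strata = [list(range(5)), list(range(5, 10))]
print(round(alpha_from_cov(cov), 3), round(stratified_alpha(cov, strata), 3))
```

Within each subscale of this toy matrix, the items are parallel by construction. Stratified alpha therefore recovers the model-implied reliability of the total score (about 0.83), while ordinary alpha computed over all ten items stays lower (about 0.74).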
... In this context, reliability studies on multidimensional or multi-component data appear to be limited in the literature (Brunner and Süß, 2005; Cortina, 1993; Kamata et al., 2003; Raykov et al., 2015; Serbetar and Sedlar, 2016; Viladrich et al., 2017; Watkins, 2017; Widhiarso and Ravand, 2014). Among these, only a few studies manipulated inter-item correlations (Cortina, 1993; Kamata et al., 2003; Osburn, 2000). ...
... These examples show that real data sets may produce similar results. They also show that alpha can lead researchers to wrongly conclude that a measurement that is in fact reliable is unreliable. Measurements obtained from an instrument must be reasonably reliable, but this should not be researchers' only concern. Coefficient alpha gives a lower bound to true reliability for non-equivalent measurements (Osburn, 2000). When its assumptions are not met, it should therefore not be reported merely because its estimate exceeds the 0.70 cutoff. The main goal should be to report the true reliability of the measurements, or the value closest to it. ...
This study aimed to raise awareness of more appropriate alternatives to Cronbach's alpha for evaluating internal-consistency reliability in multidimensional measurements. To this end, the performance of alpha, stratified alpha, and omega was examined in data generated for two-dimensional simple and complex test structures. The simulation conditions were sample size (100, 200, and 400), inter-dimension correlation (0.00 and 0.50), inter-item correlation (0.30–0.50 and ≥0.75), test length (10 and 20), and number of items per dimension (5-5, 7-3, 10-10, and 15-5). Examples were also provided to show researchers that stratified alpha and omega can easily be computed by hand. The data were analyzed in R with the psych and sirt packages. The variables with the greatest effect on reliability estimates were, in order, inter-item correlation and inter-dimension correlation. Test length had a greater effect on reliability estimates in uncorrelated models under the low inter-item correlation condition. Sample size did not affect mean reliability estimates, but it did affect their error. Omega and stratified alpha performed similarly, whereas alpha estimated lower reliability values than both. Based on the literature and on the present results, researchers are advised not to use alpha, or not to use it alone, as evidence of internal-consistency reliability in multidimensional measurements. Instead, they should compute and report more appropriate coefficients such as omega and stratified alpha.
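The claim that omega is easy to compute by hand can be illustrated with a single-factor case. This minimal sketch assumes standardized loadings, so each unique variance is 1 − λ². It uses the formula ω = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The loadings are made-up values, not results from the study, and the function names are illustrative.

```python
def omega_total(loadings):
    """McDonald's omega for one factor with standardized loadings:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of unique variances)."""
    s = sum(loadings)
    unique = sum(1.0 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + unique)

def alpha_from_loadings(loadings):
    """Alpha of the covariance matrix implied by the same single-factor model."""
    k = len(loadings)
    s = sum(loadings)
    total_var = s ** 2 + sum(1.0 - l ** 2 for l in loadings)  # sum of all covariances
    return (k / (k - 1)) * (1.0 - k / total_var)              # each item variance is 1

loadings = [0.8, 0.7, 0.6, 0.5]  # hypothetical standardized loadings
print(round(omega_total(loadings), 3), round(alpha_from_loadings(loadings), 3))
```

Because these loadings are unequal (a congeneric rather than tau-equivalent model), alpha comes out slightly below omega. With equal loadings, the two coefficients coincide.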
... For the goodness-of-fit tests, we calculated the normed chi-square (χ²/df), comparative fit index (CFI), goodness-of-fit index (GFI), normed fit index (NFI), and root-mean-square error of approximation (RMSEA). 22,23 Values of CFI, GFI, and NFI > 0.90, χ²/df < 5, and RMSEA < 0.0854 represented acceptable goodness of fit. 21,22 For the reliability analysis, Cronbach's alpha was calculated. ...
... A Cronbach's alpha value > 0.80 represented high reliability of a scale. 23 We correlated the DFS-TR and STAI-2 using Pearson correlation analysis to assess convergent validity. Correlations of 0.50–0.69 represented moderate correlation, and correlations of 0.70–0.89 represented high correlation. ...
... Reliability analysis indicates how consistently a scale represents the construct it aims to measure. 23 A commonly used method to test the internal consistency of a scale is to calculate its Cronbach's alpha. 23 For a scale to be considered highly internally consistent, the recommended alpha value is higher than 0.80. ...
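A minimal sketch of the usual computation of Cronbach's alpha from raw item scores, α = k/(k − 1) · (1 − Σ s²_i / s²_total). Sample variances (ddof=1) are used throughout. The toy data and the function name are hypothetical.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha from an (n_respondents x k_items) score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score),
    using sample variances (ddof=1) throughout.
    """
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Toy data (hypothetical): four respondents, two items.
scores = [[2, 3], [4, 3], [3, 5], [5, 6]]
print(round(cronbach_alpha(scores), 4))  # 0.7467
```

The 0.80 threshold quoted above is a reporting convention applied to this number, not a property of the formula itself.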
Objective: This study aims to investigate the psychometric properties of the Turkish version of the Dark Future Scale, which measures future anxiety. Methods: The sample consisted of 478 university students aged 18–25, recruited through convenience sampling. Participants completed an online survey covering sociodemographics, tobacco use, and life satisfaction, together with the Dark Future Scale and the Trait Anxiety Inventory-2 Trait Scale. Confirmatory factor analysis and Cronbach's alpha were used to test the scale's structural validity and reliability. For convergent validity, we correlated the Turkish version of the Dark Future Scale with trait anxiety, examined mean differences by smoking status, and examined its association with life satisfaction. Results: The majority of the participants were female (73.6%), with a mean age of 21.5 (SD = 1.67). The majority (53.6%) were regular tobacco users. Confirmatory factor analysis showed a one-factor solution to be optimal (χ² = 17.091, df = 4, P = .002, χ²/df = 4.3, root-mean-square error of approximation = 0.083, comparative fit index = 0.988, goodness-of-fit index = 0.986, adjusted goodness-of-fit index (AGFI) = 0.986, normed fit index = 0.985). Cronbach's alpha for the scale was 0.86. The Turkish version of the Dark Future Scale was also significantly and positively correlated with trait anxiety (r(478) = .67, P < .01). The mean score was significantly higher among smokers (M = 19.1, SD = 6.65) than among nonsmokers (M = 17.7, SD = 7.69). Finally, higher future anxiety was associated with lower life satisfaction (r(478) = −0.42, P < .01). Conclusion: The Turkish version of the Dark Future Scale is a reliable and valid scale for measuring future anxiety. A brief, easy-to-apply, reliable, and valid measure of future anxiety may be useful for many researchers in psychology and psychiatry.
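As a consistency check on the fit statistics reported above, the RMSEA point estimate can be recomputed from χ², df, and N with the common formula √(max(χ² − df, 0) / (df·(N − 1))). This sketch makes two assumptions: it uses the N − 1 convention (some programs use N), and it assumes the model was fitted to all 478 respondents. The function name is illustrative.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate, assuming the (N - 1) convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

chi2, df, n = 17.091, 4, 478
print(round(rmsea(chi2, df, n), 3), round(chi2 / df, 1))  # 0.083 4.3
```

Under these assumptions, the reported χ², df, and sample size reproduce both the reported RMSEA and the χ²/df ratio.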
... Mental health data were collected (a) before the COVID-19 ... 17 The PSS is a ten-item scale scored on a five-point Likert scale (0 'never' to 4 'very often'), with a possible range of 0–40. Stress levels were categorised as low (score of 0–13), medium (score of 14–26), or high (score of 27–40). The reliability of the Czech version of the PSS in a single-factor approach was α = 0.91. ...
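The scoring rule and cut-offs above can be expressed as a short function. This is a sketch that assumes any reverse-keyed items have already been recoded, because the excerpt does not describe that step. The function name is hypothetical.

```python
def pss_total_and_band(item_scores):
    """Sum ten PSS item scores (each 0-4) and assign the stress band quoted above.

    Assumes any reverse-keyed items have already been recoded.
    """
    scores = list(item_scores)
    if len(scores) != 10:
        raise ValueError("expected exactly ten item scores")
    if any(not 0 <= s <= 4 for s in scores):
        raise ValueError("each item score must lie between 0 and 4")
    total = sum(scores)            # possible range 0-40
    if total <= 13:
        band = "low"
    elif total <= 26:
        band = "medium"
    else:
        band = "high"
    return total, band

print(pss_total_and_band([2, 1, 3, 2, 2, 1, 2, 3, 2, 1]))  # (19, 'medium')
```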
... These values are acceptable given that the B-IPQ measures different aspects of COVID-19 perception at the item level, which increases the heterogeneity of higher-order factors. 29,30 The longitudinal stability (ICC) was 0.71 for cognitive perception and 0.87 for emotional perception of COVID-19 disease. In agreement with the meta-analytic study about the B-IPQ, 31 we consider the validity and reliability of the instrument to be sufficiently established. ...
Background Although several studies have documented the impact of the COVID-19 pandemic on mental health, the long-term effects remain unclear. Aims To examine longitudinal changes in mental health before and during the consecutive COVID-19 waves in a well-established probability sample. Method An online survey was completed by the participants of the COVID-19 add-on study at four time points: pre-COVID-19 period (2014–2015, n = 1823), first COVID-19 wave (April to May 2020, n = 788), second COVID-19 wave (August to October 2020, n = 532) and third COVID-19 wave (March to April 2021, n = 383). Data were collected via a set of validated instruments, and analysed with latent growth models. Results During the pandemic, we observed a significant increase in stress levels (standardised β = 0.473, P < 0.001) and depressive symptoms (standardised β = 1.284, P < 0.001). The rate of increase in depressive symptoms (std. covariance = 0.784, P = 0.014), but not in stress levels (std. covariance = 0.057, P = 0.743), was associated with the pre-pandemic mental health status of the participants. Further analysis showed that secondary stressors played a predominant role in the increase in mental health difficulties. The main secondary stressors were loneliness, negative emotionality associated with the perception of COVID-19 disease, lack of resilience, female gender and younger age. Conclusions The surge in stress levels and depressive symptoms persisted across all three consecutive COVID-19 waves. This persistence is attributable to the effects of secondary stressors, and particularly to the status of mental health before the COVID-19 pandemic. Our findings reveal mechanisms underlying the surge in mental health difficulties during the COVID-19 waves, with direct implications for strategies promoting mental health during pandemics.
... Cronbach's alpha is structurally affected by the number of items in a scale. For scales with few items, Cronbach's alpha may yield values lower than the true reliability (Osburn, 2000). It is therefore advised not to use Cronbach's alpha as the reliability estimate for scales with few items (Deng and Chan, 2017). ...
... Cronbach's alpha is structurally affected by the number of items that make up a scale. For scales consisting of few items, Cronbach's alpha may yield values lower than the true reliability (Osburn, 2000). For this reason, it is recommended not to use Cronbach's alpha in reliability calculations for scales whose factors contain few items (Erkuş, 1999). ...
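The dependence on item count can be made concrete with the standardized-alpha formula α = k·r̄ / (1 + (k − 1)·r̄), where r̄ is the average inter-item correlation. The sketch holds a hypothetical r̄ = 0.3 fixed and varies only the number of items k.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Same average inter-item correlation, different scale lengths.
for k in (3, 5, 10, 20, 50):
    print(k, round(standardized_alpha(k, 0.3), 3))
```

With r̄ = 0.3, alpha rises from about 0.56 with 3 items to about 0.96 with 50 items, even though the items are no more strongly related. For any positive r̄, the value approaches 1 as k grows.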
... In the Test subsample, confirmatory analyses were conducted to estimate differences in reliability and validity between the original (one-factor) model and the best-fitting model obtained in the exploratory analyses. We evaluated reliability using Cronbach's α and McDonald's ω [21,22], and discriminant and convergent validity using the heterotrait-monotrait ratio of correlations (HTMT) between and within measures [23]. Finally, measurement invariance testing was conducted to assess measurement bias of the FTDKS across caregivers or professionals and adults without medical training [24]. ...
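A minimal sketch of the heterotrait-monotrait ratio for two constructs, computed from an item correlation matrix. The numerator is the mean correlation between items of different constructs. The denominator is the geometric mean of the average within-construct correlations, using off-diagonal entries only. The correlation matrix is hypothetical. Some implementations take absolute correlations first, which this sketch does not do.

```python
import numpy as np

def htmt(corr, items_a, items_b):
    """Heterotrait-monotrait ratio for two constructs from an item correlation matrix."""
    r = np.asarray(corr, dtype=float)
    hetero = r[np.ix_(items_a, items_b)].mean()    # between-construct correlations

    def mean_within(idx):
        block = r[np.ix_(idx, idx)]
        off_diag = ~np.eye(len(idx), dtype=bool)    # drop the 1s on the diagonal
        return block[off_diag].mean()

    return hetero / np.sqrt(mean_within(items_a) * mean_within(items_b))

# Hypothetical correlation matrix: items 0-1 measure A, items 2-3 measure B.
corr = np.array([
    [1.0, 0.6, 0.2, 0.2],
    [0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.5],
    [0.2, 0.2, 0.5, 1.0],
])
print(round(htmt(corr, [0, 1], [2, 3]), 3))
```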
... Internal consistency was determined in the Training subsample using Cronbach's α [21] and McDonald's ω with Schmid-Leiman exploratory factor extraction [22]. Table 2 presents the summary of results from the reliability analysis. ...
Introduction: Given the progressive increase in the number of individuals affected by dementia and the importance of knowledge about its symptoms, it has become crucial to develop well-validated instruments for measuring knowledge about dementia. The aim of this study was to translate and validate the Frontotemporal Dementia Knowledge Scale (FTDKS) in a Polish population. Methods: The FTDKS was translated into Polish following the most highly recommended methodological approaches for translating and validating instruments for cross-cultural healthcare research. Psychometric properties were evaluated in a sample of 869 individuals (general population, healthcare professionals, and caregivers) who completed the questionnaire. The reliability of the FTDKS was tested as internal consistency using both Cronbach's alpha and McDonald's omega. Convergent and discriminant validity were assessed using the heterotrait-monotrait ratio of correlations between scores on the FTDKS, vocabulary intelligence, and the Alzheimer's Disease Knowledge Scale (ADKS). Results: The scale showed satisfactory psychometric properties (Cronbach's alpha and McDonald's omega above 0.80). Internal consistency was slightly higher among healthcare professionals and caregivers than in the general population. Discussion: The Polish version of the FTDKS shows internal consistency similar to that of the original version. The FTDKS can be used to evaluate the effectiveness of educational interventions among caregivers, healthcare professionals, and the general population.
... Cronbach's α values between 0.60 and 0.80 represent acceptable to satisfactory internal consistency, values above 0.80 represent very good internal consistency, and values above 0.90 represent excellent internal consistency. 51 ...
Background and Aims Accurate assessment of any patient relies on measurements that are valid and culturally and linguistically applicable. This study aimed to translate and cross-culturally adapt the Pain Catastrophizing Scale into Swahili (Swa‐PCS). It also aimed to test the nomological validity, structural validity, internal consistency, test‐retest reliability, sensitivity‐to‐change, and feasibility of the Swa‐PCS among refugees living with chronic pain in Kenya who survived torture or war trauma. Methods An observational study was conducted. The original PCS was translated and culturally adapted for the Swahili‐speaking refugee population in Kenya who survived torture or war trauma. A validation study was then conducted on the newly adapted instrument to ascertain its psychometric properties (nomological validity, structural validity, internal consistency, test‐retest reliability, sensitivity‐to‐change, and ceiling and floor effects). Results Fifty participants were included in this study. Correlations between pain catastrophization and fear‐avoidance behavior measures were significant (r = 0.538, p < 0.01). Ceiling effects were 42–48%, with no floor effects. Standard error of measurement values ranged from 0.938 to 3.38. Minimal‐detectable‐change values ranged from 2.17 to 7.82. Internal consistency was satisfactory to good for the whole scale and the subscales, respectively (range α = 0.693–0.845). Magnification had the lowest α. Test‐retest reliability was also satisfactory to good (range ICC = 0.672–0.878). Confirmatory factor analysis confirmed that the Swa‐PCS had three factors that explained the majority of the variance. The root mean square error of approximation and comparative fit index were calculated to assess goodness of fit and were 0.18 and 0.83, respectively.
Conclusion This study showed that the adapted Swa‐PCS displayed overall satisfactory to good internal consistency, test‐retest reliability and sensitivity‐to‐change. Furthermore, the Swa‐PCS scores were related to fear‐avoidance behavior scores as expected (nomological validity). Structural validation of the Swa‐PCS requires further investigation. Further testing of the psychometric properties of the Swa‐PCS is however warranted.
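The SEM and MDC values above come from formulas that are commonly written as SEM = SD·√(1 − r) and MDC = z·√2·SEM. Here r is a reliability coefficient such as the ICC, and z is 1.96 for 95% confidence. The abstract does not state which SD, reliability estimate, or confidence level the study used. The SD and ICC below are therefore hypothetical, and `z` is left as a parameter.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def mdc(sd, reliability, z=1.96):
    """Minimal detectable change: z * sqrt(2) * SEM (z = 1.96 for 95% confidence)."""
    return z * math.sqrt(2.0) * sem(sd, reliability)

# Hypothetical subscale: SD = 10, test-retest ICC = 0.84.
print(round(sem(10.0, 0.84), 2), round(mdc(10.0, 0.84), 2))
```

The √2 factor reflects that a change score combines measurement error from two administrations.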
... Osburn (2000), Bendermacher (2010), Douglas and Wright (2015) and Taber (2016)]. In addition, "the Cronbach's Alpha Reliability describes the reliability of a sum (or average) of q measurements where the q measurements may represent q raters, occasions, alternative forms, or questionnaire/test items" (Douglas and Wright, 2015; p. 1). ...
To work effectively, Higher Education Institutions worldwide have been collecting data and developing multidimensional evaluation methods and indicators through internal mechanisms. Their aims are to improve institutional performance, ensure internal accountability for their activities, identify effective teaching and learning practices, strengthen research activities and outcomes, and develop the institutional environment. At the same time, they must meet requirements of accountability, quality, and transparency. In this paper, extensive data are collected and multidimensional evaluation methods are proposed to assess the current state of the administrative, educational, and research systems of a Higher Education Institution. A non-profit university (University of Nizwa, Oman) is taken as a case study to apply and assess the classes of indicators. The data were collected from a random sample of 304 academic staff, and the paper presents and analyzes the dimensional evaluation and indicator results for these staff. The survey contains 22 dimensions and 156 indicators. In addition, the statistical analysis of this paper is accurate and comprehensive, and it covers the evaluation of all the dimensions and indicators.
... Also, the authors in [39] stated that CR values greater than or equal to 0.70 can be regarded as acceptable. Similarly, λ2 values greater than 0.70 are acceptable [41,42]. Table 2 shows that the values of α are greater than 0.8. ...
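Guttman's λ2, mentioned above and in the Osburn abstract, can be computed from the item covariance matrix as λ2 = λ1 + √(k/(k − 1) · Σ_{i≠j} σ²_ij) / σ²_X. Here λ1 = 1 − Σσ²_i / σ²_X. The sketch reuses a hypothetical two-factor matrix (two uncorrelated blocks of five items, inter-item covariance 0.49) so that λ2 can be compared with alpha on the same data. The function names are illustrative.

```python
import numpy as np

def alpha_from_cov(cov):
    """Cronbach's alpha (Guttman's lambda-3) from an item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    return (k / (k - 1)) * (1.0 - np.trace(cov) / cov.sum())

def guttman_lambda2(cov):
    """Guttman's lambda-2 from an item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    total_var = cov.sum()
    off = cov - np.diag(np.diag(cov))              # off-diagonal covariances only
    lambda1 = 1.0 - np.trace(cov) / total_var
    return lambda1 + np.sqrt(k / (k - 1) * (off ** 2).sum()) / total_var

block = np.full((5, 5), 0.49) + np.diag(np.full(5, 0.51))
zeros = np.zeros((5, 5))
cov = np.block([[block, zeros], [zeros, block]])
print(round(alpha_from_cov(cov), 3), round(guttman_lambda2(cov), 3))
```

In this toy case, λ2 (about 0.77) is somewhat higher than alpha (about 0.74). Both stay below the model-implied reliability of about 0.83, in line with the abstract's remark that λ2 did not improve the estimate greatly for heterogeneous data.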
Studies evaluating students’ UX of applications that will influence their continuance use of educational systems in Higher Educational Institutions (HEIs) have not been sufficiently addressed in the African region, specifically in Ghana. Thus, conducting a study on students’ UX of systems in HEIs will enhance students’ interest in continuing to use such systems. Therefore, this study examines students’ user experience (UX) and how it impacts their continued use of the Student Management Information System (SMIS). The study proposed a research model by integrating user experience questionnaire (UEQ) constructs with continuance intention to use. The study adopted an online questionnaire to collect data from 415 students at Koforidua Technical University (KTU). The partial least square-structural equation model (PLS-SEM) method was used to evaluate the proposed model’s reliability, validity, and relationship among the constructs. The UEQ data analysis tool was used in analysing the data. The study’s findings showed that attractiveness, perspicuity, efficiency, stimulation, and novelty significantly influenced students’ continuance intention to use the SMIS. However, dependability did not significantly affect students’ continuance intention to use the SMIS. Also, the study’s findings on the benchmark results showed that attractiveness, perspicuity, and stimulation were categorized as good, while dependability, efficiency, and novelty were categorized as excellent. These findings offer valuable insights for UX designers and developers aiming to create engaging and intuitive SMIS solutions. Also, it will provide students feedback on system UX that can be incorporated into future release updates. Furthermore, this research contributes significantly to the understanding of student UX with SMIS in developing country HEIs and its impact on continued usage.
... In line with this, Cronbach's alpha may substantially underestimate the reliability of a scale when its assumptions are violated (e.g., Gelin et al. 2003; Maydeu-Olivares et al. 2007; Osburn 2000; Zumbo and Kroc 2019). This matters because underestimating a scale's reliability may lead to unwarranted inferences. For example, based on a low Cronbach's alpha, researchers may consider a scale too unreliable, even though its actual reliability may be sufficient for its intended purpose or use. ...
... If, however, all the involved items exhibit high covariance, then α will approach 1 as the number of items in the scale approaches infinity. Consequently, the higher the coefficient, the greater the shared covariance among the items and the greater the reliability of the measurements (Osburn 2000; Thompson 2002; Ritter 2010). As shown in Table 7 below, the results indicated high consistency (i.e., internal reliability), with α = 0.954, close to 1, the maximum value Cronbach's alpha can take. ...
The interactional centrality of the imperative, its semantic-pragmatic versatility, and its different syntactic configurations have attracted considerable scholarly interest in various linguistic paradigms. Contributing to and extending this line of investigation through a Construction Grammar (CxG) approach, the present paper focuses on atypical (i.e., non-canonical) imperatives that encode the Addressee through overt pronominal subjects, referred to as OBLIGATORY SUBJECT IMPERATIVES (OSIs). Specifically, the paper investigates ‘you do that’ as an instance of a so-called weak imperative with consistent discourse-responsive scope over a previous Addressee-induced proposition p. The paper argues that ‘you do that’ conveys not the Speaker’s (S’s) intentions or wishes, as an imperative conventionally would. Instead, it conveys the S’s low endorsement of the fulfillment of p, manifested in gradient forms of (disinterested) acceptance, indifference, or acquiescence. The paper draws on synchronic, corpus-attested evidence (COCA data) to explore the construction in its dialogual contexts of occurrence and to tease apart its inherited and idiosyncratic properties. It further seeks to establish and statistically validate the atypicality of the pattern while setting it apart from its seeming formal ‘twins’ (e.g., ‘You do that just once…’). Finally, the paper casts light on the possible discourse correlates of ‘you do that’ and discusses the productivity of its imperative-based licensing template. Key words: atypical imperatives, overt pronominal subjects, ‘you do that’, Construction Grammar (CxG), gradient acceptance
... Internal consistency of all image parameters was good with Cronbach's α ranging from 0.84 to 0.99. 43 We also performed inter-rater correlation analysis by comparing all image parameters of 10 randomly selected MRIs between our rater (S.-L.H.) and an experienced neuroradiology specialist (C.-H.W.), and the intraclass correlation coefficients ranged from 0.77 to 0.99. Peak width of skeletonized mean diffusivity (PSMD) is an established and robust imaging marker for CSVD and is associated with microstructural disruption of white matter over the whole brain. ...
Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), caused by cysteine-altering variants in NOTCH3, is the most prevalent inherited cerebral small vessel disease. Impaired cerebral interstitial fluid dynamics has been proposed as one of the potential culprits of neurodegeneration and may play a critical role in the initiation and progression of cerebral small vessel disease. In the present study, we aimed to explore cerebral interstitial fluid dynamics in patients with CADASIL and to evaluate their association with clinical features, imaging biomarkers, and disease severity. We recruited 81 participants carrying a cysteine-altering variant in NOTCH3 (44 symptomatic CADASIL patients and 37 preclinical carriers) and 21 age- and sex-matched healthy controls. All participants underwent brain MRI and neuropsychological evaluations. Cerebral interstitial fluid dynamics was investigated with the non-invasive diffusion tensor image analysis along the perivascular space (DTI-ALPS) method. CADASIL patients had a significantly lower DTI-ALPS index than preclinical carriers and healthy controls. Among the 81 carriers of NOTCH3 variants, older age and hypertension were independently associated with a lower DTI-ALPS index. Cerebral interstitial fluid dynamics were strongly related to the severity of cerebral small vessel disease imaging markers. The DTI-ALPS index correlated positively with brain parenchymal fraction. It correlated negatively with total white matter hyperintensity volume, peak width of skeletonized mean diffusivity, lacune numbers, and cerebral microbleed counts.
In addition, diffusion tensor image analysis along the perivascular space index was a significant risk factor associated with the development of clinical symptoms of stroke or cognitive dysfunction in individuals carrying NOTCH3 variants. In CADASIL patients, diffusion tensor image analysis along the perivascular space index was significantly associated with Mini-Mental State Examination scores. Mediation analysis showed that compromised cerebral interstitial fluid dynamics was not only directly associated with cognitive dysfunction but also had an indirect effect on cognition by influencing brain atrophy, white matter disruption, lacunar lesions and cerebral microbleeds. In conclusion, cerebral interstitial fluid dynamics is impaired in CADASIL and its disruption may play an important role in the pathogenesis of CADASIL. Diffusion tensor image analysis along the perivascular space index may serve as a biomarker of disease severity for CADASIL.
... These three items demonstrated only modest internal consistency (α = .53), but this is common for scales with few items (Osburn, 2000; Peterson, 1994). Responses were averaged across the three items. We constructed an identical measure of unstructured socializing using Wave 2 data. ...
In a study of 1,344 urban adolescents, the authors examined the relation between participation in organized sports and juvenile delinquency. They compared youth who participated in sports to those who only participated in nonathletic activities and to those who did not participate in any organized activities. They also examined the indirect relations between sports and delinquency via 2 peer-related constructs—deviant peer affiliations and unstructured socializing. Finally, they examined the extent to which gender and prior externalizing problems moderated the direct and indirect relations between sports participation and delinquency. The authors found that the odds of nonviolent delinquency were higher among boys who participated in sports when compared to boys who participated only in nonathletic activities but not when compared to boys who did not participate in any organized activities. Deviant peer affiliations and unstructured socializing mediated the relation between sports participation and boys' nonviolent delinquency. Moreover, prior externalizing problems moderated the mediated path through peer deviance. The authors did not, however, find direct, mediated, or moderated relations between sports and boys' violent delinquency nor between sports and girls' violent or nonviolent delinquency.
... Internal consistency was calculated with Cronbach's alpha coefficient, and a value of 0.7 or greater was considered satisfactory [24]. Split-half reliability was assessed by dividing the items into odd- and even-numbered halves. ...
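An odd-even split-half coefficient is usually obtained by correlating the two half-test totals and stepping the result up to full length with the Spearman-Brown formula 2r / (1 + r). The excerpt does not say whether that correction was applied, so this sketch assumes it. The response matrix and function names are hypothetical.

```python
import numpy as np

def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2.0 * r_half / (1.0 + r_half)

def split_half_odd_even(scores):
    """Odd-even split-half reliability with Spearman-Brown correction."""
    x = np.asarray(scores, dtype=float)
    odd_total = x[:, 0::2].sum(axis=1)    # items 1, 3, 5, ... (1-based numbering)
    even_total = x[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...
    r_half = np.corrcoef(odd_total, even_total)[0, 1]
    return spearman_brown(r_half)

# Hypothetical responses: 5 respondents x 6 items.
scores = [
    [3, 4, 3, 3, 4, 4],
    [2, 2, 1, 2, 2, 1],
    [4, 5, 5, 4, 4, 5],
    [1, 2, 2, 1, 1, 2],
    [3, 3, 4, 4, 3, 3],
]
print(round(split_half_odd_even(scores), 3))
```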
Background Evidence-based practice (EBP) is an essential approach for optimizing patient outcomes and driving progress in clinical practice. Clinical postgraduates are an important future pool of medical staff and researchers, and they are expected to become the backbone of EBP implementation in clinical units after graduation. Assessing their EBP learning outcomes is therefore important, yet few tools have been developed specifically for Mainland China. The purpose of this study was to adapt the Evidence-Based Practice Profile Questionnaire (EBP²Q) to Mainland China’s cultural context and to evaluate the psychometric properties of the Chinese EBP²Q in clinical postgraduates. Methods Cross-cultural adaptation, including translation of the original EBP²Q into Chinese, was carried out according to established guidelines. A pilot study was carried out in Mainland China among 30 clinical postgraduates. A subsequent validation study was conducted among 633 clinical postgraduates majoring in clinical medicine, stomatology, and nursing from Mainland China. Construct validity was assessed by exploratory factor analysis (n = 313) and confirmatory factor analysis (n = 320). Reliability was determined by internal consistency and test-retest reliability. Results The Chinese EBP²Q consisted of 40 items. The content validity index of the Chinese EBP²Q was 0.938, an acceptable level. Principal component analysis yielded a four-factor structure explaining 61.586% of the total variance. In confirmatory factor analysis, all fit indices met the standard criteria, indicating good model fit for the four-factor structure. The internal consistency of the Chinese EBP²Q was high, with a Cronbach’s alpha of 0.926. Test–retest reliability was 0.868 and the split-half coefficient was 0.925. Conclusion The Chinese version of the EBP²Q has adequate validity, test-retest reliability, and internal consistency.
It is a promising questionnaire to be adopted by Chinese medical educators in designing their course and curriculum, or by clinical postgraduates for self-assessment of EBP learning.
... Common mistake: Many authors use Cronbach's alpha as an "internal consistency" coefficient to estimate the reliability of test scores. However, alpha is based on strong assumptions and can provide misleading estimates of reliability (Osburn, 2000;Sijtsma, 2009). Furthermore, alpha does not indicate internal consistency or unidimensionality, and the popular cut-off values of 0.7 or 0.8 are completely arbitrary (Cho & Kim, 2015). ...
... Our results showed that both genetic algorithms and RFE returned abbreviated subscales of the TEQ that had lower coefficient alpha values (i.e., lower reliability) than the full-length scale. Considering that the coefficient alpha tends to underestimate reliability for short and/or heterogeneous measures [49], this finding is not necessarily surprising. Across the eight subscales of the TEQ, genetic algorithms yielded coefficient alpha values that are either higher than or very similar to those obtained from RFE. ...
Psychological scales play a key role in the assessment, screening, and diagnosis of latent variables, such as emotions, mental health, and well-being. In practice, researchers need shorter scales of psychological traits to save administration time and cost. Thus, a variety of optimization algorithms have been proposed to abbreviate lengthy psychological scales into shorter instruments efficiently. The main goal of this application is to form an abbreviated scale with fewer items while maintaining reliability, relationships among the subscales, and model fit for the full scale. In this study, we use an optimization algorithm (genetic algorithm) and a feature selection algorithm (recursive feature elimination) to abbreviate a psychological scale automatically. Although both algorithms search for an optimal subset of features within a large pool of features, the search mechanism underlying each algorithm is quite different. The genetic algorithm employs a systematic but computationally-expensive sampling process to find the optimal features, whereas recursive feature elimination removes the least important features iteratively until a desired number of features are retained. In this study, we use a 77-item measure of test emotions (Test Emotions Questionnaire) to demonstrate how these algorithms can be used for scale abbreviation. We generate a 40-item short form using each algorithm and compare the quality of the selected items against the full-length scale. The results indicate that both methods can provide researchers and practitioners with a systematic procedure for creating psychometrically sound, shorter versions of lengthy psychological instruments.
... Without a full discussion of the evidence for reliability, authors can undermine their conclusions or even draw inappropriate ones (Taber, 2018). While there are alternative measures of internal reliability, such as maximized λ4 (lambda4) or maximal reliability (Osburn, 2000), the current report will focus solely on Cronbach's α, providing a methodological critique of current common practice within the context of the psychological literature and offering guidance for authors going forward. ...
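Guttman's λ4 for one split of the items into halves A and B is λ4 = 2(1 − (σ²_A + σ²_B)/σ²_X). Maximized λ4 is the largest such value over all possible splits. The sketch below finds it by exhaustive enumeration, which is only feasible for a small number of items (roughly 2^(k−1) splits). The function names are hypothetical.

```python
from itertools import combinations
from statistics import variance

def lambda4(rows, half):
    """Guttman's lambda-4 for one split: items in `half` versus all other items."""
    a = [sum(r[c] for c in half) for r in rows]
    b = [sum(r[c] for c in range(len(r)) if c not in half) for r in rows]
    x = [ai + bi for ai, bi in zip(a, b)]
    return 2 * (1 - (variance(a) + variance(b)) / variance(x))

def max_lambda4(rows):
    """Maximized lambda-4 over every bipartition of the items (exhaustive search)."""
    k = len(rows[0])
    best = None
    for size in range(1, k // 2 + 1):          # one half never needs more than k // 2 items
        for half in combinations(range(k), size):
            val = lambda4(rows, set(half))
            if best is None or val > best[0]:
                best = (val, half)
    return best  # (value, items in one half)

value, half = max_lambda4([[1, 2], [2, 3], [3, 5]])
print(round(value, 4), half)  # 0.9474 (0,)
```

With only two items there is a single split, and λ4 equals coefficient alpha.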
Estimation of angular direction to landmarks at varying distances in the physical environment was used to investigate the ecological validity of the Santa Barbara Sense of Direction Scale (SBSOD). Two- and three-dimensional MR measures were included to further assess the scale's applicability. Results showed a moderate correlation between SBSOD scores and angular deviation from landmarks in the immediate landscape, but not from local or distant landmarks. Moreover, the findings suggest that the skills underlying three-dimensional MR relate better to pointing accuracy (PA) for distant landmarks and for the cardinal direction North. Results also showed gender-related systematic biases in landmark estimation.
... Moreover, the Guttman split-half and Spearman-Brown values were found to be at medium and good levels. These results led us to interpret the scale as a reliable instrument (Osburn, 2000; Gliem and Gliem, 2003; Yang and Green, 2011; Eisinga et al., 2013; De Vet et al., 2017). ...
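The two split-half coefficients mentioned here can be computed side by side:

- **Spearman-Brown** steps up the correlation r between the halves to full length: 2r/(1 + r).
- **Guttman's split-half coefficient** uses the half-score variances directly: 2(1 − (σ²_A + σ²_B)/σ²_X).

The sketch below assumes an odd-even split and a list-of-lists score matrix. The function names are hypothetical.

```python
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half(rows):
    """Odd-even split: (Spearman-Brown corrected r, Guttman split-half coefficient)."""
    odd = [sum(r[0::2]) for r in rows]    # items 1, 3, 5, ... (0-based indices 0, 2, 4, ...)
    even = [sum(r[1::2]) for r in rows]   # items 2, 4, 6, ...
    r = pearson(odd, even)
    spearman_brown = 2 * r / (1 + r)
    total = [o + e for o, e in zip(odd, even)]
    guttman = 2 * (1 - (variance(odd) + variance(even)) / variance(total))
    return spearman_brown, guttman

sb, g = split_half([[1, 2], [2, 1], [3, 3]])
print(round(sb, 4), round(g, 4))  # 0.6667 0.6667
```

The two coefficients coincide when the halves have equal variances, as in this toy example. Spearman-Brown exceeds Guttman's coefficient when the half variances differ.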
Mindfulness-based approaches, which have been used in many fields in recent years, have been widely accepted, especially in psychology and psychiatry, and have been integrated into traditional treatment methods. In parallel with the positive effects of mindfulness on mental, spiritual, and physical well-being, research has steadily increased, and various scales have been developed to measure mindfulness. Mindfulness can be summarized as mindful awareness of the moment while accepting one's experience without judgment. Some of these scales have been adapted to Turkish in the last decade. Considering the extensive use of the concept of mindfulness in health-related fields, this study aimed to introduce the State Mindfulness Scale into Turkish as an important scale that can be evaluated on a more solid theoretical and methodological basis. Accordingly, the source scale was translated into Turkish following the steps suggested by the WHO. Confirmatory factor analysis (CFA) was performed to assess the validity of the scale. To assess reliability, the study examined internal consistency using Cronbach's alpha, item-total score analysis, Guttman reliability coefficients, and Spearman-Brown reliability coefficients. For the CFA and other analyses, 345 students at Akdeniz University were included in the study. The CFA showed that the 21-item, 2-factor structure of the source scale was compatible with the target culture (χ²/df = 3.41; RMSEA = 0.088; CFI = 0.95). The correlation reliability coefficients of the scale ranged from 0.484 to 0.743. Cronbach's alpha was 0.899 for the first factor, 0.728 for the second factor, and 0.921 for the total score.
The findings show that the validity and reliability of the State Mindfulness Scale have been established and that the scale has been successfully adapted to Turkish.
... Internal consistency, indexed by Cronbach's α, was 0.769 in the first survey and 0.788 in the second, which is within the acceptable range [40]. ...
The COVID-19 pandemic caused extreme disruptions to everyday life. The aim of this study was to investigate how these disruptions affected adolescents' sense of coherence and their level of aggression, and whether this was influenced by their relationship with animals, especially horses. In two random samples of students from vocational schools in Hungary, taken in June 2018 and June 2020 (n1 = 525, n2 = 412), separate groups were drawn from students who had regularly engaged in equine-assisted activities (ES) before the pandemic and those who had not (OS). Data were collected with the Sense of Coherence (SOC13) and Bryant-Smith (B12) scales, using an anonymous paper-based questionnaire and, during the pandemic, an online version. During the pandemic, boys' sense of coherence weakened and their aggressiveness increased. Multiple linear regression analyses showed that, regardless of gender and age group, increased time spent using the internet (p < 0.001), a lack of classmates (p = 0.017), reduced time spent outdoors (p = 0.026), and reduced physical activity (p < 0.038) during the pandemic significantly increased the tendency toward aggressive behavior, whereas being with a horse or pet was beneficial (p < 0.001). The changes imposed by the curfew were rated as bad by 90% of the pupils; however, those with a strong sense of coherence felt less negatively about them. Schools should place great emphasis on strengthening students' sense of coherence.
... For internal consistency, values of 0.70 or higher were considered satisfactory, and an alpha above 0.80 is suggested for acceptance as high internal consistency [19]. The SEM was calculated using the equation SEM = SD × √(1 − ICC), in which SD is the standard deviation. ...
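The standard error of measurement converts a reliability coefficient into score units. It uses SEM = SD × √(1 − reliability), with the ICC serving as the reliability here. The sketch below is a minimal version of that formula. The function name and the example numbers are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); here the ICC plays the role of reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical values: SD = 10 points, ICC = 0.75.
print(standard_error_of_measurement(10.0, 0.75))  # 5.0
```

Because the SEM is expressed in the questionnaire's own units, it is easier to interpret for individual scores than the ICC itself.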
Validity, reliability and cross-cultural adaptation of the Turkish version of the Profitmap-Back Questionnaire. Purpose: The aim of this study was to investigate the adaptation, validity, and reliability of the Turkish version of the Profile-Fitness Mapping Questionnaire (PFMQ) for people with low back pain. Methods: Two hundred and forty participants with chronic low back pain were enrolled in the study.
Intra-rater reliability and internal consistency analyses were used to assess the reliability of the questionnaire. Intra-rater reliability was assessed with the intraclass correlation coefficient (ICC), and Cronbach's alpha was calculated for internal consistency. For concurrent validity, PFMQ scores were compared with the ODI and VAS using Pearson correlation analysis. The PFMQ, Oswestry Disability Index (ODI), Visual Analog Scale (VAS), and Short Form Health Survey (SF-36) were administered to all participants. Results: Intraclass correlation coefficients for intra-rater reliability ranged from 0.643 to 0.767, indicating that intra-rater results were very good. For concurrent validity, the Pearson correlation coefficient of the PFMQ was 0.594 with the ODI and 0.502 with the VAS. For the reliability analysis, the Cronbach's alpha of the PFMQ was 0.837. Correlations with the SF-36 indices ranged from fair to good (0.28–0.52). Conclusion: The Turkish version of the PFMQ is valid and reliable. This scale can reveal how, how often, and how much pain affects the symptoms and functional activities of people with chronic low back pain. Keywords: Low back pain, outcome measurement, reproducibility of results, questionnaires and scales.
... and 0.79 were fair, and values between 0.80 and 0.89 were good (Cicchetti, 1994). For interpreting reliability estimates, including Guttman's lambda-2 (λ-2), there are some general rules of thumb; a λ-2 above 0.70 is sufficient for group-level studies (Guttman, 1945; Osburn, 2000). To determine what proportion of the variance in each of the four group climate subscales could be attributed to the group level and the individual level, we computed the intraclass correlation coefficient (ICC), which is calculated by dividing the level-2 variance by the total variance (Raudenbush & Bryk, 2002). ...
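The sketch below shows the two quantities described in this passage.

- **Guttman's λ2** adds to λ1 = 1 − Σσᵢ²/σ²_X the term √(k/(k − 1) · Σ_{i≠j} σᵢⱼ²)/σ²_X.
- **The ICC** here is the level-2 variance divided by the total variance.

The variance components are assumed to come from an already fitted two-level model. The function names are hypothetical.

```python
from statistics import mean

def cov(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def guttman_lambda2(rows):
    """Guttman's lambda-2 for a respondents-by-items matrix."""
    items = list(zip(*rows))
    k = len(items)
    c = [[cov(items[i], items[j]) for j in range(k)] for i in range(k)]
    total_var = sum(sum(row) for row in c)           # variance of the sum score
    lambda1 = 1 - sum(c[i][i] for i in range(k)) / total_var
    off_sq = sum(c[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    return lambda1 + (k / (k - 1) * off_sq) ** 0.5 / total_var

def icc_from_variances(level2_var, level1_var):
    """Share of total variance located at the group level: level-2 / (level-2 + level-1)."""
    return level2_var / (level2_var + level1_var)

print(round(guttman_lambda2([[1, 2], [2, 3], [3, 5]]), 4))  # 0.9474
print(icc_from_variances(1.0, 3.0))                         # 0.25
```

λ2 is never smaller than alpha (λ3). With two positively correlated items, the two coincide.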
Close(d) care - Group climate in a secure forensic setting for individuals with mild intellectual disability
Purpose: The aim of this study was to adapt the Turkish version of the Profile-Fitness Mapping Questionnaire (ProFitMap-PFHA) for individuals with low back pain and to investigate its validity and reliability. Methods: Two hundred and forty individuals with chronic low back pain were enrolled in the study. Inter-rater reliability and internal consistency analyses were used to assess the reliability of the questionnaire. Inter-rater reliability was assessed with the intraclass correlation coefficient (ICC), and Cronbach's alpha was calculated for internal consistency. For concurrent validity, PFHA scores were compared with the ODI and VAS using Pearson correlation analysis. The PFHA, Oswestry Disability Index (ODI), Visual Analog Scale (VAS), and Short Form Health Survey (SF-36) were administered to all participants. Results: Intraclass correlation coefficients for inter-rater reliability ranged from 0.643 to 0.677, indicating that intra-rater results were very good. For concurrent validity, the Pearson correlation coefficient of the PFHA was 0.594 with the ODI and 0.502 with the VAS. These results showed that the PFHA correlated well with the ODI and VAS. Conclusion: The Turkish version of the ProFitMap questionnaire is valid and reliable. This scale can reveal how, how often, and how much pain affects the symptoms and functional activities of people with chronic low back pain.
... Confirmatory factor analysis (CFA) showed that the standardized loadings of all indicators were significant. The values of the average variance extracted (AVE) for the six factors were all greater than 0.5 [46], and the composite reliability (CR) values were all higher than 0.7 [47], indicating good convergent validity (Table 5). ...
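AVE and CR are both computed from a factor's standardized loadings λᵢ:

- **AVE** is the mean of λᵢ².
- **CR** is (Σλᵢ)² / ((Σλᵢ)² + Σ(1 − λᵢ²)).

The sketch below assumes standardized loadings with uncorrelated errors. The loading values are hypothetical, not taken from Table 5.

```python
def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Error variance of each standardized indicator is taken as 1 - loading^2.
    """
    s = sum(loadings)
    error = sum(1 - l * l for l in loadings)
    return s * s / (s * s + error)

loads = [0.7, 0.8, 0.9]  # hypothetical standardized loadings for one factor
print(round(average_variance_extracted(loads), 4))  # 0.6467
print(round(composite_reliability(loads), 4))       # 0.8446
```

Against the thresholds cited in the text (AVE > 0.5, CR > 0.7), this hypothetical factor would pass both.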
With the in-depth development of industrialization and urbanization in China, improving the professional skills and quality of migrant workers in the construction industry has become an important measure for optimizing the labor force structure and promoting industry upgrading. Numerous studies have been carried out on this topic, and the identity of construction industrial workers with high skill levels and professional quality has replaced the professional identity of migrant workers. However, the psychological and cognitive mechanism behind migrant workers' occupational role enhancement behavior has not been fully revealed. Based on the theory of planned behavior and risk perception theory, this study constructs a theoretical model of the factors influencing migrant construction workers' intention to industrialize. It also explores the key factors and cognitive mechanisms in their transformation into construction industrial workers. An empirical study used structural equation modeling on 383 questionnaires collected in the field from migrant construction workers. It shows that perceived behavioral control, subjective norm, and behavioral attitude all have significant positive effects on behavioral intention, with direct effects decreasing in that order. Perceived behavioral control also predicts professionalization through the mediation of behavioral intention. Risk perception, a factor newly introduced into the model, has a negative inhibitory effect on behavioral intention and actual behavior. This study validates the important role of psychological intention in the industrialization of migrant construction workers. It provides a new perspective on promoting their transformation into industrial workers and lays a foundation for the modern transformation and sustainable development of the industry.
... Reliability for both forms was acceptable (Guttman's lambda-2 = .89; Osburn, 2000). Several items had problematic point-biserial correlations (< .15) (Thorndike & Thorndike-Christ, 2010), and those items are noted in the tables. ...
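For a dichotomously scored (0/1) item, the point-biserial correlation is simply the Pearson correlation between the item and a total score. The passage does not say whether that total included the item. The sketch below assumes the corrected version, which correlates each item with the sum of the remaining items. It then flags items below the .15 cutoff. The function names and data are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; equals the point-biserial r when x is 0/1."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5   # undefined if an item or rest score is constant

def item_rest_correlations(rows):
    """Correlation of each item with the sum of all other items (corrected item-total r)."""
    out = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        out.append(pearson(item, rest))
    return out

rows = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]  # hypothetical 0/1 responses
corrs = item_rest_correlations(rows)
print([j for j, r in enumerate(corrs) if r < 0.15])  # [2]
```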
... Cronbach's alpha (ρ) assesses the internal consistency of the measures. Cronbach's alpha if item deleted shows whether any item (question) could be deleted to increase Cronbach's alpha. The split-half measure examines the correlation between two halves of the scale when the data can be split into two parts. Finally, Guttman's reliability indices are six lower bounds, denoted λi, i = 1, 2, …, 6. A value of 0.7 or above on any of these reliability measures is considered satisfactory [for more details, see Cronbach and Shavelson (2004), Osburn (2000), Bendermacher (2010), Douglas and Wright (2015) and Taber (2016)]. ...
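"Cronbach's alpha if item deleted" recomputes alpha on every subset of the scale that leaves out one item. The sketch below computes that table. It assumes a list-of-lists score matrix with at least three items. The function names and data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows, cols):
    """Coefficient alpha on the item columns listed in `cols`."""
    k = len(cols)
    sum_item_var = sum(variance([r[c] for r in rows]) for c in cols)
    total_var = variance([sum(r[c] for c in cols) for r in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)

def alpha_if_item_deleted(rows):
    """Alpha of the remaining items after deleting each item in turn (needs >= 3 items)."""
    k = len(rows[0])
    return [cronbach_alpha(rows, [c for c in range(k) if c != j]) for j in range(k)]

rows = [[1, 1, 3], [2, 2, 1], [3, 3, 2], [4, 4, 3]]  # hypothetical: item 3 is weak
print([round(a, 4) for a in alpha_if_item_deleted(rows)])
```

In this example, deleting the third item raises alpha the most, which marks it as the candidate for removal.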
Continuous quality improvement is an endeavour that higher education institutions (HEIs) undertake to evaluate institutional effectiveness and service quality and to ensure accountability and transparency to their stakeholders, such as academics and students. This study positions academic and student surveys as pragmatic mechanisms that HEIs can rely on to measure and evaluate the quality of the services they provide. Using a non-profit HEI in the Sultanate of Oman, this study reveals the significant factors that affect academics' and students' evaluations of their HEI experience. These include academic advising, classrooms, electronic management systems, general services, laboratories, learning support services, student affairs and society services, and records and registration services. The numerical results for each survey, dimension, and statement/indicator are computed, discussed, and compared, and several tests were performed to answer the research questions and hypotheses. The implications of the findings and future research directions conclude the paper. Reference to this paper should be made as follows: Al-Hemyari, Z.A. and Al Rajhi, W. (2022) 'How can we develop academic and student surveys to measure and evaluate the quality of services at higher education institutions? Insights from a non-profit HEI in the Sultanate of Oman', Int. J. Quality and Innovation, Vol. 6, No. 2, pp.134-161. Biographical notes: Zuhair A. Al-Hemyari is currently a Full Professor and research expert at the University of Nizwa, Oman. He holds a PhD in Mathematical Statistics from the Indian Institute of Technology. His research interests are single- and double-stage sampling techniques, the theory and applications of censored data schemes, Bayesian shrinkage procedures, reliability theory, modelling and optimisation, quality audit, quality management, and measuring and analysing the performance of HEIs.
Besides teaching, he has published about 85 papers and participated in 42 international conferences. He was formerly a Full Professor and an expert of research and studies at the MoHERI, Oman, and was an academic advisor and project executor for several MoHERI projects. Waleed Al Rajhi is the Dean of Planning and Quality Management at the University of Nizwa. He obtained his Master's in Professional Practice, focused on Change Management. He also obtained his PhD, focused on quality of life and health-related quality of life among individuals with kidney failure. His managerial positions were Head of the Nephrology Nursing Programme, Acting Dean, and Assistant Dean of Academic Support Services and Students Affairs at the Higher Institute of Health Specialties, Muscat, Oman. His main training and short courses were on learning and teaching strategies, professional practice, quantitative research, evidence-based practice, research methodology, and kidney failure treatment modalities.
... This is a lower bound to the exact reliability, with a tendency to underestimate or overestimate the exact reliability (Osburn, 2000; Zimmerman et al., 1993). The standardized ...
New Zealand building codes are often amended to ensure a resilient built environment. These changes have unintentionally affected how the amendments are applied and used. The purpose of this study is to investigate the unintended consequences of building code amendments in New Zealand and to make recommendations for improvement. The study analysed the views of relevant building code users within the building code regulatory system on the negative consequences of building code amendments. In total, the study examined 116 questionnaire responses to explore building code users' understanding of the unintended impacts of building code amendments. The findings show that a high proportion of respondents strongly believed in the need to address the unintended side effects of building code amendments. They emphasised proactive training, bureaucracy in the design approval process, the shortage of technical staff, and increased technical complexity of the code. This justifies the usefulness of the research. Based on the findings, providing satisfactory technical guidelines and reducing regulatory deficiency within the building code authorities will help to reduce the negative impacts of building code amendments in New Zealand. The study concludes by stressing the significant impacts of the unintended consequences of amending the building code. It emphasises informing policy regulators of the need to address the identified consequences.
... Bartlett's test of sphericity was significant at p < .001 (Peltzer et al., 2017). In both adaptation studies, the results strongly support internal consistency (Guttman, 1945; Sijtsma, 2009; Osburn, 2000). (Bercaw, 2004). ...
Purpose: The aim of this study was to adapt the Alcohol Expectancies Scale-III Adult Form (ABÖ-III-Y), developed by Brown, Christiansen and Goldman (1987), into Turkish and to examine its validity and reliability. Materials and Methods: The ABÖ-III-Y, the Drinking Motives Scale-Revised Form (İNÖ-GF), and the Alcohol Use Disorders Identification Test (AKBTT) were administered to a total of 402 adults. The sample included people receiving treatment with a diagnosis of alcohol use disorder and people who use alcohol without a diagnosis. Language equivalence was tested with the back-translation method, and content validity was assessed by consulting expert opinion. Reliability was tested through temporal stability (test-retest reliability) and internal consistency, and validity through construct validity (confirmatory factor analysis). Results: The Cronbach's alpha of the ABÖ-III-Y was .97 for 120 items. Confirmatory factor analysis yielded five subfactors, consistent with the original scale and explaining 37.88% of the total variance: general positive changes, increased social assertiveness and reduced anxiety, relaxation and tension reduction, increased sexuality, and physical relaxation. In the item analysis, items with item discrimination below .40 were removed. Cronbach's alpha for the resulting scale was .96 for all 82 items and ranged from .77 to .93 for the subscales. Significant correlations were found between the ABÖ-III-Y and the İNÖ-GF (r = .64) and between the ABÖ-III-Y and the AKBTT (r = .24). In the adaptation studies, participants who scored high on the AKBTT, indicating harmful alcohol use, had higher alcohol expectancies than the other participants. Accordingly, it was concluded that participants' alcohol consumption increased as their level of expectancy increased.
Conclusion: The analyses showed that the Turkish adult form of the Alcohol Expectancies Scale-III meets the required psychometric criteria and is a valid and reliable scale. Keywords: Alcohol expectancies, alcohol use disorder, alcohol dependence, validity, reliability.
... (1) Split-half reliability is a special case of the more generalized standardized coefficient alpha formula (Nunnally & Bernstein, 1994). (2) Standardized coefficient alpha is more appropriately applied to composite scores derived from the sum of items/indicators that have been transformed into standardized scores (e.g., z-scores), a practice observed when the test items/indicators have been measured on different scales. In practice, the differences between the two coefficient alphas tend not to be particularly large (Cortina, 1993; Osburn, 2000; but see Falk & Savalei, 2011). (3) In psychometrics, all tests are composed of one or more items. ...
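Standardized coefficient alpha depends only on the number of items k and the mean inter-item correlation r̄: α_std = k·r̄ / (1 + (k − 1)·r̄). With k = 2, the two halves play the role of items and the formula reduces to the Spearman-Brown split-half formula 2r/(1 + r), which is the sense in which split-half reliability is a special case. The sketch below shows this. The function names and the correlation matrix are hypothetical.

```python
def mean_offdiag(corr):
    """Mean of the off-diagonal entries of a square correlation matrix."""
    k = len(corr)
    vals = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    return sum(vals) / len(vals)

def standardized_alpha(k, mean_r):
    """alpha_std = k * r_bar / (1 + (k - 1) * r_bar)."""
    return k * mean_r / (1 + (k - 1) * mean_r)

corr = [[1.0, 0.3, 0.4],
        [0.3, 1.0, 0.5],
        [0.4, 0.5, 1.0]]          # hypothetical inter-item correlations
r_bar = mean_offdiag(corr)        # 0.4
print(round(standardized_alpha(3, r_bar), 4))  # 0.6667

# k = 2 with half-test correlation r reproduces Spearman-Brown: 2r / (1 + r).
print(round(standardized_alpha(2, 0.6), 4))    # 0.75
```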
Financial literacy is often measured with only three to five questions, which suggests difficulty in achieving respectable levels of internal consistency reliability. In our review, financial literacy tests composed of three and five questions yielded mean reliability estimates of .40 (k = 7; N = 167,075) and .54 (k = 8; N = 57,937), respectively. These values are below the minimum acceptable even for exploratory research. Based on our more comprehensive review of 52 samples and a variety of financial literacy tests (3 to 45 questions), we recommend that researchers measure financial literacy with a minimum of 13 to 15 questions. Finally, we conclude that many previous investigations have substantially underestimated the potential impact of financial literacy on various outcome variables. The relatively low internal consistency reliability of financial literacy test scores attenuated the effects estimated from observed scores.
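Two classical formulas connect test length, reliability, and attenuation. The first is the Spearman-Brown prophecy formula, which projects reliability when a test is lengthened with comparable items. It can also be inverted to find the length needed for a target reliability. The second is the correction for attenuation, r_xy / √(r_xx · r_yy). Because coefficient alpha can underestimate reliability, plugging alpha into this correction can overstate the corrected correlation.

The sketch below implements both under the usual assumption that added items are parallel to the existing ones. It is illustrative arithmetic only, not the method behind the 13–15 item recommendation. The function names and the target value of .70 are hypothetical.

```python
import math

def spearman_brown(rel, factor):
    """Projected reliability when test length is multiplied by `factor`."""
    return factor * rel / (1 + (factor - 1) * rel)

def items_needed(n_items, rel, target):
    """Smallest whole number of comparable items expected to reach `target` reliability."""
    factor = target * (1 - rel) / (rel * (1 - target))   # inverse Spearman-Brown
    return math.ceil(n_items * factor - 1e-9)            # tolerance guards float round-off

def disattenuate(r_xy, rel_x, rel_y):
    """Observed correlation corrected for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)

# A 3-item test with reliability .40 (as in the review) projected to .70:
print(items_needed(3, 0.40, 0.70))                  # 11
print(round(spearman_brown(0.40, 3.5), 4))          # 0.7
# Hypothetical observed r = .20 with reliabilities .40 and .80:
print(round(disattenuate(0.20, 0.40, 0.80), 3))     # 0.354
```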
The aim of this study was to test the validity and reliability of the Family Affluence Scale II (FAS II; Turkish: AVDÖ II) among Turkish university students. The scale was developed to determine students' socioeconomic status (SES). It was developed in the World Health Organization's 1997 project "Health Behaviour in School-aged Children (HBSC)" and revised within the same project in 2001–2002; the revised form was used in this study. The study group consisted of 240 students aged 19–24 (mean age = 20.20, SD = 1.13 years): 97 women (mean age = 20.08, SD = 0.94 years) and 143 men (mean age = 20.29, SD = 1.23 years). The scale has four items. They ask how many cars the family owns, whether the child has his or her own room, how many times the family traveled on holiday in the past year, and how many computers the family owns. Exploratory factor analysis for construct validity yielded a single-factor structure explaining 45.40% of the variance, with factor loadings ranging from 0.55 to 0.73. Rasch analysis results also supported the construct validity of FAS II. The concurrent validity analysis showed a high, significant relationship between FAS II scores and SES scale scores (r = 0.73, pAVE). Cronbach's alpha, computed to determine internal consistency, was 0.59, and the CR coefficient was 0.77. In conclusion, FAS II can be considered a valid and reliable instrument for assessing SES.
This study aims to examine the validity and reliability of the Extrajudicial Justice Scale for Nigerian citizens. The sample consisted of 600 Nigerians over 18 years of age who volunteered to participate in the study. Following comprehensive validity and reliability analyses, the scale consists of 14 items and 4 factors: "Illegal execution", "Indictment", "Unlawful arrest", and "Intimidation". Cronbach's alpha was 0.859 for the whole scale. For the factors, it was 0.782 for Illegal execution, 0.714 for Intimidation, 0.728 for Unlawful arrest, and 0.738 for Indictment. According to the results of the validity and reliability analyses, the Extrajudicial Justice Scale can be used with Nigerian citizens as a valid and reliable instrument. It can determine the influence of extrajudicial justice on people's perceptions of the police. Since no scale has been available to assess extrajudicial justice in Nigeria, this study is expected to guide future research on the subject and highlight its importance for society.
This study aims to evaluate the accuracy of reliability estimation with Cronbach's alpha and omega, using computer-generated data. The data varied on three independent variables: number of dimensions (one, two, or three), test length (10, 30, or 60 items), and sample size (50, 100, 250, 500, or 1000). Forty-five different conditions were generated, and reliability was estimated with both coefficients (alpha and omega). Independent-samples t-tests showed no statistically significant differences between the two coefficients' reliability estimates for unidimensional tests, across all sample sizes and test lengths. For two- and three-dimensional tests, however, the differences between the two coefficients' estimates were statistically significant across sample sizes and test lengths. The results also showed that differences in the standard error of estimation between the two methods were small across study conditions. The study therefore recommends using omega when a test is multidimensional. Keywords: Cronbach's alpha and omega reliability coefficients, number of dimensions, sample size, test length.
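Even in the simplest one-factor (congeneric) case, alpha and omega diverge when loadings are unequal. The sketch below works at the population level. It assumes standardized loadings λᵢ with uniquenesses ψᵢ = 1 − λᵢ² and builds the implied covariance matrix λλᵀ + diag(ψ). From this it computes alpha and McDonald's omega, ω = (Σλ)² / ((Σλ)² + Σψ). This is a simpler setting than the multidimensional conditions in the study, and the loading values and function names are hypothetical.

```python
def implied_cov(loadings, uniq):
    """Covariance matrix implied by a one-factor model: lambda lambda^T + diag(psi)."""
    k = len(loadings)
    return [[loadings[i] * loadings[j] + (uniq[i] if i == j else 0.0)
             for j in range(k)] for i in range(k)]

def alpha_from_cov(c):
    """Coefficient alpha computed from an item covariance matrix."""
    k = len(c)
    total = sum(sum(row) for row in c)
    return k / (k - 1) * (1 - sum(c[i][i] for i in range(k)) / total)

def omega_total(loadings, uniq):
    """McDonald's omega for a one-factor model."""
    s = sum(loadings)
    return s * s / (s * s + sum(uniq))

lam = [0.9, 0.5, 0.3]                 # hypothetical unequal standardized loadings
psi = [1 - l * l for l in lam]
print(round(alpha_from_cov(implied_cov(lam, psi)), 4))  # 0.5506
print(round(omega_total(lam, psi), 4))                  # 0.6097
```

With equal loadings (tau-equivalence), the two values coincide. With unequal loadings, alpha falls below omega, which is consistent with recommending omega when its assumptions are in doubt.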
This study aims to generalize the reliability of the GAAIS. The GAAIS is known to produce valid and reliable measurements, is frequently used in the literature, addresses one of today's popular topics, and is one of the first instruments developed in the field. Within this meta-analytic reliability generalization study, moderator analyses were also conducted on several categorical and continuous variables. The study generalized Cronbach's α values for the overall scale and for the positive and negative subscales, as well as McDonald's ω coefficients for the positive and negative subscales. Google Scholar, WOS, Taylor & Francis, Science Direct, and EBSCO databases were searched for primary studies. The screening identified 132 studies, which were reviewed against the inclusion criteria. Reliability coefficients from the 19 studies that met the criteria were included in the meta-analysis. Meta-analytic reliability generalization was performed with a random-effects model. Moderator analyses were performed with a mixed-effects model for both categorical and continuous variables. The pooled Cronbach's α was 0.881, 0.828, and 0.863 for the total scale and the negative and positive subscales, respectively. The pooled McDonald's ω was 0.873 and 0.923 for the negative and positive subscales, respectively. There were no significant differences in the reliability coefficients across any of the categorical variables. On the other hand, all continuous moderator variables (mean age, standard deviation of age, and proportion of females) had a significant effect.
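The abstract states that a random-effects model was used but not how the alphas were transformed before pooling. The sketch below assumes one common choice. Each alpha is transformed to ln(1 − α) with approximate sampling variance 2k / ((k − 1)(n − 2)), where k is the number of items and n the sample size. The transformed values are pooled with the DerSimonian-Laird random-effects estimator and back-transformed. The function name and study values are hypothetical.

```python
import math

def pool_alphas(alphas, n_items, n_people):
    """Random-effects (DerSimonian-Laird) pooling of alpha on the ln(1 - alpha) scale.

    Returns (pooled alpha, between-study variance tau^2 on the transformed scale).
    """
    t = [math.log(1 - a) for a in alphas]
    v = [2 * k / ((k - 1) * (n - 2)) for k, n in zip(n_items, n_people)]
    w = [1 / vi for vi in v]                                # fixed-effect weights
    t_fixed = sum(wi * ti for wi, ti in zip(w, t)) / sum(w)
    q = sum(wi * (ti - t_fixed) ** 2 for wi, ti in zip(w, t))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(t) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]                  # random-effects weights
    t_pooled = sum(wi * ti for wi, ti in zip(w_star, t)) / sum(w_star)
    return 1 - math.exp(t_pooled), tau2                     # back-transform to alpha

# Hypothetical studies: alphas, number of items, sample sizes.
pooled, tau2 = pool_alphas([0.86, 0.90, 0.88], [20, 20, 20], [250, 400, 180])
print(round(pooled, 3), tau2)
```

Working on the ln(1 − α) scale keeps the back-transformed pooled value below 1 and makes the sampling distribution closer to normal than that of raw alpha.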
The Nesplora Aula is a virtual reality-based continuous performance test that measures attentional processes in children from 6 to 16 years of age. The measure uses a virtual reality headset to create a classroom-like environment for the examinee, requiring the use of visual and auditory attention. The current article reviews the history of continuous performance tests. It also describes the Nesplora Aula (created by Gema Climent Martinez and Sam Goldstein), its underlying theoretical framework, and its structure. Its advantages, such as ecological validity, and its drawbacks, such as the technology required and the regional normative sample, are discussed.
Objectives: Early detection of cognitive impairment is essential for timely intervention. Most widely used cognitive screening tests are currently influenced by language and cultural differences, so a language-neutral, visual-based cognitive assessment tool is needed. The Visual Cognitive Assessment Test (VCAT) is a 30-point test that assesses memory, executive function, visuospatial function, attention, and language. It has demonstrated its utility in a multilingual population. In this study, we evaluated the reliability, validity, and diagnostic performance of the VCAT for screening early cognitive impairment in Chongqing, China. Methods: A total of 134 individuals (49 healthy controls [HCs], 52 with mild cognitive impairment [MCI], and 33 with mild dementia) completed the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), VCAT, and domain-specific neuropsychological assessments. The diagnostic performance of the MMSE, MoCA, and VCAT was evaluated using the area under the curve (AUC), sensitivity, and specificity. Construct validity of the VCAT was assessed against well-established domain-specific cognitive assessments. Reliability was measured using Cronbach's alpha. Results: The VCAT and its subdomains demonstrated good construct validity and internal consistency (α = 0.577). The VCAT performed comparably to the MoCA and MMSE in two comparisons, adjusted for education level. In differentiating mild dementia from nondemented groups, the AUC was 0.940 for the VCAT versus 0.902 and 0.977, respectively (p = .098 and .053). In distinguishing cognitive impairment (CI) from HC, the AUC was 0.929 versus 0.899 and 0.891, respectively (p = .239 and .161). The optimal VCAT score ranges for dementia, MCI, and HC were 0–14, 15–19, and 20–30, respectively. Conclusion: The VCAT proved to be a reliable screening test for early cognitive impairment within our cohort.
Being both language and cultural neutral, the VCAT has the potential to be utilized among a wider population within China.
Based on survey responses from 342 persons, the study assesses how behavioural biases affect German investors' investment decisions. Three behavioural biases were examined: overconfidence, representativeness, and herding behaviour. The study found that demographic factors, namely gender, age, experience, education, and frequency of investment, influence German investors' susceptibility to these biases. Male German investors are more susceptible to all three biases than females. Young investors (<35 years) are more at risk for the overconfidence and representativeness biases, while older investors (>35 years) are more at risk for the herding bias. Investors with less stock market experience (<5 years) have a higher tendency toward all three biases than German investors with more experience (>5 years). Investors with a high (i.e. university) education are more susceptible to the three biases than those with a low education. Investors with a high investing frequency (>3 months) scored higher for all three biases than investors with a low investing frequency (<3 months).
In recent years, although Vietnam's information systems have developed strongly, cybercrime has become a massive challenge in all sectors. As the use of internet-connected devices continues to increase, cyber awareness has become increasingly urgent. Economics-Management (EM) students, who will be the future workforce and managers, need to be fully equipped with information security knowledge. However, at current EM universities, very few majors include information security subjects in their curricula. This raises the question: “Should information security subjects be included in the curriculum for students majoring in EM?” This article answers this question by comparing the awareness of trained and untrained EM students regarding information security, through a survey of knowledge, self-perception, and practical information security behavior among 465 students in EM majors. The results show that students who received the training had much better information security knowledge, self-perception, and behavior than untrained students, and that training in information security for EM students is more necessary than ever.
Both individuals and organizations extensively use the internet and social media for communication and other purposes in the present era. The study examines the impact of social media effectiveness on informativeness and communication, and the effect of informativeness and communication on the recruitment process. It also examines the mediating roles of informativeness and communication and the moderating role of social presence. The study focused on the banking sector of Karachi, Pakistan. The study's enumerators distributed 450 questionnaires to the HR departments of the targeted banks and received 427 valid questionnaires. Data were analyzed with Smart PLS version 3.2, including reliability and validity analyses and a structural model for testing the hypotheses. The results support seven hypotheses and reject one. Social media were found to positively affect informativeness, communication, and recruitment. The study also validated (i) the mediating role of informativeness and (ii) the mediating role of communication between social media effectiveness and the recruitment process. The study also supported the positive moderating role of social presence on the relationship between informativeness and the recruitment process, but it failed to support that social presence positively moderates the relationship between communication and the recruitment process.
A specific assessment tool is urgently needed to guide effective wound care for diabetic foot ulcers. However, such a tool has not been available in Chinese. We aimed to culturally translate the new Diabetic Foot Ulcer Assessment Scale (DFUAS) and verify its validity and reliability. The original scale was translated into Chinese according to the Brislin guidelines. Patients satisfying the inclusion and exclusion criteria were recruited. Each included foot ulcer was evaluated independently by two wound care specialists using the new DFUAS and, at the same time, by a third wound care specialist using the Bates-Jensen Wound Assessment Tool, according to the guidelines. A total of 210 diabetic foot ulcers were included in the data analysis. The S-CVI of the Chinese version of the DFUAS was 0.96, and the I-CVIs ranged from 0.89 to 0.98. The total Cronbach's alpha of the scale was 0.709, and the corrected item-total correlations of the items ranged from 0.4 to 0.872. The DFUAS had a high inter-observer reliability of 0.997, and correlations between pairs of items ranged from weak to strong. The Bland-Altman plots showed good agreement between the scale and the Bates-Jensen Wound Assessment Tool. We concluded that the Chinese version of the DFUAS showed good validity and reliability and is a reliable instrument for the assessment of diabetic foot ulcers.
BACKGROUND: This article aims to perform a psychometric assessment of the scale of organizational readiness for digital innovations in a transition economy and to examine the antecedents of organizational readiness for digital innovations. METHODOLOGY: The study employed a quantitative research method to analyze data collected from a sample of 1236 health professionals. Second-order confirmatory factor analysis and linear regression analysis were employed, respectively, to verify the organizational readiness scale and to test the hypotheses about organizational readiness for digital innovation. RESULTS/CONCLUSIONS: The research findings show that the organizational readiness scale for digital innovations is valid and reliable in transition economies. Findings show that variables such as adaptation of human resources (AHR), cognitive readiness (COR), planning for new telehealth and e-health (PNTH), IT readiness (ITR), resource readiness (RR), partnership readiness (PR), and cultural readiness (CUR) are correlated with innovations implementation effectiveness (IIE) and organizational readiness for digital innovation, and that these relationships are positive and statistically significant. Findings also suggest that the relationship between integration of old technologies (IoT) and organizational readiness for digital innovation is negative and statistically significant.
The aim of the present study was to examine Turkish teacher candidates' competency levels in writing different types of test items by utilizing Rasch analysis. In addition, the effect of the expertise of the raters scoring the items written by the teacher candidates was examined. A total of 84 Turkish teacher candidates participated in the present study, which used the relational survey model, a quantitative research method. Three experts participated in the rating process: an expert in Turkish education, an expert in measurement and evaluation, and an expert in both Turkish education and measurement and evaluation. The teacher candidates wrote true-false, short-response, multiple-choice, and open-ended items in accordance with the Test Item Development Form, and the raters scored each item type by assigning a score between 1 and 5 based on the item evaluation scoring rubric prepared for each item type. The study revealed that Turkish teacher candidates had the highest level of competency in writing true-false items and the lowest in writing multiple-choice items. Moreover, raters' expertise had an effect on teacher candidates' competencies in writing different types of items. Finally, the rater who was an expert in both Turkish education and measurement and evaluation had the highest level of scoring reliability, while the rater who had expertise solely in measurement and evaluation had the relatively lowest level of scoring reliability.
Anxiety sensitivity (AS) refers to fear of anxiety symptoms that are believed to result in physical (Physical Concerns), cognitive (Cognitive Concerns), or social (Social Concerns) harm. AS is implicated in a range of anxiety disorders and may propel maladaptive behaviors by increasing action monitoring systems in order to prevent errors. Indeed, anxious individuals are characterized by elevated neural responses to errors, as indexed by the Error-Related Negativity (ERN). In the current study we examined the moderating effect of clinical diagnosis on the relationship between scores on the Anxiety Sensitivity Index (ASI-3) and the ERN in an unselected sample (N = 124) of women. Based on semi-structured clinical interviews, participants were classified into an anxiety group (AD), a clinical control group (CC), or a healthy non-clinical group (HC). Participants completed an arrowhead version of the Flanker task while electroencephalogram (EEG) data were collected. Analyses revealed that diagnostic group moderated the association between residualized ERN (ERNResid) and Cognitive Concerns, such that the AD group demonstrated a significantly stronger and more negative association than the HC group. Our results indicate that the relationship between ERNResid and Cognitive Concerns is strongest in individuals characterized by elevated anxiety.
Introduction: In 2013, we developed a scale to evaluate conference abstracts for the Society of Surgeons of Chile (Sociedad de Cirujanos de Chile, SOCICH). Objective: To determine the internal consistency and inter-observer reliability of a scale for evaluating conference abstracts. Materials and Methods: Reliability study. Twelve surgeons received 8 hours of virtual training in applying the scale. After the training, they were sent a questionnaire assessing the training content, together with several abstracts (cases) to be evaluated with the scale. Descriptive statistics were applied, and the degree of agreement between observers was then estimated for each item of the scale. Next, the intraclass correlation coefficient (ICC) was estimated using a two-way mixed-effects model in which rater effects are random and item effects are fixed, with an absolute-agreement definition. In addition, the internal consistency of the items was evaluated with Cronbach's alpha, with 95% confidence intervals (95% CI). Results: After analyzing the ratings of the 9 items by the 12 observers, the ICC was 0.871 (95% CI 0.700 to 0.965). Internal consistency across the 9 items was 0.7, and removing any item is not recommended. Conclusion: The scale has good inter-observer reliability and its items are mutually consistent, so it can be considered a reliable instrument for evaluating conference abstracts.
This study aims to use the Many-Facet Rasch Measurement model (MFRM) to identify the rater behaviors that intrude on teachers' scoring, as well as rater bias and rater reliability, when determining the writing performance of seventh-grade students. Because rater behaviors and biased rating adversely affect both validity and reliability when individual performance is determined in performance-based assessments, it is important to identify these behaviors and to interpret results in that light. The study group consisted of 57 seventh-grade middle school students attending a public school in Ankara in the spring semester of the 2019-2020 academic year and 11 teachers who received rater training. Students were given seven different persuasive writing tasks, asked to choose one of them, and asked to write a persuasive text for it. Nine of the texts were found not to follow a persuasive text structure, and the texts of the remaining 48 students were evaluated by the 11 teachers using an analytic rubric with six criteria. The study examined the rater behaviors that appeared most often in teacher scoring (rater severity/leniency, halo effect, central tendency), as well as differential rater severity/leniency, in other words rater bias, and rater reliability, based on Classical Test Theory (CTT), Generalizability Theory (GT), and Item Response Theory (IRT). The findings showed that rater severity and leniency occurred when students' writing skills were assessed, whereas rater behaviors such as central tendency and the halo effect were not found.
When differential rater severity and leniency, that is, bias interactions, were examined, no biased rater behavior was observed at the group level, but individual raters did exhibit biased rating behavior. Rater reliabilities based on CTT, GT, and IRT were high across the different methods; although the raters scored differently from one another, their scores were consistent with each other. In conclusion, the Many-Facet Rasch Measurement model is an alternative measurement model that can be used to identify differential rater behaviors and rater reliability in performance assessment.
Knowledge about the nature of scientific inquiry (NOSI) is not only an integral part of scientific literacy, but also essential for living and working as responsible citizens in the twenty-first century and for facing the danger of “fake science”. Although various NOSI instruments already exist, they primarily target a different group (pupils), emphasize experimentation in their content, and/or use open-ended or multiple-choice response formats. To address this instrument gap, a closed-ended questionnaire with a dichotomous and a post-decision confidence rating response scale was developed and tested to evaluate respondents' understanding of eight NOSI aspects in a detailed yet economical manner. A total of 148 German freshman biology student teachers participated in a sequential cross-sectional pilot study. First results indicate acceptable instrument reliability. Certain items seem to be answered correctly by chance or through test-wiseness, whereas others seem to reveal participants' naïve NOSI views or NOSI misconceptions. These findings imply a need to further explore biology student teachers' NOSI understanding to improve future university teaching. Moreover, further validity analyses of the newly developed instrument should be performed.
The purpose of the present study was to address the shortcomings of Cronbach's alpha concerning the semantic overlap between items. Using an example from a motivational measure, Cronbach's alpha was corrected by partialing out the effects due to conceptual overlap. The significance of Cronbach's alpha was tested using simulated random data derived from the measure and by estimating confidence intervals with known and unknown distributions. The results indicated that coefficient alpha was 0.89 before the correction for conceptual overlap and 0.66 after it. After simulating the corrected statistical results, the 95% point of the distribution of alpha obtained from random numbers was 0.41. The lower bound of the corrected alpha distribution was also 0.41, suggesting that the corrected alpha could easily belong to the distribution of alpha generated from simulated random numbers. Thus, semantic overlap between the items of a measure represents a significant threat to the validity of the alpha coefficient.
The study aimed to examine temporal change of boredom in English classes (BPELC) and test the longitudinal validity of the boredom in practical English classes-revised (BPELC-R) scale via longitudinal confirmatory factor analysis-curve of factors model (LCFA-CFM) approach. This approach ensures measurement invariance of BPELC over time, deals with its second-order latent variables, and considers the assessment of inter-individual differences while experiencing the emotion. Data were collected from 412 EFL adult learners on four measurements using BPELC-R and were analyzed by Mplus with LCFA-CFM. The model fit was accepted, which indicates invariance of BPELC-R as well as the factor structure of the instrument including the factor loadings of its subscales over time. Without the consideration of LCFA of BPELC-R, as addressed in this study, any observed change of the construct in the course of language learning could be misinterpreted. Also, though the rate of change in boredom differed across individual L2 learners, they all experienced a decreasing trend over time. Furthermore, the negative association between the intercept and slope suggested that learners with higher initial levels of boredom experienced a steeper decrease over time. The decreasing pattern of boredom is discussed in light of the main theories of this construct.
In this study we used a factor of curves model (FCM) to examine the codevelopment of the subdomains of boredom in practical English language classes in an online setting (BPELC) over four phases of an online L2 course, to explore the covariance of the initial level and slope of the subdomains, and to check to what extent the variation in these subdomains can be accounted for by the underpinning global factor of BPELC. Data were collected from 412 adult EFL learners on four occasions via the Boredom in Practical English Classes – Revised scale, created by Pawlak et al. (2020a). Analysis was conducted in Mplus in three stages. Findings revealed a statistically significant decrease over time in both subdomains. Also, a negative covariance was uncovered between the initial level and the growth level of each subdomain. Moreover, the covariances between the initial and growth levels of both subdomains were moderately high. Finally, the variances of the initial and growth levels of each subdomain were accounted for to a great extent by the underpinning global factor of BPELC. The results are interpreted in view of the underlying theories of boredom.
Attractive political candidates receive more votes on Election Day compared to their less attractive competitors. One well-cited theoretical account for this attractiveness effect (White et al., 2013) holds that it reflects an adaptive psychological response to disease threats. Voters are predicted to upregulate preferences for attractiveness because it constitutes a cue to health. The global COVID-19 pandemic constitutes an ecologically relevant and realistic setting for further testing this prediction. Here, we report the results from six tests of the prediction based on two large and nationally representative surveys conducted in Denmark (n=3,297) at the outbreak of the pandemic and one year later. Utilizing experimental techniques, validated individual difference measures of perceived disease threat and geographic data on COVID-19 severity, we do not find that disease threats like the COVID-19 pandemic upregulate preferences for attractive and healthy political or non-political leaders. Instead, respondents display heightened preferences for health in socially proximate relations (i.e. colleagues). Moreover, individuals who react aversively to situations involving risks of pathogen transmission (scoring high in Germ Aversion) report higher importance of a wide range of leadership traits, rather than for health and attractiveness in particular. Results are discussed in relation to evolutionary accounts of leadership and followership.
As research in psychology becomes more sophisticated and more oriented toward the development and testing of theory, it becomes more important to eliminate biases in data caused by measurement error. Both failure to correct for biases induced by measurement error and improper corrections can lead to erroneous conclusions that retard progress toward cumulative knowledge. Corrections for attenuation due to measurement error are common in the literature today and are becoming more common, yet errors are frequently made in this process. Technical psychometric presentations of abstract measurement theory principles have proved inadequate in improving the practices of working researchers. As an alternative, this article uses realistic research scenarios (cases) to illustrate and explain appropriate and inappropriate instances of correction for measurement error in commonly occurring research situations.
In this study we demonstrate an approach to replacing validated selection tests to which job applicants may have prior access. This approach, labeled construct equivalence, allows for replacing valid tests currently in use with new, experimental tests that have been shown to measure the same constructs. We demonstrated the construct equivalence approach by collecting data from over 2,000 applicants for four different positions in a large petrochemical company. We investigated the equivalence of the experimental and the current tests by using correlational analyses, structural modeling, and analyses of hiring decisions. Results indicated that the experimental and current tests measure the same constructs and that replacing the current tests with the experimental tests would treat ethnic and sex subgroups consistently. Construct equivalence was shown to be a viable approach to test substitution.
Despite some limitations, Cronbach's coefficient alpha remains the most widely used measure of scale reliability. The purpose of this article was to empirically document the magnitudes of alpha coefficients obtained in behavioral research, compare these obtained values with guidelines and recommendations set forth by individuals such as Nunnally (1967, 1978), and provide insights into research design characteristics that may influence the size of coefficient alpha. Average reported alpha coefficients ranged from .70 for values and beliefs to .82 for job satisfaction. With few exceptions, there were no substantive relationships between the magnitude of coefficient alpha and the research design characteristics investigated. Copyright 1994 by the University of Chicago.
This article examines the statistical correction for attenuation and the controversies surrounding the procedure. Although originally developed for test construction purposes, the correction for attenuation is also used in meta-analysis and assessments of validity generalization. Since Spearman's classic article in 1904, the correct use and interpretation of the correction for attenuation have been debated. The logic of the double and single correction formulae is discussed, as is the correction producing validity coefficients greater than 1.00. Three types of misapplication and misinterpretation of the correction in the published literature are presented. The article concludes with arguments pertaining to the use of the correction formula, and it attempts to sharpen the focus of issues that have led to differences of opinion about its meaning and purpose.
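The double and single correction formulae can be sketched in a few lines. This is a minimal illustration of Spearman's textbook formula, not code from the article; the function name `correct_for_attenuation` and the example values are hypothetical. Supplying only one reliability gives the single correction, in which only one variable is corrected for unreliability.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy=None):
    """Spearman's correction for attenuation (illustrative helper).

    Double correction: r_xy / sqrt(r_xx * r_yy).
    Single correction (r_yy is None): r_xy / sqrt(r_xx), so only the
    variable whose reliability is r_xx is corrected.
    """
    for rel in (r_xx, r_yy):
        if rel is not None and not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    product = r_xx if r_yy is None else r_xx * r_yy
    return r_xy / math.sqrt(product)


if __name__ == "__main__":
    print(round(correct_for_attenuation(0.30, 0.70, 0.60), 3))  # double correction
    print(round(correct_for_attenuation(0.30, 0.60), 3))        # single correction
    print(round(correct_for_attenuation(0.60, 0.50, 0.60), 3))  # corrected value exceeds 1.00
```

With an observed correlation of .60 and reliabilities of .50 and .60, the corrected value exceeds 1.00, one of the anomalies the article discusses. Any reliability estimate that is too low, such as an alpha that underestimates reliability, inflates the corrected correlation in the same way.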
Coefficient α, an estimate of the classical reliability coefficient, was evaluated under violations of two classical test theory assumptions: essential τ-equivalence and uncorrelated errors. The interactive effects of both violations were explored using computer-simulated true and error scores with known properties. As correlations among true scores decreased from 1, or essential τ-equivalence was systematically violated, α progressively underestimated the classical reliability coefficient. Simultaneously, as error score correlations increased from 0, the underestimation was attenuated and α became an inflated overestimate of the classical reliability coefficient. Although it is generally accepted that true score assumptions can be tested using confirmatory factor analysis (CFA), the research literature indicates that it is impossible, or at least extremely difficult, to empirically assess uncorrelated error with cross-sectional (as compared to longitudinal) data. Nevertheless, it is shown here that CFA error covariance estimates can be subtracted from α to substantially reduce, if not completely eliminate, the inflation bias that results from positive correlated error.
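The direction of both biases can be reproduced on a small scale with population covariance matrices instead of simulated samples. The sketch below assumes a one-factor model X_i = loading_i · T + E_i with unit true-score variance; the helper names and parameter values are illustrative, not the study's design. It compares α with the model-implied reliability (true-score variance over observed variance) for τ-equivalent items, congeneric items, and equally correlated errors.

```python
def coefficient_alpha(cov):
    """Cronbach's alpha from a k x k item covariance matrix (list of lists)."""
    k = len(cov)
    total_var = sum(sum(row) for row in cov)
    item_var = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - item_var / total_var)


def one_factor_cov(loadings, error_vars, error_cov=0.0):
    """Covariance matrix for X_i = loading_i * T + E_i with Var(T) = 1.

    error_cov is added to every off-diagonal cell (equally correlated errors).
    """
    k = len(loadings)
    return [[loadings[i] * loadings[j] + (error_vars[i] if i == j else error_cov)
             for j in range(k)] for i in range(k)]


def true_reliability(loadings, cov):
    """(sum of loadings)^2 over observed total variance; error covariance counts as error."""
    return sum(loadings) ** 2 / sum(sum(row) for row in cov)


if __name__ == "__main__":
    cases = {
        "tau-equivalent": ([0.7] * 4, [0.51] * 4, 0.0),
        "congeneric": ([0.9, 0.7, 0.5, 0.3], [0.19, 0.51, 0.75, 0.91], 0.0),
        "correlated errors": ([0.7] * 4, [0.51] * 4, 0.10),
    }
    for name, (lam, theta, ecov) in cases.items():
        cov = one_factor_cov(lam, theta, ecov)
        print(f"{name}: alpha={coefficient_alpha(cov):.3f} "
              f"true={true_reliability(lam, cov):.3f}")
```

In the τ-equivalent case the two values coincide; unequal loadings pull α below the model reliability, while the positive error covariance pushes it above, matching the direction of the biases reported in the abstract.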
Various methods for determining unidimensionality are reviewed and the rationale of these methods is assessed. Indices based on answer patterns, reliability, components and factor analysis, and latent traits are reviewed. It is shown that many of the indices lack a rationale, and that many are adjustments of a previous index to take into account some criticisms of it. After reviewing many indices, it is suggested that those based on the size of residuals after fitting a two- or three-parameter latent trait model may be the most useful to detect unidimensionality. An attempt is made to clarify the term unidimensional, and it is shown how it differs from other terms often used interchangeably such as reliability, internal consistency, and homogeneity. Reliability is defined as the ratio of true score variance to observed score variance. Internal consistency denotes a group of methods that are intended to estimate reliability, are based on the variances and the covariances of test items, and depend on only one administration of a test. Homogeneity seems to refer more specifically to the similarity of the item correlations, but the term is often used as a synonym for unidimensionality. The usefulness of the terms internal consistency and homogeneity is questioned. Unidimensionality is defined as the existence of one latent trait underlying the data.
An efficient algorithm (MSPLIT) for maximizing split-half reliability coefficients is described. Coefficients derived by the algorithm were found to be generally larger than odd-even split-half coefficients and KR-20 coefficients and were nearly as large as the largest of the coefficients from among every possible split-half arrangement.
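The internals of MSPLIT are not described here, but for a small number of items the quantity it approximates can be computed by brute force: the largest split-half coefficient, Guttman's λ4 = 4·Cov(A, B)/Var(A + B), over every way of dividing the items into two parts. The sketch below is that exhaustive search, not the MSPLIT algorithm, and its cost grows exponentially with the number of items; the function names and the example matrix are hypothetical.

```python
from itertools import combinations


def split_half_lambda4(cov, part_a):
    """Guttman's lambda4 for one split: 4 * Cov(A, B) / Var(A + B)."""
    k = len(cov)
    a = set(part_a)
    cov_ab = sum(cov[i][j] for i in a for j in range(k) if j not in a)
    total_var = sum(sum(row) for row in cov)
    return 4 * cov_ab / total_var


def max_split_half(cov):
    """Exhaustively search all two-part splits (feasible only for small k)."""
    k = len(cov)
    best_value, best_part = float("-inf"), None
    # A part and its complement describe the same split, so sizes up to k // 2 suffice.
    for size in range(1, k // 2 + 1):
        for part in combinations(range(k), size):
            value = split_half_lambda4(cov, part)
            if value > best_value:
                best_value, best_part = value, part
    return best_value, best_part


if __name__ == "__main__":
    # Hypothetical matrix: items 0-1 and 2-3 form two clusters.
    cov = [[1.0, 0.6, 0.2, 0.2],
           [0.6, 1.0, 0.2, 0.2],
           [0.2, 0.2, 1.0, 0.6],
           [0.2, 0.2, 0.6, 1.0]]
    print("first vs second half:", round(split_half_lambda4(cov, (0, 1)), 3))
    print("maximum over all splits:", max_split_half(cov))
```

Here splitting the test into its first and second halves gives 0.4, while the best split, which places one item from each cluster in each half, gives 0.8, so the choice of split matters a great deal when items are heterogeneous.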
Confusion in the literature between the concepts of internal consistency and homogeneity has led to a misuse of coefficient alpha as an index of item homogeneity. Coefficient alpha is actually a complexly determined test statistic, item homogeneity only being one influence on its magnitude. The related statistic, the average intercorrelation, has similar difficulties. Several indices of item homogeneity derived from the model of common factor analysis are offered as alternatives.
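A small numerical illustration of why alpha and the average intercorrelation are poor homogeneity indices uses the standardized form of alpha, k·r̄ / (1 + (k − 1)·r̄), which depends only on the number of items k and the average inter-item correlation r̄. The matrices and helper names are hypothetical. A homogeneous matrix and a two-cluster matrix with the same r̄ receive the same standardized alpha, and lengthening a test raises alpha without any change in r̄.

```python
def mean_interitem_r(corr):
    """Average of the off-diagonal entries of a correlation matrix."""
    k = len(corr)
    off_diag = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    return sum(off_diag) / len(off_diag)


def standardized_alpha(k, r_bar):
    """Standardized coefficient alpha for k items with mean inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)


if __name__ == "__main__":
    third = 1 / 3
    homogeneous = [[1.0 if i == j else third for j in range(4)] for i in range(4)]
    two_clusters = [[1.0, 0.6, 0.2, 0.2],
                    [0.6, 1.0, 0.2, 0.2],
                    [0.2, 0.2, 1.0, 0.6],
                    [0.2, 0.2, 0.6, 1.0]]
    for name, m in (("homogeneous", homogeneous), ("two clusters", two_clusters)):
        r_bar = mean_interitem_r(m)
        print(name, round(r_bar, 3), round(standardized_alpha(4, r_bar), 3))
    # Same r_bar, more items: alpha rises although homogeneity is unchanged.
    print(round(standardized_alpha(5, 0.3), 3), round(standardized_alpha(20, 0.3), 3))
```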
The properties of various internal-consistency formulas have been examined with hypothetic stratified-parallel tests constructed by sampling items from universes with specified characteristics. "When a test is constructed by stratifying on content and difficulty, one may properly estimate its coefficient of generalizability by αCD or αC… . Stratifying on content is clearly more important than stratification on difficulty, both in test construction and test analysis."
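Stratifying items by content can be sketched with the usual stratified-alpha formula, α_strat = 1 − Σ_s σ_s²(1 − α_s) / σ_X², where σ_s² and α_s are the variance and alpha of the subtotal for stratum s and σ_X² is the total-score variance. This is a minimal sketch with hypothetical helper names and an illustrative two-cluster covariance matrix; it is not the paper's αCD or αC computation, and each stratum needs at least two items.

```python
def alpha(cov):
    """Cronbach's alpha from an item covariance matrix (needs at least 2 items)."""
    k = len(cov)
    total_var = sum(map(sum, cov))
    return k / (k - 1) * (1 - sum(cov[i][i] for i in range(k)) / total_var)


def submatrix(cov, idx):
    """Rows and columns of cov restricted to the item indices in idx."""
    return [[cov[i][j] for j in idx] for i in idx]


def stratified_alpha(cov, strata):
    """1 - sum over strata of Var(subtotal) * (1 - alpha(stratum)), divided by Var(total)."""
    total_var = sum(map(sum, cov))
    error_part = 0.0
    for idx in strata:
        sub = submatrix(cov, idx)
        error_part += sum(map(sum, sub)) * (1 - alpha(sub))
    return 1 - error_part / total_var


if __name__ == "__main__":
    cov = [[1.0, 0.6, 0.2, 0.2],
           [0.6, 1.0, 0.2, 0.2],
           [0.2, 0.2, 1.0, 0.6],
           [0.2, 0.2, 0.6, 1.0]]
    print(round(alpha(cov), 3), round(stratified_alpha(cov, [[0, 1], [2, 3]]), 3))
```

For this matrix ordinary alpha is about .67 and stratified alpha is .80. If these covariances come from two correlated factors, with each item loading √0.6 on its own content factor, the composite reliability is also .80, so here stratification recovers what plain alpha misses.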
The reliability coefficient of any one form of a test is defined "as the variance ratio of true scores to total scores." Discussion of the estimation of test reliability "… is limited to the classical linear model, assuming linear regression of true scores on raw scores, and omitting all discussion of variation in reliability with score level." The following topics are rigorously considered: the reliability coefficient, the Spearman-Brown Formula, complex true scores and errors, coefficient alpha (K-R Formula 20), construction of equivalent forms, and varieties of unreliability. "The coefficient of internal consistency of a test yields a gross over-estimate of its practical reliability. Although internal consistency is of major interest to the test constructor, the examinee's 'mobilization' level and the probability of his choice of the correct answer contribute to the stability of performance." The implications of these considerations for test development are discussed briefly.
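The Spearman-Brown formula among the topics listed can be written as ρ_n = nρ / (1 + (n − 1)ρ), where ρ is the reliability of the current form and n is the factor by which its length changes, assuming the added or removed parts are parallel to the existing ones. The sketch below, with hypothetical function names, also solves the formula for n to find the length needed to reach a target reliability.

```python
def spearman_brown(rho, n):
    """Reliability of a test whose length is multiplied by n (parallel parts assumed)."""
    return n * rho / (1 + (n - 1) * rho)


def length_factor_needed(rho_current, rho_target):
    """Solve the Spearman-Brown formula for n, given current and target reliability."""
    return rho_target * (1 - rho_current) / (rho_current * (1 - rho_target))


if __name__ == "__main__":
    # Step up a half-test correlation of .60 to full length.
    print(round(spearman_brown(0.60, 2), 3))
    # How many times longer must a .70 test be to reach .90?
    print(round(length_factor_needed(0.70, 0.90), 2))
```

Stepping a half-test correlation of .60 up to full length gives .75, and a test with reliability .70 would need to be about 3.86 times as long to reach .90.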
This paper presents a contribution to the sampling theory of a set of homogeneous tests which differ only in length, test length being regarded as an essential test parameter. Observed variance-covariance matrices of such measurements are taken to follow a Wishart distribution. The familiar true score-and-error concept of classical test theory is employed. Upon formulation of the basic model it is shown that in a combination of such tests forming a “total” test, the signal-to-noise ratio of the components is additive and that the inverse of the population variance-covariance matrix of the component measures has all of its off-diagonal elements equal, regardless of distributional assumptions. This fact facilitates the subsequent derivation of a statistical sampling theory, there being at most m + 1 free parameters when m is the number of component tests. In developing the theory, the cases of known and unknown test lengths are treated separately. For both cases maximum-likelihood estimators of the relevant parameters are derived. It is argued that the resulting formulas will remain reasonable even if the distributional assumptions are too narrow. Under these assumptions, however, maximum-likelihood ratio tests of the validity of the model and of hypotheses concerning reliability and standard error of measurement of the total test are given. It is shown in each case that the maximum-likelihood equations possess precisely one acceptable solution under rather natural conditions. Application of the methods can be effected without the use of a computer. Two numerical examples are appended by way of illustration.
The separate questions on an essay test or the individual judges on a rater panel may constitute congeneric parts rather than tau-equivalent parts. Also, it may be necessary to infer the lengths of the congeneric parts from their variances and covariances, rather than from some obvious feature of each part, such as the range of possible scores. Cronbach's alpha coefficient applied to such part-tests data will underestimate total score reliability. Several reliability coefficients are developed for such instruments. They may be regarded as extensions of the coefficient developed by Kristof for a three-part test.
Two well-known lower bounds to the reliability in classical test theory, Guttman's λ2 and Cronbach's coefficient alpha, are shown to be terms of an infinite series of lower bounds. All terms of this series are equal to the reliability if and only if the test is composed of items which are essentially tau-equivalent. Some practical examples, comparing the first 7 terms of the series, are offered. It appears that the second term (λ2) is generally worthwhile computing as an improvement of the first term (alpha), whereas going beyond the second term is not worth the computational effort. Possibly an exception should be made for very short tests having widely spread absolute values of covariances between items. The relationship of the series and previous work on lower bound estimates for the reliability is briefly discussed.
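The first terms being compared can be computed directly from an item covariance matrix. The sketch below uses Guttman's standard definitions: λ1 = 1 − Σσ_i²/σ_X², λ3 = k/(k − 1)·λ1 (identical to coefficient alpha), and λ2 = λ1 + √(k/(k − 1)·C₂)/σ_X², where C₂ is the sum of the squared off-diagonal covariances. The helper names and example matrices are illustrative only, and the code does not compute the higher terms of the series.

```python
import math


def guttman_lambda1(cov):
    """1 - (sum of item variances) / (total-score variance)."""
    total = sum(map(sum, cov))
    return 1 - sum(cov[i][i] for i in range(len(cov))) / total


def guttman_lambda3(cov):
    """k / (k - 1) * lambda1, which equals coefficient alpha."""
    k = len(cov)
    return k / (k - 1) * guttman_lambda1(cov)


def guttman_lambda2(cov):
    """lambda1 + sqrt(k / (k - 1) * C2) / total variance, C2 = sum of squared off-diagonal covariances."""
    k = len(cov)
    total = sum(map(sum, cov))
    c2 = sum(cov[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    return guttman_lambda1(cov) + math.sqrt(k / (k - 1) * c2) / total


if __name__ == "__main__":
    equal = [[1.0 if i == j else 0.4 for j in range(5)] for i in range(5)]
    clusters = [[1.0, 0.6, 0.2, 0.2],
                [0.6, 1.0, 0.2, 0.2],
                [0.2, 0.2, 1.0, 0.6],
                [0.2, 0.2, 0.6, 1.0]]
    for name, cov in (("equal covariances", equal), ("two clusters", clusters)):
        print(name, round(guttman_lambda3(cov), 3), round(guttman_lambda2(cov), 3))
```

With all covariances equal, λ2 and alpha coincide. For the two-cluster matrix, λ2 (about .69) improves only slightly on alpha (about .67).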
It is well known that coefficient alpha can be used to estimate the reliability of a test even when the test is split into several parts. It is also known that alpha can severely underestimate test reliability when the parts have unequal numbers of items. A generalization of alpha, β_k, is proposed to correct this defect. Several properties of β_k are also presented.
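A minimal sketch of the generalization, assuming β_k takes the form (σ_X² − Σσ_i²) / [σ_X²(1 − Σ(n_i/n)²)], where n_i is the number of items in part i and n = Σn_i. With k equal parts, Σ(n_i/n)² = 1/k and the expression reduces to alpha. The data come from an assumed length model (each item contributes unit true variance and an error variance of 4), chosen only so that the true reliability is known; the item counts are hypothetical.

```python
def length_model_cov(lengths, var_true, var_err):
    """ASSUMED model: a part with a_i items has true score a_i*T and error variance a_i*var_err."""
    k = len(lengths)
    return [[lengths[i] * lengths[j] * var_true + (lengths[i] * var_err if i == j else 0.0)
             for j in range(k)] for i in range(k)]


def alpha(cov):
    k = len(cov)
    var_x = sum(map(sum, cov))
    return k / (k - 1) * (1 - sum(cov[i][i] for i in range(k)) / var_x)


def beta_k(cov, lengths):
    """Generalized alpha that weights each part by its share of items, n_i / n."""
    n = sum(lengths)
    var_x = sum(map(sum, cov))
    sum_part_var = sum(cov[i][i] for i in range(len(cov)))
    sum_sq_share = sum((n_i / n) ** 2 for n_i in lengths)
    return (var_x - sum_part_var) / (var_x * (1 - sum_sq_share))


lengths = [2, 5, 13]                      # hypothetical item counts per part
cov = length_model_cov(lengths, var_true=1.0, var_err=4.0)
var_x = sum(map(sum, cov))
true_rel = 1.0 * sum(lengths) ** 2 / var_x   # n**2 * var_true / var_x
print(round(alpha(cov), 4), round(beta_k(cov, lengths), 4), round(true_rel, 4))
```

Under this model the part lengths are known in advance; when they are not, β_k needs some other source for the weights n_i/n.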
In some situations where reliability must be estimated, it is impossible to divide the measuring instrument into more than two separately scoreable parts. In such cases the parts may be homogeneous in content but clearly unequal in length. The resulting scores will not be essentially τ-equivalent, and hence total test reliability cannot be satisfactorily estimated via Cronbach's coefficient alpha. The limitation on the number of parts also rules out Kristof's three-part approach. A technique is developed for estimating reliability in such situations. The approach is shown to function very well when applied to five achievement tests.
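For two congeneric parts of unequal length, one well-known coefficient is the Angoff-Feldt form, ρ = 4σ_12 / [σ_X² − (σ_1² − σ_2²)²/σ_X²]. The sketch below uses it as an illustration; we do not claim it transcribes the paper's derivation. Under an assumed length model with hypothetical parts of 3 and 7 items, it recovers the true reliability, whereas the split-half alpha (Flanagan-Rulon) value 4σ_12/σ_X², which treats the halves as essentially τ-equivalent, underestimates it.

```python
def flanagan_rulon(v1, v2, c12):
    """Split-half alpha for two parts; exact only for essentially tau-equivalent halves."""
    return 4 * c12 / (v1 + v2 + 2 * c12)


def angoff_feldt(v1, v2, c12):
    """Two-part coefficient for congeneric halves of unequal length (Angoff-Feldt form)."""
    var_x = v1 + v2 + 2 * c12
    return 4 * c12 / (var_x - (v1 - v2) ** 2 / var_x)


# ASSUMED length model: parts of 3 and 7 items, per-item true variance 1, error variance 4
a, b, var_true, var_err = 3, 7, 1.0, 4.0
v1 = a * a * var_true + a * var_err       # 21.0
v2 = b * b * var_true + b * var_err       # 77.0
c12 = a * b * var_true                    # 21.0
true_rel = (a + b) ** 2 * var_true / (v1 + v2 + 2 * c12)
print(round(flanagan_rulon(v1, v2, c12), 4),
      round(angoff_feldt(v1, v2, c12), 4),
      round(true_rel, 4))
```

When the two halves have equal variances the correction term vanishes and both coefficients coincide, so the adjustment only matters for parts of unequal length.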
This paper gives a method of estimating the reliability of a test which has been divided into three parts. The parts do not have to satisfy any statistical criteria such as parallelism or τ-equivalence. If the parts are homogeneous in content (congeneric), i.e., if their true scores are linearly related, and if the sample size is large, then the method described in this paper will give the precise value of the reliability parameter. If the homogeneity condition is violated, underestimation will typically result. However, the estimate will always be at least as accurate as coefficient α and Guttman's lower bound λ3 when the same data are used. An application to real data is presented by way of illustration. Seven different splits of the same test are analyzed. The new method yields remarkably stable reliability estimates across splits, as predicted by the theory. One deviating value can be accounted for by an unsuspected peculiarity of the test composition. Neither coefficient α nor λ3 would have led to the same discovery.
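Under the congeneric assumption σ_ij = l_i·l_j, the total true-score variance (l_1 + l_2 + l_3)² can be written in terms of the three covariances alone, because l_1 = σ_12σ_13/√(σ_12σ_13σ_23), and similarly for l_2 and l_3. The sketch below derives the reliability that way; we believe it corresponds to the three-part coefficient described above, but treat it as an illustration rather than a transcription. It assumes all three covariances are positive, and the loadings and error variances are arbitrary.

```python
def kristof_three_part(cov):
    """Three-part reliability for congeneric parts: with sigma_ij = l_i*l_j,
    (l_1 + l_2 + l_3)**2 = (s12*s13 + s12*s23 + s13*s23)**2 / (s12*s13*s23)."""
    s12, s13, s23 = cov[0][1], cov[0][2], cov[1][2]
    var_x = sum(map(sum, cov))
    return (s12 * s13 + s12 * s23 + s13 * s23) ** 2 / (s12 * s13 * s23 * var_x)


def alpha(cov):
    """Coefficient alpha (Guttman's lambda_3) from a covariance matrix."""
    k = len(cov)
    var_x = sum(map(sum, cov))
    return k / (k - 1) * (1 - sum(cov[i][i] for i in range(k)) / var_x)


loadings, err_vars = [1.0, 2.0, 3.0], [2.0, 3.0, 1.0]   # arbitrary congeneric parts
cov = [[loadings[i] * loadings[j] + (err_vars[i] if i == j else 0.0)
        for j in range(3)] for i in range(3)]
true_rel = sum(loadings) ** 2 / sum(map(sum, cov))
print(round(alpha(cov), 4), round(kristof_three_part(cov), 4), round(true_rel, 4))
```

Only the three off-diagonal covariances and the total variance enter the formula, which is why no parallelism or τ-equivalence is required of the parts.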
Following a general approach due to Guttman, coefficient α is rederived as a lower bound on the reliability of a test. A necessary and sufficient condition under which equality is attained in this inequality, and hence under which α equals the reliability of the test, is derived and shown to be closely related to Novick's recent redefinition of the concept of parallel measurements. This condition is also shown to be closely related to the unit-rank assumption originally adopted by Kuder and Richardson in the derivation of their formula 20. The assumptions later adopted by Jackson and Ferguson and by Gulliksen are shown to be related to the necessary and sufficient condition derived here. It is then pointed out that the statement that “coefficient α is equal to the mean of the split-half reliabilities” is true only under the restricted condition assumed by Cronbach in the body of his derivation of this result. Finally, some limitations on the use of any function of α as a measure of internal consistency are noted.
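The equality condition can be seen on population covariance matrices. In the sketch below (a one-factor model assumed for illustration, with arbitrary values), parts with equal loadings but unequal error variances, which are essentially τ-equivalent, give alpha exactly equal to the reliability; redistributing the same total loading unequally across parts keeps the reliability fixed but pushes alpha below it. Means play no role, since additive constants in the true scores do not affect covariances.

```python
def one_factor_cov(loadings, err_vars):
    """Population covariance under an assumed one-factor model."""
    k = len(loadings)
    return [[loadings[i] * loadings[j] + (err_vars[i] if i == j else 0.0)
             for j in range(k)] for i in range(k)]


def alpha(cov):
    k = len(cov)
    var_x = sum(map(sum, cov))
    return k / (k - 1) * (1 - sum(cov[i][i] for i in range(k)) / var_x)


def reliability(loadings, err_vars):
    """True-score variance of the total score over its observed variance."""
    true_var = sum(loadings) ** 2
    return true_var / (true_var + sum(err_vars))


err_vars = [0.5, 1.0, 2.0, 3.0]                    # unequal error variances are allowed
for loadings in ([1.5, 1.5, 1.5, 1.5],             # essentially tau-equivalent
                 [0.5, 1.0, 2.0, 2.5]):            # congeneric, same total loading
    cov = one_factor_cov(loadings, err_vars)
    print(loadings, round(alpha(cov), 4), round(reliability(loadings, err_vars), 4))
```

Both loading patterns sum to 6, so the reliability is identical in the two cases; only the equal-loading pattern attains the bound.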