Sample Size Calculation and Optimal Design for Regression-Based Norming of Tests and Questionnaires

August 2021
Psychological Methods 28(1):89–106

DOI:10.1037/met0000394

Authors:

Francesco Innocenti

Maastricht University

Math Candel

Maastricht University

Gerard van Breukelen

Maastricht University

To prevent mistakes in psychological assessment, the precision of test norms is important. This can be achieved by drawing a large normative sample and using regression-based norming. Based on that norming method, a procedure for sample size planning to make inference on Z-scores and percentile rank scores is proposed. Sampling variance formulas for these norm statistics are derived and used to obtain the optimal design, that is, the optimal predictor distribution, for the normative sample, thereby maximizing precision of estimation. This is done under five regression models with a quantitative and a categorical predictor, differing in whether they allow for interaction and nonlinearity. Efficient robust designs are given in case of uncertainty about the regression model. Furthermore, formulas are provided to compute the normative sample size such that individuals' positions relative to the derived norms can be assessed with prespecified power and precision. (PsycInfo Database Record (c) 2021 APA, all rights reserved). Supplemental materials: doi.org/10.1037/met0000394.supp and github.com/FrancescoInnocenti-Stat/SampSize-OD-Norming

Sample Size Calculation and Optimal Design for Regression-Based Norming of Tests and Questionnaires

Francesco Innocenti, Frans E. S. Tan, Math J. J. M. Candel, and Gerard J. P. van Breukelen

Department of Methodology and Statistics, Care and Public Health Research Institute (CAPHRI), Maastricht University

Department of Methodology and Statistics, Graduate School of Psychology and Neuroscience, Maastricht University

Abstract

To prevent mistakes in psychological assessment, the precision of test norms is important. This can be achieved by drawing a large normative sample and using regression-based norming. Based on that norming method, a procedure for sample size planning to make inference on Z-scores and percentile rank scores is proposed. Sampling variance formulas for these norm statistics are derived and used to obtain the optimal design, that is, the optimal predictor distribution, for the normative sample, thereby maximizing precision of estimation. This is done under five regression models with a quantitative and a categorical predictor, differing in whether they allow for interaction and nonlinearity. Efficient robust designs are given in case of uncertainty about the regression model. Furthermore, formulas are provided to compute the normative sample size such that individuals’ positions relative to the derived norms can be assessed with prespecified power and precision.
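
The two norm statistics named in the abstract can be illustrated with a minimal Python sketch (this is not the authors' code, which is written in R). It fits a simple linear regression of a test score on age and converts one individual's score into a Z-score and a percentile rank, assuming normally distributed, homoscedastic residuals. The simulated sample, its coefficients, the seed, and the function names `fit_simple_ols`, `z_score`, and `percentile_rank` are all hypothetical.

```python
import math
import random


def fit_simple_ols(x, y):
    """Closed-form OLS for y = intercept + slope * x; returns (intercept, slope, residual SD)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_sd = math.sqrt(rss / (n - 2))  # two estimated regression coefficients
    return intercept, slope, resid_sd


def z_score(score, age, intercept, slope, resid_sd):
    """Standardized distance between an observed score and its age-predicted value."""
    return (score - (intercept + slope * age)) / resid_sd


def percentile_rank(z):
    """Percentage of the reference population expected to score at or below z (normal residuals)."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical normative sample: scores decline linearly with age, residual SD = 4.
rng = random.Random(2021)
ages = [rng.uniform(20.0, 80.0) for _ in range(500)]
scores = [30.0 - 0.15 * a + rng.gauss(0.0, 4.0) for a in ages]

b0, b1, resid_sd = fit_simple_ols(ages, scores)
z = z_score(15.0, 75.0, b0, b1, resid_sd)  # invented individual: age 75, score 15
pr = percentile_rank(z)
print(f"intercept={b0:.2f} slope={b1:.3f} residual SD={resid_sd:.2f}")
print(f"Z-score={z:.2f} percentile rank={pr:.1f}")
```

Both statistics inherit sampling error from the estimated coefficients and residual standard deviation, which is why the precision of the norms depends on the size and design of the normative sample.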

Translational Abstract

Normative studies are needed to derive reference values (or norms) for tests and questionnaires, so that psychologists can use them to assess individuals. Specifically, norms allow psychologists to interpret individuals’ score on a test by comparing it with the scores of their peers (e.g., individuals with the same sex, age, and educational level) in the reference population. Because norms are also used to make decisions on individuals, such as the assignment to clinical treatment or remedial teaching, it is important that norms are precise (i.e., not strongly affected by sampling error in the sample on which the norms are based). This article shows how this goal can be attained in three steps. First, norms are derived using the regression-based approach, which is more efficient than the traditional approach of splitting the sample into subgroups based on demographic factors and deriving norms per subgroup. Specifically, the regression-based approach allows researchers to identify the predictors (e.g., demographic factors) that affect the test score of interest, and to use the whole sample to derive norms. Second, the design of the normative study (e.g., which age groups to include) is chosen such that the precision of the norms is maximized for a given total sample size for norming. Third, this total sample size is computed such that a prespecified power and precision are obtained.
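
The second step, choosing which age groups to include, can be made concrete with a small Python sketch. It uses a textbook result for a simple linear model with one quantitative predictor and does not reproduce the article's five models or its derived designs. The sketch compares two hypothetical designs of equal size on the worst-case scaled variance of the predicted mean over the age range; that criterion, the age range, the sample size, and all names in the code are assumptions made for illustration.

```python
def leverage(x0, design):
    """Variance factor of the predicted mean at x0 in simple linear regression:
    1/n + (x0 - mean)^2 / Sxx, in units of the residual variance."""
    n = len(design)
    mean_x = sum(design) / n
    sxx = sum((x - mean_x) ** 2 for x in design)
    return 1.0 / n + (x0 - mean_x) ** 2 / sxx


def worst_case_scaled_variance(design, lo, hi, grid=101):
    """n times the largest leverage over an evenly spaced grid on [lo, hi]."""
    n = len(design)
    points = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    return n * max(leverage(x0, design) for x0 in points)


lo, hi, n = 20.0, 80.0, 140  # hypothetical age range and total sample size

# Design 1: half of the sample at each end of the age range.
endpoint_design = [lo] * (n // 2) + [hi] * (n // 2)

# Design 2: the sample spread equally over seven age levels.
levels = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
spread_design = [lvl for lvl in levels for _ in range(n // len(levels))]

endpoint_worst = worst_case_scaled_variance(endpoint_design, lo, hi)
spread_worst = worst_case_scaled_variance(spread_design, lo, hi)
print(f"endpoint design: {endpoint_worst:.2f}")
print(f"spread design:   {spread_worst:.2f}")
```

With the same total sample size, the endpoint design gives the smaller worst-case variance under this linear model, but it cannot detect curvature in the age effect. That trade-off is one reason robust designs matter when the regression model is uncertain.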

Keywords: normative data, optimal design, percentile rank score, sample size calculation, Z-score

Supplemental materials: https://doi.org/10.1037/met0000394.supp

Normative studies provide reference values, also known as norms, that psychologists can use to compare individuals with the reference population, for instance, to make decisions about clinical treatments, school admission or remedial teaching, or selection of candidates for job vacancies. Examples of normative studies are Goretti et al. (2014) and Parmenter et al. (2010), who have derived reference values for two batteries of neuropsychological tests to assess cognitive function in patients with multiple sclerosis, and Van der Elst et al. (2006), who have normed the Dutch version of three verbal fluency tests. Normative studies are of practical importance because they allow psychologists to interpret scores on the outcome variable of interest by comparing an individual’s test score with the scores of his or her peers (e.g., individuals of the same age, sex, and educational level) in the reference population. For instance, knowing that a highly educated 75-year-old woman scored 11.5 on the profession naming verbal fluency test is in itself not informative on whether this score is within the normal range or exceptional. According to the normative data provided by Van der Elst et al. (2006, Table A.2), only 10% of her peers (i.e., women of the same age and educational level) have a test score equal

Francesco Innocenti https://orcid.org/0000-0001-6113-8992

Math J. J. M. Candel https://orcid.org/0000-0002-2229-1131

Gerard J. P. van Breukelen https://orcid.org/0000-0003-0949-0272

We have no conflict of interest to disclose. A summary of this study was presented at the 41st Annual Conference of the International Society for Clinical Biostatistics.

Correspondence concerning this article should be addressed to Francesco Innocenti, Department of Methodology and Statistics, Care and Public Health Research Institute (CAPHRI), Maastricht University, P.O. Box 616, 6200 MD, Maastricht, the Netherlands. Email: francesco.innocenti@maastrichtuniversity.nl

Psychological Methods

ISSN: 1082-989X https://doi.org/10.1037/met0000394

2023, Vol. 28, No. 1, 89–106

This article was published Online First August 12, 2021.

R code to determine the required sample size for the optimal design of a normative study assuming normality and homoscedasticity of the residuals.

Code

December 2023

Francesco Innocenti · Frans E S Tan · Math Candel · Gerard van Breukelen

Sample Size Calculation and Optimal Design for Multivariate Regression-Based Norming

Article

Nov 2023
J EDUC BEHAV STAT

Normative studies are needed to obtain norms for comparing individuals with the reference population on relevant clinical or educational measures. Norms can be obtained in an efficient way by regressing the test score on relevant predictors, such as age and sex. When several measures are normed with the same sample, a multivariate regression-based approach must be adopted for at least two reasons: (1) to take into account the correlations between the measures of the same subject, in order to test certain scientific hypotheses and to reduce misclassification of subjects in clinical practice, and (2) to reduce the number of significance tests involved in selecting predictors for the purpose of norming, thus preventing the inflation of the type I error rate. A new multivariate regression-based approach is proposed that combines all measures for an individual through the Mahalanobis distance, thus providing an indicator of the individual’s overall performance. Furthermore, optimal designs for the normative study are derived under five multivariate polynomial regression models, assuming multivariate normality and homoscedasticity of the residuals, and efficient robust designs are presented in case of uncertainty about the correct model for the analysis of the normative sample. Sample size calculation formulas are provided for the new Mahalanobis distance-based approach. The results are illustrated with data from the Maastricht Aging Study (MAAS).

Normative Data Estimation in Neuropsychological Tests: A Systematic Review

Article

Nov 2023
ARCH CLIN NEUROPSYCH

Objective: To quantify the evolution, impact, and importance of normative data (ND) calculation by identifying trends in the research literature and what approaches need improvement. Methods: A PRISMA-guideline systematic review was performed on literature from 2000 to 2022 in PubMed, Pub-Psych, and Web of Science. Inclusion criteria included scientific articles about ND in neuropsychological tests with clear data analysis, published in any country, and written in English or Spanish. Cross-sectional and longitudinal studies were included. Bibliometric analysis was used to examine the growth, productivity, journal dispersion, and impact of the topic. VOSViewer compared keyword co-occurrence networks between 1952–1999 and 2000–2022. Results: Four hundred twelve articles met inclusion and exclusion criteria. The most studied predictors were age, education, and sex. There were a greater number of studies/projects focusing on adults than children. The Verbal Fluency Test (12.7%) was the most studied test, and the most frequently used variable selection strategy was linear regression (49.5%). Regression-based approaches were widely used, whereas the traditional approach was still used. ND were presented mostly in percentiles (44.2%). Bibliometrics showed exponential growth in publications. Three journals (2.41%) were in the Core Zone. VOSViewer results showed small nodes, long distances, and four ND-related topics from 1952 to 1999, and there were larger nodes with short connections from 2000 to 2022, indicating topic spread. Conclusions: Future studies should be conducted on children’s ND, and alternative statistical methods should be used over the widely used regression approaches to address limitations and support growth of the field.

Continuous Norming Approaches: A Systematic Review and Real Data Example

Preprint

Aug 2023

Norming of psychological tests and scales is decisive for the interpretation of test scores. However, conventional norming methods based on subgroups result either in biases or require very large samples to gather precise norms. Continuous norming methods, namely inferential, semi-parametric, and parametric norming, propose to solve those issues. This paper provides a systematic review of international research on continuous norming and summarizes and describes currently applied continuous norming practices. The review includes 121 publications with overall 189 studies. Most of these studies used inferential norming to compute continuous norms for a specific test and emerged in recent years. Summarizing the literature, we identified open questions such as when to prefer which continuous norming method over another. To address these open questions, we conducted a real data example. We used the Need for Cognition-KIDS scale, a personality questionnaire for elementary school children. Comparing the precision of conventional, semi-parametric, and parametric norms revealed a clear hierarchy in favor of parametric norms. Moreover, bias comparison of conventional and parametric norms showed less bias in parametric norms. Estimating the discrepancies between continuous and conventional norm scores revealed tremendous differences for some individuals.

Phonological and semantic verbal fluency test: Scoring criteria and normative data for clustering and switching strategies for Colombian children and adolescents

Article

May 2023
INT J LANG COMM DIS

Background: Verbal fluency tests (VFT) are highly sensitive to cognitive deficits. Usually, the score on VFT is based on the number of correct words produced, yet it alone gives little information regarding underlying test performance. The implementation of different strategies (cluster and switching) to perform efficiently during the tasks provides more valuable information. However, normative data for clustering and switching strategies are scarce. Moreover, scoring criteria adapted to Colombian Spanish are missing. Aims: (1) To describe the Colombian adaptation of the scoring system guidelines for clustering and switching strategies in VFT; (2) to determine its reliability; and (3) to provide normative data for Colombian children and adolescents aged 6-17 years. Methods & procedures: A total of 691 children and adolescents from Colombia completed phonological (/f/, /a/, /s/, /m/, /r/ and /p/) and semantic (animals and fruits) VFT, and five scores were calculated: total score (TS), number of clusters (NC), cluster size (CS), mean cluster size (MCS) and number of switches (NS). The intraclass correlation coefficient was used for interrater reliability. Hierarchical multiple regressions were conducted to investigate which strategies were associated with VFT TS. Multiple regressions were conducted for each strategy, including as predictors age, age², sex, mean parents' education (MPE), MPE² and type of school, to generate normative data. Outcomes & results: Reliability indexes were excellent. Age was associated with VFT TS, but weakly compared with strategies. For both VFT TS, NS was the strongest variable, followed by CS and NC. Regarding norms, age was the strongest predictor for all measures, while age² was relevant for NC (/f/ phoneme) and NS (/m/ phoneme). Participants with higher MPE obtained more NC, and NS, and larger CS in several phonemes and categories. Children and adolescents from private school generated more NC, NS and larger CS in /s/ phoneme. Conclusions & implications: This study provides new scoring guidelines and normative data for clustering and switching strategies for Colombian children and adolescents between 6 and 17 years old. Clinical neuropsychologists should include these measures as part of their everyday practice. What this paper adds: What is already known on the subject: VFT are widely used within the paediatric population due to their sensitivity to brain injury. Their score is based on the number of correct words produced; however, TS alone gives little information regarding underlying test performance. Several normative data for VFT TS in the paediatric population exist, but normative data for clustering and switching strategies are scarce. What this paper adds to existing knowledge: The present study is the first to describe the Colombian adaptation of the scoring guidelines for clustering and switching strategies, and provided normative data for these strategies for children and adolescents between 6 and 17 years old. What are the potential or actual clinical implications of this work? Knowing VFT's performance, including strategy development and use in healthy children and adolescents, may be useful for clinical settings. We encourage clinicians to include not only TS, but also a careful analysis of strategies that may be more informative of the underlying cognitive processes failure than TS.

Improving confidence intervals for normed test scores: Include uncertainty due to sampling variability

Article

Nov 2018
BEHAV RES METHODS

Test publishers usually provide confidence intervals (CIs) for normed test scores that reflect the uncertainty due to the unreliability of the tests. The uncertainty due to sampling variability in the norming phase is ignored. To express uncertainty due to norming, we propose a flexible method that is applicable in continuous norming and allows for a variety of score distributions, using Generalized Additive Models for Location, Scale, and Shape (GAMLSS; Rigby & Stasinopoulos, 2005). We assessed the performance of this method in a simulation study, by examining the quality of the resulting CIs. We varied the population model, procedure of estimating the CI, confidence level, sample size, value of the predictor, extremity of the test score, and type of variance-covariance matrix. The results showed that good quality of the CIs could be achieved in most conditions. The method is illustrated using normative data of the SON-R 6-40 test. We recommend test developers to use this approach to arrive at CIs, and thus properly express the uncertainty due to norm sampling fluctuations, in the context of continuous norming. Adopting this approach will help (e.g., clinical) practitioners to obtain a fair picture of the person assessed. Electronic supplementary material The online version of this article (10.3758/s13428-018-1122-8) contains supplementary material, which is available to authorized users.

Efficient design of cluster randomized trials with treatment-dependent costs and treatment-dependent unknown variances

Article

Jun 2018

Cluster randomized trials evaluate the effect of a treatment on persons nested within clusters, where treatment is randomly assigned to clusters. Current equations for the optimal sample size at the cluster and person level assume that the outcome variances and/or the study costs are known and homogeneous between treatment arms. This paper presents efficient yet robust designs for cluster randomized trials with treatment-dependent costs and treatment-dependent unknown variances, and compares these with 2 practical designs. First, the maximin design (MMD) is derived, which maximizes the minimum efficiency (minimizes the maximum sampling variance) of the treatment effect estimator over a range of treatment-to-control variance ratios. The MMD is then compared with the optimal design for homogeneous variances and costs (balanced design), and with that for homogeneous variances and treatment-dependent costs (cost-considered design). The results show that the balanced design is the MMD if the treatment-to-control cost ratio is the same at both design levels (cluster, person) and within the range for the treatment-to-control variance ratio. It still is highly efficient and better than the cost-considered design if the cost ratio is within the range for the squared variance ratio. Outside that range, the cost-considered design is better and highly efficient, but it is not the MMD. An example shows sample size calculation for the MMD, and the computer code (SPSS and R) is provided as supplementary material. The MMD is recommended for trial planning if the study costs are treatment-dependent and homogeneity of variances cannot be assumed.

Establishing normative data for multi-trial memory tests: the multivariate regression-based approach

Article

Feb 2017

Objective: Multi-trial memory tests are widely used in research and clinical practice because they allow for assessing different aspects of memory and learning in a single comprehensive test procedure. However, the use of multi-trial memory tests also raises some key data analysis issues. Indeed, the different trial scores are typically all correlated, and this correlation has to be properly accounted for in the statistical analyses. In the present paper, the focus is on the setting where normative data have to be established for multi-trial memory tests. At present, normative data for such tests are typically based on a series of univariate analyses, i.e. a statistical model is fitted for each of the test scores separately. This approach is suboptimal because (1) the correlated nature of the data is not accounted for, (2) multiple testing issues may arise, and (3) the analysis is not parsimonious. Method and results: Here, a normative approach that is not hampered by these issues is proposed (the so-called multivariate regression-based approach). The methodology is exemplified in a sample of N = 221 Dutch-speaking children (aged between 5.82 and 15.49 years) who were administered Rey's Auditory Verbal Learning Test. An online Appendix that details how the analyses can be conducted in practice (using the R software) is also provided. Conclusion: The multivariate normative regression-based approach has some substantial methodological advantages over univariate regression-based methods. In addition, the method allows for testing substantive hypotheses that cannot be addressed in a univariate framework (e.g. trial by covariate interactions can be modeled).

Applied Multivariate Statistical Analysis

Article

Dec 1988

R: A Language and Environment for Statistical Computing

Book

Jan 2015

R Core Team

Model Selection in Continuous Test Norming With GAMLSS

Article

Jun 2017

To compute norms from reference group test scores, continuous norming is preferred over traditional norming. A suitable continuous norming approach for continuous data is the use of the Box-Cox Power Exponential model, which is found in the generalized additive models for location, scale, and shape. Applying the Box-Cox Power Exponential model for test norming requires model selection, but it is unknown how well this can be done with an automatic selection procedure. In a simulation study, we compared the performance of two stepwise model selection procedures combined with four model-fit criteria (Akaike information criterion, Bayesian information criterion, generalized Akaike information criterion (3), cross-validation), varying data complexity, sampling design, and sample size in a fully crossed design. The new procedure combined with one of the generalized Akaike information criterion was the most efficient model selection procedure (i.e., required the smallest sample size). The advocated model selection procedure is illustrated with norming data of an intelligence test.

Standard Errors and Confidence Intervals of Norm Statistics for Educational and Psychological Tests

Article

Nov 2016
PSYCHOMETRIKA

Norm statistics allow for the interpretation of scores on psychological and educational tests, by relating the test score of an individual test taker to the test scores of individuals belonging to the same gender, age, or education groups, et cetera. Given the uncertainty due to sampling error, one would expect researchers to report standard errors for norm statistics. In practice, standard errors are seldom reported; they are either unavailable or derived under strong distributional assumptions that may not be realistic for test scores. We derived standard errors for four norm statistics (standard deviation, percentile ranks, stanine boundaries and Z-scores) under the mild assumption that the test scores are multinomially distributed. A simulation study showed that the standard errors were unbiased and that corresponding Wald-based confidence intervals had good coverage. Finally, we discuss the possibilities for applying the standard errors in practical test use in education and psychology. The procedure is provided via the R function check.norms, which is available in the mokken package.

Regression diagnostics: quantitative applications in the social sciences

Article

Jan 1991

J. Fox

Optimal Design of Experiments: A Case-Study Approach

Book

Jun 2011

"This is an engaging and informative book on the modern practice of experimental design. The authors' writing style is entertaining, the consulting dialogs are extremely enjoyable, and the technical material is presented brilliantly but not overwhelmingly. The book is a joy to read. Everyone who practices or teaches DOE should read this book." - Douglas C. Montgomery, Regents Professor, Department of Industrial Engineering, Arizona State University. "It's been said: 'Design for the experiment, don't experiment for the design.' This book ably demonstrates this notion by showing how tailor-made, optimal designs can be effectively employed to meet a client's actual needs. It should be required reading for anyone interested in using the design of experiments in industrial settings." - Christopher J. Nachtsheim, Frank A Donaldson Chair in Operations Management, Carlson School of Management, University of Minnesota. This book demonstrates the utility of the computer-aided optimal design approach using real industrial examples. These examples address questions such as the following: How can I do screening inexpensively if I have dozens of factors to investigate? What can I do if I have day-to-day variability and I can only perform 3 runs a day? How can I do RSM cost effectively if I have categorical factors? How can I design and analyze experiments when there is a factor that can only be changed a few times over the study? How can I include both ingredients in a mixture and processing factors in the same study? How can I design an experiment if there are many factor combinations that are impossible to run? How can I make sure that a time trend due to warming up of equipment does not affect the conclusions from a study? How can I take into account batch information when designing experiments involving multiple batches? How can I add runs to a botched experiment to resolve ambiguities? While answering these questions the book also shows how to evaluate and compare designs. This allows researchers to make sensible trade-offs between the cost of experimentation and the amount of information they obtain.

Sample Size Requirements for Traditional and Regression-Based Norms

Article

May 2015

Test norms enable determining the position of an individual test taker in the group. The most frequently used approach to obtain test norms is traditional norming. Regression-based norming may be more efficient than traditional norming and is rapidly growing in popularity, but little is known about its technical properties. A simulation study was conducted to compare the sample size requirements for traditional and regression-based norming by examining the 95% interpercentile ranges for percentile estimates as a function of sample size, norming method, size of covariate effects on the test score, test length, and number of answer categories in an item. Provided the assumptions of the linear regression model hold in the data, for a subdivision of the total group into eight equal-size subgroups, we found that regression-based norming requires samples 2.5 to 5.5 times smaller than traditional norming. Sample size requirements are presented for each norming method, test length, and number of answer categories. We emphasize that additional research is needed to establish sample size requirements when the assumptions of the linear regression model are violated. © The Author(s) 2015.
