
Translating Tests: Some Practical Guidelines

June 1996
European Psychologist 1(2):89-99


DOI:10.1027/1016-9040.1.2.89

Authors:

Fons Van de Vijver

Tilburg University

Ronald K Hambleton

University of Massachusetts Amherst

Discusses the translation of psychological instruments in cross-cultural research. A taxonomy of bias ranging from unobserved ethnocentrism in constructs to incorrect word choice in translations is provided. Three types of bias are distinguished: (1) construct bias (related to nonequivalence of constructs across cultural groups), (2) method bias (resulting from instrument administration problems), and (3) item bias (often a result of inadequate translations such as incorrect word choice). Ways in which bias can affect the adequacy of instruments are illustrated. Guidelines for test translations are outlined and are fully described.


Test Translations 1

Translating Tests: Some Practical Guidelines

Fons Van de Vijver

Tilburg University, The Netherlands

and

Ronald K. Hambleton

University of Massachusetts at Amherst, USA

Key words: Test Translations, Test Adaptations, Bias, Guidelines


Abstract

With the increasing interest in cross-cultural research, there is a growing need for standard and validated practices for translating psychological instruments. Developing a psychologically acceptable instrument for another cultural group almost always requires more effort than a literal translation, which all too often is common practice. The adequacy of translations can be threatened by various sources of bias. Three types of bias are distinguished in this paper: (1) construct bias (related to non-equivalence of constructs across cultural groups), (2) method bias (resulting from instrument administration problems), and (3) item bias (often a result of inadequate translations, such as incorrect word choice). Ways in which bias can affect the adequacy of instruments are illustrated, and possible remedies are discussed.


Translating Tests: Some Practical Guidelines

Interest in cross-cultural comparisons has grown steadily over the last 20 years. For example, the 1995 Third International Mathematics and Science Study, with over 40 participating countries and tests in over 30 languages, is the largest cross-cultural comparative study of school achievement that has ever been conducted. The number of studies dealing with cross-cultural comparisons in PsycLit, an electronic database of abstracts from a wide variety of psychology journals, also reflects this increase (Van de Vijver & Lonner, 1995). To some extent, the increase is due to a growing number of journals with a masthead policy of publishing mainly or exclusively cross-cultural studies, such as the Hispanic Journal of Behavioral Sciences and Psychology and Developing Societies. However, the massive increase cannot be explained solely by the inauguration of new journals.

A considerably more important factor is the heightened interest in cross-cultural differences. Whereas in former days most cross-cultural research was carried out by psychologists who devoted most or all of their research efforts to it, today it is much more common to find reports by researchers whose previous work took place within a single culture and who are now expanding their work to other cultures. An instrument that has shown good reliability and validity in one cultural context and has produced some interesting results is applied elsewhere in order to examine cultural similarities and differences. For these researchers, a cross-cultural study does not mark the beginning of a research program in cross-cultural psychology. Instead, it is a natural extension of their previous work in a single culture.

The design, method, and analysis of cross-cultural studies have various unique features that are absent or less salient in intracultural studies. This paper focuses on an important methodological aspect of cross-cultural research: the translation of instruments. Applying an instrument in a new cultural group involves more than simply producing text in another language, administering the translated instrument, and comparing the results (see, for example, Hambleton, 1993, 1994). Multilingual studies raise many difficult questions. For example, does the construct apply to the target group, or does it show an ethnocentric bias? Are the behaviors associated with the construct similar in the source and target groups? Is the measurement procedure (e.g., stimulus and response format) adequate for application in the target group?

The first part of the paper presents a taxonomy of bias, a term used here generically for all kinds of factors that jeopardize the validity of intergroup comparisons, ranging from unobserved ethnocentrism in constructs to incorrect word choice in translations. Three kinds of bias will be distinguished, depending on whether they are brought about by anomalies in the theoretical construct, in instrument administration, or in specific items. Depending on the kind of bias that can be expected, a translation can amount to the application of a literal translation of an instrument, of an adapted version, or of an entirely new instrument measuring the same construct. These three options will be described in some detail in the second part of the paper. In the third and longest part, guidelines for test translations will be presented and their implications described.

Types of Bias in Test Translation

A distinction can be made among three types of bias in cross-cultural research (Van de Vijver & Poortinga, in press). The first is construct bias, which is said to occur when the construct measured by an instrument shows nonnegligible differences across cultures; both differences in conceptualization and differences in the behaviors associated with the construct can underlie construct bias. A well-known example of differences in conceptualization is provided by intelligence. Our measurements of intelligence emphasize cognitive performance, such as reasoning (e.g., the Raven Progressive Matrices Tests) or previously acquired knowledge (in tests of crystallized intelligence). It has been shown repeatedly that everyday conceptualizations of intelligence are often broader and also include social aspects such as communication skills and even obedience (Serpell, 1993; Sternberg, 1985; Super, 1983).

An example of differences in behaviors is provided by the concept of filial piety; Ho (in press) found that the behaviors associated with being a good son or daughter, such as taking care of one’s parents, conforming to their requests, and treating them well, are much broader in China than in most Western countries. Conclusions drawn from instruments that show construct bias can be misleading when no reference is made to cross-cultural differences in the conceptualization of, or the behaviors associated with, the construct. Statements about differences in filial piety between Chinese and, say, German subjects will be incorrect when they are based on an instrument that describes exclusively the behaviors of one of these cultural groups.

It is important to examine the occurrence of construct bias by exploring any ethnocentric bias in our theories or operationalizations. Local surveys that explore everyday conceptualizations of the construct and the behaviors associated with it provide an effective means to study construct bias. Such a survey can have various outcomes. It could support the validity of the existing instrument; it could also point to the inapplicability of particular items or sets of items (e.g., items about nuclear families in communities that do not live in nuclear families). The findings of the survey are most consequential when there is a substantial lack of overlap in conceptualization or in construct-characteristic behaviors. In such a case, substantial revisions of the conceptualization or instrument are required.

Construct bias is more likely to occur when an existing instrument is translated than when an instrument is developed simultaneously for different languages. In the latter case it is easier to avoid ethnocentric tendencies and to remove words and concepts that are not common to both languages and cultures. Successfully avoiding ethnocentric tendencies in instruments may require a multicultural, multilingual team with expertise in the construct under study.

Method bias is a generic term for validity-threatening factors related to instrument administration. Various sources of method bias are easy to imagine, such as intergroup differences in social desirability; in response sets such as acquiescence; in familiarity with stimuli, response formats (e.g., multiple choice, Likert scales), or testing situations in general; in the physical conditions under which a test is administered; in subjects’ motivation; in administrator effects; and in communication problems between the administrator and the persons taking the test. If present, method bias usually influences most or all items; hence, it leads to score differences between groups that are attributable to the administration procedure rather than to any intrinsic differences between the groups on the construct studied.

Examining method bias, an often neglected source of bias in cross-cultural studies, requires the collection of additional information. An effective means is to apply additional methods to collect information about the same underlying trait, as in the use of monotrait-multimethod matrices (e.g., Campbell & Fiske, 1959; Marsh & Byrne, 1993), also known as triangulation (e.g., Lipson & Meleis, 1989).

As an alternative, repeated test administrations can be applied. The procedure is particularly useful for mental tests. A study of the cross-cultural similarity of score changes from the first to the second administration can give important clues about the validity of the measurement. When individuals from different groups with equal scores on the first occasion have, on average, dissimilar scores on the second occasion, one can retrospectively doubt the validity of the first administration. Measurements of social desirability or studies of response sets can also address method bias (e.g., Fioravanti, Gough, & Frere, 1981; Hui & Triandis, 1989). Finally, method bias can be examined by administering the instrument in a nonstandard way, soliciting all kinds of responses from respondents about their interpretation of instructions, items, and response alternatives, and about the motivations for their answers. Such a nonstandard administration provides an approximate check on the suitability of the instrument in the target group.
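The retest comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a procedure taken from the cited studies: the record format, the grouping of first-occasion scores into bands of a hypothetical width (`band_width`), and the data are invented for the example. Within each band of first-occasion scores, the mean gain of each group is computed; a consistent gain gap between groups that started at the same level is the kind of pattern that would cast doubt on the first administration.

```python
from collections import defaultdict
from statistics import mean


def gains_by_band(records, band_width=5):
    """Mean gain (second minus first score) per group within first-occasion score bands.

    records: iterable of (group, first_score, second_score) tuples (hypothetical format).
    Returns {band_start: {group: mean_gain}}, ordered by band.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for group, first, second in records:
        band = (first // band_width) * band_width  # lower bound of the score band
        cells[band][group].append(second - first)
    return {band: {g: mean(v) for g, v in groups.items()}
            for band, groups in sorted(cells.items())}


def gain_gaps(table, group_a, group_b):
    """Difference in mean gain (a minus b) for every band that contains both groups."""
    return {band: gains[group_a] - gains[group_b]
            for band, gains in table.items()
            if group_a in gains and group_b in gains}


if __name__ == "__main__":
    # Hypothetical scores: (group, first occasion, second occasion)
    data = [("A", 10, 12), ("A", 11, 13), ("B", 10, 16), ("B", 12, 17),
            ("A", 20, 21), ("B", 21, 25)]
    print(gain_gaps(gains_by_band(data), "A", "B"))
```

In these invented data, group B gains 3 to 3.5 points more than group A at both starting levels. Whether such a gap matters would have to be judged with an appropriate significance test and adequate samples, which this sketch omits.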

Item bias, or differential item functioning as it is sometimes called, is the last source of anomalies in instrument translations. It refers to anomalies at the item level, such as poor wording, inappropriateness of item content in a cultural group, and inaccurate translations. An item is biased if persons from different groups with the same score on the construct, commonly operationalized as the score on the instrument, do not have the same expected score on the item (Holland & Wainer, 1993; Shepard, Camilli, & Averill, 1981). For an unbiased item, once a person’s total test score is known, the item score provides no further information about group membership; for a biased item, it does. Various statistical techniques have been developed to detect item bias. The currently most popular technique for dichotomously scored items is the so-called Mantel-Haenszel procedure (Holland & Thayer, 1988; Holland & Wainer, 1993).
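A minimal sketch of the Mantel-Haenszel procedure for one dichotomously scored item, using only the Python standard library. Persons are matched on total test score (each distinct score forms a stratum k), and within each stratum a 2 × 2 table of group by item response is formed: A_k and B_k are correct and incorrect answers in the reference group, C_k and D_k in the focal group, N_k the stratum size. The function returns the common odds ratio alpha_MH = sum(A_k D_k / N_k) / sum(B_k C_k / N_k), its ETS delta-scale transformation -2.35 ln(alpha_MH) (negative values indicate that the item favors the reference group), and the continuity-corrected Mantel-Haenszel chi-square on 1 degree of freedom. The input format, group labels, and function name are assumptions made for this illustration; operational analyses add refinements, such as purifying the matching score, that are omitted here.

```python
import math
from collections import defaultdict


def mantel_haenszel(responses, reference="ref", focal="focal"):
    """Mantel-Haenszel DIF statistics for one dichotomously scored item.

    responses: iterable of (group, total_score, item_correct) tuples, with
    item_correct coded 0/1. Each distinct total score is one matching stratum.
    Returns (alpha_mh, mh_d_dif, chi_square).
    """
    # Per stratum: [A, B, C, D] = ref correct, ref wrong, focal correct, focal wrong
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, total, correct in responses:
        if group == reference:
            strata[total][0 if correct else 1] += 1
        elif group == focal:
            strata[total][2 if correct else 3] += 1

    num = den = 0.0                 # terms of the common odds ratio
    sum_a = sum_ea = sum_var = 0.0  # terms of the chi-square statistic
    for a, b, c, d in strata.values():
        n_ref, n_foc = a + b, c + d
        if n_ref == 0 or n_foc == 0:
            continue                # only one group in this stratum: no information
        n = n_ref + n_foc
        m1, m0 = a + c, b + d       # totals correct / incorrect in the stratum
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_ea += n_ref * m1 / n    # expected A under no DIF
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))

    if num == 0 or den == 0 or sum_var == 0:
        raise ValueError("statistics undefined for these data (degenerate tables)")
    alpha = num / den
    d_dif = -2.35 * math.log(alpha)  # ETS delta scale; negative favors the reference group
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    return alpha, d_dif, chi2
```

Matching on each distinct total score keeps the sketch simple; with small samples, adjacent scores are sometimes pooled into wider strata so that each table contains both groups.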

For test scores with interval-scale properties, other techniques can be applied. An example is an analysis of variance in which the item score is the dependent variable and culture and score level are the independent variables; this presupposes that the sample has been split into score levels prior to the analysis. A significant main effect of culture, or a significant interaction between culture and score level, points to bias (more details can be found in Van de Vijver & Leung, in press).
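The analysis of variance just described can be sketched for its simplest case. The sketch assumes a complete, balanced design (the same number of persons in every culture-by-score-level cell), which keeps the sums of squares simple; real data are rarely balanced, and a statistics package would then be the practical choice. Score levels are assumed to have been formed beforehand by splitting the sample on total score, as the text describes. The function returns F ratios with their degrees of freedom; p-values would be read from the F distribution and are not computed here. The data layout and function name are illustrative assumptions.

```python
from statistics import mean


def item_bias_anova(cells):
    """Two-way ANOVA F ratios (culture x score level) for a single item.

    cells: {(culture, score_level): [item scores]} with the same number of
    observations in every cell (balanced design, assumed for simplicity).
    Returns {"culture": (F, df1, df2), "level": (...), "interaction": (...)}.
    """
    cultures = sorted({c for c, _ in cells})
    levels = sorted({l for _, l in cells})
    sizes = {len(v) for v in cells.values()}
    if len(sizes) != 1 or len(cells) != len(cultures) * len(levels):
        raise ValueError("design must be complete and balanced")
    n = sizes.pop()
    a, b = len(cultures), len(levels)

    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean(list(cell_mean.values()))  # equals the overall mean when balanced
    c_mean = {c: mean([cell_mean[c, l] for l in levels]) for c in cultures}
    l_mean = {l: mean([cell_mean[c, l] for c in cultures]) for l in levels}

    ss_culture = b * n * sum((c_mean[c] - grand) ** 2 for c in cultures)
    ss_level = a * n * sum((l_mean[l] - grand) ** 2 for l in levels)
    ss_inter = n * sum((cell_mean[c, l] - c_mean[c] - l_mean[l] + grand) ** 2
                       for c in cultures for l in levels)
    ss_error = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)

    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error

    def f_ratio(ss, df):
        return (ss / df) / ms_error, df, df_error

    return {"culture": f_ratio(ss_culture, a - 1),
            "level": f_ratio(ss_level, b - 1),
            "interaction": f_ratio(ss_inter, (a - 1) * (b - 1))}
```

For an item, the effects of interest are the culture main effect and the culture-by-level interaction; the main effect of score level is expected to be large for any item that measures the construct.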

Compared to construct and method bias, item bias is the least cumbersome to examine. First, a large number of sophisticated statistical techniques to detect item bias are available; second, scrutinizing item bias does not require the collection of additional data, as is the case for construct and method bias.

Options in Instrument Translations: Apply, Adapt, and Assemble

The nature and size of the bias that can be expected have implications for the options available when translating an instrument. For instance, when a pilot study has shown the presence of method bias, various instrument alterations may be required to ensure the validity of the instrument in all groups. Depending on the changes required, instrument translators have three options: (a) to apply the instrument in a literal translation; (b) to adapt parts of the instrument; or (c) to assemble an entirely new instrument (Van de Vijver & Leung, in press). Going from the first to the third option, more changes are required to make the instrument appropriate for the target group.

The translation options are related to the three types of bias distinguished before. The first option, application, assumes that a literal translation will yield an instrument that, in the target group, has good coverage of the theoretical construct and an adequate format. In other words, both construct and method bias are assumed to be absent, and only item bias is examined. From a methodological perspective, the application option is straightforward. An instrument is translated into a target language or, in the case of a new instrument, developed simultaneously in two or more languages, followed by an independent back-translation (Brislin, 1980; Werner & Campbell, 1970). After a comparison of the original and back-translated versions, possibly followed by suitable revision, the instrument is applied in the source and target cultures and the results are compared. The simplicity of the option probably explains its widespread use.

However, the application option in multilingual studies may not address method and construct bias. When the instrument leaves important aspects of the construct in the target group unexplored, the application option will be inadequate and the instrument will have to be adjusted to the local context. Such an adjustment can take various forms, such as adaptations of the stimulus or response format, interviewer training, or the application of a multimethod approach.

Administering instruments that are not fully identical in different cultural groups can complicate statistical analyses. Analyses of variance and t tests on total test scores assume identical stimuli, and adjustments are required to deal with stimulus dissimilarities. Some statistical techniques can accommodate such dissimilarities. Item response theory is an example (see, for example, Hambleton, Swaminathan, & Rogers, 1991): when the model fits the data and the item parameters are on a common scale, examinees’ estimated standing on the ability measured by the instrument (i.e., the latent trait) does not depend on the particular items used to measure it. Confirmatory factor analysis also allows for incomplete overlap of stimuli.
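To make the item response theory point concrete, the sketch below uses the Rasch model, the simplest item response model, as a stand-in for the broader family discussed by Hambleton, Swaminathan, and Rogers (1991). It estimates one person's ability by maximum likelihood (a damped Newton-Raphson iteration) from that person's responses and the difficulties of the items actually taken. The key assumption, which the code cannot check, is that all item difficulties have already been calibrated on a common scale (for instance through shared anchor items); only then are estimates based on different item sets comparable. Function names and difficulty values are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct answer under the Rasch model (ability theta, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, tol=1e-8, max_iter=100):
    """Maximum-likelihood ability estimate for one person under the Rasch model.

    responses: list of 0/1 item scores; difficulties: the matching item
    difficulties, assumed to be calibrated on a common scale beforehand.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("the ML estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                    # first derivative of log-likelihood
        information = sum(p * (1 - p) for p in probs)  # minus the second derivative
        step = max(-1.0, min(1.0, gradient / information))  # damped Newton step
        theta += step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    anchor = [-0.5, 0.0, 0.5]  # hypothetical items shared by both language versions
    # Each group also answers one culture-specific item with its own difficulty.
    theta_source = estimate_theta([1, 1, 0, 1], anchor + [1.2])
    theta_target = estimate_theta([1, 1, 0, 0], anchor + [-1.0])
    print(round(theta_source, 2), round(theta_target, 2))
```

Because the anchor items tie both sets of difficulties to one scale, the two estimates can be compared directly even though each group answered an item the other did not. With little or no common material, that link, and with it the comparability, is lost.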

Attractive as this may sound, overcoming problems of partial overlap by applying sophisticated statistical techniques has limitations. When there is substantial overlap between the items administered in all groups, the approach will work well, and the culture-specific items may well enhance the validity of the instrument in the local culture. However, when the overlap is small, the instrument will not have enough common material (referred to as the “anchor”) on which to base cross-cultural score comparisons, and the culture-specific items will add important aspects of the construct. A meaningful score comparison is then difficult.

The most severe instrument changes are usually required in the case of construct bias. Avoiding this type of bias may require the removal of particular items that are inappropriate in the new cultural context. For example, Van Haaften and Van de Vijver (in review) applied Amirkhan’s (1990) Coping Strategy Indicator to Sahel dwellers. The item “watched more television than usual” had to be dropped because there was no electricity in the study area and television sets were uncommon.

The overlap in conceptualization or in shared behaviors across cultures can become so small that an entirely new instrument has to be assembled. This is most likely to happen when an instrument developed in one cultural context, usually some Western country, contains various implicit or explicit references to the local context of the test developer.

When entirely new instruments are assembled, the researcher is usually not interested in comparing average scores across cultures (e.g., in a t test) but rather in whether the same psychological construct is measured in all groups. The nomological network of the construct can then be compared across cultural groups using linear structural models (path models) or regression analysis.
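Comparing a nomological network across groups can be illustrated in its simplest form: the correlation between the construct and one theoretically related variable, compared between two cultural groups. The sketch below uses Fisher's r-to-z test for two independent correlations. This is a deliberately simple substitute for the path models and regression analyses mentioned above, chosen to keep the example to the standard library; variable names and data are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_correlations(x1, y1, x2, y2):
    """Fisher r-to-z test for the difference between correlations in two independent groups.

    Returns (r1, r2, z, two_sided_p). Requires more than three cases per group
    and correlations strictly between -1 and 1.
    """
    r1, r2 = pearson(x1, y1), pearson(x2, y2)
    se = math.sqrt(1 / (len(x1) - 3) + 1 / (len(x2) - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the standard normal
    return r1, r2, z, p


if __name__ == "__main__":
    # Hypothetical construct scores and a related variable in two cultural groups
    construct_a, related_a = [12, 15, 11, 18, 14, 16], [30, 34, 29, 40, 33, 35]
    construct_b, related_b = [13, 17, 12, 15, 16, 14], [31, 30, 33, 29, 35, 32]
    print(compare_correlations(construct_a, related_a, construct_b, related_b))
```

A non-significant z is consistent with, but does not prove, a similar relation in both groups; with several related variables, the structural modeling approach in the text compares the whole pattern at once.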

Guidelines for Test Translations

In 1993 the International Test Commission formed an international committee of psychologists, consisting of members of various international organizations representing branches of psychology in which instrument translations play an important role. The committee has formulated a set of guidelines describing recommended practices in test translations. A preliminary report has been published (Hambleton, 1994); the final report will become available in 1996. The present section describes the 22 guidelines that were formulated; each guideline is followed by a brief explanation.

The guidelines cover four domains: context (describing basic principles of multilingual studies), development (recommended practices in developing multilingual instruments), administration (issues in instrument administration), and documentation/score interpretation (related to interpretation and cross-cultural comparisons of scores).

The context guidelines are as follows:

1. Effects of cultural differences which are not relevant or important to the main purposes of the study should be minimized to the extent possible.

The guideline expresses a basic principle of cross-cultural research: avoid construct, method, and item bias as much as possible. Multilingual studies should be geared towards generating interpretable patterns of intergroup similarities and differences. Without sufficient precautions against alternative interpretations, intergroup differences tend to be open to multiple interpretations (Poortinga & Malpass, 1986). For example, differences in observed scores on Raven’s Progressive Matrices Test obtained in two widely different cultural groups could be due to valid intergroup differences in intelligence, but also to intergroup differences in familiarity with the instrument or the testing situation, in educational background, in motivation, and so on. When no information is available to rule out alternative interpretations, observed differences become difficult to interpret.

The guideline does not state that sources of bias should be eliminated but adopts the more realistic position that they should be minimized. When cultural and linguistic differences between the groups studied are small, it may be realistic to pursue the elimination of bias; however, when the instrument has to bridge large cultural distances, particular sources of bias may be impossible to overcome. For example, when Raven’s test has been administered to literate and illiterate groups, it is unrealistic to assume that the groups are equal on factors related to method bias, such as familiarity with the stimuli or with the testing situation in general. Measures of bias-related factors, such as stimulus familiarity or previous test exposure, can often be added to corroborate a particular interpretation of intergroup differences. These measures can be introduced as covariates in an analysis of covariance to examine whether intergroup differences remain after statistical correction for these biasing factors.
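The covariance adjustment mentioned above can be sketched as follows, assuming one covariate (for example, a hypothetical stimulus-familiarity score) and a regression slope that is the same in every group; that homogeneity-of-slopes assumption should be checked before the adjustment is trusted. Each group's mean test score is shifted to the value expected at the overall covariate mean, using the pooled within-group slope. The sketch computes adjusted means only; testing whether the remaining difference is significant would require the full ANCOVA F test. Names and data are illustrative.

```python
from statistics import mean


def adjusted_means(groups):
    """Covariate-adjusted group means (analysis of covariance with one covariate).

    groups: {name: (covariate_values, score_values)}, e.g. stimulus familiarity
    and test score per person. Assumes equal regression slopes in all groups.
    Returns (pooled_within_slope, {name: adjusted_mean}).
    """
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx  # pooled within-group regression slope
    grand_x = mean([x for xs, _ in groups.values() for x in xs])
    return slope, {name: mean(ys) - slope * (mean(xs) - grand_x)
                   for name, (xs, ys) in groups.items()}


if __name__ == "__main__":
    # Hypothetical data: group B is more familiar with the stimuli and scores higher.
    data = {"A": ([1, 2, 3], [2, 4, 6]), "B": ([3, 4, 5], [6, 8, 10])}
    slope, adjusted = adjusted_means(data)
    print(slope, adjusted)  # prints: 2.0 {'A': 6.0, 'B': 6.0}
```

In this constructed example, the raw difference of 4 points between the groups disappears once familiarity is taken into account, which is the pattern that would favor a method-bias interpretation over a difference on the construct itself.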

2. The amount of overlap in the constructs in the populations of interest should be assessed.

The guideline refers to construct bias and stresses the need to assess, rather than assume, the similarity of meaning and of construct-characteristic behaviors across cultural groups. Pilot studies aimed at identifying construct-characteristic behaviors and cooperation with local experts are tools to examine construct bias.

The guideline expresses a principle recurring in several others: the validity of an instrument in multilingual studies cannot be taken for granted but has to be demonstrated.

The guidelines on instrument development are as follows:

3. Instrument developers/publishers should insure that the translation/adaptation process takes full account of linguistic and cultural differences among the populations for whom the translated/adapted versions of the instrument are intended.

The translation of a test requires a thorough knowledge of both the target language and the culture. Hambleton (1994, p. 235) provides a useful example to illustrate the point. In a Swedish-English comparison of educational achievement, the following item was administered:

Where is a bird with webbed feet most likely to live?

a. in the mountains

b. in the woods

c. in the sea

d. in the desert.

The Swedish translation rendered "webbed feet" as "swimming feet," thereby providing a cue to the correct answer. Such language- or culture-specific elements can easily slip into translations (or remain unnoticed when they are present in the original instrument). Back-translation procedures, which aim at a verbatim comparison of the original and back-translated versions, may fail to detect these problems.

4. Instrument developers/publishers should provide evidence that the language use in the directions, rubrics, and items themselves as well as in the handbook are appropriate for all cultural and language populations for whom the instrument is intended.

The terms and concepts used in the instrument should be appropriate for all cultural groups involved. It is important to indicate which measures have been taken to ensure the translatability of instruments and to reduce miscommunication. Various rules have been formulated for designing translatable instruments. For example, Brislin (1986) has formulated guidelines to optimize the translatability of an instrument, the most important of which are given here (pp. 143-150):

· Use short and simple sentences and avoid unnecessary words (unless redundancy is deliberately sought).

· Employ the active rather than the passive voice because the former is easier to comprehend.

· Repeat nouns instead of using pronouns because the latter may have vague referents; thus, the English "you" can refer to a single person or to a group of persons.

· Avoid metaphors and colloquialisms. In many cases their translations will not be equally concise, familiar, and captivating.

· Avoid adverbs and prepositions telling “where” and “when” that do not have a precise meaning, such as “soon” and “often.”

· Avoid possessive forms where possible because it may be difficult to determine ownership. The owner referred to by “his” in “his dog” has to be derived from the context of the sentence, and languages vary in their systems of reference.

· Use specific rather than general terms. Who is included in “members of your family” differs strongly across cultures; more precise terms are less likely to run into this problem.
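Some of these rules can be turned into crude automatic checks that flag candidate problems in English source items for human review. The sketch below is an illustration only, not Brislin's procedure: the word lists, the 16-word sentence limit, and the passive-voice pattern are hypothetical choices, and such surface heuristics will both miss problems and raise false alarms. Judgment by bilingual experts remains necessary.

```python
import re

# Hypothetical, deliberately small word lists; extend or replace for real use.
VAGUE_TERMS = {"soon", "often", "sometimes", "frequently", "rarely", "recently"}
PRONOUNS = {"he", "she", "it", "they", "you", "his", "her", "its", "their", "your"}
MAX_WORDS = 16  # hypothetical threshold for a "short" sentence


def translatability_flags(item):
    """Return warnings for an English item text, loosely inspired by Brislin's rules."""
    text = item.lower()
    words = re.findall(r"[a-z']+", text)
    flags = []
    if len(words) > MAX_WORDS:
        flags.append(f"long sentence ({len(words)} words)")
    vague = sorted(VAGUE_TERMS.intersection(words))
    if vague:
        flags.append("imprecise where/when terms: " + ", ".join(vague))
    pronouns = sorted(PRONOUNS.intersection(words))
    if pronouns:
        flags.append("pronouns with possibly vague referents: " + ", ".join(pronouns))
    # Rough passive-voice cue: a form of "to be" followed by a word ending in -ed.
    if re.search(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", text):
        flags.append("possible passive voice")
    return flags


if __name__ == "__main__":
    for item in ["I often visit my family.", "The letter was signed by his brother."]:
        print(item, translatability_flags(item))
```

Checks of this kind cover only the rules that leave a surface trace (length, particular words, a grammatical pattern); rules such as avoiding metaphors or choosing specific terms still require a human reader.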

5. Instrument developers/publishers should provide evidence that the choice of testing techniques, item formats, test conventions, and procedures are familiar to all intended populations.

Various formal characteristics of an instrument, such as its response format, can jeopardize the validity of cross-cultural comparisons. For instance, the ability to solve items in a multiple-choice format requires previous knowledge and experience, because the alternatives often show subtle differences in meaning. Another example is the ability to deal with speed tests, which often requires a delicate balance between speed and accuracy. The application of instruments among groups without relevant testing experience can be troubled by such unexpected intergroup differences. When there is a real danger that such factors will affect performance, test developers may want to provide information about how they have dealt with the problem (e.g., lengthy test instructions or a repeated test administration).

6. Instrument developers/publishers should provide evidence that item content and stimulus materials are familiar to all intended populations.

The guideline is related to the previous two and stresses the importance of examining the familiarity of stimulus features. It is often addressed in mental testing; the notion that cognitive tests to be used in various cultural groups should be “culture-free” (Cattell, 1940), “culture-fair” (Cattell & Cattell, 1963), or “culture-reduced” (Jensen, 1980) originates in the recognition of the importance of stimulus familiarity. Stimulus familiarity is often difficult to measure. Methods to study method bias, such as repeated test administrations and multimethod approaches, can be applied to evaluate intergroup differences in stimulus familiarity. Cross-cultural differences that are not invariant across repeated test administrations or across different methods of measuring the same construct point to differential stimulus familiarity.

Items whose ecological validity differs across the cultural contexts in which they will be applied are not suitable for cross-cultural comparison. If there is reason to suspect differences in ecological validity, a pilot study can be carried out to address the issue.

The concept of stimulus familiarity has been discussed most often in the area of mental testing. However, it also applies to other psychological constructs. Cross-cultural comparisons of scores on personality questionnaires that contain items with low ecological validity will be of dubious validity.

7. Instrument developers/publishers should implement systematic judgmental evidence, both linguistic and psychological, to improve the accuracy of the translation/adaptation process and compile evidence on the equivalence of all language versions.

The judgmental evidence described in the guideline involves the application of standardized translation procedures, such as translation/back-translation. Similarity of the original and back-translated versions is taken to indicate an appropriate translation. The procedure is particularly useful when the researcher does not know the target language, because it gives an evaluation of the quality of the target-language version that is accessible to the researcher. At the same time, an adequate back-translation does not guarantee an appropriate target-language version. For example, the procedure favors literal translations, while the readability and naturalness of the target-language version are often hardly checked; literal translations can produce stilted language, a feature that a back-translation may not detect. Translators who know that their work will be back-translated may favor such literal translations.

The translation of texts that are difficult to translate may benefit from an approach in which not a single bilingual but a whole group of persons participates in the translation process. Particularly when such a group combines linguistic and psychological expertise, the quality of the translation may be superior to that obtained with a translation/back-translation approach involving two translators, one translating from the source to the target language and the other translating back (see Hambleton, 1993).

8. Instrument developers/publishers should insure that the data collection design permits the use of appropriate statistical techniques to establish item equivalence between the different language versions of the instrument.

The design of the study in which source- and target-language versions are compared should allow for a rigorous test of equivalence. Various designs have been proposed (cf. Hambleton, 1994, pp. 237-238). For example, in one popular design, bilinguals take both the source and the target versions of the test. An obvious problem of this design is the need to find a sufficiently large sample of bilinguals. Furthermore, bilinguals may constitute an atypical sample of the population because they are usually better educated. A major asset of the design is that it controls for sample differences, since the same persons take both versions.

This factor is not controlled in the most common design to study equivalence, in which source-language monolinguals take the source-language version and target-language monolinguals take the target-language version. The design confounds population and translation characteristics: an intergroup difference on a particular item can be attributed to poor translation (e.g., inadequate word choice) and/or to population characteristics (e.g., the item is more attractive in group A than in group B, or persons in group A are simply more capable than persons in group B).

9. Instrument developers/publishers should apply appropriate statistical techniques to (1) establish the equivalence of the different versions of the instrument, and (2) identify problematic components or aspects of the instrument which may be inadequate to one or more of the intended populations.

An evaluation of the appropriateness of a translation should be based not only on judgmental evidence, such as that provided by a translation/back-translation procedure, but also on statistical evidence. Empirical data should be collected and properly analyzed to examine the equivalence of the source and target versions of an instrument. Various techniques can be used for that purpose (see Hambleton, 1993; Van de Vijver & Leung, in press). Factor analysis is frequently applied, either exploratory (e.g., Barrett, 1986) or confirmatory (e.g., Watkins, 1989).
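When exploratory factor analyses are run separately for the source- and target-language versions, the resulting loadings are commonly compared with Tucker's congruence coefficient, phi = sum(a_i b_i) / sqrt(sum(a_i^2) sum(b_i^2)), computed per factor over the same items. The sketch below computes that index only. It assumes that the factors have already been matched across groups and, in practice, that the solutions have been rotated toward each other (for example by target rotation), which the code does not do. Values above roughly .90 are often read as indicating factorial similarity, although recommended cutoffs differ. The loadings in the usage lines are hypothetical.

```python
import math


def tucker_phi(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors on the same items."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must cover the same items")
    num = sum(a * b for a, b in zip(loadings_a, loadings_b))
    den = math.sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return num / den


if __name__ == "__main__":
    source = [0.72, 0.65, 0.58, 0.70]  # hypothetical loadings, source-language version
    target = [0.68, 0.60, 0.31, 0.66]  # hypothetical loadings, target-language version
    print(round(tucker_phi(source, target), 3))
```

The coefficient is insensitive to a uniform rescaling of one loading vector, so it speaks to the similarity of the loading pattern rather than to the equality of loading sizes; a single deviant item, like the third one above, lowers it only modestly, which is one reason item-level checks are still needed.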

10. Instrument developers/publishers should provide information on the evaluation of validity in all target populations for whom the translated/adapted versions are intended.

Transfer of validity (e.g., construct and predictive validity) from one cultural context

to another cannot be taken for granted; it has to be demonstrated. Instruments that have

good validity in one cultural group may lose some of their psychometric properties after

translation, and the larger the cultural distance between the groups, the greater the threat to validity will generally be.
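One simple way to examine whether predictive validity transfers is to estimate the validity coefficient (the correlation between test score and criterion) separately in each group and then compare the two coefficients. The sketch below uses Fisher's r-to-z test for two independent correlations. The function name, coefficients, and sample sizes are hypothetical, and the test assumes independent samples and approximately bivariate normal data.

```python
import math


def compare_validity(r1, n1, r2, n2):
    """Two-sided test of the difference between two independent correlations.

    Uses Fisher's r-to-z transformation, z_r = atanh(r), with standard error
    sqrt(1/(n1-3) + 1/(n2-3)) for the difference. Returns (z, p).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than three cases")
    z_diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = z_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p under the standard normal
    return z, p


# Hypothetical validity coefficients (test score vs. criterion) in each group.
z, p = compare_validity(0.45, 250, 0.25, 180)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A nonsignificant difference does not by itself show that validity has transferred, particularly with small samples; the coefficient in the target group should also be adequate in its own right.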

11. Instrument developers/publishers should provide statistical evidence of the

equivalence of questions for all intended populations.

The guideline stipulates the need for item bias analyses, scrutinizing the equivalence

on an item-by-item basis (e.g., Holland & Wainer, 1993). Various steps are possible when

item bias is found. First, the biased items can be regarded as a threat to the validity of the

instrument and eliminated. When the bias of all items has been examined, cross-

cultural comparison can be restricted to the presumably unbiased set. Second, item bias can

be seen as pointing to interesting cross-cultural differences that require further examination

and explanation. For example, commonalities among the biased items can be sought.

Unfortunately, such commonalities are often hard to find (e.g., Scheuneman, 1987). A third

possible step is described in the next guideline.
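The Mantel-Haenszel procedure (Holland & Thayer, 1988) is one frequently used item bias technique. The sketch below assumes that examinees have already been matched on total test score and grouped into score levels, and computes the common odds ratio and its ETS delta transformation for a single item. Treating the source-language group as the reference group, the counts, and the function name are illustrative assumptions. A full analysis would add the Mantel-Haenszel chi-square test and would often purify the matching score by removing flagged items.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel DIF statistics for one item.

    strata: iterable of (ref_correct, ref_incorrect, focal_correct, focal_incorrect),
    one tuple per matched total-score level.
    Returns (alpha_mh, delta_mh). alpha_mh > 1 means reference-group examinees have
    higher odds of answering correctly than matched focal-group examinees;
    delta_mh = -2.35 * ln(alpha_mh) expresses this on the ETS delta metric (0 = no DIF).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these counts")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts at three total-score levels (low, middle, high);
# reference = source-language group, focal = target-language group.
item_strata = [(20, 30, 10, 40), (45, 25, 30, 40), (60, 10, 50, 20)]
alpha_mh, delta_mh = mantel_haenszel_dif(item_strata)
print(f"alpha_MH = {alpha_mh:.2f}, delta_MH = {delta_mh:.2f}")
```

Matching on total score is what separates item bias from a genuine difference in overall ability between the groups: the item is flagged only if matched examinees still differ in their odds of success.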

12. Nonequivalent questions between versions intended for different populations

should NOT be used in preparing a common scale or in comparing these populations.

However, they may be useful in enhancing content validity of scores reported for each

population separately (emphasis in original).

Nonequivalent questions will be invalid in cross-cultural comparisons of scores

(unless, as indicated before, a statistical technique that can handle stimulus dissimilarities,

such as item response theory or linear structural modeling, is applied); however, they

may be adequate for intracultural use of the instrument, where they can add to its reliability and validity.
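To illustrate how item response theory can keep nonequivalent items out of the common scale, the sketch below applies the mean/sigma linking method, one of several linking methods described in IRT texts such as Hambleton, Swaminathan, and Rogers (1991), using only the items judged equivalent as anchors. The difficulty estimates are hypothetical and assumed to come from separate calibrations in each group; item5 plays the role of a biased item.

```python
import statistics


def mean_sigma_link(anchor_ref, anchor_focal):
    """Mean/sigma linking constants computed from anchor-item difficulties.

    Returns (A, B) such that a focal-group difficulty b is placed on the
    reference-group scale as A * b + B.
    """
    if len(anchor_ref) != len(anchor_focal) or len(anchor_ref) < 2:
        raise ValueError("need at least two anchor items calibrated in both groups")
    A = statistics.pstdev(anchor_ref) / statistics.pstdev(anchor_focal)
    B = statistics.fmean(anchor_ref) - A * statistics.fmean(anchor_focal)
    return A, B


# Hypothetical IRT difficulty estimates from separate calibrations in each group.
ref_b = {"item1": -1.2, "item2": -0.4, "item3": 0.3, "item4": 1.1, "item5": 0.0}
focal_b = {"item1": -0.9, "item2": -0.1, "item3": 0.6, "item4": 1.4, "item5": 1.5}

# item5 was flagged as biased, so only items 1-4 serve as anchors for the common scale.
anchors = ["item1", "item2", "item3", "item4"]
A, B = mean_sigma_link([ref_b[i] for i in anchors], [focal_b[i] for i in anchors])
linked = {i: A * focal_b[i] + B for i in anchors}
print(f"A = {A:.3f}, B = {B:.3f}")
```

Only the anchor items determine A and B. The biased item is left out of the cross-cultural comparison but could still contribute to scores reported within the target group, in line with the guideline.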

The administration guidelines are as follows:

13. Instrument developers and administrators should try to anticipate the types of

problems that can be expected, and take appropriate actions to remedy these problems

through the preparation of appropriate materials and instructions.

Administration problems can often be detected in small pilot studies in which the

instrument is administered in a nonstandard way to elicit a range of responses from the respondents.

Careful observation, together with asking respondents to paraphrase items and to explain

their responses, will help to identify such problems.

14. Instrument administrators should be sensitive to a number of factors related to the

stimulus materials, administration procedures, and response modes that can moderate

the validity of the inferences drawn from the scores.

A literal translation of stimuli is often the preferred choice in multilingual studies.

However, test administrators should be aware of the specific problems that this may create.

For instance, some of the examples given may not be obvious to members of some groups, and the test

instructions may contain implicit information that is not clear to individuals from

other cultural groups. As another example, Raven’s test of intelligence has two series of

items: the first can be solved using mainly perceptual strategies, whereas the second series is more

difficult and requires analytical strategies. The test instructions contain only item examples

that use a perceptual strategy, so the shift in item content may be confusing to individuals

who have little or no test experience. In general, the application of identical stimuli or literally

translated stimuli does not guarantee that the instrument is appropriate in each cultural group;

a close examination of validity-moderating factors is vital in all multilingual studies.

15. Those aspects of the environment that influence the administration of an

instrument should be made as similar as possible across populations for whom the

instrument is intended.

Environmental conditions of laboratory testing may be easy to replicate, but

physical conditions of field research tend to be idiosyncratic and hard to reproduce elsewhere.

It is therefore important that test administrators be made aware of the main

environmental variables involved. For example, in an administration of

computerized speed tests, body posture, distance to the screen, and intensity of ambient light

are among the factors that have to be considered.

16. Instrument administration instructions should be in the source and target

languages to minimize the influence of unwanted sources of variation across

populations.

When an instrument is to be applied in a new cultural context, instrument developers

need to know the sources of unwanted intergroup differences. Pilot studies can help to detect

these differences. The test instructions are an important aid in reducing them:

extensive test instructions, containing various examples and exercises, can go a long way toward

minimizing these differences.

17. The instrument manual should specify all aspects of the instrument and its

administration that require scrutiny in the application of the instrument in a new

cultural context.

Test developers will have gained relevant information about the specific issues that

arose in the test translation process. Administrators of the test can benefit from this

experience when the manual gives all the necessary details. The test manual should describe

potential problems in order to avoid their repetition.

18. The administration should be unobtrusive and the administrator--examinee

interaction should be minimized. Explicit rules that are described in the manual for

the instrument should be followed.

An important source of errors in cross-cultural comparisons can be the uncontrolled

aspects of administrator-examinee interactions, particularly in nonstandardized testing

situations such as unstructured interviews. The manual should specify standard problems and

their solutions. For example, the manual for an intelligence test should specify which answers

are correct and which are incorrect and, if applicable, the supplementary questions to be asked

following partially correct answers. When such rules are not specified in the manual, it will

be impossible to standardize test scoring.

The guidelines on documentation/score interpretations are as follows:

19. When an instrument is translated/adapted for use in another population,

documentation of the changes should be provided, along with evidence of the

equivalence.

In the previous section, distinctions were made among applying, adapting, and

assembling tests in translation. When tests are applied, the test content is not changed and

the new test is a direct translation of the test in the source language. When tests are adapted or

new tests are assembled, test users should be informed of all changes introduced to enhance

validity in the new cultural context. Furthermore, the equivalence of the source- and target-

language versions of the test should be documented. Linguistic and statistical evidence, such

as a specification of the translation procedure and the results of an item bias analysis or of a

factor analysis comparing the loadings across cultural groups, should be described (Van de

Vijver & Leung, in press). Without such evidence, it will be difficult for potential test users to

determine the adequacy of the test in the new context.

20. Score differences among samples of populations administered the instrument

should NOT be taken at face value. The researcher has the responsibility to

substantiate the differences with other empirical evidence (emphasis in original).

Observed intergroup score differences can often be interpreted in several ways

(Poortinga & Malpass, 1986). If a researcher embraces particular interpretations, he/she

should provide evidence to confirm these interpretations or to disprove alternative

interpretations. In order to provide the evidence, additional measures will often be required;

for example, disproving that a cultural difference is due to differential social desirability,

acquiescence, or stimulus familiarity will often require measurement of these factors. The

principle of not taking observed score differences at face value may seem unduly restrictive;

however, upon closer examination it is a natural consequence of the poor controls that are

available in cross-cultural comparative studies. In cross-cultural comparisons, groups that

differ on many dimensions are often used; in many cases we are interested in only one or

a few of these, but it would be naive to act as if the other differences did not exist. Therefore,

in cross-cultural psychology we need to safeguard our data against alternative interpretations.
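When a nuisance factor such as social desirability has been measured, one way to examine whether it accounts for an observed group difference is an analysis of covariance. The sketch below computes covariate-adjusted group means using the pooled within-group slope. The function name and data are hypothetical, and the approach assumes a linear relation with equal slopes in all groups and a covariate that is itself measured equivalently across groups.

```python
import statistics


def adjusted_means(groups):
    """Covariate-adjusted group means (analysis-of-covariance style).

    groups maps a group label to a list of (covariate, score) pairs, e.g.
    (social desirability score, anxiety score). Returns (adjusted means, pooled slope).
    """
    sxy = 0.0  # pooled within-group cross-products of covariate and score
    sxx = 0.0  # pooled within-group sum of squares of the covariate
    group_means = {}
    for label, pairs in groups.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        group_means[label] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    grand_mx = statistics.fmean([x for pairs in groups.values() for x, _ in pairs])
    # Shift each group mean to the value expected at the grand covariate mean.
    adjusted = {
        label: my - slope * (mx - grand_mx)
        for label, (mx, my) in group_means.items()
    }
    return adjusted, slope


# Constructed data in which the score depends only on the covariate.
data = {
    "group_A": [(1, 2), (2, 4), (3, 6)],
    "group_B": [(3, 6), (4, 8), (5, 10)],
}
adjusted, slope = adjusted_means(data)
print(adjusted, slope)  # raw means 4.0 and 8.0; adjusted means both 6.0, slope 2.0
```

Here the raw difference disappears after adjustment, so it could be explained by the covariate. A difference that survives adjustment is not thereby shown to be substantive; it only rules out this particular alternative interpretation.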

21. Comparisons across populations can only be made at the level of invariance that

has been established for the scale on which scores are reported.

The guideline refers to an important concept in cross-cultural comparisons: the

comparison scale (e.g., Van de Vijver & Poortinga, in press). When an instrument has been

applied in two cultures, the measurement levels of three scales have to be considered: the first two

are the scales on which scores are measured within each of the two groups, and the third is the

scale on which scores are compared across the groups.

Suppose that an anxiety measure using Likert response scales (i.e., strongly disagree,

disagree, neutral, agree, strongly agree) has been administered in two cultural groups. Let us

assume that the sum of the item scores defines an interval scale in the two cultural groups.

What is the measurement level at which the scores can be compared across cultures? Are

individual differences within a single group measured at the same measurement level as

individual differences across groups? The answer depends on the presence of

bias. When no bias occurs, individual differences within and across groups are measured at

the same level. However, bias will tend to lower the measurement level of the comparison

scale.

Suppose that a few items have been poorly translated or are inappropriate in the

second group. The biased items will introduce an offset into the comparison scale. If these

items are not removed, the measurement level of the comparison scale may be only ordinal. When

the anxiety measure suffers from construct or method bias, the measurement level of the

comparison scale can become even lower.
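A short decomposition may make the offset argument concrete. It assumes, purely for illustration, an additive model in which $\tau_{ijm}$ is the bias-free part of the score of person $i$ in group $j$ on item $m$, $\mathcal{B}$ is the set of biased items, and each biased item shifts every score in group $j$ by the same constant $\delta_{jm}$:

```latex
\begin{align*}
x_{ijm} &= \tau_{ijm} + \delta_{jm}, \qquad \delta_{jm} = 0 \text{ for every item } m \notin \mathcal{B},\\
S_{ij} &= \sum_{m=1}^{M} x_{ijm} = \sum_{m=1}^{M} \tau_{ijm} + \Delta_j, \qquad \Delta_j = \sum_{m \in \mathcal{B}} \delta_{jm},\\
S_{ij} - S_{i'j} &= \sum_{m=1}^{M} \left( \tau_{ijm} - \tau_{i'jm} \right) \quad \text{(within group } j\text{: the offset cancels)},\\
S_{i1} - S_{i'2} &= \sum_{m=1}^{M} \left( \tau_{i1m} - \tau_{i'2m} \right) + \left( \Delta_1 - \Delta_2 \right) \quad \text{(across groups: the offset remains)}.
\end{align*}
```

Within-group differences are unaffected, whereas every cross-group difference contains the unknown term $\Delta_1 - \Delta_2$, so group means cannot be compared at face value. If the shift also varied with trait level (nonuniform bias), the distortion would no longer be a single constant, which makes interval-level comparisons across groups untenable.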

According to the present guideline, cross-cultural score comparisons can only be made

at the level of the comparison scale that has been established. Testing cross-cultural

differences with a t test or an analysis of variance assumes the absence of any kind of bias;

when the equivalence of the instrument across cultural groups has not been examined, the

conclusions of such an analysis are often open to alternative interpretations.

22. The instrument developer should provide specific information on the ways in

which the socio-cultural and ecological contexts of the populations might affect

performance on the instrument, and should suggest procedures to account for these

effects in the interpretation of results.

The test manual should specify all relevant examinee/respondent and context variables

that have been examined in the development of an instrument, such as relevant cultural

characteristics of the target groups, socio-economic status, age, gender, and education. When

the results of these analyses are presented in the manual, test users will know which factors

will be relevant in their use of the instrument and how to account for these factors.

Implications

Translating psychological instruments for use in other cultural and linguistic groups is

more involved than simply translating text into another language. Various sources of bias can

threaten the adequacy of translations. Distinctions were made in this paper among three types

of bias, depending on whether the bias resides in the construct or its characteristic behaviors

(construct bias), in the measurement procedure (method bias), or in the separate items (item

bias). Simple translation--back-translation procedures are meaningful only when construct

and method bias do not play a role. When they do, more extensive adaptation of the instrument will

be required. Hambleton (1994) stressed the need to demonstrate the similarity of meaning of

the directions, items, and even the scoring guides for the instrument in all linguistic groups

involved.

The presence of bias depends on various factors. For example, the likelihood of bias

will increase with the cultural distance between the groups involved: comparisons of groups

with strongly dissimilar backgrounds can easily suffer from method

bias (e.g., differential stimulus familiarity or social desirability). The presence of bias will

also depend on the nature of the construct. Broadly defined constructs described by

heterogeneous behaviors will be more susceptible to construct bias.

A “cookbook” specifying the types of bias which can arise in practice and when they

can be expected is impossible to give. Fortunately, such a cookbook is hardly ever required.

Awareness that bias can play a role is an important step towards its detection and resolution.

A combination of awareness and linguistic and psychological expertise will often suffice to

yield high-quality translations.

The growth of cross-cultural research described in the introductory section of the paper

creates a need for standard practices to carry out intergroup comparisons. The guidelines

described in the paper are an attempt to formalize recommended practice in test translations.

Hopefully, such guidelines will become standard practice in cross-cultural research and will

help to minimize the impact of bias on cross-cultural measurement.

References

Amirkhan, J. H. (1990). A factor-analytically derived measure of coping: The Coping

Strategy Indicator. Journal of Personality and Social Psychology, 59, 1066-1074.

Barrett, P. (1986). Factor comparison: An examination of three methods. Personality

and Individual Differences, 7, 327-340.

Brislin, R. W. (1980). Translation and content analysis of oral and written material. In

H. C. Triandis & J. W. Berry (Eds.), Handbook of cross-cultural psychology (Vol. 1, pp. 389-

444). Boston: Allyn & Bacon.

Brislin, R. W. (1986). The wording and translation of research instruments. In W. J.

Lonner & J. W. Berry (Eds.), Field methods in cross-cultural research (pp. 137-164).

Newbury Park, CA: Sage.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by

the multitrait--multimethod matrix. Psychological Bulletin, 56, 81-105.

Cattell, R. B. (1940). A culture-free intelligence test, I. Journal of Educational

Psychology, 31, 176-199.

Cattell, R. B., & Cattell, A. K. S. (1963). Culture Fair Intelligence Test. Champaign,

IL: Institute for Personality and Ability Testing.

Fioravanti, M., Gough, H. G., & Frere, L. J. (1981). English, French, and Italian

adjective check lists: A social desirability analysis. Journal of Cross-Cultural Psychology, 12,

461-472.

Hambleton, R. K. (1993). Translating achievement tests for use in cross-national

studies. European Journal of Psychological Assessment, 9, 57-68.

Hambleton, R. K. (1994). Guidelines for adapting educational and psychological tests:

A progress report. European Journal of Psychological Assessment, 10, 229-244.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item

response theory. Newbury Park, CA: Sage Publications.

Ho, D. Y. F. (in press). Filial piety and its psychological consequences. In M. H. Bond

(Ed.), Handbook of Chinese psychology. Hong Kong: Oxford University Press.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the

Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145).

Hillsdale, NJ: Erlbaum.

Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale,

NJ: Erlbaum.

Hui, C. H., & Triandis, H. C. (1989). Effects of culture and response format on

extreme response style. Journal of Cross-Cultural Psychology, 20, 296-309.

Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.

Lipson, J. G., & Meleis, A. I. (1989). Methodological issues in research with

immigrants. Special Issue: Cross-cultural nursing: Anthropological approaches to nursing

research. Medical Anthropology, 12, 103-115.

Marsh, H. W., & Byrne, B. M. (1993). Confirmatory factor analysis of multigroup--

multimethod self-concept data: Between-group and within-group invariance constraints.

Multivariate Behavioral Research, 28, 313-349.

Poortinga, Y. H., & Malpass, R. S. (1986). Making inferences from cross-cultural data.

In W. J. Lonner & J. W. Berry (Eds.), Field methods in cross-cultural research (pp. 17-46).

Beverly Hills, CA: Sage.

Scheuneman, J. D. (1987). An experimental, exploratory study of causes of bias in test

items. Journal of Educational Measurement, 24, 97-118.

Serpell, R. (1993). The significance of schooling. Life-journeys in an African society.

Cambridge: Cambridge University Press.

Shepard, L., Camilli, G., & Averill, M. (1981). Comparisons of procedures for

detecting test-item bias with both internal and external ability criteria. Journal of Educational

Statistics, 6, 317-375.

Sternberg, R. J. (1985). Implicit theories of intelligence, creativity, and wisdom.

Journal of Personality and Social Psychology, 49, 607-627.

Super, C. M. (1983). Cultural variation in the meaning and uses of children’s

"intelligence." In J. B. Deregowski, S. Dziurawiec, & R. C. Annis (Eds.), Expiscations in

cross-cultural psychology (pp. 199-212). Lisse: Swets & Zeitlinger.

Van Haaften, E. H., & Van de Vijver, F. J. R. (in review). Psychological

consequences of environmental degradation.

Van de Vijver, F. J. R., & Leung, K. (in press). Methods and data analysis of

comparative research. In J. W. Berry, Y. H. Poortinga, & J. Pandey (Eds.), Handbook of

Cross-Cultural Psychology. Boston: Allyn & Bacon.

Van de Vijver, F. J. R., & Lonner, W. (1995). A bibliometric analysis of the Journal

of Cross-Cultural Psychology. Journal of Cross-Cultural Psychology, 26, 591-602.

Van de Vijver, F. J. R., & Poortinga, Y. H. (in press). Towards an integrated analysis

of bias in cross-cultural assessment. European Journal of Psychological Assessment.

Watkins, D. (1989). The role of confirmatory factor analysis in cross-cultural research.

International Journal of Psychology, 24, 685-701.

Werner, O., & Campbell, D. T. (1970). Translating, working through interpreters, and

the problem of decentering. In R. Naroll & R. Cohen (Eds.), A handbook of cultural

anthropology (pp. 398-419). New York: American Museum of Natural History.

A preview of this full-text is provided by American Psychological Association.

Learn more

Content available from European Psychologist

This content is subject to copyright. Terms and conditions apply.

UNIVERSITE DE BORDEAUX - THESE Doctorate in Business Administration (DBA) - Carlos Leger Sherman Palmer Junior - Original Version

Thesis

Full-text available

May 2024

Carlos Palmer Junior

Assessing the impact of personality traits and sociodemographic factors on employee motivation: a study in the sugarcane and bioenergy industry.

Guidelines and recommendations for cross-linguistic aphasia assessment: A review of 10 years of Comprehensive Aphasia Test adaptations

Article

May 2024
APHASIOLOGY

Background Standardised aphasia assessment tools may not always be available in a variety of languages, posing challenges for speech and language therapists to adequately assess and diagnose aphasia in speakers of those languages. In 2013, Working Group 2 (WG2) Aphasia Assessment & Outcomes, part of the Collaboration of Aphasia Trialists network, was formed with the purpose of developing reliable and valid aphasia assessment tools and their cross-linguistic adaptations. Over the past decade, WG2 has undertaken important adaptation projects, including the cross-linguistic adaptation of the Comprehensive Aphasia Test (CAT; Swinburn et al., 2004). Aims This review aims to achieve three objectives: (a) describe the adaptation procedure of the CAT within WG2, (b) summarise common guidelines and recommendations for future adaptations, and (c) provide concrete solutions for specific cross-linguistic and cross-cultural challenges encountered during the adaptation and validation procedures of the CAT. Methods Between 2013 and 2023, WG2 employed a committee approach and fully adapted the CAT into Catalan, Croatian, Dutch, French, Hungarian, Norwegian, Spanish, and Turkish. Further adaptations are in progress for Arabic (Moroccan), Basque, Cantonese Chinese, German, Greek, Icelandic, Lithuanian, Serbian, Slovenian and Swedish. The review comprehensively addresses the linguistic/cultural adaptation and validation procedure for the three components of the battery: the Cognitive screening, the Language battery and the Aphasia Impact Questionnaire. Critical outcomes and some best practice recommendations from psychometric norming and piloting are also discussed. Outcomes and results This review builds upon prior work (Fyndanis et al., 2017) and serves as a practical guide for researchers and clinicians undertaking cross-linguistic adaptations of the CAT, with specific conclusions and recommendations drawn from WG2’s adaptations in 19 languages with diverse typological properties. 
Building on the work exemplified in this paper, future initiatives can direct their efforts towards adapting the CAT for PWA from different linguistic backgrounds for whom validated assessment instruments may be unavailable. This can be achieved through rigorous systematic adaptation procedures for the establishment of comparable language versions of this tool, valuable for various clinical applications. Such endeavours have the potential to provide access to valuable shared datasets for their use across international aphasia trials, and for comparable clinical work within the aphasiology community.

The Effect of Translation and Cultural Adaptations on Diagnostic Accuracy and Test Performance in Dementia Cognitive Screening Tools: A Systematic Review

Article

Full-text available

Apr 2024

Background: The current cognitive tests have been developed based on and standardized against Western constructs and normative data. With older people of minority ethnic background increasing across Western countries, there is a need for cognitive screening tests to address factors which influence performance bias and timely diagnostic dementia accuracy. The diagnostic accuracy in translated and culturally adapted cognitive screening tests and their impact on test performance in diverse populations have not been well addressed to date. Objective: This review aims to highlight considerations relating to the adaptation processes, language, cultural influences, impact of immigration, and level of education to assess for dementia in non-Western and/or non-English speaking populations. Methods: We conducted a systematic search for studies addressing the effects of translation and cultural adaptations of cognitive screening tests (developed in a Western context) upon their diagnostic accuracy and test performance across diverse populations. Four electronic databases and manual searches were conducted, using a predefined search strategy. A narrative synthesis of findings was conducted. Results: Search strategy yielded 2,890 articles, and seventeen studies (4,463 participants) met the inclusion criteria. There was variability in the sensitivity and specificity of cognitive tests, irrespective of whether they were translated only, culturally adapted only, or both. Cognitive test performance was affected by education, linguistic ability, and aspects of acculturation. Conclusions: We highlight the importance of translating and culturally adapting tests that have been developed in the Western context. However, these findings should be interpreted with caution as results varied due to the broad selection of included cognitive tests.

The Positive and Negative Suicidal Ideation Inventory among Portuguese Adolescents: Factor Structure and Gender Invariance

Article

Full-text available

Apr 2024

Suicide worldwide is an issue that needs to be addressed, and adolescents are an at-risk group. Assessing suicidal ideation is central to tackling the issue of suicide. The Positive and Negative Suicidal Ideation inventory is a widely validated measure of suicidal ideation, and yet, very little is known about its invariance across various groups. The present study aimed to adapt and test the PANSI’s structure in a Portuguese sample while testing its gender invariance. A total of 750 middle and high school students were recruited for the study, and data were collected on various suicide risk and protective factors, including the Portuguese-translated PANSI. Data were put through exploratory and confirmatory factor analysis. Kaiser’s criterion and scree plot both extracted two factors (64.10% variance explained). Confirmatory factor analysis also supported the PANSI’s structure (TLI = 0.943). The PANSI showed good reliability (α ≥ 0.83) and good construct and discriminative validity. The PANSI also exhibited scalar, but not strict, invariance. Overall, these results were similar to previous versions of this scale. The PANSI is a reliable measure of suicide risk among Portuguese adolescents. Future studies should further replicate these results in other cultures and expand on them by testing for invariance across other demographic variables.

Values and Migration Motives in Three Ethnic Groups in Indonesia

Conference Paper

Jan 2016

Indonesia has 1340 ethnic groups. This study focused on three large ethnic groups, which are Bataknese, Minangnese and Sundanese. There were 712 participants in this study, aged 20-23 years. There is a different orientation on migration (within Indonesia) in those three ethnic groups. Bataknese mainly migrate for study. Minangnese mainly migrate for work and trading. Sundanese do not have a strong orientation toward migration, although members migrate for study or work. The aim of this study is to understand the value system of these three ethnic groups as measured by Schwartz’s PVQ-40 in correlation to migration attitudes. Migration attitudes were measured by items such as the importance of migration, the importance of having the tenacity and perseverance, the importance of making an effort, strive and work hard, the importance of having the ability to adjust with the new situation and dealing with problems in new place. There was no significant difference in value system of the three ethnic groups. Means on social life values were higher than means on fulfilling personal needs values in the three ethnic groups. Factors on migration motive have stronger and significant correlation with factors on value system in Bataknese than in Minangnese and Sundanese. We concluded that Bataknese’s motive to migrate was more associated with social life values and fulfilling personal needs values, Minangnese’s motive to migrate with fulfilling personal needs values, and Sundanese’s motive to migrate with social life values.

The Psychometric Properties of the Positive and Negative Suicidal Ideation Scale among Portuguese Young Adults

Article

Full-text available

Apr 2024

Preventing suicide has been a worldwide imperative for the last decade. Accurately assessing suicide risk is the first step towards prevention, and access to reliable tools that measure risk factors is essential to achieve this goal. The Positive And Negative Suicidal Ideation (PANSI) scale is a validated brief suicidal ideation scale that could prove useful to this goal due to its ability to measure both suicide risk and protective factors. The PANSI scale has been adapted to various languages and cultures across various clinical and non-clinical populations. Despite this, no Portuguese has been produced yet. The present study aimed to validate a Portuguese version of PANSI by evaluating its psychometric properties in a sample of 259 young adults. Confirmatory factor analysis showed that the PANSI showed good psychometric properties (TLI = 0.95), good reliability for positive ideation (α = 0.84), and excellent reliability for negative ideation (α = 0.96). The scale also showed good discriminative ability through prediction of a previous suicide attempt and good construct validity in both subscales. The Portuguese adaptation of the PANSI scale is a reliable measure of positive and negative suicidal ideation that could prove useful in both clinical and research settings.

Validación de un instrumento de evaluación de proyectos latinoamericanos Validation of an assessment instrument for Latin American projects

Article

Full-text available

Mar 2024

Esta investigación tiene por objeto adaptar y validar el instrumento denominado Perfil de Implementación del Proyecto (PIP) para la evaluación de proyectos realizados en Latinoamérica. Participaron 420 profesionales involucrados, ya sea como líderes o miembros de equipos, en proyectos culminados en el periodo 2020-2021. Como el instrumento fue elaborado en inglés se utilizó un procedimiento de traducción y re-traducción, en el cual participaron expertos profesionales y académicos en gestión de proyectos junto con traductores certificados, para su adaptación a la población hispanoparlante en Latinoamérica. Para el análisis factorial exploratorio se seleccionó el método de extracción de mínimos cuadrados no ponderados, obteniéndose cuatro factores críticos de éxito: Comunicación con el cliente, seguimiento y planeación, alta gerencia, y capacidades técnicas, con coeficientes Cronbach Alpha comprendidos entre .876 y .933. Posteriormente se aplicó el análisis factorial confirmatorio, el cual demostró que el instrumento posee validez convergente y discriminante y, en consecuencia, puede ser utilizado en la academia para futuras investigaciones sobre la gestión de proyectos, y en lo profesional para evaluar el desempeño de proyectos ecuatorianos, contemplando la limitación de que el porcentaje de participación de proyectos de otros países de Latinoamérica en la muestra de estudio fue del 22 %.

Culture and Generativity

Chapter

Jun 2024

This volume brings together experts in generativity and related fields to provide a compelling overview of contemporary research and theory on this topic. Generativity refers to a concern for—or acting towards—the benefit of future generations as a legacy of the self; it has implications for outcomes at the individual, relational and social, and broader societal levels. Understanding the role and expressions of generativity at various stages of our lives is important to the sense of well-being and purpose, and it impacts parenting, caregiving, and social relationships, as well as having implications for activities and experiences in the workforce, and in voluntary activities in communities and the wider society. The chapters in this volume explore the meaning and impact of generativity across development and across life contexts and roles. They address generativity within a particular area or life domain, or period of the lifespan, and outline key methods and findings, as well as theoretical issues and applied implications. The volume represents the first comprehensive exploration of generativity from early to late adulthood; it offers a broad international perspective and will inform research into generativity across multiple cultures.

Nursing Minimum Data Sets aus der elektronischen Patientendokumentation

Chapter

May 2024

The Bilingual Study Methodology in Translating and Adapting Personality Tests

Article

Full-text available

Apr 2024

Studies with bilingual samples are cited as a useful tool for confirming the equivalence of linguistically different versions of a test. Yet such studies are rare in the literature, because several technical issues must be addressed before any conclusion about equivalence can be reached. This paper discusses some of these issues, taking the example of the recent MMPI-2-RF Portuguese adaptation and standardization study. The results of a bilingual study (N = 53) using a single-sample design are analyzed at the item, scale, profile, and structural levels, allowing an encouraging general conclusion about the equivalence of the Portuguese MMPI-2-RF to the North American original version, but also pointing out some directions for improvement. The shortcomings of classical bilingual studies, and the specific limitations arising from the obstacles to recruiting bilingual samples in Portugal, are considered. The limited sample size and some other methodological shortcomings are discussed, considering their implications for future Portuguese MMPI equivalence studies.

Effects of Culture and Response Format on Extreme Response Style

Article

Sep 1989

Do cultural and ethnic groups differ in their extreme response style? To answer this question, Hispanic and non-Hispanic subjects were asked to respond to a questionnaire on 5-point or 10-point scales. As predicted, Hispanics showed a stronger tendency toward extreme checking (about half the time, on average) than non-Hispanics, but only when the 5-point scales were used. Use of 10-point scales reduced the extreme responses of the Hispanics to the level of the non-Hispanics. Extreme responses of non-Hispanics were not affected by the scale format. Implications of the findings for social research are discussed.
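
The abstract does not state how extreme checking was scored. One common operationalization, assumed here, is the share of a respondent's answers that fall on either endpoint of a 1..m rating scale. The sketch below also computes the endpoint rate expected if all scale points were equally likely (2/m), which shows why raw rates from 5-point and 10-point formats are not directly comparable. Both function names are hypothetical.

```python
def extreme_response_rate(responses, scale_points):
    """Share of answers on either endpoint (1 or scale_points) of a 1..scale_points scale."""
    if not responses:
        raise ValueError("no responses")
    if any(not 1 <= r <= scale_points for r in responses):
        raise ValueError("response outside the scale")
    extremes = sum(1 for r in responses if r == 1 or r == scale_points)
    return extremes / len(responses)

def chance_extreme_rate(scale_points):
    """Endpoint rate expected if every scale point were equally likely."""
    return 2 / scale_points
```

Under this assumption, checking an endpoint about half the time is well above the uniform baseline of 0.4 on a 5-point scale.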

DIFFERENTIAL ITEM FUNCTIONING AND THE MANTEL-HAENSZEL PROCEDURE

Article

Dec 1986

The Mantel-Haenszel procedure is a noniterative contingency table method for estimating and testing a common two-factor association parameter in a 2×2×k table. As such it may be used to study “item bias” or differential item functioning in two groups of examinees. This technique is discussed in this context and compared to other related techniques as well as to item response theory methods.
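
A minimal sketch of the procedure as it is commonly applied to DIF, assuming examinees are matched on total score so that each score level k gives one 2×2 table (reference/focal group by correct/incorrect). It computes the Mantel-Haenszel common odds ratio alpha_MH = sum(A_k D_k / N_k) / sum(B_k C_k / N_k), the continuity-corrected MH chi-square, and the ETS delta-scale index MH D-DIF = -2.35 ln(alpha_MH). The `Stratum` type, its cell names, and the function name are hypothetical labels chosen for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Stratum:
    """One 2x2 table at a matched score level k."""
    a: int  # reference group, correct
    b: int  # reference group, incorrect
    c: int  # focal group, correct
    d: int  # focal group, incorrect

def mantel_haenszel(strata):
    """Return (alpha_MH, continuity-corrected MH chi-square, MH D-DIF)."""
    num = den = 0.0
    sum_a = sum_expected = sum_var = 0.0
    for s in strata:
        n = s.a + s.b + s.c + s.d
        if n < 2:
            continue  # a table with fewer than two examinees carries no information
        n_ref, n_foc = s.a + s.b, s.c + s.d
        m_correct, m_wrong = s.a + s.c, s.b + s.d
        num += s.a * s.d / n
        den += s.b * s.c / n
        sum_a += s.a
        sum_expected += n_ref * m_correct / n  # E(A_k) under no DIF
        sum_var += n_ref * n_foc * m_correct * m_wrong / (n * n * (n - 1))  # Var(A_k)
    if num == 0 or den == 0 or sum_var == 0:
        raise ValueError("degenerate tables: odds ratio or variance undefined")
    alpha = num / den
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    d_dif = -2.35 * math.log(alpha)
    return alpha, chi2, d_dif
```

An alpha_MH above 1 (negative MH D-DIF) means that, at equal total scores, the reference group has higher odds of answering the item correctly; the chi-square, with one degree of freedom, tests whether the common odds ratio differs from 1.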

Translation and content analysis of oral and written material, in: Triandis, H. C. and Berry, J. W. (eds.), Handbook of cross-cultural psychology

Article

Jan 1980

R.W. Brislin

Translating, Working Through Interpreters, and the Problem of Decentering

Chapter

Bias in Mental Testing

Article

Jun 1981

Fundamentals of Item Response Theory.

Book

Jan 1991

Ronald K. Hambleton, H. Swaminathan, H. Jane Rogers

Bias in Mental Testing

Article

Jul 1980

An experimental, exploratory study of causes of bias in test items

Article

Jan 1987

Janice Dowd Scheuneman

Comparison of Procedures for Detecting Test-Item Bias with Both Internal and External Ability Criteria

Article

Dec 1981

Test bias is conceptualized as differential validity. Statistical techniques for detecting biased items work by identifying items that may be measuring different things for different groups; they identify deviant or anomalous items in the context of other items. The conceptual basis and technical soundness were reviewed for the following item bias methods: transformed item difficulties, item discriminations, one- and three-parameter item characteristic curve methods, and chi-square methods. Sixteen bias indices representing these approaches were computed for black-white and Chicano-white comparisons on both the verbal and nonverbal Lorge-Thorndike Intelligence Tests. In addition, bias indices were recomputed for the Lorge-Thorndike tests using an external criterion. Convergent validity among bias methods was examined in correlation matrices, by factor analysis of the method correlations, and by ratios of agreement in the items found to be "most biased" by each method. Although evidence of convergent validity was found, there were still important practical differences in the items identified as biased by different methods. The signed full chi-square procedure may be an acceptable substitute for the theoretically preferred but more costly three-parameter signed indices. The external criterion results also reflect on the validity of the methods; arguments were advanced, however, as to why internal bias methods should not be thought of as proxies for a predictive validity model of unbiasedness.
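
The abstract does not give formulas for "transformed item difficulties". A widely used version, assumed here, is Angoff's delta plot. Each group's proportion correct p is converted to the ETS delta scale, delta = 13 + 4 z, where z is the standard normal deviate of 1 - p. The paired deltas are then plotted, the major (principal) axis of the scatter is fitted, and items lying far from that axis are flagged. The function names are hypothetical, and the item characteristic curve and chi-square indices from the study are not shown.

```python
import math
from statistics import NormalDist, fmean

def ets_delta(p):
    """ETS delta: 13 + 4 * z(1 - p); harder items get larger deltas."""
    if not 0 < p < 1:
        raise ValueError("proportion correct must lie strictly between 0 and 1")
    return 13 + 4 * NormalDist().inv_cdf(1 - p)

def delta_plot_distances(p_ref, p_foc):
    """Signed perpendicular distance of each item from the major axis of the delta plot."""
    if len(p_ref) != len(p_foc) or len(p_ref) < 2:
        raise ValueError("need two equal-length lists of at least two items")
    x = [ets_delta(p) for p in p_ref]
    y = [ets_delta(p) for p in p_foc]
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("deltas are uncorrelated; major axis undefined")
    # slope and intercept of the major axis (direction of greatest spread)
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = my - b * mx
    return [(b * xi - yi + a) / math.sqrt(b * b + 1) for xi, yi in zip(x, y)]
```

Because the distances are measured relative to the other items, this is an internal criterion in the sense used above. The cut-off for flagging an item is a judgment call rather than a fixed rule.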

The Significance of Schooling: Life-Journeys in an African Society

Article

Jan 1972

Robert Serpell

This is a book, not an article: Serpell, R. (1993). The significance of schooling: Life-journeys in an African society. Cambridge, UK: Cambridge University Press (345 pp.; reissued in digital paperback edition July 2010, ISBN 9780521144698). Schooling in the contemporary world has a multiple agenda: the promotion of economic progress, the transmission of culture from generation to generation, and the cultivation of children's intellectual and moral development. This book explores the difficulties of achieving a synthesis of these objectives in a case study of a rural African community. The analysis contrasts the indigenous perspective on child development with the formal educational model of cognitive growth. Teachers in the local primary school are shown to face the challenge of bicultural mediation, and the significance of schooling is discussed for each of the diverse individuals in the study in terms of his or her own reflections and interpretations. Two different attempts to activate a local dialogue about the school as a community resource are described, and the implications for approaches to educational planning are explored. (Publisher's description)
