
A guide for cross-cultural validation of measurement instruments in mental health

April 1999

DOI:10.13140/RG.2.1.3730.5685

Report number: http://instrumentspsychometriques.mcgill.ca/instruments/guide.htm
Affiliation: Département de Psychiatrie, Université McGill

Authors:

Jean Caron

McGill University

Jean Caron, Ph.D.

Director of the CIHR Team in Social and Psychiatric Epidemiology

Associate professor in the Department of Psychiatry, McGill University

Researcher in the Psycho-social Division of the Douglas Hospital Research Centre

Cross-cultural validation of an assessment instrument is a complex process that requires a substantial investment in time and money, and it is generally difficult to complete such an exercise in less than a year. Before venturing into this process, it is important to make sure that no equivalent instrument already exists in French and that the instrument has not already been translated and validated. If no such equivalent exists, the researcher must conduct an exhaustive survey of the instruments relevant to his or her study, in order to select the one that has passed the most rigorous phases of validation in its original language. Indeed, cross-cultural validation conducted according to generally accepted practice cannot generally produce a version that is more valid or reliable than the original version. To be sure, this process often makes it possible to improve certain aspects of the original version or to complete certain phases of validation, but it cannot compensate for flagrant shortcomings in content validity, construct validity or reliability. Moreover, in most cases, the utility of an assessment instrument is based on its capacity to detect differences between individuals or particular groups, or changes resulting from a treatment, a programme of care or a change in the environment. We are referring here to the specificity and sensitivity of the instrument. These qualities must also be taken into account in selecting an instrument.

Cross-cultural validation involves three main phases: 1) translation and verification of its equivalence; 2) empirical verification of the validity of the translated version; and 3) adaptation of the scores to the cultural context and development of standards. Each phase comprises the steps needed to devise a valid version, and a number of options are open to the researcher, each with its advantages and drawbacks. The following sections present these steps, and readers interested in validating instruments in French will also find references to specialized articles on the subject.

Translation and verification of equivalence

The translation process must ensure that an instrument retains its inferential equivalence (Haccoun, 1987), that is, that it is possible to derive the same inferences from the translated version as from the original version. Correspondence between terms (semantic equivalence) is not easy to achieve from one culture to another, given the vocabulary and grammar that are specific to each language. Some expressions that are translated literally are meaningless in another culture, and equivalent expressions specific to the targeted culture must be found (equivalence of expressions). Some situations invoked in the culture of the original instrument may not correspond to the realities of another culture, and these items will have to be replaced by other situations more appropriate to that culture, while preserving the objective and the meaning intended by these items (experiential equivalence). Lastly, the same exercise must be applied to certain concepts that, when translated literally, do not represent the same thing from one culture to another (conceptual equivalence). The reader is invited to consult Guillemin, Bombardier and Beaton (1993) for a more in-depth examination of these concepts of equivalence.

Preparation of a preliminary version

• Traditional translation simply involves translation of the original instrument by a bilingual researcher or professional translator. This method used on its own is not recommended, as it introduces too much bias, particularly in terms of the researcher's or translator's interpretation. This difficulty may be circumvented by obtaining a number of parallel translations by different translators or bilingual researchers, but the following methods are more advisable.

• The method of translation by a committee of experts involves the participation of a number of bilingual people familiar with the field in which the instrument is to be used, which also limits the biases of a single researcher. This committee can either review an existing preliminary translation or develop a preliminary version itself. It is recommended that this committee include a professional translator or linguist who will ensure that the items are written in a way that is linguistically correct. In ideal conditions, the participation of the author of the original version would help clarify certain ambiguities resulting from the process of translation.

• "Back translation" involves having a preliminary translated version of the instrument translated back into the original language by a second person. The discrepancies between the original version and the retranslated version help identify problematic items. This method can be made more rigorous by carrying out two parallel back translations, which involves four people (two translators and two back translators). This method can be considered ideal. A number of researchers who have used it nonetheless find that it is very difficult to obtain perfect equivalence between the retranslated version and the original version.

Assessment of the preliminary version

• An expert committee – Whatever method is selected for the preliminary version or versions, it is important that a number of people (5-10) take a critical look at the translation to check whether the items in the original version are adapted to the targeted culture. Moreover, if some problematic items emerge following various translations or back translations, the committee can then be used to determine which translation is most appropriate.

• A committee of people representative of those targeted by the instrument (N=5-10) – Once a preliminary version has gone through the preceding steps, even though it may still appear to be equivalent after translation, it is important that the items be comprehensible to the people it targets. It is useful, then, to submit it to a committee that is representative of the people targeted, to obtain feedback. They can make suggestions and give their verdict on different wordings of certain items.

• A pre-test with a target population (N=20), by interview – This is another method that helps verify the items' clarity and whether they are worded in a way that is accessible to the population targeted.

Empirical verification of the validity of the translated version

For an instrument to be valid, it has to meet the criteria of content validity, concomitant validity and construct validity, and it must also be reliable. In this section, we will present these concepts and indicate the appropriate procedures to check the validity of the translated instruments.

Content validity – This aspect of validity is assessed based on the subjective judgment of experts who consider whether the items measure the aspects that the instrument claims to measure.

Concomitant validity – This type of validity, also known as concurrent validity, is obtained when a new instrument is strongly correlated with another instrument that measures the same concept(s).

When the translated version correlates strongly with the original version, it is deemed to have retained its content validity and concomitant validity. This exercise presupposes that the original version and the translated version are administered to bilingual subjects and that their degree of correlation is measured. It is nonetheless essential to ensure that the subjects are indeed bilingual (see Vallerand, 1989). There are a number of procedures and techniques for verifying the content validity and concomitant validity.

• Prince and Mombour's procedure (see Haccoun, 1987) – Equivalence is verified by administering to two groups half the instrument in the translated language and half in the original language, making sure that the first group receives the first half of the instrument in the original language followed by the second half in the translated version, and the second group receives the inverse (i.e., the first half of the instrument in the translated version followed by the second half in the original language). Equivalence is then established by comparing the response rate and overall scores of the two groups, examining correlations and comparing internal consistencies. This frequently used method nonetheless presents problems that challenge its validity.

• The statement analysis technique (see Haccoun, 1987) – Equivalence is verified by administering the instrument in both languages to one group of bilingual subjects. The response rate for each item is then statistically transformed and analysed. The mathematical curves obtained for each statement must then be compared to verify equivalence. This is a very sophisticated and effective method that nonetheless has the disadvantage of requiring advanced mathematical skills.

• The single group technique (Haccoun, 1987) – Equivalence is verified by administering to a group of bilingual subjects two versions of the instrument on two separate occasions. At Time 1, half of the entire group is administered the original version, followed by the translated version; then at Time 2, the process is inverted. Multiple correlations between the two versions are subsequently examined. This method is advantageous in that the equivalence of the translation and the temporal stability of the instrument in both languages can be verified at the same time. T-tests can also be used to check the equivalence of each item. This is a more robust statistical technique than correlations (Vallerand, 1989).
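
As an illustration of the two statistics mentioned for bilingual samples, the following minimal sketch computes the correlation between total scores on the two versions and a paired t statistic for a single item. The data, variable names and helper functions are hypothetical and are not taken from Haccoun (1987) or Vallerand (1989); a real analysis would also report degrees of freedom and p-values.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def paired_t(x, y):
    """Paired t statistic for the same subjects answering one item twice."""
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical total scores of six bilingual subjects on each version.
total_original = [12, 18, 25, 30, 22, 15]
total_translated = [13, 17, 26, 29, 21, 16]

# Hypothetical responses of the same subjects to one item in each version.
item_original = [3, 4, 2, 5, 4, 3]
item_translated = [3, 4, 3, 5, 3, 3]

r_versions = pearson_r(total_original, total_translated)
t_item = paired_t(item_original, item_translated)
```

A high `r_versions` suggests that the two versions rank subjects in the same way, while a `t_item` close to zero suggests that the item is not answered systematically higher or lower in one language.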

Reliability of measurement is essential to ensure the validity of an instrument. This concept refers to the internal consistency of the instrument and its temporal stability.

The temporal stability of the instrument – We expect a reliable instrument to measure the same phenomenon with the same precision from one time to the next. If conditions have not changed, the instrument should produce the same results after a period of time has elapsed. The temporal stability of an instrument is thus established by the degree of correlation between the responses given by the same subjects when the instrument is administered at different times. A correlation of more than 0.60 is usually desirable. The time interval depends on what is being measured. In fact, the greater the sensitivity of the elements being measured to conditions that may affect the responses, the shorter the interval should be. A one-month interval usually seems to be appropriate.

Internal consistency of the instrument – In principle, when researchers want to measure a phenomenon, they present a number of items designed to grasp it from different angles. Although these items are intended to measure different aspects of the concept, they should in principle be related. To measure an instrument's degree of internal consistency, the recommended statistical tool is Cronbach's alpha. The value of this alpha may vary from 0 to 1. This value is affected by the number of items on the instrument and the number of respondents. The higher these two parameters, the higher the value of the alpha required. Values between 0.70 and 0.95 are usually reasonable when the scale or sub-scale has more than 5 items. An alpha that is too high (above 0.90) may indicate a redundancy of certain items. To assess the value of the alpha for scales of fewer than 5 items, the reader is invited to consult Gulliksen (1950).
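
For readers who want to see how the coefficient is obtained, here is a minimal sketch of the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The response matrix is hypothetical and the function name is ours.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of six respondents to a six-item scale (1-5 ratings).
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [3, 3, 3, 4, 3, 3],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 1, 2],
    [4, 4, 3, 4, 4, 3],
]

alpha = cronbach_alpha(responses)
```

With real data the sample would be far larger than six respondents; the tiny matrix is only there to keep the example readable.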

Construct validity – When an instrument is developed, it is structured around items specifically selected to measure aspects of a person or situation, which should be consistent with the theoretical knowledge or underlying theory of the phenomenon being studied. Moreover, a phenomenon may present itself differently from one culture to the next, and the original instrument that has been translated, despite clear evidence of its content validity and concomitant validity, may not adequately measure the phenomenon in the targeted culture. It is important, then, to verify whether the translated instrument preserves the structure of the construct, the relations between the various components of the construct, and lastly, the consequences of the construct.

• The structure of the construct is verified by factor analyses. Indeed, if an instrument is intended to measure a phenomenon that theoretically has three dimensions, the factor analysis should make it possible to find three factors, and the items designed to measure each of the dimensions should in principle show their highest loading on the corresponding factor. This type of exploratory analysis is designed to check whether the factor structure corresponds to that of the original instrument (see Stevens, 1992). A more sophisticated technique involves conducting a LISREL-type confirmatory factor analysis. This analysis makes it possible to statistically verify whether the translation corresponds to the original version. It nonetheless presupposes that construct validity has already been verified in the original version and that the researcher can have access to factor analyses of the original instrument.

• Relations between the components of the construct – When an instrument is designed to measure different dimensions of a phenomenon, it is important to verify the relations (correlations) that exist between the factors, and compare them with the ones obtained with the original version. This exercise adds credibility to the instrument's construct validity. Thus, if an instrument on social support postulates a number of dimensions of support, in principle, finding higher correlations between each of the factors and the overall score than between the different factors would help reinforce the underlying theory.

• The consequences of the construct – This involves verifying with the translated version whether the theoretical hypotheses postulated by the instrument can be verified empirically. For example, if it is postulated that the quality of social support should enhance the quality of life, positive correlations should be found between the two instruments. It is preferable to reproduce studies already conducted with the original instrument and compare the results with the translated instrument. Finding results that are consistent with the hypotheses when new studies are conducted with the translated instrument also reinforces the construct validity.

Readers interested in a more in-depth discussion of the concepts in this section are invited to consult Vallerand (1989).

Adaptation of the scores to the cultural context and development of standards

When an instrument is developed in a given culture, standards are usually developed to situate individual scores or the mean score of a group in relation to a broader set of references. It may be that in the culture for which the instrument has been translated, the same phenomenon appears with a different intensity, scope or frequency. It is therefore important to compare the distribution of scores generated by the translated version with that of the original instrument. Among the basic indicators, the mean and the standard deviation help assess the variability of the measurement. It is important to verify these indicators separately for men and women. Major differences with the original in the means and in the standard deviations could mean: 1) that the chosen sample is problematic; or 2) that the phenomenon studied in the target culture has particular characteristics. A very different distribution might suggest that the instrument is perhaps not appropriate to the culture. When the differences are acceptable, it is then important to develop standards for the target culture. These standards should include the mean, the standard deviation, the percentile ranks and the Z or T scores. The latter make it possible to situate individuals on an interval scale. The choice of population for developing standards depends on the instrument's objective. If the instrument is designed mainly for people with mental health problems, the sample chosen should reflect that concern.
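
The indicators listed above can be derived from a normative sample as in the following sketch. The scores are invented, the sample standard deviation is used as an assumption, and the conventions are the usual ones: Z scores have a mean of 0 and a standard deviation of 1, T scores a mean of 50 and a standard deviation of 10, and the percentile rank counts half of the tied scores.

```python
from statistics import mean, stdev

# Hypothetical total scores from a small normative sample.
norm_scores = [12, 15, 18, 20, 20, 22, 25, 28]

norm_mean = mean(norm_scores)
norm_sd = stdev(norm_scores)  # sample standard deviation (assumption)


def z_score(x):
    """Distance of a raw score from the normative mean, in SD units."""
    return (x - norm_mean) / norm_sd


def t_score(x):
    """Z score rescaled to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z_score(x)


def percentile_rank(x):
    """Percentage of the normative sample scoring below x, counting half the ties."""
    below = sum(s < x for s in norm_scores)
    equal = sum(s == x for s in norm_scores)
    return 100 * (below + 0.5 * equal) / len(norm_scores)
```

For example, `t_score(28)` situates a raw score of 28 relative to this sample. In practice, separate tables would be built for each reference group (for instance men and women), as recommended in the text.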

References

Bullinger, M., Anderson, D., Cella, D., Aaronson, N. (1993). Developing and evaluating cross-cultural instruments from minimum requirements to optimal models. Quality of Life Research, 2, 451-459.

Flaherty, J.A., Gaviria, F.M., Pathak, D., Mitchell, T., Wintrob, R., Richman, J., Birz, S. (1988). Developing instruments for cross-cultural psychiatric research. Journal of Nervous and Mental Disease, 176 (5), 257-263.

Guillemin, F., Bombardier, C., Beaton, D. (1993). Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. Journal of Clinical Epidemiology, 46 (12), 1417-1432.

Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley.

Haccoun, R.R. (1987). Une nouvelle technique de vérification de l'équivalence de mesures psychologiques traduites. Revue québécoise de psychologie, 8 (3), 30-39.

Hunt, S.M., Alonso, J., Bucquet, D., Niero, M., Wiklund, I., McKenna, S. (1991). Cross-cultural adaptation of health measures. Health Policy, 19, 33-44.

Stevens, J. (1992). Applied multivariate statistics for the social sciences. Hillsdale: Lawrence Erlbaum Associates.

Vallerand, R.J. (1989). Vers une méthodologie de validation transculturelle de questionnaires psychologiques : implications pour la recherche en langue française. Psychologie canadienne, 30 (4), 662-689.
