A guide for cross-cultural validation of measurement instruments in mental health

Jean Caron, Ph.D.
Director of the CIHR Team in Social and Psychiatric Epidemiology
Associate professor in the Department of Psychiatry, McGill University
Researcher in the Psycho-social Division of the Douglas Hospital Research Centre
Cross-cultural validation of an assessment instrument is a complex process that requires a
substantial investment in time and money. It appears to be difficult to complete such an exercise
in less than a year. Before venturing into this process, it is important to make sure that there is no
equivalent instrument in French or none that has been translated and validated. If no such
equivalent exists, the researcher must do an exhaustive survey of relevant instruments for his or
her study, in order to select the one that has passed the most rigorous phases of validation in its
original language. Indeed, cross-cultural validation conducted according to generally accepted
practice cannot generally produce a version that is more valid or reliable than the original
version. To be sure, this process often makes it possible to improve certain aspects of the original
version or to complete certain phases of validation, but it cannot compensate for flagrant
shortcomings in content validity, construct validity or reliability. Moreover, in most cases, the utility
of an assessment instrument is based on its capacity to detect differences between individuals or
particular groups, or differences following changes resulting from a treatment or programme of
care, or changes in the environment. We are referring here to the specificity and sensitivity of the
instrument. These qualities must also be taken into account in selecting an instrument.
Cross-cultural validation involves three main phases: 1) translation and verification of its
equivalence; 2) empirical verification of the validity of the translated version; and 3) adaptation
of the scores to the cultural context, and development of standards. Each phase also covers the
necessary steps in devising a valid version, and a number of options are open to the researcher,
each with its advantages and drawbacks. The following section presents these steps, and readers
interested in validating instruments in French will also find references to specialized articles on
the subject.
Translation and verification of equivalence
The translation process must ensure that an instrument retains its inferential equivalence
(Haccoun, 1987), that is, that it is possible to derive the same inferences from the translated
version as from the original version. Correspondence between terms (semantic equivalence) is
not easy to achieve from one culture to another, given the vocabulary and grammar that are
specific to each language. Some expressions that are translated literally are meaningless in
another culture, and equivalent expressions specific to the targeted culture must be found
(equivalence of expressions). Some situations invoked in the culture of the original instrument
may not correspond to the realities of another culture, and these items will have to be replaced by
other situations more appropriate to that culture, while preserving the objective and the meaning
intended by these items (experiential equivalence). Lastly, the same exercise must be applied for
certain concepts that, when translated literally, do not represent the same thing from one culture
to another (conceptual equivalence). The reader is invited to consult Guillemin, Bombardier and
Beaton (1993) for a more in-depth examination of these concepts of equivalence.
Preparation of a preliminary version
Traditional translation simply involves translation of the original instrument by a bilingual
researcher or professional translator. This method used on its own is not recommended, as
it introduces too much bias, particularly in terms of the researcher or translator’s
interpretation. This difficulty may be circumvented by obtaining a number of parallel
translations by different translators or bilingual researchers, but the following methods are
more advisable.
The method of translation by a committee of experts involves the participation of a
number of bilingual people familiar with the field in which the instrument is to be used,
which also limits the biases of a single researcher. This committee can concentrate on a
preliminary translated version, or develop a preliminary version. It is recommended that
this committee include a professional translator or linguist who will ensure that the items
are written in a way that is linguistically correct. In ideal conditions, the participation of
the author of the original version would help clarify certain ambiguities resulting from the
process of translation.
"Back translation" involves having a preliminary translated version of the instrument
translated back into the original language by a second person. The discrepancies between
the original version and the retranslated version help identify problematic items. This
method can be made more rigorous by having two parallel back translations done,
which involves four people in all. This method can be considered ideal. A number of
researchers who have used it nonetheless find that it is very difficult to obtain perfect
equivalence between the retranslated version and the original version.
Assessment of the preliminary version
An expert committee – Whatever method is selected for the preliminary version or
versions, it appears to be important that a number of people (5-10) take a critical look at
the translation to check whether the items in the original version are adapted to the
targeted culture. Moreover, if some problematic items emerge following various
translations or back translations, the committee can then be used to determine which
translation is most appropriate.
A committee of people representative of those targeted by the instrument (N=5-10) –
When a preliminary version has gone through the preceding steps, even though it may
still appear to be equivalent after translation, it is important that the items be
comprehensible to the people it targets. It is useful, then, to submit it to a committee that
is representative of the people targeted, to obtain feedback. They can make suggestions
and give their verdict on different wordings of certain items.
A pre-test with a target population (N=20), by interview – This is another method that
helps verify the items’ clarity and whether they are worded in a way that is accessible to
the population targeted.
Empirical verification of the validity of the translated version
For an instrument to be valid, it has to meet the criteria of content validity, concomitant validity
and construct validity, and contain aspects that ensure its reliability. In this section, we will
present these concepts and indicate the appropriate procedures to check the validity of the
translated instruments.
Content validity – This aspect of validity is assessed based on the subjective judgment of experts
who consider whether the items measure the aspects that the instrument claims to measure.
Concomitant validity – This type of validity is obtained when a new instrument is strongly
correlated with another instrument that measures the same concept(s).
When the translated version correlates strongly with the original version, it is deemed to have
retained its content validity and concomitant validity. This exercise presupposes that the original
version and the translated version are administered to bilingual subjects and that their degree of
correlation is measured. It is nonetheless essential to ensure that the subjects are indeed bilingual
(see Vallerand, 1989). There are a number of procedures and techniques for verifying the content
validity and concomitant validity.
Prince and Mombour’s procedure (see Haccoun, 1987) – Equivalence is verified by
administering to two groups half the instrument in the translated language and half in the
original language, making sure that the first group receives the first half of the instrument
in the original language followed by the second half in the translated version, and the
second group receives the inverse (i.e., the first half of the instrument in the translated
version followed by the second half in the original language). Equivalence is then
established by comparing the response rate and overall scores of the two groups,
examining correlations and comparing internal consistencies. This frequently used
method nonetheless presents problems that challenge its validity.
The statement analysis technique (see Haccoun, 1987) - Equivalence is verified by
administering the instrument in both languages to one group of bilingual subjects. The
response rate for each item is then statistically transformed and analysed. The
mathematical curves obtained for each statement must then be compared to verify
equivalence. This is a very sophisticated and effective method that nonetheless has the
disadvantage of requiring advanced mathematical skills.
The single group technique (Haccoun, 1987) - Equivalence is verified by administering to
a group of bilingual subjects two versions of the instrument on two separate occasions.
At Time 1, half of the entire group is administered the original version, followed by the
translated version; then at Time 2, the process is inverted. Multiple correlations between
the two versions are subsequently examined. This method is advantageous in that the
equivalence of the translation and the temporal stability of the instrument in both
languages can be verified at the same time. t-tests can also be used to check the
equivalence of each item. This is a more robust statistical technique than correlations
(Vallerand, 1989).
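The item-level t-test mentioned for the single group technique can be sketched as follows. This is a minimal illustration, assuming that each bilingual subject answers the same item in both versions, so that a paired t statistic applies. The function name `paired_t` and the response data are hypothetical and do not come from Haccoun (1987) or Vallerand (1989).

```python
import math
from statistics import mean, stdev


def paired_t(original, translated):
    """Return (t statistic, degrees of freedom) for one item answered
    by the same subjects in the original and translated versions."""
    diffs = [o - t for o, t in zip(original, translated)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical answers of six bilingual subjects to one item (1-5 scale)
item_original = [3, 4, 2, 5, 4, 3]
item_translated = [3, 5, 2, 4, 4, 4]

t_stat, df = paired_t(item_original, item_translated)
print(f"t = {t_stat:.3f}, df = {df}")
```

The statistic would then be compared with the critical value of the t distribution for the chosen significance level. A non-significant difference is consistent with equivalence of the item but does not prove it, especially with small samples.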
Reliability of measurement is essential to ensure the validity of an instrument. This concept refers
to the internal consistency of the instrument and its temporal stability.
The temporal stability of the instrument – We expect a reliable instrument to measure the same
phenomenon with the same precision from one time to the next. If conditions have not changed,
the instrument should produce the same results after a period of time has elapsed. The temporal
stability of an instrument is thus established by the degree of correlation between the responses
given by the same subjects when the instrument is administered at different times. A correlation of
more than 0.60 is usually desirable. The time interval depends on what is being measured. In
fact, the greater the sensitivity of the elements being measured to conditions that may affect the
responses, the shorter the interval should be. A one-month interval usually seems to be
appropriate.
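A test-retest check of this kind can be sketched in a few lines. The example assumes that temporal stability is summarised by a Pearson correlation between total scores at two administrations. The scores are hypothetical, and the 0.60 threshold is the one suggested in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total scores of the same six subjects, one month apart
time1 = [12, 18, 25, 30, 22, 15]
time2 = [14, 17, 27, 28, 20, 16]

r = pearson_r(time1, time2)
stable = r > 0.60  # threshold suggested in the text
print(f"test-retest r = {r:.2f}, above 0.60: {stable}")
```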
Internal consistency of the instrument – In principle, when researchers want to measure a
phenomenon, they present a number of items designed to grasp it from different angles. Although
these items are intended to measure different aspects of the concept, they should in principle be
related. To measure an instrument’s degree of internal consistency, the recommended statistical
tool is Cronbach’s alpha. The value of this alpha normally varies from 0 to 1. This value is affected by
the number of items on the instrument and the number of respondents. The higher these two
parameters, the higher the value of alpha that should be required. Values between 0.70 and 0.95 are
usually reasonable when the scale or sub-scale has more than 5 items. A very high alpha
(above about 0.90) may indicate that certain items are redundant. To assess the value of the alpha for scales of
fewer than 5 items, the reader is invited to consult Gulliksen (1950).
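To make the computation concrete, the sketch below applies the usual formula for Cronbach's alpha, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the data are hypothetical. The scale has only three items to keep the example short, so the thresholds given above would not apply to it directly.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all in the same respondent order."""
    k = len(items)
    # Total score of each respondent across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of five respondents to a three-item sub-scale
item_scores = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 5, 5],
    [2, 2, 4, 5, 4],
]

alpha = cronbach_alpha(item_scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```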
Construct validity – When an instrument is developed, it is structured around items
specifically selected to measure aspects of a person or situation, which should be
consistent with the theoretical knowledge or underlying theory of the phenomenon being
studied. Moreover, a phenomenon may present itself differently from one culture to the
next, and the original instrument that has been translated, despite clear evidence of its
content validity and concomitant validity, may not adequately measure the phenomenon
in the targeted culture. It is important, then, to verify whether the translated instrument
preserves the structure of the construct, the relations between the various components of
the construct, and lastly, the consequences of the construct.
The structure of the construct is verified by factor analyses. Indeed, if an instrument is
intended to measure a phenomenon that theoretically has three dimensions, the factor
analysis should make it possible to find three factors, and the items designed to measure
each of the dimensions should in principle show their highest loading on the
corresponding factor. This type of exploratory analysis is designed to check whether the
factor structure corresponds to that of the original instrument (see Stevens, 1992). A
more sophisticated technique involves conducting a confirmatory factor analysis of the LISREL type.
This analysis makes it possible to statistically verify whether the structure of the translation corresponds
to that of the original version. It nonetheless presupposes that construct validity has already been
verified in the original version and that the researcher has access to the factor
analyses of the original instrument.
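Once an exploratory analysis has produced a table of loadings, the check described above (each item loading most strongly on its intended factor) can be automated. The sketch below assumes the loadings were already obtained from a statistical package. The function name `misplaced_items`, the loadings and the item-to-factor assignment are all hypothetical.

```python
def misplaced_items(loadings, intended):
    """Return the items whose strongest loading (in absolute value)
    is not on the factor they were designed to measure.

    loadings: dict mapping item name -> list of loadings, one per factor
    intended: dict mapping item name -> index of the intended factor
    """
    flagged = []
    for item, row in loadings.items():
        strongest = max(range(len(row)), key=lambda j: abs(row[j]))
        if strongest != intended[item]:
            flagged.append(item)
    return flagged


# Hypothetical loadings of five translated items on three factors
loadings = {
    "item1": [0.72, 0.10, 0.05],
    "item2": [0.65, 0.20, 0.12],
    "item3": [0.15, 0.70, 0.08],
    "item4": [0.55, 0.30, 0.10],  # designed for factor 1, loads on factor 0
    "item5": [0.05, 0.12, 0.81],
}
intended = {"item1": 0, "item2": 0, "item3": 1, "item4": 1, "item5": 2}

flagged = misplaced_items(loadings, intended)
print("items to re-examine:", flagged)
```

Items flagged in this way are candidates for a second look at their translation. The check does not replace a confirmatory analysis.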
Relations between the components of the construct – When an instrument is designed to
measure different dimensions of a phenomenon, it is important to verify the relations
(correlations) that exist between the factors, and compare them with the ones obtained
with the original version. This exercise adds credibility to the instrument’s construct
validity. Thus, if an instrument on social support postulates a number of dimensions of
support, in principle, finding higher correlations between each of the factors and the
overall score than between the different factors would help reinforce the underlying
theory.
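The comparison described here can be sketched as follows, assuming each dimension is scored as a sub-scale and the overall score is the sum of the sub-scales. The sub-scale names and scores are hypothetical. Because the overall score contains each sub-scale, the correlations with it are somewhat inflated, so the result is only indicative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def total_dominates(subscales):
    """True if every sub-scale correlates more strongly with the overall
    score than with any other sub-scale."""
    totals = [sum(values) for values in zip(*subscales.values())]
    for name, scores in subscales.items():
        r_total = pearson_r(scores, totals)
        for other, other_scores in subscales.items():
            if other != name and pearson_r(scores, other_scores) >= r_total:
                return False
    return True


# Hypothetical sub-scale scores of six respondents on a social support scale
subscales = {
    "emotional": [1, 2, 3, 4, 5, 6],
    "practical": [2, 1, 4, 3, 6, 5],
    "informational": [3, 1, 2, 6, 4, 5],
}

pattern_holds = total_dominates(subscales)
print("pattern consistent with the theory:", pattern_holds)
```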
The consequences of the construct – This involves verifying with the translated version
whether the theoretical hypotheses postulated by the instrument can be verified
empirically. For example, if it is postulated that the quality of social support should
enhance the quality of life, positive correlations should be found between the two
instruments. It is preferable to reproduce studies already conducted with the original
instrument and compare the results with the translated instrument. Finding results that are
consistent with the hypotheses when new studies are conducted with the translated
instrument also reinforces the construct validity.
Readers interested in a more in-depth discussion of the concepts in this section are invited to
consult Vallerand (1989).
Adaptation of the scores to the cultural context and development of standards
When an instrument is developed in a given culture, standards (norms) are usually developed to situate
an individual's score or a group's average score in relation to a broader reference population. It
may be that in the culture for which the instrument has been translated, the same phenomenon
appears with a different intensity, scope or frequency. It is therefore important to compare the
distribution of scores generated by the translated version with that of the original instrument.
Among the basic indicators, the average and the standard deviation help assess the variability of
the measurement. It is important to verify these indicators for men and women. Major differences
with the original in the averages and in the standard deviations could mean: 1) that the chosen
sample is problematic; or 2) that the phenomenon studied in the target culture has peculiar
characteristics. A very different distribution might suggest that the instrument is perhaps not
appropriate to the culture. When the differences are acceptable, it is then important to develop
standards for the target culture. These standards should include the average, the standard
deviation, the percentile ranges and the Z or T scores. The latter make it possible to situate
individuals on an interval scale. The choice of population for developing standards depends on
the instrument’s objective. If the instrument is designed mainly for people with mental health
problems, the sample chosen should reflect that concern.
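The indicators listed above can be derived from a normative sample as in the sketch below. It assumes the common conventions Z = (score − mean) / standard deviation and T = 50 + 10 × Z, and defines the percentile rank as the percentage of the sample scoring at or below a given score. The function name `build_norms` and the sample are hypothetical, and a real normative sample would be far larger.

```python
from statistics import mean, stdev


def build_norms(sample):
    """Return the mean, the standard deviation and scoring functions
    derived from a normative sample of total scores."""
    m = mean(sample)
    sd = stdev(sample)  # sample SD (n - 1 denominator), an assumption

    def z_score(score):
        return (score - m) / sd

    def t_score(score):
        return 50 + 10 * z_score(score)

    def percentile_rank(score):
        at_or_below = sum(1 for s in sample if s <= score)
        return 100 * at_or_below / len(sample)

    return {
        "mean": m,
        "sd": sd,
        "z": z_score,
        "t": t_score,
        "percentile_rank": percentile_rank,
    }


# Hypothetical normative sample of total scores
norm_sample = [10, 20, 30, 40, 50]
norms = build_norms(norm_sample)

score = 40
print(f"z = {norms['z'](score):.2f}, T = {norms['t'](score):.1f}, "
      f"percentile rank = {norms['percentile_rank'](score):.0f}")
```

Separate norms for men and women, as recommended above, would simply call `build_norms` once per subgroup.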
References
Bullinger, M., Anderson, D., Cella, D., Aaronson, N. (1993). Developing and evaluating cross-cultural
instruments from minimum requirements to optimal models. Quality of Life Research, 2,
451-459.
Flaherty, J.A., Gaviria, M.F., Pathak, D., Mitchell, T., Wintrob, R., Richman, J., Birz, S. (1988).
Developing instruments for cross-cultural psychiatric research. Journal of Nervous and Mental
Disease, 176 (5), 257-263.
Guillemin, F., Bombardier, C., Beaton, D. (1993). Cross-cultural adaptation of health-related
quality of life measures: literature review and proposed guidelines. Journal of Clinical
Epidemiology, 46 (12), 1417-1432.
Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley.
Haccoun, R.R. (1987). Une nouvelle technique de vérification de l'équivalence de mesures
psychologiques traduites. Revue québécoise de psychologie, 8 (3), 30-39.
Hunt, S.M., Alonso, J., Bucquet, D., Niero, M., Wiklund, I., McKenna, S. (1991). Cross-cultural
adaptation of health measures. Health Policy, 19, 33-44.
Stevens, J. (1992). Applied multivariate statistics for the social sciences. Hillsdale: Lawrence
Erlbaum Associates, Publishers.
Vallerand, R.J. (1989). Vers une méthodologie de validation transculturelle de questionnaires
psychologiques : implications pour la recherche en langue française. Psychologie Canadienne, 30
(4), 662-689.