All content in this area was uploaded by Jean Caron on Dec 01, 2015
A guide for cross-cultural validation of
measurement instruments in mental health
Jean Caron, Ph.D.
Director of the CIHR Team in Social and Psychiatric Epidemiology
Associate professor in the Department of Psychiatry, McGill University
Researcher in the Psycho-social Division of the Douglas Hospital Research Centre
Cross-cultural validation of an assessment instrument is a complex process that requires a
substantial investment in time and money. It appears to be difficult to complete such an exercise
in less than a year. Before venturing into this process, it is important to make sure that there is no
equivalent instrument in French or none that has been translated and validated. If no such
equivalent exists, the researcher must do an exhaustive survey of relevant instruments for his or
her study, in order to select the one that has passed the most rigorous phases of validation in its
original language. Indeed, cross-cultural validation conducted according to generally accepted
practice cannot generally produce a version that is more valid or reliable than the original
version. To be sure, this process often makes it possible to improve certain aspects of the original
version or to complete certain phases of validation, but it cannot compensate for flagrant
shortcomings in content validity, construct validity or reliability. Moreover, in most cases, the utility
of an assessment instrument is based on its capacity to detect differences between individuals or
particular groups, or differences following changes resulting from a treatment or programme of
care, or changes in the environment. We are referring here to the specificity and sensitivity of the
instrument. These qualities must also be taken into account in selecting an instrument.
Cross-cultural validation involves three main phases: 1) translation and verification of its
equivalence; 2) empirical verification of the validity of the translated version; and 3) adaptation
of the scores to the cultural context, and development of standards. Each phase also covers the
necessary steps in devising a valid version, and a number of options are open to the researcher,
each with its advantages and drawbacks. The following section presents these steps, and readers
interested in validating instruments in French will also find references to specialized articles on
the subject.
Translation and verification of equivalence
The translation process must ensure that an instrument retains its inferential equivalence
(Haccoun, 1987), that is, that it is possible to derive the same inferences from the translated
version as from the original version. Correspondence between terms (semantic equivalence) is
not easy to achieve from one culture to another, given the vocabulary and grammar that are
specific to each language. Some expressions that are translated literally are meaningless in
another culture, and equivalent expressions specific to the targeted culture must be found
(equivalence of expressions). Some situations invoked in the culture of the original instrument
may not correspond to the realities of another culture, and these items will have to be replaced by
other situations more appropriate to that culture, while preserving the objective and the meaning
intended by these items (experiential equivalence). Lastly, the same exercise must be applied for
certain concepts that, when translated literally, do not represent the same thing from one culture
to another (conceptual equivalence). The reader is invited to consult Guillemin, Bombardier and
Beaton (1993) for a more in-depth examination of these concepts of equivalence.
Preparation of a preliminary version
• Traditional translation simply involves translation of the original instrument by a bilingual
researcher or professional translator. This method used on its own is not recommended, as
it introduces too much bias, particularly in terms of the researcher or translator’s
interpretation. This difficulty may be circumvented by obtaining a number of parallel
translations by different translators or bilingual researchers, but the following methods are
more advisable.
• The method of translation by a committee of experts involves the participation of a
number of bilingual people familiar with the field in which the instrument is to be used,
which also limits the biases of a single researcher. This committee can either review an
existing preliminary translation or develop the preliminary version itself. It is recommended that
this committee include a professional translator or linguist who will ensure that the items
are written in a way that is linguistically correct. In ideal conditions, the participation of
the author of the original version would help clarify certain ambiguities resulting from the
process of translation.
• "Back translation" involves having a preliminary translated version of the instrument
translated back into the original language by a second person. The discrepancies between
the original version and the retranslated version help identify problematic items. This
method can be made more rigorous by commissioning two parallel back translations,
which involves four people in all. This method can be considered ideal. A number of
researchers who have used it nonetheless find that it is very difficult to obtain perfect
equivalence between the retranslated version and the original version.
Assessment of the preliminary version
• An expert committee – Whatever method is selected for the preliminary version or
versions, it appears to be important that a number of people (5-10) take a critical look at
the translation to check whether the items in the original version are adapted to the
targeted culture. Moreover, if some problematic items emerge following various
translations or back translations, the committee can then be used to determine which
translation is most appropriate.
• A committee of people representative of those targeted by the instrument (N=5-10) –
When a preliminary version has gone through the preceding drafts, even though it may
appear to still be equivalent after translation, it is important that the items be
comprehensible to the people it targets. It is useful, then, to submit it to a committee that
is representative of the people targeted, to obtain feedback. They can make suggestions
and give their verdict on different wordings of certain items.
• A pre-test with a target population (N=20), by interview – This is another method that
helps verify the items’ clarity and whether they are worded in a way that is accessible to
the population targeted.
Empirical verification of the validity of the translated version
For an instrument to be valid, it has to meet the criteria of content validity, concomitant validity
and construct validity, and contain aspects that ensure its reliability. In this section, we will
present these concepts and indicate the appropriate procedures to check the validity of the
translated instruments.
Content validity – This aspect of validity is assessed based on the subjective judgment of experts
who consider whether the items measure the aspects that the instrument claims to measure.
Concomitant validity – This type of validity is obtained when a new instrument is strongly
correlated with another instrument that measures the same concept(s).
When the translated version correlates strongly with the original version, it is deemed to have
retained its content validity and concomitant validity. This exercise presupposes that the original
version and the translated version are administered to bilingual subjects and that their degree of
correlation is measured. It is nonetheless essential to ensure that the subjects are indeed bilingual
(see Vallerand, 1989). There are a number of procedures and techniques for verifying the content
validity and concomitant validity.
• Prince and Mombour’s procedure (see Haccoun, 1987) – Equivalence is verified by
administering to two groups half the instrument in the translated language and half in the
original language, making sure that the first group receives the first half of the instrument
in the original language followed by the second half in the translated version, and the
second group receives the inverse (i.e., the first half of the instrument in the translated
version followed by the second half in the original language). Equivalence is then
established by comparing the response rate and overall scores of the two groups,
examining correlations and comparing internal consistencies. This frequently used
method nonetheless presents problems that challenge its validity.
• The statement analysis technique (see Haccoun, 1987) - Equivalence is verified by
administering the instrument in both languages to one group of bilingual subjects. The
response rate for each item is then statistically transformed and analysed. The
mathematical curves obtained for each statement must then be compared to verify
equivalence. This is a very sophisticated and effective method that nonetheless has the
disadvantage of requiring advanced mathematical skills.
• The single group technique (Haccoun, 1987) - Equivalence is verified by administering to
a group of bilingual subjects two versions of the instrument on two separate occasions.
At Time 1, half of the entire group is administered the original version, followed by the
translated version; then at Time 2, the process is inverted. Multiple correlations between
the two versions are subsequently examined. This method is advantageous in that the
equivalence of the translation and the temporal stability of the instrument in both
languages can be verified at the same time. T-tests can also be used to check the
equivalence of each item. This is a more robust statistical technique than correlations
(Vallerand, 1989).
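The two statistics mentioned for the single group technique, a correlation between versions and a t-test on paired responses, can be sketched as follows. This is a minimal illustration only: the scores are hypothetical, the function names `pearson_r` and `paired_t` are ours, and the example uses total scores, although the same functions could be applied item by item.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_t(x, y):
    """Paired t statistic (df = n - 1) for the same subjects on two versions."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)

# Hypothetical total scores of six bilingual subjects (illustrative only).
original = [12, 15, 9, 20, 17, 11]
translated = [13, 14, 10, 19, 18, 11]

r = pearson_r(original, translated)
t = paired_t(original, translated)
print(f"r = {r:.2f}, paired t = {t:.2f} (df = {len(original) - 1})")
```

A high correlation together with a paired t statistic close to zero would be consistent with equivalence; the t value still has to be compared with the critical value for the chosen significance level.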
Reliability of measurement is essential to ensure the validity of an instrument. This concept refers
to the internal consistency of the instrument and its temporal stability.
The temporal stability of the instrument – We expect a reliable instrument to measure the same
phenomenon with the same precision from one time to the next. If conditions have not changed,
the instrument should produce the same results after a period of time has elapsed. The temporal
stability of an instrument is thus established by the degree of correlation between the responses
given by the same subjects when the instrument is administered at different times. A correlation of
more than 0.60 is usually desirable. The time interval depends on what is being measured. In
fact, the greater the sensitivity of the elements being measured to conditions that may affect the
responses, the shorter the interval should be. A one-month interval usually seems to be
appropriate.
Internal consistency of the instrument – In principle, when researchers want to measure a
phenomenon, they present a number of items designed to grasp it from different angles. Although
these items are intended to measure different aspects of the concept, they should in principle be
related. To measure an instrument’s degree of internal consistency, the recommended statistical
tool is Cronbach’s alpha. The value of this alpha may vary from 0 to 1. This value is affected by
the number of items on the instrument and the number of respondents. The higher these two
parameters, the higher the value of the alpha required. Values of between 0.70 and 0.95 are
usually reasonable when the scale or sub-scale has more than 5 items. An alpha that is too high
(above 0.90) may indicate a redundancy of certain items. To assess the value of the alpha for scales of
fewer than 5 items, the reader is invited to consult Gulliksen (1950).
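For readers who wish to see how the coefficient is obtained, the sketch below applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The response matrix is hypothetical and the function name `cronbach_alpha` is ours.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical responses: five respondents, six items rated 1 to 5.
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 3, 2, 3, 2, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
]

print(f"alpha = {cronbach_alpha(responses):.2f}")
```

In practice, a real sample would be far larger than five respondents; the small matrix is only there to keep the arithmetic easy to follow.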
Construct validity – When an instrument is developed, it is structured around items
specifically selected to measure aspects of a person or situation, which should be
consistent with the theoretical knowledge or underlying theory of the phenomenon being
studied. Moreover, a phenomenon may present itself differently from one culture to the
next, and the original instrument that has been translated, despite clear evidence of its
content validity and concomitant validity, may not adequately measure the phenomenon
in the targeted culture. It is important, then, to verify whether the translated instrument
preserves the structure of the construct, the relations between the various components of
the construct, and lastly, the consequences of the construct.
• The structure of the construct is verified by factor analyses. If an instrument is
intended to measure a phenomenon that theoretically has three dimensions, the factor
analysis should yield three factors, and the items designed to measure
each of the dimensions should in principle show their highest loading on the
corresponding factor. This type of exploratory analysis is designed to check whether the
factor structure corresponds to that of the original instrument (see Stevens, 1992). A
more sophisticated technique involves conducting a LISREL-type confirmatory factor analysis.
This analysis makes it possible to statistically verify whether the translation corresponds
to the original version. It nonetheless presupposes that construct validity has already been
verified in the original version and that the researcher has access to the factor
analyses of the original instrument.
• Relations between the components of the construct – When an instrument is designed to
measure different dimensions of a phenomenon, it is important to verify the relations
(correlations) that exist between the factors, and compare them with the ones obtained
with the original version. This exercise adds credibility to the instrument’s construct
validity. Thus, if an instrument on social support postulates a number of dimensions of
support, in principle, finding higher correlations between each of the factors and the
overall score than between the different factors would help reinforce the underlying
theory.
• The consequences of the construct – This involves verifying with the translated version
whether the theoretical hypotheses postulated by the instrument can be verified
empirically. For example, if it is postulated that the quality of social support should
enhance the quality of life, positive correlations should be found between the two
instruments. It is preferable to reproduce studies already conducted with the original
instrument and compare the results with the translated instrument. Finding results that are
consistent with the hypotheses when new studies are conducted with the translated
instrument also reinforces the construct validity.
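The check on relations between components described above can be sketched as follows: correlate each sub-scale with the overall score, correlate the sub-scales with one another, and compare the two sets of coefficients. The sub-scale names and scores are hypothetical, and the overall score is assumed here to be the simple sum of the sub-scales. Because each sub-scale contributes to that sum, its correlation with the total is inflated to some degree, which should be kept in mind when interpreting the comparison.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def construct_relations(subscales):
    """subscales maps a name to the scores of the same respondents."""
    total = [sum(vals) for vals in zip(*subscales.values())]
    with_total = {name: pearson_r(s, total) for name, s in subscales.items()}
    between = {
        (a, b): pearson_r(subscales[a], subscales[b])
        for a, b in combinations(subscales, 2)
    }
    return with_total, between

# Hypothetical sub-scale scores on a social support instrument.
support = {
    "emotional": [4, 7, 5, 9, 6, 3],
    "instrumental": [6, 5, 8, 7, 4, 5],
    "informational": [5, 8, 4, 6, 9, 3],
}

with_total, between = construct_relations(support)
for name, r in with_total.items():
    print(f"{name} vs total: r = {r:.2f}")
for pair, r in between.items():
    print(f"{pair[0]} vs {pair[1]}: r = {r:.2f}")
```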
Readers interested in a more in-depth discussion of the concepts in this section are invited to
consult Vallerand (1989).
Adaptation of the scores to the cultural context and development of standards
When an instrument is developed in a given culture, standards are usually developed to situate
individual scoring or the average scoring of a group in relation to a broader set of references. It
may be that in the culture for which the instrument has been translated, the same phenomenon
appears with a different intensity, scope or frequency. It is therefore important to compare the
distribution of scoring generated by the translated version with that of the original instrument.
Among the basic indicators, the average and the standard deviation help assess the variability of
the measurement. It is important to verify these indicators for men and women. Major differences
with the original in the averages and in the standard deviations could mean: 1) that the chosen
sample is problematic; or 2) that the phenomenon studied in the target culture has peculiar
characteristics. A very different distribution might suggest that the instrument is perhaps not
appropriate to the culture. When the differences are acceptable, it is then important to develop
standards for the target culture. These standards should include the average, the standard
deviation, the percentile ranges and the Z or T scores. The latter make it possible to situate
individuals on an interval scale. The choice of population for developing standards depends on
the instrument’s objective. If the instrument is designed mainly for people with mental health
problems, the sample chosen should reflect that concern.
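The indicators listed above can be illustrated with a short sketch. It assumes the usual conventions: a Z score is the distance from the mean in standard deviation units, a T score is 50 + 10 × Z, and the percentile rank counts half of the tied scores. The normative sample and the function names are hypothetical.

```python
from statistics import mean, stdev

def z_score(x, m, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (x - m) / sd

def t_score(x, m, sd):
    """T score: Z rescaled to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z_score(x, m, sd)

def percentile_rank(x, scores):
    """Percentage of the sample scoring below x, counting half of the ties."""
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    return 100 * (below + 0.5 * equal) / len(scores)

# Hypothetical normative sample from the target culture.
sample = [14, 18, 21, 21, 25, 27, 30, 33]
m, sd = mean(sample), stdev(sample)

raw = 27
print(f"mean = {m:.2f}, SD = {sd:.2f}")
print(f"raw {raw}: Z = {z_score(raw, m, sd):.2f}, "
      f"T = {t_score(raw, m, sd):.1f}, "
      f"percentile rank = {percentile_rank(raw, sample):.1f}")
```

Separate norms for men and women, as recommended above, would be obtained by running the same computation on each sub-sample.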
References
Bullinger, M., Anderson, D., Cella, D., Aaronson, N. (1993). Developing and evaluating
cross-cultural instruments from minimum requirements to optimal models. Quality of Life Research, 2,
451-459.
Flaherty, J.A., Gaviria, F.M., Pathak, D., Mitchell, T., Wintrob, R., Richman, J., Birz, S. (1988).
Developing instruments for cross-cultural psychiatric research. Journal of Nervous and Mental
Disease, 176 (5), 257-263.
Guillemin, F., Bombardier, C., Beaton, D. (1993). Cross-cultural adaptation of health-related
quality of life measures: literature review and proposed guidelines. Journal of Clinical
Epidemiology, 46 (12), 1417-1432.
Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley.
Haccoun, R.R. (1987). Une nouvelle technique de vérification de l'équivalence de mesures
psychologiques traduites. Revue québécoise de psychologie, 8 (3), 30-39.
Hunt, S.M., Alonso, J., Bucquet, D., Niero, M., Wiklund, I., McKenna, S. (1991). Cross-cultural
adaptation of health measures. Health Policy, 19, 33-44.
Stevens, J. (1992). Applied multivariate statistics for the social sciences. Hillsdale: Lawrence
Erlbaum Associates, Publishers.
Vallerand, R.J. (1989). Vers une méthodologie de validation transculturelle de questionnaires
psychologiques : implications pour la recherche en langue française. Psychologie Canadienne, 30
(4), 662-689.