Cross-Cultural Research and Back-Translation

An Overview on Issues of Cross-Cultural Research and Back-Translation


Numerous studies in the field of sport have relied on instruments adapted or translated for use across countries. However, because of cultural and linguistic differences, an adapted or translated instrument does not necessarily measure the same constructs as the original. Researchers who adapt or translate an instrument from English into another language should therefore be cognizant of such potential problems. The purpose of this paper is to provide researchers with an overview of the issues involved in cross-cultural studies and in adapting or translating an instrument. Practical guidelines and methods that can detect such problems are also presented.


As the world becomes a global village, more and more fields, such as business, public affairs, and research, are becoming borderless. Frequent interaction and collaboration among researchers around the world have resulted in greater interest in cross-cultural and international research (Sireci & Berberoglu, 2000). Numerous tests and questionnaires developed for the population of the United States have been translated or adapted by researchers in non-English-speaking countries (Butcher & Garcia, 1978). This phenomenon is also salient in Asian countries; for example, research instruments translated from English are popular in academia in Taiwan, especially in psychology and sports. Such translations and adaptations seem to assume that the translated instruments have validity and reliability as satisfactory as those of the original. However, this assumption can be dangerous, because a variety of factors can influence the validity of scores from an instrument in different cultural settings and languages (Geisinger, 1994; Hambleton, 2001; Van de Vijver & Hambleton, 1996). In addition, biases such as construct bias and item bias can arise when an instrument is translated or adapted from another language (Butcher & Garcia, 1978; Van de Vijver & Hambleton, 1996). Under such circumstances, compromised validity can lead to inaccurate results. Therefore, these issues need careful examination whenever a researcher translates or adapts existing tests or questionnaires from another language. The purpose of this paper is to examine the potential issues that researchers might encounter when translating or adapting instruments or tests from another language. Remedies and practices from existing studies are also discussed.

Issues and Possible Remedies Regarding Cross-Cultural Research

Generally, there are three types of bias in cross-cultural studies: construct bias, method bias, and item bias (Van de Vijver & Hambleton, 1996). Each type is elaborated below, along with possible methods for alleviating the problems it causes:

  • Construct bias: this bias occurs when the construct measured by an instrument shows a non-negligible discrepancy across cultures. For example, the construct of “filial piety”, which refers to how obedient people are to their parents, differs greatly between Western and Eastern cultures. Further, translating an existing instrument is more likely to produce such a bias than developing an instrument in several languages simultaneously. One possible remedy for construct bias is to have the instrument adapted or translated by a team whose members have expertise in multicultural and multilingual contexts.
  • Method bias: this bias is attributable to the administration procedure of the measurement and involves a variety of factors, such as differences in social desirability between groups, respondents’ unfamiliarity with the measurement, and the physical conditions in which a survey is administered. It can affect most or even all items of the measurement. If method bias exists, score differences between groups may result from the administration procedure of the test rather than from intrinsic differences between the groups studied. Several methods can be adopted to examine method bias: confirmatory factor analysis, which can be used to compare the equivalence of factor structures in different cultural settings (Marsh & Byrne, 1993); multitrait-multimethod (MTMM) matrices, in which “the inter-correlations among several traits each measured by several methods are appraised for evidence of validity” (Schmitt, Coyle, & Saari, 1977, p. 447); repeated test administrations; and measurements of social desirability.
  • Item bias: this bias is sometimes called differential item functioning. It arises at the item level of the measurement from problems such as poor wording, inaccurate translation, or item content that is inappropriate for a cultural group. More specifically, differential item functioning is present when two people with the same ability or level of the trait differ in their responses because of cultural differences. The statistical techniques developed to detect item bias fall into two main categories. For dichotomously scored items, the Mantel-Haenszel procedure (Holland & Thayer, 1988), proposed by Holland (1985), detects whether items function differently for two groups of examinees by means of a 2 × 2 × K contingency table together with the MH-CHISQ test statistic proposed by Mantel and Haenszel. For test scores with interval-scale properties, another procedure for detecting differential item functioning is based on the analysis of variance (ANOVA). Moreover, another widely used technique for detecting item bias is independent back-translation (Brislin, 1980). An independent back-translation means that “an original translation would render items from the original version of the instrument to a second language, and a second translator—one not familiar with the instrument—would translate the instrument back into the original language” (Geisinger, 1994, p. 306). Finally, item response theory (IRT) applied to translated tests offers cross-cultural researchers a way to address measurement inequivalence and to discover cultural and/or linguistic differences (Ellis, 1989).
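To make the Mantel-Haenszel procedure mentioned under item bias more concrete, the following is a minimal sketch in Python, not the implementation used in any of the cited studies. It assumes that examinees have already been matched into K ability strata (for example, by total test score) and that each stratum is summarized by a 2 × 2 table of counts. The function name `mantel_haenszel` and the tuple layout are illustrative assumptions; the statistic is the usual continuity-corrected MH chi-square, reported alongside the common odds ratio.

```python
def mantel_haenszel(tables):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    tables: iterable of (A, B, C, D) counts, one tuple per ability stratum,
        A = reference group correct,  B = reference group incorrect,
        C = focal group correct,      D = focal group incorrect.
    Returns (continuity-corrected MH chi-square, common odds ratio).
    """
    sum_a = sum_expected = sum_var = 0.0
    num = den = 0.0
    for a, b, c, d in tables:
        total = a + b + c + d
        if total < 2:
            continue  # a stratum this small carries no information
        n_ref, n_focal = a + b, c + d
        m_correct, m_incorrect = a + c, b + d
        sum_a += a
        # Expected value and variance of A under the no-DIF hypothesis.
        sum_expected += n_ref * m_correct / total
        sum_var += (n_ref * n_focal * m_correct * m_incorrect) / (
            total ** 2 * (total - 1)
        )
        # Pieces of the common odds ratio estimate.
        num += a * d / total
        den += b * c / total
    if sum_var == 0 or den == 0:
        raise ValueError("tables do not allow the statistics to be computed")
    chi_sq = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    return chi_sq, num / den


# Hypothetical counts for one item in two ability strata (illustration only).
example_tables = [(30, 10, 20, 20), (40, 10, 30, 20)]
chi_sq, odds_ratio = mantel_haenszel(example_tables)
print(f"MH chi-square = {chi_sq:.2f}, common odds ratio = {odds_ratio:.2f}")
```

The chi-square value is compared with a chi-square distribution with one degree of freedom (critical value 3.84 at the .05 level). A common odds ratio far from 1 suggests that the item favors one group even after examinees are matched on ability.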

In addition, Geisinger (1994) raised several issues regarding cross-cultural assessment that relies on the translation and adaptation of an instrument. Each issue is described below, with some suggestions:

  • Adaptation issue: the central question is “Does a given measure need to be adapted?” This issue may not be problematic when no appreciable differences are detected between the original population and the new target population. However, if an instrument is administered to subjects who speak a language other than the one in which it was originally written, then translation or adaptation is needed. Further, not only language but also cultural differences between the original and the target populations should be taken into account.
  • Validity issue: this issue mainly deals with the question “Does the measure assess the same constructs in the new language or culture?” In general, every time an instrument is changed or applied to a new target population, its validity and reliability need to be examined to ensure that the new instrument assesses the same meanings or constructs with the same degree of accuracy in that population. In other words, construct validity and reliability should be checked after a measure is adapted to a new linguistic context (Geisinger, 1992b).
  • Interpretation issue: once the adaptation and validity issues have been settled, the next issue is how to interpret the scores from the translated or adapted instrument in the new target population, i.e., “What do scores on the adapted measure mean?” Researchers should note that applying the same scoring algorithm to a translated test may yield meaningless scores. A variety of differences, such as cultural and linguistic ones, may lead to greatly different interpretations. Thus, it is critical to examine both construct and instrument comparability across cultures carefully before interpreting scores.
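As one illustration of the reliability check called for under the validity issue, the sketch below computes Cronbach's alpha separately for each language version. Cronbach's alpha is not named in the sources discussed here; it is used only as a common internal-consistency index, and the function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of scores.

    responses: list of rows, one per respondent, each a list of item scores.
    """
    k = len(responses[0])  # number of items
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses to a three-item scale in two language versions.
original_version = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 3, 2], [1, 2, 1]]
translated_version = [[4, 2, 5], [2, 4, 1], [5, 3, 4], [3, 5, 2], [1, 1, 3]]

for label, data in [("original", original_version),
                    ("translated", translated_version)]:
    print(f"{label}: alpha = {cronbach_alpha(data):.2f}")
```

A clearly lower coefficient for the translated version would be one sign that the adapted instrument does not measure with the same degree of accuracy. A similar coefficient does not by itself establish that the same construct is being measured.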

Practical Guidelines for Cross-Cultural Research

This section presents practical guidelines that help cross-cultural researchers ensure satisfactory reliability and validity in cross-cultural studies. The following guidelines and principles are adapted from Geisinger (1994) and Van de Vijver and Hambleton (1996).

  • The general guideline for a cross-cultural study is to avoid construct, method, and item bias as much as possible. Although it may not be possible to eliminate them totally, a researcher should minimize them.
  • Validity needs to be addressed and demonstrated rather than taken for granted when multilingual or multicultural research is conducted. Back-translation procedures do not by themselves ensure validity. Other techniques, including multiple-group confirmatory factor analysis, should also be used.
  • Try to avoid slang, jargon, and colloquialisms in the wording of the instrument.
  • Make sure that the accuracy of the translated instrument and the equivalence of all language versions are carefully examined.
  • The physical environment in which an instrument is administered should be kept as similar as possible across groups.
  • Score differences among samples of target populations should not simply be taken at face value. It is the researcher’s responsibility to interpret the outcomes objectively and to provide information about factors that might have affected the scores.
  • Document how the assessment device is to be used, and collect reactions and feedback from users, participants, and subjects.

Literature Concerning the Issues of Cross-Cultural Research

Watkins (1989) pointed out some problems with the traditional exploratory factor analysis and illustrated the advantages and applications of confirmatory factor analysis. Confirmatory factor analysis is based on the statistical theory of structural equation modeling and possesses some good properties, such as allowing researchers to specify the factor loadings, correlated residuals, and correlated factors. The utilization of confirmatory factor analysis can assist interpretation of an instrument, provide a better way of comparing factor structures and testing competing models, and aid the analysis of the multitrait-multimethod matrices when cross-cultural studies are conducted.
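The kind of multitrait-multimethod matrix mentioned above can be sketched as follows. This is a descriptive summary only, not the confirmatory factor analysis approach Watkins advocates: it computes the inter-correlations among measures and averages them within the conventional MTMM categories. The traits, methods, and scores are hypothetical, and the helper names `pearson` and `mtmm_summary` are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mtmm_summary(measures):
    """Average correlation within each MTMM category.

    measures: dict mapping (trait, method) -> list of scores, with the same
    respondents in the same order for every measure.
    """
    groups = {
        "monotrait-heteromethod": [],    # same trait, different methods
        "heterotrait-monomethod": [],    # different traits, same method
        "heterotrait-heteromethod": [],  # different traits and methods
    }
    for (key1, x), (key2, y) in combinations(measures.items(), 2):
        same_trait = key1[0] == key2[0]
        same_method = key1[1] == key2[1]
        if same_trait:
            name = "monotrait-heteromethod"
        elif same_method:
            name = "heterotrait-monomethod"
        else:
            name = "heterotrait-heteromethod"
        groups[name].append(pearson(x, y))
    return {name: sum(r) / len(r) for name, r in groups.items() if r}


# Hypothetical scores: two traits, each measured by two methods.
scores = {
    ("anxiety", "self-report"): [1, 2, 3, 4, 5, 6],
    ("anxiety", "peer rating"): [2, 1, 4, 3, 6, 5],
    ("confidence", "self-report"): [4, 2, 6, 1, 5, 3],
    ("confidence", "peer rating"): [5, 1, 6, 2, 4, 3],
}
summary = mtmm_summary(scores)
for name, mean_r in summary.items():
    print(f"{name}: mean r = {mean_r:.2f}")
```

Evidence of validity is usually read from the pattern: correlations between different methods measuring the same trait should be high and should exceed those between different traits. Running the same summary separately in each cultural group gives a rough first look before a formal confirmatory analysis.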

Sireci and Berberoglu (2000) evaluated translated-adapted items by means of bilingual respondents, because there is no guarantee that different language versions of an instrument are equivalent (in their research, they used an English-Turkish version of a course evaluation form). They pointed out some advantages and disadvantages of using bilinguals to evaluate translated items. Having the same examinees respond to both language versions of an item eliminates the problem of item translation difference. In addition, because the test takers are bilingual, nontranslated items can be placed in both test forms. However, employing bilinguals has some disadvantages. For example, generalizing the results may be problematic, since bilinguals are typically a select and limited group of people. Moreover, bilinguals’ language proficiency may not be homogeneous: some have a better command of a language than others.

Myers et al. (2000) stated that multi-group structural equation modeling is a reliable method for examining measurement equivalence. They assessed three constructs derived from cross-cultural advertising research across U.S. and Korean samples and found that most, but not all, of the constructs met the requirements for cross-cultural equivalence. However, the model did not fit well when the factor loadings were constrained to be equal across groups, and further tests indicated that a few specific items were the likely source of the problem. In sum, they concluded that multi-group structural equation modeling is a useful tool for assessing model fit in cross-cultural research.

Ellis (1989) used item response theory (IRT) to evaluate the measurement equivalence of translated American and German intelligence tests. Content analysis was also used to look for probable causes when differential item functioning (DIF) was identified in an item. The study reached two conclusions: differential item functioning may be attributable to translation errors but is likely due to differences in cultural knowledge or experience, and the approach provides cross-cultural psychologists with a culture-free methodology for identifying cultural differences.
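The idea of differential item functioning under IRT can be illustrated with a small sketch. It assumes a two-parameter logistic (2PL) model, which is a simplification chosen for illustration and not necessarily the model used by Ellis (1989). The item parameters are hypothetical, and the function names `p_correct` and `dif_gap` are illustrative.

```python
from math import exp


def p_correct(theta, a, b):
    """2PL item characteristic curve: probability of a correct response.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def dif_gap(theta, ref_params, focal_params):
    """Difference in success probability at the same ability level.

    Each params argument is an (a, b) pair estimated separately per group.
    A nonzero gap at a given theta is what DIF means under this model.
    """
    return p_correct(theta, *ref_params) - p_correct(theta, *focal_params)


# Hypothetical parameters: the translated item is harder for the focal group.
reference_item = (1.2, 0.0)  # (a, b) in the original-language group
focal_item = (1.2, 0.5)      # (a, b) in the translated-language group

for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    gap = dif_gap(theta, reference_item, focal_item)
    print(f"theta = {theta:+.1f}: gap = {gap:+.3f}")
```

If the two groups' curves coincide, the gap is zero at every ability level and the item functions equivalently. An item whose curves diverge is a candidate for the kind of content analysis described above.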


Cross-cultural studies have attracted researchers’ attention for decades, and translated instruments are an indispensable tool for conducting them. However, literal translation does not ensure that the translated instrument measures the same constructs as the original, because linguistic differences, cultural differences, or both may exist across samples. Therefore, cross-cultural researchers should be cognizant of the numerous potential problems, such as construct, method, and item bias, that could affect the results of their studies. After identifying possible bias, they should use appropriate statistical techniques, including confirmatory factor analysis and item response theory, to examine, avoid, or eliminate it. Further, cross-cultural researchers should pay close attention to the details of test administration. For instance, the physical conditions of administration, the avoidance of slang, and the interpretation of score differences across samples are critical factors that can affect the quality of a study. Only when the factors that could influence the results of cross-cultural studies are identified and remedied can researchers ensure the accuracy of cross-cultural research.


Brislin, R.W. (1980). Translation and content analysis of oral and written material. In H.C. Triandis & J.W. Berry (Eds.), Handbook of cross-cultural psychology (Vol. 1, pp. 389-444). Boston: Allyn & Bacon.

Butcher, J.N., & Garcia, R.E. (1978). Cross-national application of psychological tests. Personnel and Guidance Journal, 56, 472-475.

Ellis, B.B. (1989). Differential item functioning: Implications for test translations. Journal of Applied Psychology, 74(6), 912-921.

Geisinger, K.F. (1992b). The metamorphosis of test validation. Educational Psychologist, 27, 197-222.

Geisinger, K.F. (1994). Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6(4), 304-312.

Hambleton, R.K. (2001). The next generation of the ITC test translation and adaptation guidelines. European Journal of Psychological Assessment, 17(3), 164-172.

Holland, P.W., & Thayer, D.T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.

Marsh, H.W., & Byrne, B.M. (1993). Confirmatory factor analysis of multigroup-multimethod self-concept data: Between-group and within-group invariance constraints. Multivariate Behavioral Research, 28, 313-349.

Myers, M.B., Calantone, R.J., Page, T.J., Jr., & Taylor, C.R. (2000). An application of multiple-group causal models in assessing cross-cultural measurement equivalence. Journal of International Marketing, 8(4), 108-121.

Schmitt, N., Coyle, B.W., & Saari, B.B. (1977). A review and critique of analyses of multitrait-multimethod matrices. Multivariate Behavioral Research, 12, 447-478.

Sireci, S.G., & Berberoglu, G. (2000). Using bilingual respondents to evaluate translated-adapted items. Applied Measurement in Education, 13(3), 229-248.

Van de Vijver, F., & Hambleton, R.K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1(2), 89-99.

Watkins, D. (1989). The role of confirmatory factor analysis in cross-cultural research. International Journal of Psychology, 24, 685-701.