Applied Neuropsychology: Child
ISSN: 2162-2965 (Print) 2162-2973 (Online) Journal homepage: http://www.tandfonline.com/loi/hapc20
Can we measure cognitive constructs consistently
within and across cultures? Evidence from a test
battery in Bangladesh, Ghana, and Tanzania
Penny Holding, Adote Anum, Fons J. R. van de Vijver, Maclean Vokhiwa,
Nancy Bugase, Toffajjal Hossen, Charles Makasi, Frank Baiden, Omari
Kimbute, Oscar Bangre, Rafiqul Hasan, Khadija Nanga, Ransford Paul Selasi
Sefenu, Nasmin A-Hayat, Naila Khan, Abraham Oduro, Rumana Rashid,
Rasheda Samad, Jan Singlovic, Abul Faiz & Melba Gomes
Published online: 27 Jul 2016.
APPLIED NEUROPSYCHOLOGY: CHILD
http://dx.doi.org/10.1080/21622965.2016.1206823
Can we measure cognitive constructs consistently within and across cultures?
Evidence from a test battery in Bangladesh, Ghana, and Tanzania
Penny Holdinga, Adote Anumb, Fons J. R. van de Vijverc, Maclean Vokhiwad, Nancy Bugasee, Toffajjal Hossenf,
Charles Makasig, Frank Baidenh, Omari Kimbutei, Oscar Bangreh, Rafiqul Hasanf, Khadija Nangaj, Ransford Paul
Selasi Sefenuh, Nasmin A-Hayatk, Naila Khanl, Abraham Odurom, Rumana Rashidf, Rasheda Samadn, Jan Singlovico,
Abul Faizp,q, and Melba Gomesr
aUnited Nations Children’s Fund (UNICEF), KMTECH, Nairobi, Kenya; bDepartment of Psychology, University of Ghana, Legon, Ghana;
cCross-Cultural Psychology, Tilburg University, The Netherlands; dBlantyre Malaria Project, College of Medicine, University of Malawi, Malawi;
eNavrongo Health Research Centre, Ghana Health Service, Navrongo, Ghana; fResearch Physician, Malaria Research Group, Chittagong,
Bangladesh; gNational Institute of Medical Research, Muhimbili Medical Research Centre, Dar-es-Salaam, Tanzania; hEpidemiology, Ensign
College of Public Health, Kpong, Eastern Region, Ghana; iNational Institute of Medical Research, Muhimbili Medical Research Centre, Dar-es-
Salaam, Tanzania; jDepartment of Epidemiology and Disease Control, School of Public Health, University of Ghana, Accra, Ghana; kChild
Development Centre, Chittagong Maa- Shishoo O General Hospital, Chittagong, Bangladesh; lDepartment of Pediatric Neurosciences,
Bangladesh Institute of Child Health, Dhaka Shishu (Children’s) Hospital, Dhaka, Bangladesh; mEpidemiology & Community Medicine,
Bangladesh Institute of Tropical and Infectious Disease (BITID), Chittagong, Bangladesh; nAssociate Professor of Pediatrics, Chittagong Medical
College, Chittagong, Bangladesh; oWorld Health Organization, Geneva, Switzerland; pMahidol Oxford Research Unit, Faculty of Tropical
Medicine, Mahidol University, Bangkok, Thailand; qDev Care Foundation, Dhaka, Bangladesh; rThe UNICEF/UNDP/World Bank/WHO Special
Programme for Research and Training in Tropical Diseases, Geneva, Switzerland
ABSTRACT
We developed a test battery for use among children in Bangladesh, Ghana, and Tanzania, assessing
general intelligence, executive functioning, and school achievement. The instruments were drawn
from previously published materials and tests. The instruments were adapted and translated in a
systematic way to meet the needs of the three assessment contexts. The instruments were
administered by a total of 43 trained assessors to 786 children in Bangladesh, Ghana, and Tanzania
with a mean age of about 13 years (range: 7–18 years). The battery provides a psychometrically
solid basis for evaluating intervention studies in multiple settings. Within-group variation was
adequate in each group. The expected positive correlations between test performance and age
were found, and reliability indices yielded adequate values. A confirmatory factor analysis (not including the literacy and numeracy tests) showed a good fit for a model merging the intelligence and executive function tests into a single factor labeled general intelligence. Measurement weights invariance was found, supporting conceptual equivalence across the three country groups but not full score comparability across the three countries.
KEYWORDS: Bangladesh; children; executive functioning; Ghana; intelligence; Tanzania; test adaptations
The challenge in evaluating cognitive development
across different contexts is to ensure comparability of
skills and functions across settings, while also maintain-
ing the ability to discriminate individual ability levels
within populations. The motivation of this project was
to have a common battery to be applied across the mul-
tiple sites of a single study. There is much cross-cultural
evidence that the structure of intelligence is invariant
across cultures but that instruments may need smaller
or larger adaptations to be applicable across cultural
contexts (e.g., Berry, Poortinga, Breugelmans, Chasiotis,
& Sam, 2011). As we are ultimately interested in
assessing cognitive consequences of malaria in different
cultural contexts, we did not develop country-specific
instruments, but used a single culture-informed battery.
Adaptations are frequently made to the content and
administration of instruments, largely developed in
western settings, to reflect the experiences of the popu-
lation being assessed and to retain within-population
variance. There is a growing body of literature that
provides evidence that carefully adapted batteries of
tests provide reliable and valid measures of cognition
in multiple settings (Holding et al., 2004; Kitsao-
Wekulo et al., 2012; Van de Vijver, 2002). The current
study examined the suitability of a cognitive test battery
that was applied in three very different cultural settings
(Bangladesh, Ghana, and Tanzania).
The extent to which adaptations maintain the orig-
inal intention of the test, while increasing the ability
of the test to accurately discriminate ability levels within
CONTACT Melba Gomes gomesm@who.int The UNICEF/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases,
World Health Organization, 1211 Avenue Appia, Geneva 27, Switzerland.
© 2016 Taylor & Francis
the new cultural setting, remains controversial. Despite
extensive discussion on the universality of cognitive
constructs (Berry et al., 2011; Van de Vijver, 1997;
Van de Vijver & Leung, 1997), there are very few studies
that address the extent to which different tests or
batteries of tests are able to measure the same cognitive
constructs in a comparable manner across different
economic, cultural, and linguistic groups (e.g.,
Helms-Lorenz, Van de Vijver, & Poortinga, 2003).
Hui and Triandis (1985) argue that a fundamental chal-
lenge in creating equivalence is that the instrument or
test items should be similar or the same. In other words,
each item on the test should mean the same in both cul-
tures. This limits not only cross-cultural comparability
but also cross-cultural adaptability of tests because of
cultural and linguistic differences.
The need to construct a uniform test battery for
children in Bangladesh, Ghana, and Tanzania was
prompted by the plan to perform a detailed investigation
of the long-term impact of a childhood episode of severe,
predominantly malarial, infection that required hospital
admission with symptoms ranging from stupor to
convulsions and deep coma. Patients with these symp-
toms in early childhood had been part of a randomized,
placebo-controlled treatment trial (RCT) that prevented
death and serious neurological sequelae (Gomes et al.,
2009). Whether treatment offered survivors any lasting
protection against potential harm from parasite conges-
tion in the cerebral microvasculature or whether severe
infections nevertheless left children with long term
cognitive and clinical impairment was to be evaluated.
Before testing the severe malaria cohort, we wished to
explore the psychometric properties of the test battery
itself in apparently normal children from the same areas.
The intention was to examine the robustness of the mea-
sures and the constructs applied across the three settings.
The study was part of a larger series of studies supported
by the Saving Brains Programme, Grand Challenges
Canada (SB)¹ that focused on the re-enrolment of children
previously the target of an intervention designed to protect
against potential risks to brain development. The SB pro-
gramme required all participating studies to investigate
the impact of the diverse interventions upon a set of Core
Metrics, a term used in the programme to denote key con-
cepts of development common to all projects. These were
labeled as: General Intelligence, Executive Functions, and
the development of Literacy Skills. We selected tests to
measure each of these three core metrics based on both
previous evidence of the effects of severe malaria (Holding
& Boivin, 2013; Holding et al., 2004) and culture-relevant
test adaptations (Kitsao-Wekulo et al., 2012).
The three populations in which our original study
took place share some features that led to their initial
selection, with all three being low resource, largely rural
communities. They differed, however, in other character-
istics that made it necessary to alter the content of the test
battery to make the material understood by the children
to be evaluated. The adaptation process thus addressed
differences in language and culture, while trying to max-
imize uniformity of the tools. Test performance in the
three populations was examined to assess the extent to
which the tests could discriminate between children at
the three study sites and to test to what extent a single
battery of assessments could provide comparable and
accurate information across multiple cultural settings.
The age range of the severe malaria cohort dictated the 7- to 18-year age range for this test-adaptation study, which was conducted in the same study areas of the three countries.
Methods
Study population and sample characteristics
Study sites were located in Bangladesh, Ghana, and
Tanzania. The sites selected had in common: a rural
ambience, relative poverty, constraints in access to
health care, and restricted levels of literacy amongst
the adult population. These factors, and the risk of
malaria infection, influenced the initial selection of the
sites themselves. For this study children living in each
of the main study locations, who were independent
from the main re-enrolment cohorts were identified.
Bangladesh
The RCT was originally carried out in four sites of
Chittagong District in south-eastern Bangladesh. The
majority of the population in these locations speaks the
local language, Chittagonian. English is taught in schools
as a second language. Adult literacy across the study sites
varies from 28% to 50%, with male adult literacy at 64.8% and female adult literacy at 58% in 2015 (UNESCO, 2015).
Ghana
In Ghana the RCT was conducted in Kassena-Nankana
East and West Districts in the Upper-East region of
Ghana. Two main languages are spoken, Kasim and
Nankam, although English, the official language, is the
main language of instruction in schools, and also spo-
ken by many. Despite school attendance being above
90% (slightly favoring males), adult literacy rates are
low, with 65.5% of the population above 15 years
having received no formal education and more females
(74.6%) being uneducated than males (54.4%)
(Adetunde & Akensina, 2008).
¹http://www.grandchallenges.ca/saving-brains/.
Tanzania
Test piloting in Tanzania was carried out in Kilosa and
Handeni Districts, where Kiswahili is widely spoken,
although there are other indigenous languages used.
National adult literacy rates in Tanzania are estimated
at 75.9% for females and 84.8% for males, although
they are likely to be lower in rural areas (United Nations
Educational, Scientific and Cultural Organization
(UNESCO), Institute for Statistics, 2013). More detailed
information on the population characteristics for each
site is presented in Table 1.
Measures
Tests were selected to measure General intelligence,
Executive Function (Working Memory, Selective/Sus-
tained Attention and Inhibition & Attentional Shift),
and Achievement (Literacy and Numeracy). The instru-
ments selected are described in Table 2. The preparation
of these tests followed a systematic adaptation pro-
cedure for neurocognitive and psychological measures
that has been previously described (Holding, Abubakar,
& Kitsao Wekulo, 2010; Holding & Kitsao-Wekulo,
2009).
The first procedural step was to clearly define the
concepts and constructs to be measured. As highlighted
in the introduction, we were guided by the requirements
of the Core Metrics framework, which identified three
general concepts (General Intelligence, Executive Func-
tion, and Development of Literacy Skills). A review of
the evidence on functional areas that are sensitive to
the effects of malaria infection was used to support
the identification of component constructs of these
general concepts (Holding & Boivin, 2013). General
Intelligence was defined by the constructs identified in
the Lurian Model of the Kaufman Assessment Battery
for Children II (Learning, Sequential and Simultaneous
Processing, and Planning). To further broaden the
Table 1. Background characteristics of study locations.

| Characteristic | Bangladesh | Ghana | Tanzania |
|---|---|---|---|
| Location | South East | North-Eastern | Northern |
| Population | 990,657 | 153,293 | 438,175 |
| Language | Bengali (dialects: Chittagonian, Chak, "Rakhain," "Marma," "Tanchanga," "Tripura") | Kasim and Nankam | Kiswahili (indigenous languages: Kaguru, Sagara, Vidunda, and Nyamsanga) |
| Economic activity | Agriculture and day labour | Subsistence farming and small-scale business | Agriculture 80%, commerce and tourism |
| Economic levels | Between 17.6% and greater than 55% below the poverty line across the district | 30.6% in North East and 42.5% in the Kassena-Nankana district | Rural areas: 23.1% below the food poverty line and 40.8% below the basic-needs poverty line |
| Nutritional status, children <5 years | Severe stunting 15.3%, severe wasting 4%, severely underweight 10.4% | Stunting 25.8% | Stunting 42.1%, wasting 4.8% |
| Access to health facilities | 4 to 10 km from household | 42.3% take less than 30 min to reach the nearest health facility | Time to the nearest health facility approximately 60 min |
| Family structure | Multigenerational patriarchal extended family, average family size 4.36 | Mixed-generation extended family system, average family size 7.2 | Monogamy and polygamy practiced; average household size 4.4 |
| Children's daily activities | Boys: cultivation, farming, cutting wood. Girls: household chores, cooking, washing, fetching water, sewing clothes, and caring for poultry or cattle | Boys: gardening, farming, gathering firewood and seeds, sowing seeds, tending livestock. Girls: household chores, cooking, washing, fetching water | Boys: cultivation, grazing cattle. Girls: collecting firewood, grazing calves and goats, fetching water, cooking, and assisting in care of younger siblings |
| Official age of school entry | 6 years | 6 years | 6 years |
Table 2. Description of tests validated.

| Test name | Description |
|---|---|
| Atlantis | This test requires children to associate a series of nonwords with pictures of fishes, plants, and shells. |
| Hand movements | The child is expected to repeat a series of hand movements performed by the assessor. The number of movements increases as the trial increases. |
| Footsteps | The child has to select the shortest route of footsteps for a small doll to fetch its ball. |
| Story completion | The child is shown a selection of pictures that, when placed in the correct order, tell a story. |
| Kilifi naming test | A test of expressive vocabulary in which the child labels each of a selection of pictures. |
| Rey-Osterrieth complex figure | The child has to reproduce a complex figure drawing, first by copying, and after 20 min, from memory. |
| NOGO | Similar to the classic NOGO paradigm, in which participants have to learn to withhold a response when a previously associated stimulus is presented. Hand movements were used as the stimuli. |
| Shift | Children are primed to switch responses to a series of hand movements when a specific trigger movement is displayed. |
| People search | The child scans a page of stick figures and selects a specific figure from amongst different figures. |
| Literacy test | Items sample letter shape and sound recognition, as well as reading comprehension and writing fluency. |
| Numeracy test | Items sample number recognition, arithmetic, and problem-solving skills. |
range of skills included in the battery to include those
previously explored in the investigation of severe
malaria we added the construct of Expressive Language.
To assess Executive Function we selected tests that
measured the constructs prescribed in the Saving Brains
Programme Core Metrics, that is: Working Memory
and Attention (Sustained Attention, Shift, and Inhibi-
tory Control). Achievement, the application of skills in
school based learning, was measured by the develop-
ment of literacy and numeracy skills.
The next step was to identify a potential pool of
measures of the concepts and constructs, and to review
their content for potential challenges to engagement.
Where possible, our test pool of measures for each core concept was constituted from the existing literature, highlighting the theoretical frameworks of both published and open-source measures previously used in similar contexts.
Test preparation involved translation of the instruc-
tions, and piloting of the stimuli, visual and verbal, as
well as of the administration procedures. The key aspects
of test equivalence that the adaptation process strove to
maintain were: (a) item equivalence, such that each item
should contribute in a similar manner to the overall test
score; (b) semantic equivalence, such that each item
should mean the same thing in each context; and (c) pro-
cedural equivalence, such that each test or item should be
administered in an equivalent manner. The first two
focus on content, and the last on administration.
A panel of experts that included key personnel from
the three different locations consolidated culturally
appropriate conceptual vocabularies to guide the prep-
aration of materials for each site. The panel focused
on the content, sensitivity, and face value of the tools
against the cultural background of children in the
locality in which the work was to be implemented.
Guidelines for conceptual translations and item selec-
tion were discussed at a workshop attended by represen-
tatives of each study site. A rigorous review of the
individual items and level of difficulty was subsequently
established in a pre-pilot in children of the target age
group at each location.
Instructions for the tests were produced in each
of the respective local languages through a multistep
process. The first step—translation into the local
language—was checked through multiple iterations of
a back-translation process, to evaluate the semantic
and conceptual equivalence of the translations to the
original instruction (Werner & Campbell, 1970). Three
independent back translations were made into English
to refine the translation process. Further refinement of
the instructions and general assessment procedures fol-
lowed close observation of children’s responses during
the pre-piloting process.
Visual stimuli, images or pictures, were screened for
cultural relevance. For example, a party scene with cakes
and balloons that would not be familiar to a child in any
of the study locations was replaced by a culturally equiva-
lent image of a celebration that was easily recognizable to
children in all three countries. If a replacement item was not appropriate for all sites, site-specific alternatives were selected. Table 3 summarizes the modifications made.
Training assessors
With the exception of Bangladesh, where assessors
were chosen in response to a classified advertisement
Table 3. Test source and adaptation process.

| Core concept | Construct | Test name | Source | Visual stimuli | Verbal stimuli | Procedures |
|---|---|---|---|---|---|---|
| General intelligence | Learning | Atlantis | KABC II | No changes | Changed pronunciation | Extended instructions, changed start/stop rules |
| General intelligence | Sequential processing | Hand movements | — | No changes | Nonverbal | No changes |
| General intelligence | Simultaneous processing | Footsteps | Local adaptation of KABC II sub-tests | 100% change in pictures and layout | Nonverbal | Extended instructions, changed start/stop rules |
| General intelligence | Planning | Story completion | — | 90% of images replaced | Nonverbal | Extended instructions, changed start/stop rules |
| General intelligence | Verbal intelligence | Kilifi naming test (KNT) | Wekulo and Holding, personal communication | 10% of images replaced | 100% translated | Item order |
| Executive function | Working memory | Rey-Osterrieth complex figure | Rey (1941) and Osterrieth (1944) | No change | Nonverbal | No changes |
| Executive function | Inhibitory control | NOGO | Original | No change | Nonverbal | — |
| Executive function | Attentional shift | Shift | Original | No change | Nonverbal | — |
| Executive function | Selective/sustained attention | People search | Connolly and Pharoah (1993) and Holding et al. (2004) | Layout and length | Nonverbal | No changes |
| Achievement | Literacy and numeracy | Literacy and numeracy tests | Local adaptations of UWEZO TZ (2012), UNESCO Bangladesh Survey (2005), WRAT | Script and language to match local context | — | — |
requiring candidates with a psychological or child devel-
opment background, most assessors elsewhere had no
prior experience in testing procedures. Some had a
background in psychology, but most were drawn from
an institutional advertisement of the research post
available at the different study sites. All had post high
school education (degrees or diplomas). They were
primarily selected for their fluency in the local lan-
guages and prior health research experience, rather than
experience in assessment. Table 4 summarizes assessor
characteristics.
In all three countries, the assessors’ training was con-
ducted and supervised by qualified psychologists who
accredited the assessors only after demonstration of an
acceptable skill level in test administration, based on
set criteria. Training involved a combination of theory
and practice sessions of test administration spanning
four weeks. In the first week, a workshop covered mod-
ules on the basic elements of child development;
research methods; theoretical models for assessment,
particularly the Luria model selected for this study;
and managing individual differences in a standardized
research setting. In the second week, the assessors were
introduced to the practice of testing, starting with test-
ing peers and then young children usually selected from
schools. Detailed feedback was provided evaluating the
performance of the assessors against a standard guide-
line on assessment techniques. This supervision and feedback was provided in Tanzania through the use of videotaped sessions; in Bangladesh, two psychology supervisors observed the test administrations; and in Ghana, the supervising psychologist provided oversight.
Data collection process
Different samples of children not involved in the main
re-enrolment study were recruited through schools and
the general community to evaluate test-retest and inter-
rater reliabilities. Those with obvious medical or neuro-
logical signs of disability at the time of assessment were
excluded. The retest phase was completed within two to
three weeks after initial testing at all sites. In Ghana, all
children were selected only from schools, in Bangladesh
they were selected from the schools and community,
while in Tanzania the children for test-retests and
inter-rater reliabilities were selected only from the
community. Inclusion criteria were children from the
appropriate age bracket (>6–18 years of age). After
the test-retest process was satisfactorily completed, the
tests were used in an expanded sample of healthy chil-
dren drawn from schools, to represent what might be
considered those developing “optimally” in their context.
Ethical Clearance for this study was provided by
University of Oxford (OXTREC), Ethical Review
Committee of Chittagong Medical College, Bangladesh,
the Institutional Review Board of Navrongo Health
Research Centre, Ghana, and the Ethics Review Com-
mittee of the National Institute of Medical Research,
Tanzania. Human data included in this manuscript
were obtained in compliance with the WMA Helsinki
Declaration. There was no reward for participation.
All children who participated were served lunch to ensure they were fed during testing.
Administration procedures
Each child was tested following the same protocol in all
three sites. Each assessor assessed two children each day
for four days per week. The tests were presented in the
same order: Atlantis, Rey-Osterrieth, Hand Movements,
Footsteps, NOGO, Shift, People Search, Story Com-
pletion, and KNT. The duration of assessment for each
child was about two and half hours. Breaks were taken
in administration when necessary. While essentially a
common procedure was followed, the specific protocol
followed in each country was dictated by local require-
ments. These are outlined in the following sections.
Bangladesh
Assessment was conducted in the residence of the
children. A prior appointment was made before the
assessment. Informed consent was taken directly from
the subject if he/she was 18 years of age; if the child
was aged 8–9 years, consent was taken from parents/guardians; and if aged 10–17 years, consent was taken from the guardian and assent from the child, after the assessors had described the details of the assessment process.
Ghana
Two schools were selected from the Kassena-Nankana
Districts where the two predominant ethnic groups
reside. One was an elementary school (Year 1 to 6),
the other a Junior High School (Year 1 to 3). The assess-
ments were carried out in empty classrooms using
furniture provided by the research team, during school
hours (between 8 a.m. and 3 p.m.). Permission was first sought
from the Ghana Education Service District office which
Table 4. Summary of assessor characteristics for each country.

| Country | Assessors recruited | Requiring extended training / failing to pass | Gender (male/female) | Years of education completed, mean (SD) |
|---|---|---|---|---|
| Bangladesh | 12 | 0/0 | 7/5 | 14.25 (1.86) |
| Ghana | 12 | 2/2 | 5/7 | 16 (0) |
| Tanzania | 19 | 5/3 | 7/12 | 14.74 (1.52) |
sent notices to the schools that, in turn, informed the
parents. Informed consent was taken from available par-
ents. If a child was selected, but the parent was not avail-
able, consent was sought through the Chairperson of the
Parent-Teacher Association in the village. Assent was
also sought from children.
Tanzania
Assessments were carried out at the convenience of the
family where a suitable environment was available in
homes, close to home or at school. Assessment sessions
in schools were carried out either when schools were
closed, or in classrooms separate from the main school
compound. Informed consent was taken from a parent/
guardian a day before the planned assessments were car-
ried out. Detailed information was given to the parent/
guardian about the study and the aims of the project.
After checking that the parent/guardian understood
what was being explained, and was in agreement, signed
consent was taken. Informed assent was taken from the
child just before assessments.
Analytic plan
STATA was used for data management and transforma-
tions of variables. The psychometric analyses were car-
ried out using SPSS version 19. Confirmatory factor
analysis was conducted using AMOS version 22.
As the main focus of the analysis was to evaluate the
consistency with which individual tests behaved
between different linguistic and cultural settings, perfor-
mance on each test from the battery was evaluated
separately by country. Analyses investigated:
a) Within-population variance, measured through an examination of the distribution of test scores;
b) Reliability, measured through the evaluation of:
i. Consistency across items within a test with mul-
tiple items (internal consistency) using Guttman’s
Split half for tests where item variability is
primarily by degree of difficulty, and Cronbach’s
alpha where items are intended to also sample
related, but not identical skills.
ii. Consistency across time (test-retest reliability)
using the Intraclass Correlation, to take into
account the use of multiple assessors.
iii. Consistency across assessors (inter-rater
reliability) using Kappa Coefficients.
c) Responsiveness through a series of univariate analy-
sis exploring the relationship between variability in
test performance and the background characteristics
of the children, age and gender differences, as well as
school exposure.
d) Confirmatory Factor Analyses (CFA) was underta-
ken to investigate measurement invariance, that is
the assumption that the underlying association
between the nine sub-tests that constituted the neu-
rocognitive battery were comparable across coun-
tries, and to what level that comparability is
evident (Van de Vijver & Leung, 1997). To carry
out this analysis the scores were standardized to
obtain a similar comparative unit for all measures.
In the first step we examined configural invariance,
testing whether there was a single factor, general
intelligence, in each country with significant loadings
on each subtest, accounting for the correlations
between tests that we found in each country.
The second step, examining measurement weights
(metric invariance), investigated whether the factor
loadings were identical across countries, indicating that
each test made the same contribution to the general fac-
tor in each country.
The third step tested measurement intercept invar-
iance (also called differential item functioning). This
analysis tested whether the regression line that links
the latent factor, intelligence, to subtest scores had the
same intercept (made the same initial contribution) to
each group. This is commonly used as test of scalar
invariance, required to support the integration of scores
across countries in an analysis of variance.
The next step explored invariance of the structural
covariance. This examined whether the error variance
of the latent factor is identical across countries. Finally,
the analysis of measurement residuals tested whether the
error components of the observed variables are
identical.
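As an illustration of the internal-consistency indices listed under (b), both coefficients can be computed directly from a children-by-items score matrix. This is only a sketch of the standard formulas, not the study's analysis code (which used SPSS); the function names and the odd/even item split used for the split-half are our own assumptions.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_children x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def guttman_split_half(items):
    """Guttman's split-half reliability using an odd/even item split,
    appropriate when items are ordered primarily by difficulty."""
    items = np.asarray(items, dtype=float)
    half_a = items[:, 0::2].sum(axis=1)
    half_b = items[:, 1::2].sum(axis=1)
    total_var = (half_a + half_b).var(ddof=1)
    return 2 * (1 - (half_a.var(ddof=1) + half_b.var(ddof=1)) / total_var)
```

For perfectly consistent responses both coefficients approach 1; in practice they are computed per test and per country, as in the reliability analyses reported here.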
The first three analyses are usually considered the
most important as they indicate whether there is a joint
latent factor (configural invariance), whether the latent
factor is measured the same way in each country
(measurement weights invariance), and whether scores
can be directly compared across countries (measure-
ment intercepts). We used various criteria to evaluate the goodness of fit of our tested models: χ² values should be nonsignificant to support a good fit, Δχ² values should be nonsignificant, values of the Comparative Fit Index (CFI) should be .90 or above, decreases in CFI between subsequent analyses should not be larger than .01, Tucker-Lewis Index (TLI) values should be above .90, Standardised Root Mean Residual (SRMR) values should be .06 or less, and Root Mean Square Error of Approximation (RMSEA) values should be less than .06. As χ² tests are known to be sensitive to sample size, we did not rely on their outcomes in a rigid way, but examined the global constellation of fit indices.
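The screening logic above can be collected into a small helper. The cutoffs mirror those stated in the text, but the function itself is illustrative and not part of the original analysis:

```python
# Sketch: check a set of model fit indices against the cutoffs stated in the
# text. Illustrative only; the thresholds come from the paper, the code does not.

def evaluate_fit(cfi, tli, srmr, rmsea, delta_cfi=None):
    """Return pass/fail flags for the CFI, TLI, SRMR, RMSEA (and ΔCFI) cutoffs."""
    checks = {
        "CFI >= .90": cfi >= 0.90,
        "TLI >= .90": tli >= 0.90,
        "SRMR <= .06": srmr <= 0.06,
        "RMSEA < .06": rmsea < 0.06,
    }
    if delta_cfi is not None:
        # Drop in CFI relative to the previous, less restrictive model.
        checks["drop in CFI <= .01"] = delta_cfi <= 0.01
    return checks

# Values reported for the configural invariance model (Table 11):
flags = evaluate_fit(cfi=0.933, tli=0.900, srmr=0.043, rmsea=0.044)
```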
6 P. HOLDING ET AL.
Downloaded by [Melba Gomes] at 11:35 27 July 2016
Results
In total, 786 children, with ages ranging from 6 to 18 years, were tested across the three countries. A summary of sample size and age characteristics is presented in Table 5. The sample from Ghana was smaller, and the only one in which all children were in school. Table 6 describes the rounds of data collection, highlighting the numbers of children available for the different levels of analysis.
MCAR tests showed nonsignificant results in Ghana and Tanzania and a significant value in Bangladesh (χ²(31) = 70.65, p < .001). On the basis of these results, it was decided to replace missing values using an EM algorithm.
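As an illustration of the EM idea, the sketch below fills missing scores in one variable under an assumed bivariate normal model: the E-step replaces each missing value with its conditional mean (a regression prediction from the fully observed variable), the M-step re-estimates the regression, and the two steps iterate until the imputations stabilise. The paper does not specify its implementation, and the data below are hypothetical:

```python
# EM-style imputation sketch for one variable (y) with missing entries,
# predicted from a fully observed variable (x). Assumes a bivariate normal
# model; illustrative only, not the paper's actual algorithm.

def em_impute(x, y, n_iter=50):
    """Return y with None entries filled by iterated regression imputation."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    # Start from the observed-data mean of y.
    fill = sum(yi for _, yi in obs) / len(obs)
    y_hat = [yi if yi is not None else fill for yi in y]
    for _ in range(n_iter):
        # M-step: least-squares slope and intercept from the completed data.
        n = len(x)
        mx = sum(x) / n
        my = sum(y_hat) / n
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y_hat))
        sxx = sum((xi - mx) ** 2 for xi in x)
        b = sxy / sxx
        a = my - b * mx
        # E-step: replace each missing y by its conditional mean given x.
        y_hat = [yi if yi is not None else a + b * xi
                 for xi, yi in zip(x, y)]
    return y_hat

x = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical predictor scores
y = [2.1, 4.0, None, 8.1, 9.9]       # one missing outcome score
completed = em_impute(x, y)
```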
Within-population variance. Table 7 shows that scores for all tests across all three sites ranged from the lowest towards the maximum possible. In only one test, and only in one site, did the proportion of no responses (children who failed to engage with the task at all and could make no attempt) reach a substantial proportion of children (NOGO in Bangladesh, at 34%).
While mean scores were similar across the three sites, the data from Ghana were less consistent with the other two sites. There were significant deviations in the means for Ghana on Atlantis (59 vs. 73 and 76 in Tanzania and Bangladesh, respectively), Story Completion (5 vs. 9/8), KNT (70 vs. 79/76), Rey Osterrieth Copy (25 vs. 15/16), and Recall (17 vs. 12/11). The Ghana sample was also characterized by smaller standard deviations.
Normality of distribution. Normality of the scores reported in Table 7 was based on a skewness value between −2 and +2. The majority of distributions were evaluated as normal. Only the NOGO results in Bangladesh and People Search in Tanzania deviated from normality: the former showed negative skewness (mass of the distribution concentrated on the right), the latter positive skewness (mass concentrated on the left).
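This normality screen reduces to a few lines of code; the ±2 cutoff follows the rule stated above, while the function and example scores are illustrative:

```python
# Sketch of the normality screen: sample skewness is computed, and a
# distribution is flagged "Normal" when the value lies between -2 and +2.
# Illustrative data; not the study's scores.

def skewness(xs):
    """Population sample skewness (third standardized moment)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

def classify(xs):
    g = skewness(xs)
    if g > 2:
        return "Positive"   # mass of the distribution concentrated on the left
    if g < -2:
        return "Negative"   # mass of the distribution concentrated on the right
    return "Normal"

flagged = classify([0, 0, 0, 0, 0, 0, 100])  # one extreme score in the tail
```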
Consistency over time (test–retest reliability). Scores at the two time points were significantly correlated (see Table 8). The only test that did not reach either a moderate (.5–.6) or strong (.7–.8) level of agreement across time points was NOGO. Although not identical, the pattern was similar across two of the sites. The exception was Tanzania, where the sample size was too small to draw clear conclusions.
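Table 8 reports these values as intraclass correlations. A minimal sketch of a one-way ICC for two testing occasions is shown below; the paper does not state which ICC form it used, so the one-way ICC(1,1) here is an assumption, and the data are hypothetical:

```python
# Sketch of a one-way intraclass correlation, ICC(1,1), for test-retest data
# with two occasions per child. Assumption: the paper's ICC form is unstated.

def icc_oneway(pairs):
    """pairs: list of (time1, time2) score tuples, one per child."""
    k = 2                                   # two testing occasions
    n = len(pairs)
    grand = sum(a + b for a, b in pairs) / (n * k)
    # Between-children mean square (children as "groups" of a one-way ANOVA).
    ms_between = k * sum(((a + b) / k - grand) ** 2 for a, b in pairs) / (n - 1)
    # Within-children mean square (disagreement between the two occasions).
    ms_within = sum((a - (a + b) / 2) ** 2 + (b - (a + b) / 2) ** 2
                    for a, b in pairs) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Perfect repetition of each child's score gives an ICC of 1; random re-ordering of scores drives it toward (or below) zero.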
Consistency across test items. Again, with the
notable exception of Atlantis, test content indicated
good levels of internal consistency, of .7 and above.
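Internal consistency of this kind is conventionally indexed by a split-half correlation or Cronbach's alpha (Table 8 uses both, as its footnotes indicate). A minimal sketch of the alpha computation, with hypothetical item scores:

```python
# Sketch of Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance
# of total scores). Item data below are hypothetical, for illustration only.

def cronbach_alpha(items):
    """items: list of per-item score lists, each of equal length (one per child)."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var / var(totals))

# Two hypothetical items scored for three children:
alpha = cronbach_alpha([[1, 2, 3], [2, 4, 6]])
```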
Consistency across assessors (interrater). Results were very variable, ranging from no better than chance (nearing 0) to near-perfect or perfect agreement (>.9). The tests that produced the worst results were the KNT and the Rey Osterrieth (copy and recall), in Ghana and Tanzania.
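Table 8 reports interrater agreement as kappa. Cohen's kappa corrects the observed agreement for the agreement expected by chance; a minimal sketch, with hypothetical ratings:

```python
# Sketch of Cohen's kappa for two raters over the same children:
# (observed agreement - chance agreement) / (1 - chance agreement).
# Ratings below are hypothetical, for illustration only.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(["yes", "yes", "no", "no"],
                     ["yes", "no", "no", "no"])
```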
Responsiveness. Age was re-categorized into three groups: 6 to 10 years, 11 to 14 years, and 15 to 18 years. Age (Table 9) was associated with score variance; the majority of effect sizes were large (η² > .138, shaded dark grey in Table 9) or medium (> .059, shaded light grey). In contrast, gender had a limited association with score variance: only Footsteps and KNT showed significant associations, with medium effect sizes (Cohen, 1988).
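The η² effect sizes in Table 9 represent the share of score variance explained by group membership. A minimal sketch of the computation behind the .059/.138 benchmarks, with hypothetical scores:

```python
# Sketch of eta-squared (η²) from a one-way layout: between-group sum of
# squares divided by total sum of squares. Benchmarks: > .059 medium,
# > .138 large (Cohen, 1988). Scores below are hypothetical.

def eta_squared(groups):
    """groups: list of score lists, one per age (or gender) group."""
    allv = [x for g in groups for x in g]
    grand = sum(allv) / len(allv)
    ss_total = sum((x - grand) ** 2 for x in allv)
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups)
    return ss_between / ss_total

# Two hypothetical age groups with clearly separated scores:
e = eta_squared([[1, 2, 3], [4, 5, 6]])
```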
Atlantis. Post hoc analyses (Scheffé test) identified differences in patterns across the sites. In Bangladesh, scores rose from the younger to the middle age group and then dropped again. In Ghana, the change was linear, increasing with age, while in Tanzania the increase with age was curvilinear, leveling off from the middle to the oldest age group.
Hand movements. Effect sizes for age varied between sites; they were significant only in Tanzania,
Table 5. Characteristics of the sample of children in each country.
                                             Age grouping, N (%)
Country     N    Mean age (SD)  Gender    7–10 years  11–14 years  15–18 years  In school N (%)
Tanzania    323  12.26 (2.92)   Girls     49 (29.9)   66 (40.2)    49 (29.9)    265 (82)
                                Boys      61 (38.4)   57 (35.8)    41 (25.8)
Ghana       166  12.42 (2.81)   Girls     27 (35.1)   31 (40.3)    19 (24.7)    166 (100)
                                Boys      27 (30.3)   39 (43.8)    23 (25.8)
Bangladesh  297  13.26 (3.08)   Girls     34 (20.4)   66 (39.5)    67 (40.1)    217 (73)
                                Boys      35 (26.9)   48 (36.9)    47 (36.2)
Total       786  12.67 (2.99)   Girls     110 (27.2)  160 (39.5)   135 (33.3)   648 (83)
                                Boys      123 (32.7)  142 (37.8)   111 (29.5)
Table 6. Sample sizes in test-retest and interrater analyses.
Country     Test-retest N  Inter-rater N  Initial test pool N  Total pool used in construct validity^a N
Bangladesh  80             58             80                   297
Ghana       131            64             166                  166
Tanzania    13             54             64                   323
Total       224            176            310                  786
^a With additional "optimal" children.
APPLIED NEUROPSYCHOLOGY: CHILD 7
Downloaded by [Melba Gomes] at 11:35 27 July 2016
where multiple comparisons using the Scheffé test indicated that this was accounted for by the difference between the youngest and the oldest age groups.
Footsteps. Scores on this test were associated with
both age and gender differences, with the effect sizes lar-
ger for age than gender. Multiple comparisons sug-
gested a gradual increase in scores across the age
groups, only significant between the 6 to 10 years and
15 to 18 years age groups. A significant gender differ-
ence favored males.
Story completion. While age, and not gender, was
associated with score variance, the pattern of the older
age group differed in the three sites. In Bangladesh
the oldest group was similar to the youngest, in Ghana
the two younger groups were similar. It was only in
Tanzania where the change from youngest to oldest
was linear.
KNT. While scores showed a significant increase
with age, post hoc analyses showed site differences. In
Bangladesh the change was linear, in the other two sites
the change was curvilinear. There was also significant
association with gender for Ghana and Tanzania but
not for Bangladesh. In both cases, the difference in
scores favored males.
Rey Osterrieth. For the first, Copy, stage of the test,
the pattern was similar across sites, with large effect
sizes, that reflected a superior performance among
younger children. This pattern was repeated in the
Recall stage of the task, although, with the exception
of Tanzania, the effect sizes were smaller. Also in
Table 7. Population variance for tests.
Test / country  N  Max possible  Mean  SD  Min  Max  % No response  Normality of distribution^a
Atlantis
Bangladesh 294 108 76.24 17.09 18 106 0 Normal
Ghana 167 59.49 15.69 21 93 0 Normal
Tanzania 323 73.09 16.45 16 105 0 Normal
Hand movements
Bangladesh 295 23 8.15 3.09 1 17 0.7 Normal
Ghana 166 8.31 2.80 1 17 0.6 Normal
Tanzania 323 8.61 3.24 2 23 0.0 Normal
Footsteps
Bangladesh 295 48 16.24 9.04 0 40 1.7 Normal
Ghana 166 14.84 7.29 2 36 0.6 Normal
Tanzania 323 21.25 10.99 1 43 0.0 Normal
Story completion
Bangladesh 293 36 8.11 5.46 0 31 1.1 Normal
Ghana 166 5.36 2.28 1 17 1.2 Normal
Tanzania 321 9.16 5.33 0 24 0.6 Normal
KNT
Bangladesh 295 122 76.04 20.84 12 119 0.7 Normal
Ghana 166 70.33 14.66 25 97 1.2 Normal
Tanzania 323 79.04 18.70 14 117 Normal
Rey O copy
Bangladesh 292 36 16.20 3.52 2 32 1.7 Normal
Ghana 160 25.34 8.69 0 36 4.2 Normal
Tanzania 322 15.45 6.11 0 35 0.4 Normal
Rey O recall
Bangladesh 292 36 10.67 4.48 0 28 1.7 Normal
Ghana 157 16.59 8.38 0 34 6.0 Normal
Tanzania 320 11.68 5.44 0 27 0.3 Normal
NOGO
Bangladesh 196 NA 0.88 0.16 0.33 1.00 34.0 Negative
Ghana 164 0.85 0.17 0.45 1.00 1.8 Normal
Tanzania 322 0.89 0.16 0.17 1.00 0.2 Normal
Shift
Bangladesh 295 NA 0.42 0.29 0.00 1.00 0.7 Normal
Ghana 164 0.28 0.24 0.00 0.85 1.8 Normal
Tanzania 322 0.58 0.23 0.03 1.00 0.2 Normal
People search
Bangladesh 292 NA 0.18 0.67 0.04 0.60 1.2 Normal
Ghana 166 0.14 0.05 0.06 0.32 1.2 Normal
Tanzania 322 0.14 0.60 0.03 0.67 0.2 Positive
Literacy
Bangladesh 297 6 3.99 1.70 0 6 0 Normal
Ghana 166 1.85 1.32 0 5 1.2 Normal
Tanzania 323 4.19 1.40 0 6 0 Normal
Numeracy
Bangladesh 297 6 3.73 1.70 0 6 0 Normal
Ghana 166 2.74 1.06 0 6 1.2 Normal
Tanzania 323 3.69 1.47 0 6 0 Normal
^a Normal (approximating normal); Positive (mass of distribution concentrated on the left); Negative (mass of distribution concentrated on the right).
Tanzania a significant gender effect was found, favoring
males.
NOGO. Post hoc comparisons identified similar age effects in Ghana and Tanzania, with performance improving with age, whereas in Bangladesh younger children made fewer errors on the test.
Shift. Age had a significant effect on scores for
Bangladesh and Tanzania, showing linear improvement
with age. In Tanzania only the oldest children had
significantly higher scores.
People Search. A moderate effect of age was
observed in all three countries. A post hoc analysis
showed that performance improvement was linear in
Ghana and Tanzania, while in Bangladesh, the differ-
ence between 11 to 14 years and 15 to 18 years was
not significant.
Literacy and numeracy. In Bangladesh, multiple
comparison analyses showed an increase in scores with
age that was curvilinear, and not significant between 11
to 14 years and 15 to 18 years. The results for Ghana
and Tanzania displayed a more linear improvement in
scores with age for both tests. It can be concluded that,
despite considerable variation across sites and tests,
most instruments showed score increments across ages.
We were also interested in the relationship between the cognitive test scores and schooling. Unfortunately, school experience was not consistently measured across all sites in the pilot sample; these data were available only for the sample included in the analyses reported in the next section. We found that education was strongly related to the general intelligence factor; the standardized loadings were .65, .75, and .64 for Bangladesh, Ghana, and Tanzania, respectively. This association is strong. It should be noted that our design
is cross-sectional and selective drop-out from schooling
could have affected our samples. If drop-out is nega-
tively related to (among many other factors) previous
educational achievement, it is likely that the brighter
Table 9. Test responsiveness to age and gender: F values, df (η²).
                  Effect of age                                               Effect of gender
Test              Bangladesh          Ghana                Tanzania             Bangladesh          Ghana              Tanzania
Atlantis          3.63*, 283 (.025)   4.06*, 166 (.05)     19.27**, 322 (.11)   0.29, 289 (.00)     3.80, 166 (.02)    4.06, .045 (.01)
Hand movement     2.49, 290 (.02)     4.11*, 166 (.05)     11.66**, 323 (.07)   0.38, 290 (.00)     0.06, 166 (.00)    0.02, 323 (.00)
Footsteps         37.07**, 290 (.21)  14.25**, 166 (.015)  61.75**, 323 (.28)   25.45**, 290 (.08)  11.65, 164 (.07)   18.61**, 323 (.06)
Story completion  7.59**, 288 (.05)   11.65**, 166 (.13)   3                    2.91, 288 (.01)     3.26, 166 (.02)    0.01, 321 (.00)
KNT               39.95, 290 (.22)    30.24**, 166 (.27)   79.27**, 322 (.33)   2.82, 290 (.01)     14.69**, 166 (.08) 6.55, 322 (.02)
Rey O copy        16.24**, 292 (.10)  13.64**, 160 (.15)   45.54**, 292 (.22)   0.57, 289 (.00)     1.27, 160 (.01)    2.98, 322 (.01)
Rey O recall      5.65*, 292 (.04)    4.21*, 157 (.05)     52.47**, 320 (.25)   0.19 (.00)          1.94, 157 (.01)    4.83*, 320 (.02)
NOGO              29.35**, 191 (.24)  3.85*, 164 (.05)     27.73**, 322 (.15)   0.07, 191 (.00)     1.01, 164 (.01)    1.04, 322 (.00)
Shift             39.52, 290 (.22)    2.69, 164 (.03)      32.66**, 322 (.17)   0.03, 290 (.00)     0.09, 164 (.00)    3.27, 322 (.01)
PS efficiency     14.63**, 292 (.09)  25.59, 166 (.24)     63.44**, 322 (.29)   0.15, 292 (.00)     0.01, 166 (.00)    1.67, 322 (.01)
Literacy          27.01, 292 (.16)    38.05**, 166 (.32)   89.12**, 323 (.36)   0.65, 292 (.00)     2.51, 166 (.02)    1.20, 323 (.00)
Numeracy          16.51**, 292 (.10)  17.99, 166 (.18)     92.87*, 323 (.37)    2.27, 292 (.01)     0.29, 166 (.00)    0.05, 323 (.00)
**significant at .001. *significant at .05.
Table 8. Reliabilities of tests.
                     Test-retest (ICC)   Internal consistency^a,b   Inter-rater (kappa)
                     B    G    T         B     G     T              B    G    T
N                    80   131  13        297   166   323            58   64   54
Atlantis^a           .63  .58  .55       1.00  0.41  0.99           .96  .74  .75
Hand movements^a     .66  .76  .25       –     –     –              1    .67  .79
Footsteps^a          .68  .77  .64       0.93  0.86  0.81           .88  .64  .86
Story completion^a   .76  .71  .82       0.76  0.60  Neg            .96  .88  .86
Kilifi naming^b      .74  .82  .74       0.89  .69   .99            .94  .40  .22
Rey O copy           .80  .87  .91       –     –     –              1    .17  .14
Rey O recall         .81  .80  .80       –     –     –              1    .06  .13
NOGO                 .80  .48  .54       –     –     –              1    .84  .88
Shift                .72  .71  .49       –     –     –              .98  .63  .67
People search        .54  .69  .65       –     –     –              1    .80  .98
Literacy^b           .87  .92  .78       .84   .85   .85            –    –    –
Numeracy^b           .71  .56  .55       .82   .70   .70            –    –    –
Notes. ICC = intra-class correlation; B = Bangladesh; G = Ghana; T = Tanzania.
Consistency over time (test-retest reliability): For the majority of the tests, over all three sites there was a slight increase in the mean score from time 1 to time 2. The exceptions were the NOGO and Shift tasks, where scores declined over time. A significant correlation was achieved for the majority of the tests, most showing moderate (.5–.6) or strong (.7–.8) agreement. The pattern was similar, although not identical, across two of the sites; the exception was Tanzania, where the sample size was too small to draw clear conclusions. Consistency within the test items: Again, with the notable exception of Atlantis, test content indicated good levels of internal consistency, of .7 and above. Consistency across assessors (inter-rater): The results were very variable, ranging from no better than chance (nearing 0) to near-perfect or perfect agreement (>.9). The source of variability seems to be both test and team, with the team with greater previous experience achieving higher agreement levels.
^a Split half. ^b Alpha.
students are more likely to remain in education, which
could augment the correlation and performance across
age groups. It can be concluded that our results are con-
sistent with other studies in that education can be
expected to have a major influence on cognitive test
performance (Ceci, 1991; Falch & Sandgren, 2011;
Ritchie, Bates, & Deary, 2015).
Underlying constructs: The validity of the
instruments selected
Correlation coefficients are presented in Table 10. The majority of the correlations were positive and ranged from low to moderate, indicating a linear relationship among many variables. The highest coefficients were found between the Rey Osterrieth scores (Copy and Recall) in every site. Median correlations did not differ much across sites (Tanzania: Md = .29; Ghana: Md = .26; Bangladesh: Md = .35). There were no observable differences in the pattern of correlations between the tests used to measure General Intelligence and those used to measure Executive Functions.
Confirmatory factor analysis
We employed confirmatory factor analysis to address
(a) the dimensionality of the test battery and (b) the
invariance (similarity) of this factor structure across
the groups. Initially we tested a two-factor model,
distinguishing between a general intelligence and an
executive functioning factor. However, as mentioned,
the correlations between the intelligence and executive
functioning subtests were virtually indistinguishable.
The two-factor model had a poor fit and required multiple subtests to load on both latent factors.
Therefore, we opted for a one-factor solution. We found
a reasonable fit for a model in which all subtests loaded
on a single factor, although we had to allow for corre-
lated errors between Atlantis and People Search,
between Footsteps and NOGO, and between NOGO
and Shift (note that the latter are two of the executive
functioning tests). The results of the invariance tests
are presented in Table 11. The v
2
tests were highly
significant in all cases (which is common in multigroup
confirmatory factor analyses as this statistic is sensitive
to sample size) (Fan, Thompson, & Wang, 1999). How-
ever, all other statistics pointed to a fairly good fit of the
configural invariance model (which allows all estimated
parameters to differ across groups), CFI ¼.993 (recom-
mended: above .90), TLI ¼.900 (same recommended
value), SRMR ¼.043 (recommended: smaller than .05),
and RMSEA ¼.04 (recommended: smaller than .06).
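For concreteness, the one-factor model described above can be written in lavaan-style syntax (as used by R's lavaan and Python's semopy; the paper does not name its software, and the shortened variable names here are assumptions). The `=~` line defines the single general factor; the `~~` lines add the three correlated errors the model required:

```python
# Lavaan-style specification of the one-factor model with the three correlated
# errors described in the text. Sketch only: the software and variable names
# used in the original analysis are not stated in the paper.

model_spec = """
g =~ Atlantis + HandMovements + Footsteps + StoryCompletion + KNT +
     ReyRecall + NOGO + Shift + PeopleSearch
Atlantis ~~ PeopleSearch
Footsteps ~~ NOGO
NOGO ~~ Shift
"""
```

Passing a string of this form to a SEM package, once per country, and then with equality constraints added stepwise, reproduces the configural-to-residuals sequence of invariance tests.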
As can be seen in Table 11, all more restrictive invar-
iance models showed worse fit values, notably the
measurement residuals model. All factor loadings were
positive and significant, as can be expected. Yet, when
the loadings of the configural and measurement weights
are compared (Table 12), it is clear that the differences
in factor loadings are small and would not give rise
Table 10. Correlations between the sub-tests, by country.
Test                   Atlantis  Rey Osterrieth  Hand movements  Footsteps  NOGO    Shift   People search  Story completion
Bangladesh
Rey Osterrieth (copy)  .356**
Hand movements         .460**    .302**
Footsteps              .277**    .412**          .356**
NOGO                   .182**    .184**          .216**          .456**
Shift                  .201**    .171**          .263**          .478**     .670**
People search          .402**    .423**          .347**          .353**     .236**  .371**
Story completion       .384**    .317**          .349**          .247**     .021    .063    .364**
KNT                    .367**    .447**          .319**          .538**     .403**  .457**  .508**         .349**
Ghana
Rey Osterrieth (copy)  .288**
Hand movements         .349**    .242**
Footsteps              .271**    .195*           .292**
NOGO                   .059      .100            .129            .302**
Shift                  .202**    .080            .300**          .254**     .284**
People search          .248**    .200**          .335**          .405**     .160*   .265**
Story completion       .233**    .123            .217**          .207**     .059    .262**  .274**
Kilifi naming test     .336**    .322**          .276**          .399**     .188*   .172*   .507**         .353**
Tanzania
Rey Osterrieth (copy)  .360**
Hand movements         .292**    .256**
Footsteps              .263**    .291**          .296**
NOGO                   .245**    .255**          .130            .113
Shift                  .367**    .171*           .232**          .278**     .305**
People search          .283**    .438**          .243**          .355**     .215**  .238**
Story completion       .351**    .325**          .372**          .318**     .270**  .394**  .372**
Kilifi naming test     .497**    .355**          .190**          .446**     .231**  .333**  .438**         .459**
*p < .05. **p < .01.
to different interpretations of the subtests or the general factor across the three countries. We conclude that the loadings can be treated as identical across the countries, although it should be acknowledged that the statistical evidence does not fully support this conclusion.
Discussion and conclusion
We found that the systematic process of adaptation of
tests to three different cultural and linguistic contexts
led to instruments with adequate psychometric proper-
ties. The underlying variance in the scores on individual
tests supported the benefits of the adaptation process.
Engagement of children (access to and understanding
of the material) was reflected in the high proportion
of children who were able to attempt the tests, and in
the approximation to normality of the underlying
variance.
Most tests showed adequate reliabilities. However,
there is some concern over interrater consistency. The
source of variability between sites seems to have been
related to the team experience, with the Bangladesh
team (with two experienced psychologists in the field
and supervising all assessments) achieving higher agree-
ment levels. This finding led to changes in the system of supervision for the main data collection process: we repeated some training elements and increased the level of supervision. Subsequent assessments were evaluated against a standard observation guideline of assessment practice. Supervisors reported that the assessment teams improved their practice against these guidelines, and these changes led to improved inter-rater reliabilities after the piloting reported here.
The SEM model suggests that the variability in
outcome across sites is more efficiently explained in
reference to what might be termed “g,” a general under-
lying ability denoting general intelligence, in line with
literature on intelligence models. Despite all the differ-
ences in samples and various confounding factors, the
associations between the instruments are similar across
the three groups of children, with Executive Function
and Intelligence tests merging in all three groups.
Although not further documented here, splitting the sample into younger and older age groups did not change the one-factor solution. This is not in line with the Western literature, in which a distinction between general intelligence and executive functioning has been noted
general factor may reflect the numerous individual dif-
ferences in background variables, such as socioeco-
nomic status and opportunities to learn, which tend to
coalesce and are often related to drop-out. It could well
be that more homogeneous non-Western samples
would also allow for a sharper distinction between intel-
ligence and executive functioning.
The comparison of factor weights in the configural
invariance and measurement weights solution shows
that there are almost no differences, thus the cognitive
structure of the battery (with “g” in the apex) is very
stable. Scalar invariance is not supported, and thus a
direct comparison of scores cannot be made.
In summary, our data supports the feasibility of
applying a single battery across multiple settings, to
Table 11. Fit indexes of the invariance tests of the one-factor model.
Model                   χ² (df)           Δχ²        CFI   ΔCFI  TLI   SRMR  RMSEA
Configural invariance   162.61** (72)                .933        .900  .043  .044
Measurement weights     210.73** (88)     48.12**    .910  .023  .889  .069  .047
Measurement intercepts  898.10** (106)    687.37**   .416  .494  .405  .098  .108
Structural covariances  899.09** (108)    .99        .417  .001  .417  .111  .107
Measurement residuals   1525.01** (132)   625.92**   .000  .417  .160  .102  .129
**significant at .001.
Table 12. Factor loadings (standardized) of the configural invariance and measurement weights solutions.
                          Configural invariance           Measurement weights^a
Test                      Bangladesh  Ghana  Tanzania     Bangladesh  Ghana  Tanzania
Atlantis                  .52         .47    .60          .58         .46    .53
Hand movement             .50         .49    .43          .53         .39    .44
Footsteps                 .66         .57    .56          .66         .57    .46
Story completion          .47         .45    .64          .52         .59    .46
KNT                       .77         .71    .71          .74         .69    .73
Rey O recall              .59         .39    .55          .60         .22    .59
NOGO                      .43         .24    .38          .26         .14    .48
Shift                     .53         .38    .49          .44         .37    .49
People search efficiency  .67         .67    .60          .64         .63    .67
^a Note that the unstandardized factor loadings in the measurement weights solution are identical and that the differences in the cells are due to the standardization.
measure a common cognitive construct, although the specific methodology used in one context (language, materials, administration procedures) required modification to suit each new context (Holding et al., 2004). The process was time-consuming, but establishing equivalence ensured that the adaptation maintained acceptable reliability and validity and provided meaningful interpretations of test scores. While we found evidence of conceptual equivalence that supports the ability to make comparisons across countries, we cannot directly compare means across country data sets. Other methods of summarizing impact across settings, such as the comparison of effect sizes, are required. Additionally, the variability in performance on individual tests across settings does not allow us to examine performance on more discrete cognitive constructs.
References
Adetunde, I. A., & Akensina, A. P. (2008). Factors affecting the standard of female education: A case study of senior secondary schools in the Kassena-Nankana district. Journal of Social Sciences, 4, 338–342. doi:10.3844/jssp.2008.338.342
Berry, J. W., Poortinga, Y. H., Breugelmans, S. M., Chasiotis, A., & Sam, D. (2011). Cross-cultural psychology: Theory and applications (3rd ed.). Cambridge, UK: Cambridge University Press.
Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722. doi:10.1037/0012-1649.27.5.703
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Falch, T., & Sandgren, M. S. (2011). The effect of education on cognitive ability. Economic Inquiry, 49, 838–856. doi:10.1111/j.1465-7295.2010.00312.x
Fan, X., Thompson, B., & Wang, L. (1999). Effects of sample size, estimation methods, and model specification on structural equation modeling fit indexes. Structural Equation Modeling: A Multidisciplinary Journal, 6, 56–83. doi:10.1080/10705519909540119
Friedman, N. P., Miyake, A., Corley, R. P., Young, S. E., DeFries, J. C., & Hewitt, J. K. (2006). Not all executive functions are related to intelligence. Psychological Science, 17, 172–179. doi:10.1111/j.1467-9280.2006.01681.x
Gomes, M., Faiz, M. A., Gyapong, J., Warsame, M., Agbenyega, T., Babiker, A., ... White, N. J., for the Study 13 Research Group. (2009). Pre-referral rectal artesunate to prevent death and disability in severe malaria: A placebo-controlled trial. Lancet, 373, 557–566. doi:10.1016/s0140-6736(08)61734-1
Helms-Lorenz, M., Van de Vijver, F. J. R., & Poortinga, Y. H. (2003). Cross-cultural differences in cognitive performance and Spearman's hypothesis: g or c? Intelligence, 31, 9–29. doi:10.1016/s0160-2896(02)00111-3
Holding, P., Abubakar, A., & Kitsao Wekulo, P. (2010). Where there are no tests: A systematic approach to test adaptation. In M. L. Landow (Ed.), Cognitive impairment: Causes, diagnosis and treatments (pp. 189–200). New York, NY: Nova Science Publishers.
Holding, P., & Boivin, M. J. (2013). The assessment of neuropsychological outcomes in pediatric severe malaria. In M. J. Boivin & B. Giordani (Eds.), Neuropsychology of children in Africa: Perspectives on risk and resilience, specialty topics in pediatric neuropsychology (pp. 235–275). New York, NY: Springer.
Holding, P., & Kitsao-Wekulo, P. (2009). Is assessing participation in daily activities a suitable approach for measuring the impact of disease on child development in African children? Journal of Child & Adolescent Mental Health, 21, 127–138. doi:10.2989/jcamh.2009.21.2.4.1012
Holding, P. A., Taylor, H. G., Kazungu, S. D., Mkala, T., Gona, J., Mwamuye, B., Mbonani, L., & Stevenson, J. (2004). Assessing cognitive outcomes in a rural African population: Development of a neuropsychological battery in Kilifi District, Kenya. Journal of the International Neuropsychological Society, 10, 246–260. doi:10.1017/s1355617704102166
Hui, C. H., & Triandis, H. C. (1985). Measurement in cross-cultural psychology: A review and comparison of strategies. Journal of Cross-Cultural Psychology, 16, 131–152. doi:10.1177/0022002185016002001
Kitsao-Wekulo, P., Holding, P., Taylor, H. G., Abubakar, A., Kvalsvig, J., & Connolly, K. (2012). Neuropsychological testing in a rural African school-age population: Evaluating contributions to variability in test performance. Assessment, 20, 776–784. doi:10.1177/1073191112457408
Osterrieth, P. A. (1944). Le test de copie d'une figure complexe: Contribution à l'étude de la perception et de la mémoire [The test of copying a complex figure: A contribution to the study of perception and memory]. Archives de Psychologie, 30, 286–356.
Pharoah, P. O., & Connolly, K. J. (1993). Effects of maternal iodine supplementation during pregnancy. Archives of Disease in Childhood, 66(1), 145–147. PMCID: PMC1793189
Rey, A. (1941). L'examen psychologique dans les cas d'encéphalopathie traumatique (Les problèmes) [The psychological examination in cases of traumatic encephalopathy (Problems)]. Archives de Psychologie, 28, 215–285.
Ritchie, S. J., Bates, T. C., & Deary, I. J. (2015). Is education associated with improvements in general cognitive ability, or in specific skills? Developmental Psychology, 51, 573–582. doi:10.1037/a0038981
United Nations Educational, Scientific, and Cultural Organization (UNESCO), Institute for Statistics. (2013). Retrieved from http://www.uis.unesco.org/DataCentre/Pages/country-profile.aspx?code=TZA&regioncode=40540
United Nations Educational, Scientific, and Cultural Organization (UNESCO), Institute for Statistics. (2015). UNESCO Bangladesh Survey, 2005. Retrieved from http://www.uis.unesco.org/DataCentre/Pages/country-profile.aspx?code=TZA&regioncode=40540
UWEZO Tanzania Tests. (2012). Adult and youth literacy, 1990–2015: Analysis of data for 41 selected countries. Retrieved from http://www.uis.unesco.org/literacy/Documents/UIS-literacy-statistics-1990-2015-en.pdf
Van de Vijver, F. J. R. (1997). Meta-analysis of cross-cultural comparisons of cognitive test performance. Journal of Cross-Cultural Psychology, 28, 678–709. doi:10.1177/0022022197286003
Van de Vijver, F. J. R. (2002). Inductive reasoning in Zambia, Turkey, and The Netherlands: Establishing cross-cultural equivalence. Intelligence, 30, 313–351. doi:10.1016/s0160-2896(02)00084-3
Van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Newbury Park, CA: Sage.
Werner, O., & Campbell, D. T. (1970). Translating, working through interpreters, and the problem of decentering. In R. Naroll & R. Cohen (Eds.), A handbook of cultural anthropology (pp. 398–419). New York, NY: American Museum of Natural History.