Applied Neuropsychology: Child
ISSN: 2162-2965 (Print) 2162-2973 (Online) Journal homepage: http://www.tandfonline.com/loi/hapc20
Can we measure cognitive constructs consistently
within and across cultures? Evidence from a test
battery in Bangladesh, Ghana, and Tanzania
Penny Holding, Adote Anum, Fons J. R. van de Vijver, Maclean Vokhiwa,
Nancy Bugase, Toffajjal Hossen, Charles Makasi, Frank Baiden, Omari
Kimbute, Oscar Bangre, Rafiqul Hasan, Khadija Nanga, Ransford Paul Selasi
Sefenu, Nasmin A-Hayat, Naila Khan, Abraham Oduro, Rumana Rashid,
Rasheda Samad, Jan Singlovic, Abul Faiz & Melba Gomes
Published online: 27 Jul 2016.
APPLIED NEUROPSYCHOLOGY: CHILD
http://dx.doi.org/10.1080/21622965.2016.1206823
Can we measure cognitive constructs consistently within and across cultures?
Evidence from a test battery in Bangladesh, Ghana, and Tanzania
Penny Holdinga, Adote Anumb, Fons J. R. van de Vijverc, Maclean Vokhiwad, Nancy Bugasee, Toffajjal Hossenf,
Charles Makasig, Frank Baidenh, Omari Kimbutei, Oscar Bangreh, Rafiqul Hasanf, Khadija Nangaj, Ransford Paul
Selasi Sefenuh, Nasmin A-Hayatk, Naila Khanl, Abraham Odurom, Rumana Rashidf, Rasheda Samadn, Jan Singlovico,
Abul Faizp,q, and Melba Gomesr
aUnited Nations Children’s Fund (UNICEF), KMTECH, Nairobi, Kenya; bDepartment of Psychology, University of Ghana, Legon, Ghana;
cCross-Cultural Psychology, Tilburg University, The Netherlands; dBlantyre Malaria Project, College of Medicine, University of Malawi, Malawi;
eNavrongo Health Research Centre, Ghana Health Service, Navrongo, Ghana; fResearch Physician, Malaria Research Group, Chittagong,
Bangladesh; gNational Institute of Medical Research, Muhimbili Medical Research Centre, Dar-es-Salaam, Tanzania; hEpidemiology, Ensign
College of Public Health, Kpong, Eastern Region, Ghana; iNational Institute of Medical Research, Muhimbili Medical Research Centre, Dar-es-
Salaam, Tanzania; jDepartment of Epidemiology and Disease Control, School of Public Health, University of Ghana, Accra, Ghana; kChild
Development Centre, Chittagong Maa- Shishoo O General Hospital, Chittagong, Bangladesh; lDepartment of Pediatric Neurosciences,
Bangladesh Institute of Child Health, Dhaka Shishu (Children’s) Hospital, Dhaka, Bangladesh; mEpidemiology & Community Medicine,
Bangladesh Institute of Tropical and Infectious Disease (BITID), Chittagong, Bangladesh; nAssociate Professor of Pediatrics, Chittagong Medical
College, Chittagong, Bangladesh; oWorld Health Organization, Geneva, Switzerland; pMahidol Oxford Research Unit, Faculty of Tropical
Medicine, Mahidol University, Bangkok, Thailand; qDev Care Foundation, Dhaka, Bangladesh; rThe UNICEF/UNDP/World Bank/WHO Special
Programme for Research and Training in Tropical Diseases, Geneva, Switzerland
ABSTRACT
We developed a test battery for use among children in Bangladesh, Ghana, and Tanzania, assessing
general intelligence, executive functioning, and school achievement. The instruments were drawn
from previously published materials and tests. The instruments were adapted and translated in a
systematic way to meet the needs of the three assessment contexts. The instruments were
administered by a total of 43 trained assessors to 786 children in Bangladesh, Ghana, and Tanzania
with a mean age of about 13 years (range: 7–18 years). The battery provides a psychometrically
solid basis for evaluating intervention studies in multiple settings. Within-group variation was
adequate in each group. The expected positive correlations between test performance and age
were found, and reliability indices yielded adequate values. A confirmatory factor analysis (not including the literacy and numeracy tests) showed a good fit for a model merging the intelligence and executive function tests into a single factor labeled general intelligence. Measurement weights invariance was found, supporting conceptual equivalence across the three country groups but not full score comparability across the three countries.
KEYWORDS: Bangladesh; children; executive functioning; Ghana; intelligence; Tanzania; test adaptations
The challenge in evaluating cognitive development
across different contexts is to ensure comparability of
skills and functions across settings, while also maintain-
ing the ability to discriminate individual ability levels
within populations. The motivation of this project was
to have a common battery to be applied across the mul-
tiple sites of a single study. There is much cross-cultural
evidence that the structure of intelligence is invariant
across cultures but that instruments may need smaller
or larger adaptations to be applicable across cultural
contexts (e.g., Berry, Poortinga, Breugelmans, Chasiotis,
& Sam, 2011). As we are ultimately interested in
assessing cognitive consequences of malaria in different
cultural contexts, we did not develop country-specific
instruments, but used a single culture-informed battery.
Adaptations are frequently made to the content and
administration of instruments, largely developed in
western settings, to reflect the experiences of the popu-
lation being assessed and to retain within-population
variance. There is a growing body of literature that
provides evidence that carefully adapted batteries of
tests provide reliable and valid measures of cognition
in multiple settings (Holding et al., 2004; Kitsao-
Wekulo et al., 2012; Van de Vijver, 2002). The current
study examined the suitability of a cognitive test battery
that was applied in three very different cultural settings
(Bangladesh, Ghana, and Tanzania).
The extent to which adaptations maintain the orig-
inal intention of the test, while increasing the ability
of the test to accurately discriminate ability levels within
CONTACT Melba Gomes gomesm@who.int The UNICEF/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases,
World Health Organization, 1211 Avenue Appia, Geneva 27, Switzerland.
© 2016 Taylor & Francis
the new cultural setting, remains controversial. Despite
extensive discussion on the universality of cognitive
constructs (Berry et al., 2011; Van de Vijver, 1997;
Van de Vijver & Leung, 1997), there are very few studies
that address the extent to which different tests or
batteries of tests are able to measure the same cognitive
constructs in a comparable manner across different
economic, cultural, and linguistic groups (e.g.,
Helms-Lorenz, Van de Vijver, & Poortinga, 2003).
Hui and Triandis (1985) argue that a fundamental chal-
lenge in creating equivalence is that the instrument or
test items should be similar or the same. In other words,
each item on the test should mean the same in both cul-
tures. This limits not only cross-cultural comparability
but also cross-cultural adaptability of tests because of
cultural and linguistic differences.
The need to construct a uniform test battery for
children in Bangladesh, Ghana, and Tanzania was
prompted by the plan to perform a detailed investigation
of the long-term impact of a childhood episode of severe,
predominantly malarial, infection that required hospital
admission with symptoms ranging from stupor to
convulsions and deep coma. Patients with these symp-
toms in early childhood had been part of a randomized,
placebo-controlled treatment trial (RCT) that prevented
death and serious neurological sequelae (Gomes et al.,
2009). Whether treatment offered survivors any lasting
protection against potential harm from parasite conges-
tion in the cerebral microvasculature or whether severe
infections nevertheless left children with long term
cognitive and clinical impairment was to be evaluated.
Before testing the severe malaria cohort, we wished to
explore the psychometric properties of the test battery
itself in apparently normal children from the same areas.
The intention was to examine the robustness of the mea-
sures and the constructs applied across the three settings.
The study was part of a larger series of studies supported
by the Saving Brains Programme, Grand Challenges
Canada (SB)¹ that focused on the re-enrolment of children
previously the target of an intervention designed to protect
against potential risks to brain development. The SB pro-
gramme required all participating studies to investigate
the impact of the diverse interventions upon a set of Core
Metrics, a term used in the programme to denote key con-
cepts of development common to all projects. These were
labeled as: General Intelligence, Executive Functions, and
the development of Literacy Skills. We selected tests to
measure each of these three core metrics based on both
previous evidence of the effects of severe malaria (Holding
& Boivin, 2013; Holding et al., 2004) and culture-relevant
test adaptations (Kitsao-Wekulo et al., 2012).
The three populations in which our original study
took place share some features that led to their initial
selection, with all three being low resource, largely rural
communities. They differed, however, in other character-
istics that made it necessary to alter the content of the test
battery to make the material understood by the children
to be evaluated. The adaptation process thus addressed
differences in language and culture, while trying to max-
imize uniformity of the tools. Test performance in the
three populations was examined to assess the extent to
which the tests could discriminate between children at
the three study sites and to test to what extent a single
battery of assessments could provide comparable and
accurate information across multiple cultural settings.
The age range of the severe malaria cohort dictated the 7- to 18-year age range for this test-adaptation study, which was conducted in the same study areas of the three countries.
Methods
Study population and sample characteristics
Study sites were located in Bangladesh, Ghana, and
Tanzania. The sites selected had in common: a rural
ambience, relative poverty, constraints in access to
health care, and restricted levels of literacy amongst
the adult population. These factors, and the risk of
malaria infection, influenced the initial selection of the
sites themselves. For this study children living in each
of the main study locations, who were independent
from the main re-enrolment cohorts were identified.
Bangladesh
The RCT was originally carried out in four sites of
Chittagong District in south-eastern Bangladesh. The
majority of the population in these locations speaks the
local language, Chittagonian. English is taught in schools
as a second language. Adult literacy across the study sites
varies from 28% to 50%, with male adult literacy at 64.8% and female adult literacy at 58% in 2015 (UNESCO, 2015).
Ghana
In Ghana the RCT was conducted in Kassena-Nankana
East and West Districts in the Upper-East region of
Ghana. Two main languages are spoken, Kasim and
Nankam, although English, the official language, is the
main language of instruction in schools, and also spo-
ken by many. Despite school attendance being above
90% (slightly favoring males), adult literacy rates are
low, with 65.5% of the population above 15 years
having received no formal education and more females
(74.6%) being uneducated than males (54.4%)
(Adetunde & Akensina, 2008).
¹http://www.grandchallenges.ca/saving-brains/.
Tanzania
Test piloting in Tanzania was carried out in Kilosa and
Handeni Districts, where Kiswahili is widely spoken,
although there are other indigenous languages used.
National adult literacy rates in Tanzania are estimated
at 75.9% for females and 84.8% for males, although
they are likely to be lower in rural areas (United Nations
Educational, Scientific and Cultural Organization
(UNESCO), Institute for Statistics, 2013). More detailed
information on the population characteristics for each
site is presented in Table 1.
Measures
Tests were selected to measure General intelligence,
Executive Function (Working Memory, Selective/Sus-
tained Attention and Inhibition & Attentional Shift),
and Achievement (Literacy and Numeracy). The instru-
ments selected are described in Table 2. The preparation
of these tests followed a systematic adaptation pro-
cedure for neurocognitive and psychological measures
that has been previously described (Holding, Abubakar,
& Kitsao Wekulo, 2010; Holding & Kitsao-Wekulo,
2009).
The first procedural step was to clearly define the
concepts and constructs to be measured. As highlighted
in the introduction, we were guided by the requirements
of the Core Metrics framework, which identified three
general concepts (General Intelligence, Executive Func-
tion, and Development of Literacy Skills). A review of
the evidence on functional areas that are sensitive to
the effects of malaria infection was used to support
the identification of component constructs of these
general concepts (Holding & Boivin, 2013). General
Intelligence was defined by the constructs identified in
the Lurian Model of the Kaufman Assessment Battery
for Children II (Learning, Sequential and Simultaneous
Processing, and Planning). To further broaden the
Table 1. Background characteristics of study locations.

| Characteristic | Bangladesh | Ghana | Tanzania |
|---|---|---|---|
| Location | South East | North-Eastern | Northern |
| Population | 990,657 | 153,293 | 438,175 |
| Language | Bengali (dialects: Chittagonian, Chak, "Rakhain," "Marma," "Tanchanga," "Tripura") | Kasim and Nankam | Kiswahili (indigenous languages: Kaguru, Sagara, Vidunda, and Nyamsanga) |
| Economic activity | Agriculture and day labour | Subsistence farming and small-scale business | Agriculture 80%, commerce and tourism |
| Economic levels | Between 17.6% and greater than 55% below the poverty line across the district | 30.6% in North East and 42.5% in the Kassena-Nankana district | Rural areas: 23.1% below the food poverty line and 40.8% below the basic-needs poverty line |
| Nutritional status, children <5 years | Severe stunting 15.3%, severe wasting 4%, severely underweight 10.4% | Stunting 25.8% | Stunting 42.1%, wasting 4.8% |
| Access to health facilities | 4 to 10 km from household | 42.3% take less than 30 min to reach the nearest health facility | Time to the nearest health facility approximately 60 min |
| Family structure | Multigenerational patriarchal extended family, average family size 4.36 | Mixed-generation extended family system, average family size 7.2 | Monogamy and polygamy practiced; average household size 4.4 |
| Children's daily activities | Boys: cultivation, farming, cutting wood. Girls: household chores, cooking, washing, fetching water, sewing clothes, and caring for poultry or cattle | Boys: gardening, farming, gathering firewood and seeds, sowing seeds, tending livestock. Girls: household chores, cooking, washing, fetching water | Boys: cultivation, grazing cattle. Girls: collecting firewood, grazing calves and goats, fetching water, cooking, and assisting in care of younger siblings |
| Official age of school entry | 6 years | 6 years | 6 years |
Table 2. Description of tests validated.

| Test name | Description |
|---|---|
| Atlantis | This test requires children to associate a series of nonwords with pictures of fishes, plants, and shells. |
| Hand movements | The child is expected to repeat a series of hand movements performed by the assessor. The number of movements increases as the trial increases. |
| Footsteps | The child has to select the shortest route of footsteps for a small doll to fetch its ball. |
| Story completion | The child is shown a selection of pictures that, when placed in the correct order, tell a story. |
| Kilifi naming test | A test of expressive vocabulary in which the child labels each of a selection of pictures. |
| Rey-Osterrieth complex figure | The child has to reproduce a complex figure drawing, first by copying, and after 20 min, from memory. |
| NOGO | Similar to the classic NOGO paradigm, in which participants have to learn to withhold a response when a previously associated stimulus is presented. Hand movements were used as the stimuli. |
| Shift | Children are primed to switch responses to a series of hand movements when a specific trigger movement is displayed. |
| People search | The child scans a page of stick figures and selects a specific figure from amongst different figures. |
| Literacy test | Items sample letter shape and sound recognition, as well as reading comprehension and writing fluency. |
| Numeracy test | Items sample number recognition, arithmetic, and problem-solving skills. |
range of skills included in the battery to include those
previously explored in the investigation of severe
malaria we added the construct of Expressive Language.
To assess Executive Function we selected tests that
measured the constructs prescribed in the Saving Brains
Programme Core Metrics, that is: Working Memory
and Attention (Sustained Attention, Shift, and Inhibi-
tory Control). Achievement, the application of skills in
school based learning, was measured by the develop-
ment of literacy and numeracy skills.
The next step was to identify a potential pool of
measures of the concepts and constructs, and to review
their content for potential challenges to engagement.
Where possible, our test pool of measures for each core concept was constituted from the existing literature, highlighting the theoretical frameworks of both published and open-source measures previously used in similar contexts.
Test preparation involved translation of the instruc-
tions, and piloting of the stimuli, visual and verbal, as
well as of the administration procedures. The key aspects
of test equivalence that the adaptation process strove to
maintain were: (a) item equivalence, such that each item
should contribute in a similar manner to the overall test
score; (b) semantic equivalence, such that each item
should mean the same thing in each context; and (c) pro-
cedural equivalence, such that each test or item should be
administered in an equivalent manner. The first two
focus on content, and the last on administration.
A panel of experts that included key personnel from
the three different locations consolidated culturally
appropriate conceptual vocabularies to guide the prep-
aration of materials for each site. The panel focused
on the content, sensitivity, and face value of the tools
against the cultural background of children in the
locality in which the work was to be implemented.
Guidelines for conceptual translations and item selec-
tion were discussed at a workshop attended by represen-
tatives of each study site. A rigorous review of the
individual items and level of difficulty was subsequently
established in a pre-pilot in children of the target age
group at each location.
Instructions for the tests were produced in each
of the respective local languages through a multistep
process. The first step—translation into the local
language—was checked through multiple iterations of
a back-translation process, to evaluate the semantic
and conceptual equivalence of the translations to the
original instruction (Werner & Campbell, 1970). Three
independent back translations were made into English
to refine the translation process. Further refinement of
the instructions and general assessment procedures fol-
lowed close observation of children’s responses during
the pre-piloting process.
Visual stimuli, images or pictures, were screened for
cultural relevance. For example, a party scene with cakes
and balloons that would not be familiar to a child in any
of the study locations was replaced by a culturally equiva-
lent image of a celebration that was easily recognizable to
children in all three countries. If a replacement item was not appropriate for all sites, site-specific alternatives were selected. Table 3 summarizes the modifications made.
Training assessors
With the exception of Bangladesh, where assessors
were chosen in response to a classified advertisement
Table 3. Test source and adaptation process.

| Core concept | Construct | Test name | Source | Visual stimuli | Verbal stimuli | Procedures |
|---|---|---|---|---|---|---|
| General intelligence | Learning | Atlantis | KABC II | No changes | Changed pronunciation | Extended instructions, changed start/stop rules |
| General intelligence | Sequential processing | Hand movements | — | No changes | Nonverbal | No changes |
| General intelligence | Simultaneous processing | Footsteps | Local adaptation of KABC II sub-tests | 100% change in pictures and layout | Nonverbal | Extended instructions, changed start/stop rules |
| General intelligence | Planning | Story completion | — | 90% of images replaced | Nonverbal | Extended instructions, changed start/stop rules |
| General intelligence | Verbal intelligence | Kilifi naming test (KNT) | Wekulo and Holding, personal communication | 10% of images replaced | 100% translated | Item order |
| Executive function | Working memory | Rey-Osterrieth complex figure | Rey (1941) and Osterrieth (1944) | No change | Nonverbal | No changes |
| Executive function | Inhibitory control | NOGO | Original | No change | Nonverbal | — |
| Executive function | Attentional shift | Shift | Original | No change | Nonverbal | — |
| Executive function | Selective/sustained attention | People search | Connolly and Pharoah (1993) and Holding et al. (2004) | Layout and length | Nonverbal | No changes |
| Achievement | Literacy and numeracy | Literacy and numeracy tests | Local adaptations of UWEZO TZ (2012), UNESCO Bangladesh Survey (2005), WRAT | Script and language to match local context | — | — |
requiring candidates with a psychological or child devel-
opment background, most assessors elsewhere had no
prior experience in testing procedures. Some had a
background in psychology, but most were drawn from
an institutional advertisement of the research post
available at the different study sites. All had post high
school education (degrees or diplomas). They were
primarily selected for their fluency in the local lan-
guages and prior health research experience, rather than
experience in assessment. Table 4 summarizes assessor
characteristics.
In all three countries, the assessors’ training was con-
ducted and supervised by qualified psychologists who
accredited the assessors only after demonstration of an
acceptable skill level in test administration, based on
set criteria. Training involved a combination of theory
and practice sessions of test administration spanning
four weeks. In the first week, a workshop covered mod-
ules on the basic elements of child development;
research methods; theoretical models for assessment,
particularly the Luria model selected for this study;
and managing individual differences in a standardized
research setting. In the second week, the assessors were
introduced to the practice of testing, starting with test-
ing peers and then young children usually selected from
schools. Detailed feedback was provided evaluating the
performance of the assessors against a standard guide-
line on assessment techniques. This supervision and feedback was provided in Tanzania through the use of videotaped sessions; in Bangladesh, two psychology supervisors observed the test administrations; and in Ghana, the supervising psychologist provided oversight.
Data collection process
Different samples of children not involved in the main
re-enrolment study were recruited through schools and
the general community to evaluate test-retest and inter-
rater reliabilities. Those with obvious medical or neuro-
logical signs of disability at the time of assessment were
excluded. The retest phase was completed within two to
three weeks after initial testing at all sites. In Ghana, all
children were selected only from schools, in Bangladesh
they were selected from the schools and community,
while in Tanzania the children for test-retests and
inter-rater reliabilities were selected only from the
community. Inclusion criteria were children from the
appropriate age bracket (>6–18 years of age). After
the test-retest process was satisfactorily completed, the
tests were used in an expanded sample of healthy chil-
dren drawn from schools, to represent what might be
considered those developing “optimally” in their context.
Ethical Clearance for this study was provided by
University of Oxford (OXTREC), Ethical Review
Committee of Chittagong Medical College, Bangladesh,
the Institutional Review Board of Navrongo Health
Research Centre, Ghana, and the Ethics Review Com-
mittee of the National Institute of Medical Research,
Tanzania. Human data included in this manuscript
were obtained in compliance with the WMA Helsinki
Declaration. There was no reward for participation.
All children who participated were served lunch to ensure they were fed during testing.
Administration procedures
Each child was tested following the same protocol in all
three sites. Each assessor assessed two children each day
for four days per week. The tests were presented in the
same order: Atlantis, Rey-Osterrieth, Hand Movements,
Footsteps, NOGO, Shift, People Search, Story Com-
pletion, and KNT. The duration of assessment for each
child was about two and half hours. Breaks were taken
in administration when necessary. While essentially a
common procedure was followed, the specific protocol
followed in each country was dictated by local require-
ments. These are outlined in the following sections.
Bangladesh
Assessment was conducted in the residence of the
children. A prior appointment was made before the
assessment. Informed consent was taken directly from
the subject if he/she was 18 years of age; if the child
was aged 8–9 years, consent was taken from parents/guardians; and if aged 10–17 years, consent was taken from the guardian and assent from the child, after the assessors had described the details of the assessment process.
Ghana
Two schools were selected from the Kassena-Nankana
Districts where the two predominant ethnic groups
reside. One was an elementary school (Year 1 to 6),
the other a Junior High School (Year 1 to 3). The assess-
ments were carried out in empty classrooms using
furniture provided by the research team, during school
hours (between 8 a.m. and 3 p.m.). Permission was first sought
from the Ghana Education Service District office which
Table 4. Summary of assessor characteristics for each country.

| Country | Assessors recruited | Requiring extended training / failing to pass | Gender (male/female) | Years of education completed, mean (SD) |
|---|---|---|---|---|
| Bangladesh | 12 | 0/0 | 7/5 | 14.25 (1.86) |
| Ghana | 12 | 2/2 | 5/7 | 16 (0) |
| Tanzania | 19 | 5/3 | 7/12 | 14.74 (1.52) |
sent notices to the schools that, in turn, informed the
parents. Informed consent was taken from available par-
ents. If a child was selected, but the parent was not avail-
able, consent was sought through the Chairperson of the
Parent-Teacher Association in the village. Assent was
also sought from children.
Tanzania
Assessments were carried out at the convenience of the
family where a suitable environment was available in
homes, close to home or at school. Assessment sessions
in schools were carried out either when schools were
closed, or in classrooms separate from the main school
compound. Informed consent was taken from a parent/
guardian a day before the planned assessments were car-
ried out. Detailed information was given to the parent/
guardian about the study and the aims of the project.
After checking that the parent/guardian understood
what was being explained, and was in agreement, signed
consent was taken. Informed assent was taken from the
child just before assessments.
Analytic plan
STATA was used for data management and transforma-
tions of variables. The psychometric analyses were car-
ried out using SPSS version 19. Confirmatory factor
analysis was conducted using AMOS version 22.
As the main focus of the analysis was to evaluate the
consistency with which individual tests behaved
between different linguistic and cultural settings, perfor-
mance on each test from the battery was evaluated
separately by country. Analyses investigated:
a) Within-population variance, measured through an examination of the distribution of test scores;
b) Reliability, measured through the evaluation of:
i. Consistency across items within a test with mul-
tiple items (internal consistency) using Guttman’s
Split half for tests where item variability is
primarily by degree of difficulty, and Cronbach’s
alpha where items are intended to also sample
related, but not identical skills.
ii. Consistency across time (test-retest reliability)
using the Intraclass Correlation, to take into
account the use of multiple assessors.
iii. Consistency across assessors (inter-rater
reliability) using Kappa Coefficients.
c) Responsiveness through a series of univariate analy-
sis exploring the relationship between variability in
test performance and the background characteristics
of the children, age and gender differences, as well as
school exposure.
d) Confirmatory Factor Analyses (CFA) was underta-
ken to investigate measurement invariance, that is
the assumption that the underlying association
between the nine sub-tests that constituted the neu-
rocognitive battery were comparable across coun-
tries, and to what level that comparability is
evident (Van de Vijver & Leung, 1997). To carry
out this analysis the scores were standardized to
obtain a similar comparative unit for all measures.
In the first step we examined configural invariance,
testing whether there was a single factor, general
intelligence, in each country with significant loadings
on each subtest, accounting for the correlations
between tests that we found in each country.
The second step, examining measurement weights
(metric invariance), investigated whether the factor
loadings were identical across countries, indicating that
each test made the same contribution to the general fac-
tor in each country.
The third step tested measurement intercept invar-
iance (also called differential item functioning). This
analysis tested whether the regression line that links
the latent factor, intelligence, to subtest scores had the
same intercept (made the same initial contribution) to
each group. This is commonly used as test of scalar
invariance, required to support the integration of scores
across countries in an analysis of variance.
The next step explored invariance of the structural
covariance. This examined whether the error variance
of the latent factor is identical across countries. Finally,
the analysis of measurement residuals tested whether the
error components of the observed variables are
identical.
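As an illustration of the internal-consistency indices listed under (b), both coefficients can be computed directly from a children-by-items score matrix. This is only a sketch of the standard formulas, not the study's analysis code (which used SPSS); the function names and the odd/even item split used for the split-half are our own assumptions.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_children x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def guttman_split_half(items):
    """Guttman's split-half reliability using an odd/even item split,
    appropriate when items are ordered primarily by difficulty."""
    items = np.asarray(items, dtype=float)
    half_a = items[:, 0::2].sum(axis=1)
    half_b = items[:, 1::2].sum(axis=1)
    total_var = (half_a + half_b).var(ddof=1)
    return 2 * (1 - (half_a.var(ddof=1) + half_b.var(ddof=1)) / total_var)
```

For perfectly consistent responses both coefficients approach 1; in practice they are computed per test and per country, as in the reliability analyses reported here.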
The first three analyses are usually considered the
most important as they indicate whether there is a joint
latent factor (configural invariance), whether the latent
factor is measured the same way in each country
(measurement weights invariance), and whether scores
can be directly compared across countries (measure-
ment intercepts). We used various criteria to evaluate the goodness of fit of our tested models: χ² values should be nonsignificant to support a good fit, Δχ² values should be nonsignificant, values of the Comparative Fit Index (CFI) should be .90 or above, decreases in CFI between subsequent analyses should not be larger than .01, Tucker-Lewis Index (TLI) values should be above .90, Standardised Root Mean Residual (SRMR) values should be .06 or less, and Root Mean Square Error of Approximation (RMSEA) values should be less than .06. As χ² tests are known to be sensitive to sample size, we did not rely on their outcomes in a rigid way, but examined the global constellation of fit indices.
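The screening logic above can be collected into a small helper. The cutoffs mirror those stated in the text, but the function itself is illustrative and not part of the original analysis:

```python
# Sketch: check a set of model fit indices against the cutoffs stated in the
# text. Illustrative only; the thresholds come from the paper, the code does not.

def evaluate_fit(cfi, tli, srmr, rmsea, delta_cfi=None):
    """Return pass/fail flags for the CFI, TLI, SRMR, RMSEA (and ΔCFI) cutoffs."""
    checks = {
        "CFI >= .90": cfi >= 0.90,
        "TLI >= .90": tli >= 0.90,
        "SRMR <= .06": srmr <= 0.06,
        "RMSEA < .06": rmsea < 0.06,
    }
    if delta_cfi is not None:
        # Drop in CFI relative to the previous, less restrictive model.
        checks["drop in CFI <= .01"] = delta_cfi <= 0.01
    return checks

# Values reported for the configural invariance model (Table 11):
flags = evaluate_fit(cfi=0.933, tli=0.900, srmr=0.043, rmsea=0.044)
```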
6 P. HOLDING ET AL.
Downloaded by [Melba Gomes] at 11:35 27 July 2016
Results
In total, 786 children, with ages ranging from 6 to 18 years, were tested across the three countries. A summary of sample size and age characteristics is presented in Table 5. The sample from Ghana was smaller, and the only one in which all children were in school. Table 6 describes the rounds of data collection, highlighting the numbers of children available for the different levels of analysis.
MCAR tests showed nonsignificant results in Ghana and Tanzania and a significant value in Bangladesh (χ²(31) = 70.65, p < .001). On the basis of these results, it was decided to replace missing values using an EM algorithm.
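As an illustration of the EM idea, the sketch below fills missing scores in one variable under an assumed bivariate normal model: the E-step replaces each missing value with its conditional mean (a regression prediction from the fully observed variable), the M-step re-estimates the regression, and the two steps iterate until the imputations stabilise. The paper does not specify its implementation, and the data below are hypothetical:

```python
# EM-style imputation sketch for one variable (y) with missing entries,
# predicted from a fully observed variable (x). Assumes a bivariate normal
# model; illustrative only, not the paper's actual algorithm.

def em_impute(x, y, n_iter=50):
    """Return y with None entries filled by iterated regression imputation."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    # Start from the observed-data mean of y.
    fill = sum(yi for _, yi in obs) / len(obs)
    y_hat = [yi if yi is not None else fill for yi in y]
    for _ in range(n_iter):
        # M-step: least-squares slope and intercept from the completed data.
        n = len(x)
        mx = sum(x) / n
        my = sum(y_hat) / n
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y_hat))
        sxx = sum((xi - mx) ** 2 for xi in x)
        b = sxy / sxx
        a = my - b * mx
        # E-step: replace each missing y by its conditional mean given x.
        y_hat = [yi if yi is not None else a + b * xi
                 for xi, yi in zip(x, y)]
    return y_hat

x = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical predictor scores
y = [2.1, 4.0, None, 8.1, 9.9]       # one missing outcome score
completed = em_impute(x, y)
```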
Within-population variance. Table 7 shows that scores for all tests across all three sites ranged from the lowest towards the maximum possible. In only one test, and only in one site, did the proportion of no responses (children who failed to engage with the task at all and could make no attempt) reach a substantial proportion of children (NOGO in Bangladesh, at 34%).
While mean scores were similar across the three sites, the data from Ghana were less consistent with the other two sites. There were significant deviations in the means for Ghana on Atlantis (59 vs. 73 and 76 in Tanzania and Bangladesh, respectively), Story Completion (5 vs. 9/8), KNT (70 vs. 79/76), Rey Osterrieth Copy (25 vs. 15/16), and Recall (17 vs. 12/11). The Ghana sample was also characterized by smaller standard deviations.
Normality of distribution. Normality of the scores reported in Table 7 was based on a skewness value between −2 and +2. The majority of distributions were evaluated as normal. Only the NOGO results in Bangladesh and People Search in Tanzania deviated from normality: the former showed negative skewness (mass of the distribution concentrated on the right), the latter positive skewness (mass concentrated on the left).
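This normality screen reduces to a few lines of code; the ±2 cutoff follows the rule stated above, while the function and example scores are illustrative:

```python
# Sketch of the normality screen: sample skewness is computed, and a
# distribution is flagged "Normal" when the value lies between -2 and +2.
# Illustrative data; not the study's scores.

def skewness(xs):
    """Population sample skewness (third standardized moment)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

def classify(xs):
    g = skewness(xs)
    if g > 2:
        return "Positive"   # mass of the distribution concentrated on the left
    if g < -2:
        return "Negative"   # mass of the distribution concentrated on the right
    return "Normal"

flagged = classify([0, 0, 0, 0, 0, 0, 100])  # one extreme score in the tail
```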
Consistency over time (test–retest reliability). Scores at the two time points were significantly correlated (see Table 8). The only test that did not reach either a moderate (.5–.6) or strong (.7–.8) level of agreement across time points was NOGO. Although not identical, the pattern was similar across two of the sites. The exception was Tanzania, where the sample size was too small to draw clear conclusions.
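Table 8 reports these values as intraclass correlations. A minimal sketch of a one-way ICC for two testing occasions is shown below; the paper does not state which ICC form it used, so the one-way ICC(1,1) here is an assumption, and the data are hypothetical:

```python
# Sketch of a one-way intraclass correlation, ICC(1,1), for test-retest data
# with two occasions per child. Assumption: the paper's ICC form is unstated.

def icc_oneway(pairs):
    """pairs: list of (time1, time2) score tuples, one per child."""
    k = 2                                   # two testing occasions
    n = len(pairs)
    grand = sum(a + b for a, b in pairs) / (n * k)
    # Between-children mean square (children as "groups" of a one-way ANOVA).
    ms_between = k * sum(((a + b) / k - grand) ** 2 for a, b in pairs) / (n - 1)
    # Within-children mean square (disagreement between the two occasions).
    ms_within = sum((a - (a + b) / 2) ** 2 + (b - (a + b) / 2) ** 2
                    for a, b in pairs) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Perfect repetition of each child's score gives an ICC of 1; random re-ordering of scores drives it toward (or below) zero.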
Consistency across test items. Again, with the
notable exception of Atlantis, test content indicated
good levels of internal consistency, of .7 and above.
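Internal consistency of this kind is conventionally indexed by a split-half correlation or Cronbach's alpha (Table 8 uses both, as its footnotes indicate). A minimal sketch of the alpha computation, with hypothetical item scores:

```python
# Sketch of Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance
# of total scores). Item data below are hypothetical, for illustration only.

def cronbach_alpha(items):
    """items: list of per-item score lists, each of equal length (one per child)."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var / var(totals))

# Two hypothetical items scored for three children:
alpha = cronbach_alpha([[1, 2, 3], [2, 4, 6]])
```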
Consistency across assessors (interrater). Results were very variable, ranging from no better than chance (nearing 0) to near-perfect or perfect agreement (>.9). The tests that produced the worst results were the KNT and the Rey Osterrieth (copy and recall), in Ghana and Tanzania.
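Table 8 reports interrater agreement as kappa. Cohen's kappa corrects the observed agreement for the agreement expected by chance; a minimal sketch, with hypothetical ratings:

```python
# Sketch of Cohen's kappa for two raters over the same children:
# (observed agreement - chance agreement) / (1 - chance agreement).
# Ratings below are hypothetical, for illustration only.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(["yes", "yes", "no", "no"],
                     ["yes", "no", "no", "no"])
```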
Responsiveness. Age was re-categorized into three groups: 6 to 10 years, 11 to 14 years, and 15 to 18 years. Age (Table 9) was associated with score variance; the majority of effect sizes were large (η² > .138, shaded dark grey in Table 9) or medium (> .059, shaded light grey). In contrast, gender had a limited association with score variance: only Footsteps and KNT showed significant associations, with medium effect sizes (Cohen, 1988).
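The η² effect sizes in Table 9 represent the share of score variance explained by group membership. A minimal sketch of the computation behind the .059/.138 benchmarks, with hypothetical scores:

```python
# Sketch of eta-squared (η²) from a one-way layout: between-group sum of
# squares divided by total sum of squares. Benchmarks: > .059 medium,
# > .138 large (Cohen, 1988). Scores below are hypothetical.

def eta_squared(groups):
    """groups: list of score lists, one per age (or gender) group."""
    allv = [x for g in groups for x in g]
    grand = sum(allv) / len(allv)
    ss_total = sum((x - grand) ** 2 for x in allv)
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups)
    return ss_between / ss_total

# Two hypothetical age groups with clearly separated scores:
e = eta_squared([[1, 2, 3], [4, 5, 6]])
```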
Atlantis. Post hoc analyses (Scheffé test) identified differences in patterns across the sites. In Bangladesh, scores rose from the younger to the middle age group and then dropped again. In Ghana, the change was linear, increasing with age, while in Tanzania the increase with age was curvilinear, leveling off from the middle to the oldest age group.
Hand movements. Effect sizes for age varied between sites; they were significant only in Tanzania,
Table 5. Characteristics of the sample of children in each country.
                                             Age grouping, N (%)
Country     N    Mean age (SD)  Gender    7–10 years  11–14 years  15–18 years  In school N (%)
Tanzania    323  12.26 (2.92)   Girls     49 (29.9)   66 (40.2)    49 (29.9)    265 (82)
                                Boys      61 (38.4)   57 (35.8)    41 (25.8)
Ghana       166  12.42 (2.81)   Girls     27 (35.1)   31 (40.3)    19 (24.7)    166 (100)
                                Boys      27 (30.3)   39 (43.8)    23 (25.8)
Bangladesh  297  13.26 (3.08)   Girls     34 (20.4)   66 (39.5)    67 (40.1)    217 (73)
                                Boys      35 (26.9)   48 (36.9)    47 (36.2)
Total       786  12.67 (2.99)   Girls     110 (27.2)  160 (39.5)   135 (33.3)   648 (83)
                                Boys      123 (32.7)  142 (37.8)   111 (29.5)
Table 6. Sample sizes in test-retest and interrater analyses.
Country     Test-retest N  Inter-rater N  Initial test pool N  Total pool used in construct validity^a N
Bangladesh  80             58             80                   297
Ghana       131            64             166                  166
Tanzania    13             54             64                   323
Total       224            176            310                  786
^a With additional "optimal" children.
APPLIED NEUROPSYCHOLOGY: CHILD 7
Downloaded by [Melba Gomes] at 11:35 27 July 2016
where multiple comparisons using the Scheffé test indicated that this was accounted for by the difference between the youngest and the oldest age groups.
Footsteps. Scores on this test were associated with
both age and gender differences, with the effect sizes lar-
ger for age than gender. Multiple comparisons sug-
gested a gradual increase in scores across the age
groups, only significant between the 6 to 10 years and
15 to 18 years age groups. A significant gender differ-
ence favored males.
Story completion. While age, and not gender, was
associated with score variance, the pattern of the older
age group differed in the three sites. In Bangladesh
the oldest group was similar to the youngest, in Ghana
the two younger groups were similar. It was only in
Tanzania where the change from youngest to oldest
was linear.
KNT. While scores showed a significant increase
with age, post hoc analyses showed site differences. In
Bangladesh the change was linear, in the other two sites
the change was curvilinear. There was also significant
association with gender for Ghana and Tanzania but
not for Bangladesh. In both cases, the difference in
scores favored males.
Rey Osterrieth. For the first, Copy, stage of the test,
the pattern was similar across sites, with large effect
sizes, that reflected a superior performance among
younger children. This pattern was repeated in the
Recall stage of the task, although, with the exception
of Tanzania, the effect sizes were smaller. Also in
Table 7. Population variance for tests.
Test / country  N  Max possible  Mean  SD  Min  Max  % No response  Normality of distribution^a
Atlantis
Bangladesh 294 108 76.24 17.09 18 106 0 Normal
Ghana 167 59.49 15.69 21 93 0 Normal
Tanzania 323 73.09 16.45 16 105 0 Normal
Hand movements
Bangladesh 295 23 8.15 3.09 1 17 0.7 Normal
Ghana 166 8.31 2.80 1 17 0.6 Normal
Tanzania 323 8.61 3.24 2 23 0.0 Normal
Footsteps
Bangladesh 295 48 16.24 9.04 0 40 1.7 Normal
Ghana 166 14.84 7.29 2 36 0.6 Normal
Tanzania 323 21.25 10.99 1 43 0.0 Normal
Story completion
Bangladesh 293 36 8.11 5.46 0 31 1.1 Normal
Ghana 166 5.36 2.28 1 17 1.2 Normal
Tanzania 321 9.16 5.33 0 24 0.6 Normal
KNT
Bangladesh 295 122 76.04 20.84 12 119 0.7 Normal
Ghana 166 70.33 14.66 25 97 1.2 Normal
Tanzania 323 79.04 18.70 14 117 Normal
Rey O copy
Bangladesh 292 36 16.20 3.52 2 32 1.7 Normal
Ghana 160 25.34 8.69 0 36 4.2 Normal
Tanzania 322 15.45 6.11 0 35 0.4 Normal
Rey O recall
Bangladesh 292 36 10.67 4.48 0 28 1.7 Normal
Ghana 157 16.59 8.38 0 34 6.0 Normal
Tanzania 320 11.68 5.44 0 27 0.3 Normal
NOGO
Bangladesh 196 NA 0.88 0.16 0.33 1.00 34.0 Negative
Ghana 164 0.85 0.17 0.45 1.00 1.8 Normal
Tanzania 322 0.89 0.16 0.17 1.00 0.2 Normal
Shift
Bangladesh 295 NA 0.42 0.29 0.00 1.00 0.7 Normal
Ghana 164 0.28 0.24 0.00 0.85 1.8 Normal
Tanzania 322 0.58 0.23 0.03 1.00 0.2 Normal
People search
Bangladesh 292 NA 0.18 0.67 0.04 0.60 1.2 Normal
Ghana 166 0.14 0.05 0.06 0.32 1.2 Normal
Tanzania 322 0.14 0.60 0.03 0.67 0.2 Positive
Literacy
Bangladesh 297 6 3.99 1.70 0 6 0 Normal
Ghana 166 1.85 1.32 0 5 1.2 Normal
Tanzania 323 4.19 1.40 0 6 0 Normal
Numeracy
Bangladesh 297 6 3.73 1.70 0 6 0 Normal
Ghana 166 2.74 1.06 0 6 1.2 Normal
Tanzania 323 3.69 1.47 0 6 0 Normal
^a Normal (approximating normal); Positive (mass of distribution concentrated on the left); Negative (mass of distribution concentrated on the right).
Tanzania a significant gender effect was found, favoring
males.
NOGO. Post hoc comparisons identified similar age effects in Ghana and Tanzania, with performance improving with age, whereas in Bangladesh younger children made fewer errors on the test.
Shift. Age had a significant effect on scores for
Bangladesh and Tanzania, showing linear improvement
with age. In Tanzania only the oldest children had
significantly higher scores.
People Search. A moderate effect of age was
observed in all three countries. A post hoc analysis
showed that performance improvement was linear in
Ghana and Tanzania, while in Bangladesh, the differ-
ence between 11 to 14 years and 15 to 18 years was
not significant.
Literacy and numeracy. In Bangladesh, multiple
comparison analyses showed an increase in scores with
age that was curvilinear, and not significant between 11
to 14 years and 15 to 18 years. The results for Ghana
and Tanzania displayed a more linear improvement in
scores with age for both tests. It can be concluded that,
despite considerable variation across sites and tests,
most instruments showed score increments across ages.
We were also interested in the relationship between the cognitive test scores and schooling. Unfortunately, school experience was not consistently measured across all sites in the pilot sample; these data were available only for the sample included in the analyses reported in the next section. We found that education was strongly related to the general intelligence factor; the standardized loadings were .65, .75, and .64 for Bangladesh, Ghana, and Tanzania, respectively. This association is strong. It should be noted that our design
is cross-sectional and selective drop-out from schooling
could have affected our samples. If drop-out is nega-
tively related to (among many other factors) previous
educational achievement, it is likely that the brighter
Table 9. Test responsiveness to age and gender: F values, df (η²).
                  Effect of age                                               Effect of gender
Test              Bangladesh          Ghana                Tanzania             Bangladesh          Ghana              Tanzania
Atlantis          3.63*, 283 (.025)   4.06*, 166 (.05)     19.27**, 322 (.11)   0.29, 289 (.00)     3.80, 166 (.02)    4.06, .045 (.01)
Hand movement     2.49, 290 (.02)     4.11*, 166 (.05)     11.66**, 323 (.07)   0.38, 290 (.00)     0.06, 166 (.00)    0.02, 323 (.00)
Footsteps         37.07**, 290 (.21)  14.25**, 166 (.015)  61.75**, 323 (.28)   25.45**, 290 (.08)  11.65, 164 (.07)   18.61**, 323 (.06)
Story completion  7.59**, 288 (.05)   11.65**, 166 (.13)   3                    2.91, 288 (.01)     3.26, 166 (.02)    0.01, 321 (.00)
KNT               39.95, 290 (.22)    30.24**, 166 (.27)   79.27**, 322 (.33)   2.82, 290 (.01)     14.69**, 166 (.08) 6.55, 322 (.02)
Rey O copy        16.24**, 292 (.10)  13.64**, 160 (.15)   45.54**, 292 (.22)   0.57, 289 (.00)     1.27, 160 (.01)    2.98, 322 (.01)
Rey O recall      5.65*, 292 (.04)    4.21*, 157 (.05)     52.47**, 320 (.25)   0.19 (.00)          1.94, 157 (.01)    4.83*, 320 (.02)
NOGO              29.35**, 191 (.24)  3.85*, 164 (.05)     27.73**, 322 (.15)   0.07, 191 (.00)     1.01, 164 (.01)    1.04, 322 (.00)
Shift             39.52, 290 (.22)    2.69, 164 (.03)      32.66**, 322 (.17)   0.03, 290 (.00)     0.09, 164 (.00)    3.27, 322 (.01)
PS efficiency     14.63**, 292 (.09)  25.59, 166 (.24)     63.44**, 322 (.29)   0.15, 292 (.00)     0.01, 166 (.00)    1.67, 322 (.01)
Literacy          27.01, 292 (.16)    38.05**, 166 (.32)   89.12**, 323 (.36)   0.65, 292 (.00)     2.51, 166 (.02)    1.20, 323 (.00)
Numeracy          16.51**, 292 (.10)  17.99, 166 (.18)     92.87*, 323 (.37)    2.27, 292 (.01)     0.29, 166 (.00)    0.05, 323 (.00)
**significant at .001. *significant at .05.
Table 8. Reliabilities of tests.
                     Test-retest (ICC)   Internal consistency^a,b   Inter-rater (kappa)
                     B    G    T         B     G     T              B    G    T
N                    80   131  13        297   166   323            58   64   54
Atlantis^a           .63  .58  .55       1.00  0.41  0.99           .96  .74  .75
Hand movements^a     .66  .76  .25       –     –     –              1    .67  .79
Footsteps^a          .68  .77  .64       0.93  0.86  0.81           .88  .64  .86
Story completion^a   .76  .71  .82       0.76  0.60  Neg            .96  .88  .86
Kilifi naming^b      .74  .82  .74       0.89  .69   .99            .94  .40  .22
Rey O copy           .80  .87  .91       –     –     –              1    .17  .14
Rey O recall         .81  .80  .80       –     –     –              1    .06  .13
NOGO                 .80  .48  .54       –     –     –              1    .84  .88
Shift                .72  .71  .49       –     –     –              .98  .63  .67
People search        .54  .69  .65       –     –     –              1    .80  .98
Literacy^b           .87  .92  .78       .84   .85   .85            –    –    –
Numeracy^b           .71  .56  .55       .82   .70   .70            –    –    –
Notes. ICC = intra-class correlation; B = Bangladesh; G = Ghana; T = Tanzania.
Consistency over time (test-retest reliability): For the majority of the tests, over all three sites there was a slight increase in the mean score from time 1 to time 2. The exceptions were the NOGO and Shift tasks, where scores declined over time. A significant correlation was achieved for the majority of the tests, most showing moderate (.5–.6) or strong (.7–.8) agreement. The pattern was similar, although not identical, across two of the sites; the exception was Tanzania, where the sample size was too small to draw clear conclusions. Consistency within the test items: Again, with the notable exception of Atlantis, test content indicated good levels of internal consistency, of .7 and above. Consistency across assessors (inter-rater): The results were very variable, ranging from no better than chance (nearing 0) to near-perfect or perfect agreement (>.9). The source of variability seems to be both test and team, with the team with greater previous experience achieving higher agreement levels.
^a Split half. ^b Alpha.
students are more likely to remain in education, which
could augment the correlation and performance across
age groups. It can be concluded that our results are con-
sistent with other studies in that education can be
expected to have a major influence on cognitive test
performance (Ceci, 1991; Falch & Sandgren, 2011;
Ritchie, Bates, & Deary, 2015).
Underlying constructs: The validity of the
instruments selected
Correlation coefficients are presented in Table 10. The majority of the correlations were positive and ranged from low to moderate, indicating a linear relationship among many variables. The highest coefficients were found between the Rey Osterrieth scores (Copy and Recall) in every site. Median correlations did not differ much across sites (Tanzania: Md = .29; Ghana: Md = .26; Bangladesh: Md = .35). There were no observable differences in the pattern of correlations between the tests used to measure General Intelligence and those used to measure Executive Functions.
Confirmatory factor analysis
We employed confirmatory factor analysis to address
(a) the dimensionality of the test battery and (b) the
invariance (similarity) of this factor structure across
the groups. Initially we tested a two-factor model,
distinguishing between a general intelligence and an
executive functioning factor. However, as mentioned,
the correlations between the intelligence and executive
functioning subtests were virtually indistinguishable.
The two-factor model had a poor fit and required multiple subtests to load on both latent factors.
Therefore, we opted for a one-factor solution. We found
a reasonable fit for a model in which all subtests loaded
on a single factor, although we had to allow for corre-
lated errors between Atlantis and People Search,
between Footsteps and NOGO, and between NOGO
and Shift (note that the latter are two of the executive
functioning tests). The results of the invariance tests
are presented in Table 11. The v
2
tests were highly
significant in all cases (which is common in multigroup
confirmatory factor analyses as this statistic is sensitive
to sample size) (Fan, Thompson, & Wang, 1999). How-
ever, all other statistics pointed to a fairly good fit of the
configural invariance model (which allows all estimated
parameters to differ across groups), CFI ¼.993 (recom-
mended: above .90), TLI ¼.900 (same recommended
value), SRMR ¼.043 (recommended: smaller than .05),
and RMSEA ¼.04 (recommended: smaller than .06).
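For concreteness, the one-factor model described above can be written in lavaan-style syntax (as used by R's lavaan and Python's semopy; the paper does not name its software, and the shortened variable names here are assumptions). The `=~` line defines the single general factor; the `~~` lines add the three correlated errors the model required:

```python
# Lavaan-style specification of the one-factor model with the three correlated
# errors described in the text. Sketch only: the software and variable names
# used in the original analysis are not stated in the paper.

model_spec = """
g =~ Atlantis + HandMovements + Footsteps + StoryCompletion + KNT +
     ReyRecall + NOGO + Shift + PeopleSearch
Atlantis ~~ PeopleSearch
Footsteps ~~ NOGO
NOGO ~~ Shift
"""
```

Passing a string of this form to a SEM package, once per country, and then with equality constraints added stepwise, reproduces the configural-to-residuals sequence of invariance tests.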
As can be seen in Table 11, all more restrictive invar-
iance models showed worse fit values, notably the
measurement residuals model. All factor loadings were
positive and significant, as can be expected. Yet, when
the loadings of the configural and measurement weights
are compared (Table 12), it is clear that the differences
in factor loadings are small and would not give rise
Table 10. Correlations between the sub-tests, by country.
Test                   Atlantis  Rey Osterrieth  Hand movements  Footsteps  NOGO    Shift   People search  Story completion
Bangladesh
Rey Osterrieth (copy)  .356**
Hand movements         .460**    .302**
Footsteps              .277**    .412**          .356**
NOGO                   .182**    .184**          .216**          .456**
Shift                  .201**    .171**          .263**          .478**     .670**
People search          .402**    .423**          .347**          .353**     .236**  .371**
Story completion       .384**    .317**          .349**          .247**     .021    .063    .364**
KNT                    .367**    .447**          .319**          .538**     .403**  .457**  .508**         .349**
Ghana
Rey Osterrieth (copy)  .288**
Hand movements         .349**    .242**
Footsteps              .271**    .195*           .292**
NOGO                   .059      .100            .129            .302**
Shift                  .202**    .080            .300**          .254**     .284**
People search          .248**    .200**          .335**          .405**     .160*   .265**
Story completion       .233**    .123            .217**          .207**     .059    .262**  .274**
Kilifi naming test     .336**    .322**          .276**          .399**     .188*   .172*   .507**         .353**
Tanzania
Rey Osterrieth (copy)  .360**
Hand movements         .292**    .256**
Footsteps              .263**    .291**          .296**
NOGO                   .245**    .255**          .130            .113
Shift                  .367**    .171*           .232**          .278**     .305**
People search          .283**    .438**          .243**          .355**     .215**  .238**
Story completion       .351**    .325**          .372**          .318**     .270**  .394**  .372**
Kilifi naming test     .497**    .355**          .190**          .446**     .231**  .333**  .438**         .459**
*p < .05. **p < .01.
to different interpretations of the subtests or the general factor across the three countries. We conclude that the loadings can be treated as identical across the countries, although it should be acknowledged that the statistical evidence does not fully support this conclusion.
Discussion and conclusion
We found that the systematic process of adaptation of
tests to three different cultural and linguistic contexts
led to instruments with adequate psychometric proper-
ties. The underlying variance in the scores on individual
tests supported the benefits of the adaptation process.
Engagement of children (access to and understanding
of the material) was reflected in the high proportion
of children who were able to attempt the tests, and in
the approximation to normality of the underlying
variance.
Most tests showed adequate reliabilities. However,
there is some concern over interrater consistency. The
source of variability between sites seems to have been
related to the team experience, with the Bangladesh
team (with two experienced psychologists in the field
and supervising all assessments) achieving higher agree-
ment levels. This finding led to changes in the system of supervision for the main data collection process: we repeated some training elements and increased the level of supervision. Subsequent assessments were evaluated against a standard observation guideline of assessment practice. Supervisors reported that the assessment teams improved their practice against these guidelines, and these changes led to improved inter-rater reliabilities after the piloting reported here.
The SEM model suggests that the variability in
outcome across sites is more efficiently explained in
reference to what might be termed “g,” a general under-
lying ability denoting general intelligence, in line with
literature on intelligence models. Despite all the differ-
ences in samples and various confounding factors, the
associations between the instruments are similar across
the three groups of children, with Executive Function
and Intelligence tests merging in all three groups.
Although not further documented here, splitting the sample into younger and older age groups did not change the one-factor solution. This is not in line with the Western literature, in which a distinction between general intelligence and executive functioning has been noted
general factor may reflect the numerous individual dif-
ferences in background variables, such as socioeco-
nomic status and opportunities to learn, which tend to
coalesce and are often related to drop-out. It could well
be that more homogeneous non-Western samples
would also allow for a sharper distinction between intel-
ligence and executive functioning.
The comparison of factor weights in the configural
invariance and measurement weights solution shows
that there are almost no differences, thus the cognitive
structure of the battery (with “g” in the apex) is very
stable. Scalar invariance is not supported, and thus a
direct comparison of scores cannot be made.
In summary, our data supports the feasibility of
applying a single battery across multiple settings, to
Table 11. Fit indexes of the invariance tests of the one-factor model.
Model                   χ² (df)           Δχ²        CFI   ΔCFI  TLI   SRMR  RMSEA
Configural invariance   162.61** (72)                .933        .900  .043  .044
Measurement weights     210.73** (88)     48.12**    .910  .023  .889  .069  .047
Measurement intercepts  898.10** (106)    687.37**   .416  .494  .405  .098  .108
Structural covariances  899.09** (108)    .99        .417  .001  .417  .111  .107
Measurement residuals   1525.01** (132)   625.92**   .000  .417  .160  .102  .129
**significant at .001.
Table 12. Factor loadings (standardized) of the configural invariance and measurement weights solutions.
                          Configural invariance           Measurement weights^a
Test                      Bangladesh  Ghana  Tanzania     Bangladesh  Ghana  Tanzania
Atlantis                  .52         .47    .60          .58         .46    .53
Hand movement             .50         .49    .43          .53         .39    .44
Footsteps                 .66         .57    .56          .66         .57    .46
Story completion          .47         .45    .64          .52         .59    .46
KNT                       .77         .71    .71          .74         .69    .73
Rey O recall              .59         .39    .55          .60         .22    .59
NOGO                      .43         .24    .38          .26         .14    .48
Shift                     .53         .38    .49          .44         .37    .49
People search efficiency  .67         .67    .60          .64         .63    .67
^a Note that the unstandardized factor loadings in the measurement weights solution are identical and that the differences in the cells are due to the standardization.
measure a common cognitive construct, although the specific methodology used in one context (language, materials, administration procedures) required modification to suit each new context (Holding et al., 2004). The process was time-consuming, but establishing equivalence ensured that the adaptation maintained acceptable reliability and validity and provided meaningful interpretations of test scores. While we found evidence of conceptual equivalence that supports the ability to make comparisons across countries, we cannot directly compare means across country data sets. Other methods of summarizing impact across settings, such as the comparison of effect sizes, are required. Additionally, the variability in performance on individual tests across settings does not allow us to examine performance on more discrete cognitive constructs.
References
Adetunde, I. A., & Akensina, A. P. (2008). Factors affecting the standard of female education: A case study of senior secondary schools in the Kassena-Nankana district. Journal of Social Sciences, 4, 338–342. doi:10.3844/jssp.2008.338.342
Berry, J. W., Poortinga, Y. H., Breugelmans, S. M., Chasiotis, A., & Sam, D. (2011). Cross-cultural psychology: Theory and applications (3rd ed.). Cambridge, UK: Cambridge University Press.
Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722. doi:10.1037/0012-1649.27.5.703
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Falch, T., & Sandgren, M. S. (2011). The effect of education on cognitive ability. Economic Inquiry, 49, 838–856. doi:10.1111/j.1465-7295.2010.00312.x
Fan, X., Thompson, B., & Wang, L. (1999). Effects of sample size, estimation methods, and model specification on structural equation modeling fit indexes. Structural Equation Modeling: A Multidisciplinary Journal, 6, 56–83. doi:10.1080/10705519909540119
Friedman, N. P., Miyake, A., Corley, R. P., Young, S. E., DeFries, J. C., & Hewitt, J. K. (2006). Not all executive functions are related to intelligence. Psychological Science, 17, 172–179. doi:10.1111/j.1467-9280.2006.01681.x
Gomes, M., Faiz, M. A., Gyapong, J., Warsame, M., Agbenyega, T., Babiker, A., ... White, N. J., for the Study 13 Research Group. (2009). Pre-referral rectal artesunate to prevent death and disability in severe malaria: A placebo-controlled trial. Lancet, 373, 557–566. doi:10.1016/s0140-6736(08)61734-1
Helms-Lorenz, M., Van de Vijver, F. J. R., & Poortinga, Y. H. (2003). Cross-cultural differences in cognitive performance and Spearman's hypothesis: g or c? Intelligence, 31, 9–29. doi:10.1016/s0160-2896(02)00111-3
Holding, P., Abubakar, A., & Kitsao Wekulo, P. (2010). Where there are no tests: A systematic approach to test adaptation. In M. L. Landow (Ed.), Cognitive impairment: Causes, diagnosis and treatments (pp. 189–200). New York, NY: Nova Science Publishers.
Holding, P., & Boivin, M. J. (2013). The assessment of neuropsychological outcomes in pediatric severe malaria. In M. J. Boivin & B. Giordani (Eds.), Neuropsychology of children in Africa: Perspectives on risk and resilience, specialty topics in pediatric neuropsychology (pp. 235–275). New York, NY: Springer.
Holding, P., & Kitsao-Wekulo, P. (2009). Is assessing participation in daily activities a suitable approach for measuring the impact of disease on child development in African children? Journal of Child & Adolescent Mental Health, 21, 127–138. doi:10.2989/jcamh.2009.21.2.4.1012
Holding, P. A., Taylor, H. G., Kazungu, S. D., Mkala, T., Gona, J., Mwamuye, B., Mbonani, L., & Stevenson, J. (2004). Assessing cognitive outcomes in a rural African population: Development of a neuropsychological battery in Kilifi District, Kenya. Journal of the International Neuropsychological Society, 10, 246–260. doi:10.1017/s1355617704102166
Hui, C. H., & Triandis, H. C. (1985). Measurement in cross-cultural psychology: A review and comparison of strategies. Journal of Cross-Cultural Psychology, 16, 131–152. doi:10.1177/0022002185016002001
Kitsao-Wekulo, P., Holding, P., Taylor, H. G., Abubakar, A., Kvalsvig, J., & Connolly, K. (2012). Neuropsychological testing in a rural African school-age population: Evaluating contributions to variability in test performance. Assessment, 20, 776–784. doi:10.1177/1073191112457408
Osterrieth, P. A. (1944). Le test de copie d'une figure complexe: Contribution à l'étude de la perception et de la mémoire [The test of copying a complex figure: A contribution to the study of perception and memory]. Archives de Psychologie, 30, 286–356.
Pharoah, P. O., & Connolly, K. J. (1993). Effects of maternal iodine supplementation during pregnancy. Archives of Disease in Childhood, 66(1), 145–147. PMCID: PMC1793189
Rey, A. (1941). L'examen psychologique dans les cas d'encéphalopathie traumatique (Les problèmes) [The psychological examination in cases of traumatic encephalopathy (Problems)]. Archives de Psychologie, 28, 215–285.
Ritchie, S. J., Bates, T. C., & Deary, I. J. (2015). Is education associated with improvements in general cognitive ability, or in specific skills? Developmental Psychology, 51, 573–582. doi:10.1037/a0038981
United Nations Educational, Scientific, and Cultural Organization (UNESCO), Institute for Statistics. (2013). Retrieved from http://www.uis.unesco.org/DataCentre/Pages/country-profile.aspx?code=TZA&regioncode=40540
United Nations Educational, Scientific, and Cultural Organization (UNESCO), Institute for Statistics. (2015). UNESCO Bangladesh Survey, 2005. Retrieved from http://www.uis.unesco.org/DataCentre/Pages/country-profile.aspx?code=TZA&regioncode=40540
UWEZO Tanzania Tests. (2012). Adult and youth literacy, 1990–2015: Analysis of data for 41 selected countries. Retrieved from http://www.uis.unesco.org/literacy/Documents/UIS-literacy-statistics-1990-2015-en.pdf
Van de Vijver, F. J. R. (1997). Meta-analysis of cross-cultural comparisons of cognitive test performance. Journal of Cross-Cultural Psychology, 28, 678–709. doi:10.1177/0022022197286003
Van de Vijver, F. J. R. (2002). Inductive reasoning in Zambia, Turkey, and The Netherlands: Establishing cross-cultural equivalence. Intelligence, 30, 313–351. doi:10.1016/s0160-2896(02)00084-3
Van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Newbury Park, CA: Sage.
Werner, O., & Campbell, D. T. (1970). Translating, working through interpreters, and the problem of decentering. In R. Naroll & R. Cohen (Eds.), A handbook of cultural anthropology (pp. 398–419). New York, NY: American Museum of Natural History.