Genetic relatedness of indigenous ethnic groups in northern Borneo to neighboring populations from Southeast Asia, as inferred from genome-wide SNP data

Abstract

The region of northern Borneo is home to the current state of Sabah, Malaysia. It is located closest to the southern Philippine islands and may have served as a viaduct for ancient human migration onto or off of Borneo Island. In this study, five indigenous ethnic groups from Sabah were subjected to genome-wide SNP genotyping. These individuals represent the “North Borneo”-speaking group of the great Austronesian family. They have traditionally resided in the inland region of Sabah. The dataset was merged with public datasets, and the genetic relatedness of these groups to neighboring populations from the islands of Southeast Asia, mainland Southeast Asia and southern China was inferred. Genetic structure analysis revealed that these groups formed a genetic cluster that was independent of the clusters of neighboring populations. Additionally, these groups exhibited near-absolute proportions of a genetic component that is also common among Austronesians from Taiwan and the Philippines. They showed no genetic admixture with Austro-Melanesian populations. Furthermore, phylogenetic analysis showed that they are closely related to non–Austro-Melanesian Filipinos as well as to Taiwan natives but are distantly related to populations from mainland Southeast Asia. Relatively lower heterozygosity and higher pairwise genetic differentiation index (FST) values than those of nearby populations indicate that these groups might have experienced genetic drift in the past, resulting in their differentiation from other Austronesians. Subsequent formal testing suggested that these populations have received no gene flow from neighboring populations. Taken together, these results imply that the indigenous ethnic groups of northern Borneo shared a common ancestor with Taiwan natives and non–Austro-Melanesian Filipinos and then became isolated in the inland region of Sabah. This isolation presumably prevented admixture with other populations, and these individuals therefore underwent strong genetic differentiation. This report contributes to addressing the paucity of genetic data on representatives from this strategic region of ancient human migration event(s).
Received: 9 August 2017 Revised: 7 January 2018 Accepted: 18 January 2018
DOI: 10.1111/ahg.12246
ORIGINAL ARTICLE
Chee Wei Yew1, Mohd Zahirul Hoque2, Jacqueline Pugh-Kitingan3,
Alexander Minsong4, Christopher Lok Yung Voo1, Julian Ransangan5,
Sophia Tiek Ying Lau1, Xu Wang6, Woei Yuh Saw6, Rick Twee-Hee Ong6,7,
Yik-Ying Teo6,7,8,9,10, Shuhua Xu11,12,13, Boon-Peng Hoh14,15, Maude E. Phipps16,
S. Vijay Kumar1
1Biotechnology Research Institute, Universiti Malaysia Sabah, Jalan UMS, Sabah, Malaysia
2Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah, Jalan UMS, Sabah, Malaysia
3Holder of the Kadazandusun Chair, Universiti Malaysia Sabah, Jalan UMS, Sabah, Malaysia
4Faculty of Humanities, Arts & Heritage, Universiti Malaysia Sabah, Jalan UMS, Sabah, Malaysia
5Borneo Marine Research Institute, Universiti Malaysia Sabah, Jalan UMS, Sabah, Malaysia
6Department of Statistics and Applied Probability, Faculty of Science, National University of Singapore, Singapore
7Saw Swee Hock School of Public Health, National University of Singapore, Singapore
8NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore
9Life Sciences Institute, National University of Singapore, Singapore
10Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
11Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology (PICB), Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
12School of Life Science and Technology, ShanghaiTech University, Shanghai, China
13Collaborative Innovation Centre of Genetics and Development, Shanghai, China
14Institute for Molecular Medical Biotechnology, Universiti Teknologi MARA, Selangor, Malaysia
15Faculty of Medicine and Health Sciences, UCSI University, Kuala Lumpur, Malaysia
16Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
Correspondence
S. Vijay Kumar, Biotechnology Research
Institute, Universiti Malaysia Sabah, Jalan
UMS, 88400 Kota Kinabalu, Sabah, Malaysia.
Email: vijay@ums.edu.my
Funding information
This project was funded by the National Biotechnology Division of the Ministry of Science, Technology and Innovation of Malaysia (project code: 100-RMI/BIOTEK 16/6/2 B (1/2011)), and by the Ministry of Higher Education of Malaysia (Fundamental Research Grant Scheme, project code: FRG0449-STG-1/2016).
Ann Hum Genet. 2018;1–11. wileyonlinelibrary.com/journal/ahg © 2018 John Wiley & Sons Ltd/University College London
KEYWORDS
Genetic structure, genome-wide SNPs, indigenous ethnic groups, northern Borneo
1 INTRODUCTION
The island of Borneo is replete with multiple ethnic groups
characterized by diverse languages and cultures as well as
high genetic diversity, across the three countries that share
the island: Brunei, Indonesia, and Malaysia. It is a unique
island that is isolated from the mainland of Southeast Asia
yet is part of the Sunda continental shelf. Sea level rise during
the last glacial maximum flooded the savanna corridor, which
might have served as a route of dispersal for modern humans
to this region (Bird, Taylor, & Hunt, 2005). This flooding
also resulted in refugial rainforests, where many fauna and
flora native to the island flourished (Cannon, Morley, & Bush,
2009).
The northern region of Borneo, historically known as North
Borneo, is currently home to the state of Sabah, Malaysia.
Northern Borneo is located nearest to the South Philippine
Islands and might have served as a viaduct for human migration
event(s) such as “Out of Taiwan” toward the archipelago
of Southeast Asia (i.e., the Philippines, Borneo, Indonesia,
and Papua New Guinea) (Jinam et al., 2012a; Lipson et al.,
2014; Mörseburg et al., 2016; Tabbada et al., 2010). This
north-to-south direction of migration has been questioned,
and a south-to-north migration from the islands of South-
east Asia toward mainland Southeast Asia (e.g., Thailand
and Cambodia) and southern China has been proposed (The
HUGO Pan-Asian SNP Consortium, 2009). The “Out of
Taiwan” hypothesis has been debated, despite the fact that
it was derived from multidisciplinary data (as discussed by
Denham & Donohue, 2012). Nevertheless, it is commonly
accepted that Southeast Asia was initially populated by the
Austro-Melanesians (Barker et al., 2007; Détroit et al., 2004;
Matsumura & Pookajorn, 2005). Subsequent Neolithic disper-
sals from mainland Southeast Asia or Taiwan left traces of
genetic admixture in the extant populations in Southeast Asia
(The HUGO Pan-Asian SNP Consortium, 2009; Delfin et al.,
2011; Rasmussen et al., 2011; Lipson et al., 2014; Matsumura
& Oxenham, 2014; Brandão et al., 2016; Soares et al., 2016;
Corny et al., 2017; Wall, 2017).
Similarly, archaeological findings have shown that Borneo
Island was initially populated by Austro-Melanesians (Abdul-
lah, 2005; Barker et al., 2007). However, the genetic related-
ness of the extant populations in northern Borneo to Austro-
Melanesians and other neighboring populations in Southeast
Asia remains unclear. Previous population genetic studies
using genome-wide single nucleotide polymorphisms (SNPs)
have often involved sparse sampling from Borneo Island. Apart
from the Iban and Bidayuh ethnic groups from Sarawak,
the Dayak, Lebbo and an unnamed population from Kalimantan,
and the Dusun and Murut from Brunei, no representative
ethnic groups from northern Borneo have been reported
(Li et al., 2008b; Mallick et al., 2016; Mörseburg et al., 2016;
Pagani et al., 2016; Pierron et al., 2014; The HUGO Pan-Asian
SNP Consortium, 2009; Wollstein et al., 2010; Xing et al.,
2009). The lack of genetic data from northern Borneo (Sabah)
represents a void in our understanding of the migration history
of the archipelago.
FIGURE 1 Location of northern Borneo (Malaysian Sabah) and the approximate distribution of indigenous ethnic groups by linguistic family. The majority of the Dusunic groups are dispersed from the west coast to central regions; the Paitanic groups are found in the northeast region; and the Murutic groups are found in the southwest of Sabah [Colour figure can be viewed at wileyonlinelibrary.com]
Sabah is home to more than 40 ethnic groups, including 32
indigenous Austronesian groups (King & King, 1984). Linguistic
studies summarized by Lewis, Simons, and Fennig
(2015) revealed that the “North Borneo” language family,
under the great Austronesian superfamily, extends from the
southern islands of the Philippines to a vast area of Sabah,
reaching the interior regions of Sarawak and Kalimantan.
There are three major “North Borneo” language subfamilies
within Sabah: Dusunic, Paitanic, and Murutic (Figure 1).
The Dusunic family is spoken by the major population in the
state, who have been officially categorized under the umbrella
grouping “Kadazandusun,” based on similarities of sociocultural
traditions. This broad categorization has been used in
several medical genetics studies (Goh, Chong, Chua, Chuah,
& Lee, 2014; Teh et al., 2015). However, ethnic groups that
are similar in culture and language are not always genetically
homogeneous (Helgason, Yngvadóttir, Hrafnkelsson, Gulcher,
& Stefánsson, 2004; The HUGO Pan-Asian SNP Consor-
tium, 2009; Xu et al., 2010). Disregarding plausible under-
lying genetic structure among related ethnic groups presents
the risk of faulty interpretation of the results of genome-wide
association studies of diseases (Patterson et al., 2010; Price
et al., 2008; Price, Zaitlen, Reich, & Patterson, 2010).
We chose to investigate the genetic background of
five unique indigenous ethnic groups representing the
Dusunic- (Dusun, Rungus, and Sonsogon), Paitanic- (Sungai-
Lingkabau), and Murutic-speaking (Murut-Paluan) groups
in northern Borneo. To our knowledge, this study is the first
genome-wide SNP genotyping investigation of the indigenous
Rungus, Sonsogon, and Sungai-Lingkabau ethnic groups in
this region. Analyses of the genetic structure of the Dusun and
Murut from Brunei were previously reported by Mörseburg
et al. (2016). This study aimed to characterize the genetic
structure among the indigenous ethnic groups of northern
Borneo and to infer their genetic relatedness to other neigh-
boring populations from islands in Southeast Asia, mainland
Southeast Asia, and southern China. This new dataset will be
useful for inferring human migration history in this region and
for genome-wide association studies of diseases in the future.
2 MATERIALS AND METHODS
2.1 Sample collection and SNP genotyping
Ethical clearance was obtained from the Medical Research
Ethics Committee of the Universiti Malaysia Sabah (ref. no.:
JKEtika 4/10(3)). Subsequently, approvals were obtained
from the District Officers of Ranau, Pitas, Kota Marudu
and Nabawan for collecting blood samples from the fol-
lowing indigenous communities: Dusun (Dusunic), Rungus
(Dusunic), Sonsogon (Dusunic), Sungai-Lingkabau (Pai-
tanic), and Murut-Paluan (Murutic) (Figure 1). Prior to
any sampling activity, approval was also obtained from the
respective village chiefs and chairpersons of the Committee
for Village Development and Security. Informed consent
forms were then signed by healthy volunteers. Finally,
10 mL of peripheral blood was collected and stored in tubes
containing acid citrate dextrose as an anticoagulant.
A total of 117 samples from five indigenous ethnic groups
were obtained (Dusun n=21, Rungus n=22, Sonsogon
n=24, Sungai-Lingkabau n=28, and Murut-Paluan n=22).
Genomic DNA was isolated from whole blood or buffy coat
using the DNeasy Blood and Tissue kit (Qiagen). Next, 200 ng
of intact genomic DNA was used for genotyping with Illu-
mina's Human Omni2.5 bead chip array, containing 2,379,855
genome-wide SNP markers, as described by the manufac-
turer's protocol.
2.2 Quality assessment of data
Calling of SNP genotypes was performed in Genome Studio
(Illumina) with the default GenCall score of 0.15. The orientation
of each SNP was then flipped to the positive strand with
respect to the reference genome, in accordance with the information
available in the manifest file provided by the manufacturer.
Sample quality assessment was then conducted to
exclude samples (i) exhibiting a <98% call rate, (ii) showing
discrepancies in reported sex, or (iii) coming from first-degree
relatives (Aghakhanian et al., 2015), as determined with the
program KING (Manichaikul et al., 2010). Next, principal
component analysis (smartPCA) was conducted using the
EIGENSOFT 6.0 package (Patterson, Price, & Reich, 2006) to identify
individuals who might be admixed within these five ethnic
groups.
A total of 98 individuals (Dusun n=20, Rungus
n=20, Sonsogon n=19, Sungai-Lingkabau n=19,
and Murut-Paluan n=20) were retained for subsequent
quality assessment. SNPs that (i) exhibited >5% uncalled
genotypes, (ii) deviated from Hardy–Weinberg equilibrium
(P-value < 2.14 × 10⁻⁸ after Bonferroni correction), or (iii)
were not located on autosomes were removed. After these
steps, a total of 2,274,632 SNPs were retained. The qual-
ity assessment steps described above were conducted using
PLINK (Purcell et al., 2007).
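The SNP-level filters described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the PLINK implementation: the function names, the 0/1/2 genotype coding with `None` for uncalled genotypes, and the use of a chi-square approximation for the Hardy–Weinberg test are all assumptions made here (PLINK's own test may differ). The Bonferroni threshold is taken as a nominal alpha divided by the number of tests; neither value is stated in the text, so both are left as parameters.

```python
import math

def missing_rate(genotypes):
    """Fraction of uncalled genotypes; None marks an uncalled genotype."""
    return sum(g is None for g in genotypes) / len(genotypes)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for deviation from Hardy-Weinberg proportions.

    Assumption: a chi-square approximation is used purely for illustration.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def keep_snp(genotypes, n_tests, alpha=0.05, max_missing=0.05):
    """Apply the missingness filter, then the Bonferroni-corrected HWE filter."""
    if missing_rate(genotypes) > max_missing:
        return False
    called = [g for g in genotypes if g is not None]
    counts = [called.count(k) for k in (0, 1, 2)]
    return hwe_chi2_pvalue(*counts) >= alpha / n_tests
```

A SNP with no heterozygotes at intermediate allele frequency, for example, fails the Hardy–Weinberg filter even under a stringent corrected threshold, whereas a SNP in exact Hardy–Weinberg proportions is retained.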
2.3 Analysis of genetic relatedness with
neighboring populations
i) Merging public datasets
This new dataset, referred to as the Northern Bornean
dataset, was then merged with four public datasets: HapMap
(YRI, CEU, GIH, CHB, and JPT only), the Human Genome
Diversity Project (HGDP, selected continental populations),
the Singapore Genome Variation Project (SGVP), and the
HUGO Pan-Asian SNP Consortium (PASNP) (Rosenberg,
2006; Teo et al., 2009; The International HapMap 3 Con-
sortium, 2010; Yang et al., 2011). This merged dataset con-
sisted of 1669 individuals from 89 worldwide populations
(Table S1) and was further subdivided into 57 regional
populations originating from only East Asia or Southeast
Asia. The Austro-Melanesians were excluded (Table S1).
Because different genotyping panels from different vendors
were employed to generate these datasets, the number of SNPs
shared by these merged populations was only 8,945 after SNP
quality assessment as described above. Finally, SNPs that
were in linkage disequilibrium (r2>0.2) were removed with
PLINK, resulting in the retention of 7,634 unlinked SNPs that
were used for subsequent analyses.
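As a rough illustration of the pruning step, the sketch below computes r² as the squared Pearson correlation between genotype dosages and greedily keeps a SNP only if it is not correlated above the threshold with any SNP already kept. The function names and SNP identifiers are hypothetical, and the sketch deliberately ignores chromosomes, physical positions and sliding windows, so it should not be read as a reproduction of the PLINK procedure.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two lists of genotype dosages."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def prune(snps, threshold=0.2):
    """Greedy pruning over a dict of SNP name -> dosages (0/1/2 per individual).

    Assumption: all pairs are compared, with no window or chromosome logic.
    """
    kept = []
    for name, dosages in snps.items():
        if all(r_squared(dosages, snps[k]) <= threshold for k in kept):
            kept.append(name)
    return kept
```

With three toy SNPs where the second duplicates the first and the third is uncorrelated with it, `prune` keeps the first and third.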
Inclusion of the PASNP dataset (55,000 SNPs) caused a great
reduction in the number of overlapping SNPs. Nevertheless,
the PASNP dataset is vital to this work, as it
provides the most comprehensive representation of Southeast
Asian populations (both mainland and insular) in comparison
with other newly available datasets, such as those from David
Reich's lab (Lazaridis et al., 2014, 2016), the Simons Genome
Diversity Project (Mallick et al., 2016), and the Estonian Biocentre
Human Genome Diversity Panel (Pagani et al., 2016).
Since the aim of this work is to infer the genetic relationships
of Northern Borneans to neighboring populations,
inclusion of the PASNP dataset justifies the drastic loss of
SNPs. Previous works that used approximately 8,000 SNPs
showed that this low number of overlapping SNPs is still adequate
to resolve the genetic structure of the Austronesians
(Lipson et al., 2014; Mörseburg et al., 2016; Pierron et al.,
2014). Furthermore, the genetic structure of the Northeast and
Southeast Asian populations was also resolved by Jinam et al.
(2012b), who used as few as approximately 4,300 SNPs.
ii) Genetic structure and admixture
Population structure and plausible genetic admixture were
inferred with smartPCA and ADMIXTURE (Alexander,
Novembre, & Lange, 2009), respectively. For PCA, a dot-plot
was drawn using the first and second principal components to
infer the genetic clustering among the Northern Borneans and
their genetic affiliation with worldwide and regional populations.
For ADMIXTURE analysis, the optimum numbers of
hypothetical ancestral populations (K) among the worldwide
and regional populations were determined via (i) the cross-validation
test (--cv=10), (ii) inspection of the consistency
of each component over ascending K values, and (iii) rejection
of higher K values that singled out a population (Behar et al.,
2010). Ten independent replicates with different random seeds
were performed. The consensus admixture pattern for each
individual from the respective populations was then generated
with CLUMPAK (Kopelman et al., 2015). Finally, selected
populations were tested using admixture f3 statistics, with
the “three population test,” in the ADMIXTOOLS package
(Patterson et al., 2012; Reich et al., 2009). A population was
considered admixed if the Z-score was lower than −3 (Raghavan et al.,
2014). Outgroup f3 statistics were also calculated using the
same package to estimate the amount of shared genetic drift between
individual Northern Bornean populations and neighboring
and worldwide populations (Patterson et al., 2012; Raghavan
et al., 2014; Reich et al., 2009). The Mbuti were used as the
outgroup population. This test helps to confirm formally
the qualitative patterns observed via PCA and ADMIXTURE
analysis (Haak et al., 2015).
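To make the logic of the f3 test concrete, the following sketch computes the naive estimator f3(C; A, B) as the mean over SNPs of (c − a)(c − b), where a, b and c are allele frequencies in the two reference populations and the target, together with a Z-score from a delete-one block jackknife. The function names, the equal-sized contiguous blocks and the default number of blocks are assumptions for illustration; the sketch omits the bias corrections and normalization applied by dedicated software, so its values are not comparable to ADMIXTOOLS output.

```python
import math

def f3(target, src1, src2):
    """Naive f3(target; src1, src2): mean of (c - a) * (c - b) over SNPs."""
    terms = [(c - a) * (c - b) for c, a, b in zip(target, src1, src2)]
    return sum(terms) / len(terms)

def f3_zscore(target, src1, src2, n_blocks=5):
    """Z-score of f3 from a delete-one block jackknife over contiguous SNP blocks."""
    n = len(target)
    full = f3(target, src1, src2)
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    leave_one_out = []
    for i in range(n_blocks):
        lo, hi = bounds[i], bounds[i + 1]
        keep = [j for j in range(n) if not lo <= j < hi]
        leave_one_out.append(
            f3([target[j] for j in keep],
               [src1[j] for j in keep],
               [src2[j] for j in keep])
        )
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    return full / se if se > 0 else float("nan")
```

A target whose allele frequencies lie between those of the two sources yields a negative f3, which is the signature of admixture that the three-population test looks for. The same `f3` function, with the outgroup in the first position, gives the outgroup form, where larger values indicate more shared drift between the other two populations.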
iii) Phylogenetic relationships
Allele frequencies for each bi-allelic SNP were calculated.
The PHYLIP package (Felsenstein, 1989) was used to deter-
mine the phylogenetic relationships of the 89 worldwide pop-
ulations. The allele frequencies of all SNPs were sampled with
a total of 200 bootstrap replicates. The maximum likelihood
tree was then calculated with CONTML, with “global rear-
rangement” and “randomize input order of species” turned on.
The consensus phylogenetic tree was then drawn with MEGA
ver. 5.2 (Tamura, Dudley, Nei, & Kumar, 2007).
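The bootstrap step can be pictured as resampling SNPs with replacement while keeping the same resampled SNP positions across all populations. The sketch below is a generic illustration under that assumption; the function name, the dictionary layout and the fixed seed are hypothetical, and it does not reproduce the file formats or options of the PHYLIP programs.

```python
import random

def bootstrap_replicates(freqs, n_replicates=200, seed=0):
    """Yield bootstrap datasets by resampling SNP columns with replacement.

    freqs maps a population name to its list of allele frequencies, with all
    lists in the same SNP order. The same resampled indices are applied to
    every population so that SNPs stay aligned within a replicate.
    """
    rng = random.Random(seed)
    n_snps = len(next(iter(freqs.values())))
    for _ in range(n_replicates):
        idx = [rng.randrange(n_snps) for _ in range(n_snps)]
        yield {pop: [f[i] for i in idx] for pop, f in freqs.items()}
```

Each replicate would then be passed to the tree-building step, and the resulting trees summarized into a consensus.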
iv) Genetic heterozygosity and differentiation
To calculate genetic heterozygosity for each population, the
command “--het” in PLINK was executed to obtain the number
of nonmissing genotypes (N.NM) and the observed number
of homozygous (O.HOM) sites. The observed heterozygosity
(Ho) of each individual was then determined with the formula
Ho = (N.NM − O.HOM)/N.NM. The average Ho for each population
was subsequently calculated. Additionally, calculation
of genetic differentiation, FST (Cavalli-Sforza, Menozzi, &
Piazza, 1994), was performed using the program smartPCA,
with the ‘FST only’ parameter enabled.
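The heterozygosity calculation can be sketched as follows. The example assumes a whitespace-delimited table with the columns FID, IID, O(HOM), E(HOM), N(NM) and F, as written by PLINK 1.x, and further assumes that the family ID column holds the population label, which is a convention chosen here for illustration rather than something stated in the text. The function names are hypothetical.

```python
from collections import defaultdict

def individual_ho(n_nm, o_hom):
    """Observed heterozygosity: (non-missing - homozygous) / non-missing."""
    return (n_nm - o_hom) / n_nm

def population_mean_ho(het_text):
    """Average per-individual Ho within each population.

    Assumption: the FID column is used as the population label.
    """
    lines = het_text.strip().splitlines()
    header = lines[0].split()
    i_fid = header.index("FID")
    i_hom = header.index("O(HOM)")
    i_nm = header.index("N(NM)")
    per_pop = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        ho = individual_ho(float(fields[i_nm]), float(fields[i_hom]))
        per_pop[fields[i_fid]].append(ho)
    return {pop: sum(values) / len(values) for pop, values in per_pop.items()}
```

For instance, two individuals with 700 and 680 homozygous calls out of 1,000 non-missing genotypes have Ho values of 0.30 and 0.32, giving a population mean of 0.31.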
3 RESULTS
3.1 Genetic structure among Northern
Borneans
The first principal component resolved the five ethnic groups
of northern Borneo into three genetic clusters with the
following compositions: Muruts only, Dusun-Rungus, and
Sonsogon-Sungai (Figure S1). The second principal compo-
nent separated each individual by their ethnicity, albeit with
minor overlap. Subsequent assignment of genetic components
with ADMIXTURE identified only one genetic component
among Northern Borneans.
3.2 Genetic structure relative to worldwide
and regional populations
The PCA results for the 89 worldwide populations clustered
the Northern Borneans together with populations from Asia
(Figure S2), and they showed no overlap with the Austro-
Melanesians. According to the PCA of the 57 regional pop-
ulations (with Austro-Melanesians excluded) from Northeast
Asia and Southeast Asia, the Northern Borneans formed their
own genetic cluster (Figure 2). When only the Austronesians
were analyzed, a unique genetic cluster among the North-
ern Borneans, excluding the Murut-Paluans, was observed
(Figure S3). Murut-Paluans clustered away from the general
Northern Bornean cluster and were closest to the non–Austro-
Melanesian Filipinos, Taiwan natives, and some Indonesian
ethnic groups (e.g., Mentawai and Toraja).
Additionally, analysis of genetic components among the 89
worldwide populations showed that the Northern Borneans
carried the East Asian genetic component (dark green-colored)
at K = 5 (Figure S4). At K = 6, the Austronesians
predominantly carried the blue-colored component. At
K = 7 to K = 9, another component (red-colored) that was
widespread among the southern Chinese minorities, mainland
Southeast Asians, and Austronesians was observed. However,
the proportion of this red-colored component among the Taiwan
natives, Filipinos, and Northern Borneans was relatively
low. Further analyses at K = 10 and 11 revealed a light green-colored
component that was mainly found among southern
Chinese minorities and mainland Southeast Asians. Under
these conditions, the dark green-colored component was common
only among populations from Northeast Asia, such as the
Yakut, Mongols, Japanese, and northern Han Chinese, and was
nearly absent among the Austronesians.
FIGURE 2 Principal component analysis of 57 regional populations from Northeast Asia to Southeast Asia. The Northern Borneans formed a unique genetic cluster and were observed to be closest to populations from Taiwan and islands in Southeast Asia [Colour figure can be viewed at wileyonlinelibrary.com]
At K = 12, the pink-colored component arose predominantly
among the Austronesians. However, Northern
Borneans (except for Murut-Paluans) retained most of their
blue-colored genetic component, with only a small propor-
tion of the pink-colored component (Figure S4). In contrast,
Murut-Paluans received approximately half of their genetic
makeup from the pink-colored component. Notably, this blue-
colored component was widespread among the populations
from the islands of Southeast Asia and the Malays from
peninsular Malaysia and Singapore but was almost absent
from the majority of the populations from mainland Southeast
Asia. The Northern Borneans exhibited no admixture with
Austro-Melanesian populations, as they carried neither the
bright-green nor light-purple component, which were preva-
lent among Austro-Melanesians from peninsular Malaysia
and the islands of Southeast Asia, respectively. This observation
was confirmed with the formal f3 test (Table S2).
When the 57 regional populations (from which non-
Asians, Central and South Asians, and Austro-Melanesians
were excluded) were analyzed, a finer population structure
was revealed (Figure 3). Generally, the population structure
depicted at up to K = 12 among the 89 worldwide populations
was similar to that for K = 6 in the 57 regional populations.
From K = 7 to K = 10, two additional genetic components
(beige- and pine green-colored components) were
observed, but these components were restricted to southern
Chinese minorities and mainland Southeast Asians. At K = 9,
the Bidayuhs from Sarawak were observed to carry a unique
genetic component (blue-colored). These components were
present in negligible proportions in Northern Borneans. Addi-
tionally, Murut-Paluan was the only ethnic group that exhib-
ited a high proportion of the pink-colored component, which
was commonly found among Austronesians (Figure 3).
The outgroup f3 statistics confirmed the genetic structure
of the Northern Borneans depicted by PCA and ADMIXTURE
analyses. In general, all Northern Borneans (with the exception
of the Murut-Paluan) shared the most genetic drift among
themselves, followed by the Taiwan natives, non–Austro-Melanesian
Filipinos and some Indonesian ethnic groups such
as Dayak, Toraja, and Mentawai (Figure S5). However, the
Murut-Paluan ethnic group shared the most genetic drift with the
Ami (a Taiwan native group). This test also revealed
that the Northern Borneans shared intermediate amounts
of genetic drift with the Daic-speaking groups in mainland
Southeast Asia and south China such as Jiamao, Zhuang, and
Tai-Khuen (Figure S5).
FIGURE 3 Genetic admixture analysis of 57 regional populations. At K = 4, the dark-blue colored component was the predominant genetic component among individuals from the islands of Southeast Asia and Taiwan natives (all are Austronesians). At K = 5 and above, the Austronesians carried three major genetic components, colored pink, red and dark blue. The Northern Borneans exhibited near-absolute proportions of the dark-blue–colored component, which was present in low proportions in other Austronesians. The abbreviations DDS, DRG, DSO, PSG, and RPL refer to Dusun, Rungus, Sonsogon, Sungai-Lingkabau and Murut-Paluan, respectively [Colour figure can be viewed at wileyonlinelibrary.com]
3.3 Phylogenetic relationships
All East Asians and Southeast Asians were grouped under a
common node in the phylogenetic tree for the 89 worldwide
populations (Figure S6). Populations from Northeast Asia,
southern China, and mainland Southeast Asia were grouped
into the same clade, whereas the Northern Borneans, Tai-
wan natives, non–Austro-Melanesian Filipinos, and some
Indonesian ethnic groups, such as the Toraja, Mentawai,
and Dayak groups, were placed in another clade. The
Austro-Melanesians from Indonesia and the Philippines, but
not those from peninsular Malaysia, were grouped into a
subclade together with the Austronesians mentioned above.
Intriguingly, the Bidayuhs from Sarawak were not grouped
with the Northern Borneans and Dayak (Kalimantan), but
with the Javanese and Temuan, who fell into the same clade
as the mainland Southeast Asians.
3.4 Reduced genetic heterozygosity and
increased FST index
Northern Borneans exhibited lower genetic heterozygosity
overall (0.3111–0.3161, Figure S7) than the other popu-
lations in Southeast Asia and southern China. Neverthe-
less, they were found to exhibit higher genetic heterozy-
gosity than the Taiwan natives (0.3065–0.3097), Mentawai
(0.3090), and Mlabri (0.2407). The population that exhib-
ited the highest heterozygosity was the Malay-Kelantan
population (0.3537). Additionally, the pairwise FST values
between Northern Borneans themselves were high (0.014–
0.027, Table S3) compared with those among the popula-
tions within Northeast Asia (e.g., CHB, JPT, Mongols), for
which the FST values ranged from 0.001 to 0.005. Intriguingly,
the Northern Bornean groups showed lower FST values
with other Southeast Asian populations than among themselves
(Table S3).
4 DISCUSSION
The division of Northern Borneans into three genetic clus-
ters indicated that the genetic structure among the five ethnic
groups of Northern Borneans is not always correlated with
linguistic groupings (Figure S1). The clustering of Sonso-
gons (Dusunic) and Sungai-Lingkabaus (Paitanic) could be
attributed to genetic admixture, as these groups reside close to
each other. This was further confirmed by their high amounts
of shared genetic drift (Figure S5).
In comparison with other neighboring populations, the
formation of a unique genetic cluster among the Northern
Borneans indicated that they are distinct from other groups
(Figure 2). These Northern Bornean populations are closely
related to Austronesians, particularly those from Taiwan and
the Philippines (Figure S3). Intriguingly, they are not closely
related to other Borneans [i.e., Bidayuh (Sarawak) and Dayak
(East Kalimantan)]. This was subsequently confirmed by outgroup
f3 statistics (Figure S5). In addition, the phylogenetic
tree showed that the Northern Borneans share a most recent
common ancestor with non–Austro-Melanesian Filipinos and
Taiwan natives rather than other Borneans (Figure S6). Simi-
larly, the Dusun and Murut from Brunei were also reported
to be closely related to Taiwan natives and non–Austro-
Melanesian Filipinos by Pagani et al. (2016) and Soares et al.
(2016). Notably, the topology of the phylogenetic tree in this
study is similar to the PASNP's tree (The HUGO Pan-Asian
SNP Consortium, 2009). Hence, the sharing of a common node
with the Taiwan natives and non–Austro-Melanesian Filipinos suggests
that the Northern Borneans are genetically related to the
dispersal of the Austronesians rather than to dispersals from mainland
Southeast Asia.
Generally, Austronesians (inclusive of all Northern
Borneans) carry three major genetic components (the pink,
dark blue, and red components in Figure 3). The pink-
colored component is widespread among all Austronesians.
Individuals originating from Taiwan, the Philippines, and
northern Borneo carry a higher proportion of the blue-colored
component and a negligible proportion of the red-colored
component, but the opposite was found for individuals from
Indonesia, peninsular Malaysia, and mainland Southeast
Asia. The red-colored component could be attributed to
genetic admixture with Neolithic expansion from mainland
Southeast Asia, as suggested by Lipson et al. (2014) and
Hudjashov et al. (2017), whereas the dark-blue component
is restricted to populations from the islands of Southeast
Asia. The Northern Borneans from Malaysian Sabah (except
the Murut-Paluans) were found to carry the dark-blue
genetic component at near-absolute proportions (Figure 3).
In contrast, a similar genetic component with similar proportions
was not found in the Dusun and Murut from Brunei.
Their genetic structure instead resembled that of the Lebbo (Kalimantan)
and Malay (Peninsular Malaysia) (Mörseburg et al.,
2016). This merits further analyses in the future. Hence, the
additional genetic component found in this study provides
new insights related to our current understanding of the
genetic structure of pan-Asian populations.
A decreasing gradient of the dark-blue–colored component
from Northern Borneans to other populations in Southeast
Asia indicates genetic sharing among these populations (Fig-
ure 3). However, the only statistically supported gene flow was
from Northern Borneans to the Malays in Singapore (MAS),
who are also Austronesians (Table S2). This finding supports
a shared ancestry among Austronesians, in line with previous
reports showing a distinctive genetic affinity among Austrone-
sians and non-Austronesians from mainland Southeast Asia
based on uniparental and autosomal genetic markers (He et al.,
2012; Hudjashov et al., 2017; Peng et al., 2010; Stoneking
& Delfin, 2010). However, this does not explain the sharing
of the dark-blue component with minority groups from southern China
and with Southeast Asians who are not Austronesians. It has
been suggested that the proto-Austronesians were genetically
related to Neolithic populations, plausibly the Daic-speaking
group in southern China before they entered Taiwan approximately
6,000 years ago (Brandão et al., 2016; Li et al., 2008a;
Mirabal, Cadenas, Garcia-Bertrand, & Herrera, 2013). Taken
together with the outgroup f3 statistics (Figure
S5), the dark-blue component in this study may reflect this
common ancestry, which predates the Austronesian expansion.
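The logic of an outgroup f3 statistic can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in this study: the function name `outgroup_f3` and all allele frequencies are hypothetical toy values, and the estimator is the plain mean of products without the finite-sample bias correction that dedicated software applies. The statistic f3(O; A, B) measures the drift shared by populations A and B since their split from the outgroup O, so a larger value indicates a longer shared history.

```python
import numpy as np

def outgroup_f3(p_out, p_a, p_b):
    """Naive f3(O; A, B): mean over SNPs of (p_O - p_A) * (p_O - p_B).

    Inputs are per-SNP allele frequencies for the outgroup O and the two
    test populations A and B. No sample-size correction is applied.
    """
    p_out, p_a, p_b = (np.asarray(x, dtype=float) for x in (p_out, p_a, p_b))
    return float(np.mean((p_out - p_a) * (p_out - p_b)))

# Toy frequencies (assumed, not real data): A and B share most of their
# drift away from the outgroup, whereas C has drifted independently.
p_out = [0.0, 0.0, 0.0, 0.0]
p_a = [0.5, 0.2, 0.8, 0.4]
p_b = [0.5, 0.2, 0.6, 0.4]
p_c = [0.1, 0.9, 0.2, 0.0]

f3_ab = outgroup_f3(p_out, p_a, p_b)
f3_ac = outgroup_f3(p_out, p_a, p_c)
print(f3_ab > f3_ac)  # A shares more drift with B than with C
```

In this toy setting, the pair with the higher outgroup f3 value is the pair inferred to share more ancestry after divergence from the outgroup.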
Interestingly, the other neighboring Borneans (i.e., the
Bidayuhs from Sarawak and the Dayaks from East Kaliman-
tan) do not share a similar genetic structure with the North-
ern Borneans. Instead, their genetic structure resembles that
of the Malays and populations from the islands of Suma-
tra and Java (Figure 3 and Figure S3). Jinam et al. (2012a)
inferred that the Bidayuh in Sarawak, which is located west
of Sabah, may have originated from southern China based on
their mitochondrial haplogroups. The Murut ethnic subgroups
extend into the interior land of Sarawak and Kalimantan (King
& King, 1984), which could explain the intermediate clus-
tering position of the Murut-Paluans between the Northern
Borneans and other Austronesians. The high proportion of
the pink-colored component found in the Murut-Paluans rel-
ative to the other Northern Borneans further supports this
inference.
Intriguingly, no trace of Negrito genetic components
(bright-green and light-purple components) was found in
any Bornean population (Northern Borneans, Bidayuhs, and
Dayaks) in the present study (Figure S4 and Table S2), even
though it has been proposed that this island was initially
peopled by the former group based on archaeological findings
from Balambangan Cave in Sabah and Niah Cave in Sarawak
(Abdullah, 2005; Barker et al., 2007). Coupled with the pre-
vious findings on Bidayuh, we suggest that there were pos-
sibly at least two waves of historical migrations of modern
humans to Borneo Island after its initial peopling by Australo-
Melanesians: one involving Austronesians and another from
southern China or mainland Southeast Asia. As such, Borneo
Island might have served as the crossroads for these two waves
of human migration. Nevertheless, the extant populations of
northern Borneo show no genetic admixture with either the
Australo-Melanesians or people from the region of southern
China/mainland Southeast Asia. Unfortunately, the reason for
this remains elusive. As such, these findings suggest a unique
genetic structure of Northern Borneans compared with their
neighboring populations.
The low heterozygosity and high pairwise FST values
observed within Northern Borneans and between Northern
Borneans and other populations indicate that the Northern
Borneans might have experienced genetic drift in the past
(Figure S7 and Table S3). This inference is further supported by
the outgroup f3 statistics (Figure S5). In particular, the
Sonsogons, a minority ethnic group with a dwindling size of
approximately 2,000–4,000 individuals, may have experi-
enced greater drift than the other groups, which could be
associated with their isolated residence, deep in the interior
lands of northeastern Sabah. Genetic drift might have
contributed to the near-absolute proportion of the dark-blue
component, which is a common genetic component among
Taiwan natives and populations from the islands of Southeast
Asia.
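The expected signature of drift described above (reduced heterozygosity together with elevated FST) can be sketched in Python under stated assumptions. The function names and allele frequencies below are hypothetical, and the FST shown is a Hudson-style ratio-of-averages definition without sample-size correction. The estimator actually used in this study is not specified in this section, so the sketch illustrates the reasoning only.

```python
import numpy as np

def expected_heterozygosity(p):
    """Mean expected heterozygosity, 2p(1 - p), across biallelic SNPs."""
    p = np.asarray(p, dtype=float)
    return float(np.mean(2.0 * p * (1.0 - p)))

def hudson_fst(p1, p2):
    """Hudson-style FST as a ratio of sums across SNPs.

    Numerator: squared allele-frequency difference between populations.
    Denominator: probability that two alleles, one drawn from each
    population, differ. No finite-sample correction is applied.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = np.sum((p1 - p2) ** 2)
    den = np.sum(p1 * (1.0 - p2) + p2 * (1.0 - p1))
    return float(num / den)

# Toy frequencies (assumed): a source population and an isolated
# population in which drift has pushed alleles toward loss or fixation.
p_source = [0.5, 0.5, 0.5, 0.5]
p_drifted = [0.9, 0.1, 1.0, 0.0]

print(expected_heterozygosity(p_source))   # higher diversity
print(expected_heterozygosity(p_drifted))  # lower diversity after drift
print(hudson_fst(p_source, p_drifted))     # elevated differentiation
```

In this toy example, the drifted population shows lower heterozygosity than the source and a clearly non-zero FST against it, which is the pattern interpreted as genetic drift in the paragraph above.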
It is important to highlight that the genetic structure within
these five ethnic groups represents the tip of the iceberg of the
complete population structure of northern Borneo, as there are
multiple ethno-linguistic groups, such as the Bajau, Sea Gypsies,
Brunei, Suluk, Ida'an, Bugis, Banjar, Tidong, and Lundayeh,
whose genetic structure has yet to be characterized. These
groups exhibit diverse historical or geographical links with
the surrounding lands and archipelagos of Borneo Island, and
such information will be invaluable for research on the his-
tory of human migration and genetic diseases in the islands of
Southeast Asia.
5 CONCLUSION
This is the first report on the genetic relatedness of five indige-
nous ethnic groups from Sabah (northern Borneo) to neigh-
boring populations from Southeast Asia based on genome-
wide high-density SNP data. The Northern Borneans form
a unique genetic entity, and the data obtained in this study
therefore contribute to filling the current void regarding infor-
mation on populations from this region, which may have
served as a viaduct for human migration events onto or
off of Borneo Island. Notably, the Northern Borneans are
not admixed with the Austro-Melanesians, who once pop-
ulated this region, and they could have experienced strong
genetic drift in the past, rendering them distinctive from
other populations. Overall, the Northern Borneans are closely
related to Taiwan natives and non–Austro-Melanesian Filipinos,
rather than to populations from other parts of Borneo
Island.
ACKNOWLEDGEMENTS
We are grateful to the communities and individuals who vol-
untarily participated in this research. We thank the Faculty of
Medicine and Health Sciences, Universiti Malaysia Sabah, for
assistance in sample collection.
COMPETING INTERESTS
The authors have declared that no competing interests exist.
AUTHOR CONTRIBUTIONS
SVK, JPK, BPH, MEP: conceived of and designed the experi-
ment; CWY, AM, MZH, JR, STYL, SVK: organized commu-
nity visits and sample recruitment; CWY: performed wet-lab
experiments; CWY, CLYV, XW, WYS, THO: performed data
analysis; CWY, SVK, BPH, THO, SHX, YYT: interpreted
the results; CWY, SVK: wrote the manuscript.
ORCID
Chee Wei Yew http://orcid.org/0000-0002-9378-2675
S. Vijay Kumar http://orcid.org/0000-0002-0384-1580
REFERENCES
Abdullah, J. (2005). Human teeth of the Palaeolithic period from Gua
Balambangan, Sabah. In Z. Majid (Ed.), The Perak Man and other
prehistoric skeletons of Malaysia (pp. 229–237). Penang, Malaysia:
Penerbit Universiti Sains Malaysia.
Aghakhanian, F., Yunus, Y., Naidu, R., Jinam, T., Manica, A., Hoh, B. P.,
& Phipps, M. E. (2015). Unravelling the genetic history of negritos
and indigenous populations of Southeast Asia. Genome Biology and
Evolution, 7, 1206–1215.
Alexander, D. H., Novembre, J., & Lange, K. (2009). Fast model-based
estimation of ancestry in unrelated individuals. Genome Research,
19, 1655–1664.
Barker, G., Barton, H., Bird, M., Daly, P., Datan, I., Dykes, A., ...
Turney, C. (2007). The ‘human revolution’ in lowland tropical Southeast
Asia: The antiquity and behavior of anatomically modern humans
at Niah Cave (Sarawak, Borneo). Journal of Human Evolution, 52,
243–261.
Behar, D. M., Yunusbayev, B., Metspalu, M., Metspalu, E., Rosset, S.,
Parik, J., ... Villems, R. (2010). The genome-wide structure of the
Jewish people. Nature, 466, 238–242.
Bird, M. I., Taylor, D., & Hunt, C. (2005). Paleoenvironments of insular
Southeast Asia during the Last Glacial Period: A savanna corridor in
Sundaland? Quaternary Science Reviews, 24, 2228–2242.
Brandão, A., Eng, K. K., Rito, T., Cavadas, B., Bulbeck, D., Gandini, F.,
... Soares, P. (2016). Quantifying the legacy of the Chinese Neolithic
on the maternal genetic heritage of Taiwan and Island Southeast Asia.
Human Genetics, 135, 363–376.
Cannon, C. H., Morley, R. J., & Bush, A. B. G. (2009). The current
refugial rainforests of Sundaland are unrepresentative of their bio-
geographic past and highly vulnerable to disturbance. Proceedings
of the National Academy of Sciences of the United States of America,
106, 11188–11193.
Cavalli-Sforza, L. L., Menozzi, P., & Piazza, A. (1994). The history and
geography of human genes. Oxford, United Kingdom: Princeton
University Press.
Corny, J., Galland, M., Arzarello, M., Bacon, A.-M., Demeter, F.,
Grimaud-Hervé, D., ... Détroit, F. (2017). Dental phenotypic shape
variation supports a multiple dispersal model for anatomically
modern humans in Southeast Asia. Journal of Human Evolution, 112,
41–56.
Delfin, F., Salvador, J. M., Calacal, G. C., Perdigon, H. B., Tabbada,
K. A., Villamor, L. P., ... De Ungria, M. C. A. (2011). The
Y-chromosome landscape of the Philippines: Extensive heterogeneity
and varying genetic affinities of Negrito and non-Negrito groups.
European Journal of Human Genetics, 19, 224–230.
Denham, T., & Donohue, M. (2012). Reconnecting genes, languages and
material culture in Island Southeast Asia: Aphorisms on geography
and history. Language Dynamics and Change, 2, 184–211.
Détroit, F., Dizon, E., Falguères, C., Hameau, S., Ronquillo, W., &
Sémah, F. (2004). Upper Pleistocene Homo sapiens from the Tabon
cave (Palawan, the Philippines): Description and dating of new
discoveries. Comptes Rendus Palevol, 3, 705–712.
Felsenstein, J. (1989). PHYLIP - Phylogeny Inference Package (version
3.2). Cladistics, 5, 164–166.
Goh, L. P. W., Chong, E. T. J., Chua, K. H., Chuah, J. A., & Lee, P.
C. (2014). Significant genotype difference in the CYP2E1 PstI poly-
morphism of indigenous groups in Sabah, Malaysia with Asian and
non-Asian populations. Asian Pacific Journal of Cancer Prevention,
15, 7377–7381.
Haak, W., Lazaridis, I., Patterson, N., Rohland, N., Mallick, S., Llamas,
B., ... Reich, D. (2015). Massive migration from the steppe is a source
for Indo-European languages in Europe. Nature, 522, 207–211.
He, J.-D., Peng, M.-S., Quang, H. H., Dang, K. P., Trieu, A. V., Wu,
S.-F., ... Zhang, Y.-P. (2012). Patrilineal perspective on the Austronesian
diffusion in mainland Southeast Asia. PLoS One, 7, e36437.
Helgason, A., Yngvadóttir, B., Hrafnkelsson, B., Gulcher, J., &
Stefánsson, K. (2004). An Icelandic example of the impact of population
structure on association studies. Nature Genetics, 37, 90–95.
Hudjashov, G., Karafet, T. M., Lawson, D. J., Downey, S., Savina, O.,
Sudoyo, H., ... Cox, M. P. (2017). Complex patterns of admixture
across the Indonesian Archipelago. Molecular Biology and
Evolution, 34, 2439–2452.
The HUGO Pan-Asian SNP Consortium. (2009). Mapping human
genetic diversity in Asia. Science, 326, 1541–1545.
The International HapMap 3 Consortium. (2010). Integrating common
and rare genetic variation in diverse human populations. Nature, 467,
52–58.
Jinam, T. A., Hong, L., Phipps, M. E., Stoneking, M., Ameen, M., Edo,
J., & The HUGO Pan-Asian SNP Consortium, & Saitou, N. (2012a).
Evolutionary history of continental Southeast Asians: “Early Train”
hypothesis based on genetic analysis of mitochondrial and autosomal
DNA data. Molecular Biology and Evolution, 29, 3513–3527.
Jinam, T. A., Nishida, N., Hirai, M., Kawamura, S., Oota, H., Umetsu,
K., ... Saitou, N. (2012b). The history of human populations in the
Japanese Archipelago inferred from genome-wide SNP data with a
special reference to the Ainu and the Ryukyuan populations. Journal
of Human Genetics, 57, 787–795.
King, J., & King, J. W. (1984). Languages of Sabah: A survey report.
Department of Linguistics, Research School of Pacific Studies.
Canberra: Australian National University.
Kopelman, N., Mayzel, J., Jakobsson, M., Rosenberg, N. A., & Mayrose,
I. (2015). CLUMPAK: A program for identifying clustering modes
and packaging population structure inferences across K. Molecular
Ecology Resources, 15, 1179–1191.
Lazaridis, I., Patterson, N., Mittnik, A., Renaud, G., Mallick, S., Kir-
sanow, K., ... Krause, J. (2014). Ancient human genomes suggest
three ancestral populations for present-day Europeans. Nature, 513,
409–413.
Lazaridis, I., Nadel, D., Rollefson, G., Merrett, D. C., Rohland, N.,
Mallick, S., ... Reich, D. (2016). Genomic insights into the origin
of farming in the ancient Near East. Nature, 536, 419–424.
Lewis, M. P., Simons, G. F., & Fennig, C. D. (2015). Ethnologue:
Languages of the world (18th ed.). Dallas, TX: SIL International.
Retrieved from https://www.ethnologue.com.
Li, H., Wen, B., Chen, S. J., Su, B., Pramoonjago, P., Liu, Y., ... Jin, L.
(2008a). Paternal genetic affinity between western Austronesians and
Daic populations. BMC Evolutionary Biology, 8, 146–157.
Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M.,
Ramachandran, S., ... Myers, R. M. (2008b). Worldwide human
relationships inferred from genome-wide patterns of variation. Science,
319, 1100–1104.
Lipson, M., Loh, P. R., Patterson, N., Moorjani, P., Ko, Y.-C.,
Stoneking, M., ... Reich, D. (2014). Reconstructing Austronesian population
history in Island Southeast Asia. Nature Communications, 5, 4689.
Mallick, S., Li, H., Lipson, M., Mathieson, I., Gymrek, M., Racimo,
F., ... Reich, D. (2016). The Simons Genome Diversity Project: 300
genomes from 142 diverse populations. Nature, 538, 201–206.
Manichaikul, A., Mychaleckyj, J. C., Rich, S. S., Daly, K., Sale, M., &
Chen, W. M. (2010). Robust relationship inference in genome-wide
association studies. Bioinformatics, 26, 2867–2873.
Matsumura, H., & Oxenham, M. F. (2014). Demographic transitions
and migration in prehistoric East/Southeast Asia through the lens of
nonmetric dental traits. American Journal of Physical Anthropology,
155, 45–65.
Matsumura, H., & Pookajorn, S. (2005). A morphometric analysis of
the Late Pleistocene human skeleton from the Moh Khiew Cave in
Thailand. HOMO Journal of Comparative Human Biology, 56, 93–118.
Mirabal, S., Cadenas, A. M., Garcia-Bertrand, R., & Herrera, R. J.
(2013). Ascertaining the role of Taiwan as a source for the
Austronesian expansion. American Journal of Physical Anthropology, 150,
551–564.
Mörseburg, A., Pagani, L., Ricaut, F.-X., Yngvadottir, B., Harney, E.,
Castillo, C., ... Kivisild, T. (2016). Multi-layered population structure
in Island Southeast Asia. European Journal of Human Genetics, 24,
1605–1611.
Pagani, L., Lawson, D. J., Jagoda, E., Mörseburg, A., Eriksson, A., Mitt,
M., ... Metspalu, M. (2016). Genomic analyses inform on migration
events during the peopling of Eurasia. Nature, 538, 238–242.
Patterson, N., Petersen, D. C., van der Ross, R. E., Sudoyo, H., Glashoff,
R. H., Marzuki, S., ... Hayes, V. M. (2010). Genetic structure of
a unique admixed population: Implications for medical research.
Human Molecular Genetics, 19, 411–419.
Patterson, N., Moorjani, P., Luo, Y., Mallick, S., Rohland, N., Zhan, Y.,
... Reich, D. (2012). Ancient admixture in human history. Genetics,
192, 1065–1093.
Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and
eigenanalysis. PLoS Genetics, 2, e190.
Peng, M.-S., Quang, H. H., Dang, K. P., Trieu, A. V., Wang, H.-W., Yao,
Y.-G., ... Zhang, Y.-P. (2010). Tracing the Austronesian footprint in
mainland Southeast Asia: A perspective from mitochondrial DNA.
Molecular Biology and Evolution, 27, 2417–2430.
Pierron, D., Razafindrazaka, H., Pagani, L., Ricaut, F.-X., Antao,
T., Capredon, M., ... Kivisild, T. (2014). Genome-wide
evidence of Austronesian-Bantu admixture and cultural reversion in a
hunter-gatherer group of Madagascar. Proceedings of the National
Academy of Sciences of the United States of America, 111,
936–941.
Price, A. L., Zaitlen, N. A., Reich, D., & Patterson, N. (2010). New
approaches to population stratification in genome-wide association
studies. Nature Reviews Genetics, 11, 459–463.
Price, A. L., Butler, J., Patterson, N., Capelli, C., Pascali, V. L.,
Scarnicci, F., ... Hirschhorn, J. N. (2008). Discerning the ancestry of
European Americans in genetic association studies. PLoS Genetics, 4,
e236.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A.
R., Bender, D., ... Sham, P. C. (2007). PLINK: A toolset for
whole-genome association and population-based linkage analysis. American
Journal of Human Genetics, 81, 559–575.
Raghavan, M., Skoglund, P., Graf, K. E., Metspalu, M., Albrechtsen,
A., Moltke, I., ... Willerslev, E. (2014). Upper Paleolithic Siberian
genome reveals dual ancestry of Native Americans. Nature, 505,
87–91.
Rasmussen, M., Guo, X., Wang, Y., Lohmueller, K. E., Rasmussen, S.,
Albrechtsen, A., ... Willerslev, E. (2011). An aboriginal Australian
genome reveals separate human dispersals into Asia. Science, 334,
94–98.
Reich, D., Thangaraj, K., Patterson, N., Price, A. L., & Singh, L. (2009).
Reconstructing Indian population history. Nature, 461, 489–495.
Rosenberg, N. A. (2006). Standardized subset of the HGDP-CEPH
Human Genome Diversity Cell Line Panel, accounting for atypical
and duplicated samples and pairs of close relatives. Annals of Human
Genetics, 70, 841–847.
Soares, P. A., Trejaut, J. A., Rito, T., Cavadas, B., Hill, C., Eng, K. K.,
... Richards, M. B. (2016). Resolving the ancestry of
Austronesian-speaking populations. Human Genetics, 135, 309–326.
Stoneking, M., & Delfin, F. (2010). The human genetic history of East
Asia: Weaving a complex tapestry. Current Biology, 20, R188–R193.
Tabbada, K. A., Trejaut, J., Loo, J.-H., Chen, Y.-M., Lin, M.,
Mirazon-Lahr, M., ... De Ungria, M. C. A. (2010). Philippine mitochondrial
DNA diversity: A populated viaduct between Taiwan and Indonesia?
Molecular Biology and Evolution, 27, 21–31.
Tamura, K., Dudley, J., Nei, M., & Kumar, S. (2007). MEGA4:
Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0.
Molecular Biology and Evolution, 24, 1596–1599.
Teo, Y. Y., Sim, X., Ong, R. T. H., Tan, A. K. S., Chen, J., Tantoso,
E., ... Chia, K.-S. (2009). Singapore Genome Variation Project: A
haplotype map of three Southeast Asian populations. Genome Research,
19, 2154–2162.
Teh, L. K., George, E., Lai, M. I., Tan, J. A. M. A., Wong, L., & Ismail,
P. (2014). Molecular basis of transfusion dependent beta-thalassemia
major patients in Sabah. Journal of Human Genetics, 59, 119–123.
Wall, J. D. (2017). Inferring human demographic histories of
non-African populations from patterns of allele sharing. American
Journal of Human Genetics, 100, 766–772.
Wollstein, A., Lao, O., Becker, C., Brauer, S., Trent, R. J., Nurnberg, P.,
... Kayser, M. (2010). Demographic history of Oceania inferred from
genome-wide data. Current Biology, 20, 1983–1992.
Xing, J., Watkins, W. S., Witherspoon, D. J., Zhang, Y., Guthery, S. L.,
Thara, R., ... Jorde, L. B. (2009). Fine-scaled human genetic structure
revealed by SNP microarrays. Genome Research, 19, 815–825.
Xu, S., Kangwanpong, D., Seielstad, M., Srikummool, M.,
Kampuansai, J., Jin, L., & The HUGO Pan-Asian SNP Consortium. (2010).
Genetic evidence supports linguistic affinity of Mlabri - a
hunter-gatherer group in Thailand. BMC Genetics, 11, 18–30.
Yang, X., Xu, S., & The HUGO Pan-Asian SNP Consortium. (2011).
Identification of close relatives in the HUGO Pan-Asian SNP
database. PLoS One, 6, e29502.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the
supporting information tab for this article.
How to cite this article: Yew CW, Hoque MZ, Pugh-
Kitingan J, et al. Genetic relatedness of indigenous
ethnic groups in northern Borneo to neighboring
populations from Southeast Asia, as inferred from
genome-wide SNP data. Ann Hum Genet. 2018;1–11.
https://doi.org/10.1111/ahg.12246
... The last three are also found throughout the island of Borneo; including the neighbouring countries of Malaysia and Indonesia. Studies have suggested that they are likely to be genetically related to the Amis in Taiwan who might have migrated to Borneo through the Philippines [6, 10,11]. ...
... The Malays are one of the largest Austronesian population groups spreading over Island Southeast Asia and as far away as South Africa. Although the different sub-ethnic groups share considerable cultural and linguistic ties, they are not genetically homogenous [6, 11,22]. Adding to this genetic diversity, population genetic structure analysis conducted here has, for the rst time, [11] reported that Malays from East Malaysia, speci cally those residing in the state of Sabah, share a common ancestry with the Filipinos. ...
... Although the different sub-ethnic groups share considerable cultural and linguistic ties, they are not genetically homogenous [6, 11,22]. Adding to this genetic diversity, population genetic structure analysis conducted here has, for the rst time, [11] reported that Malays from East Malaysia, speci cally those residing in the state of Sabah, share a common ancestry with the Filipinos. Given the close geographical proximity, genetic admixture among these people would not have been unexpected. ...
Preprint
Full-text available
Background: The Malays and their many sub-ethnic groups collectively make up one of the largest population groups in Southeast Asia. However, their genomes, especially those from the nation of Brunei, remain very much underrepresented and understudied. Results: Here, we analysed the publicly available whole genome sequencing and genotyping data of two and 39 Bruneian Malay individuals, respectively. Next generation sequencing reads from the two individuals were first mapped against the GRCh38 human reference genome and their variants called. Of the total ~5.28 million short nucleotide variants and indels identified, ~217K of them were found to be novel; with some predicted to be deleterious and associated with risk factors of common non-communicable diseases in Brunei. Unmapped reads were next mapped against the recently reported novel Chinese and Japanese genomic contigs and de novo assembled. ~227 Kbp genomic sequences missing in GRCh38 and a partial open reading frame encoding a potential novel small zinc finger protein were successfully discovered. Although the Malays in Brunei, Singapore and Malaysia share >83% common genetic variants, principal component and admixture analysis looking into the genetic structure of the local Malays and other Asian population groups suggested that they are genetically closer to some Filipino ethnic groups than the Malays in Malaysia and Singapore. Conclusions: Taken together, our work provides the first comprehensive insight into the genomes of the Bruneian Malay population.
... This migration model is consistent with the alternative "Early Train" migration hypothesis proposed by Jinam et al. (2012), which argues that there was a migration originating from Indochina or South China~30-10 kya. Both hypotheses are supported at least in part, by several other lines of evidence: 1) close genetic affinity between the Sabahan natives and the Taiwanese aborigines and the Philippines aborigines but distantly related to the populations from mainland SEA (Yew et al., 2018a); 2) putative signals of positive selection driven by malaria infection found in the Sabahan natives occurred~5 kya, which coincides with the period during Austronesian expansion (Hoh et al., 2020); 3) inference divergence time between the Negrito and Senoi coincides with the proposed period of "Early Train" hypothesis, posing the plausibility of the swiddening Austroasiatic agriculturist migration to Peninsular Malaysia, which resulted in declined effective population size of Negrito (Yew et al., 2018b); 4) inference using both uniparental and autosomal markers suggested primarily common ancestry for Taiwan or islands of SEA populations established before the Neolithic period (Soares Frontiers in Genetics | www.frontiersin.org January 2022 | Volume 13 | Article 767018 7 et al., 2016); 5) the native from Sarawak (the Iban) showed a closer genetic affinity to Indonesia than the mainland SEA (Simonson et al., 2011). ...
... (iii) Both mtDNA molecular clocking and inference of divergence time using autosomal DNA support the notion that the ancestors of Peninsular Malaysia Negrito may be the earliest inhabitant of SEA at least 50 kya (Hill et al., 2006;Yew et al., 2018a;Deng et al., 2021). (iv) The native populations from Peninsular Malaysia and Borneo are genetically distinct, each with a unique population history (Yew et al., 2018b). (v) Austronesian expansion occurred at least in the SEA region, southwards to the Philippines, towards the Northern Borneo (Hill et al., 2007;Lipson et al., 2014;Yew et al., 2018b). ...
... (iv) The native populations from Peninsular Malaysia and Borneo are genetically distinct, each with a unique population history (Yew et al., 2018b). (v) Austronesian expansion occurred at least in the SEA region, southwards to the Philippines, towards the Northern Borneo (Hill et al., 2007;Lipson et al., 2014;Yew et al., 2018b). ...
Article
Full-text available
Southeast Asia (SEA) has one of the longest records of modern human habitation out-of-Africa. Located at the crossroad of the mainland and islands of SEA, Peninsular Malaysia is an important piece of puzzle to the map of peopling and migration history in Asia, a question that is of interest to many anthropologists, archeologists, and population geneticists. This review aims to revisit our understanding to the population genetics of the natives from Peninsular Malaysia and Borneo over the past century based on the chronology of the technology advancement: 1) Anthropological and Physical Characterization; 2) Blood Group Markers; 3) Protein Markers; 4) Mitochondrial and Autosomal DNA Markers; and 5) Whole Genome Analysis. Subsequently some missing gaps of the study are identified. In the later part of this review, challenges of studying the population genetics of natives will be elaborated. Finally, we conclude our review by reiterating the importance of unveiling migration history and genetic diversity of the indigenous populations as a steppingstone towards comprehending disease evolution and etiology.
... Much the same can be said for PNG (Bergström et al. 2017), with the exception of the north coast settlements and offshore islands, which have extensive AN admixture, as has long been known. Negrito people are also found in Borneo (Yew et al. 2018a(Yew et al. , 2018b and in the Philippines (Arenas et al. 2020), where they are particularly well known. There is no concrete evidence that they made it all the way to Taiwan, but it is more than likely that they did. ...
... Three smaller-scale genomics studies of AM Negrito and AN tribes on Borneo (Deng et al. 2019;Yew et al. 2018aYew et al. , 2018b show the expected deep differences between them and confirm Denisovan ancestry in the Negrito peoples. Perhaps surprisingly, the various AN people show little or no AM admixture. ...
Article
Full-text available
The Austronesian Diaspora is a 5,000-year account of how a small group of Taiwanese farmers expanded to occupy territories reaching halfway around the world. Reconstructing their detailed history has spawned many academic contests across many disciplines. An outline orthodox version has eventually emerged but still leaves many unanswered questions. The remarkable power of whole-genome technology has now been applied to people across the entire region. This review gives an account of this era of genetic investigation and discusses its many achievements, including revelation in detail of many unexpected patterns of population movement and the significance of this information for medical genetics.
... The native populations have remained underrepresented in the major global population genome sequencing projects. Recent years have witnessed efforts of understanding the genetic architecture of these native populations using high throughput DNA sequencing techniques (Deng et al. 2014(Deng et al. , 2015NurWaliyuddin et al. 2015;Yew et al. 2018). The Orang Asli population representing *0.7% of Peninsular Malaysia comprises three major tribes, namely Semang/Negrito, Senoi/Sakai and Proto-/Aborginal-Malays. ...
Article
Differences in the distribution of RBC antigens defining the blood group types among different populations have been well established. Fewer studies exist that have explored the blood group profiles of indigenous populations worldwide. With the availability of population-scale genomic datasets, we have explored the blood group profiles of theOrang Aslis, who are the indigenous population in Peninsular Malaysia and provide a systematic comparison of the same with major global population datasets. Variant call files fromwhole genome sequence data (hg19) of 114 Orang Asli were retrieved from The Orang Asli Genome Project. Systematic variant annotations were performed using ANNOVAR and only those variants mapping back to genes associated with 43 blood group systems and transcription factors KLF1 and GATA1 were filtered. Blood group-associated allele and phenotype frequencies were determined and were duly compared with other datasets including Singapore SequencingMalay Project, aboriginal western desert Australians and global population datasets including The 1000 Genomes Project and gnomAD. This study reports four alleles (rs12075, rs7683365, rs586178 and rs2298720) of DUFFY, MNS, RH and KIDD blood group systems which were significantly distinct between indigenous Orang Asli and cosmopolitanMalaysians. Eighteen alleles that belong to 14 blood group systems were found statistically distinct in comparison to global population datasets. Although not much significant differences were observed in phenotypes of most blood group systems, major insights were observed when comparing Orang Asli with aboriginal Australians and cosmopolitanMalaysians.This study serves as the first of its kind to utilize genomic data to interpret blood group antigen profiles of the OrangAsli population. In addition, a systematic comparison of blood group profiles with related populations was also analysed and documented.
... org/). been previously reported50,51 . This ancestral component might be attributed to the first wave of human migration into SEA via the southern coastal route or later gene flow from SA during the expansion of Indian culture into Peninsular Malaysia in the first century A.D.52 . ...
Article
Full-text available
Southeast Asia comprises 11 countries that span mainland Asia across to numerous islands that stretch from the Andaman Sea to the South China Sea and Indian Ocean. This region harbors an impressive diversity of history, culture, religion and biology. Indigenous people of Malaysia display substantial phenotypic, linguistic, and anthropological diversity. Despite this remarkable diversity which has been documented for centuries, the genetic history and structure of indigenous Malaysians remain under-studied. To have a better understanding about the genetic history of these people, especially Malaysian Negritos, we sequenced whole genomes of 15 individuals belonging to five indigenous groups from Peninsular Malaysia and one from North Borneo to high coverage (30X). Our results demonstrate that indigenous populations of Malaysia are genetically close to East Asian populations. We show that present-day Malaysian Negritos can be modeled as an admixture of ancient Hoabinhian hunter-gatherers and Neolithic farmers. We observe gene flow from South Asian populations into the Malaysian indigenous groups, but not into Dusun of North Borneo. Our study proposes that Malaysian indigenous people originated from at least three distinct ancestral populations related to the Hoabinhian hunter-gatherers, Neolithic farmers and Austronesian speakers.
... Recent years have witnessed the efforts of understanding the genetic architecture of these native populations using high throughput DNA sequencing techniques. 7,8,9,10 The Orang Asli population representing ~0.7% of peninsular Malaysia comprises three major tribes namely 3 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. ...
Preprint
Purpose: Differences in the distribution of RBC antigens defining the blood group types among different populations have been well established. However, very few studies have explored the blood group profiles of indigenous populations worldwide. With the rapid advent of next generation sequencing techniques and availability of population scale genomic datasets, we have successfully explored the blood group profiles of the Orang Aslis, who are the indigenous population of Malaysia, and provide a systematic comparison of the same with major global population datasets. Methods: Variant call files from whole genome sequence data (hg19) of 114 Orang Asli were retrieved from The Orang Asli Genome Project (OAGP). Systematic variant annotations were performed using ANNOVAR and only those variants spanning genes of 43 blood group systems and transcription factors KLF1 and GATA1 were filtered. Blood group associated allele and phenotype frequencies were determined and were duly compared with other datasets including Singapore Sequencing Malay Project (SSMP), aboriginal western desert Australians and global population datasets including The 1000 Genomes Project and gnomAD. Results: This study reports 4 alleles (rs12075, rs7683365, rs586178 and rs2298720) of DUFFY, MNS, RH and KIDD blood group systems which were significantly distinct between indigenous Orang Asli and cosmopolitan Malaysians. Eighteen (18) alleles which belong to 14 blood group systems were found distinct in comparison to global population datasets. Although few significant differences were observed in phenotypes of most blood group systems, major insights were observed on comparing Orang Asli with aboriginal Australians and cosmopolitan Malaysians. Conclusion: This study serves as the first of its kind to utilize genomic data to interpret blood group antigen profiles of the Orang Asli population. In addition, a systematic comparison of blood group profiles with related populations was also analysed and documented.
Article
Differences in the distribution of RBC antigens defining the blood group types among different populations have been well established. Fewer studies have explored the blood group profiles of indigenous populations worldwide. With the availability of population-scale genomic datasets, we have explored the blood group profiles of the Orang Aslis, who are the indigenous population in Peninsular Malaysia, and provide a systematic comparison of the same with major global population datasets. Variant call files from whole genome sequence data (hg19) of 114 Orang Asli were retrieved from The Orang Asli Genome Project. Systematic variant annotations were performed using ANNOVAR and only those variants mapping back to genes associated with 43 blood group systems and transcription factors KLF1 and GATA1 were filtered. Blood group-associated allele and phenotype frequencies were determined and were duly compared with other datasets including Singapore Sequencing Malay Project, aboriginal western desert Australians and global population datasets including The 1000 Genomes Project and gnomAD. This study reports four alleles (rs12075, rs7683365, rs586178 and rs2298720) of DUFFY, MNS, RH and KIDD blood group systems which were significantly distinct between indigenous Orang Asli and cosmopolitan Malaysians. Eighteen alleles that belong to 14 blood group systems were found statistically distinct in comparison to global population datasets. Although few significant differences were observed in phenotypes of most blood group systems, major insights were observed when comparing Orang Asli with aboriginal Australians and cosmopolitan Malaysians. This study serves as the first of its kind to utilize genomic data to interpret blood group antigen profiles of the Orang Asli population. In addition, a systematic comparison of blood group profiles with related populations was also analysed and documented.
Article
For a long time, the term "Dusun" has captivated our curiosity. Its origin is unmistakably an exonym. However, few people have written about it, synthesised it, or can explain it in depth. Thanks to advances in information technology, it is now possible to assemble documents or records about historical Borneo from diverse sources around the world. The task of collecting historical materials from the colonial era has become easier. It is therefore a good time to explore a fresh perspective on the historical phases of this ethnic category. To describe the evolution of the term, the study draws on explorers' journals, semi-autobiographies, historical sources, and other written scientific studies.
Article
The Austronesian Diaspora is a 5000-year account of how a small group of Taiwanese farmers expanded to occupy territories reaching halfway round the world. Reconstructing their detailed history has spawned many academic contests across many disciplines. An outline orthodox version has eventually emerged, but still leaves many unanswered questions. The remarkable power of whole-genome technology has now been applied to people across the entire region. This review gives an account of this era of genetic investigation and discusses its many achievements including revelation in detail of many unexpected patterns of population movement and the significance of this information for medical genetics.
Article
Indonesia, an island nation as large as continental Europe, hosts a sizeable proportion of global human diversity, yet remains surprisingly undercharacterized genetically. Here, we substantially expand on existing studies by reporting genome-scale data for nearly 500 individuals from 25 populations in Island Southeast Asia, New Guinea, and Oceania, notably including previously unsampled islands across the Indonesian archipelago. We use high-resolution analyses of haplotype diversity to reveal fine detail of regional admixture patterns, with a particular focus on the Holocene. We find that recent population history within Indonesia is complex, and that populations from the Philippines made important genetic contributions in the early phases of the Austronesian expansion. Different, but interrelated processes, acted in the east and west. The Austronesian migration took several centuries to spread across the eastern part of the archipelago, where genetic admixture postdates the archeological signal. As with the Neolithic expansion further east in Oceania and in Europe, genetic mixing with local inhabitants in eastern Indonesia lagged behind the arrival of farming populations. In contrast, western Indonesia has a more complicated admixture history shaped by interactions with mainland Asian and Austronesian newcomers, which for some populations occurred more than once. Another layer of complexity in the west was introduced by genetic contact with South Asia and strong demographic events in isolated local groups.
Article
Readable link: http://rdcu.be/kt5n High-coverage whole-genome sequence studies have so far focused on a limited number of geographically restricted populations or been targeted at specific diseases, such as cancer. Nevertheless, the availability of high-resolution genomic data has led to the development of new methodologies for inferring population history and refuelled the debate on the mutation rate in humans. Here we present the Estonian Biocentre Human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into diversity and selection sets. We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time as well as for signals of positive or balancing selection. We find a genetic signature in present-day Papuans that suggests that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMHs) out of Africa. Together with evidence from the western Asian fossil record, and admixture between AMHs and Neanderthals predating the main Eurasian expansion, our results contribute to the mounting evidence for the presence of AMHs out of Africa earlier than 75,000 years ago.
Article
We report genome-wide ancient DNA from 44 ancient Near Easterners ranging in time between ~12,000 and 1,400 BCE, from Natufian hunter–gatherers to Bronze Age farmers. We show that the earliest populations of the Near East derived around half their ancestry from a ‘Basal Eurasian’ lineage that had little if any Neanderthal admixture and that separated from other non-African lineages before their separation from each other. The first farmers of the southern Levant (Israel and Jordan) and Zagros Mountains (Iran) were strongly genetically differentiated, and each descended from local hunter–gatherers. By the time of the Bronze Age, these two populations and Anatolian-related farmers had mixed with each other and with the hunter–gatherers of Europe to drastically reduce genetic differentiation. The impact of the Near Eastern farmers extended beyond the Near East: farmers related to those of Anatolia spread westward into Europe; farmers related to those of the Levant spread southward into East Afri
Article
The history of human settlement in Southeast Asia has been complex and involved several distinct dispersal events. Here, we report the analyses of 1825 individuals from Southeast Asia including new genome-wide genotype data for 146 individuals from three Mainland Southeast Asian (Burmese, Malay and Vietnamese) and four Island Southeast Asian (Dusun, Filipino, Kankanaey and Murut) populations. While confirming the presence of previously recognised major ancestry components in the Southeast Asian population structure, we highlight the Kankanaey Igorots from the highlands of the Philippine Mountain Province as likely the closest living representatives of the source population that may have given rise to the Austronesian expansion. This conclusion rests on independent evidence from various analyses of autosomal data and uniparental markers. Given the extensive presence of trade goods, cultural and linguistic evidence of Indian influence in Southeast Asia starting from 2.5 kya, we also detect traces of a South Asian signature in different populations in the region dating to the last couple of thousand years. European Journal of Human Genetics advance online publication, 15 June 2016; doi:10.1038/ejhg.2016.60.
Article
The population history of anatomically modern humans (AMH) in Southeast Asia (SEA) is a highly debated topic. The impact of sea level variations related to the Last Glacial Maximum (LGM) and the Neolithic diffusion on past population dispersals are two key issues. We have investigated competing AMH dispersal hypotheses in SEA through the analysis of dental phenotype shape variation on the basis of very large archaeological samples employing two complementary approaches. We first explored the structure of between- and within-group shape variation of permanent human molar crowns. Second, we undertook a direct test of competing hypotheses through a modeling approach. Our results identify a significant LGM-mediated AMH expansion and a strong biological impact of the spread of Neolithic farmers into SEA during the Holocene. The present work thus favors a "multiple AMH dispersal" hypothesis for the population history of SEA, reconciling phenotypic and recent genomic data.
Article
Recent human-genetics studies have come to different conclusions regarding how and when modern humans spread out of Africa and into the rest of the world. I present here a simple parsimony-based analysis that suggests that East Asians and Melanesians are sister groups, and I discuss what implications this has for recent claims made about the demographic histories of non-African populations.
Article
Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.